Author: 全球人工智慧 (Global Artificial Intelligence)
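Preface
The TensorFlow website currently offers release packages only for Python, C, Java, and Go; there is no C++ release package. The website also states that stability is not guaranteed for any library other than the Python one, and Python is the most complete in terms of features. Python has well-known advantages in development efficiency and ease of use, but as an interpreted language it still has considerable performance shortcomings. When turning AI models into services, the prevailing trend is to use Python as the tool for building models quickly and a language such as C++ or Java to implement the serving program. This article focuses on how to serve TensorFlow models from C++ and on the problems encountered along the way.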
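Approach
There are two ways to use the TensorFlow C++ library:
(1) The ideal way is to build the graph directly in C++, but the current C++ TensorFlow library is not as full-featured as the Python API.
(2) The common way is to have C++ load a graph that was generated in Python. This article mainly covers this approach.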
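Implementation steps
(1) Build the C++ shared library (.so) from the TensorFlow source
(2) Train the model and save the output
(3) Freeze the model
(4) Load and run the model
(5) Runtime problems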
(1) Building from source
Environment: the company's tlinux 2.2, GCC >= 4.8.5
Components to install: protobuf 3.3.0, bazel 0.5.0, python 2.7, java8
Machine requirement: 4 GB of memory
a. Install java8
yum install java
b. Install protobuf 3.3.0
Download https://github.com/google/protobuf/archive/v3.3.0.zip
./configure && make && make install
c. Install bazel
Download the installer from https://github.com/bazelbuild/bazel/releases
sh bazel-0.5.0-installer-linux-x86_64.sh
d. Build the source
It is best to use the latest release: https://github.com/tensorflow/tensorflow/releases
bazel build //tensorflow:libtensorflow_cc.so
Problems you may run into during the build:
Problem 1: fatal error: unsupported/Eigen/CXX11/Tensor: No such file or directory
Install Eigen 3.3 or later.
Problem 2: java.io.IOException: Cannot run program "patch"
yum install patch
Problem 3: Insufficient memory
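(2) Model training and output
For training and saving a model, you can follow this example: https://blog.metaflow.fr/tensorflow-saving-restoring-and-mixing-multiple-models-c4c94d5d7125 (many more can be found through Google). Training and saving the model produces the checkpoint files that the next step consumes.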
(3) Freezing the model
There are three ways to freeze a model:
a. The freeze_graph tool
bazel build tensorflow/python/tools:freeze_graph && \
bazel-bin/tensorflow/python/tools/freeze_graph \
    --input_graph=graph.pb \
    --input_checkpoint=checkpoint \
    --output_graph=./frozen_graph.pb \
    --output_node_names=output/output/scores
b. The freeze_graph.py script
import os
from tensorflow.python.tools import freeze_graph

# FLAGS (with model_dir and checkpoint_dir) is assumed to be defined by the
# surrounding script.
# We save out the graph to disk, and then call the const conversion
# routine.
checkpoint_state_name = "checkpoint"
input_graph_name = "graph.pb"
output_graph_name = "frozen_graph.pb"
input_graph_path = os.path.join(FLAGS.model_dir, input_graph_name)
input_saver_def_path = ""
input_binary = False
input_checkpoint_path = os.path.join(FLAGS.checkpoint_dir, 'saved_checkpoint') + "-0"
# Note that this normally should be only "output_node"!!!
output_node_names = "output/output/scores"
restore_op_name = "save/restore_all"
filename_tensor_name = "save/Const:0"
output_graph_path = os.path.join(FLAGS.model_dir, output_graph_name)
clear_devices = False
freeze_graph.freeze_graph(input_graph_path, input_saver_def_path,
                          input_binary, input_checkpoint_path,
                          output_node_names, restore_op_name,
                          filename_tensor_name, output_graph_path,
                          clear_devices)
c. The TensorFlow Python API
import os, argparse
import tensorflow as tf
from tensorflow.python.framework import graph_util

dir = os.path.dirname(os.path.realpath(__file__))

def freeze_graph(model_folder):
    # We retrieve our checkpoint fullpath
    checkpoint = tf.train.get_checkpoint_state(model_folder)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We specify the full file name of our frozen graph
    absolute_model_folder = "/".join(input_checkpoint.split('/')[:-1])
    output_graph = absolute_model_folder + "/frozen_model.pb"
    print(output_graph)

    # Before exporting our graph, we need to specify what our output node is
    # This is how TF decides what part of the Graph it has to keep and what part it can dump
    # NOTE: this variable is plural, because you can have multiple output nodes
    output_node_names = "output/output/scores"

    # We clear devices to allow TensorFlow to control on which device it will load operations
    clear_devices = True

    # We import the meta graph and retrieve a Saver
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=clear_devices)

    # We retrieve the protobuf graph definition
    graph = tf.get_default_graph()
    input_graph_def = graph.as_graph_def()

    # fix batch norm nodes
    for node in input_graph_def.node:
        if node.op == 'RefSwitch':
            node.op = 'Switch'
            for index in range(len(node.input)):
                if 'moving_' in node.input[index]:
                    node.input[index] = node.input[index] + '/read'
        elif node.op == 'AssignSub':
            node.op = 'Sub'
            if 'use_locking' in node.attr:
                del node.attr['use_locking']

    # We start a session and restore the graph weights
    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)

        # We use a built-in TF helper to export variables to constants
        output_graph_def = graph_util.convert_variables_to_constants(
            sess,  # The session is used to retrieve the weights
            input_graph_def,  # The graph_def is used to retrieve the nodes
            output_node_names.split(",")  # The output node names are used to select the useful nodes
        )

        # Finally we serialize and dump the output graph to the filesystem
        with tf.gfile.GFile(output_graph, "wb") as f:
            f.write(output_graph_def.SerializeToString())
        print("%d ops in the final graph." % len(output_graph_def.node))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_folder", type=str, help="Model folder to export")
    args = parser.parse_args()
    freeze_graph(args.model_folder)
Pitfall: BatchNorm bug
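In a real project, models frozen with methods a and b failed with an error when loaded through the TensorFlow C++ API.
The cause is that the model uses BatchNorm. The fix is the node rewrite included in method c above.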
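To make the rewrite rule in method c easier to follow, here is a minimal C++ sketch of the same logic applied to a toy node list. It is only an illustration: `Node` and `FixBatchNormNodes` are hypothetical names invented here, and `Node` is a simplified stand-in, not `tensorflow::NodeDef`. The rule itself is taken from the Python loop above: `RefSwitch` becomes `Switch` with `/read` appended to any input containing `moving_`, and `AssignSub` becomes `Sub` with its `use_locking` attribute dropped.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a graph node (NOT tensorflow::NodeDef).
struct Node {
    std::string op;                           // operation type, e.g. "RefSwitch"
    std::vector<std::string> input;           // names of input nodes
    std::map<std::string, std::string> attr;  // attributes, values kept as text
};

// Applies the same rewrite as the Python loop in method c and
// returns how many nodes were changed.
int FixBatchNormNodes(std::vector<Node>& nodes) {
    int changed = 0;
    for (Node& node : nodes) {
        if (node.op == "RefSwitch") {
            // Replace the reference-typed switch with a plain one and read
            // the moving averages through their "/read" outputs.
            node.op = "Switch";
            for (std::string& in : node.input) {
                if (in.find("moving_") != std::string::npos) {
                    in += "/read";
                }
            }
            ++changed;
        } else if (node.op == "AssignSub") {
            // Replace the in-place update with a plain subtraction and drop
            // the attribute that only the in-place op understands.
            node.op = "Sub";
            node.attr.erase("use_locking");
            ++changed;
        }
        // Postcondition: this node no longer uses either rewritten op type.
        assert(node.op != "RefSwitch" && node.op != "AssignSub");
    }
    return changed;
}
```

With this sketch, a `RefSwitch` node whose input is `bn/moving_mean` ends up as a `Switch` node reading `bn/moving_mean/read`, while nodes of any other type are left untouched.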
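(4) Loading and running the model
Building inputs and outputs
Handling model input and output mostly comes down to constructing the input and output matrices. Compared with Python's numpy, the Tensor type provided by TensorFlow and Eigen::Tensor are quite hard to use, especially for creating dynamically sized matrices. If your compiler supports C++14, you can use the xtensor library, which is as powerful as numpy and has a very similar usage. If you are on C++11, read the Eigen and tensorflow::Tensor documentation carefully. A few simple usages: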
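As background for the usages in this section, it may help to see how a dynamically sized 4-D matrix maps onto flat memory. The sketch below uses only the standard library and assumes row-major layout, which is the layout tensorflow::Tensor uses for its buffer. `Plane4D`, `Offset`, and `At` are hypothetical names chosen for this illustration and are not part of TensorFlow or Eigen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a TensorFlow type): a 4-D float buffer whose
// dimensions are chosen at run time, stored row-major in one std::vector.
class Plane4D {
public:
    Plane4D(std::size_t d0, std::size_t d1, std::size_t d2, std::size_t d3)
        : dims_{{d0, d1, d2, d3}}, data_(d0 * d1 * d2 * d3, 0.0f) {}

    // Row-major offset: the last index varies fastest.
    std::size_t Offset(std::size_t i0, std::size_t i1,
                       std::size_t i2, std::size_t i3) const {
        assert(i0 < dims_[0] && i1 < dims_[1] && i2 < dims_[2] && i3 < dims_[3]);
        return ((i0 * dims_[1] + i1) * dims_[2] + i2) * dims_[3] + i3;
    }

    // Mutable access to one element by its four indices.
    float& At(std::size_t i0, std::size_t i1, std::size_t i2, std::size_t i3) {
        return data_[Offset(i0, i1, i2, i3)];
    }

    std::size_t size() const { return data_.size(); }
    const std::vector<float>& data() const { return data_; }

private:
    std::array<std::size_t, 4> dims_;  // extent of each dimension
    std::vector<float> data_;          // flat row-major storage
};
```

For a shape of {1, 3, 4, 2}, `At(0, 1, 2, 1)` touches element 13 of the flat buffer, since ((0 * 3 + 1) * 4 + 2) * 2 + 1 = 13. A buffer laid out this way can be copied element by element into a tensor of the same shape.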
Matrix assignment:
tensorflow::Tensor four_dim_plane(DT_FLOAT, tensorflow::TensorShape({1, MODEL_X_AXIS_LEN, MODEL_Y_AXIS_LEN, fourth_dim_size}));
auto plane_tensor = four_dim_plane.tensor<float, 4>();
Softmax:
Eigen::Tensor
Model loading and session initialization:
int32_t ModelApp::Init(const std::string& graph_file, Logger *logger)
{
    // Create a new TensorFlow session.
    auto status = NewSession(SessionOptions(), &m_session);
    if (!status.ok())
    {
        LOG_ERR(logger, "New session failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_NEW_TENSORFLOW_SESSION;
    }

    // Read the frozen graph (binary protobuf) from disk.
    GraphDef graph_def;
    status = ReadBinaryProto(Env::Default(), graph_file, &graph_def);
    if (!status.ok())
    {
        LOG_ERR(logger, "Read binary proto failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_READ_BINARY_PROTO;
    }

    // Load the graph into the session.
    status = m_session->Create(graph_def);
    if (!status.ok())
    {
        LOG_ERR(logger, "Session create failed! %s", status.ToString().c_str());
        return Error::ERR_FAILED_CREATE_TENSORFLOW_SESSION;
    }
    return Error::Success;
}
Running:
TensorFlow libraries from version 0.10 onward are thread-safe, so Predict can be called from multiple threads.
int32_t ModelApp::Predict(const Action& action, std::vector
(5) Runtime problems
Problem 1: Runtime warnings
2017-08-16 14:11:14.393295: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-16 14:11:14.393324: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-16 14:11:14.393331: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-16 14:11:14.393338: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
These warnings appear because the TensorFlow .so was built without these CPU acceleration instructions. They can be enabled at build time. Without a GPU, enabling them sped up CPU computation by about 10% in our measurements.
bazel build -c opt --copt=-mavx --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 -k //tensorflow:libtensorflow_cc.so
Note that not all CPUs support these instructions.
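Since not every CPU supports these instructions, it can be worth checking the target machine before choosing the --copt flags. The following is a minimal sketch, assuming a recent GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available. On other compilers or architectures it simply reports nothing. `CpuFeature`, `DetectCpuFeatures`, and `SuggestedCopts` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: one instruction-set name and whether this CPU has it.
struct CpuFeature {
    std::string name;
    bool supported;
};

// Queries the CPU for the four instruction sets named in the warnings.
// Returns an empty list when the compiler builtin is not available.
std::vector<CpuFeature> DetectCpuFeatures() {
    std::vector<CpuFeature> features;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // The builtin only accepts string literals, so each name is spelled out.
    features.push_back({"sse4.1", __builtin_cpu_supports("sse4.1") != 0});
    features.push_back({"sse4.2", __builtin_cpu_supports("sse4.2") != 0});
    features.push_back({"avx", __builtin_cpu_supports("avx") != 0});
    features.push_back({"fma", __builtin_cpu_supports("fma") != 0});
#endif
    return features;
}

// Turns the supported features into bazel --copt flags, e.g. "--copt=-mavx".
std::vector<std::string> SuggestedCopts(const std::vector<CpuFeature>& features) {
    std::vector<std::string> copts;
    for (const CpuFeature& f : features) {
        assert(!f.name.empty());
        if (f.supported) {
            copts.push_back("--copt=-m" + f.name);
        }
    }
    return copts;
}
```

Calling `SuggestedCopts(DetectCpuFeatures())` on the machine that will run the service gives a conservative flag list. Keep in mind that the build machine and the serving machine may differ, so the check is most useful when run on the serving machine.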
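Problem 2: Mixing C++ libtensorflow with Python tensorflow
To verify that the C++ code loads and calls the model correctly, we wrapped the C++ API with SWIG into a Python library that Python can call. When `import tensorflow as tf` and the SWIG-wrapped Python interface are both imported in the same process, the process core dumps.
The TensorFlow team does not plan to fix this issue.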