An Overview of the Distributed TensorFlow Workflow
This article summarizes my understanding of the distributed TensorFlow workflow. I'd like to divide it into the following four parts:
- Create a server;
- Create a session;
- Build a computation graph;
- Run a session.
Let's start with creating a server.
Create a server
grpc_tensorflow_server.cc
The entry point creates a new server. Each server is driven through three lifecycle calls: Start, Join and Stop.
int main(int argc, char* argv[]) {
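The Start, Join and Stop lifecycle can be sketched as a small state machine. This is a minimal illustration, not TensorFlow's implementation: the class name `ToyServer`, the state names and the use of a condition variable are all assumptions made for the example.

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a server with Start/Join/Stop.
// None of these names come from the TensorFlow source.
class ToyServer {
 public:
  enum class State { kNew, kStarted, kStopped };

  // Moves the server from kNew to kStarted; starting twice is an error.
  void Start() {
    std::lock_guard<std::mutex> lock(mu_);
    if (state_ != State::kNew) {
      throw std::logic_error("Start() is only valid on a new server");
    }
    state_ = State::kStarted;
  }

  // Marks the server as stopped and wakes up any thread blocked in Join().
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      state_ = State::kStopped;
    }
    stopped_.notify_all();
  }

  // Blocks the caller until Stop() has been called.
  void Join() {
    std::unique_lock<std::mutex> lock(mu_);
    stopped_.wait(lock, [this] { return state_ == State::kStopped; });
  }

  State state() {
    std::lock_guard<std::mutex> lock(mu_);
    return state_;
  }

 private:
  std::mutex mu_;
  std::condition_variable stopped_;
  State state_ = State::kNew;
};
```

In this sketch a `main` function would call `Start()` and then `Join()`, so the process stays alive until some other thread calls `Stop()`.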
grpc_server_lib.cc
Create a gRPC server and call its Init() function.
Inside Init(), the server first creates both the master environment and the worker environment.
The master environment mainly holds:
- Local devices
- Worker cache
- Master session factory
And the worker environment holds:
- Local devices
- device_mgr
- rendezvous_mgr
- session_mgr
- compute_pool
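The two environments listed above can be pictured as plain structs of non-owning pointers that the server fills in during initialization. The sketch below is only an illustration under that assumption: every type prefixed with `Toy` is a hypothetical placeholder, and the field layout is simplified from the lists above rather than copied from TensorFlow.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical placeholder types; the real classes are far richer.
struct ToyDevice { std::string name; };
struct ToyWorkerCache {};
struct ToyDeviceMgr {};
struct ToyRendezvousMgr {};
struct ToySessionMgr {};
struct ToyThreadPool { int num_threads = 4; };
struct ToyMasterSession { std::string handle; };

// What the master needs: devices, a way to reach workers, a session factory.
struct ToyMasterEnv {
  std::vector<ToyDevice*> local_devices;
  ToyWorkerCache* worker_cache = nullptr;
  std::function<std::unique_ptr<ToyMasterSession>(const std::string&)>
      master_session_factory;
};

// What the worker needs to execute graphs locally.
struct ToyWorkerEnv {
  std::vector<ToyDevice*> local_devices;
  ToyDeviceMgr* device_mgr = nullptr;
  ToyRendezvousMgr* rendezvous_mgr = nullptr;
  ToySessionMgr* session_mgr = nullptr;
  ToyThreadPool* compute_pool = nullptr;
};

// Owns the underlying objects and wires both environments to them.
// Init() is meant to be called once; the object must not be copied afterwards.
struct ToyServerEnvs {
  std::vector<ToyDevice> devices;
  ToyWorkerCache worker_cache;
  ToyDeviceMgr device_mgr;
  ToyRendezvousMgr rendezvous_mgr;
  ToySessionMgr session_mgr;
  ToyThreadPool compute_pool;
  ToyMasterEnv master_env;
  ToyWorkerEnv worker_env;

  void Init(const std::vector<std::string>& device_names) {
    // Create all devices first so their addresses stay stable.
    for (const auto& name : device_names) devices.push_back(ToyDevice{name});
    for (auto& d : devices) {
      master_env.local_devices.push_back(&d);
      worker_env.local_devices.push_back(&d);
    }
    master_env.worker_cache = &worker_cache;
    master_env.master_session_factory = [](const std::string& handle) {
      return std::unique_ptr<ToyMasterSession>(new ToyMasterSession{handle});
    };
    worker_env.device_mgr = &device_mgr;
    worker_env.rendezvous_mgr = &rendezvous_mgr;
    worker_env.session_mgr = &session_mgr;
    worker_env.compute_pool = &compute_pool;
  }
};
```

The point of the sketch is that both environments see the same local devices, while each holds only the extra pieces its own service needs.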
The gRPC server then starts one thread for the master service and another for the worker service. The following sections show how these two services work:
class GrpcServerFactory : public ServerFactory {
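The idea of running the master service and the worker service on separate threads can be sketched with `std::thread`. This is an assumption-laden illustration: `ToyService` and `StartAndJoin` are hypothetical names, and the loop body here returns immediately, whereas a real service loop would keep running until shutdown.

```cpp
#include <atomic>
#include <thread>

// Hypothetical service; a real one would block while waiting for requests.
struct ToyService {
  std::atomic<bool> ran{false};

  // Placeholder for the request-handling loop of one service.
  void HandleRPCsLoop() { ran = true; }
};

// Starts one thread per service and waits for both loops to finish.
void StartAndJoin(ToyService* master, ToyService* worker) {
  std::thread master_thread([master] { master->HandleRPCsLoop(); });
  std::thread worker_thread([worker] { worker->HandleRPCsLoop(); });
  master_thread.join();
  worker_thread.join();
}
```

Giving each service its own thread means a slow request on one service does not stall the loop of the other.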
The master service then starts. Its HandleRPCsLoop() function handles incoming gRPC requests and dispatches each request to the handler registered for it.
grpc_master_service.cc
void HandleRPCsLoop() override {
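The dispatch pattern described for HandleRPCsLoop() can be sketched as a loop that pulls requests off a queue and looks up a handler by method name. This is a simplified model, not the gRPC-based implementation: `ToyRequest`, `ToyMasterService` and the in-memory queue are assumptions, and this loop drains the queue and returns instead of blocking until shutdown.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request: a method name plus an opaque payload.
struct ToyRequest {
  std::string method;
  std::string payload;
};

class ToyMasterService {
 public:
  ToyMasterService() {
    // Register one handler per supported method.
    handlers_["CreateSession"] = [this](const ToyRequest& r) {
      log_.push_back("create:" + r.payload);
    };
    handlers_["RunStep"] = [this](const ToyRequest& r) {
      log_.push_back("run:" + r.payload);
    };
  }

  void Enqueue(ToyRequest request) { queue_.push(std::move(request)); }

  // Processes every queued request, routing each to its handler.
  void HandleRPCsLoop() {
    while (!queue_.empty()) {
      ToyRequest request = std::move(queue_.front());
      queue_.pop();
      auto it = handlers_.find(request.method);
      if (it != handlers_.end()) {
        it->second(request);
      } else {
        log_.push_back("unimplemented:" + request.method);
      }
    }
  }

  const std::vector<std::string>& log() const { return log_; }

 private:
  std::queue<ToyRequest> queue_;
  std::map<std::string, std::function<void(const ToyRequest&)>> handlers_;
  std::vector<std::string> log_;
};
```

Requests are handled in arrival order, and an unknown method is recorded rather than silently dropped.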
The Master class is responsible for creating and maintaining master sessions:
master.cc
// Master implements the service MasterService.
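"Creating and maintaining" sessions can be pictured as a table from session handle to session object. The sketch below assumes exactly that and nothing more: `ToyMaster`, `ToySession` and the handle format are hypothetical, and a real master would also need locking and garbage collection.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical session record: a handle plus the graph it was created with.
struct ToySession {
  std::string handle;
  std::string graph;
};

class ToyMaster {
 public:
  // Creates a session for the given graph and returns its handle.
  std::string CreateSession(const std::string& graph) {
    std::string handle = "session_" + std::to_string(next_id_++);
    sessions_[handle] =
        std::unique_ptr<ToySession>(new ToySession{handle, graph});
    return handle;
  }

  // Returns the session for a handle, or nullptr if it does not exist.
  ToySession* Find(const std::string& handle) {
    auto it = sessions_.find(handle);
    return it == sessions_.end() ? nullptr : it->second.get();
  }

  // Destroys the session; returns false if the handle was unknown.
  bool CloseSession(const std::string& handle) {
    return sessions_.erase(handle) > 0;
  }

 private:
  int next_id_ = 0;
  std::map<std::string, std::unique_ptr<ToySession>> sessions_;
};
```

Clients only ever hold the handle string, so every later request has to go through the master to reach the session.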
master_session.cc
Status MasterSession::Create(GraphDef* graph_def,
grpc_worker_cache.cc
WorkerInterface* CreateWorker(const string& target) override {
grpc_remote_worker.cc
void CreateWorkerSessionAsync(const CreateWorkerSessionRequest* request,
grpc_worker_service.cc
void HandleRPCsLoop() override {
worker.cc
void Worker::CreateWorkerSessionAsync(const CreateWorkerSessionRequest* request,
session_mgr.cc
Status SessionMgr::CreateSession(const string& session,
worker_session.cc
WorkerSession::WorkerSession(const string& session_name,
Build a computation graph
A graph definition (GraphDef) is passed in through the session, and the master constructs the graph from it.
graph.proto
// Represents the graph of operations
node_def.proto
message NodeDef {
graph_execution_state.cc
/* static */ Status GraphExecutionState::MakeForBaseGraph(
graph_constructor.cc
Status ConvertGraphDefToGraph(const GraphConstructorOptions& opts,
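Turning a graph definition into a graph amounts to resolving each node's input names into edges. The sketch below shows that idea on a toy structure. `ToyNodeDef`, `ToyGraph` and `ConvertToyGraphDef` are hypothetical names, only the `name`, `op` and `input` fields are modeled, and input strings are assumed to be bare node names with no output-port or control-dependency syntax.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node definition.
struct ToyNodeDef {
  std::string name;
  std::string op;
  std::vector<std::string> input;  // names of producer nodes
};

// Nodes are identified by index; edges go from producer to consumer.
struct ToyGraph {
  std::vector<std::string> node_names;
  std::vector<std::pair<int, int>> edges;
};

ToyGraph ConvertToyGraphDef(const std::vector<ToyNodeDef>& graph_def) {
  ToyGraph graph;
  std::map<std::string, int> index;

  // First pass: assign an id to every node and reject duplicate names.
  for (const auto& node : graph_def) {
    int id = static_cast<int>(graph.node_names.size());
    if (!index.emplace(node.name, id).second) {
      throw std::invalid_argument("duplicate node: " + node.name);
    }
    graph.node_names.push_back(node.name);
  }

  // Second pass: resolve each input name into a producer -> consumer edge.
  for (const auto& node : graph_def) {
    for (const auto& input : node.input) {
      auto it = index.find(input);
      if (it == index.end()) {
        throw std::invalid_argument("unknown input: " + input);
      }
      graph.edges.emplace_back(it->second, index[node.name]);
    }
  }
  return graph;
}
```

Two passes are used so that a node may refer to an input that appears later in the definition.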
Run a session
master_session.cc
Status MasterSession::Run(CallOptions* opts, const RunStepRequestWrapper& req,
worker.cc
void Worker::RunGraphAsync(CallOptions* opts, RunGraphRequestWrapper* request,
graph_mgr.cc
void GraphMgr::ExecuteAsync(const string& handle, const int64 step_id,