
1

Tensorflow
In Practice
By Donghwi Cha
with the help of engineering

2

Summary
- An introduction to Tensorflow
- Goal
- Get used to the terms
- Basics about its structure
- Catch up on the basic pillars of Tensorflow
- This document
- Based on web research and the Tensorflow documentation
( www.tensorflow.org )

3

Summary
- The standpoint of this session
[Diagram: a spectrum from a bird's-eye view (Tensorflow) to a deep dive (mathematics, whitepapers ..);
this session — "Here !!" — sits in between, covering Graph, Train, and cluster (replica)]

4

Short intro

5

Intro to Deep learning
1) What is it?
a) Perceptron, deep neural network
b) A type of machine learning

6

Tensorflow,
as a Computation graph engine

7

Tensorflow in practice by Engineer - donghwi cha

8

Computation Graph(1/2)
1) Graph algorithm with computation
a) Graph with equations, Expression
b) Parallel processing -> devices, hosts..
c) Theano, Pytorch, ...
[Diagram: building Y = x^T A x + b · x + c as a computation graph from nodes such as
F(u) = u^T (transpose) and f(U, V) = UV (matrix product), with intermediate values x^T and x^T·A]

9

Computation Graph(2/2)
2) Graph on Tensorboard
import tensorflow as tf

a = tf.add(1, 2)
b = tf.multiply(a, 3)
c = tf.add(4, 5)
d = tf.multiply(c, 6)
e = tf.multiply(4, 5)
f = tf.div(c, 6)
g = tf.add(b, d)
h = tf.multiply(g, f)

tf.summary.scalar("h", h)
merged_summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    summary_writer = tf.summary.FileWriter('output', graph=tf.get_default_graph())
    h_val, summary = sess.run([h, merged_summary_op])
    summary_writer.add_summary(summary)

10

Ops of Computation graph (1/3)
1) Ops -> registered on the Graph, executed at runtime
2) Graph management -> default graph; lifecycle managed with a Session
3) How an Op is registered on the Graph (see the sketch at the end of this slide)
a) tf.matmul (call) -> math_ops.py (pass args) -> gen_math_ops (create and attach the op to the graph with op_def_library)
b) Later, executed by sess.run
tensorflow/python/ops/math_ops.py
def matmul(a, b, ...
....
else:
return gen_math_ops._mat_mul(
a, b, transp... , name=name)
tensorflow/python/ops/gen_math_ops.py
def _InitOpDefLibrary(op_list_proto_bytes):
..
op_def_lib = _op_def_library.OpDefLibrary()
op_def_lib.add_op_list(op_list)
return op_def_lib
_op_def_lib = _InitOpDefLibrary
def _mat_mul(a, b,
result = _op_def_lib.apply_op("MatMul", a=..
tensorflow/python/framework/op_def_library.py
def apply_op(self, op_t...
outpu..., op = self._apply_op_helper( ...
def _apply_op_helper(self, op_...
....
g =ops._get_graph_from_inputs(..
with g.as_default(), ops.nam.. as scope:
...
op = g.create_op(op_type_name, inp...
return ou..., op_def.is_stateful, op
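A minimal sketch of this flow (assuming the TF1 API): calling tf.matmul only registers a MatMul op on the default graph; nothing is computed until sess.run.

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0], [4.0]])
c = tf.matmul(a, b)  # math_ops.matmul -> gen_math_ops -> op_def_library -> graph.create_op

g = tf.get_default_graph()
print([op.name for op in g.get_operations()])  # the MatMul op is now registered on the graph

with tf.Session() as sess:
    print(sess.run(c))  # only here is the op actually executed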

11

Ops of Computation graph (2/3)
2) Kernel
a) Kernel with 200 standard operations
b) Source of operational kernels in Tensorflow
i) Eigen::Tensor(아이겐 텐서)
ii) CuDNN (쿠디엔엔)
c) Ops of gen_X : generated by C++
i) example
(1) /usr/local/lib/python2.7/dist-
packages/tensorflow/python/ops/gen_math_ops.py
3) Further..
a) Static Graph vs Dynamic Graph (TensorFlow Fold, Pytorch ...)

12

Ops of Computation graph (3/3)
1) How to implement a backend op
a) Tensorflow documentation (link)
b) Define the op's interface
c) Implement the kernel for the op (CPU, GPU)

13

Protocol Buffer of Graph
1) Protocol Buffer
a) A data serialization format
i) JSON-like, but a compact binary stream
ii) Made by Google
b) Graph
i) A protocol buffer object where Ops are registered
c) Graph serialization / deserialization
i) Can be used to serve a web service
ii) freeze_graph (a tool)
(1) tensorflow/python/tools/freeze_graph.py
iii) export (a class) — see the sketch below
(1) saved_model_builder
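A minimal sketch (assuming the TF1 SavedModel API; the export path and the trivial graph are illustrative) of exporting a graph with saved_model_builder for later serving:

import tensorflow as tf

# A trivial graph to export (illustrative only)
w = tf.Variable(tf.zeros([2, 2]), name="w")

builder = tf.saved_model.builder.SavedModelBuilder("/tmp/exported_model")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Serialize the graph and variables into the SavedModel protocol buffers
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()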

14

Tensor and tf.Tensor
1) Tensor
a) A generalisation of vectors and scalars
b) Flows over the computation graph (ops) => "Tensorflow"
2) tf.Tensor
a) A symbolic handle with no value of its own (like a pointer)
b) The return type of Ops
c) The actual calculation is done by the kernel implementations
3) tf.Tensor and Numpy array
a) Both represent array values
b) Conversion (sketch below)
i) tf.Tensor -> Numpy array (possible)
ii) Reverse -> tf.convert_to_tensor (creates a constant)
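A minimal sketch (TF1 API) of the two conversion directions:

import numpy as np
import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])    # tf.Tensor: a symbolic handle, no value yet

with tf.Session() as sess:
    arr = sess.run(t)                         # tf.Tensor -> numpy.ndarray (evaluated value)
print(type(arr))                              # <class 'numpy.ndarray'>

back = tf.convert_to_tensor(arr)              # numpy.ndarray -> constant tf.Tensor
print(back)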

15

Session (1/3) How tensorflow code is built and executed
1) Scope: define and run
a) Define the Graph (prepare objects)
b) Compute the Graph with a Session
2) Session object
a) Runtime context
b) tf.Operation
i) tf.Variable().initializer
ii) MatMul, Softmax
c) Multiple fetch arguments
d) eval() method with a default session
i) "default sess" or "y.eval(session=ss)"
import tensorflow as tf

x = tf.constant([[37.0, -23.0], [1.0, 4.0]])
w = tf.Variable(tf.random_uniform([2, 2]))
y = tf.matmul(x, w)
output = tf.nn.softmax(y)
init_op = w.initializer

with tf.Session() as sess:
    print("-- sess.run: 1 arg --")
    sess.run(init_op)
    print(sess.run(output))
    print("-- sess.run: 2 args --")
    y_val, output_val = sess.run([y, output])
    print(y_val)
    print(output_val)
    print("-- y.eval --")
    print(y.eval())

16

Session (2/3)
3) Session class
a) python/client/session.py module
b) assigned devices, Graph, run(), close()
class SessionInterface(object):
@property
def graph(self):
def run(self, fetches,
..
class BaseSession(SessionInterface):
def __init__(self, targe..
def list_devices(self):
@property
def graph(self):
def run(self,...
class Session(BaseSession):
def __enter__(self):
def __exit__(self, ...):
...
(Annotations from the slide: SessionInterface defines the set of actions a Session instance will do;
BaseSession.__init__ sets up the graph and config and creates the session; list_devices() returns the
device list; graph returns graph info; run() dispatches on the type of the fetches argument;
Session adds context management.)

17

Session (3/3)
4) Session.run()
a) Python to C++
b) SWIG (Simplified Wrapper and Interface Generator)
[Diagram: Session.run() call chain — Python Session object -> pywrap_tensorflow (_pywrap_tensorflow.so,
the SWIG Python/C binding) -> python/client/tf_session_helper.cc (C++ client, argument processing;
e.g. tf_session.TF_SessionPRun_wrapper, tf_session.TF_PRun) -> C++ kernel interface API]

18

Training

19

Tensorflow in practice by Engineer - donghwi cha

20

Training - basics: building a model of a neural net
[Diagram: training flow — input data X (placeholder) -> tf.matmul with W1, tf.add with B1 -> tf.nn.relu
-> tf.matmul with W2, tf.add with B2 -> tf.nn.relu -> output; the variables W1, B1, W2, B2 are what
training updates. A minimal code sketch of this network follows below.]
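A minimal code sketch of the diagram above (TF1 API; the layer sizes are illustrative, not from the slides):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])              # input data X (placeholder)

w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.zeros([256]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x, w1), b1))       # matmul -> add -> relu

w2 = tf.Variable(tf.random_normal([256, 10]))
b2 = tf.Variable(tf.zeros([10]))
output = tf.nn.relu(tf.add(tf.matmul(layer_1, w2), b2))  # second layer -> output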

21

Training - basics
1) Example: Multi-layer
Epoch: 0001 cost= 54.984976748
Epoch: 0002 cost= 12.827661085
...
Epoch: 0024 cost= 0.237979749
Epoch: 0025 cost= 0.212254504
Optimization Finished!
Accuracy: 0.9347
Sample MNIST code:
https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/4_Utils/tensorboard_advanced.py

22

Training - Terms
1) Training
a) Repeating feedforward and backprop
b) Updating variables (weights, biases) on each batch
2) Epoch
a) One iteration over the whole training set
b) Steps per epoch = "number of training examples" / "batch size"
(e.g. 55,000 MNIST training images with a batch size of 100 -> 550 steps per epoch)
3) Batch
a) One gradient-descent update (feedforward and backprop) per batch
b) When loading images, batch size is bounded by (GPU) memory
c) tf.train.batch(..., allow_smaller_final_batch)
4) Training accuracy
a) Minimize loss (or cost) -> for building the model

23

Train - Back Propagation
1) Creating the train OP (see the sketch below)
a) Loss op
b) Optimizer
2) Loss op
a) The difference between the ideal (true) value
and the predicted one
b) Any op can be used for the calculation
3) Optimizer
a) A type of gradient descent
4) Run on the session
a) Pass the ops over as an array
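A minimal sketch (TF1 API; the tiny linear model and the data are illustrative) of a loss op plus an optimizer forming the train op, with both passed to sess.run as an array:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2])
y_true = tf.placeholder(tf.float32, [None, 1])

w = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))
y_pred = tf.matmul(x, w) + b

loss = tf.reduce_mean(tf.square(y_true - y_pred))                  # loss op
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)   # optimizer -> train op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, loss_val = sess.run([train_op, loss],                       # pass over as array
                           feed_dict={x: [[1.0, 2.0]], y_true: [[3.0]]})
    print(loss_val)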

24

Train - Back Propagation
1) Optimizers in Tensorflow
a) Variants of gradient descent
b) Run as the training OP on the session
c) Stochastic Gradient Descent
i) stəˈkastik ("random")
ii) Gradient descent on randomly sampled batches
d) vs "Batch Gradient Descent" (over the full training set)

25

Run - Inference without training
1) Meaning of inference
a) Strictly -> running the inference layer of the model
b) Usually -> running the model without (further) training
2) How to
a) Load the graph and predict
b) Estimator.predict

26

Run - tf.train.Saver
1) Saving the Graph
a) During training, save the model (in-graph, between-graph)
b) Save to and restore from a Checkpoint (see the sketch below)
< cluster sync replication >
< How to save during training >
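A minimal sketch (TF1 API; the checkpoint path is illustrative) of saving and restoring with tf.train.Saver:

import tensorflow as tf

w = tf.Variable(tf.zeros([2, 2]), name="w")
saver = tf.train.Saver()

# Save during/after training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "/tmp/model.ckpt")

# Restore from the checkpoint later (e.g. for inference or to resume training)
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    print(sess.run(w))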

27

Training example

28

example: CNN model
1) Convolution filter
a) Feature extraction using convolution
b) Image handling using matrix dot products
c) Feature map
2) Max pooling
a) Resizing (downsampling) the image

29

example: CNN model
1) Dropout
a) Dropping out units in a neural network
b) A regularization technique
c) Prevents overfitting
2) Subsampling

30

example: CNN sample code - model
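The slide shows the model code of the referenced example; below is a minimal sketch in its spirit (TF1 API; the shapes and layer sizes are illustrative, not the original code):

import tensorflow as tf

def conv_net(x, keep_prob):
    # conv -> relu -> max-pool -> flatten -> dropout -> logits
    x = tf.reshape(x, [-1, 28, 28, 1])
    w_conv = tf.Variable(tf.random_normal([5, 5, 1, 32]))
    conv1 = tf.nn.relu(tf.nn.conv2d(x, w_conv, strides=[1, 1, 1, 1], padding='SAME'))
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    flat = tf.reshape(pool1, [-1, 14 * 14 * 32])
    flat = tf.nn.dropout(flat, keep_prob=keep_prob)
    w_out = tf.Variable(tf.random_normal([14 * 14 * 32, 10]))
    return tf.matmul(flat, w_out)              # logits for 10 classes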

31

example: CNN sample code - session
From
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network_raw.py

32

Replication and Cluster

33

Tensorflow in practice by Engineer - donghwi cha

34

Tensorflow in Parallel
1) Parallelism in running the computation graph
1) Data Parallelism
[Diagram: the same model replicated on Worker 1-4, each fed a different slice of the data]
2) Model Parallelism
[Diagram: one model split across Dev1-Dev4, all fed the same data]
3) Or mix and match

35

Tensorflow in Parallel - Model Parallelism
1) Model Parallelism
[Diagram: one model split across Dev1-Dev4, all fed the same data]

import tensorflow as tf

c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    total = tf.add_n(c)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(total))

36

Tensorflow in Parallel - Data Parallelism
Prefer "replication with data parallelism" over model parallelism

37

Tensorflow in Parallel - Data Parallelism
1) Types of Data Parallelism
a) In-graph
[Table: replication types — In-graph / Between-graph (rows) x Async / Sync (columns)]

38

Tensorflow in Parallel - Data Parallelism
1) In-graph replication
a) One graph with tasks of ps, worker (<-> model Parallel)
b) manually assign compute intensive graph to GPU
with tf.device("/job:ps/task:0"):
    weights_1 = tf.Variable(...)
    biases_1 = tf.Variable(...)
    ...

worker_devices = ["/job:worker/task:0/gpu:0", ..., "/job:worker/task:7/gpu:0"]
for worker_device in worker_devices:
    with tf.device(worker_device):
        input, labels = ...
        layer_1 = tf.nn.relu(tf.matmul(input, weights_1) + biases_1)
        logits = tf.nn.relu(tf.matmul(layer_1, weights_2) + biases_2)
        train_op = ...

with tf.Session("grpc://worker7.example.com:2222") as sess:
    for _ in range(10000):
        sess.run(train_op)

Annotations: the ps device scope sets up the parameter server; the worker loop manually assigns the
compute-intensive work to multiple devices -> replicating the model on multiple devices.

39

Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) Tower, SyncReplicasOptimizer, global_step
(Annotation: in the referenced code below, a for-each loop over the devices assigns the variables to each tower)

40

Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) Divide each batch of data across the GPUs
b) All GPUs share the model parameters
c) Moving data to and from the GPUs is slow
i) So keep the model parameters on the CPU
d) The GPUs run synchronized
i) All gradients are accumulated
ii) The model is updated across replicas

41

Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) When to use
i) (O) A general environment with multiple GPUs
ii) (X) An effectively single-GPU cluster where multiple GPUs communicate via CUDA peer-to-peer

42

Tensorflow in Parallel - Data Parallelism
1) Types of Data Parallelism
a) Between-graph
[Table: replication types — In-graph / Between-graph (rows) x Async / Sync (columns)]

43

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) Sync, Async
b) Add -> Update

44

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) Multiple graphs (distributed, each worker running a similar or identical graph)
b) Server and worker processes
$ python mnist_replica.py --ps_hosts=ps-server:12222 --worker_hosts=worker1:12223,worker2:12224 --job_name=ps --task_index=0 --num_gpus=0 --sync_replicas=true
$ python mnist_replica.py --ps_hosts=ps-server:12222 --worker_hosts=worker1:12223,worker2:12224 --job_name=worker --task_index=0 --num_gpus=0 --sync_replicas=true
[Diagram: Python clients (task index 0 and 1) -> a Server per client and a worker per client, plus a Param Server]

45

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) How to prepare (see the sketch below)
1) Create a ClusterSpec and a Server
2) Create the model under a device scope
3) Create a Supervisor and run with its session
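A minimal sketch (TF1 API; the hosts, log directory, and toy model are illustrative) of the three steps:

import tensorflow as tf

# 1) Create a ClusterSpec and a Server for this task
cluster = tf.train.ClusterSpec({"ps": ["ps-server:12222"],
                                "worker": ["worker1:12223", "worker2:12224"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# 2) Create the model under a device scope (variables are placed on the ps job)
with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    global_step = tf.train.get_or_create_global_step()
    w = tf.Variable(1.0)
    loss = tf.square(w - 3.0)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=global_step)

# 3) Create a Supervisor and run with its managed session
sv = tf.train.Supervisor(is_chief=True, logdir="/tmp/train_logs",
                         global_step=global_step)
with sv.managed_session(server.target) as sess:
    while not sv.should_stop():
        sess.run(train_op)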

46

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication -> Sync
a) Supervisor, tf.train.SyncReplicasOptimizer, global Session
b) Chief worker (index==0)

47

Serving
1) Tensorflow Serving
a) How to
i) Generate model after training
ii) setup server with model, and connect client

48

Serving
1) serving.api
a) Data input, response handling
b) Get prediction

49

Serving
1) prediction_signature, classification_signature (see the sketch below)
2) MetaGraph
[Diagram: model train-and-save -> serving and inference]
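A minimal sketch (TF1 SavedModel API; the tensor names and shapes are illustrative) of building a prediction_signature for serving:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784], name="images")
w = tf.Variable(tf.zeros([784, 10]))
scores = tf.matmul(x, w, name="scores")

# A prediction signature maps named inputs/outputs for the serving client
prediction_signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"images": x}, outputs={"scores": scores})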

50

Tools and Framework

51

Tools
1) Jupyter

52

Tools
1) Tensorboard
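For example (assuming TensorBoard is installed; 'output' is the summary directory written in the earlier Tensorboard example):
$ tensorboard --logdir=output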

53

Other tools

54

Tools
1) Other DL frameworks
a) Pytorch (Facebook)
b) MXnet (Apache Foundation, Amazon)
c) CNTK (Microsoft Cognitive Toolkit)
d) Caffe (Berkeley Vision and Learning Center)
e) Deeplearning4j (Eclipse Foundation)
2) Keras
a) An abstraction-layer framework
3) Theano
a) A model (precursor) for Tensorflow
b) Development ceased after version 1.0

55

Keras
1) What is Keras?
a) An abstraction class
b) Write once, run "with" any framework (see the sketch below)
[Diagram: Keras (Python) on top of the Theano, Tensorflow, or MXnet backends]
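A minimal Keras sketch (the layer sizes are illustrative); the same code runs on whichever backend is configured:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),   # hidden layer
    Dense(10, activation='softmax'),                     # output layer
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()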

56

Keras
1) Serving for production
a) Configure and create the model
b) Use Tensorflow Serving
< From the Keras documentation >

57

Comparison
1) Tool sets (from the documents of each tool)

Framework  | data type    | operation                        | graph                                    | GPU                       | cluster / data parallel
Tensorflow | tf.Tensor    | tf.Operation (ex: tf.matmul)     | tf.Graph (static by default; TF Fold ..) | supports (with parameter) | supports
Pytorch    | torch.Tensor | torch package (ex: torch.matmul) | dynamic                                  | supports (with parameter) | supports (torch.distributed.all_reduce)
MXnet      | mx.nd.array  | mx.nd package (ex: mx.nd.dot)    | mxnet.symbol (static)                    | supports (with parameter) | supports (kvstore)


Editor's Notes

  1. http://www.cs.cornell.edu/courses/cs5740/2017sp/lectures/04-nn-compgraph.pdf
  2. http://ruder.io/optimizing-gradient-descent/