- Tensorflow is an introduction to the machine learning framework Tensorflow covering key concepts like computation graphs, operations, sessions, training, replication, and clustering.
- Key aspects discussed include how Tensorflow executes operations as a static computation graph, uses sessions to run graphs and tensors to hold values, and supports data parallelism through replication across devices/workers.
- The document provides examples of building neural network models in Tensorflow and discusses techniques for training models like backpropagation and distributing training using data parallelism.
2. Summary
- An introduction to Tensorflow
- Goal
- Get used to terms
- Basics about its structure
- catch up basic pillars of tensorflow
- This document
- based on web search and Tensorflow document
( www.tensorflow.org )
3. Summary
- The standpoint of this session
Graph
Train
cluster
(replica)
Bird's eye view
(Tensorflow)
Deep Dive
( mathematics,
whitepapers.. )
Here !!
8. Computation Graph(1/2)
1) Graph algorithm with computation
a) Graph with equations, Expression
b) Parallel processing -> devices, hosts..
c) Theano, Pytorch, ...
X
F(u) = u^T
?
X
F(u) = u^T
?
A
?
f(U, V) = UV
x^T x^T*A
X
?
A
?
?
Y
b
?
c
Y= x^Ax + b · x + c
9. Computation Graph(2/2)
2) Graph on Tensorboard
a = tf.add(1, 2,)
b = tf.multiply(a, 3)
c = tf.add(4, 5,)
d = tf.multiply(c, 6,)
e = tf.multiply(4, 5,)
f = tf.div(c, 6,)
g = tf.add(b, d)
h = tf.multiply(g, f)
tf.summary.scalar("h", h)
merged_summary_op = tf.summary.merge_all()
with tf.Session() as sess:
summary_writer =
tf.summary.FileWriter('output',graph=tf.get_default_graph())
h_val, summary = sess.run([h,merged_summary_op])
summary_writer.add_summary(summary)
10. Ops of Computation graph (1/3)
1) Ops -> registered on Graph, compiled on runtime
2) Graph management -> default graph, Lifecycle managed with Session
3) How Ops is registered to Graph
a) tf.matmul (call) -> math_ops.py (pass args) -> gen_math_ops ( create and attach op to
graph with op_def_library)
b) Later, compiled on sess.run
tensorflow/python/ops/math_ops.py
def matmul(a, b, ...
....
else:
return gen_math_ops._mat_mul(
a, b, transp... , name=name)
tensorflow/python/ops/gen_math_ops.py
def _InitOpDefLibrary(op_list_proto_bytes):
..
op_def_lib = _op_def_library.OpDefLibrary()
op_def_lib.add_op_list(op_list)
return op_def_lib
_op_def_lib = _InitOpDefLibrary
def _mat_mul(a, b,
result = _op_def_lib.apply_op("MatMul", a=..
tensorflow/python/framework/op_def_library.py
def apply_op(self, op_t...
outpu..., op = self._apply_op_helper( ...
def _apply_op_helper(self, op_...
....
g =ops._get_graph_from_inputs(..
with g.as_default(), ops.nam.. as scope:
...
op = g.create_op(op_type_name, inp...
return ou..., op_def.is_stateful, op
11. Ops of Computation graph (2/3)
2) Kernel
a) Kernel with 200 standard operations
b) Source of operational kernels in Tensorflow
i) Eigen::Tensor(아이겐 텐서)
ii) CuDNN (쿠디엔엔)
c) Ops of gen_X : generated by C++
i) example
(1) /usr/local/lib/python2.7/dist-
packages/tensorflow/python/ops/gen_math_ops.py
3) Further..
a) Static Graph vs Dynamic Graph (TensorFlow Fold, Pytorch ...)
12. Ops of Computation graph (3/3)
1) How to implement backend
a) tensorflow document (link)
b) Define the op's interface
c) Implement the kernel for the op
CPU GPU
13. Protocol Buffer of Graph
1) Protocol Buffer
a) A type of protocol for data
i) Json alike, but based on Binary stream, small size
ii) made by Google
b) Graph
i) A protocol Buffer object where Ops is registered
c) Graph Deserialize
i) can be used to serve a web service
ii) freeze_graph (a tool)
(1) tensorflow/python/tools/freeze_graph.py
iii) export (a class)
(1) saved_model_builder
14. Tensor and tf.tensor
1) Tensor
a) Generalisation of vector, scala
b) flows on computation graph (ops) => Tensorflow
2) tf.tensor
a) tf.tensor as a function pointer without any value (Pointer)
b) tf.tensor as a return type of OPs
c) Calculation of Ops by kernel implementations
3) tf.tensor and Numpy Array
a) Both of them is related with Array value
b) conversion
i) tf.tensor -> Numpy Array (O),
ii) reverse -> tf.convert_to_tensor (constant)
15. Session (1/3) How tensorflow code is built and executed
1) Scope: Define and Run
a) Define Graph (prepare Objects)
b) Compute Graph with Session
2) Session Object
a) Context on runtime
b) tf.Operation
i) tf.Variable().Initializer
ii) Matmul, Softmax
c) Multiple Argument
d) Eval Method with default sess
i) "default sess" or "y.eval(session=ss)"
x = tf.constant([[37.0, -23.0], [1.0, 4.0]])
w = tf.Variable(tf.random_uniform([2, 2]))
y = tf.matmul(x, w)
output = tf.nn.softmax(y)
init_op = w.initializer
with tf.Session() as sess:
print("-- sess.run: 1 arg --")
sess.run(init_op)
print(sess.run(output))
print("-- sess.run: 2 args --")
y_val, output_val = sess.run([y, output])
print(y_val)
print(output_val)
print("-- Y Eval -- ")
print(y.eval())
16. Session (2/3)
3) Session class
a) python/client/session.py module
b) assigned devices, Graph, run(), close()
class SessionInterface(object):
@property
def graph(self):
def run(self, fetches,
..
class BaseSession(SessionInterface):
def __init__(self, targe..
def list_devices(self):
@property
def graph(self):
def run(self,...
class Session(BaseSession):
def __enter__(self):
def __exit__(self, ...):
...
setup graph
setup config
create session
return device list
return graph info
run based on
argument array type
Context
management
Set of actions that
Session Inst will do
17. Session (3/3)
4) Session.run()
a) Python to C++
b) Swig (Simplified Wrapper and Interface Generator )
tf_session.TF_SessionPRun_wrapper
tf_session.TF_PRun
pywrap_tensorflow
_pywrap_tensorflow.so
Python Session Object python/client/tf_sessio
n_helper.cc
-> C++ client
C++ Kernel
Interface API
Processing Arguments
Python and C binding C ++
20. Training - basics building model of neutron net
X
(Placeholder)
tf.nn.relu
tf.nn.relu
output
tf.matmul
B1
W1
tf.add
tf.matmul
B1
W1
tf.add
W1
W2B1
B2
Training
flow
Input data
22. Training - Terms
1) Training
a) repeating feedforward and backprop
b) updating variables (weight, bias) on each batch
2) Epoch (에포크: 주기)
a) Iteration of all training sets
b) Steps = "number of training sets" / "batch"
3) Batch (배치)
a) Gradient Descent (feedforward and backprop)
b) loading image : size of memory (GPU)
c) tf.train.batch(..., allow_smaller_final_batch)
4) Training Accuracy
a) minimum loss(or cost) -> for building model
23. Train - Back Propagation
1) creating Train OP
a) Loss Op
b) Optimizer
2) Loss Op
a) diff btw ideal(true) value
and the predicted one
b) Any op for calculation
3) Optimizer
a) type of Gradient Descent
(경사하강법)
4) run on session
a) pass over as array
24. Train - Back Propagation
1) Optimizers on Tensorflow
a) variants of Gradient Descent
b) training OP on session
c) Stochastic Gradient Descent
i) stəˈkastik (Random)
ii) 경사 하강법
d) vs "Batch Gradient Descent"
25. Run - Inference without training
1) meaning of Inference
a) exactly -> Inference layer of model
b) usually -> without training
2) how to
a) load graph and predict
b) Estimator.predict
26. Run - tf.train.Saver
1) Saving Graph
a) during training, save model (in-graph, between-graph)
b) Save, restore from Checkpoint
< cluster sync
replication >
< How to save
during training >
28. example: CNN model of cnn
1) convolution filter
a) Feature Extraction
Using Convolution
b) Image handling
using matrix dot product
c) feature map
1) max pooling
a) resizing image
29. example: CNN model of cnn
1) Dropout
a) dropping out units
in a Neutral network
b) regularization technique
c) overfitting
1) Subsampling
34. Tensorflow in Parallel
1) Parallelism on running computation graph
Model
Model
Model
Model
Data
Worker 1 Worker 2
Worker 3 Worker 4
1) Data Parallelism
Data
Dev1 Dev2
Dev3 Dev4
Model
2) Model Parallelism
Or
3) Mix Match
35. Tensorflow in Parallel - Model Parallelism
1) Model Parallelism
Data
Dev1 Dev2
Dev3 Dev4
Model
c = []
for d in ['/gpu:2', '/gpu:3']:
with tf.device(d):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
sum = tf.add_n(c)
sess =
tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(sum))
36. Tensorflow in Parallel - Data Parallelism
Preference to
"Replication with Data Parallelism"
over Model Parallelism
37. Tensorflow in Parallel - Data Parallelism
1) Types of Data Parallelism
a) In-graph
Asnyc Sync
In-graph
Between-graph
38. Tensorflow in Parallel - Data Parallelism
1) In-graph replication
a) One graph with tasks of ps, worker (<-> model Parallel)
b) manually assign compute intensive graph to GPU
with tf.device("/job:ps/task:0"):
weights_1 = tf.Variable(...)
biases_1 = tf.Variable(...)
...
worker_devices = ["/job:worker/task:0/gpu:0", ..., "/job:worker/task:7/gpu:0"]
for worker_device in worker_devices:
with tf.device(worker_device):
input, labels = ...
layer_1 = tf.nn.relu(tf.matmul(input, weights_1) + biases_1)
logits = tf.nn.relu(tf.matmul(layer_1, weights_2) + iases_2)
train_op = ...
with tf.Session("grpc://worker7.example.com:2222") as sess:
for _ in range(10000):
sess.run(train_op)
Manually assign multiple
devices on compute
intensive work
-> replicating model on
multiple device
Setup Param Server
39. Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) Tower, SyncReplicasOptimizer, global_step
Below for-each loop with Devices,
Variables assigned
40. Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) dividing up batch of data
across the GPUs.
b) all GPUs share the model param
c) data, to and from GPUs is slow
i) model param the CPU
d) GPUs synchronized
i) All gradients accumulated
ii) model updated
across replica
41. Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) When to use
i) (O) General environment with GPUs
ii) (X) Effective Single GPU cluster with multiple GPUs
using CUDA Peer to Peer
42. Tensorflow in Parallel - Data Parallelism
1) Types of Data Parallelism
a) Between-graph
Asnyc Sync
In-graph
Between-graph
43. Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) Sync, Async
b) Add -> Update
44. Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) multi Graphs (distributed with similar or same graph)
b) Server, worker process
p
$ python mnist_replica.py --ps_hosts=ps-server:12222 --
worker_hosts=worker1:12223,worker2:12224 --job_name=ps
--task_index=0 --num_gpus=0 --sync_replicas=true
$ python mnist_replica.py --ps_hosts=ps-server:12222 --
worker_hosts=worker1:12223,worker2:12224 --
job_name=worker --task_index=0 --num_gpus=0 --
sync_replicas=true
Index0
Index1
Index0
Index1
Python
client
Server
of client
worker
of client
Param
Server
45. Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) How to prepare
1) Create ClusterSpec, Server
2) Create Model under Device Scope
3) Create Supervisor, run with session
1
2
3
46. Tensorflow in Parallel - Data Parallelism
1) Between-graph replication -> Sync
a) Supervisor, tf.train.SyncReplicasOptimizer, global Session
b) Chief worker (index==0)
54. Tools
1) Other DL framework
a) Pytorch(Facebook)
b) MXnet(Apache foundation, Amazon)
c) CNTK(Microsoft Cognitive Toolkit)
d) Caffe (Berkeley Vision and Learning Center)
e) Deeplarning4J (Eclipse Foundation)
2) Keras
a) Abstraction layer Framework
3) Theano
a) a model for Tensorflow
b) To Cease Development After Version 1.0
55. Keras
1) what is Keras ?
a) abstraction class
b) Write once, run "with" any framework
Keras Python
Theano
Tensorflow
MXnet
56. Keras
1) Serving for production
a) configure and create model
b) use tensorflow serving
< From Keras Document >
57. Comparison
1) Tool sets (from documents of each tool)
data type Operation graph gpu
cluster,
data parallel
Tensorflow tf.Tensor tf.Operation
(ex: tf.mat_mul)
tf.Graph
(static by default,
TF.fold.. )
supports with
parameter
supports
Pytorch torch.Tensor torch package
(ex: torch.matmul)
dynamic supports with
parameter
supports
(torch.distribute
d.all_reduce)
MXnet mx.nd.array mx.nd package
(ex:mx.nd.dot)
mxnet.symbol
(static)
supports with
parameter
supports
(kvstore)