
1

Tensorflow
In Practice
By Donghwi Cha
with the help of engineering

2

Summary
- An introduction to Tensorflow
- Goal
- Get used to the terms
- Basics about its structure
- Catch up on the basic pillars of Tensorflow
- This document
- Based on web research and the Tensorflow documentation
( www.tensorflow.org )

3

Summary
- The standpoint of this session
[Diagram: a spectrum from a bird's-eye view (Tensorflow) to a deep dive (mathematics, whitepapers ..);
this session — "Here !!" — sits in between, covering Graph, Train, and cluster (replica)]

4

Short intro

5

Intro to Deep learning
1) What is it?
a) Perceptron, deep neural network
b) A type of machine learning

6

Tensorflow,
as a Computation graph engine

7

Tensorflow in practice by Engineer - donghwi cha

8

Computation Graph(1/2)
1) Graph algorithm with computation
a) Graph with equations, Expression
b) Parallel processing -> devices, hosts..
c) Theano, Pytorch, ...
[Diagram: building Y = x^T A x + b · x + c as a computation graph from nodes such as
F(u) = u^T (transpose) and f(U, V) = UV (matrix product), with intermediate values x^T and x^T·A]

9

Computation Graph(2/2)
2) Graph on Tensorboard
import tensorflow as tf

a = tf.add(1, 2)
b = tf.multiply(a, 3)
c = tf.add(4, 5)
d = tf.multiply(c, 6)
e = tf.multiply(4, 5)
f = tf.div(c, 6)
g = tf.add(b, d)
h = tf.multiply(g, f)

tf.summary.scalar("h", h)
merged_summary_op = tf.summary.merge_all()

with tf.Session() as sess:
    summary_writer = tf.summary.FileWriter('output', graph=tf.get_default_graph())
    h_val, summary = sess.run([h, merged_summary_op])
    summary_writer.add_summary(summary)

10

Ops of Computation graph (1/3)
1) Ops -> registered on the Graph, executed at runtime
2) Graph management -> default graph; lifecycle managed with a Session
3) How an Op is registered on the Graph (see the sketch at the end of this slide)
a) tf.matmul (call) -> math_ops.py (pass args) -> gen_math_ops (create and attach the op to the graph with op_def_library)
b) Later, executed by sess.run
tensorflow/python/ops/math_ops.py
def matmul(a, b, ...
....
else:
return gen_math_ops._mat_mul(
a, b, transp... , name=name)
tensorflow/python/ops/gen_math_ops.py
def _InitOpDefLibrary(op_list_proto_bytes):
..
op_def_lib = _op_def_library.OpDefLibrary()
op_def_lib.add_op_list(op_list)
return op_def_lib
_op_def_lib = _InitOpDefLibrary
def _mat_mul(a, b,
result = _op_def_lib.apply_op("MatMul", a=..
tensorflow/python/framework/op_def_library.py
def apply_op(self, op_t...
outpu..., op = self._apply_op_helper( ...
def _apply_op_helper(self, op_...
....
g =ops._get_graph_from_inputs(..
with g.as_default(), ops.nam.. as scope:
...
op = g.create_op(op_type_name, inp...
return ou..., op_def.is_stateful, op
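A minimal sketch of this flow (assuming the TF1 API): calling tf.matmul only registers a MatMul op on the default graph; nothing is computed until sess.run.

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0], [4.0]])
c = tf.matmul(a, b)  # math_ops.matmul -> gen_math_ops -> op_def_library -> graph.create_op

g = tf.get_default_graph()
print([op.name for op in g.get_operations()])  # the MatMul op is now registered on the graph

with tf.Session() as sess:
    print(sess.run(c))  # only here is the op actually executed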

11

Ops of Computation graph (2/3)
2) Kernel
a) Kernel with 200 standard operations
b) Source of operational kernels in Tensorflow
i) Eigen::Tensor(아이겐 텐서)
ii) CuDNN (쿠디엔엔)
c) Ops of gen_X : generated by C++
i) example
(1) /usr/local/lib/python2.7/dist-
packages/tensorflow/python/ops/gen_math_ops.py
3) Further..
a) Static Graph vs Dynamic Graph (TensorFlow Fold, Pytorch ...)

12

Ops of Computation graph (3/3)
1) How to implement a backend op
a) Tensorflow documentation (link)
b) Define the op's interface
c) Implement the kernel for the op (CPU, GPU)

13

Protocol Buffer of Graph
1) Protocol Buffer
a) A data serialization format
i) JSON-like, but a compact binary stream
ii) Made by Google
b) Graph
i) A protocol buffer object where Ops are registered
c) Graph serialization / deserialization
i) Can be used to serve a web service
ii) freeze_graph (a tool)
(1) tensorflow/python/tools/freeze_graph.py
iii) export (a class) — see the sketch below
(1) saved_model_builder
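A minimal sketch (assuming the TF1 SavedModel API; the export path and the trivial graph are illustrative) of exporting a graph with saved_model_builder for later serving:

import tensorflow as tf

# A trivial graph to export (illustrative only)
w = tf.Variable(tf.zeros([2, 2]), name="w")

builder = tf.saved_model.builder.SavedModelBuilder("/tmp/exported_model")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Serialize the graph and variables into the SavedModel protocol buffers
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
builder.save()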

14

Tensor and tf.Tensor
1) Tensor
a) A generalisation of vectors and scalars
b) Flows over the computation graph (ops) => "Tensorflow"
2) tf.Tensor
a) A symbolic handle with no value of its own (like a pointer)
b) The return type of Ops
c) The actual calculation is done by the kernel implementations
3) tf.Tensor and Numpy array
a) Both represent array values
b) Conversion (sketch below)
i) tf.Tensor -> Numpy array (possible)
ii) Reverse -> tf.convert_to_tensor (creates a constant)
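A minimal sketch (TF1 API) of the two conversion directions:

import numpy as np
import tensorflow as tf

t = tf.constant([[1.0, 2.0], [3.0, 4.0]])    # tf.Tensor: a symbolic handle, no value yet

with tf.Session() as sess:
    arr = sess.run(t)                         # tf.Tensor -> numpy.ndarray (evaluated value)
print(type(arr))                              # <class 'numpy.ndarray'>

back = tf.convert_to_tensor(arr)              # numpy.ndarray -> constant tf.Tensor
print(back)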

15

Session (1/3) How tensorflow code is built and executed
1) Scope: define and run
a) Define the Graph (prepare objects)
b) Compute the Graph with a Session
2) Session object
a) Runtime context
b) tf.Operation
i) tf.Variable().initializer
ii) MatMul, Softmax
c) Multiple fetch arguments
d) eval() method with a default session
i) "default sess" or "y.eval(session=ss)"
import tensorflow as tf

x = tf.constant([[37.0, -23.0], [1.0, 4.0]])
w = tf.Variable(tf.random_uniform([2, 2]))
y = tf.matmul(x, w)
output = tf.nn.softmax(y)
init_op = w.initializer

with tf.Session() as sess:
    print("-- sess.run: 1 arg --")
    sess.run(init_op)
    print(sess.run(output))
    print("-- sess.run: 2 args --")
    y_val, output_val = sess.run([y, output])
    print(y_val)
    print(output_val)
    print("-- y.eval --")
    print(y.eval())

16

Session (2/3)
3) Session class
a) python/client/session.py module
b) assigned devices, Graph, run(), close()
class SessionInterface(object):
@property
def graph(self):
def run(self, fetches,
..
class BaseSession(SessionInterface):
def __init__(self, targe..
def list_devices(self):
@property
def graph(self):
def run(self,...
class Session(BaseSession):
def __enter__(self):
def __exit__(self, ...):
...
(Annotations from the slide: SessionInterface defines the set of actions a Session instance will do;
BaseSession.__init__ sets up the graph and config and creates the session; list_devices() returns the
device list; graph returns graph info; run() dispatches on the type of the fetches argument;
Session adds context management.)

17

Session (3/3)
4) Session.run()
a) Python to C++
b) SWIG (Simplified Wrapper and Interface Generator)
[Diagram: Session.run() call chain — Python Session object -> pywrap_tensorflow (_pywrap_tensorflow.so,
the SWIG Python/C binding) -> python/client/tf_session_helper.cc (C++ client, argument processing;
e.g. tf_session.TF_SessionPRun_wrapper, tf_session.TF_PRun) -> C++ kernel interface API]

18

Training

19

Tensorflow in practice by Engineer - donghwi cha

20

Training - basics: building a model of a neural net
[Diagram: training flow — input data X (placeholder) -> tf.matmul with W1, tf.add with B1 -> tf.nn.relu
-> tf.matmul with W2, tf.add with B2 -> tf.nn.relu -> output; the variables W1, B1, W2, B2 are what
training updates. A minimal code sketch of this network follows below.]
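A minimal code sketch of the diagram above (TF1 API; the layer sizes are illustrative, not from the slides):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])              # input data X (placeholder)

w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.zeros([256]))
layer_1 = tf.nn.relu(tf.add(tf.matmul(x, w1), b1))       # matmul -> add -> relu

w2 = tf.Variable(tf.random_normal([256, 10]))
b2 = tf.Variable(tf.zeros([10]))
output = tf.nn.relu(tf.add(tf.matmul(layer_1, w2), b2))  # second layer -> output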

21

Training - basics
1) Example: Multi-layer
Epoch: 0001 cost= 54.984976748
Epoch: 0002 cost= 12.827661085
...
Epoch: 0024 cost= 0.237979749
Epoch: 0025 cost= 0.212254504
Optimization Finished!
Accuracy: 0.9347
Sample MNIST code:
https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/4_Utils/tensorboard_advanced.py

22

Training - Terms
1) Training
a) Repeating feedforward and backprop
b) Updating variables (weights, biases) on each batch
2) Epoch
a) One iteration over the whole training set
b) Steps per epoch = "number of training examples" / "batch size"
(e.g. 55,000 MNIST training images with a batch size of 100 -> 550 steps per epoch)
3) Batch
a) One gradient-descent update (feedforward and backprop) per batch
b) When loading images, batch size is bounded by (GPU) memory
c) tf.train.batch(..., allow_smaller_final_batch)
4) Training accuracy
a) Minimize loss (or cost) -> for building the model

23

Train - Back Propagation
1) Creating the train OP (see the sketch below)
a) Loss op
b) Optimizer
2) Loss op
a) The difference between the ideal (true) value
and the predicted one
b) Any op can be used for the calculation
3) Optimizer
a) A type of gradient descent
4) Run on the session
a) Pass the ops over as an array
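A minimal sketch (TF1 API; the tiny linear model and the data are illustrative) of a loss op plus an optimizer forming the train op, with both passed to sess.run as an array:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2])
y_true = tf.placeholder(tf.float32, [None, 1])

w = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))
y_pred = tf.matmul(x, w) + b

loss = tf.reduce_mean(tf.square(y_true - y_pred))                  # loss op
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)   # optimizer -> train op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, loss_val = sess.run([train_op, loss],                       # pass over as array
                           feed_dict={x: [[1.0, 2.0]], y_true: [[3.0]]})
    print(loss_val)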

24

Train - Back Propagation
1) Optimizers in Tensorflow
a) Variants of gradient descent
b) Run as the training OP on the session
c) Stochastic Gradient Descent
i) stəˈkastik ("random")
ii) Gradient descent on randomly sampled batches
d) vs "Batch Gradient Descent" (over the full training set)

25

Run - Inference without training
1) Meaning of inference
a) Strictly -> running the inference layer of the model
b) Usually -> running the model without (further) training
2) How to
a) Load the graph and predict
b) Estimator.predict

26

Run - tf.train.Saver
1) Saving the Graph
a) During training, save the model (in-graph, between-graph)
b) Save to and restore from a Checkpoint (see the sketch below)
< cluster sync replication >
< How to save during training >
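A minimal sketch (TF1 API; the checkpoint path is illustrative) of saving and restoring with tf.train.Saver:

import tensorflow as tf

w = tf.Variable(tf.zeros([2, 2]), name="w")
saver = tf.train.Saver()

# Save during/after training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "/tmp/model.ckpt")

# Restore from the checkpoint later (e.g. for inference or to resume training)
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
    print(sess.run(w))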

27

Training example

28

example: CNN model
1) Convolution filter
a) Feature extraction using convolution
b) Image handling using matrix dot products
c) Feature map
2) Max pooling
a) Resizing (downsampling) the image

29

example: CNN model
1) Dropout
a) Dropping out units in a neural network
b) A regularization technique
c) Prevents overfitting
2) Subsampling

30

example: CNN sample code - model
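The slide shows the model code of the referenced example; below is a minimal sketch in its spirit (TF1 API; the shapes and layer sizes are illustrative, not the original code):

import tensorflow as tf

def conv_net(x, keep_prob):
    # conv -> relu -> max-pool -> flatten -> dropout -> logits
    x = tf.reshape(x, [-1, 28, 28, 1])
    w_conv = tf.Variable(tf.random_normal([5, 5, 1, 32]))
    conv1 = tf.nn.relu(tf.nn.conv2d(x, w_conv, strides=[1, 1, 1, 1], padding='SAME'))
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    flat = tf.reshape(pool1, [-1, 14 * 14 * 32])
    flat = tf.nn.dropout(flat, keep_prob=keep_prob)
    w_out = tf.Variable(tf.random_normal([14 * 14 * 32, 10]))
    return tf.matmul(flat, w_out)              # logits for 10 classes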

31

example: CNN sample code - session
From
https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network_raw.py

32

Replication and Cluster

33

Tensorflow in practice by Engineer - donghwi cha

34

Tensorflow in Parallel
1) Parallelism in running the computation graph
1) Data Parallelism
[Diagram: the same model replicated on Worker 1-4, each fed a different slice of the data]
2) Model Parallelism
[Diagram: one model split across Dev1-Dev4, all fed the same data]
3) Or mix and match

35

Tensorflow in Parallel - Model Parallelism
1) Model Parallelism
[Diagram: one model split across Dev1-Dev4, all fed the same data]

import tensorflow as tf

c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    total = tf.add_n(c)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(total))

36

Tensorflow in Parallel - Data Parallelism
Prefer "replication with data parallelism" over model parallelism

37

Tensorflow in Parallel - Data Parallelism
1) Types of Data Parallelism
a) In-graph
[Table: replication types — In-graph / Between-graph (rows) x Async / Sync (columns)]

38

Tensorflow in Parallel - Data Parallelism
1) In-graph replication
a) One graph with tasks of ps, worker (<-> model Parallel)
b) manually assign compute intensive graph to GPU
with tf.device("/job:ps/task:0"):
    weights_1 = tf.Variable(...)
    biases_1 = tf.Variable(...)
    ...

worker_devices = ["/job:worker/task:0/gpu:0", ..., "/job:worker/task:7/gpu:0"]
for worker_device in worker_devices:
    with tf.device(worker_device):
        input, labels = ...
        layer_1 = tf.nn.relu(tf.matmul(input, weights_1) + biases_1)
        logits = tf.nn.relu(tf.matmul(layer_1, weights_2) + biases_2)
        train_op = ...

with tf.Session("grpc://worker7.example.com:2222") as sess:
    for _ in range(10000):
        sess.run(train_op)

Annotations: the ps device scope sets up the parameter server; the worker loop manually assigns the
compute-intensive work to multiple devices -> replicating the model on multiple devices.

39

Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) Tower, SyncReplicasOptimizer, global_step
(Annotation: in the referenced code below, a for-each loop over the devices assigns the variables to each tower)

40

Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) Divide each batch of data across the GPUs
b) All GPUs share the model parameters
c) Moving data to and from the GPUs is slow
i) So keep the model parameters on the CPU
d) The GPUs run synchronized
i) All gradients are accumulated
ii) The model is updated across replicas

41

Tensorflow in Parallel - Data Parallelism
1) In-graph replication -> Sync
a) When to use
i) (O) A general environment with multiple GPUs
ii) (X) An effectively single-GPU cluster where multiple GPUs communicate via CUDA peer-to-peer

42

Tensorflow in Parallel - Data Parallelism
1) Types of Data Parallelism
a) Between-graph
[Table: replication types — In-graph / Between-graph (rows) x Async / Sync (columns)]

43

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) Sync, Async
b) Add -> Update

44

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) Multiple graphs (distributed, each worker running a similar or identical graph)
b) Server and worker processes
$ python mnist_replica.py --ps_hosts=ps-server:12222 --worker_hosts=worker1:12223,worker2:12224 --job_name=ps --task_index=0 --num_gpus=0 --sync_replicas=true
$ python mnist_replica.py --ps_hosts=ps-server:12222 --worker_hosts=worker1:12223,worker2:12224 --job_name=worker --task_index=0 --num_gpus=0 --sync_replicas=true
[Diagram: Python clients (task index 0 and 1) -> a Server per client and a worker per client, plus a Param Server]

45

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication
a) How to prepare (see the sketch below)
1) Create a ClusterSpec and a Server
2) Create the model under a device scope
3) Create a Supervisor and run with its session
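A minimal sketch (TF1 API; the hosts, log directory, and toy model are illustrative) of the three steps:

import tensorflow as tf

# 1) Create a ClusterSpec and a Server for this task
cluster = tf.train.ClusterSpec({"ps": ["ps-server:12222"],
                                "worker": ["worker1:12223", "worker2:12224"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# 2) Create the model under a device scope (variables are placed on the ps job)
with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    global_step = tf.train.get_or_create_global_step()
    w = tf.Variable(1.0)
    loss = tf.square(w - 3.0)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=global_step)

# 3) Create a Supervisor and run with its managed session
sv = tf.train.Supervisor(is_chief=True, logdir="/tmp/train_logs",
                         global_step=global_step)
with sv.managed_session(server.target) as sess:
    while not sv.should_stop():
        sess.run(train_op)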

46

Tensorflow in Parallel - Data Parallelism
1) Between-graph replication -> Sync
a) Supervisor, tf.train.SyncReplicasOptimizer, global Session
b) Chief worker (index==0)

47

Serving
1) Tensorflow Serving
a) How to
i) Generate model after training
ii) setup server with model, and connect client

48

Serving
1) serving.api
a) Data input, response handling
b) Get prediction

49

Serving
1) prediction_signature, classification_signature (see the sketch below)
2) MetaGraph
[Diagram: model train-and-save -> serving and inference]
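A minimal sketch (TF1 SavedModel API; the tensor names and shapes are illustrative) of building a prediction_signature for serving:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784], name="images")
w = tf.Variable(tf.zeros([784, 10]))
scores = tf.matmul(x, w, name="scores")

# A prediction signature maps named inputs/outputs for the serving client
prediction_signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"images": x}, outputs={"scores": scores})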

50

Tools and Framework

51

Tools
1) Jupyter

52

Tools
1) Tensorboard
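For example (assuming TensorBoard is installed; 'output' is the summary directory written in the earlier Tensorboard example):
$ tensorboard --logdir=output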

53

Other tools

54

Tools
1) Other DL frameworks
a) Pytorch (Facebook)
b) MXnet (Apache Foundation, Amazon)
c) CNTK (Microsoft Cognitive Toolkit)
d) Caffe (Berkeley Vision and Learning Center)
e) Deeplearning4j (Eclipse Foundation)
2) Keras
a) An abstraction-layer framework
3) Theano
a) A model (precursor) for Tensorflow
b) Development ceased after version 1.0

55

Keras
1) What is Keras?
a) An abstraction class
b) Write once, run "with" any framework (see the sketch below)
[Diagram: Keras (Python) on top of the Theano, Tensorflow, or MXnet backends]
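A minimal Keras sketch (the layer sizes are illustrative); the same code runs on whichever backend is configured:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),   # hidden layer
    Dense(10, activation='softmax'),                     # output layer
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()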

56

Keras
1) Serving for production
a) Configure and create the model
b) Use Tensorflow Serving
< From the Keras documentation >

57

Comparison
1) Tool sets (from the documents of each tool)

Framework  | data type    | operation                        | graph                                    | GPU                       | cluster / data parallel
Tensorflow | tf.Tensor    | tf.Operation (ex: tf.matmul)     | tf.Graph (static by default; TF Fold ..) | supports (with parameter) | supports
Pytorch    | torch.Tensor | torch package (ex: torch.matmul) | dynamic                                  | supports (with parameter) | supports (torch.distributed.all_reduce)
MXnet      | mx.nd.array  | mx.nd package (ex: mx.nd.dot)    | mxnet.symbol (static)                    | supports (with parameter) | supports (kvstore)


Editor's Notes

  1. http://www.cs.cornell.edu/courses/cs5740/2017sp/lectures/04-nn-compgraph.pdf
  2. http://ruder.io/optimizing-gradient-descent/