TensorFlow Deep Learning Lecture
By Mark Chang

Introduction
•  What is deep learning?
•  How deep learning works
•  What is TensorFlow?

Human Brain vs. Computer
$\begin{cases} 3x + 2y + 5z = 7 \\ 5x + 1y + 8z = 9 \\ 9x + 4y + 3z = 14 \end{cases}$
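
As a small illustration of the arithmetic a computer handles easily, the system above can be solved in a few lines. This is only a sketch: it assumes NumPy, which the talk does not use at this point, and the variable names are illustrative.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system on this slide.
A = np.array([[3.0, 2.0, 5.0],
              [5.0, 1.0, 8.0],
              [9.0, 4.0, 3.0]])
b = np.array([7.0, 9.0, 14.0])

# np.linalg.solve does the elimination a person would do by hand.
solution = np.linalg.solve(A, b)
x, y, z = solution
print(x, y, z)
```

Substituting the printed values back into the three equations is a quick way to check the result.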

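Human Brain vs. Computer
[Images: a container ship and a motorcycle]

Human Brain vs. Computer
•  Strengths of the human brain:
–  Images and sound
–  Language
–  Self-awareness (self-determination)
–  ...
•  Strengths of the computer:
–  Mathematical computation
–  Memory (storage) capacity
–  ...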

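Deep Learning
•  A machine learning method
•  Uses a computer to simulate the structure of the human nervous system
•  Lets computers learn to do what the human brain can do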

Image Recognition
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf

Artistic Creation
http://arxiv.org/abs/1508.06576

Semantic Understanding
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

Poetry Generation
http://emnlp2014.org/papers/pdf/EMNLP2014074.pdf

Playing Video Games
http://arxiv.org/pdf/1312.5602v1.pdf

What Deep Learning Can Do
•  Painting
•  Writing poetry
•  Driving
•  Playing board games
•  ...

Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning

Supervised Learning
[Diagram: data → machine learning model → "ship"; human labeling of the data → answer: "ship"]

Unsupervised Learning
Data:
Beijing is the capital of China.
As China's capital, Beijing is a large and vibrant city.
Tokyo is the capital of Japan.
As Japan's capital, Tokyo is a large and vibrant city.
...
[Diagram: data → machine learning model → result]

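Reinforcement Learning
[Diagram: the machine learning model sends actions to the environment and receives information back from it]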

Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Deep Learning

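Supervised Machine Learning
Training: training data → machine learning model → output
Check the output against the correct answer. If the answer is wrong, correct the model.
After training is complete: test data → machine learning model → output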

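Notation
•  Training data: all of it: $X, Y$; a single example: $x^{(i)}, y^{(i)}$
•  Machine learning model: $h$
•  Model parameters: $w$
•  Output: $h(X)$
•  Correct answer: $Y$
•  Checking the answer: $E(h(X), Y)$
•  If the answer is wrong, correct the model.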

Logistic Regression
•  Fit a sigmoid curve to the distribution of the data
[Plots: y against x, before training and after training is complete]

Training Data
X            Y
-0.47241379  0
-0.35344828  0
-0.30148276  0
0.33448276   1
0.35344828   1
0.37241379   1
0.39137931   1
0.41034483   1
0.44931034   1
0.49827586   1
0.51724138   1
...          ...

Machine Learning Model
Sigmoid function: $h(x) = \frac{1}{1 + e^{-(w_0 + w_1 x)}}$
$w_0 + w_1 x < 0 \Rightarrow h(x) \approx 0$
$w_0 + w_1 x > 0 \Rightarrow h(x) \approx 1$
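
A minimal sketch of the sigmoid model above. The parameter values w0 = 0 and w1 = 20 are hypothetical, chosen only so that the curve is steep enough to show both saturated ends.

```python
import numpy as np

def h(x, w0, w1):
    """Sigmoid model h(x) = 1 / (1 + exp(-(w0 + w1 * x)))."""
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

# Hypothetical parameters (not from the talk).
w0, w1 = 0.0, 20.0

x = np.array([-0.5, 0.0, 0.5])
outputs = h(x, w0, w1)
print(outputs)
```

The output is close to 0 where w0 + w1*x is strongly negative, exactly 0.5 where it is zero, and close to 1 where it is strongly positive.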

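Correcting the Model
•  Error function: cross entropy
$E(h(X), Y) = -\frac{1}{m} \sum_{i}^{m} \left( y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right)$
$h(x^{(i)}) \approx 0$ and $y^{(i)} = 0 \Rightarrow E(h(X), Y) \approx 0$
$h(x^{(i)}) \approx 1$ and $y^{(i)} = 1 \Rightarrow E(h(X), Y) \approx 0$
$h(x^{(i)}) \approx 0$ and $y^{(i)} = 1 \Rightarrow E(h(X), Y)$ is large
$h(x^{(i)}) \approx 1$ and $y^{(i)} = 0 \Rightarrow E(h(X), Y)$ is large
(In the two wrong cases the logarithm grows without bound as the prediction approaches the wrong extreme.)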
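
The four cases above can be checked numerically. This sketch assumes NumPy; the function name and the prediction values 0.01 and 0.99 are illustrative and not from the talk.

```python
import numpy as np

def cross_entropy(h, y):
    """Mean cross entropy between predictions h and labels y (both 1-D)."""
    h = np.asarray(h, dtype=float)
    y = np.asarray(y, dtype=float)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

good = cross_entropy([0.01, 0.99], [0, 1])   # confident and correct
bad = cross_entropy([0.01, 0.99], [1, 0])    # confident and wrong
print(good, bad)
```

Confident correct predictions give an error near zero, while confident wrong predictions give a much larger error.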

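Correcting the Model
•  Gradient descent:
$w_0 \leftarrow w_0 - \eta \frac{\partial E(h(X), Y)}{\partial w_0}$
$w_1 \leftarrow w_1 - \eta \frac{\partial E(h(X), Y)}{\partial w_1}$
[Plot: error surface over $(w_0, w_1)$ with the gradient vector $\left( \frac{\partial E(h(X), Y)}{\partial w_0}, \frac{\partial E(h(X), Y)}{\partial w_1} \right)$]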
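
A sketch of gradient descent on the logistic regression model, using the training rows listed earlier in the talk. The starting point, the learning rate of 1.0, and the 5000 steps are assumptions made for this example; the gradient expressions are those of the mean cross entropy.

```python
import numpy as np

# Rows copied from the training data shown earlier in the talk.
X = np.array([-0.47241379, -0.35344828, -0.30148276, 0.33448276,
              0.35344828, 0.37241379, 0.39137931, 0.41034483,
              0.44931034, 0.49827586, 0.51724138])
Y = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)

def h(x, w0, w1):
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

def error(w0, w1):
    p = h(X, w0, w1)
    return -np.mean(Y * np.log(p) + (1.0 - Y) * np.log(1.0 - p))

w0, w1 = 0.0, 0.0      # hypothetical starting point
eta = 1.0              # hypothetical learning rate
initial_error = error(w0, w1)

for step in range(5000):
    p = h(X, w0, w1)
    grad_w0 = np.mean(p - Y)          # dE/dw0 for the mean cross entropy
    grad_w1 = np.mean((p - Y) * X)    # dE/dw1 for the mean cross entropy
    w0 -= eta * grad_w0
    w1 -= eta * grad_w1

final_error = error(w0, w1)
predictions = (h(X, w0, w1) > 0.5).astype(float)
print(initial_error, final_error, w0, w1)
```

Each step moves the parameters against the gradient, so the error falls from its starting value as the sigmoid is fitted to the data.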

Neurons and Action Potentials
http://humanphisiology.wikispaces.com/file/view/neuron.png/216460814/neuron.png
http://upload.wikimedia.org/wikipedia/commons/thumb/4/4a/Action_potential.svg/1037px-Action_potential.svg.png

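Simulating a Neuron
[Diagram: inputs $x_1$, $x_2$ and bias $b$ feed neuron $n$ through weights $W_1$, $W_2$, $W_b$; the neuron outputs $y$]
$n_{in} = w_1 x_1 + w_2 x_2 + w_b$
$n_{out} = \frac{1}{1 + e^{-n_{in}}}$
$y = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2 + w_b)}}$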

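Simulating a Neuron
$n_{out} = \frac{1}{1 + e^{-n_{in}}}$
$w_1 x_1 + w_2 x_2 + w_b = 0 \Rightarrow n_{out} = 0.5$
$w_1 x_1 + w_2 x_2 + w_b > 0 \Rightarrow n_{out} \to 1$
$w_1 x_1 + w_2 x_2 + w_b < 0 \Rightarrow n_{out} \to 0$
[Plot: the $(x_1, x_2)$ plane with origin (0,0), split by the line $w_1 x_1 + w_2 x_2 + w_b = 0$ into a region labeled 1 and a region labeled 0]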

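Binary Classification: AND Gate
x1  x2  y
0   0   0
0   1   0
1   0   0
1   1   1
[Plot: the points (0,0), (0,1), (1,0), (1,1) in the $(x_1, x_2)$ plane; only (1,1) falls in the region labeled 1]
Neuron $n$: weight 20 on $x_1$, weight 20 on $x_2$, weight -30 on the bias $b$
$y = \frac{1}{1 + e^{-(20 x_1 + 20 x_2 - 30)}}$
Decision boundary: $20 x_1 + 20 x_2 - 30 = 0$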
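
The AND-gate neuron can be checked directly. This sketch uses the weights 20, 20 and -30 from the slide; the function name and the rounding step are illustrative choices.

```python
import math

def neuron(x1, x2, w1=20.0, w2=20.0, wb=-30.0):
    """Sigmoid neuron with the AND-gate weights from the slide."""
    n_in = w1 * x1 + w2 * x2 + wb
    return 1.0 / (1.0 + math.exp(-n_in))

# Round the sigmoid output to the nearest integer to read it as a class label.
truth_table = [(x1, x2, round(neuron(x1, x2)))
               for x1 in (0, 1) for x2 in (0, 1)]
print(truth_table)
```

Only the input (1, 1) pushes n_in above zero (20 + 20 - 30 = 10), so only that row is classified as 1.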

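XOR Gate?
x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0
[Plot: the points (0,0), (0,1), (1,0), (1,1) in the $(x_1, x_2)$ plane, labeled 0, 1, 1, 0]
A single neuron draws one straight decision boundary, and no single straight line separates these two classes.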

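Binary Classification: XOR Gate
Hidden neuron $n_1$ (behaves like AND): weight 20 on $x_1$, weight 20 on $x_2$, weight -30 on the bias $b$
Hidden neuron $n_2$ (behaves like OR): weight 20 on $x_1$, weight 20 on $x_2$, weight -10 on the bias $b$
Output neuron $n$: weight -20 on $n_1$, weight 20 on $n_2$, weight -10 on the bias $b$
[Plots: the decision regions of $n_1$, of $n_2$, and of the combined output $y$ over the points (0,0), (0,1), (1,0), (1,1)]
x1  x2  n1  n2  y
0   0   0   0   0
0   1   0   1   1
1   0   0   1   1
1   1   1   1   0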
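
The two-layer XOR network can be evaluated the same way. This sketch takes the weights from the slide (20, 20, -30 for n1; 20, 20, -10 for n2; -20, 20, -10 for the output); the function names and the rounding are illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def xor_network(x1, x2):
    n1 = sigmoid(20 * x1 + 20 * x2 - 30)   # behaves like AND
    n2 = sigmoid(20 * x1 + 20 * x2 - 10)   # behaves like OR
    y = sigmoid(-20 * n1 + 20 * n2 - 10)   # fires when n2 is on and n1 is off
    return n1, n2, y

rows = []
for x1 in (0, 1):
    for x2 in (0, 1):
        n1, n2, y = xor_network(x1, x2)
        rows.append((x1, x2, round(n1), round(n2), round(y)))
print(rows)
```

The hidden layer maps the four inputs to points that a single output neuron can separate, which is what the single-neuron model could not do.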

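Neural Network
[Diagram: input layer ($x$, $y$, bias $b$) → hidden layer ($n_{11}$, $n_{12}$, bias $b$) → output layer ($n_{21}$, $n_{22}$), compared with targets $z_1$, $z_2$]
Weights into the hidden layer: $W_{11,x}$, $W_{11,y}$, $W_{11,b}$, $W_{12,x}$, $W_{12,y}$, $W_{12,b}$
Weights into the output layer: $W_{21,11}$, $W_{21,12}$, $W_{21,b}$, $W_{22,11}$, $W_{22,12}$, $W_{22,b}$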

Visual Perception
http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg

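Training a Neural Network
•  Initialize the model parameters w with random values
•  Forward Propagation
–  Compute the answer with the current model parameters
•  Compute the amount of error (with the Error Function)
•  Backward Propagation
–  Use the error to correct the model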

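Training a Neural Network
[Diagram: Initialization → training data → machine learning model → output (Forward Propagation) → checked against the correct answer (Error Function) → if the answer is wrong, correct the model (Backward Propagation)]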

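Initialization
•  Set every W to a random number between -N and N
•  The W values within a layer must not all be the same
[Diagram: the same network, with weights $W_{11,x}$ ... $W_{12,b}$ into the hidden layer and $W_{21,11}$ ... $W_{22,b}$ into the output layer]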
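
A sketch of this initialization for the network in the diagram, assuming NumPy. The bound N = 1, the fixed seed, and the layout (one row per neuron, with the bias weight in the last column) are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
N = 1.0                          # hypothetical bound for the range [-N, N]

# One row per neuron; columns are the two inputs plus the bias.
W_hidden = rng.uniform(-N, N, size=(2, 3))   # W11,x W11,y W11,b / W12,x W12,y W12,b
W_output = rng.uniform(-N, N, size=(2, 3))   # W21,11 W21,12 W21,b / W22,11 W22,12 W22,b
print(W_hidden)
print(W_output)
```

If two neurons in a layer started with identical weights, they would compute the same output and receive the same gradient, so they would never learn different features. Random values break that symmetry.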

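Error Function
$J = -\left( z_1 \log(n_{21(out)}) + (1 - z_1) \log(1 - n_{21(out)}) \right) - \left( z_2 \log(n_{22(out)}) + (1 - z_2) \log(1 - n_{22(out)}) \right)$
[Diagram: outputs $n_{21}$, $n_{22}$ compared with targets $z_1$, $z_2$]
$n_{out} \approx 0$ and $z = 0 \Rightarrow J \approx 0$
$n_{out} \approx 1$ and $z = 1 \Rightarrow J \approx 0$
$n_{out} \approx 0$ and $z = 1 \Rightarrow J$ is large
$n_{out} \approx 1$ and $z = 0 \Rightarrow J$ is large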

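Gradient Descent
$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}}$
$w_{21,12} \leftarrow w_{21,12} - \eta \frac{\partial J}{\partial w_{21,12}}$
$w_{21,b} \leftarrow w_{21,b} - \eta \frac{\partial J}{\partial w_{21,b}}$
$w_{22,11} \leftarrow w_{22,11} - \eta \frac{\partial J}{\partial w_{22,11}}$
$w_{22,12} \leftarrow w_{22,12} - \eta \frac{\partial J}{\partial w_{22,12}}$
$w_{22,b} \leftarrow w_{22,b} - \eta \frac{\partial J}{\partial w_{22,b}}$
$w_{11,x} \leftarrow w_{11,x} - \eta \frac{\partial J}{\partial w_{11,x}}$
$w_{11,y} \leftarrow w_{11,y} - \eta \frac{\partial J}{\partial w_{11,y}}$
$w_{11,b} \leftarrow w_{11,b} - \eta \frac{\partial J}{\partial w_{11,b}}$
$w_{12,x} \leftarrow w_{12,x} - \eta \frac{\partial J}{\partial w_{12,x}}$
$w_{12,y} \leftarrow w_{12,y} - \eta \frac{\partial J}{\partial w_{12,y}}$
$w_{12,b} \leftarrow w_{12,b} - \eta \frac{\partial J}{\partial w_{12,b}}$
[Plot: error surface over $(w_0, w_1)$ with the descent direction $\left( -\frac{\partial J}{\partial w_0}, -\frac{\partial J}{\partial w_1} \right)$]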

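Backward Propagation
$\frac{\partial J}{\partial w_{21,11}} = \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial w_{21,11}}$
$\delta_{21(in)} = \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}}, \qquad \frac{\partial n_{21(in)}}{\partial w_{21,11}} = n_{11(out)}$
$\frac{\partial J}{\partial w_{21,11}} = \delta_{21(in)} \frac{\partial n_{21(in)}}{\partial w_{21,11}} = n_{11(out)} \delta_{21(in)}$
$w_{21,11} \leftarrow w_{21,11} - \eta \frac{\partial J}{\partial w_{21,11}} = w_{21,11} - \eta \, n_{11(out)} \delta_{21(in)}$

$\delta_{11(in)} = \frac{\partial J}{\partial n_{11(in)}}$
$= \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{11(in)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{11(in)}}$
$= \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial n_{11(out)}} \frac{\partial n_{11(out)}}{\partial n_{11(in)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{22(in)}} \frac{\partial n_{22(in)}}{\partial n_{11(out)}} \frac{\partial n_{11(out)}}{\partial n_{11(in)}}$
$= \left( \frac{\partial J}{\partial n_{21(out)}} \frac{\partial n_{21(out)}}{\partial n_{21(in)}} \frac{\partial n_{21(in)}}{\partial n_{11(out)}} + \frac{\partial J}{\partial n_{22(out)}} \frac{\partial n_{22(out)}}{\partial n_{22(in)}} \frac{\partial n_{22(in)}}{\partial n_{11(out)}} \right) \frac{\partial n_{11(out)}}{\partial n_{11(in)}}$
$= \left( \delta_{21(in)} w_{21,11} + \delta_{22(in)} w_{22,11} \right) \frac{\partial n_{11(out)}}{\partial n_{11(in)}}$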

http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation
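
A sketch of backward propagation for the 2-2-2 network used in this talk, assuming NumPy. The input, targets, seed and matrix layout (one row per neuron, bias weight in the last column) are hypothetical choices for this example. The analytic gradient is compared with a finite-difference estimate as a sanity check.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(W1, W2, inp):
    """W1 and W2 have shape (2, 3): two neurons, two inputs plus a bias column."""
    a0 = np.append(inp, 1.0)          # x, y, bias
    n1_out = sigmoid(W1 @ a0)         # n11(out), n12(out)
    a1 = np.append(n1_out, 1.0)       # hidden outputs plus bias
    n2_out = sigmoid(W2 @ a1)         # n21(out), n22(out)
    return a0, n1_out, a1, n2_out

def loss(W1, W2, inp, z):
    n2_out = forward(W1, W2, inp)[3]
    return -np.sum(z * np.log(n2_out) + (1 - z) * np.log(1 - n2_out))

def backward(W1, W2, inp, z):
    a0, n1_out, a1, n2_out = forward(W1, W2, inp)
    # delta_2(in) = dJ/dn2(out) * dn2(out)/dn2(in); for a sigmoid output with
    # cross entropy this product simplifies to n2(out) - z.
    delta2 = n2_out - z
    grad_W2 = np.outer(delta2, a1)    # e.g. dJ/dw21,11 = delta21(in) * n11(out)
    # delta_1(in) = (delta21(in) * w21,1j + delta22(in) * w22,1j) * dn1(out)/dn1(in)
    delta1 = (W2[:, :2].T @ delta2) * n1_out * (1 - n1_out)
    grad_W1 = np.outer(delta1, a0)
    return grad_W1, grad_W2

rng = np.random.default_rng(0)
W1 = rng.uniform(-1, 1, size=(2, 3))
W2 = rng.uniform(-1, 1, size=(2, 3))
inp = np.array([0.3, 0.8])            # hypothetical input (x, y)
z = np.array([1.0, 0.0])              # hypothetical targets (z1, z2)

grad_W1, grad_W2 = backward(W1, W2, inp, z)

# Finite-difference check of dJ/dw11,x, the first entry of W1.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(W1_plus, W2, inp, z) - loss(W1_minus, W2, inp, z)) / (2 * eps)
print(grad_W1[0, 0], numeric)
```

The two printed numbers should agree closely, which indicates that the chain-rule expressions for the hidden-layer delta are implemented consistently.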

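TensorFlow
•  https://www.tensorflow.org/
•  TensorFlow is an open-source machine learning tool developed by Google.
•  It performs numerical computation by means of a computational graph.
•  Supported programming languages: Python, C++
•  System requirements:
–  Operating system: Mac or Linux
–  Python 2.7, or 3.3 and later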

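TensorFlow
Approach | How it is used | Flexibility | Technical barrier
Machine learning library (e.g., scikit-learn) | Prepare the data, then simply call the API | Low | Low
TensorFlow | Define the computational graph yourself and hand it to TensorFlow to compute | In between | In between
Writing from scratch | Derive the derivative formulas yourself and write the whole pipeline yourself | High | High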

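TensorFlow
•  Flexibility
–  Any computation that can be expressed as a computational graph can be solved with TensorFlow.
•  Automatic differentiation
–  TensorFlow automatically computes the derivatives of the computational graph.
•  Platform compatibility
–  The same code can run on a CPU or on a GPU.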

CPU vs. GPU
http://allegroviva.com/gpu-computing/difference-between-gpu-and-cpu/

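Example: Binary Classification
Model:
[Diagram: inputs x1, x2 and a constant 1 feed neuron n through weights w1, w2, b; the neuron outputs y]
$y = \frac{1}{1 + e^{-(x_1 w_1 + x_2 w_2 + b)}}$
Data:
import numpy as np
x_data = np.random.rand(50, 2)
# The label is 1 only when both coordinates exceed 0.5.
# Cast and reshape to (50, 1) so it matches the [None, 1] placeholder y_.
y_data = ((x_data[:, 1] > 0.5) * (x_data[:, 0] > 0.5)).astype(np.float32).reshape(-1, 1)
[Plot: the data points over x1 and x2, labeled by y]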

Example: Binary Classification
[Diagram and plot: the same model, $y = \frac{1}{1 + e^{-(x_1 w_1 + x_2 w_2 + b)}}$, after training]

TensorFlow
# --- Computational graph ---
import tensorflow as tf
x_ = tf.placeholder(tf.float32, [None, 2])
y_ = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.random_uniform([2, 1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1, 1]))
y = tf.nn.sigmoid(tf.matmul(x_, w) + b)
cross_entropy = -tf.reduce_sum(y_ * tf.log(y) + (1 - y_) * tf.log(1 - y))
optimizer = tf.train.GradientDescentOptimizer(0.1)
train = optimizer.minimize(cross_entropy)
init = tf.initialize_all_variables()
# --- Session ---
sess = tf.Session()
sess.run(init)
for step in range(500):
    sess.run(train, feed_dict={x_: x_data, y_: y_data})
    # cross_entropy depends on the placeholders, so it needs feed_dict as well
    print(sess.run(cross_entropy, feed_dict={x_: x_data, y_: y_data}))
sess.close()

Computational Graph
# placeholder
# variable
# operations
# error function
cross_entropy = -tf.reduce_sum(y_*tf.log(y) + (1 - y_) * tf.log(1 - y))
# trainer
# initializer

Placeholder
x_                        y_
0.70828883  0.27190551    0
0.89042455  0.63832092    1
0.11332515  0.00849676    0
0.73278006  0.37781084    0
0.292448    0.09819899    0
0.9802261   0.94339143    1
0.36212146  0.54404682    0
...         ...           ...

Variable
w = [[0.42905441], [-0.43841863]]
b = [[0]]

Matrix Multiplication
tf.matmul(x_,w)+b
w = [[0.42905441], [-0.43841863]], b = [[0]]
Row (0.70828883, 0.27190551): 0.70828883 * 0.42905441 + 0.27190551 * (-0.43841863) + 0 = 0.184686
Row (0.89042455, 0.63832092): 0.89042455 * 0.42905441 + 0.63832092 * (-0.43841863) + 0 = 0.1021888
Row (0.11332515, 0.00849676): 0.11332515 * 0.42905441 + 0.00849676 * (-0.43841863) + 0 = 0.04489752
...

Sigmoid
tf.nn.sigmoid
0.184686    →  0.54604071
0.1021888   →  0.52552499
0.04489752  →  0.51122249
...
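
The numbers on the matrix multiplication and sigmoid slides can be reproduced outside TensorFlow. This sketch uses NumPy as a stand-in for `tf.matmul` and `tf.nn.sigmoid`, with the first three rows of `x_` and the values of `w` and `b` shown in the talk.

```python
import numpy as np

# First three rows of x_ and the values of w and b shown on the slides.
x = np.array([[0.70828883, 0.27190551],
              [0.89042455, 0.63832092],
              [0.11332515, 0.00849676]])
w = np.array([[0.42905441],
              [-0.43841863]])
b = np.zeros((1, 1))

logits = x @ w + b                      # same arithmetic as tf.matmul(x_, w) + b
y = 1.0 / (1.0 + np.exp(-logits))       # same arithmetic as tf.nn.sigmoid
print(logits.ravel())
print(y.ravel())
```

The printed values should match the slide values up to rounding, since the slides show single-precision results.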

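Error Function
$E(h(X), Y) = -\frac{1}{m} \sum_{i}^{m} \left( y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right)$
cross_entropy = -tf.reduce_sum(y_*tf.log(y) + (1 - y_) * tf.log(1 - y))
Note that the code sums over the examples and omits the $\frac{1}{m}$ factor.
y_    y
0     0.54604071
1     0.52552499
...   ...
-tf.reduce_sum(y_*tf.log(y)) = 1.4331052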

Trainer
$w \leftarrow w - \eta \frac{\partial E(h(X), Y)}{\partial w}$
$b \leftarrow b - \eta \frac{\partial E(h(X), Y)}{\partial b}$
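
A sketch of what one trainer step does to `w` and `b`, written in NumPy rather than TensorFlow. It uses the first three data rows and their labels from the talk and the learning rate 0.1 from the talk's code. The gradient expressions are those of the summed cross entropy with a sigmoid output, which TensorFlow would derive automatically.

```python
import numpy as np

x = np.array([[0.70828883, 0.27190551],
              [0.89042455, 0.63832092],
              [0.11332515, 0.00849676]])
t = np.array([[0.0], [1.0], [0.0]])     # labels y_ for those rows
w = np.array([[0.42905441], [-0.43841863]])
b = np.zeros((1, 1))
eta = 0.1                               # learning rate used in the talk's code

def predict(w, b):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def cross_entropy(w, b):
    y = predict(w, b)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

loss_before = cross_entropy(w, b)
y = predict(w, b)
grad_w = x.T @ (y - t)                          # dE/dw, shape (2, 1)
grad_b = np.sum(y - t, axis=0, keepdims=True)   # dE/db, shape (1, 1)
w = w - eta * grad_w
b = b - eta * grad_b
loss_after = cross_entropy(w, b)
print(loss_before, loss_after)
```

A single step already lowers the error on these three rows. Repeating the step is what the training loop does.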

Computational Graph
•  Initializer
w = [[0.42905441], [-0.43841863]]
b = [[0]]

Session
# create session
sess = tf.Session()
# initialize variable
sess.run(init)
# gradient descent
sess.run(train, feed_dict={x_:x_data,y_:y_data})
# fetch variable
print(sess.run(cross_entropy, feed_dict={x_:x_data,y_:y_data}))
# release resource
sess.close()

Run Operations
sess.run(init)
init: the node in the computational graph to run

Run Operations
sess.run(train, feed_dict={x_:x_data,y_:y_data} )
train: the node in the computational graph to run
feed_dict: the input data
x_data                    y_data
0.70828883  0.27190551    0
0.89042455  0.63832092    1
0.11332515  0.00849676    0
0.73278006  0.37781084    0
...         ...           ...

Run Operations
print(sess.run(cross_entropy, feed_dict={x_:x_data,y_:y_data}))
cross_entropy: the node in the computational graph to run
feed_dict: the input data
x_data                    y_data
0.70828883  0.27190551    0
0.89042455  0.63832092    1
0.11332515  0.00849676    0
0.73278006  0.37781084    0
...         ...           ...
Results: 2.4564333

Training
sess.run(train, feed_dict={x_:x_data,y_:y_data} )

Demo: Binary Classification
https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/intro/binaryClassification.ipynb

TensorBoard
[Screenshots: Histogram Summary, Scalar Summary, Computational Graph]

Summary
tf.scalar_summary: Scalar Summary
tf.histogram_summary: Histogram Summary
merged = tf.merge_all_summaries()
writer = tf.train.SummaryWriter("./", sess.graph_def)
...
summary_str = sess.run(merged,feed_dict={x_:x_data,y_:y_data})
writer.add_summary(summary_str, step)

name_scope
with tf.name_scope("cross_entropy") as scope:
    cross_entropy = -tf.reduce_sum(y_*tf.log(y) + (1-y_)*tf.log(1-y))

Launch Tensorboard
> tensorboard --logdir=./
Starting TensorBoard on port 6006
(You can navigate to http://0.0.0.0:6006)

Demo : TensorBoard
blob/master/intro/tensorboard.py

Demo
•  Image recognition: GoogLeNet
blob/master/intro/googlenet.ipynb

About the Speaker
Mark Chang
•  Email: ckmarkoh at gmail dot com
•  Blog: http://cpmarkchang.logdown.com
•  Github: https://github.com/ckmarkoh
•  Facebook: https://www.facebook.com/ckmarkoh.chang
•  Slideshare: http://www.slideshare.net/ckmarkohchang
•  Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847
