5 Multilayer Perceptrons
$L-1$
Fig. 5.1.1 An MLP with a hidden layer of five hidden units.
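$\mathbf{X} \in \mathbb{R}^{n \times d}$: a minibatch of $n$ examples with $d$ inputs each. $\mathbf{H} \in \mathbb{R}^{n \times h}$: outputs of a hidden layer with $h$ hidden units.
$\mathbf{W}^{(1)} \in \mathbb{R}^{d \times h}$, $\mathbf{b}^{(1)} \in \mathbb{R}^{1 \times h}$, $\mathbf{W}^{(2)} \in \mathbb{R}^{h \times q}$, $\mathbf{b}^{(2)} \in \mathbb{R}^{1 \times q}$, $\mathbf{O} \in \mathbb{R}^{n \times q}$.

$$\begin{aligned} \mathbf{H} &= \mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}, \\ \mathbf{O} &= \mathbf{H}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}. \end{aligned} \tag{5.1.1}$$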
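With $\mathbf{W} = \mathbf{W}^{(1)}\mathbf{W}^{(2)}$ and $\mathbf{b} = \mathbf{b}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}$:

$$\mathbf{O} = (\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)})\mathbf{W}^{(2)} + \mathbf{b}^{(2)} = \mathbf{X}\mathbf{W}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)} = \mathbf{X}\mathbf{W} + \mathbf{b}. \tag{5.1.2}$$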
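The collapse in (5.1.2) can be checked numerically. The following is a minimal sketch in plain Python, assuming small illustrative sizes; the helper names `matmul`, `add_bias`, and `rand` are hypothetical and not part of the text. It compares two stacked affine layers against the single equivalent affine layer.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(M, b):
    """Add the row vector b to every row of M (broadcasting over examples)."""
    return [[m + bj for m, bj in zip(row, b)] for row in M]

def rand(rows, cols):
    """Matrix of standard Gaussian entries (illustrative data only)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
n, d, h, q = 3, 4, 5, 2  # assumed sizes, chosen only for the demonstration
X, W1, W2 = rand(n, d), rand(d, h), rand(h, q)
b1, b2 = rand(1, h)[0], rand(1, q)[0]

# Two affine layers with no nonlinearity in between, as in (5.1.1).
O_two = add_bias(matmul(add_bias(matmul(X, W1), b1), W2), b2)

# One affine layer with W = W1 W2 and b = b1 W2 + b2, as in (5.1.2).
W = matmul(W1, W2)
b = [v + b2j for v, b2j in zip(matmul([b1], W2)[0], b2)]
O_one = add_bias(matmul(X, W), b)

# Largest elementwise difference between the two outputs.
max_gap = max(abs(u - v) for ru, rv in zip(O_two, O_one) for u, v in zip(ru, rv))
```

Up to floating-point rounding, `max_gap` is zero: without a nonlinearity the hidden layer adds no expressive power.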
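Activation function $\sigma(\cdot)$, e.g. $\sigma(x) = \max(0, x)$:

$$\begin{aligned} \mathbf{H} &= \sigma(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}), \\ \mathbf{O} &= \mathbf{H}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}. \end{aligned} \tag{5.1.3}$$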
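$\mathbf{X}$, $\sigma$
Stacked hidden layers: $\mathbf{H}^{(1)} = \sigma_1(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)})$ and $\mathbf{H}^{(2)} = \sigma_2(\mathbf{H}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)})$.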
$$\operatorname{ReLU}(x) = \max(x, 0). \tag{5.1.4}$$
$$\operatorname{pReLU}(x) = \max(0, x) + \alpha \min(0, x). \tag{5.1.5}$$
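As a small illustration of (5.1.4) and (5.1.5), the two functions can be written directly from their definitions. This is a scalar sketch in plain Python; the function names `relu` and `prelu` are illustrative and do not refer to any library API.

```python
def relu(x):
    """ReLU(x) = max(x, 0), following (5.1.4)."""
    return max(x, 0.0)

def prelu(x, alpha):
    """pReLU(x) = max(0, x) + alpha * min(0, x), following (5.1.5)."""
    return max(0.0, x) + alpha * min(0.0, x)

# Positive inputs pass through unchanged; negative inputs are
# zeroed by ReLU and scaled by alpha under pReLU.
examples = [(x, relu(x), prelu(x, 0.1)) for x in (-2.0, 0.0, 3.0)]
```

With `alpha = 0` the parametrized version reduces to the plain ReLU, so (5.1.5) strictly generalizes (5.1.4).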
$$\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}. \tag{5.1.6}$$
$$\frac{d}{dx}\operatorname{sigmoid}(x) = \frac{\exp(-x)}{(1 + \exp(-x))^2} = \operatorname{sigmoid}(x)\left(1 - \operatorname{sigmoid}(x)\right). \tag{5.1.7}$$
The tanh function takes values between $-1$ and $1$:

$$\tanh(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}. \tag{5.1.8}$$
$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x). \tag{5.1.9}$$
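The closed-form derivatives (5.1.7) and (5.1.9) can be sanity-checked against central finite differences. The sketch below uses only the standard library; the test points and the step size `eps` are arbitrary choices made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid, following (5.1.7)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    """Derivative of tanh, following (5.1.9)."""
    return 1.0 - math.tanh(x) ** 2

def central_diff(f, x, eps=1e-6):
    """Numerical derivative used only as a reference value."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

points = [-3.0, -0.5, 0.0, 0.5, 3.0]
sigmoid_err = max(abs(sigmoid_grad(x) - central_diff(sigmoid, x)) for x in points)
tanh_err = max(abs(tanh_grad(x) - central_diff(math.tanh, x)) for x in points)
```

Both error values are tiny. The sigmoid derivative peaks at 0.25 when the input is 0, and the tanh derivative peaks at 1.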
$x\Phi(x)$, $\Phi(x)$
$\sigma(x) = x \operatorname{sigmoid}(\beta x)$
$\tanh(x) + 1 = 2 \operatorname{sigmoid}(2x)$
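The identity $\tanh(x) + 1 = 2\operatorname{sigmoid}(2x)$ and the two gated activations above are easy to probe numerically. In this sketch the names `swish` and `gelu` are illustrative labels, and $\Phi$ is assumed to be the standard Gaussian cumulative distribution function, which the surviving text does not state explicitly.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x, beta=1.0):
    """x * sigmoid(beta * x); beta is a free parameter."""
    return x * sigmoid(beta * x)

def gelu(x):
    """x * Phi(x), assuming Phi is the standard Gaussian CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

points = [-2.0, -0.3, 0.0, 0.7, 2.5]
# Largest deviation from tanh(x) + 1 = 2 * sigmoid(2x) over the test points.
identity_gap = max(abs(math.tanh(x) + 1.0 - 2.0 * sigmoid(2.0 * x)) for x in points)
```

For large `beta`, `swish(x, beta)` approaches `max(x, 0)`, since the sigmoid gate saturates to 0 or 1.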
$28 \times 28 = 784$
177 Implementation of Multilayer Perceptrons
ℓ2
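Let the input example be $\mathbf{x} \in \mathbb{R}^d$. The intermediate variable is

$$\mathbf{z} = \mathbf{W}^{(1)} \mathbf{x}, \tag{5.3.1}$$

where $\mathbf{W}^{(1)} \in \mathbb{R}^{h \times d}$ is the weight parameter of the hidden layer. Passing $\mathbf{z} \in \mathbb{R}^h$ through the activation function $\phi$ gives the hidden activation vector of length $h$:

$$\mathbf{h} = \phi(\mathbf{z}). \tag{5.3.2}$$

The hidden layer output $\mathbf{h}$ is also an intermediate variable. With output-layer weights $\mathbf{W}^{(2)} \in \mathbb{R}^{q \times h}$, the output layer variable is a vector of length $q$:

$$\mathbf{o} = \mathbf{W}^{(2)} \mathbf{h}. \tag{5.3.3}$$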
181 Forward Propagation, Backward Propagation, and Computational Graphs
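With loss function $l$ and example label $y$, the loss term for a single example is

$$L = l(\mathbf{o}, y). \tag{5.3.4}$$

For $\ell_2$ regularization with hyperparameter $\lambda$, the regularization term is

$$s = \frac{\lambda}{2} \left( \|\mathbf{W}^{(1)}\|_\mathrm{F}^2 + \|\mathbf{W}^{(2)}\|_\mathrm{F}^2 \right), \tag{5.3.5}$$

where the Frobenius norm of a matrix is the $\ell_2$ norm of the matrix flattened into a vector. The regularized loss on the example is

$$J = L + s. \tag{5.3.6}$$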
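The forward pass (5.3.1) to (5.3.6) can be traced on a tiny example. The sketch below is plain Python under stated assumptions: $\phi$ is taken to be ReLU, the loss $l$ is taken to be half the squared error, and the label is passed as a one-element list; none of these choices is fixed by the equations themselves. The helper names are hypothetical.

```python
def matvec(W, v):
    """Matrix-vector product with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def frob_sq(W):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(w * w for row in W for w in row)

def forward(x, y, W1, W2, lam):
    z = matvec(W1, x)                             # (5.3.1)
    h = [max(t, 0.0) for t in z]                  # (5.3.2), assuming phi = ReLU
    o = matvec(W2, h)                             # (5.3.3)
    L = 0.5 * sum((oi - yi) ** 2 for oi, yi in zip(o, y))  # (5.3.4), assumed squared-error loss
    s = 0.5 * lam * (frob_sq(W1) + frob_sq(W2))   # (5.3.5)
    return z, h, o, L, s, L + s                   # last entry is J, (5.3.6)

x = [1.0, 2.0]                                    # d = 2
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.0]]       # h x d = 3 x 2
W2 = [[1.0, 2.0, 3.0]]                            # q x h = 1 x 3
z, h, o, L, s, J = forward(x, [1.0], W1, W2, lam=0.1)
```

Here `z = [-1.0, 1.5, -1.0]`, `h = [0.0, 1.5, 0.0]`, `o = [3.0]`, `L = 2.0`, `s = 0.875`, and `J = 2.875`. The intermediates `z` and `h` are returned because backpropagation reuses them.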
Fig. 5.3.1 Computational graph of forward propagation.
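Assume functions $\mathsf{Y} = f(\mathsf{X})$ and $\mathsf{Z} = g(\mathsf{Y})$, in which the input and the output $\mathsf{X}, \mathsf{Y}, \mathsf{Z}$ are tensors of arbitrary shapes. By the chain rule, the derivative of $\mathsf{Z}$ with respect to $\mathsf{X}$ is

$$\frac{\partial \mathsf{Z}}{\partial \mathsf{X}} = \operatorname{prod}\left(\frac{\partial \mathsf{Z}}{\partial \mathsf{Y}}, \frac{\partial \mathsf{Y}}{\partial \mathsf{X}}\right). \tag{5.3.7}$$

Here $\operatorname{prod}$ multiplies its arguments after any necessary transposition or reshaping.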
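The parameters are $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$, and the goal of backpropagation is to compute the gradients $\partial J/\partial \mathbf{W}^{(1)}$ and $\partial J/\partial \mathbf{W}^{(2)}$. The first step is the gradient of the objective $J = L + s$ with respect to the loss term $L$ and the regularization term $s$:

$$\frac{\partial J}{\partial L} = 1 \quad \text{and} \quad \frac{\partial J}{\partial s} = 1. \tag{5.3.8}$$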
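Next, the gradient of the objective with respect to the output layer variable $\mathbf{o}$ is

$$\frac{\partial J}{\partial \mathbf{o}} = \operatorname{prod}\left(\frac{\partial J}{\partial L}, \frac{\partial L}{\partial \mathbf{o}}\right) = \frac{\partial L}{\partial \mathbf{o}} \in \mathbb{R}^q. \tag{5.3.9}$$

The gradients of the regularization term with respect to both parameters are

$$\frac{\partial s}{\partial \mathbf{W}^{(1)}} = \lambda \mathbf{W}^{(1)} \quad \text{and} \quad \frac{\partial s}{\partial \mathbf{W}^{(2)}} = \lambda \mathbf{W}^{(2)}. \tag{5.3.10}$$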
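The gradient $\partial J/\partial \mathbf{W}^{(2)} \in \mathbb{R}^{q \times h}$ is

$$\frac{\partial J}{\partial \mathbf{W}^{(2)}} = \operatorname{prod}\left(\frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{W}^{(2)}}\right) + \operatorname{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(2)}}\right) = \frac{\partial J}{\partial \mathbf{o}} \mathbf{h}^\top + \lambda \mathbf{W}^{(2)}. \tag{5.3.11}$$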
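To obtain the gradient with respect to $\mathbf{W}^{(1)}$, backpropagation continues from the output layer to the hidden layer. The gradient $\partial J/\partial \mathbf{h} \in \mathbb{R}^h$ is

$$\frac{\partial J}{\partial \mathbf{h}} = \operatorname{prod}\left(\frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{h}}\right) = {\mathbf{W}^{(2)}}^\top \frac{\partial J}{\partial \mathbf{o}}. \tag{5.3.12}$$

Since the activation function $\phi$ applies elementwise, the gradient $\partial J/\partial \mathbf{z} \in \mathbb{R}^h$ with respect to the intermediate variable $\mathbf{z}$ uses the elementwise product $\odot$:

$$\frac{\partial J}{\partial \mathbf{z}} = \operatorname{prod}\left(\frac{\partial J}{\partial \mathbf{h}}, \frac{\partial \mathbf{h}}{\partial \mathbf{z}}\right) = \frac{\partial J}{\partial \mathbf{h}} \odot \phi'(\mathbf{z}). \tag{5.3.13}$$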
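Finally, the gradient $\partial J/\partial \mathbf{W}^{(1)} \in \mathbb{R}^{h \times d}$ is

$$\frac{\partial J}{\partial \mathbf{W}^{(1)}} = \operatorname{prod}\left(\frac{\partial J}{\partial \mathbf{z}}, \frac{\partial \mathbf{z}}{\partial \mathbf{W}^{(1)}}\right) + \operatorname{prod}\left(\frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(1)}}\right) = \frac{\partial J}{\partial \mathbf{z}} \mathbf{x}^\top + \lambda \mathbf{W}^{(1)}. \tag{5.3.14}$$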
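The backward pass (5.3.9) to (5.3.14) can be verified against finite differences. The following sketch in plain Python assumes $\phi = \tanh$ and a half squared-error loss, both chosen only so that the objective is smooth; the function names and the numeric values are illustrative.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def objective(x, y, W1, W2, lam):
    """Forward pass; returns J plus the intermediates that backprop reuses."""
    z = matvec(W1, x)
    h = [math.tanh(t) for t in z]                  # assumed activation: tanh
    o = matvec(W2, h)
    L = 0.5 * sum((oi - yi) ** 2 for oi, yi in zip(o, y))  # assumed loss
    s = 0.5 * lam * sum(w * w for W in (W1, W2) for row in W for w in row)
    return L + s, z, h, o

def backward(x, y, W1, W2, lam):
    _, z, h, o = objective(x, y, W1, W2, lam)
    dJ_do = [oi - yi for oi, yi in zip(o, y)]                          # (5.3.9)
    dJ_dW2 = [[g * hj + lam * w for hj, w in zip(h, row)]              # (5.3.11)
              for g, row in zip(dJ_do, W2)]
    dJ_dh = [sum(W2[i][j] * dJ_do[i] for i in range(len(W2)))          # (5.3.12)
             for j in range(len(h))]
    dJ_dz = [g * (1.0 - math.tanh(t) ** 2) for g, t in zip(dJ_dh, z)]  # (5.3.13)
    dJ_dW1 = [[g * xj + lam * w for xj, w in zip(x, row)]              # (5.3.14)
              for g, row in zip(dJ_dz, W1)]
    return dJ_dW1, dJ_dW2

def numeric_grad(x, y, W1, W2, lam, W, i, j, eps=1e-6):
    """Central finite difference of J with respect to the entry W[i][j]."""
    old = W[i][j]
    W[i][j] = old + eps
    J_plus = objective(x, y, W1, W2, lam)[0]
    W[i][j] = old - eps
    J_minus = objective(x, y, W1, W2, lam)[0]
    W[i][j] = old  # restore the original weight
    return (J_plus - J_minus) / (2 * eps)

x, y, lam = [0.5, -1.0], [0.3], 0.1
W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]
W2 = [[0.6, -0.2, 0.4]]
g1, g2 = backward(x, y, W1, W2, lam)
errors = [abs(g1[i][j] - numeric_grad(x, y, W1, W2, lam, W1, i, j))
          for i in range(3) for j in range(2)]
errors += [abs(g2[0][j] - numeric_grad(x, y, W1, W2, lam, W2, 0, j))
           for j in range(3)]
max_err = max(errors)
```

The analytic and numerical gradients agree to within finite-difference error. Note that `backward` needs `z` and `h` from the forward pass, which is why training keeps these intermediates in memory.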
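The regularization term (5.3.5) depends on the current values of the parameters $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$, and the gradient (5.3.11) depends on the current value of the hidden layer output $\mathbf{h}$ computed during forward propagation.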
$\mathbf{X}$, $f$, $n \times m$
$f$, $\mathbf{X}$
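Consider a deep network with $L$ layers, input $\mathbf{x}$ and output $\mathbf{o}$. With each layer $l$ defined by a transformation $f_l$ parametrized by weights $\mathbf{W}^{(l)}$, whose hidden layer output is $\mathbf{h}^{(l)}$ (let $\mathbf{h}^{(0)} = \mathbf{x}$), the network can be expressed as

$$\mathbf{h}^{(l)} = f_l(\mathbf{h}^{(l-1)}) \quad \text{and thus} \quad \mathbf{o} = f_L \circ \cdots \circ f_1(\mathbf{x}). \tag{5.4.1}$$

The gradient of $\mathbf{o}$ with respect to any set of parameters $\mathbf{W}^{(l)}$ is

$$\partial_{\mathbf{W}^{(l)}} \mathbf{o} = \underbrace{\partial_{\mathbf{h}^{(L-1)}} \mathbf{h}^{(L)}}_{\mathbf{M}^{(L)} \stackrel{\mathrm{def}}{=}} \cdots \underbrace{\partial_{\mathbf{h}^{(l)}} \mathbf{h}^{(l+1)}}_{\mathbf{M}^{(l+1)} \stackrel{\mathrm{def}}{=}} \underbrace{\partial_{\mathbf{W}^{(l)}} \mathbf{h}^{(l)}}_{\mathbf{v}^{(l)} \stackrel{\mathrm{def}}{=}}. \tag{5.4.2}$$

In other words, this gradient is the product of $L-l$ matrices $\mathbf{M}^{(L)} \cdots \mathbf{M}^{(l+1)}$ and the gradient vector $\mathbf{v}^{(l)}$.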
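A long product of matrices such as the one in (5.4.2) can grow very quickly. The sketch below multiplies 100 random $4 \times 4$ matrices with standard Gaussian entries. The matrix size, the count, and the entry distribution are assumptions made for illustration and are not taken from the text.

```python
import random

def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def random_matrix(n, rng):
    """n x n matrix with independent standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
M = random_matrix(4, rng)
for _ in range(99):            # 100 factors in total
    M = matmul(M, random_matrix(4, rng))

# Magnitude of the largest entry of the product.
largest = max(abs(v) for row in M for v in row)
```

The entries of the product end up many orders of magnitude larger than those of any single factor. Scaling every factor by a small constant would instead drive the product toward zero, which is the vanishing counterpart of the same effect.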
185 Numerical Stability and Initialization
$\mathbf{M}^{(l)}$
$\sigma$
$1/(1 + \exp(-x))$
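The sigmoid $1/(1 + \exp(-x))$ illustrates how the factors in a product like (5.4.2) can shrink. Its derivative never exceeds 0.25, so a chain of sigmoid derivatives decays at least geometrically with depth. A minimal sketch, with the depth of 20 chosen arbitrarily:

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# Scan a grid on [-8, 8]; the derivative peaks at x = 0.
peak = max(sigmoid_grad(k / 100.0) for k in range(-800, 801))

depth = 20                      # assumed number of stacked sigmoid layers
upper_bound = peak ** depth     # best case for a product of `depth` derivatives

saturated = sigmoid_grad(8.0)   # derivative far from the origin
```

Even in the best case, 20 such factors multiply to less than $10^{-12}$, and for saturated inputs a single factor is already close to zero. This sketch ignores the weight matrices that also appear in each $\mathbf{M}^{(l)}$.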
$\sigma^2 = 1$
$\mathbf{W}^{(1)}$
$\mathbf{W}^{(1)} = c$ for some constant $c$
W (1)
W (1)
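Consider an output $o_i$ of a fully connected layer without nonlinearity. With $n_\mathrm{in}$ inputs $x_j$ and their associated weights $w_{ij}$, the output is

$$o_i = \sum_{j=1}^{n_\mathrm{in}} w_{ij} x_j. \tag{5.4.3}$$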
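Assume the weights $w_{ij}$ are drawn independently from a distribution with zero mean and variance $\sigma^2$, and that the inputs $x_j$ also have zero mean and variance $\gamma^2$ and are independent of $w_{ij}$ and of each other. The mean and variance of $o_i$ are then

$$\begin{aligned} E[o_i] &= \sum_{j=1}^{n_\mathrm{in}} E[w_{ij} x_j] \\ &= \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}] E[x_j] \\ &= 0, \end{aligned} \tag{5.4.4}$$

$$\begin{aligned} \mathrm{Var}[o_i] &= E[o_i^2] - (E[o_i])^2 \\ &= \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}^2 x_j^2] - 0 \\ &= \sum_{j=1}^{n_\mathrm{in}} E[w_{ij}^2] E[x_j^2] \\ &= n_\mathrm{in} \sigma^2 \gamma^2. \end{aligned} \tag{5.4.5}$$

One way to keep the variance fixed is to set $n_\mathrm{in} \sigma^2 = 1$.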
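The variance formula (5.4.5) can be checked with a small Monte Carlo experiment. The sketch below assumes Gaussian weights and inputs, although the derivation only needs zero means, the stated variances, and independence. The sample sizes are arbitrary.

```python
import random

def sample_output(n_in, sigma, gamma, rng):
    """One draw of o_i = sum_j w_ij * x_j with fresh weights and inputs."""
    return sum(rng.gauss(0.0, sigma) * rng.gauss(0.0, gamma) for _ in range(n_in))

rng = random.Random(0)
n_in, sigma, gamma = 50, 0.1, 2.0   # illustrative values
samples = [sample_output(n_in, sigma, gamma, rng) for _ in range(5000)]

mean = sum(samples) / len(samples)
var = sum((o - mean) ** 2 for o in samples) / len(samples)
predicted = n_in * sigma ** 2 * gamma ** 2   # right-hand side of (5.4.5)
```

The empirical mean is close to 0 and the empirical variance is close to `predicted`, which equals 2.0 for these values. Choosing `sigma` so that `n_in * sigma ** 2 == 1` would make the output variance match the input variance `gamma ** 2`.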
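The analogous condition for the gradients in backpropagation is $n_\mathrm{out} \sigma^2 = 1$, where $n_\mathrm{out}$ is the number of outputs of the layer. Both conditions cannot hold at once unless $n_\mathrm{in} = n_\mathrm{out}$, so the compromise is to satisfy

$$\frac{1}{2} (n_\mathrm{in} + n_\mathrm{out}) \sigma^2 = 1 \quad \text{or equivalently} \quad \sigma = \sqrt{\frac{2}{n_\mathrm{in} + n_\mathrm{out}}}. \tag{5.4.6}$$

This corresponds to sampling weights with zero mean and variance $\sigma^2 = \frac{2}{n_\mathrm{in} + n_\mathrm{out}}$. The uniform distribution $U(-a, a)$ has variance $\frac{a^2}{3}$. Plugging $\frac{a^2}{3}$ into the condition on $\sigma^2$ gives the initialization

$$U\left(-\sqrt{\frac{6}{n_\mathrm{in} + n_\mathrm{out}}}, \sqrt{\frac{6}{n_\mathrm{in} + n_\mathrm{out}}}\right). \tag{5.4.7}$$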
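The uniform initialization (5.4.7) takes only a few lines to write down. The sketch below uses only the standard library; the function name `xavier_uniform` and the layer sizes are illustrative and do not refer to a specific framework API.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng):
    """Sample an n_in x n_out weight matrix from U(-a, a), with a as in (5.4.7)."""
    a = math.sqrt(6.0 / (n_in + n_out))
    W = [[rng.uniform(-a, a) for _ in range(n_out)] for _ in range(n_in)]
    return W, a

rng = random.Random(0)
n_in, n_out = 200, 100          # assumed layer sizes
W, a = xavier_uniform(n_in, n_out, rng)

flat = [w for row in W for w in row]
# The distribution has mean zero, so the mean square estimates the variance.
empirical_var = sum(w * w for w in flat) / len(flat)
target_var = 2.0 / (n_in + n_out)   # sigma^2 from (5.4.6)
```

Since $a^2/3 = 2/(n_\mathrm{in} + n_\mathrm{out})$, the empirical variance of the sampled weights lands close to `target_var`.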
189 Generalization in Deep Learning
ℓ2
$\mathbf{x}$, $k$
$\mathbf{x}_i$, $d(\mathbf{x}, \mathbf{x}_i)$, $k = 1$
$d$
$\phi(\mathbf{x})$
$\epsilon$
ℓ2 ℓ1
ℓ2
ℓ2
195 Dropout
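Adding noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ to the input $\mathbf{x}$ yields a perturbed point $\mathbf{x}' = \mathbf{x} + \epsilon$. In expectation, $E[\mathbf{x}'] = \mathbf{x}$.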
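With dropout probability $p$, each intermediate activation $h$ is replaced by a random variable $h'$:

$$h' = \begin{cases} 0 & \text{with probability } p \\ \dfrac{h}{1 - p} & \text{otherwise} \end{cases} \tag{5.6.1}$$

By design, the expectation remains unchanged: $E[h'] = h$.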
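The rule (5.6.1) translates directly into code. The sketch below is plain Python operating on a list of activations. Drawing one uniform sample per unit and comparing it with $p$ is one common way to realize the rule, assumed here for illustration; the function name `dropout` is not tied to any library.

```python
import random

def dropout(h, p, rng):
    """Zero each entry with probability p and rescale survivors by 1 / (1 - p)."""
    assert 0.0 <= p <= 1.0
    if p == 1.0:
        return [0.0 for _ in h]   # every unit is dropped
    # rng.random() is uniform on [0, 1), so a unit is kept with probability 1 - p.
    return [hi / (1.0 - p) if rng.random() >= p else 0.0 for hi in h]

rng = random.Random(0)
h = [1.0] * 20000
out = dropout(h, 0.5, rng)
mean_out = sum(out) / len(out)    # estimates E[h'], which should stay near h = 1
```

About half of the entries of `out` are 0 and the rest are 2.0, so the average stays close to 1. At test time dropout is typically switched off, which corresponds to `p = 0` here.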
$p$
$h_2$ and $h_5$
$h_1, \ldots, h_5$
Fig. 5.6.1 MLP before and after dropout.
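The dropout mask takes value $1$ (keep) with probability $1 - p$ and $0$ (drop) with probability $p$. It can be obtained by drawing samples from the uniform distribution $U[0, 1]$ and comparing them with $p$.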
ℎ
ℎ