Understanding Multilayer Perceptrons


5 Multilayer Perceptrons


Fig. 5.1.1 An MLP with a hidden layer of five hidden units.

We denote by $\mathbf{X} \in \mathbb{R}^{n \times d}$ a minibatch of $n$ examples, where each example has $d$ input features. For a one-hidden-layer MLP whose hidden layer has $h$ hidden units, let $\mathbf{H} \in \mathbb{R}^{n \times h}$ denote the outputs of the hidden layer. The hidden-layer weights and biases are $\mathbf{W}^{(1)} \in \mathbb{R}^{d \times h}$ and $\mathbf{b}^{(1)} \in \mathbb{R}^{1 \times h}$, and the output-layer weights and biases are $\mathbf{W}^{(2)} \in \mathbb{R}^{h \times q}$ and $\mathbf{b}^{(2)} \in \mathbb{R}^{1 \times q}$. The outputs $\mathbf{O} \in \mathbb{R}^{n \times q}$ are then computed as

$$\mathbf{H} = \mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}, \qquad \mathbf{O} = \mathbf{H}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}. \tag{5.1.1}$$

Without a nonlinearity, stacking two affine layers collapses into a single affine map. Defining $\mathbf{W} = \mathbf{W}^{(1)}\mathbf{W}^{(2)}$ and $\mathbf{b} = \mathbf{b}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}$, we see that

$$\mathbf{O} = (\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)})\mathbf{W}^{(2)} + \mathbf{b}^{(2)} = \mathbf{X}\mathbf{W}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)} = \mathbf{X}\mathbf{W} + \mathbf{b}. \tag{5.1.2}$$
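This collapse is easy to check numerically. A minimal NumPy sketch (all sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, q = 4, 3, 5, 2  # arbitrary batch and layer sizes

X = rng.normal(size=(n, d))
W1, b1 = rng.normal(size=(d, h)), rng.normal(size=(1, h))
W2, b2 = rng.normal(size=(h, q)), rng.normal(size=(1, q))

# Two stacked affine layers...
O_stacked = (X @ W1 + b1) @ W2 + b2

# ...equal a single affine layer with W = W1 W2 and b = b1 W2 + b2.
W = W1 @ W2
b = b1 @ W2 + b2
O_single = X @ W + b

assert np.allclose(O_stacked, O_single)
```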

To avoid this collapse, we apply a nonlinear activation function $\sigma(\cdot)$, for example $\sigma(x) = \max(0, x)$, elementwise to the hidden layer's pre-activations:

$$\mathbf{H} = \sigma(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}), \qquad \mathbf{O} = \mathbf{H}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}. \tag{5.1.3}$$

Deeper networks stack more such layers on top of the input $\mathbf{X}$, e.g. $\mathbf{H}^{(1)} = \sigma_1(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)})$ followed by $\mathbf{H}^{(2)} = \sigma_2(\mathbf{H}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)})$, and so on.
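As a sketch (not a full implementation), the one-hidden-layer forward pass of (5.1.3) in NumPy, with illustrative layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h, q = 4, 784, 256, 10  # batch, inputs, hidden units, outputs

X = rng.normal(size=(n, d))
W1, b1 = rng.normal(size=(d, h)) * 0.01, np.zeros((1, h))
W2, b2 = rng.normal(size=(h, q)) * 0.01, np.zeros((1, q))

def relu(x):
    return np.maximum(x, 0)

H = relu(X @ W1 + b1)  # hidden representation, eq. (5.1.3)
O = H @ W2 + b2        # the output layer stays affine

assert H.shape == (n, h) and O.shape == (n, q)
```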


The most popular choice of activation is the rectified linear unit (ReLU), which keeps positive elements and discards negative elements by setting the corresponding activations to 0:

$$\operatorname{ReLU}(x) = \max(x, 0). \tag{5.1.4}$$



The parametrized ReLU (pReLU) adds a linear term with slope $\alpha$ for negative arguments, so some information still passes through even when the argument is negative:

$$\operatorname{pReLU}(x) = \max(0, x) + \alpha \min(0, x). \tag{5.1.5}$$

The sigmoid function squashes any input in $\mathbb{R}$ into the interval $(0, 1)$:

$$\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}. \tag{5.1.6}$$

Its derivative is

$$\frac{d}{dx}\operatorname{sigmoid}(x) = \frac{\exp(-x)}{(1 + \exp(-x))^2} = \operatorname{sigmoid}(x)\left(1 - \operatorname{sigmoid}(x)\right). \tag{5.1.7}$$

The tanh function squashes its inputs into the interval $(-1, 1)$:

$$\tanh(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}. \tag{5.1.8}$$


$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x). \tag{5.1.9}$$

Other popular activations include the GELU, $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function, and the Swish, $\sigma(x) = x \operatorname{sigmoid}(\beta x)$. Note also that tanh and sigmoid are closely related:

$$\tanh(x) + 1 = 2\operatorname{sigmoid}(2x).$$
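The activations above are each a few lines of NumPy. This sketch also checks the derivative identity (5.1.7) and the tanh–sigmoid relationship numerically:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def prelu(x, alpha=0.1):  # alpha is the slope for negative inputs, eq. (5.1.5)
    return np.maximum(0, x) + alpha * np.minimum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return (1 - np.exp(-2 * x)) / (1 + np.exp(-2 * x))

x = np.linspace(-5, 5, 101)

# Identity (5.1.7): sigmoid'(x) = sigmoid(x) (1 - sigmoid(x))
num_grad = np.gradient(sigmoid(x), x)
assert np.allclose(num_grad, sigmoid(x) * (1 - sigmoid(x)), atol=1e-2)

# tanh(x) + 1 = 2 sigmoid(2x)
assert np.allclose(tanh(x) + 1, 2 * sigmoid(2 * x))
```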


5.2 Implementation of Multilayer Perceptrons

Each Fashion-MNIST image is a 28 × 28 grayscale grid, which we flatten into a vector of 28 × 28 = 784 pixel values before feeding it to an MLP.

5.3 Forward Propagation, Backward Propagation, and Computational Graphs

To illustrate forward propagation, consider a one-hidden-layer MLP with $\ell_2$ regularization, and assume for simplicity that neither layer has a bias term. Given an input $\mathbf{x} \in \mathbb{R}^d$, the intermediate variable is

$$\mathbf{z} = \mathbf{W}^{(1)} \mathbf{x}, \tag{5.3.1}$$

where $\mathbf{W}^{(1)} \in \mathbb{R}^{h \times d}$ is the weight of the hidden layer. Passing $\mathbf{z} \in \mathbb{R}^h$ through the activation function $\phi$ gives the hidden activation vector

$$\mathbf{h} = \phi(\mathbf{z}). \tag{5.3.2}$$

The hidden variable $\mathbf{h}$ is also an intermediate variable. With output-layer weights $\mathbf{W}^{(2)} \in \mathbb{R}^{q \times h}$, the output is

$$\mathbf{o} = \mathbf{W}^{(2)} \mathbf{h}. \tag{5.3.3}$$

Given a loss function $l$ and an example label $y$, the loss term is

$$L = l(\mathbf{o}, y). \tag{5.3.4}$$

With $\ell_2$ regularization and hyperparameter $\lambda$, the regularization term is

$$s = \frac{\lambda}{2} \left( \left\|\mathbf{W}^{(1)}\right\|_F^2 + \left\|\mathbf{W}^{(2)}\right\|_F^2 \right), \tag{5.3.5}$$

where the Frobenius norm is simply the $\ell_2$ norm applied after flattening each matrix into a vector. The regularized objective is

$$J = L + s. \tag{5.3.6}$$

Fig. 5.3.1 Computational graph of forward propagation.
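The forward pass (5.3.1)–(5.3.6) can be sketched directly. Here ReLU is used for $\phi$ and softmax cross-entropy is assumed as the example loss $l$; both are illustrative choices, not fixed by the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, q, lam = 3, 4, 2, 0.1  # sizes and l2 strength (arbitrary)

x = rng.normal(size=d)
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(q, h))
y = 0  # class label

z = W1 @ x               # (5.3.1)
hvec = np.maximum(z, 0)  # (5.3.2), phi = ReLU
o = W2 @ hvec            # (5.3.3)

# (5.3.4): softmax cross-entropy as an example loss l(o, y)
p = np.exp(o - o.max()); p /= p.sum()
L = -np.log(p[y])

s = lam / 2 * ((W1**2).sum() + (W2**2).sum())  # (5.3.5)
J = L + s                                      # (5.3.6)
```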

Backpropagation computes gradients via the chain rule, traversing the computational graph in reverse order. For functions $\mathsf{Y} = f(\mathsf{X})$ and $\mathsf{Z} = g(\mathsf{Y})$, where $\mathsf{X}, \mathsf{Y}, \mathsf{Z}$ are tensors of arbitrary shape, the gradient of $\mathsf{Z}$ with respect to $\mathsf{X}$ is

$$\frac{\partial \mathsf{Z}}{\partial \mathsf{X}} = \operatorname{prod}\left( \frac{\partial \mathsf{Z}}{\partial \mathsf{Y}}, \frac{\partial \mathsf{Y}}{\partial \mathsf{X}} \right), \tag{5.3.7}$$

where the $\operatorname{prod}$ operator multiplies its arguments after any necessary transposition and swapping.

The parameters of our one-hidden-layer MLP are $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$, so the goal of backpropagation is to compute $\partial J / \partial \mathbf{W}^{(1)}$ and $\partial J / \partial \mathbf{W}^{(2)}$. The first step takes gradients of the objective $J = L + s$ with respect to its two terms $L$ and $s$:

$$\frac{\partial J}{\partial L} = 1, \qquad \frac{\partial J}{\partial s} = 1. \tag{5.3.8}$$

Next comes the gradient with respect to the output $\mathbf{o}$:

$$\frac{\partial J}{\partial \mathbf{o}} = \operatorname{prod}\left( \frac{\partial J}{\partial L}, \frac{\partial L}{\partial \mathbf{o}} \right) = \frac{\partial L}{\partial \mathbf{o}} \in \mathbb{R}^q. \tag{5.3.9}$$

The gradients of the regularization term with respect to both weight matrices are

$$\frac{\partial s}{\partial \mathbf{W}^{(1)}} = \lambda \mathbf{W}^{(1)}, \qquad \frac{\partial s}{\partial \mathbf{W}^{(2)}} = \lambda \mathbf{W}^{(2)}. \tag{5.3.10}$$
Now we can compute $\partial J / \partial \mathbf{W}^{(2)} \in \mathbb{R}^{q \times h}$:

$$\frac{\partial J}{\partial \mathbf{W}^{(2)}} = \operatorname{prod}\left( \frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{W}^{(2)}} \right) + \operatorname{prod}\left( \frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(2)}} \right) = \frac{\partial J}{\partial \mathbf{o}} \mathbf{h}^\top + \lambda \mathbf{W}^{(2)}. \tag{5.3.11}$$

To obtain the gradient with respect to $\mathbf{W}^{(1)}$, we continue backpropagating along the output layer to compute $\partial J / \partial \mathbf{h} \in \mathbb{R}^h$:

$$\frac{\partial J}{\partial \mathbf{h}} = \operatorname{prod}\left( \frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{h}} \right) = {\mathbf{W}^{(2)}}^\top \frac{\partial J}{\partial \mathbf{o}}. \tag{5.3.12}$$

Since the activation function $\phi$ applies elementwise, computing $\partial J / \partial \mathbf{z} \in \mathbb{R}^h$ requires the elementwise multiplication operator $\odot$:

$$\frac{\partial J}{\partial \mathbf{z}} = \operatorname{prod}\left( \frac{\partial J}{\partial \mathbf{h}}, \frac{\partial \mathbf{h}}{\partial \mathbf{z}} \right) = \frac{\partial J}{\partial \mathbf{h}} \odot \phi'(\mathbf{z}). \tag{5.3.13}$$

Finally, we obtain $\partial J / \partial \mathbf{W}^{(1)} \in \mathbb{R}^{h \times d}$:

$$\frac{\partial J}{\partial \mathbf{W}^{(1)}} = \operatorname{prod}\left( \frac{\partial J}{\partial \mathbf{z}}, \frac{\partial \mathbf{z}}{\partial \mathbf{W}^{(1)}} \right) + \operatorname{prod}\left( \frac{\partial J}{\partial s}, \frac{\partial s}{\partial \mathbf{W}^{(1)}} \right) = \frac{\partial J}{\partial \mathbf{z}} \mathbf{x}^\top + \lambda \mathbf{W}^{(1)}. \tag{5.3.14}$$
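The backward pass (5.3.8)–(5.3.14) can be written out by hand and validated against a finite-difference estimate. ReLU and softmax cross-entropy are assumed here for $\phi$ and $l$, as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, q, lam = 3, 4, 2, 0.1
x = rng.normal(size=d)
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(q, h))
y = 0

def forward(W1, W2):
    z = W1 @ x
    hv = np.maximum(z, 0)                  # phi = ReLU
    o = W2 @ hv
    p = np.exp(o - o.max()); p /= p.sum()  # softmax
    L = -np.log(p[y])                      # cross-entropy loss
    s = lam / 2 * ((W1**2).sum() + (W2**2).sum())
    return L + s, z, hv, o, p

J, z, hv, o, p = forward(W1, W2)

# Backward pass, mirroring (5.3.8)-(5.3.14)
dJ_do = p - np.eye(q)[y]                 # dL/do for softmax cross-entropy
dJ_dW2 = np.outer(dJ_do, hv) + lam * W2  # (5.3.11)
dJ_dh = W2.T @ dJ_do                     # (5.3.12)
dJ_dz = dJ_dh * (z > 0)                  # (5.3.13): elementwise, phi'(z) = 1[z > 0]
dJ_dW1 = np.outer(dJ_dz, x) + lam * W1   # (5.3.14)

# Finite-difference check on one weight entry
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
num = (forward(W1p, W2)[0] - forward(W1m, W2)[0]) / (2 * eps)
assert abs(num - dJ_dW1[0, 0]) < 1e-4
```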

When training, forward propagation and backpropagation depend on each other. Computing the regularization term (5.3.5) requires the current values of $\mathbf{W}^{(1)}$ and $\mathbf{W}^{(2)}$, and computing the gradient (5.3.11) requires the value of the hidden activation $\mathbf{h}$, which is obtained by forward propagation. We therefore retain all intermediate values until backpropagation is complete, which is one reason training requires significantly more memory than plain prediction. (As an exercise: if the input $\mathbf{X}$ to some scalar function $f$ is an $n \times m$ matrix, what is the dimensionality of the gradient of $f$ with respect to $\mathbf{X}$?)

5.4 Numerical Stability and Initialization

Consider a deep network with $L$ layers, input $\mathbf{x}$, and output $\mathbf{o}$. Each layer $l$ applies a transformation $f_l$ parametrized by weights $\mathbf{W}^{(l)}$, producing the hidden variable $\mathbf{h}^{(l)}$ (with $\mathbf{h}^{(0)} = \mathbf{x}$), so that

$$\mathbf{h}^{(l)} = f_l(\mathbf{h}^{(l-1)}) \quad \text{and thus} \quad \mathbf{o} = f_L \circ \cdots \circ f_1(\mathbf{x}). \tag{5.4.1}$$

If all the hidden variables and the input are vectors, we can write the gradient of $\mathbf{o}$ with respect to any set of parameters $\mathbf{W}^{(l)}$ as

$$\partial_{\mathbf{W}^{(l)}} \mathbf{o} = \underbrace{\partial_{\mathbf{h}^{(L-1)}} \mathbf{h}^{(L)}}_{\mathbf{M}^{(L)}} \cdots \underbrace{\partial_{\mathbf{h}^{(l)}} \mathbf{h}^{(l+1)}}_{\mathbf{M}^{(l+1)}} \underbrace{\partial_{\mathbf{W}^{(l)}} \mathbf{h}^{(l)}}_{\mathbf{v}^{(l)}}. \tag{5.4.2}$$

In other words, this gradient is the product of $L - l$ matrices $\mathbf{M}^{(L)} \cdots \mathbf{M}^{(l+1)}$ and the gradient vector $\mathbf{v}^{(l)}$.

The matrices $\mathbf{M}^{(l)}$ may have a wide variety of eigenvalues, so this product of many terms is prone to either vanishing toward zero or exploding. A common culprit for vanishing gradients is the sigmoid activation $1/(1 + \exp(-x))$: its derivative vanishes whenever the input is large in magnitude, so stacking many sigmoid layers can make the overall gradient vanish unless all inputs stay in a narrow range near zero.

To see exploding gradients, one can draw many Gaussian random matrices with variance $\sigma^2 = 1$ and multiply them: the entries of the product quickly grow astronomically. A separate problem is symmetry. Suppose we initialize every element of $\mathbf{W}^{(1)}$ to the same constant $c$. Then every hidden unit receives the same inputs and the same gradient updates, so all rows of $\mathbf{W}^{(1)}$ remain identical forever: gradient descent alone can never break this symmetry, and the hidden layer behaves as if it had a single unit. Random initialization breaks the symmetry.

To motivate a principled initialization scale, consider a fully connected layer without nonlinearities. With $n_\text{in}$ inputs $x_j$ and weights $w_{ij}$, each output $o_i$ is given by

$$o_i = \sum_{j=1}^{n_\text{in}} w_{ij} x_j. \tag{5.4.3}$$

Assume the weights $w_{ij}$ are drawn independently from a distribution with zero mean and variance $\sigma^2$ (not necessarily Gaussian), and that the inputs $x_j$ have zero mean and variance $\gamma^2$, independent of the weights and of each other. Then the mean of $o_i$ is

$$E[o_i] = \sum_{j=1}^{n_\text{in}} E[w_{ij} x_j] = \sum_{j=1}^{n_\text{in}} E[w_{ij}] E[x_j] = 0, \tag{5.4.4}$$

and its variance is

$$\operatorname{Var}[o_i] = E[o_i^2] - (E[o_i])^2 = \sum_{j=1}^{n_\text{in}} E[w_{ij}^2 x_j^2] - 0 = \sum_{j=1}^{n_\text{in}} E[w_{ij}^2] E[x_j^2] = n_\text{in} \sigma^2 \gamma^2. \tag{5.4.5}$$

One way to keep the output variance equal to the input variance is to set $n_\text{in} \sigma^2 = 1$.

Applying the same argument to backpropagation suggests $n_\text{out} \sigma^2 = 1$, where $n_\text{out}$ is the number of outputs of the layer. We cannot satisfy both conditions at once unless $n_\text{in} = n_\text{out}$, so Xavier initialization compromises by requiring

$$\frac{1}{2}(n_\text{in} + n_\text{out}) \sigma^2 = 1 \quad \text{or equivalently} \quad \sigma = \sqrt{\frac{2}{n_\text{in} + n_\text{out}}}. \tag{5.4.6}$$

In practice we sample weights from a Gaussian with zero mean and variance $\sigma^2 = \frac{2}{n_\text{in} + n_\text{out}}$, or from a uniform distribution: since $U(-a, a)$ has variance $\frac{a^2}{3}$, substituting $\frac{a^2}{3}$ into the condition on $\sigma^2$ yields

$$U\left(-\sqrt{\frac{6}{n_\text{in} + n_\text{out}}},\; \sqrt{\frac{6}{n_\text{in} + n_\text{out}}}\right). \tag{5.4.7}$$
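A minimal sketch of Xavier initialization from the uniform distribution (5.4.7), with an empirical check that the sample variance matches $2/(n_\text{in} + n_\text{out})$ (the function name is illustrative):

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    """Sample an (n_out, n_in) weight matrix per eq. (5.4.7)."""
    a = np.sqrt(6 / (n_in + n_out))
    return rng.uniform(-a, a, size=(n_out, n_in))

rng = np.random.default_rng(0)
W = xavier_uniform(400, 200, rng)

# Empirical variance should be close to 2 / (n_in + n_out) = 1/300
target = 2 / (400 + 200)
assert abs(W.var() - target) / target < 0.1
```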
5.5 Generalization in Deep Learning


Classical tools such as $\ell_2$ regularization only partly explain why deep networks generalize. A useful reference point is the $k$-nearest-neighbor algorithm: to classify a query $\mathbf{x}$, it finds the $k$ training examples $\mathbf{x}_i$ that minimize a distance $d(\mathbf{x}, \mathbf{x}_i)$. With $k = 1$ the classifier attains training error zero, yet, given enough data and a suitable distance $d$ (possibly computed on features $\phi(\mathbf{x})$), it can still generalize well. Interpolating the training data perfectly is thus compatible with good generalization.



Norm-based penalties such as $\ell_2$ and $\ell_1$ regularization remain useful in deep learning, although explicit $\ell_2$ weight decay alone cannot fully account for the generalization behavior of deep networks.


5.6 Dropout

One classical idea for improving generalization is smoothness: a good function should not be overly sensitive to small perturbations of its input, and training with input noise can be shown to be equivalent to Tikhonov ($\ell_2$-type) regularization. Concretely, we can inject noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ into the input $\mathbf{x}$, yielding a perturbed point $\mathbf{x}' = \mathbf{x} + \epsilon$; in expectation, $E[\mathbf{x}'] = \mathbf{x}$.

In standard (inverted) dropout, each intermediate activation $h$ is zeroed out with probability $p$ and scaled up otherwise:

$$h' = \begin{cases} 0 & \text{with probability } p, \\ \dfrac{h}{1-p} & \text{otherwise.} \end{cases} \tag{5.6.1}$$

By design, the expectation is unchanged: $E[h'] = h$.


Fig. 5.6.1 shows an MLP in which dropout has removed, say, $h_2$ and $h_5$ from the hidden layer: the output computation no longer depends on them, and their gradients vanish for this update. Because each of $h_1, \ldots, h_5$ may be dropped, the output layer cannot rely excessively on any single hidden unit.
Fig. 5.6.1 MLP before and after dropout.



To implement dropout for a layer, draw a sample $u \sim U[0, 1]$ for each activation independently and keep the activation (scaled by $\frac{1}{1-p}$) if $u > p$, zeroing it out otherwise; equivalently, multiply by a mask whose entries are $\frac{1}{1-p}$ with probability $1 - p$ and $0$ with probability $p$.
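A sketch of that recipe (the name `dropout_layer` is illustrative, not a fixed API):

```python
import numpy as np

def dropout_layer(X, p, rng, train=True):
    """Inverted dropout per (5.6.1): zero with prob p, else scale by 1/(1-p)."""
    assert 0 <= p < 1
    if not train or p == 0:
        return X  # at test time, dropout is the identity
    mask = (rng.uniform(size=X.shape) > p).astype(X.dtype)
    return mask * X / (1 - p)

rng = np.random.default_rng(0)
H = np.ones((10000, 8))
Hd = dropout_layer(H, p=0.5, rng=rng)

# Each kept activation is scaled to 2.0; the mean stays close to E[h'] = h = 1
assert set(np.unique(Hd)) <= {0.0, 2.0}
assert abs(Hd.mean() - 1.0) < 0.05
```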


