[N] – SHOT Learning
In deep learning models
davinnovation@gmail.com
Learning process: Human vs. Deep Learning
What they need: a human needs a single picture book; a deep learning model needs 60,000 training examples (MNIST).
In Traditional Deep Learning…
http://pinktentacle.com/tag/waseda-university/
One Shot Learning
Give the learning model the ability to infer from very little data.
One Shot Learning
Train on one (or a few) examples and still perform well.
One Shot Learning – Many approaches
Transfer Learning
Domain Adaptation
Imitation Learning
…..
It started from Meta Learning.
'Meta' means something that is self-referencing: 'knowing about knowing'.
Today…
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Basic structure, modelled on a Turing machine:
[Diagram: a Controller receives External Input and produces External Output, and accesses Memory through Read Heads and Write Heads]
Neural Turing Machine
https://arxiv.org/abs/1410.5401
[Diagram: the same structure, with the Controller realized as a neural network (RNN)]
The memory is an N x M block $M_t$ (N memory locations, each a vector of size M); the subscript t indexes time.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Read Heads
A read head emits a normalized weighting over the memory locations $i = 0, 1, \dots$:
$$\sum_i w_t(i) = 1, \qquad 0 \le w_t(i) \le 1, \qquad \text{e.g. } w_t = [0.9,\ 0.1,\ 0,\ \dots]$$
The read vector is the weighted sum of the memory slots:
$$r_t \leftarrow \sum_i w_t(i)\, M_t(i)$$
Example: with slots $M_t(0) = [1, 2, 1]$ and $M_t(1) = [2, 1, 3]$ and weights $[0.9,\ 0.1]$, the read vector is $r_t = [1.1,\ 1.9,\ 1.2]$.
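A minimal NumPy sketch of this read operation (the slot-per-column layout and the variable names are mine, chosen to reproduce the slide's numbers):

```python
import numpy as np

# Memory with one slot per column: M_t(0) = [1, 2, 1], M_t(1) = [2, 1, 3]
M_t = np.array([[1.0, 2.0],
                [2.0, 1.0],
                [1.0, 3.0]])

# Normalized read weighting over the slots
w_t = np.array([0.9, 0.1])

# r_t <- sum_i w_t(i) * M_t(i)
r_t = M_t @ w_t
print(r_t)  # [1.1, 1.9, 1.2], as in the slide
```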
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Write Heads: erase
A write head also emits a normalized weighting:
$$\sum_i w_t(i) = 1, \qquad 0 \le w_t(i) \le 1, \qquad \text{e.g. } w_t = [0.9,\ 0.1,\ 0,\ \dots]$$
Erase vector $e_t$ with $0 < e_t < 1$:
$$M'_t(i) \leftarrow M_{t-1}(i)\,\big[1 - w_t(i)\, e_t\big]$$
Example: with $e_t \approx [1, 0, 1]$ (for the computation the values are set near 0 and 1; they can never be exactly 0 or 1), the memory from the previous slide becomes $M'_t(0) = [0.1,\ 2,\ 0.1]$, $M'_t(1) = [1.8,\ 1,\ 2.7]$.
A write is performed in two steps: Erase -> Add.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Write Heads: add
Add vector $a_t$ with $0 < a_t < 1$:
$$M_t(i) \leftarrow M'_t(i) + w_t(i)\, a_t$$
Example: with the same weighting $w_t = [0.9,\ 0.1,\ 0,\ \dots]$ and $a_t \approx [1, 1, 0]$ (again set near 0 and 1 for the computation), the erased memory becomes $M_t(0) = [1.0,\ 2.9,\ 0.1]$, $M_t(1) = [1.9,\ 1.1,\ 2.7]$.
Erase -> Add completes one write.
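A NumPy sketch of the erase-then-add write, reproducing the numbers from the two slides above (the slot-per-column layout and the "nearly 0/1" erase and add vectors are illustrative assumptions):

```python
import numpy as np

M_prev = np.array([[1.0, 2.0],
                   [2.0, 1.0],
                   [1.0, 3.0]])       # memory before the write, one slot per column
w_t = np.array([0.9, 0.1])            # write weighting over the two slots
e_t = np.array([1.0, 0.0, 1.0])       # erase vector (~1, ~0, ~1)
a_t = np.array([1.0, 1.0, 0.0])       # add vector (~1, ~1, ~0)

# Erase: M'_t(i) <- M_{t-1}(i) * [1 - w_t(i) * e_t]
M_erased = M_prev * (1.0 - np.outer(e_t, w_t))
# Add:   M_t(i) <- M'_t(i) + w_t(i) * a_t
M_t = M_erased + np.outer(a_t, w_t)

print(M_erased)  # [[0.1, 1.8], [2.0, 1.0], [0.1, 2.7]]
print(M_t)       # [[1.0, 1.9], [2.9, 1.1], [0.1, 2.7]]
```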
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism
How is the weight vector $w_t$ calculated?
Content-based addressing:
- based on the similarity between the current memory contents and a key value produced by the controller.
Location-based addressing:
- based on the memory address itself. For example, to compute $f(x, y) = x \times y$, the variables $x$ and $y$ are stored at different addresses, retrieved by location, and a multiplication algorithm is applied.
The NTM uses both mechanisms concurrently.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: use both mechanisms concurrently.
[Diagram: the previous state $w_{t-1}$ and the memory $M_t$ are combined with the controller outputs $k_t, \beta_t, g_t, s_t, \gamma_t$ in four stages: Content Addressing -> Interpolation -> Convolutional Shift -> Sharpening, producing $w_t^c \to w_t^g \to w'_t \to w_t$. Content Addressing is the content-based part; the remaining stages are the location-based part.]
- $k_t$: key vector
- $\beta_t$: key strength
- $g_t$: interpolation gate, in (0, 1)
- $s_t$: shift weighting, defined over integer shifts
- $\gamma_t$: sharpening weight
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Content Addressing
Inputs: $k_t$, $\beta_t$, $M_t$; output: $w_t^c$.
$$w_t^c(i) \leftarrow \frac{\exp\big(\beta_t\, K(k_t, M_t(i))\big)}{\sum_j \exp\big(\beta_t\, K(k_t, M_t(j))\big)}, \qquad K(u, v) = \frac{u \cdot v}{\|u\|\,\|v\|} \ \text{(cosine similarity)}$$
Example: with key $k_t = [3, 2, 1]$ and memory slots $M_t(0) = [1, 2, 1]$, $M_t(1) = [2, 1, 3]$:
- $\beta_t = 0$: $w_t^c = [0.5,\ 0.5]$
- $\beta_t = 5$: $w_t^c = [0.61,\ 0.39]$
- $\beta_t = 50$: $w_t^c = [0.98,\ 0.02]$
The key strength $\beta_t$ controls how sharply the weighting focuses on the best match.
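A NumPy sketch of content addressing that reproduces the numbers above (function and variable names are my own):

```python
import numpy as np

def content_addressing(k_t, M_t, beta_t):
    """w_t^c(i) = softmax_i( beta_t * cosine(k_t, M_t(i)) ), slots stored as columns."""
    sims = (M_t.T @ k_t) / (np.linalg.norm(M_t, axis=0) * np.linalg.norm(k_t))
    e = np.exp(beta_t * sims)
    return e / e.sum()

M_t = np.array([[1.0, 2.0],
                [2.0, 1.0],
                [1.0, 3.0]])
k_t = np.array([3.0, 2.0, 1.0])

for beta in (0.0, 5.0, 50.0):
    print(beta, content_addressing(k_t, M_t, beta))
# beta=0  -> [0.50, 0.50]
# beta=5  -> [~0.61, ~0.39]
# beta=50 -> [~0.98, ~0.02]
```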
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Interpolation
Inputs: $w_t^c$, $w_{t-1}$, $g_t$; output: $w_t^g$.
$$w_t^g \leftarrow g_t\, w_t^c + (1 - g_t)\, w_{t-1}$$
Example: with $w_t^c = [0.61,\ 0.39]$ and $w_{t-1} = [0.01,\ 0.90]$:
- $g_t = 0$: $w_t^g = [0.01,\ 0.90]$ (the previous weighting is kept)
- $g_t = 0.5$: $w_t^g = [0.31,\ 0.65]$ (an even blend of both)
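A tiny NumPy sketch of the interpolation gate (names are illustrative):

```python
import numpy as np

def interpolate(w_c, w_prev, g_t):
    """w_t^g = g_t * w_t^c + (1 - g_t) * w_{t-1}"""
    return g_t * w_c + (1.0 - g_t) * w_prev

w_c = np.array([0.61, 0.39])     # content-based weighting from the previous stage
w_prev = np.array([0.01, 0.90])  # weighting from the previous time step

print(interpolate(w_c, w_prev, 0.0))  # [0.01, 0.90]  gate closed: keep the old weighting
print(interpolate(w_c, w_prev, 0.5))  # [0.31, 0.645] blend of both
print(interpolate(w_c, w_prev, 1.0))  # [0.61, 0.39]  gate open: use the content weighting
```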
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Convolutional Shift
Inputs: $w_t^g$, $s_t$; output: $w'_t$.
$$w'_t(i) \leftarrow \sum_{j=0}^{N-1} w_t^g(j)\, s_t(i - j) \quad \text{(circular convolution, indices mod } N\text{)}$$
Example: with $w_t^g = [0.01,\ 0.90]$ and $s_t = [0,\ 1]$: $w'_t(0) = 0.90$, $w'_t(1) = 0.01$, i.e. the weighting is rotated by one position.
Shift weightings illustrated on the slide: a right shift ($s_t = [0, 1]$), a left shift ($s_t = [1, 0]$), and a spread shift ($s_t = [0.5, 0.5]$) that blurs the weighting over neighbouring positions.
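A NumPy sketch of the circular convolution (a direct, unoptimized loop; names are mine):

```python
import numpy as np

def shift(w_g, s_t):
    """w'_t(i) = sum_j w_t^g(j) * s_t(i - j), with indices taken modulo N (circular)."""
    N = len(w_g)
    return np.array([sum(w_g[j] * s_t[(i - j) % N] for j in range(N)) for i in range(N)])

w_g = np.array([0.01, 0.90])
print(shift(w_g, np.array([0.0, 1.0])))   # [0.90, 0.01]   rotate by one position
print(shift(w_g, np.array([1.0, 0.0])))   # [0.01, 0.90]   weighting left in place
print(shift(w_g, np.array([0.5, 0.5])))   # [0.455, 0.455] spread across both slots
```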
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Sharpening
Inputs: $w'_t$, $\gamma_t$; output: $w_t$.
$$w_t(i) \leftarrow \frac{w'_t(i)^{\gamma_t}}{\sum_j w'_t(j)^{\gamma_t}}$$
Example: with $w'_t = [0.01,\ 0.90]$:
- $\gamma_t = 0$: $w_t = [0.5,\ 0.5]$ (the weighting is flattened)
- $\gamma_t = 1$: $w_t \approx [0.01,\ 0.99]$ (only renormalized)
- $\gamma_t = 2$: $w_t \approx [0.0001,\ 0.9999]$ (concentrated further)
Sharpening counteracts the blurring introduced by the convolutional shift.
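A NumPy sketch of sharpening (names are illustrative):

```python
import numpy as np

def sharpen(w, gamma_t):
    """w_t(i) = w'_t(i)^gamma_t / sum_j w'_t(j)^gamma_t"""
    p = w ** gamma_t
    return p / p.sum()

w = np.array([0.01, 0.90])
print(sharpen(w, 0.0))  # [0.5, 0.5]          gamma=0 flattens the weighting
print(sharpen(w, 1.0))  # [~0.011, ~0.989]    gamma=1 only renormalizes
print(sharpen(w, 2.0))  # [~0.0001, ~0.9999]  gamma>1 concentrates it further
```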
Neural Turing Machine
https://arxiv.org/abs/1410.5401
[Diagram: the same architecture; the Controller is a neural network, here an RNN (LSTM)]
Experiment – Copy train time
Experiment – Copy result
Input: random 8-bit binary vectors.
NTM > LSTM: the NTM learns the copy task faster and reproduces the sequences more cleanly.
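For reference, a sketch of how copy-task data can be generated. Only the 8-bit random vectors come from the slide; the sequence length, the blank recall phase, and folding the delimiter into the same bits (the paper uses a separate delimiter channel) are simplifying assumptions:

```python
import numpy as np

def copy_task_sequence(seq_len, n_bits=8, rng=np.random.default_rng()):
    """Random binary sequence the model must reproduce after a delimiter."""
    seq = rng.integers(0, 2, size=(seq_len, n_bits)).astype(np.float32)
    delimiter = np.ones((1, n_bits), dtype=np.float32)     # marks the end of the input
    blanks = np.zeros((seq_len, n_bits), dtype=np.float32)  # recall phase: no input
    inputs = np.concatenate([seq, delimiter, blanks])       # model reads, then recalls
    targets = np.concatenate([np.zeros((seq_len + 1, n_bits), dtype=np.float32), seq])
    return inputs, targets

x, y = copy_task_sequence(seq_len=5)
print(x.shape, y.shape)  # (11, 8) (11, 8)
```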
Memory-Augmented Neural Networks
https://arxiv.org/pdf/1605.06065.pdf
The NTM is a form of meta-learning: memory augmentation is used to build a model that learns rapidly from few examples.
Memory-Augmented Neural Networks (MANN)
https://arxiv.org/pdf/1605.06065.pdf
Recall the NTM addressing mechanism: content-based addressing (similarity between the memory contents and a controller-produced key) combined with location-based addressing (addressing by memory location, e.g. storing $x$ and $y$ at separate addresses to compute $f(x, y) = x \times y$), used concurrently.
=> MANN does not use location-based addressing.
=> It uses content-based addressing only.
MANN
https://arxiv.org/pdf/1605.06065.pdf
[Diagram: the NTM addressing pipeline shown again for reference (previous state $w_{t-1}$, memory $M_t$, and controller outputs $k_t, \beta_t, g_t, s_t, \gamma_t$ feeding Content Addressing -> Interpolation -> Convolutional Shift -> Sharpening), with the content-based and location-based parts labelled]
MANN
https://arxiv.org/pdf/1605.06065.pdf
[Diagram: the same pipeline, with the output of the content-addressing stage relabelled $w_t^r$, the MANN read weighting]
MANN
https://arxiv.org/pdf/1605.06065.pdf
Addressing Mechanism: Content Addressing only
Read weighting:
$$w_t^r(i) \leftarrow \frac{\exp\big(K(k_t, M_t(i))\big)}{\sum_j \exp\big(K(k_t, M_t(j))\big)}, \qquad K(u, v) = \frac{u \cdot v}{\|u\|\,\|v\|} \ \text{(cosine similarity)}$$
Example: with key $k_t = [3, 2, 1]$ and memory slots $M_t(0) = [1, 2, 1]$, $M_t(1) = [2, 1, 3]$, the read weighting is $w_t^r = [0.53,\ 0.47]$.
Compared with the NTM, there is no key-strength term and no location-based stages.
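A NumPy sketch of the MANN read weighting (same memory and key as the slide; names are mine):

```python
import numpy as np

def mann_read_weights(k_t, M_t):
    """w_t^r(i) = softmax_i( cosine(k_t, M_t(i)) ): content-based only, no key strength."""
    sims = (M_t.T @ k_t) / (np.linalg.norm(M_t, axis=0) * np.linalg.norm(k_t))
    e = np.exp(sims)
    return e / e.sum()

M_t = np.array([[1.0, 2.0],
                [2.0, 1.0],
                [1.0, 3.0]])
k_t = np.array([3.0, 2.0, 1.0])
w_r = mann_read_weights(k_t, M_t)
print(w_r)          # [~0.52, ~0.48], close to the slide's [.53, .47]
print(w_r @ M_t.T)  # read vector r_t = sum_i w_r(i) * M_t(i)
```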
MANN: Write Heads and Least Recently Used Access (LRUA)
https://arxiv.org/pdf/1605.06065.pdf
Usage weights accumulate with a decay parameter $\gamma$:
$$w_t^u \leftarrow \gamma\, w_{t-1}^u + w_t^r + w_t^w$$
The least-used weighting marks the slots whose usage is smallest:
$$w_t^{lu}(i) = \begin{cases} 0, & \text{if } w_t^u(i) \text{ is large enough} \\ 1, & \text{otherwise} \end{cases}$$
The write weighting blends the previous read weighting and the previous least-used weighting through a sigmoid of a scalar gate parameter $\alpha$:
$$w_t^w \leftarrow \sigma(\alpha)\, w_{t-1}^r + \big(1 - \sigma(\alpha)\big)\, w_{t-1}^{lu}$$
Writing adds the key (write) vector $k_t$ into memory:
$$M_t(i) \leftarrow M_{t-1}(i) + w_t^w(i)\, k_t$$
So new information is written either to the most recently read slot or to the least-used slot.
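A sketch of one LRUA write step under my reading of the slide. The `n_least` threshold for deciding which slots count as least-used, the slot-per-row layout, and the example numbers are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lrua_write(M_prev, w_u_prev, w_r_prev, k_t, alpha, n_least=1):
    """One LRUA write: blend previous read weights with the least-used weighting."""
    # Least-used weighting: 1 for the n_least smallest usage weights, 0 elsewhere
    threshold = np.sort(w_u_prev)[n_least - 1]
    w_lu_prev = (w_u_prev <= threshold).astype(float)
    # w_t^w <- sigmoid(alpha) * w_{t-1}^r + (1 - sigmoid(alpha)) * w_{t-1}^lu
    w_w = sigmoid(alpha) * w_r_prev + (1.0 - sigmoid(alpha)) * w_lu_prev
    # M_t(i) <- M_{t-1}(i) + w_t^w(i) * k_t   (slots stored as rows here)
    M_t = M_prev + np.outer(w_w, k_t)
    return M_t, w_w

M_prev = np.zeros((4, 3))                       # 4 slots, width 3
w_u_prev = np.array([0.9, 0.1, 0.4, 0.2])       # usage so far: slot 1 is least used
w_r_prev = np.array([0.7, 0.1, 0.1, 0.1])       # last step's read weighting
M_t, w_w = lrua_write(M_prev, w_u_prev, w_r_prev,
                      k_t=np.array([3.0, 2.0, 1.0]), alpha=0.0)  # sigmoid(0) = 0.5
print(w_w)  # [0.35, 0.55, 0.05, 0.05] -- the least-used slot receives most of the write

# After reading from M_t (content-based) to get w_t^r, usage is updated as
# w_t^u <- gamma * w_{t-1}^u + w_t^r + w_t^w
```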
Experiment – Data
One episode:
Input: $(x_0, \text{null}), (x_1, y_0), (x_2, y_1), \dots, (x_T, y_{T-1})$
Output: $(y'_0), (y'_1), (y'_2), \dots, (y'_T)$
The model estimates $p(y_t \mid x_t; D_{1:t-1}; \theta)$: at each step the RNN receives the current input together with the label of the previous input, e.g. $(x_0, \text{null}) \to (y'_0)$, then $(x_1, y_0) \to (y'_1)$, and so on, so it must hold class information in memory until the class reappears.
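A sketch of how one such episode could be assembled (the function name and the per-episode shuffling details are assumptions; only the one-step label offset comes from the slide):

```python
import numpy as np

def make_episode(images, labels, rng=np.random.default_rng()):
    """Pair each input with the *previous* step's label, as in the MANN setup."""
    order = rng.permutation(len(images))
    xs, ys = images[order], labels[order]
    # labels are offset by one time step; the first input is paired with a null label
    prev_y = np.concatenate([np.zeros_like(ys[:1]), ys[:-1]])
    inputs = list(zip(xs, prev_y))   # (x_t, y_{t-1})
    targets = ys                     # the model must predict y_t at every step
    return inputs, targets

imgs = np.random.rand(6, 20, 20)          # e.g. six 20x20 Omniglot-style images
labels = np.array([0, 1, 0, 2, 1, 2])     # episode labels
inputs, targets = make_episode(imgs, labels)
```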
Experiment – Data
Omniglot dataset: over 1,600 classes.
1,200 classes for training, 423 classes for testing (images downscaled to 20x20), plus rotation augmentation.
Experiment – Classification Result
Labels are presented as one-hot vectors. Each episode uses five randomly chosen classes, and the model is trained for 100,000 episodes (every episode presents effectively 'new' classes).
Accuracy is reported per instance, i.e. by how many times a class has already appeared within the episode.
Active One-Shot https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
MANN plus a decision at each step: predict the label, or pass and request it.
Just like a quiz show!
Active One-Shot
https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
At each step the RNN receives $(y_t, x_{t+1})$ if it asked for the label, or $(0, x_{t+1})$ otherwise, and it outputs either a label request or a prediction $(y'_t, 0)$:
- Request the label: the true label $y_t$ is supplied with the next input, and the reward is $r_t = -0.05$.
- Predict: the reward is $r_t = +1$ if $y'_t = y_t$, and $-1$ otherwise.
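A minimal sketch of this reward scheme (the action encoding is mine; the reward values come from the slide):

```python
def step_reward(action, y_true, y_pred=None):
    """Reward for one step of the active one-shot task."""
    if action == "request_label":
        return -0.05                              # pay a small cost to be told the label
    return 1.0 if y_pred == y_true else -1.0      # predict: +1 if correct, -1 otherwise

print(step_reward("request_label", y_true=3))       # -0.05
print(step_reward("predict", y_true=3, y_pred=3))   # 1.0
print(step_reward("predict", y_true=3, y_pred=5))   # -1.0
```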
Results https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
Want some more?
http://proceedings.mlr.press/v37/romera-paredes15.pdf
https://arxiv.org/pdf/1606.04080.pdf
Reference
• https://tensorflow.blog/tag/one-shot-learning/
• http://www.modulabs.co.kr/DeepLAB_library/11115
• https://www.youtube.com/watch?v=CzQSQ_0Z-QU
• https://www.slideshare.net/JisungDavidKim/oneshot-learning
• https://norman3.github.io/papers/docs/neural_turing_machine.html
• https://www.slideshare.net/webaquebec/guillaume-chevalier-deep-learning-avec-tensor-flow
• https://www.slideshare.net/ssuser06e0c5/metalearning-with-memory-augmented-neural-networks