Recurrent Neural Networks (RNN)
&
Long Short-Term Memory (LSTM)
Subash Chandra Pakhrin
PhD Student
Wichita State University
Wichita, Kansas
Sequence Data Modeling
Neural networks are being applied to problems that involve sequential processing of data,
for example inferring and understanding genomic sequences
A Sequence Modeling Problem: Predict the
Next Word
“This morning I took my dog for a walk.”
Given these words, predict the next word.
Idea #1: Use a Fixed Window
“This morning I took my dog for a walk.”
Given these two words (“for a”), predict the next word.
One-hot feature encoding tells us what each word is:
[ 1 0 0 0 0 0 1 0 0 0 ]
“for” “a”
Prediction
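As a concrete sketch (not in the original slides), the two-word window can be encoded by concatenating one-hot vectors over a toy vocabulary; the vocabulary and helper below are made up for illustration:

```python
import numpy as np

# Hypothetical toy vocabulary for the example sentence.
vocab = ["this", "morning", "i", "took", "my", "dog", "for", "a", "walk"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word, vocab_size):
    """Return a one-hot vector identifying a single word."""
    v = np.zeros(vocab_size)
    v[word_to_idx[word]] = 1.0
    return v

# Fixed window of the two previous words: "for", "a".
window = ["for", "a"]
features = np.concatenate([one_hot(w, len(vocab)) for w in window])
print(features)  # two stacked one-hot vectors, fed to a feed-forward predictor
```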
Problem #1: Can’t Model Long-Term
Dependencies
“Nepal is where I grew up, but I now live in Wichita. I speak fluent ___.”
We need information from the distant past to accurately predict the
correct word.
Idea #2: Use Entire Sequence as Set of Counts
“This morning I took my dog for a”
“bag of words”
[ 0 1 0 0 1 0 0 … 0 0 1 1 0 0 0 1 ]
Prediction
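A toy sketch of the bag-of-words idea (again, my own illustration): count how often each vocabulary word appears, which keeps the words but loses their order:

```python
from collections import Counter

import numpy as np

vocab = ["this", "morning", "i", "took", "my", "dog", "for", "a", "walk"]
sentence = "this morning i took my dog for a".split()
counts = Counter(sentence)

# Bag-of-words vector: one count per vocabulary entry; order information is lost.
bow = np.array([counts[w] for w in vocab], dtype=float)
print(bow)
```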
Sequence Modeling: Design Criteria
To model sequences, we need to:
1. Handle variable-length sequences
2. Track long-term dependencies
3. Maintain information about order
4. Share parameters across the sequence
Recurrent Neural Network for Sequence
Modeling
Recurrent Neural Network
Apply a recurrence relation at every
time step to process a sequence:
Note: The same function and set of
parameters are used at every time step
RNN State Update and Output
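The update and output equations appear only as figures in the original slides; assuming the standard vanilla RNN formulation, h_t = tanh(W_hh h_{t-1} + W_xh x_t) with output y_t = W_hy h_t, a minimal NumPy sketch (all sizes are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 16, 8  # arbitrary sizes for the sketch

# The same parameters are reused at every time step.
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

def rnn_step(x_t, h_prev):
    """One recurrence step: update the hidden state, then emit an output."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # a dummy 5-step input sequence
    h, y = rnn_step(x, h)
```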
RNNs: Computational Graph Across Time
Backpropagation Through Time (BPTT)
Standard RNN Gradient Flow
Computing the gradient w.r.t. h_0 involves many repeated factors of W_hh (and repeated gradient computation)!
Many values > 1: exploding gradients. Solution: gradient clipping to scale big gradients.
Many values < 1: vanishing gradients. Solutions: 1) activation function, 2) weight initialization, 3) network architecture.
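A tiny numerical illustration of the repeated-factor problem, plus a clip-by-norm helper for the exploding case (both are my own sketches, not from the slides):

```python
import numpy as np

# Repeatedly multiplying by a factor > 1 explodes; a factor < 1 vanishes.
print(1.5 ** 50)   # ~ 6.4e8   (exploding)
print(0.5 ** 50)   # ~ 8.9e-16 (vanishing)

def clip_gradient(grad, max_norm=1.0):
    """Scale the gradient down if its norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])            # an exploding gradient
print(clip_gradient(g, max_norm=1.0))  # rescaled to unit norm: [0.6, -0.8]
```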
The Problem of Long-Term Dependencies
Why are vanishing gradients a problem?
“The clouds are in the ______”
“I grew up in Nepal, … and I speak fluent ______.”
Solution #1: Activation Functions
Solution #2: Parameter Initialization
Initialize the weights to the identity matrix
Initialize the biases to zero
This helps prevent the weights from shrinking to zero.
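A minimal sketch of that initialization (the dimension is an arbitrary choice for the example):

```python
import numpy as np

hidden_dim = 16  # arbitrary size for the sketch

# Recurrent weights start as the identity, so the previous hidden state
# initially passes through unchanged instead of being scaled toward zero.
W_hh = np.eye(hidden_dim)
b_h = np.zeros(hidden_dim)
```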
Solution #3: Gated Cells
Idea: use a more complex recurrent unit with gates to control what information is passed through.
Long Short-Term Memory (LSTM) networks rely on a gated cell to track information throughout many time steps.
Standard RNN
In a standard RNN, repeating modules contain a simple computation
node
Long Short-Term Memory (LSTM)
LSTM networks rely on a gated cell to track information throughout many time steps.
LSTM modules contain computational blocks that control information flow
LSTM cells are able to track information throughout many time steps
Long Short-Term Memory (LSTM)
Information is added or removed through structures called gates
Gates optionally let information through, for example via a sigmoid
neural net layer followed by pointwise multiplication
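For illustration only (not from the slides), a gate can be sketched as a sigmoid layer whose output in [0, 1] scales the passing information pointwise; shapes and the missing bias are simplifications:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_gate = rng.normal(size=(4, 4))
x = rng.normal(size=4)            # gate input (e.g. previous state + new input)
information = rng.normal(size=4)  # values the gate may pass or block

# Sigmoid output near 1 lets a value through; near 0 blocks it.
gated = sigmoid(W_gate @ x) * information
```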
Long Short-Term Memory (LSTM)
How do LSTMs work?
1) Forget 2) Store 3) Update 4) Output
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
LSTMs forget irrelevant parts of the previous state
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
LSTMs store relevant new information into the cell state
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
LSTMs selectively update cell state values
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
The output gate controls what information is sent to the next time step
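Putting the four steps together, one LSTM update can be sketched with the standard gate equations; the parameter names, shapes, and the omission of bias terms below are simplifications of my own, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: forget, store, update the cell state, then output."""
    W_f, W_i, W_g, W_o = params          # one weight matrix per gate
    z = np.concatenate([h_prev, x_t])    # gates see previous output and new input

    f_t = sigmoid(W_f @ z)               # 1) forget irrelevant parts of c_prev
    i_t = sigmoid(W_i @ z)               # 2) decide what new information to store
    g_t = np.tanh(W_g @ z)               #    candidate values to store
    c_t = f_t * c_prev + i_t * g_t       # 3) selectively update the cell state
    o_t = sigmoid(W_o @ z)               # 4) output a filtered version of the cell state
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Toy usage with arbitrary sizes (biases omitted for brevity).
rng = np.random.default_rng(0)
hidden_dim, input_dim = 4, 3
params = [rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for _ in range(4)]
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(rng.normal(size=input_dim), h, c, params)
```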
LSTM Gradient Flow
Uninterrupted gradient flow!
LSTMs: Key Concepts
1. Maintain a separate cell state from what is outputted
2. Use gates to control the flow of information
• Forget gate gets rid of irrelevant information
• Store relevant information from current input
• Selectively update cell state
• Output gate returns a filtered version of the cell state
3. Backpropagation through time with uninterrupted gradient flow
Applications of RNNs
Music Generation
Sentiment Classification
Input: Sequence of words
Output: probability of having
positive sentiment
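One possible realization (illustrative only, not from the slides): an embedding layer feeding an LSTM, with a sigmoid output for the probability of positive sentiment, sketched here with the Keras API and assumed sizes.

```python
import tensorflow as tf

vocab_size, embed_dim, hidden_dim = 10_000, 64, 64  # assumed sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),  # word IDs -> dense vectors
    tf.keras.layers.LSTM(hidden_dim),                  # summarize the word sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),    # P(positive sentiment)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```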
Applications of RNNs
Trajectory Prediction: Self-Driving Cars
Environmental Modeling
Machine Translation
Attention Mechanisms
Deep Learning for Sequence Modeling:
Summary
1. RNNs are well suited for sequence modeling tasks
2. Model sequences via a recurrence relation
3. Training RNNs with backpropagation through time
4. Gated cells like LSTMs let us model long-term dependencies
5. Models for music generation, classification, machine translation,
and more
References
1. Ava Soleimany, Recurrent Neural Networks, MIT Introduction to Deep Learning (6.S191), 2020, https://youtu.be/SEnXr6v2ifU
2. Lex Fridman, MIT 6.S094: Recurrent Neural Networks for Steering Through Time, 2017, https://youtu.be/nFTQ7kHQWtc
3. Fei-Fei Li, Justin Johnson & Serena Yeung, Convolutional Neural Networks for Visual Recognition, Stanford University, 2017, https://youtu.be/6niqTuYFZLQ?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv


Editor's Notes

  1. RNN: represent the network as a computational graph unrolled across time