Recurrent Neural Networks (RNN)
&
Long Short-Term Memory (LSTM)
Subash Chandra Pakhrin
PhD Student
Wichita State University
Wichita, Kansas
Sequence Data Modeling
Neural networks are being applied to problems that involve sequential processing of data,
for example inferring and understanding genomic sequences
A Sequence Modeling Problem: Predict the
Next Word
“This morning I took my dog for a walk.”
Given these words, predict the next word.
Idea #1: Use a Fixed Window
“This morning I took my dog for a walk.”
Given these two words (“for a”), predict the next word.
One-hot feature encoding tells us what each word is:
[ 1 0 0 0 0 0 1 0 0 0 ]
“for” “a”
Prediction
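As a concrete sketch (not in the original slides), the two-word window can be encoded by concatenating one-hot vectors over a toy vocabulary; the vocabulary and helper below are made up for illustration:

```python
import numpy as np

# Hypothetical toy vocabulary for the example sentence.
vocab = ["this", "morning", "i", "took", "my", "dog", "for", "a", "walk"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word, vocab_size):
    """Return a one-hot vector identifying a single word."""
    v = np.zeros(vocab_size)
    v[word_to_idx[word]] = 1.0
    return v

# Fixed window of the two previous words: "for", "a".
window = ["for", "a"]
features = np.concatenate([one_hot(w, len(vocab)) for w in window])
print(features)  # two stacked one-hot vectors, fed to a feed-forward predictor
```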
Problem #1: Can’t Model Long-Term
Dependencies
“Nepal is where I grew up, but I now live in Wichita. I speak fluent ___.”
We need information from the distant past to accurately predict the
correct word.
Idea #2: Use Entire Sequence as Set of Counts
“This morning I took my dog for a”
“bag of words”
[ 0 1 0 0 1 0 0 … 0 0 1 1 0 0 0 1 ]
Prediction
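A toy sketch of the bag-of-words idea (again, my own illustration): count how often each vocabulary word appears, which keeps the words but loses their order:

```python
from collections import Counter

import numpy as np

vocab = ["this", "morning", "i", "took", "my", "dog", "for", "a", "walk"]
sentence = "this morning i took my dog for a".split()
counts = Counter(sentence)

# Bag-of-words vector: one count per vocabulary entry; order information is lost.
bow = np.array([counts[w] for w in vocab], dtype=float)
print(bow)
```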
Sequence Modeling: Design Criteria
To model sequences, we need to:
1. Handle variable-length sequences
2. Track long-term dependencies
3. Maintain information about order
4. Share parameters across the sequence
Recurrent Neural Network for Sequence
Modeling
Recurrent Neural Network
Apply a recurrence relation at every
time step to process a sequence:
Note: The same function and set of
parameters are used at every time step
RNN State Update and Output
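The update and output equations appear only as figures in the original slides; assuming the standard vanilla RNN formulation, h_t = tanh(W_hh h_{t-1} + W_xh x_t) with output y_t = W_hy h_t, a minimal NumPy sketch (all sizes are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 16, 8  # arbitrary sizes for the sketch

# The same parameters are reused at every time step.
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

def rnn_step(x_t, h_prev):
    """One recurrence step: update the hidden state, then emit an output."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # a dummy 5-step input sequence
    h, y = rnn_step(x, h)
```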
RNNs: Computational Graph Across Time
Backpropagation Through Time (BPTT)
Standard RNN Gradient Flow
Computing the gradient w.r.t. h_0 involves many repeated factors of W_hh (and repeated gradient computation)!
Many values > 1: exploding gradients. Solution: gradient clipping to scale big gradients.
Many values < 1: vanishing gradients. Solutions: 1) activation function, 2) weight initialization, 3) network architecture.
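A tiny numerical illustration of the repeated-factor problem, plus a clip-by-norm helper for the exploding case (both are my own sketches, not from the slides):

```python
import numpy as np

# Repeatedly multiplying by a factor > 1 explodes; a factor < 1 vanishes.
print(1.5 ** 50)   # ~ 6.4e8   (exploding)
print(0.5 ** 50)   # ~ 8.9e-16 (vanishing)

def clip_gradient(grad, max_norm=1.0):
    """Scale the gradient down if its norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])            # an exploding gradient
print(clip_gradient(g, max_norm=1.0))  # rescaled to unit norm: [0.6, -0.8]
```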
The Problem of Long-Term Dependencies
Why are vanishing gradients a problem?
“The clouds are in the ______”
“I grew up in Nepal, … and I speak fluent ______.”
Solution #1: Activation Functions
Solution #2: Parameter Initialization
Initialize the weights to the identity matrix
Initialize the biases to zero
This helps prevent the weights from shrinking to zero.
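A minimal sketch of that initialization (the dimension is an arbitrary choice for the example):

```python
import numpy as np

hidden_dim = 16  # arbitrary size for the sketch

# Recurrent weights start as the identity, so the previous hidden state
# initially passes through unchanged instead of being scaled toward zero.
W_hh = np.eye(hidden_dim)
b_h = np.zeros(hidden_dim)
```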
Solution #3: Gated Cells
Idea: use a more complex recurrent unit with gates to control what information is passed through.
Long Short-Term Memory (LSTM) networks rely on a gated cell to track information throughout many time steps.
Standard RNN
In a standard RNN, repeating modules contain a simple computation
node
Long Short-Term Memory (LSTM)
LSTM networks rely on a gated cell to track information throughout many time steps.
LSTM modules contain computational blocks that control information flow
LSTM cells are able to track information throughout many time steps
Long Short-Term Memory (LSTM)
Information is added or removed through structures called gates
Gates optionally let information through, for example via a sigmoid
neural net layer followed by pointwise multiplication
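For illustration only (not from the slides), a gate can be sketched as a sigmoid layer whose output in [0, 1] scales the passing information pointwise; shapes and the missing bias are simplifications:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_gate = rng.normal(size=(4, 4))
x = rng.normal(size=4)            # gate input (e.g. previous state + new input)
information = rng.normal(size=4)  # values the gate may pass or block

# Sigmoid output near 1 lets a value through; near 0 blocks it.
gated = sigmoid(W_gate @ x) * information
```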
Long Short-Term Memory (LSTM)
How do LSTMs work?
1) Forget 2) Store 3) Update 4) Output
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
LSTMs forget irrelevant parts of the previous state
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
LSTMs store relevant new information into the cell state
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
LSTMs selectively update cell state values
Long Short-Term Memory (LSTM)
1) Forget 2) Store 3) Update 4) Output
The output gate controls what information is sent to the next time step
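Putting the four steps together, one LSTM update can be sketched with the standard gate equations; the parameter names, shapes, and the omission of bias terms below are simplifications of my own, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: forget, store, update the cell state, then output."""
    W_f, W_i, W_g, W_o = params          # one weight matrix per gate
    z = np.concatenate([h_prev, x_t])    # gates see previous output and new input

    f_t = sigmoid(W_f @ z)               # 1) forget irrelevant parts of c_prev
    i_t = sigmoid(W_i @ z)               # 2) decide what new information to store
    g_t = np.tanh(W_g @ z)               #    candidate values to store
    c_t = f_t * c_prev + i_t * g_t       # 3) selectively update the cell state
    o_t = sigmoid(W_o @ z)               # 4) output a filtered version of the cell state
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Toy usage with arbitrary sizes (biases omitted for brevity).
rng = np.random.default_rng(0)
hidden_dim, input_dim = 4, 3
params = [rng.normal(size=(hidden_dim, hidden_dim + input_dim)) for _ in range(4)]
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(rng.normal(size=input_dim), h, c, params)
```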
LSTM Gradient Flow
Uninterrupted gradient flow!
LSTMs: Key Concepts
1. Maintain a separate cell state from what is outputted
2. Use gates to control the flow of information
• Forget gate gets rid of irrelevant information
• Store relevant information from current input
• Selectively update cell state
• Output gate returns a filtered version of the cell state
3. Backpropagation through time with uninterrupted gradient flow
Applications of RNNs
Music Generation
Sentiment Classification
Input: Sequence of words
Output: probability of having
positive sentiment
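One possible realization (illustrative only, not from the slides): an embedding layer feeding an LSTM, with a sigmoid output for the probability of positive sentiment, sketched here with the Keras API and assumed sizes.

```python
import tensorflow as tf

vocab_size, embed_dim, hidden_dim = 10_000, 64, 64  # assumed sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),  # word IDs -> dense vectors
    tf.keras.layers.LSTM(hidden_dim),                  # summarize the word sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),    # P(positive sentiment)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```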
Applications of RNNs
Trajectory Prediction: Self-Driving Cars
Environmental Modeling
Machine Translation
Attention Mechanisms
Deep Learning for Sequence Modeling:
Summary
1. RNNs are well suited for sequence modeling tasks
2. Model sequences via a recurrence relation
3. Training RNNs with backpropagation through time
4. Gated cells like LSTMs let us model long-term dependencies
5. Models for music generation, classification, machine translation,
and more
References
1. Ava Soleimany, Recurrent Neural Networks, MIT Introduction to Deep Learning (6.S191), 2020, https://youtu.be/SEnXr6v2ifU
2. Lex Fridman, MIT 6.S094: Recurrent Neural Networks for Steering Through Time, 2017, https://youtu.be/nFTQ7kHQWtc
3. Fei-Fei Li, Justin Johnson & Serena Yeung, Convolutional Neural Networks for Visual Recognition, Stanford University, 2017, https://youtu.be/6niqTuYFZLQ?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv


Editor's Notes

  1. RNN: represent the network as a computational graph unrolled across time