LSTM Basics

Akshay Sehgal (www.akshaysehgal.com)
LSTM
Long Short Term Memory
Akshay Sehgal, Lead Data Scientist @ Reliance Industries

Pre-requisites

• Neural Networks using Keras

• Forward pass & computation graphs

• Back propagation

• Basics of RNN

• Activation functions

How to handle sequence data?
• Text, Stock prices, Sensor signals, DNA, Customer purchase behaviour, Sound signals

• Bag of words doesn’t preserve order/sequence in data

• Modelling sequential data requires a ‘temporal’ architecture to simulate ‘memory’

• The idea is to encode a sequence iteratively (recurrently), carrying the encoding forward from one ‘time step’ to the next

• Applications include predictive models, natural language understanding, POS tagging, machine translation, natural language generation, etc.
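To illustrate the point about bag of words losing order, here is a minimal sketch. The two example sentences and the helper name `bag_of_words` are made up for illustration and are not from the slides.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count word occurrences, discarding word order."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# Same words with the same counts: the two bags are identical...
same_bag = bag_of_words(a) == bag_of_words(b)
# ...even though the word sequences (and the meanings) differ.
same_sequence = a.split() == b.split()

print(same_bag, same_sequence)  # True False
```

A model fed only the bag cannot tell these two sentences apart, which is why sequence data calls for an architecture that processes items in order.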

What is an RNN?
An RNN (Recurrent Neural Network) can be seen as a layer in a neural network that encodes sequential data into a vector representation. That vector can then be used for various tasks such as classification, or simply as an encoding. In other words, it is a method for automated feature engineering on sequential data.
[Figure: example input sequence ‘What time is ?’]
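The encoding idea can be sketched in plain Python, assuming the common simple-RNN update h_t = tanh(W_x x_t + W_h h_{t-1} + b). The function name `rnn_encode`, the toy weights and the two toy sequences are illustrative assumptions, not values from the slides.

```python
import math

def rnn_encode(sequence, W_x, W_h, b):
    """Fold a sequence of input vectors into one hidden-state vector."""
    h = [0.0] * len(b)  # initial hidden state
    for x in sequence:  # one iteration per time step
        h = [
            math.tanh(
                sum(W_x[j][k] * x[k] for k in range(len(x)))
                + sum(W_h[j][k] * h[k] for k in range(len(h)))
                + b[j]
            )
            for j in range(len(b))
        ]
    return h  # the final hidden state is the encoding

# Hypothetical toy weights: input size 2, hidden size 2.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [-0.4, 0.6]]
b = [0.0, 0.1]

enc_a = rnn_encode([[1.0, 0.0], [0.0, 1.0]], W_x, W_h, b)
enc_b = rnn_encode([[0.0, 1.0], [1.0, 0.0]], W_x, W_h, b)  # same items, reversed
print(enc_a, enc_b)
```

The two sequences contain the same items in a different order and end up with different encodings, unlike a bag-of-words representation.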

Why don’t RNNs work in practice?
• Long-term dependencies are not captured: as the number of time steps increases, the RNN is unable to connect information from distant steps

• The vanishing gradient problem causes loss of long-term memory while emphasising the short term
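A small numeric sketch can make the vanishing gradient concrete. It assumes a scalar RNN h_t = tanh(w * h_{t-1} + x_t), for which the chain rule gives d h_t / d h_{t-1} = w * (1 - h_t^2). The weight 0.9, the constant input 0.5 and the 50 steps are arbitrary illustrative choices.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| after each step of a scalar tanh RNN."""
    h, grad, history = h0, 1.0, []
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule factor for this time step
        history.append(abs(grad))
    return history

grads = gradient_through_time(w=0.9, inputs=[0.5] * 50)
print(grads[0], grads[9], grads[-1])
```

Each factor has magnitude below 1 here, so the product shrinks with every step, and the earliest inputs end up with almost no influence on the gradient.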

How do LSTMs work?
• LSTMs add a long-term memory so that some hidden states are remembered more than others. This allows them to retain knowledge over longer sequences.

• They have 2 outputs instead of 1: the hidden state and the cell state. Their computation is somewhat more complex than that of an RNN.
[Figure: RNN Chain, LSTM Chain]

LSTM cell architecture
• An LSTM’s architecture consists of 3 gates: forget gate, input gate, output gate

• Tanh acts as a squashing function (output between -1 and 1) while Sigmoid acts as a decision function, or gate (output between 0 and 1)

• The cell state is a channel that runs along the LSTM chain, carrying information freely from one time step to another
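The different roles of Tanh and Sigmoid can be seen in a few lines. This is only a sketch; the input values 3.0, -10.0 and 10.0 are arbitrary.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

candidate = math.tanh(3.0)    # squashed value, always in (-1, 1)
closed_gate = sigmoid(-10.0)  # close to 0: block almost everything
open_gate = sigmoid(10.0)     # close to 1: pass almost everything

blocked = closed_gate * candidate
passed = open_gate * candidate
print(candidate, blocked, passed)
```

Multiplying by a sigmoid output therefore works like a soft switch on the squashed value.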

The Cell state
The cell state is a conveyor belt that carries information from one time step to another. The forget gate and the input gate modify the cell state, and the output gate reads from it. How much information passes through each gate depends on a Sigmoid function: 0 means let nothing through, 1 means let everything through, and values in between let part of it through.
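In the notation commonly used for LSTMs (the symbols are an assumption, since the slides do not define any), the cell state update can be written as below. Here f_t is the forget gate output, i_t the input gate output, \tilde{c}_t the Tanh-squashed candidate, n the cell state size, and \odot element-wise multiplication.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
\qquad f_t,\, i_t \in (0, 1)^n,
\qquad \tilde{c}_t \in (-1, 1)^n
```

With f_t close to 1 and i_t close to 0, the cell state is carried forward almost unchanged, which is the conveyor belt behaviour.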

The Forget Gate
Let's say that the previous few time steps encode information about the gender of the subject. This is useful for predicting the next few words while the subject stays the same. But when a new subject enters, we no longer want to retain the information about gender. This is what the forget gate is trained to do.

It concatenates the previous hidden state with the current input, multiplies the result by a weight matrix and adds a bias, then applies a sigmoid function. The output is multiplied element-wise with the previous cell state.
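The forget gate computation can be sketched in plain Python. The names `forget_gate`, `W_f` and `b_f` and the hand-picked weights are illustrative assumptions; in a trained LSTM the weights are learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(h_prev, x, c_prev, W_f, b_f):
    """Scale each entry of the previous cell state by a value in (0, 1)."""
    concat = h_prev + x  # list concatenation: [h_prev, x]
    f = [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]
    return [f_j * c_j for f_j, c_j in zip(f, c_prev)]

# Hypothetical weights: keep the first cell entry, forget the second.
W_f = [[0.0, 0.0, 10.0], [0.0, 0.0, -10.0]]
b_f = [0.0, 0.0]
kept = forget_gate(h_prev=[0.0, 0.0], x=[1.0], c_prev=[2.0, -1.0], W_f=W_f, b_f=b_f)
print(kept)
```

With these weights the first cell state entry stays near 2.0 while the second is pushed towards 0, which is the forgetting behaviour from the gender example.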

The Input Gate
The input gate decides what information needs to be saved to the cell state. It performs the same operation as the forget gate (a sigmoid over the weighted concatenation of hidden state and input, plus bias). Instead of applying the result to the cell state directly, it multiplies it with a Tanh-squashed candidate computed from the same concatenated vector with its own weights and bias. The product is then added to the cell state, which has already been scaled by the forget gate.
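A matching sketch for the input gate follows. The names `input_gate`, `W_i`, `b_i`, `W_c`, `b_c` and all weights are assumptions chosen for illustration, and `c_forgotten` stands for the cell state after the forget gate has been applied.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(row, v):
    return sum(w * u for w, u in zip(row, v))

def input_gate(h_prev, x, c_forgotten, W_i, b_i, W_c, b_c):
    """Add gated candidate values to the already-forgotten cell state."""
    concat = h_prev + x
    i = [sigmoid(dot(row, concat) + b) for row, b in zip(W_i, b_i)]          # how much to write
    c_tilde = [math.tanh(dot(row, concat) + b) for row, b in zip(W_c, b_c)]  # what to write
    return [c + i_j * g for c, i_j, g in zip(c_forgotten, i, c_tilde)]

# Hypothetical weights: write to the first cell entry only.
W_i = [[0.0, 10.0], [0.0, -10.0]]
b_i = [0.0, 0.0]
W_c = [[0.0, 5.0], [0.0, 5.0]]
b_c = [0.0, 0.0]
c_new = input_gate([0.0], [1.0], [0.5, 0.5], W_i, b_i, W_c, b_c)
print(c_new)
```

The first entry receives almost the full candidate value (about 0.5 + 1), while the second stays near 0.5 because its gate is almost closed.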

The Output Gate
Finally, we decide the output of the LSTM cell, which is the new hidden state. (The cell state is passed on separately and becomes the cell state for the next time step.) A sigmoid function is applied to the weighted concatenation of the previous hidden state and the current input. The result is then multiplied with the squashed (tanh) version of the cell state, which by now reflects what to remember and what to forget.
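Putting the gates together, one full LSTM time step might be sketched as follows. This follows the standard LSTM formulation rather than any code from the slides; `lstm_step`, the parameter dictionary keys and the hand-set biases are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * u for w, u in zip(row, v)) + b_j for row, b_j in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    concat = h_prev + x
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], concat)]    # forget gate
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], concat)]    # input gate
    g = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], concat)]  # candidate values
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], concat)]    # output gate
    c = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c_prev, i, g)]
    h = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c)]
    return h, c

# Hypothetical parameters (hidden size 1, input size 1): zero weights,
# biases set so the cell keeps its memory, writes nothing, and opens the output.
zero = [[0.0, 0.0]]
params = {"W_f": zero, "b_f": [10.0], "W_i": zero, "b_i": [-10.0],
          "W_c": zero, "b_c": [0.0], "W_o": zero, "b_o": [10.0]}
h_open, c_open = lstm_step([1.0], [0.0], [1.0], params)

# Same cell with the output gate closed: the memory survives, the output does not.
h_closed, c_closed = lstm_step([1.0], [0.0], [1.0], dict(params, b_o=[-10.0]))
print(h_open, c_open, h_closed, c_closed)
```

Closing the output gate hides the memory from the hidden state without erasing it from the cell state, which is why the cell returns two outputs.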

Machine Translation
LSTMs can be used as an encoder and a decoder for machine translation or a question-answering bot.

Reading Material
• https://arxiv.org/pdf/1506.00019.pdf
• https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
• http://www.bioinf.jku.at/publications/older/2604.pdf
• https://github.com/oxford-cs-deepnlp-2017/lectures
