
Sequence Modeling


Sequence model
• Sequence models are machine learning models whose inputs and/or outputs are sequences of data.
• Examples: Speech recognition
Image captioning
Sentiment classification
Language translation
Stock market prediction
Music generation
Why do we need Recurrent Neural Networks (RNNs)?

• A traditional ANN cannot process sequence data because its number of input and output neurons is fixed, while sequences vary in length.
• Processing sequence data with a traditional ANN would require far more computation than the network can handle.
• Parameter sharing across positions in the sequence is not possible in an ANN.
• An ANN has no internal memory to keep track of previous inputs.
RNN vs ANN
Recurrent Neural Network
 A recurrent neural network (RNN) is a type of artificial neural network (ANN) designed for sequential data or time series data.
 Extensively used in language translation, natural language processing (NLP), speech recognition, image captioning, etc.
 In an RNN, the hidden state of the current time step is fed as input to the next time step.
 In traditional neural networks, all inputs and outputs are independent of each other; but in tasks such as predicting the next word of a sentence, the previous words are required, so there is a need to remember the previous words (data).
 In an RNN, the same weights and biases are used at each time step, because the network performs the same task on every input to the hidden layer to produce the output.
 The main and most important feature of an RNN is the hidden state, which remembers information that is used at the next time step.
Recurrent Neural Network
 Specialized for processing sequence data
 Shares parameters across different parts of the model
 The time step index refers to the position in the sequence
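
To make the parameter-sharing idea concrete, here is a minimal sketch of a single RNN step (the names and dimensions are illustrative assumptions, not taken from any particular library):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step of a vanilla RNN.

    The same W_xh, W_hh, and b_h are reused at every position in the
    sequence; this is the parameter sharing described above."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
```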
Types of RNN
• One to one
• One to many
• Many to one
• Many to many
Types of RNN
Building block of RNN
 An RNN uses a loop to remember the relation among all previous outputs when predicting the next output.

 First, x0 is the input and h0 is the output; h0 also serves as an input to the next step, alongside the new input x1. Similarly, the output of that step, h1, together with x2, is the input to the following step. In this way the network keeps remembering the context during training.
 Here A is the hidden-layer activation function. The activation function decides whether a computed value qualifies as the output; it is chosen based on the desired output pattern or type.
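
As a rough sketch of this unrolling over time (toy NumPy code with assumed sizes; a real model would be trained with backpropagation through time):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 8, 3, 5           # illustrative sizes

# One shared set of parameters, reused at every time step
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

xs = [rng.normal(size=input_dim) for _ in range(T)]         # x0, x1, ..., x4
h = np.zeros(hidden_dim)                                    # initial hidden state

outputs = []
for x_t in xs:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)    # hidden state carries context forward
    outputs.append(W_hy @ h + b_y)              # per-step output

# many-to-many tasks use all of `outputs`; many-to-one tasks use only outputs[-1]
```

The final comment also hints at how the one-to-many, many-to-one, and many-to-many configurations listed earlier differ only in which inputs and outputs are actually used.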
RNNs are Hard to Train
Long Short-Term Memory (LSTM)
 A type of RNN architecture that addresses the vanishing/exploding gradient problem and allows learning of long-term dependencies
 Has recently risen to prominence with state-of-the-art performance in speech recognition, language modeling, translation, and image captioning
Example:
 Today is a holiday, so students ___ ____ in class.
 Tomorrow is Monday, so students __ __ ____ in class.
Memory: the key context words (Today, Holiday; Tomorrow, Monday) that must be remembered to fill in the blanks.
RNN vs LSTM
Introducing Gates
Special RNN - LSTM
Gates
The LSTM has the ability to remove or add information to the cell state, carefully regulated by structures called gates.
Gate: a sigmoid neural network layer followed by a pointwise multiplication operator
Gates control the flow of information to/from the memory
Gates are driven by a concatenation of the output from the previous time step and the current input, and optionally the cell state vector.
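
A minimal sketch of that gate definition (assumed toy NumPy code, not tied to any specific framework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(W, b, h_prev, x_t, value):
    """Sigmoid layer over [h_prev, x_t], followed by pointwise multiplication."""
    g = sigmoid(W @ np.concatenate([h_prev, x_t]) + b)   # entries in (0, 1)
    return g * value        # g controls how much of `value` flows through
```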
Cell State Vector
 Represents the memory of the LSTM
 Undergoes changes via forgetting of old memory (forget gate) and
addition of new memory (input gate)
Long Short-Term Memory (LSTM)
Forget Gate/Keep Gate
• Controls what information to throw away from memory
Input Gate/Write Gate
• Controls what new information is added to cell state from current input
Output Gate/Read Gate
• Conditionally decides what to output from the memory
Memory Update/Update Cell State
• The cell state vector aggregates the two components (old memory via the
forget gate and new memory via the input gate)
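
Putting these four pieces together, one LSTM step can be sketched as follows (a simplified NumPy illustration with assumed parameter names; production code would use a library implementation such as PyTorch or TensorFlow):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p is a dict of weight matrices W_* and biases b_*."""
    z = np.concatenate([h_prev, x_t])            # previous output + current input
    f = sigmoid(p["W_f"] @ z + p["b_f"])         # forget/keep gate: what to erase
    i = sigmoid(p["W_i"] @ z + p["b_i"])         # input/write gate: what to add
    o = sigmoid(p["W_o"] @ z + p["b_o"])         # output/read gate: what to expose
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate new memory
    c_t = f * c_prev + i * c_tilde               # memory update: old kept + new written
    h_t = o * np.tanh(c_t)                       # output read from the updated memory
    return h_t, c_t
```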
Summary
Bidirectional RNN
 Maintains two hidden layers, one for the forward states and the
other for the backward states.
 Consumes twice as much memory space for its weight and bias
parameters.
 Output units with a relevant summary of the past in the forward
states and the future in the backward states.
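
A rough sketch of the idea (assumed NumPy code; the step function and parameter shapes are illustrative):

```python
import numpy as np

def step(x_t, h_prev, W_x, W_h, b):
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def bidirectional_pass(xs, fwd, bwd, hidden_dim):
    """Two independent hidden layers (hence roughly twice the parameters):
    one reads the sequence left-to-right, the other right-to-left."""
    h_f, h_b = np.zeros(hidden_dim), np.zeros(hidden_dim)
    forward, backward = [], []
    for x_t in xs:                         # forward states summarize the past
        h_f = step(x_t, h_f, *fwd)
        forward.append(h_f)
    for x_t in reversed(xs):               # backward states summarize the future
        h_b = step(x_t, h_b, *bwd)
        backward.append(h_b)
    backward.reverse()                     # align with original positions
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```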
Encoder-Decoder Sequence to Sequence Architecture
Deep Recurrent Networks
Recursive Network
LSTM Networks Example
Training LSTM
LSTM Applications
• Reference material: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
• https://www.youtube.com/watch?v=ZVN14xYm7JA&list=PLbBjZEwyU7W1CDs3Vx_GOJ9b3EgYQB3GE&index=13&ab_channel=AlenaKruchkova
• https://www.slideshare.net/xuyangela/introduction-to-recurrent-neural-network
• https://www.slideshare.net/LarryGuo2/chapter-10-170505-l
