Akshay Sehgal (www.akshaysehgal.com)
LSTM
Long Short Term Memory
Akshay Sehgal, Lead Data Scientist @ Reliance Industries
Pre-requisites

• Neural Networks using Keras

• Forward pass & computation graphs

• Back propagation

• Basics of RNN

• Activation functions
How to handle sequence data?
• Text, Stock prices, Sensor signals, DNA, Customer purchase behaviour, Sound signals

• Bag of words doesn’t preserve order/sequence in data 

• Modelling sequential data requires a ‘temporal’ architecture to simulate ‘memory’

• The idea is to encode a sequence iteratively (recurrently): the encoding produced at one ‘time step’ is fed back in at the next

• Applications include predictive models, natural language understanding, POS tagging, machine
translation, natural language generation, etc.
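The point about bag of words losing order can be illustrated with a minimal sketch. The two example sentences and the helper name `bag_of_words` are made up for illustration; only the standard-library `collections.Counter` is used.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count word occurrences, discarding word order."""
    return Counter(sentence.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# Same words, different order and meaning, yet identical bags.
same_bag = bag_of_words(a) == bag_of_words(b)
same_sequence = a.split() == b.split()
print(same_bag, same_sequence)
```

A model that only sees the bag cannot tell these two sentences apart, which is the motivation for an architecture that reads the words in order.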
What is an RNN?
An RNN (Recurrent Neural Network) can be seen as a
layer in a neural network used for encoding sequential
data into a vector representation that can then be used
for various tasks such as classification or just as an
encoding. In other words, it's a method to perform
feature engineering in an automated way for sequential
data.
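As a rough sketch of "encoding a sequence into a vector", the following pure-Python code runs a vanilla RNN over a sequence and keeps the last hidden state. The function names, the toy weights and the layer sizes are assumptions chosen for illustration, not values from the slides.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(w * x for w, x in zip(W_x[i], x_t))
        total += sum(w * h for w, h in zip(W_h[i], h_prev))
        h_t.append(math.tanh(total))
    return h_t

def encode(sequence, W_x, W_h, b):
    """Fold a whole sequence into one fixed-size vector (the last hidden state)."""
    h = [0.0] * len(b)
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h

# Hypothetical toy weights: 2 input features, 3 hidden units.
W_x = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_h = [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1], [0.1, -0.2, 0.4]]
b = [0.0, 0.1, -0.1]

short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Sequences of different lengths map to vectors of the same size.
print(len(encode(short_seq, W_x, W_h, b)), len(encode(long_seq, W_x, W_h, b)))
```

Unlike a bag of words, the encoding depends on order: feeding `short_seq` in reverse gives a different vector.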
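Why don’t RNNs work in practice?
• Long-term dependencies are not captured: as
the number of time steps increases, the RNN
is unable to connect information from distant steps

• The vanishing gradient problem causes loss of
long-term memory, while short-term memory is
emphasised. Gradients shrink as they are back-propagated
through many time steps, so early inputs barely influence learning.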
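The vanishing gradient effect can be made concrete with a scalar toy RNN. This is only a sketch: the recurrence `h_t = tanh(w * h_{t-1} + x_t)`, the weight `w = 0.9` and the constant inputs are assumptions chosen for illustration.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for a scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x_t in inputs:
        h = math.tanh(w * h + x_t)
        grad *= w * (1.0 - h * h)   # chain rule: one factor per time step
    return abs(grad)

w = 0.9  # hypothetical recurrent weight
for steps in (1, 10, 50):
    print(steps, gradient_through_time(w, [0.5] * steps))
```

Each time step multiplies the gradient by a factor below 1 here, so the influence of the first input on the final state shrinks roughly exponentially with sequence length.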
How do LSTMs work?
• LSTMs add a long-term memory that lets them hold on to some information for longer than the rest. This allows
them to retain knowledge over longer sequences.

• They have 2 outputs instead of 1: the hidden state and the cell state. Their computation is a bit more
complex than that of plain RNNs

(Figure: RNN chain compared with LSTM chain)
LSTM cell architecture
• An LSTM’s architecture consists of 3 gates: forget
gate, input gate, output gate

• Tanh acts as a squashing function (output between -1 and 1), while sigmoid
acts as a decision function, or gate (output between 0 and 1)

• Cell state is a channel that runs along the LSTM
chain, carrying information from one time step to
another with little interference
The Cell state
A cell state is a conveyor belt that carries information
from one time step to another. The gates control it: the
forget gate removes information, the input gate adds
information, and the output gate decides how much of it is
exposed. Each decision comes from a sigmoid function, whose
output lies between 0 and 1: 0 means let nothing through,
1 means let everything through.
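In the commonly used LSTM formulation, the cell state update can be written as a single equation. The symbols are notation assumed here rather than taken from the slides: $c_t$ is the cell state at time step $t$, $f_t$ and $i_t$ are the forget and input gate activations (each entry between 0 and 1), $\tilde{c}_t$ is the tanh-squashed candidate, and $\odot$ is the element-wise product.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

The first term keeps a fraction of the old memory, and the second term writes a fraction of the new candidate.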
The Forget Gate
Let's say that the previous few time steps encode
information about the gender of the subject. This is useful for
predicting the next few words while the subject stays the same.
But when a new subject enters, we no longer want to retain
the information about gender. This is what the
forget gate is trained to do.

It concatenates the previous hidden state with the current
input, multiplies the result by weights and adds a bias, then applies
a sigmoid function. The result is multiplied element-wise with the previous cell state.
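The forget gate computation described above can be sketched in pure Python. The function names, the toy weights and the sizes (2 hidden units, 1 input feature) are assumptions for illustration. A trained network would learn `W_f` and `b_f`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f [h_prev, x_t] + b_f), one value in (0, 1) per cell unit."""
    concat = h_prev + x_t                      # list concatenation
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + bias)
            for row, bias in zip(W_f, b_f)]

def apply_forget(f_t, c_prev):
    """Scale each cell-state entry by its gate value (element-wise product)."""
    return [f * c for f, c in zip(f_t, c_prev)]

# Hypothetical toy values: 2 hidden units, 1 input feature.
h_prev, x_t = [0.2, -0.4], [1.0]
c_prev = [0.9, -0.7]
W_f = [[0.0, 0.0, 6.0],    # unit 0: large positive weight on the input -> keep
       [0.0, 0.0, -6.0]]   # unit 1: large negative weight on the input -> forget
b_f = [0.0, 0.0]

f_t = forget_gate(h_prev, x_t, W_f, b_f)
c_after = apply_forget(f_t, c_prev)
print(f_t, c_after)
```

With these weights the first cell entry is kept almost unchanged while the second is almost erased, which mirrors the "new subject enters" example.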
The Input Gate
The input gate decides what information needs to be saved to the cell state. It performs the same operation
as the forget gate (with its own weights and bias), but instead of being applied to the cell state directly, its output is multiplied with a candidate vector: the tanh
(squashed) of the weighted concatenation of hidden state and input (plus bias). This product is then added to the cell
state, which has already been updated by the forget gate.
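A minimal sketch of the input gate step follows. It assumes the cell state has already been scaled by the forget gate. The names `input_gate_update`, `c_forgotten`, `W_i`, `W_c` and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def input_gate_update(h_prev, x_t, c_forgotten, W_i, b_i, W_c, b_c):
    """Add gated candidate values to a cell state already scaled by the forget gate."""
    concat = h_prev + x_t
    i_t = [sigmoid(z) for z in affine(W_i, b_i, concat)]        # how much to write
    c_tilde = [math.tanh(z) for z in affine(W_c, b_c, concat)]  # what to write
    return [c + i * g for c, i, g in zip(c_forgotten, i_t, c_tilde)]

# Hypothetical toy values: 2 hidden units, 1 input feature.
h_prev, x_t = [0.0, 0.0], [1.0]
c_forgotten = [0.5, 0.5]
W_i = [[0.0, 0.0, 8.0], [0.0, 0.0, -8.0]]   # unit 0 gate open, unit 1 gate closed
b_i = [0.0, 0.0]
W_c = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]    # same candidate for both units
b_c = [0.0, 0.0]

c_t = input_gate_update(h_prev, x_t, c_forgotten, W_i, b_i, W_c, b_c)
print(c_t)
```

Both units see the same candidate value, but only the unit whose gate is open actually writes it into the cell state.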
The Output Gate
Finally, we decide the output of the LSTM
cell, which is its new hidden state (the cell state is passed on
separately to the next time step). This is done
by applying a sigmoid function to the weighted
concatenation of the previous hidden state and
current input (plus bias). We then multiply the result with the
squashed (tanh) version of the cell state, which
contains what to remember and what to forget.
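Putting the three gates together, one full LSTM time step might look like the sketch below, with the output gate on the `h_t` line. It follows the commonly used formulation. The parameter dictionary layout, the helper `toy_params` and the constant weights are assumptions made only so the example runs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and the new cell state."""
    concat = h_prev + x_t
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]
    g_t = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], concat)]
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], concat)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]   # output gate filters tanh(c_t)
    return h_t, c_t

def toy_params(hidden, inputs, value=0.1):
    """Hypothetical constant weights, only so the sketch runs."""
    W = [[value] * (hidden + inputs) for _ in range(hidden)]
    b = [0.0] * hidden
    return {k: (W if k.startswith("W") else b)
            for k in ("W_f", "b_f", "W_i", "b_i", "W_c", "b_c", "W_o", "b_o")}

params = toy_params(hidden=2, inputs=1)
h, c = [0.0, 0.0], [0.0, 0.0]
for x_t in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Both `h` and `c` are carried to the next time step. The hidden state `h` is what downstream layers usually read.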
Machine Translation
LSTMs can be used as an encoder and a decoder for
machine translation or a question-answering bot.
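The encoder and decoder arrangement can be sketched as two LSTMs that share only a state: the encoder reads the source sequence, and its final hidden and cell states initialise the decoder. Everything below is a simplified assumption for illustration. A real translation model would add learned weights, word embeddings and a softmax over the target vocabulary instead of feeding the raw hidden state back in.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden and cell states."""
    concat = h_prev + x_t
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], concat)]
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], concat)]
    g = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], concat)]
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], concat)]
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def toy_params(hidden, inputs, value):
    """Hypothetical constant weights, only so the sketch runs."""
    W = [[value] * (hidden + inputs) for _ in range(hidden)]
    b = [0.0] * hidden
    return {k: (W if k.startswith("W") else b)
            for k in ("W_f", "b_f", "W_i", "b_i", "W_c", "b_c", "W_o", "b_o")}

def encode(source, p, hidden):
    """Read the source sequence and return the final (hidden, cell) state pair."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x_t in source:
        h, c = lstm_step(x_t, h, c, p)
    return h, c

def decode(state, p, start, steps):
    """Generate `steps` outputs, feeding each output back in as the next input."""
    h, c = state
    x_t, outputs = start, []
    for _ in range(steps):
        h, c = lstm_step(x_t, h, c, p)
        x_t = h            # assumption: the hidden state itself is the output
        outputs.append(h)
    return outputs

hidden = 2
encoder_params = toy_params(hidden, inputs=2, value=0.3)
decoder_params = toy_params(hidden, inputs=2, value=-0.2)

source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = encode(source, encoder_params, hidden)
translation = decode(state, decoder_params, start=[0.0, 0.0], steps=4)
print(len(translation), translation[0])
```

The output length is set by the decoder, not by the source length, which is why this arrangement suits translation and question answering.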
Reading Material
• https://arxiv.org/pdf/1506.00019.pdf
• https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
• http://www.bioinf.jku.at/publications/older/2604.pdf
• https://github.com/oxford-cs-deepnlp-2017/lectures
