Lecture Notes - Recurrent Neural Networks
In this module, you learnt about the different kinds of RNN architectures and their usage on problems that
involve sequences. Then, you learnt about backpropagation and the problem of vanishing and exploding
gradients in RNNs. You also learnt about bidirectional RNNs, which are a variant of the standard RNN.
Finally, you learnt about the LSTM network and its variant, the GRU network, both of which help in tackling
the problem of vanishing gradients faced by a vanilla RNN network.
You learnt that a normal feedforward neural network is insufficient for modelling sequence data. Some
examples of sequence data are:
● Time series
● Music
● Videos
● Text
You learnt that sequential data contains multiple entities. The order in which these entities are present is
important.
You learnt about the architecture of an RNN, which is designed to take into account the multiple entities
present in a sequence. The architecture of an RNN and the feedforward equations are shown below.
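In one standard form (a sketch using the notation followed in the rest of these notes, with σ denoting the
layer's activation function), the feedforward equation of a recurrent layer 'l' at timestep 't' is:

a_t^(l) = σ( W_F^(l) a_t^(l-1) + W_R^(l) a_{t-1}^(l) + b^(l) )

where W_F^(l) is the feedforward weight matrix connecting layer 'l-1' to layer 'l', W_R^(l) is the recurrent
weight matrix within layer 'l', and b^(l) is the bias of layer 'l'.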
You learnt that an RNN consists of recurrent layers. The weights on the recurrent connections of a layer are
denoted by WR. You also learnt that WR is a square matrix because it connects each and every neuron at
timestep 't' in layer 'l' with each and every neuron at timestep 't+1' in the same layer 'l'.
You also learnt that each activation is dependent on two things: the activation in the previous layer ‘l-1’ at
the current timestep ‘t’, and the activation in the same layer ‘l’ at the previous timestep ‘t-1’.
You also learnt about the matrix sizes of the terms involved in the feedforward equations. The following
table shows the matrix sizes:
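Under the convention that activations are stored with one column per data point (a sketch, since
conventions can differ across implementations), the sizes work out as follows:

Term         Size
a_t^(l)      (number of neurons in layer 'l') x batch_size
W_F^(l)      (number of neurons in layer 'l') x (number of neurons in layer 'l-1')
W_R^(l)      (number of neurons in layer 'l') x (number of neurons in layer 'l')
b^(l)        (number of neurons in layer 'l') x 1, broadcast across the batch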
In the above notation, ‘t’ denotes the timestep, ‘l’ denotes the layer in the network and batch_size is the
number of data points passed in one go.
You also learnt that there's a more compact way to write the feedforward equation of an RNN:

a_t^(l) = σ( W^(l) [a_t^(l-1), a_{t-1}^(l)] + b^(l) )

where
W^(l) = [ W_F^(l) | W_R^(l) ] is the column-wise concatenation of the weight matrices at layer 'l', and
[a_t^(l-1), a_{t-1}^(l)] is the row-wise concatenation of the activations a_t^(l-1) and a_{t-1}^(l).
Next, you went through the different types of RNN architectures. You learnt that changing the form of the
input and/or the output leads to a different architecture (a minimal Keras sketch contrasting two of these
follows the list below). The different types of RNN that you learnt about are:
● Many-to-one architecture:
You learnt that this architecture takes a sequence as the input and produces a single entity as the output.
You used this architecture in the C-code generator, which was a character-level text generator.
● Many-to-many architecture:
You learnt that this type of RNN can be used to model data that involves sequences in the input as well as
the output. The important thing to note here is that the input and output sequences must have a
one-to-one correspondence, and therefore the input and output sequences are equal in length. You used
this type of architecture while building a POS tagger, where the input was a sentence and the output was a
part-of-speech tag for each word in the sentence.
● Encoder-decoder architecture:
You learnt that this is also a many-to-many architecture, but one in which the input and output sequences
don't have a one-to-one correspondence. As a result, more often than not, the lengths of the input and
output sequences are not equal. You learnt that this architecture can be deployed in problems such as
language translation and document summarization. You also learnt that the errors are backpropagated
from the decoder to the encoder; the encoder and decoder are different RNNs altogether, each with its own
set of weights. The loss is calculated at each timestep and can either be backpropagated at each timestep,
or the cumulative loss (the sum of the losses from all the timesteps of a sequence) can be backpropagated
after the entire sequence has been ingested. In practice, the errors are generally backpropagated once the
RNN has ingested an entire batch.
● One-to-many architecture:
You learnt that this type of architecture has a single entity as the input and a sequence as the output. You
can use this architecture for generation tasks such as music generation, creating drawings and text
generation.
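As a concrete illustration of the first two architectures, here is a minimal Keras sketch (the vocabulary size,
sequence length and layer widths are hypothetical, and this is not the course's C-code generator or POS
tagger). In Keras, return_sequences=False keeps only the last timestep's activation (many-to-one), while
return_sequences=True emits an activation at every timestep (many-to-many with equal lengths). An
encoder-decoder model would instead wire two separate recurrent networks together with the functional API.

from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, TimeDistributed

vocab_size, seq_len, num_tags = 10000, 100, 12    # hypothetical sizes for illustration only

# Many-to-one: read the whole sequence, predict a single entity
many_to_one = Sequential([
    Input(shape=(seq_len,)),
    Embedding(vocab_size, 64),
    SimpleRNN(128),                               # only the final activation is returned
    Dense(1, activation='sigmoid')
])

# Many-to-many (equal lengths): one output per input, e.g. one POS tag per word
many_to_many = Sequential([
    Input(shape=(seq_len,)),
    Embedding(vocab_size, 64),
    SimpleRNN(128, return_sequences=True),        # one activation per timestep
    TimeDistributed(Dense(num_tags, activation='softmax'))
])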
After going through the architectures, you learnt about the mechanism by which gradients flow in an RNN.
This mechanism is called backpropagation through time (BPTT). You learnt that any given output in an RNN
depends not only on the current input but also on the inputs from previous timesteps. The gradients,
therefore, not only flow back from the output layer to the input layer, but also flow back in time from the
last timestep to the first timestep, hence the name backpropagation through time.
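As a rough sketch of why gradients can vanish or explode over long sequences, consider a simplified
single-layer recurrence a_t = σ(W_F x_t + W_R a_{t-1} + b), and let z_t denote its pre-activation. By the
chain rule, the gradient flowing from timestep T back to timestep t contains the factor

∂a_T / ∂a_t = ∏_{k=t+1}^{T} ∂a_k / ∂a_{k-1} = ∏_{k=t+1}^{T} diag(σ'(z_k)) W_R

which is a product of (T - t) Jacobians. If the norms of these Jacobians are consistently smaller than 1, the
product shrinks towards zero as the gap grows (vanishing gradients); if they are consistently larger than 1,
it grows without bound (exploding gradients).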
You learnt that when an entire sequence is available offline, you can make use of future entities by 'looking
ahead'. You can feed such offline sequences to an RNN in the regular order as well as in the reverse order
to get better results on whatever task you're doing. Such an RNN is called a bidirectional RNN. You also
learnt that in a bidirectional RNN, the input at each timestep is a concatenation of the entity at that position
in the regular order and the entity at that position in the reverse order. For example, for a sentence of
length 100, the input at the first timestep is a concatenation of the first word x_1 and the last word x_100.
You learnt that a bidirectional RNN has twice as many parameters as a vanilla RNN.
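A minimal Keras sketch of this idea is shown below (hypothetical sizes; the layer widths are illustrative, not
the course's exact model). The Bidirectional wrapper runs one copy of the layer over the sequence in the
regular order and a second copy in the reverse order, then concatenates their outputs, which is why the
wrapped layer ends up with twice the parameters of a single layer.

from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, SimpleRNN, Dense

model = Sequential([
    Input(shape=(100,)),                  # sequences of 100 token ids
    Embedding(10000, 64),
    Bidirectional(SimpleRNN(128)),        # one copy reads forwards, one reads backwards
    Dense(1, activation='sigmoid')
])
model.summary()                           # the Bidirectional layer shows 2x the parameters of SimpleRNN(128)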
To get rid of the vanishing gradients problem, you learnt that researchers came up with another type of
cell that can be used inside an RNN layer, called the LSTM cell.
There are three inputs to an LSTM cell: the cell state from the previous timestep (c_{t-1}), the activation
from the previous timestep (h_{t-1}), and the input at the current timestep (x_t) coming from the previous
layer. There are two outputs: the current cell state (c_t) and the current activation (h_t), which goes in two
directions, into the next timestep and into the next layer, just as normal RNN activations are passed in two
directions.
You also learnt that there are three gates: forget gate, update gate and the output gate. The forget gate is
used to discard information from the previous cell state. The update gate writes new information to the
previous cell state. After discarding and writing new information, you get the new cell state.
You learnt that an LSTM layer is made of multiple LSTM cells and an LSTM network can have multiple
LSTM layers stacked on top of each other in the same way as an RNN with multiple RNN layers. The
feedforward equations of an LSTM network are:
f_t = σ( W_f [h_{t-1}, x_t] + b_f )
i_t = σ( W_i [h_{t-1}, x_t] + b_i )
c'_t = tanh( W_c [h_{t-1}, x_t] + b_c )
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c'_t
o_t = σ( W_o [h_{t-1}, x_t] + b_o )
h_t = o_t ⊙ tanh(c_t)
where ⊙ denotes elementwise multiplication.
You also learnt that each of the four weight matrices involved in the LSTM feedforward equations is a
column-wise concatenation of the feedforward weights (WF) and the recurrent weights (WR) of the layer:
W_f = [ W_{F_f} | W_{R_f} ]
W_i = [ W_{F_i} | W_{R_i} ]
W_c = [ W_{F_c} | W_{R_c} ]
W_o = [ W_{F_o} | W_{R_o} ]
You learnt that, as a result of having four weight matrices and four bias vectors, an LSTM layer has four
times as many parameters as a vanilla RNN layer (with the same number of units and inputs).
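As a quick sanity check with hypothetical sizes: for a layer with 128 units receiving 64-dimensional inputs,
a vanilla RNN layer has 128 × (64 + 128) + 128 = 24,704 parameters, while the corresponding LSTM layer
has 4 × 24,704 = 98,816 parameters, since the forget gate, update gate, candidate cell state and output
gate each have their own weight matrix and bias.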
Finally, you briefly saw an LSTM variant: the gated recurrent unit (GRU). A GRU network consists of GRU
layers, which consist of GRU cells that are similar to LSTM cells. However, a GRU network has fewer
parameters than an LSTM network: a GRU layer has three weight matrices compared to the four in an LSTM
layer, which means that a GRU layer has roughly three times as many parameters as a vanilla RNN layer.
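Continuing the hypothetical example above, the same 128-unit layer implemented as a GRU would have
about 3 × 24,704 = 74,112 parameters (the exact count can differ slightly between implementations; for
example, Keras's default GRU uses an extra set of bias terms).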
Finally, you learnt how to build different types of RNNs in Python using the Keras library. You are
encouraged to go through the RNN code provided to you.
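A minimal sketch of how such models are assembled in Keras is shown below (hypothetical sizes; this is
not the course's provided notebook). Printing the summary lets you compare the per-layer parameter
counts directly, since all three recurrent layers here receive 128-dimensional inputs.

from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, GRU, Dense

model = Sequential([
    Input(shape=(100,)),                       # sequences of 100 token ids
    Embedding(input_dim=10000, output_dim=128),
    SimpleRNN(128, return_sequences=True),     # vanilla recurrent layer
    LSTM(128, return_sequences=True),          # ~4x the parameters of the SimpleRNN layer
    GRU(128),                                  # ~3x the parameters of the SimpleRNN layer
    Dense(1, activation='sigmoid')
])
model.summary()                                # shows the parameter count of each layer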
● You can download this document from the website for self-use only.
● Any copies of this document, in part or full, saved to disc or to any other storage medium may only be used
for subsequent, self-viewing purposes or to print an individual extract or copy for non-commercial personal
use only.
● Any further dissemination, distribution, reproduction, copying of the content of the document herein or the
uploading thereof on other websites or use of content for any other commercial/unauthorized purposes in
any way which could infringe the intellectual property rights of UpGrad or its contributors, is strictly
prohibited.
● No graphics, images or photographs from any accompanying text in this document will be used separately
for unauthorised purposes.
● No material in this document will be modified, adapted or altered in any way.
● No part of this document or UpGrad content may be reproduced or stored in any other web site or included
in any public or private electronic retrieval system or service without UpGrad’s prior written permission.
● Any rights not expressly granted in these terms are reserved.