
CNN RNN LSTM GRU Simple


Architecture of a Simple CNN (Convolutional Neural Network)

Fig: Simple CNN model


Convolution layer
 The convolution layer extracts features from the input. Feature
maps are produced by convolving the input data with a set of
filters. An example of the convolution process is shown in the
figure below and in the sketch that follows.

Fig: Convolution operation of an n×1 matrix with a 3×1 filter
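
Below is a minimal NumPy sketch of this kind of 1D convolution,
assuming a "valid" convolution with stride 1 and no padding; the
input values and filter are illustrative only, not taken from the
figure.

import numpy as np

def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation, as used in CNNs) with stride 1."""
    n, k = len(signal), len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(n - k + 1)])

x = np.array([1, 2, 3, 4, 5, 6])   # n x 1 input
w = np.array([1, 0, -1])           # 3 x 1 filter
print(conv1d(x, w))                # feature map of length n - 2: [-2 -2 -2 -2]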


Pooling layer
 The purpose of the pooling layer is to reduce the size of the
convolved feature maps. By keeping only the dominant
information, it also reduces the computation needed to process
the data. Commonly used pooling techniques are max pooling
and average pooling: max pooling takes the maximum value and
average pooling takes the average value from each selected
portion of the feature map.

Fig: 1D max pool operation
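
Below is a minimal NumPy sketch of 1D max pooling, assuming
non-overlapping windows (pool size equal to stride); the feature-map
values are illustrative only.

import numpy as np

def max_pool1d(x, pool_size=2):
    """Non-overlapping 1D max pooling (pool size equals stride)."""
    trimmed = x[: len(x) // pool_size * pool_size]   # drop any leftover tail
    return trimmed.reshape(-1, pool_size).max(axis=1)

feature_map = np.array([1, 5, 2, 8, 3, 3, 9, 4])
print(max_pool1d(feature_map))   # -> [5 8 3 9]
# Average pooling would use .mean(axis=1) instead of .max(axis=1).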


Fully Connected layer
 A fully connected layer has full connections to all
activations in the previous layer.
 Fully connected layers are ordinary flat feed-forward
neural network layers. After the pooling layers, the pooled
feature maps are stretched into a single column vector. This
vectorized, concatenated data is fed into dense layers, known
as fully connected layers, for classification (see the sketch
below).
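
Below is a minimal NumPy sketch of this flatten-and-classify step,
assuming a randomly initialized dense layer; the feature-map shape,
class count, and weights are illustrative placeholders, not trained
values.

import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((4, 4, 8))        # pooled feature maps: 4 x 4 spatial, 8 channels
flat = pooled.reshape(-1)             # stretched into a single column vector (length 128)

num_classes = 10
W = rng.standard_normal((num_classes, flat.size)) * 0.01   # dense-layer weights (untrained)
b = np.zeros(num_classes)                                  # dense-layer bias

logits = W @ flat + b
probs = np.exp(logits) / np.exp(logits).sum()              # softmax for classification
print(probs.argmax(), probs.round(3))
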
Recurrent Neural Networks
A recurrent neural network (RNN) is a neural network model
proposed in the 1980s for modelling time series.
Recurrent neural networks were created because feed-forward
neural networks have a few limitations:
Cannot handle sequential data
Considers only the current input
Cannot memorize previous inputs
The solution to these issues is the Recurrent Neural Network
(RNN). An RNN can handle sequential data, accepting both the
current input and previously received inputs, and it can
memorize previous inputs thanks to its internal memory.
Structure of RNN
• The structure of the network is similar to that of a feed-forward
neural network, with the distinction that it has a recurrent hidden
state whose activation at each time step depends on that of the
previous time step (a cycle).

Fig: Recurrent Neural Network


Formulas for calculating the current and output states
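
In the standard formulation (the symbols here follow common
convention and mirror the Xt / Ht-1 notation used on the GRU
slides), the current hidden state and the output at time step t are:

Ht = tanh(U · Xt + W · Ht-1 + b)
Yt = softmax(V · Ht + c)

where U, W, and V are the input-to-hidden, hidden-to-hidden, and
hidden-to-output weight matrices, and b and c are bias vectors.
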
Long Short Term Memory
 Long Short-Term Memory (LSTM) is one of the most
widely used recurrent structures in sequence modeling. It
uses gates to control information flow in the recurrent
computations.
 LSTM networks are very good at holding long-term
memories. Whether a memory is retained depends on the
data. Long-term dependencies are preserved in the network
by its gating mechanisms: the network can store or release
memory on the fly through these gates.
Need for LSTMs
 Recurrent Neural Networks suffer from short-term
memory. If a sequence is long enough, they have a hard
time carrying information from earlier time steps to later
ones. So if you are trying to process a paragraph of text to
make predictions, RNNs may leave out important
information from the beginning.
 During backpropagation, recurrent neural networks suffer
from the following problems:
Vanishing Gradients
Exploding Gradients
Vanishing gradients occur when the values of the gradient
become too small, so the model stops learning or takes far too
long to train.
Exploding gradients occur when the gradient values become
excessively large, leading to unstable, disproportionately large
weight updates.
These issues can be addressed by slightly tweaked versions of
RNNs: Long Short-Term Memory networks and Gated
Recurrent Units.
LSTM Architecture
 Here is the internal functioning of the LSTM network.

Fig: LSTM cell


Working of LSTMs
 The main three parts of an LSTM cell are known as gates.
The first part is called Forget gate, the second part is
known as the Input gate and the last one is the Output
gate.
 The first gate is called the forget gate. If you shut it, no old
memory is kept; if you fully open it, all old memory passes
through. It is actually an element-wise multiplication
operation: multiplying the old memory by a vector close to 0
means you want to forget most of the old memory, while a
forget gate equal to 1 lets the old memory pass through
unchanged.
 The second gate is the input gate. Exactly how much new
input should come in is controlled by the second gate. This
gate controls how much the new memory should influence
the old memory.
 Next is the + operator, which performs element-wise
summation. The gated current input and the gated old memory
are merged by this operation to form the new cell state St.
 Finally, we generate the output of this LSTM unit. This
step has an output gate that is controlled by the new
memory, the previous output, and the current input. This
gate controls how much of the new memory is output to the
next LSTM unit (see the sketch after this list).
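
To make the gate descriptions above concrete, here is a minimal
NumPy sketch of a single LSTM step in the standard formulation;
the weight names (Wf, Wi, Wc, Wo), sizes, and random values are
illustrative, not taken from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: forget gate, input gate, memory update, output gate."""
    z = np.concatenate([x_t, h_prev])           # current input + previous output
    f = sigmoid(p["Wf"] @ z + p["bf"])          # forget gate: how much old memory to keep
    i = sigmoid(p["Wi"] @ z + p["bi"])          # input gate: how much new input to let in
    c_tilde = np.tanh(p["Wc"] @ z + p["bc"])    # candidate new memory
    c_t = f * c_prev + i * c_tilde              # element-wise merge of old memory and new input
    o = sigmoid(p["Wo"] @ z + p["bo"])          # output gate: how much memory to pass on
    h_t = o * np.tanh(c_t)                      # output / hidden state for the next unit
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
p = {k: rng.standard_normal((n_hidden, n_in + n_hidden)) * 0.1 for k in ("Wf", "Wi", "Wc", "Wo")}
p.update({k: np.zeros(n_hidden) for k in ("bf", "bi", "bc", "bo")})

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
h, c = lstm_step(rng.standard_normal(n_in), h, c, p)
print(h.round(3))
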
Gated Recurrent Units
GRUs were introduced in 2014 by Cho et al. and can be
considered a relatively new architecture, especially when
compared to the widely adopted LSTM, which was proposed
in 1997 by Sepp Hochreiter and Jürgen Schmidhuber.
A Gated Recurrent Unit (GRU), as its name suggests,
is a variant of the RNN architecture, and uses gating
mechanisms to control and manage the flow of
information between cells in the neural network.
Differences between LSTM and GRU
The key difference between a GRU and an LSTM is
that a GRU has two gates (the reset and update gates),
whereas an LSTM has three gates
(namely the input, output, and forget gates).

Fig: GRU vs LSTM


 LSTMs have two different states passed between the cells:
the cell state and the hidden state, which carry the long-term
and short-term memory, respectively.
 GRUs only have one hidden state transferred between time
steps. This hidden state is able to hold both the long-term
and short-term dependencies at the same time due to the
gating mechanisms and computations that the hidden state
and input data go through.
 LSTMs remember longer sequences than GRUs and
outperform them on tasks that require modelling long-distance
relations in language modelling problems. GRUs train faster
and perform better than LSTMs on smaller amounts of
training data for language modelling problems. GRUs are
simpler, easier to modify, and generally require less code.
The Architecture of Gated Recurrent Unit
 Now let's understand how a GRU works. Here we have a GRU
cell, which is more or less similar to an LSTM cell or an RNN
cell.
 At each time step t, it takes an input Xt and the hidden state
Ht-1 from the previous time step t-1. It then outputs a new
hidden state Ht, which is again passed to the next time step.
 There are primarily two gates in a GRU, as opposed to
three gates in an LSTM cell. The first gate is the Reset gate
and the other one is the Update gate.
Architecture of GRU cell

Fig: GRU cell


Reset Gate (Short-term memory)
The Reset gate is responsible for the short-term memory of the
network, i.e. the hidden state (Ht). The value of rt ranges from
0 to 1 because of the sigmoid function.
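
A standard way to write this gate (a bias term is often added as
well) is:

rt = sigmoid(Ur · Xt + Wr · Ht-1)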

Here Ur and Wr are weight matrices for the reset gate.


Update Gate (Long-term memory)
Similarly, we have an Update gate for long-term memory and the
equation of the gate is shown below.
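
In the standard form (again with the optional bias term omitted):

ut = sigmoid(Uu · Xt + Wu · Ht-1)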

The only difference is in the weight matrices, i.e. Uu and Wu.
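
Putting the two gates together, here is a minimal NumPy sketch of a
single GRU step in the standard formulation; the candidate-state
computation (Uh, Wh) and the random values are illustrative
additions, since the slides only cover the reset and update gates.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU step using the reset and update gates described above."""
    r = sigmoid(p["Ur"] @ x_t + p["Wr"] @ h_prev)              # reset gate (short-term memory)
    u = sigmoid(p["Uu"] @ x_t + p["Wu"] @ h_prev)              # update gate (long-term memory)
    h_tilde = np.tanh(p["Uh"] @ x_t + p["Wh"] @ (r * h_prev))  # candidate hidden state
    return u * h_prev + (1 - u) * h_tilde                      # blend old and candidate states

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
p = {name: rng.standard_normal((n_hidden, n_in)) * 0.1 for name in ("Ur", "Uu", "Uh")}
p.update({name: rng.standard_normal((n_hidden, n_hidden)) * 0.1 for name in ("Wr", "Wu", "Wh")})

h = np.zeros(n_hidden)
h = gru_step(rng.standard_normal(n_in), h, p)
print(h.round(3))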


 
THANK YOU
