The document provides an overview of different neural network architectures including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks. It describes the basic components and functions of a CNN, including convolution, pooling, and fully connected layers. It then explains the need for RNNs and LSTMs to process sequential data and to address issues such as vanishing gradients. Key aspects of RNNs, LSTMs, and Gated Recurrent Units (GRUs), such as their structures, gates, memory cells, and differences, are defined. Diagrams are included to illustrate how information flows through these different neural network types.
Architecture of a Simple CNN (Convolutional Neural Network)
Fig: Simple CNN model
Convolution layer
The convolution layer is used to extract features from the input. Feature maps are produced by multiplying the input data with a set of filters as each filter slides over the input. An example of the convolution process is shown in the figure below.
Fig: Convolution operation of an n×1 matrix with a 3×1 filter
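A minimal NumPy sketch of this 1D convolution (an n×1 input slid against a 3×1 filter) is given below; "valid" padding, a stride of 1, and the array values are assumptions for illustration only.

```python
import numpy as np

def conv1d(x, w):
    """Valid 1D convolution: slide the filter w over the input x,
    multiply element-wise and sum to produce one feature-map value per position."""
    n, k = len(x), len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(n - k + 1)])

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])  # n×1 input (illustrative values)
w = np.array([0.5, 1.0, -0.5])                 # 3×1 filter (illustrative values)
print(conv1d(x, w))                            # feature map of length n - 3 + 1
```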
Pooling layer
The purpose of the pooling layer is to reduce the size of the convolved feature maps. By keeping only the dominant information, it also reduces the computational power needed to process the data. The most commonly used pooling techniques are max pooling and average pooling: max pooling takes the maximum value and average pooling takes the average value of each selected region.
Fig: 1D max pool operation
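A matching sketch of 1D max pooling; the window size and stride of 2 are assumptions made for illustration.

```python
import numpy as np

def max_pool1d(x, size=2, stride=2):
    """Keep only the maximum value of each window, shrinking the feature map."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, stride)])

feature_map = np.array([2.5, -0.5, 1.0, 3.0, 0.0, 1.5])  # illustrative values
print(max_pool1d(feature_map))                            # -> [2.5, 3.0, 1.5]
```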
Fully Connected layer
A fully connected layer has full connections to all activations in the previous layer; fully connected layers are the ordinary flat feed-forward neural network layers. After the pooling layers, the pooled feature maps are flattened (stretched) into a single column vector. These vectorized and concatenated data points are fed into dense layers, known as fully connected layers, for classification.

Recurrent Neural Networks
A recurrent neural network (RNN) is a neural network model proposed in the 1980s for modelling time series. Recurrent neural networks were created because the feed-forward neural network has a few limitations:
• It cannot handle sequential data
• It considers only the current input
• It cannot memorize previous inputs
The solution to these issues is the Recurrent Neural Network (RNN). An RNN can handle sequential data, accepting the current input as well as previously received inputs, and it can memorize previous inputs thanks to its internal memory.

Structure of RNN
• The structure of the network is similar to a feedforward neural network, with the distinction that it allows a recurrent hidden state whose activation at each time step depends on that of the previous time step (a cycle).
Fig: Recurrent Neural Network
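As a concrete illustration of the recurrence in the figure, here is a minimal NumPy sketch of a single RNN time step; the weight names U, W, V and the tanh/softmax activations are common conventions assumed here for illustration, not taken from the original slides.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, h_prev, U, W, V):
    """One recurrent step: the new hidden state depends on the current input
    and on the hidden state from the previous time step."""
    h_t = np.tanh(U @ x_t + W @ h_prev)  # current (hidden) state
    y_t = softmax(V @ h_t)               # output state
    return h_t, y_t
```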
Formulas for calculating the current and output states
In the notation used in the sketch above, the current (hidden) state and the output at time step t are
ht = tanh(U xt + W ht-1)
yt = softmax(V ht)

Long Short-Term Memory
Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modelling. It uses gates to control the flow of information through the recurrent computations. LSTM networks are very good at holding long-term memories; depending on the data, the memory may or may not be retained by the network. Long-term dependencies are preserved by the gating mechanism, through which the network can store or release memory on the go.

Need for LSTMs
Recurrent neural networks suffer from short-term memory: if a sequence is long enough, they have a hard time carrying information from earlier time steps to later ones. So when processing a paragraph of text to make predictions, an RNN may leave out important information from the beginning. During backpropagation, recurrent neural networks suffer from two problems:
• Vanishing gradients
• Exploding gradients
Vanishing gradients occur when the gradient values become too small, so the model stops learning or takes far too long to train. Exploding gradients occur when the algorithm assigns an unreasonably large importance to the weights, without much reason. These issues can be addressed by slightly modified versions of the RNN: the Long Short-Term Memory network and the Gated Recurrent Unit.

LSTM Architecture
The internal functioning of the LSTM network is shown below.
Fig: LSTM cell
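To make the gate operations concrete before the walkthrough below, here is a minimal NumPy sketch of one LSTM cell step. It follows the standard LSTM formulation; the weight names, the use of a parameter dictionary, and the concatenation of the previous output with the current input are illustrative assumptions, and c_t corresponds to the cell state St mentioned in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. p is a dict of weight matrices and biases (illustrative names)."""
    z = np.concatenate([h_prev, x_t])        # previous output joined with current input
    f_t = sigmoid(p["Wf"] @ z + p["bf"])     # forget gate: how much old memory to keep
    i_t = sigmoid(p["Wi"] @ z + p["bi"])     # input gate: how much new input to let in
    c_hat = np.tanh(p["Wc"] @ z + p["bc"])   # candidate new memory
    c_t = f_t * c_prev + i_t * c_hat         # element-wise forget + add (the "+" operator)
    o_t = sigmoid(p["Wo"] @ z + p["bo"])     # output gate: how much memory to expose
    h_t = o_t * np.tanh(c_t)                 # new output / hidden state
    return h_t, c_t
```

Note that each gate is an element-wise multiplication by values in (0, 1), which is exactly the "shut/open" behaviour described next.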
Working of LSTMs
The three main parts of an LSTM cell are known as gates: the first is called the forget gate, the second is the input gate, and the last one is the output gate.

The first gate is the forget gate. If you shut it, no old memory is kept; if you fully open it, all of the old memory passes through. It is actually an element-wise multiplication: multiplying the old memory by a vector close to 0 means you want to forget most of the old memory, while a forget gate equal to 1 lets the old memory pass through unchanged.

The second gate is the input gate. It controls exactly how much new input should come in, i.e. how much the new memory should influence the old memory.

Next is the + operator, which denotes element-wise summation. The gated current input and the gated old memory are merged by this operation to form the new cell state St.

Finally, the unit needs to generate its output. This step uses the output gate, which is controlled by the new memory, the previous output, and the current input; it controls how much of the new memory is passed on to the next LSTM unit.

Gated Recurrent Units
GRUs were introduced only in 2014 by Cho et al. and can be considered a relatively new architecture, especially compared to the widely adopted LSTM, which was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber. A Gated Recurrent Unit (GRU), as its name suggests, is a variant of the RNN architecture that uses gating mechanisms to control and manage the flow of information between cells in the network.

Differences between LSTM and GRU
The key difference between a GRU and an LSTM is that a GRU has two gates (the reset and update gates) whereas an LSTM has three gates (the input, output, and forget gates).
Fig: GRU vs LSTM
LSTMs have two different states passed between cells: the cell state and the hidden state, which carry the long-term and short-term memory, respectively. GRUs have only one hidden state transferred between time steps; this hidden state can hold both long-term and short-term dependencies at the same time thanks to the gating mechanisms and computations that the hidden state and input data go through.

LSTMs remember longer sequences than GRUs and outperform them in tasks that require modelling long-distance relations in language modelling problems. GRUs train faster and perform better than LSTMs on smaller amounts of training data for language modelling problems. GRUs are also simpler, easier to modify, and require less code in general.

The Architecture of the Gated Recurrent Unit
Now let's understand how a GRU works. Here we have a GRU cell, which is more or less similar to an LSTM cell or an RNN cell.
At each timestep t, the cell takes an input Xt and the hidden state Ht-1 from the previous timestep t-1, and it outputs a new hidden state Ht, which is in turn passed to the next timestep. There are primarily two gates in a GRU, as opposed to three gates in an LSTM cell: the first is the reset gate and the other is the update gate.

Architecture of the GRU cell
Fig: GRU cell
Reset Gate (short-term memory)
The reset gate is responsible for the short-term memory of the network, i.e. the hidden state (Ht). It is computed from the current input and the previous hidden state as
rt = σ(Ur·Xt + Wr·Ht-1)
The value of rt ranges from 0 to 1 because of the sigmoid function σ. Here Ur and Wr are the weight matrices for the reset gate.
Update Gate (long-term memory)
Similarly, we have an update gate for long-term memory:
ut = σ(Uu·Xt + Wu·Ht-1)
The only difference from the reset gate is the weight matrices, i.e. Uu and Wu.
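As with the LSTM above, the two GRU gate computations can be sketched in NumPy. The function below uses the weight names Ur, Wr, Uu, Wu from the text; which matrix multiplies the input versus the previous hidden state, the omission of bias terms, and the shapes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_gates(x_t, h_prev, Ur, Wr, Uu, Wu):
    """Compute the two GRU gates for one timestep.

    x_t    : current input Xt
    h_prev : previous hidden state Ht-1
    Ur, Wr : reset-gate weight matrices (as named in the text)
    Uu, Wu : update-gate weight matrices (as named in the text)
    """
    r_t = sigmoid(Ur @ x_t + Wr @ h_prev)  # reset gate, values in (0, 1)
    u_t = sigmoid(Uu @ x_t + Wu @ h_prev)  # update gate, values in (0, 1)
    return r_t, u_t
```

In the full GRU computation these gates then control how much of Ht-1 is reused and how much of a new candidate state is written into Ht.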