UNIT – IV
INTRODUCTION TO DEEP LEARNING
Final Year BTECH, Subject: Deep Learning (PE4)
Unit IV : Contents
Advanced Deep Learning:
Deep Learning Architectures: LeNet, AlexNet, VGG, ResNet, RNN, LSTM
Sources:
LeNet resources:
https://www.analyticsvidhya.com/blog/2021/03/the-architecture-of-lenet-5/
https://analyticsindiamag.com/complete-tutorial-on-lenet-5-guide-to-begin-with-cnns/
https://www.jeremyjordan.me/convnet-architectures/#lenet5
AlexNet resource:
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
What is transfer learning?
Transfer learning reuses a model trained on one large task (for example, ImageNet classification) as the starting point for a related task, instead of training a new model from scratch.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Pre-trained Model
A pre-trained model is a model that has already been trained by someone else on a large benchmark dataset (such as ImageNet); we can load its learned weights and adapt it to our own problem.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Why use a Pre-trained Model?
◻ Training your model from scratch takes a serious amount of time.
◻ You might not have a large enough dataset for the model to generalize well, or the computational resources to train it.
◻ A pre-trained model is a life-saver: its parameters have already been optimized, so you only need to fine-tune the model by playing with the hyperparameters.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Ways to Fine-tune a Pre-trained Model
1. Feature extraction – We can use a pre-trained model as a feature-extraction mechanism: remove the output layer (the one that gives the probabilities for each of the 1000 ImageNet classes) and use the rest of the network as a fixed feature extractor for the new dataset.
2. Use the architecture of the pre-trained model – We reuse the architecture of the model, but initialize all the weights randomly and train the model again on our own dataset.
3. Train some layers while freezing others – Another way to use a pre-trained model is to train it partially: keep the weights of the initial layers frozen and retrain only the higher layers. We can experiment to find how many layers to freeze and how many to retrain. (A Keras sketch of these strategies follows below.)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
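As a minimal illustration of strategies 1 and 3, the sketch below (assuming TensorFlow/Keras and a hypothetical 10-class target dataset) loads VGG16 without its ImageNet output head, freezes all but the last few layers, and adds a new classifier on top. Freezing the entire base corresponds to pure feature extraction; unfreezing more layers moves toward full retraining.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load the convolutional base pre-trained on ImageNet, dropping the 1000-class output head
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the early layers; leave the last few layers trainable (strategy 3).
# Freezing *all* base layers instead would give pure feature extraction (strategy 1).
for layer in base.layers[:-4]:
    layer.trainable = False

# New classifier head for our own (hypothetical) 10-class dataset
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```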
Pre-trained Model – LeNet-5
(Architecture diagram – see source)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Architecture of the LeNet Model
(Architecture diagrams – see source)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
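For reference, here is a minimal Keras sketch of the classic LeNet-5 layout: a 32×32 grayscale input, two convolution-plus-subsampling stages, then fully connected layers of 120, 84 and 10 units. The activation choices are illustrative approximations of the original design.

```python
from tensorflow.keras import layers, models

# LeNet-5-style network: Conv -> Pool -> Conv -> Pool -> FC -> FC -> Output
model = models.Sequential([
    layers.Conv2D(6, kernel_size=5, activation="tanh",
                  input_shape=(32, 32, 1)),               # C1: 6 feature maps, 28x28
    layers.AveragePooling2D(pool_size=2),                 # S2: subsample to 14x14
    layers.Conv2D(16, kernel_size=5, activation="tanh"),  # C3: 16 feature maps, 10x10
    layers.AveragePooling2D(pool_size=2),                 # S4: subsample to 5x5
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),                 # C5
    layers.Dense(84, activation="tanh"),                  # F6
    layers.Dense(10, activation="softmax"),               # output: 10 classes (digits)
])
model.summary()
```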
Architecture of the AlexNet Model
(Architecture diagrams – see sources)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep-convolutional-neural-networks/
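A hedged Keras sketch of the AlexNet layer sequence (five convolutional layers, three max-pooling layers, two 4096-unit fully connected layers and a 1000-way softmax); details of the original such as local response normalization and the two-GPU split are omitted for brevity.

```python
from tensorflow.keras import layers, models

# AlexNet-style network for 227x227 RGB inputs and 1000 ImageNet classes
model = models.Sequential([
    layers.Conv2D(96, kernel_size=11, strides=4, activation="relu",
                  input_shape=(227, 227, 3)),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(256, kernel_size=5, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(384, kernel_size=3, padding="same", activation="relu"),
    layers.Conv2D(384, kernel_size=3, padding="same", activation="relu"),
    layers.Conv2D(256, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),
])
model.summary()
```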
The CIFAR-10 Dataset
The CIFAR-10 dataset is a publicly available image dataset provided by the Canadian Institute for Advanced Research (CIFAR). It consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. The 10 classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 50,000 training images and 10,000 test images in this dataset.
https://analyticsindiamag.com/hands-on-guide-to-implementing-alexnet-with-keras-for-multi-class-image-classification/
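Loading CIFAR-10 in Keras is a one-liner via keras.datasets; a small sketch (assuming TensorFlow/Keras is installed) that fetches the data and confirms the train/test split described above:

```python
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Download (on first use) and load the CIFAR-10 train/test split
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)   # (50000, 32, 32, 3) - 50,000 training images
print(x_test.shape)    # (10000, 32, 32, 3) - 10,000 test images

# Typical preprocessing: scale pixels to [0, 1] and one-hot encode the 10 labels
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)
```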
State-of-the-art deep learning image classifiers in Keras
Keras ships out-of-the-box with five Convolutional Neural Networks that have been pre-trained on the ImageNet dataset:
1. VGG16
2. VGG19
3. ResNet50
4. Inception V3
5. Xception
(A usage sketch follows below.)
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
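A short sketch of how any of these pre-trained networks can be used directly for classification (VGG16 shown here; "example.jpg" is a hypothetical image path):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

# Load VGG16 with ImageNet weights, including the 1000-class output head
model = VGG16(weights="imagenet")

# Prepare one image ("example.jpg" is a placeholder path)
img = image.load_img("example.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Predict and decode the top-3 ImageNet labels
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```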
Architecture of the VGG Model
(Architecture diagrams – see sources)
https://arxiv.org/abs/1409.1556
https://neurohive.io/en/popular-networks/vgg16/
VGG16 and VGG19
(Network configurations – see Table 1 of "Very Deep Convolutional Networks for Large-Scale Image Recognition", Simonyan and Zisserman, 2014.)
ResNet
The term micro-architecture refers to the set of "building blocks" used to construct the network. A collection of micro-architecture building blocks (along with the standard CONV, POOL, etc. layers) leads to the macro-architecture (i.e., the end network itself). In ResNet, the key building block is the residual block, which adds the block's input back to its output through a shortcut (skip) connection.
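As an illustration of such a building block, here is a minimal Keras sketch of a basic residual block (assuming the input already has the same number of channels, so the identity shortcut can be added directly). Stacking many such blocks, together with the usual CONV/POOL layers, yields the ResNet macro-architecture.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Basic ResNet building block: two 3x3 convolutions plus an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])          # output = F(x) + x
    return layers.Activation("relu")(y)
```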
ResNet
(ResNet architecture diagram)
Recurrent Neural Networks (RNN)
● This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They're the natural neural network architecture to use for such data.
● And they certainly are used! In the last few years there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning… the list goes on.
● Essential to these successes is the use of "LSTMs," a very special kind of recurrent neural network which works, for many tasks, much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It is these LSTMs that the following slides explore.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
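The "chain" can be made concrete with a toy NumPy sketch of the standard RNN recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), applying the same cell at every time step (all sizes here are arbitrary, for illustration only):

```python
import numpy as np

timesteps, input_dim, hidden_dim = 5, 3, 4
rng = np.random.default_rng(0)

xs = rng.normal(size=(timesteps, input_dim))     # one input vector per time step
W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                         # initial hidden state
for x_t in xs:                                   # the chain: same cell applied at each step
    h = np.tanh(W_x @ x_t + W_h @ h + b)
print(h)                                         # final hidden state summarizing the sequence
```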
Long Short Term Memory Networks (LSTM)
● Long Short Term Memory networks – usually just called "LSTMs" – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.
● LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
● All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)
LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way. (The gate equations are given below.)
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
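In the notation of the cited post, the four interacting layers are the forget gate, input gate, candidate cell state and output gate, combined as follows (σ is the logistic sigmoid and ⊙ denotes element-wise multiplication):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(new cell state)} \\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(new hidden state)}
\end{aligned}
```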
Long Short Term Memory Networks (LSTM)
Applications of LSTM:
• Speech Recognition (input is audio, output is text) - as done by Google Assistant, Microsoft Cortana, Apple Siri
• Machine Translation (input is text, output is also text) - as done by Google Translate
• Image Captioning (input is an image, output is text)
• Sentiment Analysis (input is text, output is a rating)
• Music Generation/Synthesis (input is music notes, output is music)
• Video Activity Recognition (input is video, output is the type of activity)
• Time Series Prediction (forecasting)
(A sentiment-analysis sketch follows below.)
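As a minimal sketch of one application from the list (sentiment analysis: text in, rating out), assuming reviews have already been tokenized into integer word indices with a hypothetical vocabulary of 10,000 words:

```python
from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size (hypothetical)

model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=64),  # word index -> dense vector
    layers.LSTM(64),                       # reads the word sequence, keeping long-term context
    layers.Dense(1, activation="sigmoid")  # probability that the review is positive
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(padded_sequences, labels, epochs=5)  # training data assumed to be prepared
```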
CNN vs RNN
(Comparison – see source)
https://searchenterpriseai.techtarget.com/feature/CNN-vs-RNN-How-they-differ-and-where-they-overlap
RNN and LSTM
(Summary figure – see sources above)

Generative Adversarial Networks (GAN)
• Generative Adversarial Networks (GANs) are a powerful class of neural networks that are used for unsupervised learning.
• GANs were first introduced in 2014 by Ian Goodfellow et al., and since then the topic has opened up a new area of research.
• A GAN is an approach to generative modeling using deep learning methods, such as convolutional neural networks.
• Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.
• GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model, which we train to generate new examples, and the discriminator model, which tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator is fooled about half the time, meaning the generator is producing plausible examples. (A minimal training-loop sketch follows below.)
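A minimal sketch of this two-model, adversarial training scheme in Keras; the image size, layer widths and optimizers are illustrative assumptions (e.g. 28×28 grayscale images), not a definitive implementation.

```python
import numpy as np
from tensorflow.keras import layers, models

latent_dim = 100  # size of the random noise vector fed to the generator

# Generator: noise -> fake 28x28 image
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_dim=latent_dim),
    layers.Dense(28 * 28, activation="tanh"),
    layers.Reshape((28, 28)),
])

# Discriminator: image -> probability that it is real
discriminator = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze the discriminator so only the generator learns to fool it
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_images, batch_size=32):
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    # 1) Train the discriminator: real images labelled 1, generated images labelled 0
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2) Train the generator (through the frozen discriminator) to be classified as real
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```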
GAN
• GANs are an exciting and rapidly changing field, delivering on the promise of generative models in their ability to generate realistic examples across a range of problem domains, most notably in image-to-image translation tasks such as translating photos of summer to winter or day to night, and in generating photorealistic photos of objects, scenes, and people that even humans cannot tell are fake.
• With the invention of GANs, generative models started showing promising results in generating realistic images. GANs have shown tremendous success in computer vision, and in recent times have started showing promising results in audio and text as well.
• Some of the most popular GAN formulations are:
• Transforming an image from one domain to another (CycleGAN)
• Generating an image from a textual description (text-to-image)
• Generating very high-resolution images (ProgressiveGAN), and many more.
GAN Types
Basic
• Generative Adversarial Network (GAN)
• Deep Convolutional Generative Adversarial Network (DCGAN)
Extensions
• Conditional Generative Adversarial Network (cGAN)
• Information Maximizing Generative Adversarial Network (InfoGAN)
• Auxiliary Classifier Generative Adversarial Network (AC-GAN)
• Stacked Generative Adversarial Network (StackGAN)
• Context Encoders
• Pix2Pix
Advanced
• Wasserstein Generative Adversarial Network (WGAN)
• Cycle-Consistent Generative Adversarial Network (CycleGAN)
• Progressive Growing Generative Adversarial Network (Progressive GAN)
• Style-Based Generative Adversarial Network (StyleGAN)
• Big Generative Adversarial Network (BigGAN)