UNIT – IV
INTRODUCTION TO DEEP LEARNING
Final Year BTECH, Subject: Deep Learning (PE4)
Unit IV : Contents
Advanced Deep Learning:
Deep Learning Architectures: LeNet, AlexNet, VGG, ResNet, RNN, LSTM
Sources:
LeNet resources:
https://www.analyticsvidhya.com/blog/2021/03/the-architecture-of-lenet-5/
https://analyticsindiamag.com/complete-tutorial-on-lenet-5-guide-to-begin-with-cnns/
https://www.jeremyjordan.me/convnet-architectures/#lenet5
AlexNet resource:
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
What is transfer learning?
Transfer learning reuses a model trained on one large task (for example, ImageNet classification) as the starting point for a related task, instead of training a new model from scratch.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Pre-trained Model
A pre-trained model is a model that has already been trained by someone else on a large benchmark dataset (such as ImageNet); we can load its learned weights and adapt it to our own problem.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Why use a Pre-trained Model?
◻ Training your model from scratch takes a serious amount of time.
◻ You might not have a large enough dataset for the model to generalize well, or the computational resources to train it.
◻ A pre-trained model is a life-saver: its parameters have already been optimized, so you only need to fine-tune the model by playing with the hyperparameters.
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Ways to Fine-tune a Pre-trained Model
1. Feature extraction – We can use a pre-trained model as a feature-extraction mechanism: remove the output layer (the one that gives the probabilities for each of the 1000 ImageNet classes) and use the rest of the network as a fixed feature extractor for the new dataset.
2. Use the architecture of the pre-trained model – We reuse the architecture of the model, but initialize all the weights randomly and train the model again on our own dataset.
3. Train some layers while freezing others – Another way to use a pre-trained model is to train it partially: keep the weights of the initial layers frozen and retrain only the higher layers. We can experiment to find how many layers to freeze and how many to retrain. (A Keras sketch of these strategies follows below.)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
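As a minimal illustration of strategies 1 and 3, the sketch below (assuming TensorFlow/Keras and a hypothetical 10-class target dataset) loads VGG16 without its ImageNet output head, freezes all but the last few layers, and adds a new classifier on top. Freezing the entire base corresponds to pure feature extraction; unfreezing more layers moves toward full retraining.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load the convolutional base pre-trained on ImageNet, dropping the 1000-class output head
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the early layers; leave the last few layers trainable (strategy 3).
# Freezing *all* base layers instead would give pure feature extraction (strategy 1).
for layer in base.layers[:-4]:
    layer.trainable = False

# New classifier head for our own (hypothetical) 10-class dataset
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```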
Pre-trained Model – LeNet-5
(Architecture diagram – see source)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
Architecture of the LeNet Model
(Architecture diagrams – see source)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
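For reference, here is a minimal Keras sketch of the classic LeNet-5 layout: a 32×32 grayscale input, two convolution-plus-subsampling stages, then fully connected layers of 120, 84 and 10 units. The activation choices are illustrative approximations of the original design.

```python
from tensorflow.keras import layers, models

# LeNet-5-style network: Conv -> Pool -> Conv -> Pool -> FC -> FC -> Output
model = models.Sequential([
    layers.Conv2D(6, kernel_size=5, activation="tanh",
                  input_shape=(32, 32, 1)),               # C1: 6 feature maps, 28x28
    layers.AveragePooling2D(pool_size=2),                 # S2: subsample to 14x14
    layers.Conv2D(16, kernel_size=5, activation="tanh"),  # C3: 16 feature maps, 10x10
    layers.AveragePooling2D(pool_size=2),                 # S4: subsample to 5x5
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),                 # C5
    layers.Dense(84, activation="tanh"),                  # F6
    layers.Dense(10, activation="softmax"),               # output: 10 classes (digits)
])
model.summary()
```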
Architecture of the AlexNet Model
(Architecture diagrams – see sources)
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-the-architecture-of-alexnet/
https://neurohive.io/en/popular-networks/alexnet-imagenet-classification-with-deep-convolutional-neural-networks/
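A hedged Keras sketch of the AlexNet layer sequence (five convolutional layers, three max-pooling layers, two 4096-unit fully connected layers and a 1000-way softmax); details of the original such as local response normalization and the two-GPU split are omitted for brevity.

```python
from tensorflow.keras import layers, models

# AlexNet-style network for 227x227 RGB inputs and 1000 ImageNet classes
model = models.Sequential([
    layers.Conv2D(96, kernel_size=11, strides=4, activation="relu",
                  input_shape=(227, 227, 3)),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(256, kernel_size=5, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(384, kernel_size=3, padding="same", activation="relu"),
    layers.Conv2D(384, kernel_size=3, padding="same", activation="relu"),
    layers.Conv2D(256, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),
])
model.summary()
```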
The CIFAR-10 Dataset
The CIFAR-10 dataset is a publicly available image dataset provided by the Canadian Institute for Advanced Research (CIFAR). It consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. The 10 classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 50,000 training images and 10,000 test images in this dataset.
https://analyticsindiamag.com/hands-on-guide-to-implementing-alexnet-with-keras-for-multi-class-image-classification/
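Loading CIFAR-10 in Keras is a one-liner via keras.datasets; a small sketch (assuming TensorFlow/Keras is installed) that fetches the data and confirms the train/test split described above:

```python
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Download (on first use) and load the CIFAR-10 train/test split
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)   # (50000, 32, 32, 3) - 50,000 training images
print(x_test.shape)    # (10000, 32, 32, 3) - 10,000 test images

# Typical preprocessing: scale pixels to [0, 1] and one-hot encode the 10 labels
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)
```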
State-of-the-art deep learning image classifiers in Keras
Keras ships out-of-the-box with five Convolutional Neural Networks that have been pre-trained on the ImageNet dataset:
1. VGG16
2. VGG19
3. ResNet50
4. Inception V3
5. Xception
(A usage sketch follows below.)
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
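A short sketch of how any of these pre-trained networks can be used directly for classification (VGG16 shown here; "example.jpg" is a hypothetical image path):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

# Load VGG16 with ImageNet weights, including the 1000-class output head
model = VGG16(weights="imagenet")

# Prepare one image ("example.jpg" is a placeholder path)
img = image.load_img("example.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Predict and decode the top-3 ImageNet labels
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```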
Architecture of the VGG Model
(Architecture diagrams – see sources)
https://arxiv.org/abs/1409.1556
https://neurohive.io/en/popular-networks/vgg16/
VGG16 and VGG19
(Network configurations – see Table 1 of "Very Deep Convolutional Networks for Large-Scale Image Recognition", Simonyan and Zisserman, 2014.)
ResNet
The term micro-architecture refers to the set of "building blocks" used to construct the network. A collection of micro-architecture building blocks (along with the standard CONV, POOL, etc. layers) leads to the macro-architecture (i.e., the end network itself). In ResNet, the key building block is the residual block, which adds the block's input back to its output through a shortcut (skip) connection.
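As an illustration of such a building block, here is a minimal Keras sketch of a basic residual block (assuming the input already has the same number of channels, so the identity shortcut can be added directly). Stacking many such blocks, together with the usual CONV/POOL layers, yields the ResNet macro-architecture.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Basic ResNet building block: two 3x3 convolutions plus an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])          # output = F(x) + x
    return layers.Activation("relu")(y)
```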
ResNet
(ResNet architecture diagram)
Recurrent Neural Networks (RNN)
● This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They're the natural neural network architecture to use for such data.
● And they certainly are used! In the last few years there has been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning… the list goes on.
● Essential to these successes is the use of "LSTMs," a very special kind of recurrent neural network which works, for many tasks, much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It is these LSTMs that the following slides explore.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
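The "chain" can be made concrete with a toy NumPy sketch of the standard RNN recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), applying the same cell at every time step (all sizes here are arbitrary, for illustration only):

```python
import numpy as np

timesteps, input_dim, hidden_dim = 5, 3, 4
rng = np.random.default_rng(0)

xs = rng.normal(size=(timesteps, input_dim))     # one input vector per time step
W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                         # initial hidden state
for x_t in xs:                                   # the chain: same cell applied at each step
    h = np.tanh(W_x @ x_t + W_h @ h + b)
print(h)                                         # final hidden state summarizing the sequence
```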
Long Short Term Memory Networks (LSTM)
● Long Short Term Memory networks – usually just called "LSTMs" – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.
● LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!
● All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory Networks (LSTM)
LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way. (The gate equations are given below.)
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
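In the notation of the cited post, the four interacting layers are the forget gate, input gate, candidate cell state and output gate, combined as follows (σ is the logistic sigmoid and ⊙ denotes element-wise multiplication):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(new cell state)} \\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(new hidden state)}
\end{aligned}
```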
Long Short Term Memory Networks (LSTM)
Applications of LSTM:
• Speech Recognition (input is audio, output is text) - as done by Google Assistant, Microsoft Cortana, Apple Siri
• Machine Translation (input is text, output is also text) - as done by Google Translate
• Image Captioning (input is an image, output is text)
• Sentiment Analysis (input is text, output is a rating)
• Music Generation/Synthesis (input is music notes, output is music)
• Video Activity Recognition (input is video, output is the type of activity)
• Time Series Prediction (forecasting)
(A sentiment-analysis sketch follows below.)
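As a minimal sketch of one application from the list (sentiment analysis: text in, rating out), assuming reviews have already been tokenized into integer word indices with a hypothetical vocabulary of 10,000 words:

```python
from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size (hypothetical)

model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=64),  # word index -> dense vector
    layers.LSTM(64),                       # reads the word sequence, keeping long-term context
    layers.Dense(1, activation="sigmoid")  # probability that the review is positive
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(padded_sequences, labels, epochs=5)  # training data assumed to be prepared
```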
CNN vs RNN
(Comparison – see source)
https://searchenterpriseai.techtarget.com/feature/CNN-vs-RNN-How-they-differ-and-where-they-overlap
RNN and LSTM
(Summary figure – see sources above)

Generative Adversarial Networks (GAN)
• Generative Adversarial Networks (GANs) are a powerful class of neural networks that are used for unsupervised learning.
• GANs were first introduced in 2014 by Ian Goodfellow et al., and since then the topic has opened up a new area of research.
• A GAN is an approach to generative modeling using deep learning methods, such as convolutional neural networks.
• Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.
• GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model, which we train to generate new examples, and the discriminator model, which tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator is fooled about half the time, meaning the generator is producing plausible examples. (A minimal training-loop sketch follows below.)
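A minimal sketch of this two-model, adversarial training scheme in Keras; the image size, layer widths and optimizers are illustrative assumptions (e.g. 28×28 grayscale images), not a definitive implementation.

```python
import numpy as np
from tensorflow.keras import layers, models

latent_dim = 100  # size of the random noise vector fed to the generator

# Generator: noise -> fake 28x28 image
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_dim=latent_dim),
    layers.Dense(28 * 28, activation="tanh"),
    layers.Reshape((28, 28)),
])

# Discriminator: image -> probability that it is real
discriminator = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: freeze the discriminator so only the generator learns to fool it
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_images, batch_size=32):
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    # 1) Train the discriminator: real images labelled 1, generated images labelled 0
    discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    # 2) Train the generator (through the frozen discriminator) to be classified as real
    gan.train_on_batch(noise, np.ones((batch_size, 1)))
```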
GAN
• GANs are an exciting and rapidly changing field, delivering on the promise of generative models in their ability to generate realistic examples across a range of problem domains, most notably in image-to-image translation tasks such as translating photos of summer to winter or day to night, and in generating photorealistic photos of objects, scenes, and people that even humans cannot tell are fake.
• With the invention of GANs, generative models started showing promising results in generating realistic images. GANs have shown tremendous success in computer vision, and in recent times have started showing promising results in audio and text as well.
• Some of the most popular GAN formulations are:
• Transforming an image from one domain to another (CycleGAN)
• Generating an image from a textual description (text-to-image)
• Generating very high-resolution images (ProgressiveGAN), and many more.
GAN Types
Basic
• Generative Adversarial Network (GAN)
• Deep Convolutional Generative Adversarial Network (DCGAN)
Extensions
• Conditional Generative Adversarial Network (cGAN)
• Information Maximizing Generative Adversarial Network (InfoGAN)
• Auxiliary Classifier Generative Adversarial Network (AC-GAN)
• Stacked Generative Adversarial Network (StackGAN)
• Context Encoders
• Pix2Pix
Advanced
• Wasserstein Generative Adversarial Network (WGAN)
• Cycle-Consistent Generative Adversarial Network (CycleGAN)
• Progressive Growing Generative Adversarial Network (Progressive GAN)
• Style-Based Generative Adversarial Network (StyleGAN)
• Big Generative Adversarial Network (BigGAN)