
Deep Learning Own Notes


UNIT-1 DL

Linear SVM

Support Vector Machine (SVM) is one of the most popular supervised learning algorithms, used for both classification and regression problems. However, it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in the
correct category in the future. This best decision boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which two different categories are classified using a decision boundary or hyperplane:
Example: SVM can be understood with the example used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train the model with many images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:


The SVM algorithm can be used for face detection, image classification, text
categorization, etc.

Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes by using a single straight line, such data is termed
linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
cannot be classified by using a straight line, such data is termed non-linear data,
and the classifier used is called a Non-linear SVM classifier. A code sketch
contrasting the two follows this list.
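
As a minimal sketch of the distinction, assuming scikit-learn is available; the concentric-circles dataset below is a synthetic choice made purely for illustration:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: a dataset that no single straight line can separate.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)

# Linear SVM: struggles here because the classes are not linearly separable.
linear_clf = SVC(kernel="linear").fit(X, y)

# Non-linear SVM: an RBF kernel maps the data so a separating boundary exists.
rbf_clf = SVC(kernel="rbf").fit(X, y)

print("linear accuracy:", linear_clf.score(X, y))  # roughly chance level
print("rbf accuracy:", rbf_clf.score(X, y))        # near 1.0
```

The same SVC class handles both cases; only the kernel argument changes.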

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.

The dimension of the hyperplane depends on the number of features in the dataset: if there are 2 features (as shown in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.

We always create the hyperplane that has the maximum margin, i.e., the maximum distance to the nearest data points of either class.

Support Vectors:

The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.

How does SVM work?

Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue) and two features, x1 and x2. We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue. Consider the below image:
Since this is a 2-D space, we can separate the two classes using just a straight line. But there can be multiple lines that separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the points of both classes that are closest to the line. These points are called support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane. A small sketch of this procedure is given below.
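
A minimal sketch of this procedure, assuming scikit-learn; a synthetic two-blob dataset stands in for the green/blue points:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters play the role of the green and blue tags.
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

# A linear-kernel SVM finds the maximum-margin separating line.
clf = SVC(kernel="linear", C=1000).fit(X, y)

# The support vectors are the points closest to the decision boundary.
print("support vectors:\n", clf.support_vectors_)

# For a linear SVM with weight vector w, the margin width is 2 / ||w||.
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))
```

Maximizing the margin is equivalent to minimizing ||w||, which is what the fit above does subject to classifying the training points correctly.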
UNIT – 2

Deep Learning

1. Deep learning is the subfield of artificial intelligence that focuses on creating large neural
network models that are capable of making accurate data-driven decisions.

2. Deep learning is used where the data is complex and the datasets are large.

3. Facebook uses deep learning to analyze text in online conversations. Google and
Microsoft both use deep learning for image search and machine translation.

4. All modern smartphones have deep learning systems running on them. For example, deep
learning is the standard technology for speech recognition, and also for face detection in
digital cameras.

5. In the healthcare sector, deep learning is used to process medical images (X-rays, CT, and
MRI scans) and diagnose health conditions.

6. Deep learning is also at the core of self-driving cars, where it is used for localization and
mapping, motion planning and steering, and environment perception, as well as tracking
driver state.

History of Deep Learning

1. In 300 BC : Aristotle introduced associationism, starting the history of humanity's attempts to understand the brain.

2. In 1873 : Alexander Bain introduced neural groupings as one of the earliest models of a neural network.

3. In 1943 : McCulloch and Pitts introduced the MCP model, which is considered the ancestor of the artificial neural model.

4. In 1949 : Donald Hebb, considered the father of neural networks, introduced the Hebbian learning rule, which lays the foundation of modern neural networks.

5. In 1958 : Frank Rosenblatt introduced the first perceptron, which highly resembles the modern perceptron.

6. In 1974 : Paul Werbos introduced backpropagation.

7. In 1980 : Teuvo Kohonen introduced the self-organizing map.

8. In 1980 : Kunihiko Fukushima introduced the Neocognitron, which inspired the convolutional neural network.

9. In 1982 : John Hopfield introduced the Hopfield network.

10. In 1985 : Hinton and Sejnowski introduced the Boltzmann machine.

11. In 1986 : Paul Smolensky introduced the Harmonium, later known as the restricted Boltzmann machine.

12. In 1986 : Michael I. Jordan defined and introduced the recurrent neural network.

13. In 1990 : Yann LeCun introduced LeNet, showing the possibility of deep neural networks in practice.

14. In 1997 : Schuster and Paliwal introduced the bidirectional recurrent neural network.

15. In 2006 : Geoffrey Hinton introduced deep belief networks, along with the layer-wise pretraining technique, opening the current deep learning era.

16. In 2009 : Salakhutdinov and Hinton introduced deep Boltzmann machines.

17. In 2012 : Geoffrey Hinton introduced Dropout, an efficient way of training neural networks.

Advantages of batch normalization :

1. It reduces internal covariate shift.

2. It reduces the dependence of gradients on the scale of the parameters or their initial values.

3. Regularizes the model and reduces the need for dropout, photometric distortions, local
response normalization and other regularization techniques.

4. It allows use of saturating nonlinearities and higher learning rates.

Disadvantages of batch normalization:

1. It is difficult to estimate the mean and standard deviation of the input during testing.

2. It cannot be used with a batch size of one during training.

3. Computational overhead occurs during training.
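
As a minimal sketch of how batch normalization is used in practice, assuming PyTorch; the layer sizes here are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

# A small fully connected block with batch normalization after the linear layer.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),  # normalizes each feature over the batch dimension
    nn.ReLU(),
    nn.Linear(64, 2),
)

x = torch.randn(32, 20)   # batch of 32 samples; a batch size of 1 would fail in train mode
model.train()
out = model(x)            # uses batch statistics, updates running mean/variance

model.eval()
out = model(x)            # uses the running estimates accumulated during training
print(out.shape)          # torch.Size([32, 2])
```

The train/eval split above is exactly where disadvantages 1 and 2 show up: test-time behavior depends on running estimates of the mean and standard deviation, and training statistics cannot be computed from a single sample.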

SHALLOW VS DEEP NEURAL NETWORKS

A shallow neural network consists of only 1 or 2 hidden layers, while a deep neural
network is built using many hidden layers. Understanding a shallow neural network gives
us insight into what exactly is going on inside a deep neural network. A sketch
contrasting the two is given below.
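
A minimal sketch of the contrast, assuming PyTorch; the layer widths are arbitrary illustrative choices:

```python
import torch.nn as nn

# Shallow network: a single hidden layer.
shallow = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 2),
)

# Deep network: the same interface, but several stacked hidden layers.
deep = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
```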

GAN

Generative Adversarial Networks (GANs) are a powerful class of neural networks that are
used for unsupervised learning.

How do GANs work?


Generative Adversarial Networks (GANs) can be broken down into three parts:
 Generative: To learn a generative model, which describes how data is generated in
terms of a probabilistic model.
 Adversarial: The training of the model is done in an adversarial setting.
 Networks: Deep neural networks are used as the artificial intelligence (AI) algorithms
for training purposes.
In GANs, there is a generator and a discriminator. The Generator generates fake samples
of data (images, audio, etc.) and tries to fool the Discriminator. The Discriminator,
on the other hand, tries to distinguish between the real and fake samples. The Generator and
the Discriminator are both neural networks, and they run in competition with each other
during the training phase. These steps are repeated many times, and with each repetition
the Generator and the Discriminator get better at their respective jobs. The
working can be visualized by the diagram given below:
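As a minimal sketch of the two competing networks, assuming PyTorch; the noise size, layer widths, and data dimension are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

NOISE_DIM, DATA_DIM = 16, 64  # illustrative sizes

# Generator: maps random noise z to a fake data sample G(z).
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 128), nn.ReLU(),
    nn.Linear(128, DATA_DIM), nn.Tanh(),
)

# Discriminator: maps a sample x to D(x), the probability that x is real.
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

z = torch.randn(8, NOISE_DIM)    # batch of noise vectors
fake = generator(z)              # G(z): fake samples
p_real = discriminator(fake)     # D(G(z)): probability each fake is "real"
print(p_real.shape)              # torch.Size([8, 1])
```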

Here, the generative model captures the distribution of the data and is trained in such a manner
that it tries to maximize the probability of the Discriminator making a mistake. The
Discriminator, on the other hand, is based on a model that estimates the probability that the
sample it received came from the training data and not from the Generator.
GANs are formulated as a minimax game, where the Discriminator tries to
maximize its reward V(D, G) and the Generator tries to minimize the Discriminator's
reward, or in other words, minimize V(D, G). This can be described by the
formula below:

min_G max_D V(D, G) = E_{x ~ Pdata(x)}[log D(x)] + E_{z ~ P(z)}[log(1 - D(G(z)))]

where,
G = Generator
D = Discriminator
Pdata(x) = distribution of real data
P(z) = distribution of the generator's input noise
x = sample from Pdata(x)
z = sample from P(z)
D(x) = Discriminator's output: the probability that x is real
G(z) = Generator's output for noise z (a fake sample)
So, basically, training a GAN has two parts:
 Part 1: The Discriminator is trained while the Generator is idle. In this phase, the
Generator is only forward propagated, and no back-propagation is done through it. The
Discriminator is trained on real data for n epochs to see if it can correctly predict
them as real, and it is also trained on the fake data generated by the Generator to see
if it can correctly predict them as fake.
 Part 2: The Generator is trained while the Discriminator is idle. After the Discriminator
has been trained on the Generator's fake data, we take its predictions and use
them to train the Generator, so that it improves on its previous state and tries to
fool the Discriminator.
The above method is repeated for a few epochs, after which the fake data is manually checked
to see whether it seems genuine. If it seems acceptable, training is stopped; otherwise, it is
allowed to continue for a few more epochs. A sketch of this alternating loop is given below.
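
A minimal sketch of the alternating loop, assuming PyTorch and the generator/discriminator defined in the earlier sketch; the optimizers, batch size, and random vectors standing in for real data are all illustrative choices:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.randn(8, DATA_DIM)   # placeholder for a batch of real training data
    z = torch.randn(8, NOISE_DIM)
    fake = generator(z).detach()      # detach: the Generator stays idle in Part 1

    # Part 1: train the Discriminator to score real as 1 and fake as 0.
    loss_d = bce(discriminator(real), torch.ones(8, 1)) + \
             bce(discriminator(fake), torch.zeros(8, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Part 2: train the Generator to make the Discriminator output 1 on fakes
    # (the common non-saturating variant of the minimax objective).
    z = torch.randn(8, NOISE_DIM)
    loss_g = bce(discriminator(generator(z)), torch.ones(8, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```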

Advantages of GAN:
1. Better modeling of the data distribution (images are sharper and clearer).
2. GANs can train any kind of generator network. Other frameworks require generator
networks to have some specific form of functionality, such as the output layer being
Gaussian.
3. There is no need for Markov chains to repeatedly draw samples, no inference is required
during learning, and no complicated variational lower bounds are involved, avoiding the
difficulty of approximating intractable probability computations.

Disadvantages of GAN:
1. Hard to train and unstable: good synchronization is required between the generator and
the discriminator.
2. Mode collapse: the learning process of GANs may miss modes of the data; the generator
begins to degenerate, always producing the same sample points, and learning cannot
continue.
