Deep Learning Own Notes
Linear SVM
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms,
used for both classification and regression problems. Primarily, however, it is used for
classification problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point in the
correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed a Support Vector
Machine. Consider the diagram below, in which two different categories are classified
using a decision boundary or hyperplane:
Example: SVM can be understood with the example we used for the KNN classifier.
Suppose we see a strange cat that also has some features of a dog. If we want a model that
can accurately identify whether it is a cat or a dog, such a model can be created using the
SVM algorithm. We first train our model with lots of images of cats and dogs so that it
can learn the different features of cats and dogs, and then we test it on this strange
creature. The SVM creates a decision boundary between these two classes (cat and dog)
and chooses the extreme cases (support vectors), so it considers the extreme cases of cat
and dog. On the basis of the support vectors, it classifies the new example as a cat.
Consider the diagram below:
The SVM algorithm can be used for face detection, image classification, text
categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes using a single straight line, the data is termed linearly
separable, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
cannot be classified using a straight line, the data is termed non-linear, and the
classifier used is called a Non-linear SVM classifier (see the sketch after this list).
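To make the distinction concrete, here is a minimal sketch, assuming scikit-learn's SVC
(the notes name no library); the half-moons dataset is chosen purely because it is not
linearly separable:

# Contrasting the two SVM types; scikit-learn is an assumed choice.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: a dataset that is NOT linearly separable.
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # Linear SVM: one straight line
rbf_svm = SVC(kernel="rbf").fit(X, y)         # Non-linear SVM: RBF kernel

print("linear accuracy:", linear_svm.score(X, y))  # struggles on curved classes
print("rbf accuracy:", rbf_svm.score(X, y))        # fits the curved boundary

The linear classifier is limited to a straight boundary, while the kernelized one can bend
around the interleaved classes.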
The dimensionality of the hyperplane depends on the number of features in the dataset:
if there are 2 features (as shown in the image), the hyperplane is a straight line, and if
there are 3 features, the hyperplane is a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means the maximum
distance between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position
are termed support vectors. Since these vectors support the hyperplane, they are called
support vectors.
Linear SVM:
The working of the SVM algorithm can be understood with an example. Suppose we
have a dataset with two tags (green and blue) and two features, x1 and x2.
We want a classifier that can classify each pair (x1, x2) of coordinates as either green or
blue. Consider the image below:
Since this is 2-D space, we can easily separate the two classes using a straight line.
But there can be multiple lines that separate these classes. Consider the image below:
Hence, the SVM algorithm helps find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the points of each
class that are closest to the boundary; these points are called support vectors. The distance
between these vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.
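As a concrete illustration, here is a minimal sketch of this linear case, assuming
scikit-learn (the notes name no library); the toy coordinates are invented for
illustration:

# Fitting a linear SVM on two tags with features (x1, x2); blue = 0, green = 1.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C approximates a hard margin

print("support vectors:\n", clf.support_vectors_)  # closest points to the hyperplane
w, b = clf.coef_[0], clf.intercept_[0]             # hyperplane: w . x + b = 0
print("margin width:", 2 / np.linalg.norm(w))      # the distance SVM maximizes

The fitted hyperplane is w . x + b = 0, and the margin width 2/||w|| is exactly the
quantity the optimizer maximizes.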
UNIT – 2
Deep Learning
1. Deep learning is the subfield of artificial intelligence that focuses on creating large neural
network models that are capable of making accurate data-driven decisions.
2. Deep learning is used where the data is complex and the datasets are large.
3. Facebook uses deep learning to analyze text in online conversations. Google and
Microsoft both use deep learning for image search and machine translation.
4. All modern smartphones have deep learning systems running on them. For example, deep
learning is the standard technology for speech recognition, and also for face detection on
digital cameras.
5. In the healthcare sector, deep learning is used to process medical images (X-rays, CT, and
MRI scans) and diagnose health conditions.
6. Deep learning is also at the core of self-driving cars, where it is used for localization and
mapping, motion planning and steering, and environment perception, as well as tracking
driver state.
History of Deep Learning
1. In 1873: Alexander Bain introduced neural groupings as the earliest models of a neural
network.
2. In 1943: McCulloch and Pitts introduced the MCP model, which is considered the
ancestor of the artificial neural model.
3. In 1949: Donald Hebb, considered the father of neural networks, introduced the Hebbian
learning rule, which laid the foundation of modern neural networks.
4. In 1958: Frank Rosenblatt introduced the first perceptron, which closely resembles the
modern perceptron.
5. In 1974: Paul Werbos introduced backpropagation.
6. In 1986: Paul Smolensky introduced Harmonium, which was later known as the restricted
Boltzmann machine.
7. In 1986: Michael I. Jordan defined and introduced the recurrent neural network.
8. In 1990: Yann LeCun introduced LeNet, showing the possibility of deep neural networks
in practice.
9. In 1997: Schuster and Paliwal introduced the bidirectional recurrent neural network.
10. In 2006: Geoffrey Hinton introduced deep belief networks, along with the layer-wise
pretraining technique, opening the current deep learning era.
11. In 2012: Geoffrey Hinton introduced Dropout, an efficient way of training neural
networks.
Batch Normalization
Benefits of batch normalization:
1. It allows much higher learning rates and makes training less sensitive to parameter
initialization.
2. It reduces the dependence of gradients on the scale of the parameters or their initial
values.
3. It regularizes the model and reduces the need for dropout, photometric distortions, local
response normalization, and other regularization techniques.
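As an illustration of where such a layer sits in a model, here is a minimal sketch,
assuming PyTorch (the notes name no framework); the layer sizes are invented for
illustration:

# A small network with batch normalization between a linear layer and its activation.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),  # normalizes each activation over the mini-batch
    nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(32, 784)  # a mini-batch of 32 samples
out = model(x)            # in training mode, BN uses batch statistics
print(out.shape)          # torch.Size([32, 10])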
Shallow Neural Networks
Shallow neural networks, which consist of only one or two hidden layers, give us a basic
idea of deep neural networks. Understanding a shallow neural network gives us insight into
what exactly is going on inside a deep neural network, since a neural network is built from
a stack of such hidden layers.
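A minimal sketch of such a shallow network with a single hidden layer, again assuming
PyTorch; the layer sizes are invented for illustration:

# A shallow network: input -> one hidden layer -> output.
import torch
import torch.nn as nn

shallow_net = nn.Sequential(
    nn.Linear(2, 8),   # input layer to the single hidden layer
    nn.ReLU(),
    nn.Linear(8, 1),   # hidden layer to the output
    nn.Sigmoid(),
)

x = torch.randn(4, 2)        # 4 samples, 2 features each
print(shallow_net(x).shape)  # torch.Size([4, 1])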
GAN
Generative Adversarial Networks (GANs) are a powerful class of neural networks that are
used for unsupervised learning.
Here, the generative model captures the distribution of the data and is trained in such a
manner that it tries to maximize the probability of the Discriminator making a mistake. The
Discriminator, on the other hand, is a model that estimates the probability that the sample
it received came from the training data rather than from the Generator.
GANs are formulated as a minimax game, where the Discriminator tries to maximize its
reward V(D, G) and the Generator tries to minimize it, or in other words, to maximize the
Discriminator's loss. This can be described mathematically by the formula below:

min_G max_D V(D, G) = E_{x ~ Pdata(x)}[ log D(x) ] + E_{z ~ P(z)}[ log(1 - D(G(z))) ]
where,
G = Generator
D = Discriminator
Pdata(x) = distribution of real data
P(z) = distribution of the noise input to the generator
x = sample from Pdata(x)
z = sample from P(z)
D(x) = the Discriminator's output for sample x: the estimated probability that x is real
G(z) = the Generator's output for noise sample z: a generated (fake) sample
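To make these symbols concrete, here is a minimal sketch of the two networks, assuming
PyTorch; the 100-dimensional noise z and 784-dimensional samples x are invented for
illustration:

# The two players of the minimax game above.
import torch.nn as nn

# G(z): maps noise z ~ P(z) to a fake sample.
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

# D(x): maps a sample x to the probability that it came from Pdata(x).
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)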
So, basically, training a GAN has two parts:
Part 1: The Discriminator is trained while the Generator is idle. In this phase the
Generator is only forward-propagated; no back-propagation is done through it. The
Discriminator is trained on real data for n epochs to see whether it can correctly
predict them as real, and it is also trained on the fake data produced by the Generator
to see whether it can correctly predict them as fake.
Part 2: The Generator is trained while the Discriminator is idle. After the Discriminator
has been trained on the Generator's fake data, we take its predictions and use them to
train the Generator, so that it improves on its previous state and tries to fool the
Discriminator.
The above procedure is repeated for a few epochs, and the fake data is then checked
manually to see whether it looks genuine. If it seems acceptable, training is stopped;
otherwise, it is allowed to continue for a few more epochs.
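The two phases can be sketched as the following training loop, reusing the generator and
discriminator from the sketch above; PyTorch, the hyperparameters, and the placeholder
dataloader of flattened real samples are all assumptions:

# Alternating the two training phases described above.
import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for real in dataloader:                       # placeholder: batches of shape (b, 784)
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Part 1: train the Discriminator; the Generator is idle (detached).
    fake = generator(torch.randn(b, 100)).detach()
    loss_d = bce(discriminator(real), ones) + bce(discriminator(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Part 2: train the Generator; the Discriminator only scores its output.
    fake = generator(torch.randn(b, 100))
    loss_g = bce(discriminator(fake), ones)   # try to fool D into predicting "real"
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

Detaching the fake batch in Part 1 is what keeps the Generator idle there: gradients flow
only into the Discriminator, and only opt_g updates the Generator in Part 2.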
Advantages of GANs:
1. Better modeling of the data distribution (images are sharper and clearer).
2. GANs can train any kind of generator network. Other frameworks require generator
networks to have some specific form of functionality, such as the output layer being
Gaussian.
3. There is no need to repeatedly sample from a Markov chain, no inference is required
during learning, and no complicated variational lower bounds are needed, avoiding the
difficulty of approximating intractable probability computations.
Disadvantages of GANs:
1. Hard to train and unstable. Good synchronization is required between the generator and
the discriminator.
2. Mode collapse. The learning process of a GAN may miss modes of the data: the
generator begins to degenerate, always producing the same sample points, and learning
cannot continue.