
MACHINE LEARNING
A SUMMER INTERNSHIP REPORT
Submitted by
Mohit Pal
00214807322
In partial fulfilment of summer internship for the award of the degree

of
BACHELOR OF TECHNOLOGY IN
ECE
Maharaja Agrasen Institute of Technology

ACKNOWLEDGEMENT
It is a matter of great pleasure for me to submit this Summer
Training Report on MACHINE LEARNING, as a part of the curriculum for
the award of the BACHELOR OF TECHNOLOGY (ECE) degree of MAHARAJA
AGRASEN INSTITUTE OF TECHNOLOGY (affiliated to GGSIPU), Delhi.
It gives me immense pleasure to acknowledge the effort of the
entire technical and non-technical staff of 1stop, who gave me their
valuable time and full cooperation in undertaking this Practical
Summer Training Program on their online portal. I am indebted to the
members of the department for their wholehearted cooperation and
for their extended support in the use of available resources.
I would especially like to thank my Project Mentor and Faculty Head,
without whose guidance and support this training would not have
been possible. Their encouragement and experience helped me to
realize the practical aspects of programming. They gave me ample
support and help in accomplishing my project. I am grateful to
them for giving me the opportunity to gain practical experience in
this field. Their knowledge and immense work experience helped me
a lot in making this six-week Practical Summer Training Program a
great learning experience.
MOHIT PAL
00214807322
5E1
ECE
ABSTRACT
Present-day computer applications require the representation of
a huge amount of complex knowledge and data in programs, and thus
require a tremendous amount of work. Our ability to code
computers falls short of the demand for applications. If
computers are endowed with the ability to learn, then our burden of
coding the machine is eased (or at least reduced). This is
particularly true for developing expert systems, where the
"bottleneck" is extracting the expert’s knowledge and feeding it
to computers. Present-day computer programs in general (with
the exception of some Machine Learning programs) cannot correct
their own errors, improve from past mistakes, or learn to perform a
new task by analogy to a previously seen task. In contrast, human
beings are capable of all of the above. Machine Learning aims to produce
smarter computers capable of the same intelligent behavior.
The area of Machine Learning deals with the design of programs that
can learn rules from data, adapt to changes, and improve
performance with experience. In addition to being one of the initial
dreams of Computer Science, Machine Learning has become crucial
as computers are expected to solve increasingly complex problems
and become more integrated into our daily lives. This is a hard
problem, since making a machine learn from its computational tasks
requires work at several levels, and complexities and ambiguities
arise at each of those levels.
In this report we study how machine learning takes place and what its
methods are, discuss the projects implemented during training, and
review the applications and the present and future status of machine
learning.
TABLE OF CONTENTS
Introduction to Machine Learning
Architecture of Machine Learning Model
Classification of Machine Learning
Types of Machine Learning Algorithms
Neural Networks
Reinforcement Learning
Python Machine Learning Packages
Projects
Future of Machine Learning
REFERENCES
Introduction to Machine Learning
Machine learning is the science of getting computers to act without being
explicitly programmed. In the past decade, machine learning has given us
self-driving cars, practical speech recognition, effective web search, and a
vastly improved understanding of the human genome. Machine learning
is so pervasive today that you probably use it dozens of times a day
without knowing it. Many researchers also think it is the best way to
make progress towards human-level AI.
General Definition: The ability of a machine to improve its own
performance through the use of software that employs artificial
intelligence techniques to mimic the ways by which humans seem to
learn, such as repetition and experience.
ML Definition by Tom M. Mitchell: A computer program is said to

learn from experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T, as measured by
P, improves with experience E.
Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) which
is concerned with developing computational theories of learning and building
learning machines. The goal of machine learning, closely coupled with
the goal of AI, is to achieve a thorough understanding about the nature of
learning process (both human learning and other forms of learning), about
the computational aspects of learning behaviors, and to implant the
learning capability in computer systems. Machine learning has been
recognized as central to the success of Artificial Intelligence, and it has
applications in various areas of science, engineering and society.
What is Learning?
Learning is a process that shows up in several forms. Roughly speaking,
a learning process includes one or more of the following:
1) Acquisition of new (symbolic) knowledge.
2) Development of cognitive skills through instruction and practice.
3) Refinement and organization of knowledge into more effective
representations or more useful forms.
4) Discovery of new facts and theories through observation and
experiment.
The general effect of learning in a system is the improvement of the
system’s capability to solve problems. It is hard to imagine a system
capable of learning that cannot improve its problem-solving performance. A
system with learning capability should be able to change itself in
order to perform better in its future problem-solving.
We also note that learning cannot take place in isolation: we typically
learn something (knowledge K) to perform some tasks (T), through some
experience E, and whether we have learned well or not is judged by
some performance criterion P at the task T.
There are various forms of improvement of a system’s problem-solving
ability:
1) To solve a wider range of problems than before and perform
generalization.
2) To solve the same problem more effectively and give better quality
solutions.
3) To solve the same problem more efficiently and faster.
The Goals of Machine Learning.

The goal of ML, in simple words, is to understand the nature of (human
and other forms of) learning, and to build learning capability in
computers. To be more specific, there are three aspects of the goals of
ML.
1) To make computers smarter and more intelligent. The more direct
objective in this aspect is to develop systems (programs) for specific
practical learning tasks in application domains.
2) To develop computational models of the human learning process and
perform computer simulations. The study in this aspect is also called
cognitive modeling.
3) To explore new learning methods and develop general learning
algorithms independent of applications.
Why the goals of ML are important and desirable
The present day computer programs in general (with the exception of
some ML programs) cannot correct their own errors or improve from past
mistakes, or learn to perform a new task by analogy to a previously seen
task. In contrast, human beings are capable of all the above. ML will
produce smarter computers capable of all the above intelligent behavior.
It is clear that central to our intelligence is our ability to learn. Thus a
thorough understanding of the human learning process is crucial to
understanding human intelligence. ML will give us insight into the
underlying principles of human learning, which may lead to the
discovery of more effective education techniques. It will also contribute
to the design of machine learning systems.
Architecture of Machine Learning Model
If we go into the details of the machine learning process, we first identify,
choose and get the data that we want to work with. The data with which we
start is raw and unstructured; it is rarely in the form needed for
actual processing. It could have duplicate data, missing data, or
a lot of extra data that is not needed. The data could be gathered from
various sources, which may also lead to duplicate or
redundant data. This creates the requirement for
pre-processing the data so that the learning process can understand it.
The good thing is that machine learning products usually provide some
data pre-processing modules to process the raw or unstructured data.
So, in order to apply the actual algorithm to the data, we need to turn that
unstructured data into structured, well-shaped data, which requires
passing it through a process of pre-massaging.
Finally, we get a candidate copy of the data which can be processed through
the algorithm to get the actual golden copy.
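The pre-processing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field meanings (age, income) and the helper names are hypothetical, and filling a missing value with the mean is just one of several possible strategies.

```python
# Hypothetical raw records: (age, income); None marks a missing value.
raw_records = [
    (25, 50000.0),
    (32, None),
    (25, 50000.0),  # exact duplicate of the first record
    (41, 72000.0),
]


def remove_duplicates(records):
    """Drop exact duplicate records while keeping the original order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def fill_missing_income(records):
    """Replace a missing income with the mean of the known incomes."""
    known = [income for _, income in records if income is not None]
    mean_income = sum(known) / len(known)
    return [(age, income if income is not None else mean_income)
            for age, income in records]


clean_records = fill_missing_income(remove_duplicates(raw_records))
print(clean_records)
```

The output of such a step is the structured "candidate copy" of the data that is handed to the learning algorithm.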
After the data is pre-processed, we get some good structured data, and this
data is now an input for machine learning. But is this a one-time job? Of
course not: the process has to be iterative, and it is repeated until
the data is ready. In machine learning the major chunk of time is spent
in this process, that is, working on the data to make it structured, clean,
ready and available. Once the data is available, the algorithms can be
applied to it. Machine learning products offer not only pre-processing
tools but also a large number of machine learning
algorithms. The result of applying an algorithm to the data is a model,
but now the question is whether this is the final model we need.
No, it is a candidate model. A candidate model is the first
reasonably appropriate model that we get, but it still needs to be massaged. But
do we get only one candidate model? Of course not: since this is an
iterative process, we do not actually know what the best candidate model
is until we have produced several candidate models through the
iterative process. We do this until we get a model that is good enough to
be deployed. Once the model is deployed, applications start making use of
it, so there is iteration at small levels and at the largest level as well.
We need to repeat the entire process again and again and re-create the
model at regular intervals. The reason is
simple: the scenarios and factors change, and we need to keep
our model up to date and realistic all the time. This could eventually also
mean processing new data or applying new algorithms altogether.
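The idea of producing several candidate models and keeping the best one can be illustrated with a small Python sketch. It is an assumption-laden toy: a 1-D k-nearest-neighbour regressor stands in for "an algorithm", the candidate models differ only in the hypothetical setting k, and the data points are made up.

```python
def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_error(train, validation, k):
    """Mean squared error of the k-NN candidate on held-out data."""
    return sum((knn_predict(train, x, k) - y) ** 2
               for x, y in validation) / len(validation)


# Made-up (x, y) pairs, split into training and validation data.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 5.0)]
validation = [(1.5, 1.5), (3.5, 3.5)]

# Each value of k yields one candidate model; keep the one that does
# best on the validation data.
errors = {k: validation_error(train, validation, k) for k in (1, 2, 4)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

In a real project the same loop would be re-run whenever new data arrives, which is the "re-create the model at regular intervals" step described above.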
Classification of Machine Learning System
There are some variations in how the types of Machine Learning
Systems are defined, but commonly they are divided into categories according to
their purpose. The main categories are the following:
Supervised Machine Learning: Supervised learning is a machine
learning technique for learning a function from training data. The training
data consist of pairs of input objects (typically vectors) and desired
outputs. The output of the function can be a continuous value (called
regression), or can predict a class label of the input object (called
classification).
The task of the supervised learner is to predict the value of the function
for any valid input object after having seen a number of training examples
(i.e. pairs of input and target output). To achieve this, the learner has to
generalize from the presented data to unseen situations in a "reasonable"
way. “Supervised learning is a machine learning technique whereby the
algorithm is first presented with training data which consists of examples
which include both the inputs and the desired outputs; thus enabling it to
learn a function. The learner should then be able to generalize from the
presented data to unseen examples.” (Tom M. Mitchell)
Unsupervised Machine Learning: Unsupervised learning is a type of
machine learning where manual labels of inputs are not used. It is
distinguished from supervised learning approaches which learn how to
perform a task, such as classification or regression, using a set of human
prepared examples. Unsupervised learning means we are only given the X
(Feature Vector) and some (ultimate) feedback function on our
performance. We simply have a training set of vectors without function
values of them. The problem in this case, typically, is to partition the
training set into subsets in some appropriate way. Input data is not labeled
and does not have a known result. A model is prepared by deducing
structures present in the input data. This may be to extract general rules. It
may be through a mathematical process to systematically reduce
redundancy, or it may be to organize data by similarity.
Semi-Supervised Learning: Semi-Supervised learning uses both labeled
and unlabeled data to perform an otherwise supervised learning or
unsupervised learning task. There is a desired prediction problem but the
model must learn the structures to organize the data as well as make
predictions. The goal is to learn a predictor that predicts future test data
better than the predictor learned from the labeled training data alone.
Semi-supervised learning also finds applications in cognitive psychology as a
computational model for human learning. In human categorization and
concept forming, the environment provides unsupervised data (e.g., a
child watching surrounding objects by herself) in addition to labeled data
from a teacher (e.g., Dad points to an object and says “bird!”). There is
evidence that human beings can combine labeled and unlabeled data to
facilitate learning.
Reinforcement Learning: Reinforcement Learning is a type of Machine
Learning, and thereby also a branch of Artificial Intelligence. It allows
machines and software agents to automatically determine the ideal
behavior within a specific context, in order to maximize their performance.
Simple reward feedback is required for the agent to learn its behavior;
this is known as the reinforcement signal. Some applications of
reinforcement learning algorithms are computer-played board games
(Chess, Go), robotic hands, and self-driving cars.
Types of Machine Learning Algorithms
Machine learning comes in many different flavors, depending on the
algorithm and its objectives. You can divide machine learning algorithms
into three main groups based on their purpose:
1) Supervised Learning Algorithms
2) Unsupervised Learning Algorithms
3) Reinforcement Learning Algorithms
Supervised Learning Algorithms: Supervised learning is where you
have input variables (X) and an output variable (Y) and you use an
algorithm to learn the mapping function from the input to the output.
Y = F(X)
The goal is to approximate the mapping function so well that when you
have new input data (X) you can predict the output variable (Y) for
that data.
We know the correct answers: the algorithm iteratively makes predictions
on the training data and is corrected by the teacher. Learning stops when
the algorithm achieves an acceptable level of performance.
Supervised learning problems can be further grouped into regression and
classification problems.
• Classification: A classification problem is when the output variable is a
category, such as “red” or “blue”, or “disease” and “no disease”.
• Regression: A regression problem is when the output variable is a
real (continuous) value, such as “dollars” or “weight”.
Some popular examples of supervised machine learning algorithms are:
• Linear Regression: Linear regression is a linear model, i.e. a model that
assumes a linear relationship between the input variables (x) and the
single output variable (y). More specifically, y can be calculated from
a linear combination of the input variables (x).
When there is a single input variable (x), the method is referred to as
simple linear regression. When there are multiple input variables,
literature from statistics often refers to the method as multiple linear
regression. To define the supervised learning problem more formally,
given a training set, the aim is to learn a function $h : X \to Y$ so that $h(x)$ is a
predictor for the corresponding value of $y$. This function is called a
hypothesis. The next thing we need to decide while designing a learning algorithm
is the representation of the hypothesis function as a function of $x$. Let us
initially assume that the hypothesis function looks like this:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
Here, $\theta_0$ and $\theta_1$ are called parameters.
In linear regression, we have a training set and we want to come up with
values for the parameters so that the straight line we get out of $h_\theta(x)$
somehow fits the data well.
Let's try to choose values for the parameters so that, given the $x$ in the
training set, we make reasonable predictions for the $y$ values. Formally,
we want to solve a minimization problem, that is, we want to minimize
the difference between $h_\theta(x)$ and $y$. To achieve that, we solve the
following problem:
$$\min_{\theta_0, \theta_1} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Here, $m$ is the number of training examples. To make the math a little bit
easier, we put in a factor of $\frac{1}{2}$; it does not change the
parameter values at which the minimum is reached.
By convention, we define a cost function:
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
This cost function is also called the squared error function.
The expression $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$ means that we want to find
the values of $\theta_0$ and $\theta_1$ so that
the cost function is minimized.
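As a minimal sketch, the squared error cost can be computed directly in Python. The sketch assumes a single input variable, a straight-line hypothesis h(x) = theta0 + theta1 * x and the usual 1/(2m) scaling; the function names and the toy data are illustrative and not taken from the training material.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def squared_error_cost(theta0, theta1, xs, ys):
    """Sum of squared prediction errors divided by 2m (m = number of examples)."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2
                for x, y in zip(xs, ys))
    return total / (2 * m)


# Toy training set generated from y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit_cost = squared_error_cost(0.0, 2.0, xs, ys)  # line passes through every point
worse_fit_cost = squared_error_cost(0.0, 1.0, xs, ys)    # line y = x misses every point
print(perfect_fit_cost, worse_fit_cost)
```

A lower cost means the line fits the training set better, which is why the learning problem is phrased as a minimization.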
Gradient Descent
Gradient descent is an algorithm that is used to minimize a function.
Gradient descent is used not only in linear regression; it is a more general
algorithm.
We will now see how the gradient descent algorithm is used to minimize
some arbitrary function $J(\theta_0, \theta_1)$ and, later on, we will apply it to a cost function
to determine its minimum.
We start off with some initial guesses for the values of $\theta_0$ and $\theta_1$ and
then keep changing the values according to the formula:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \quad j = 0, 1$$
Here, $\alpha$ is called the learning rate, and it determines how big a step is
taken when updating the parameters. The learning rate is always a
positive number.
We want to simultaneously update $\theta_0$ and $\theta_1$, that is, calculate the
right-hand side of the above equation for both and then update the values of
the parameters to the newly calculated ones. This process is repeated until
convergence is achieved.
If $\alpha$ is too small, then we will end up taking tiny baby steps, which
means a lot of steps before we get anywhere near the global minimum.
If $\alpha$ is too large, then there is a possibility that we overshoot the minimum
entirely. Gradient descent may then fail to converge, or it can even diverge.
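The update rule can be sketched in Python for the straight-line case. This is a minimal illustration, not a production implementation: it assumes the squared error cost with 1/(2m) scaling, whose partial derivatives are the mean error (for the intercept) and the mean of error times x (for the slope); the learning rate, step count and data are hypothetical choices.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Fit y ≈ theta0 + theta1 * x by gradient descent on the squared error cost."""
    theta0, theta1 = 0.0, 0.0  # initial guesses
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update: both gradients were computed from the old
        # parameter values before either parameter is changed.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Toy training set generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # close to 1 and 2
```

With a much larger learning rate on the same data the iterates would grow instead of settling, which is the divergence case mentioned above.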
• Logistic Regression: Logistic regression is used for a different class of
problems known as classification problems. Here the aim is to predict the
group to which the current object under observation belongs.
Classification is all about partitioning the data into groups based on
certain features. Logistic regression is one of the most popular machine
learning algorithms for binary classification. This is because it is a simple
algorithm that performs very well on a wide range of problems.
$$z = \Theta^T X$$
Θ is the coefficient vector and X is the feature vector.
In Logistic Regression, a sigmoid (also known as logistic) function is
applied over the general known hypothesis function (as in Linear
Regression) to get it into a range of (0,1). The sigmoid function is as follows:
$$g(z) = \frac{1}{1 + e^{-z}}$$
[Figure: plot of the sigmoid function]
The output is transformed into a probability using the logistic function:
$$g(\Theta^T X) = P(y = 1 \mid X; \Theta)$$
and as y can take only the values 0 and 1, the probability of the other value is 1 minus the
hypothesis value.
With the above interpretation we can safely decide the decision boundary
with the following rule: y=1 if $g(\Theta^T X) \ge 0.5$, else y=0.
$g(\Theta^T X) \ge 0.5$ implies $\Theta^T X \ge 0$, and similarly for the less-than
condition.
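The sigmoid and the 0.5 decision rule can be sketched as follows. The coefficient values are hypothetical, and the first feature is assumed to be a constant 1 so that the first coefficient acts as an intercept; this plain form of the sigmoid is fine for moderate inputs but can overflow for very large negative z.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """Return 1 if the estimated P(y = 1 | x) is at least 0.5, else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))  # z = theta^T x
    return 1 if sigmoid(z) >= 0.5 else 0


theta = [-1.0, 2.0]                 # hypothetical coefficients
print(sigmoid(0.0))                 # 0.5, the value on the decision boundary
print(predict(theta, [1.0, 1.0]))   # z = 1, so the prediction is 1
print(predict(theta, [1.0, 0.0]))   # z = -1, so the prediction is 0
```

Because the sigmoid is increasing and equals 0.5 at z = 0, checking the sign of z gives the same prediction as comparing the probability with 0.5.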
Cost function
With the modified hypothesis function, taking a squared error function
won't work, as it is no longer convex in nature and is tedious to minimize. We
take up a new form of cost function, which is as follows:
$$E(g(\Theta, X), y) = \begin{cases} -\log(g(\Theta, X)) & \text{if } y = 1 \\ -\log(1 - g(\Theta, X)) & \text{if } y = 0 \end{cases}$$
This can be written in a simpler form as:
$$E(g(\Theta, X), y) = -y \log(g(\Theta, X)) - (1 - y) \log(1 - g(\Theta, X))$$
and it is quite evident that it is equivalent to the above cost function. For
estimation of parameters, we take the mean of the cost function over all
points in the training data. So,
$$H(\Theta) = \frac{1}{m} \sum_{k=1}^{m} E(g(\Theta, X_k), y_k)$$
where m is the number of training points and $(X_k, y_k)$ is the k-th training point.
For parameter estimation, we use an iterative method called gradient
descent that improves the parameters over each step and minimizes the
cost function $H(\Theta)$ as far as possible.
In gradient descent, you start with random parameter values and then
update their values at each step to reduce the cost function by some
amount, until we (hopefully) reach a minimum or until there is
negligible change over a certain number of consecutive steps. The steps of
gradient descent go as follows:
$$\Theta_i := \Theta_i - p \frac{\partial H(\Theta)}{\partial \Theta_i}$$
for each parameter index i = 1,...,n, where p is the learning rate at which we move along the
slope on the curve to minimize the cost function.
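A minimal Python sketch of this procedure is given below. It relies on a standard result that the text above does not spell out: for the log cost averaged over the training points, the partial derivative with respect to each coefficient is the mean of (g(z) − y) times the corresponding feature. The data, the learning rate p and the number of steps are hypothetical, and the first feature is a constant 1 acting as an intercept.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta, X, y):
    """Mean of -y*log(g) - (1-y)*log(1-g) over all training points."""
    total = 0.0
    for xi, yi in zip(X, y):
        g = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += -yi * math.log(g) - (1 - yi) * math.log(1 - g)
    return total / len(X)


def gradient_step(theta, X, y, p):
    """One gradient descent update of every coefficient with learning rate p."""
    m = len(X)
    grads = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
        for j, v in enumerate(xi):
            grads[j] += err * v / m
    return [t - p * g for t, g in zip(theta, grads)]


# Made-up training data: [1, feature] rows with binary labels.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

theta = [0.0, 0.0]
initial_cost = cost(theta, X, y)
for _ in range(500):
    theta = gradient_step(theta, X, y, p=0.1)
final_cost = cost(theta, X, y)
print(initial_cost, final_cost)
```

Starting from all-zero coefficients every prediction is 0.5, so the initial cost is log 2; the cost after the updates is lower.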
• Naïve Bayes Classifier: The Naive Bayes Classifier technique is based on
Bayes' theorem and is particularly suited when the
dimensionality of the inputs is high. Despite its simplicity, Naive Bayes
can often outperform more sophisticated classification methods.
[Figure: GREEN and RED objects]
To demonstrate the concept of Naïve Bayes Classification, consider the
example displayed in the illustration above. As indicated, the objects can
be classified as either GREEN or RED. Our task is to classify new cases
as they arrive, i.e., decide to which class label they belong, based on the
currently existing objects.
Since there are twice as many GREEN objects as RED, it is reasonable to
believe that a new case (which hasn't been observed yet) is twice as likely
to have membership GREEN rather than RED. In the Bayesian analysis,
this belief is known as the prior probability. Prior probabilities are based
on previous experience, in this case the percentage of GREEN and RED
objects, and are often used to predict outcomes before they actually happen.
Thus, we can write:
$$\text{Prior probability of GREEN} \propto \frac{\text{number of GREEN objects}}{\text{total number of objects}}$$
$$\text{Prior probability of RED} \propto \frac{\text{number of RED objects}}{\text{total number of objects}}$$
Since there is a total of 60 objects, 40 of which are GREEN and 20 RED,
our prior probabilities for class membership are:
$$\text{Prior probability of GREEN} \propto \frac{40}{60}, \qquad \text{Prior probability of RED} \propto \frac{20}{60}$$
Having formulated our prior probability, we are now ready to classify a
new object X (the WHITE circle). Since the objects are well clustered, it is
reasonable to assume that the more GREEN (or RED) objects in the
vicinity of X, the more likely that the new case belongs to that particular
color. To measure this likelihood, we draw a circle around X which
encompasses a number (to be chosen a priori) of points irrespective of
their class labels. Then we calculate the number of points in the circle
belonging to each class label. From this we calculate the likelihood:
$$\text{Likelihood of } X \text{ given GREEN} \propto \frac{\text{number of GREEN in the vicinity of } X}{\text{total number of GREEN cases}}$$
$$\text{Likelihood of } X \text{ given RED} \propto \frac{\text{number of RED in the vicinity of } X}{\text{total number of RED cases}}$$
From the illustration above, it is clear that the likelihood of X given GREEN
is smaller than the likelihood of X given RED, since the circle encompasses
1 GREEN object and 3 RED ones. Thus:
$$\text{Probability of } X \text{ given GREEN} \propto \frac{1}{40}, \qquad \text{Probability of } X \text{ given RED} \propto \frac{3}{20}$$
Although the prior probabilities indicate that X may belong to GREEN
(given that there are twice as many GREEN compared to RED), the
likelihood indicates otherwise: that the class membership of X is RED
(given that there are more RED objects in the vicinity of X than GREEN).
In the Bayesian analysis, the final classification is produced by combining
both sources of information, i.e., the prior and the likelihood, to form a
posterior probability using the so-called Bayes' rule (named after Rev.
Thomas Bayes 1702-1761):
$$\text{Posterior probability of } X \text{ being GREEN} \propto \frac{4}{6} \times \frac{1}{40} = \frac{1}{60}$$
$$\text{Posterior probability of } X \text{ being RED} \propto \frac{2}{6} \times \frac{3}{20} = \frac{1}{20}$$
Finally, we classify X as RED since its class membership achieves the
largest posterior probability.
(The above probabilities are not normalized. However, this does not affect
the classification outcome since their normalizing constants are the same.)
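The arithmetic of this example can be checked with a short Python sketch that uses exact fractions. The counts (40 GREEN, 20 RED, and 1 GREEN and 3 RED inside the circle around X) come from the example above; the variable names are illustrative.

```python
from fractions import Fraction

# Counts taken from the worked example.
total = {"GREEN": 40, "RED": 20}        # objects per class
in_vicinity = {"GREEN": 1, "RED": 3}    # objects per class inside the circle around X
n_objects = sum(total.values())

posterior = {}
for label in total:
    prior = Fraction(total[label], n_objects)                # e.g. 40/60 for GREEN
    likelihood = Fraction(in_vicinity[label], total[label])  # e.g. 1/40 for GREEN
    posterior[label] = prior * likelihood                    # unnormalized posterior

predicted = max(posterior, key=posterior.get)
print(posterior)   # GREEN: 1/60, RED: 1/20
print(predicted)   # RED
```

The two posteriors share the same normalizing constant, so comparing the unnormalized values is enough to pick the class.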
There are multiple variations of the Naive Bayes algorithm depending on
the assumed distribution of the features given the class, $P(x_i \mid y)$,
e.g. the Gaussian Naive Bayes algorithm,
the Multinomial Naive Bayes algorithm and the Bernoulli Naive Bayes algorithm.
• Support Vector Machine: “Support Vector Machine” (SVM) is a
supervised machine learning algorithm which can be used for both
classification and regression challenges. However, it is mostly used in
classification problems. In this algorithm, we plot each data item as a
point in n-dimensional space (where n is the number of features you have)
with the value of each feature being the value of a particular coordinate.
Then, we perform classification by finding the hyperplane that
differentiates the two classes well. The margin is defined as the
distance between the separating hyperplane (decision boundary) and the
training samples that are closest to this hyperplane, which are the
so-called support vectors.
Maximum margin intuition
The rationale behind having decision boundaries with large margins is
that they tend to have a lower generalization error, whereas models with
small margins are more prone to overfitting. To get an intuition for the
margin maximization, let's take a closer look at those positive and
negative hyperplanes that are parallel to the decision boundary, which can
be expressed as follows:
$$w_0 + w^T x_{pos} = 1 \quad (1)$$
$$w_0 + w^T x_{neg} = -1 \quad (2)$$
If we subtract those two linear equations (1) and (2) from each other, we
get:
$$w^T (x_{pos} - x_{neg}) = 2$$
We can normalize this by the length of the vector w, which is defined as
follows:
$$\|w\| = \sqrt{\sum_{j=1}^{n} w_j^2}$$
Dividing both sides by this length gives:
$$\frac{w^T (x_{pos} - x_{neg})}{\|w\|} = \frac{2}{\|w\|}$$
The left side of the preceding equation can then be interpreted as the
distance between the positive and negative hyperplane, which is the
so-called margin that we want to maximize. Now the objective function of
the SVM becomes the maximization of this margin, $\frac{2}{\|w\|}$, under the
constraint that the training samples are classified correctly, which can be
solved by quadratic programming.
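As a small numerical illustration, the margin width 2/||w|| and the side of the decision boundary on which a point falls can be computed in Python. The weight vector, the intercept and the test points are hypothetical values chosen for easy arithmetic; the sketch does not train an SVM, it only evaluates these quantities for a given hyperplane.

```python
import math


def margin_width(w):
    """Distance between the positive and negative hyperplanes: 2 / ||w||."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return 2.0 / norm


def decision_value(w0, w, x):
    """Value of w0 + w^T x; its sign tells on which side of the boundary x lies."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


w0, w = -1.0, [3.0, 4.0]   # hypothetical hyperplane parameters, ||w|| = 5
print(margin_width(w))                      # 0.4
print(decision_value(w0, w, [1.0, 0.0]))    # 2.0, on the positive side
print(decision_value(w0, w, [0.0, 0.0]))    # -1.0, exactly on the negative hyperplane
```

A smaller ||w|| gives a wider margin, which is why maximizing the margin is usually stated as minimizing the length of w.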
• Tree Based Algorithms:
Decision Tree: A decision tree is a type of supervised learning algorithm
(having a pre-defined target variable) that is mostly used in classification
problems. It works for both categorical and continuous input and output
variables. In this technique, we split the population or sample into two or
more homogeneous sets (or sub-populations) based on the most significant
splitter / differentiator among the input variables.
The type of a decision tree is based on the type of target variable we have. It
can be of two types:
1. Categorical Variable Decision Tree: A decision tree which has a
categorical target variable is called a categorical variable
decision tree. Example: a student problem in which
the target variable is “Student will play cricket or not”, i.e.
YES or NO.
2. Continuous Variable Decision Tree: A decision tree which has a
continuous target variable is called a continuous variable
decision tree.
Random forest: Random forest is an improvement built on top of the
decision tree algorithm. The core idea behind Random Forest is to
generate multiple small decision trees from random subsets of the data
(hence the name “Random Forest”). Each of the decision trees gives a
biased classifier (as it only considers a subset of the data). They each
capture different trends in the data. This ensemble of trees is like a team
of experts, each with little knowledge of the overall subject but
thorough in their own area of expertise. In the case of classification, the
majority vote is used to decide the class. In analogy with experts, it
is like asking the same multiple-choice question to each expert and taking
the answer that the largest number of experts vote as correct. In the case of
regression, we can use the average of all trees as our prediction. In addition
to this, we can also weight some more decisive trees higher relative to
others by testing on the validation data.
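The way a forest combines its trees can be sketched in Python. The sketch covers only the aggregation step (majority vote for classification, average for regression); training the individual trees is left out, and the per-tree predictions below are made-up values.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_prediction(tree_predictions):
    """Regression: return the mean of the individual tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


# Hypothetical outputs of five classification trees and three regression trees.
class_votes = ["YES", "NO", "YES", "YES", "NO"]
value_votes = [10.0, 12.0, 14.0]

print(majority_vote(class_votes))        # YES (3 votes against 2)
print(average_prediction(value_votes))   # 12.0
```

Weighting the more decisive trees, as mentioned above, would amount to replacing the plain count or mean with a weighted one.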
Unsupervised Learning Algorithms: Unsupervised learning is where
you only have input data (X) and no corresponding output variables. The
goal of unsupervised learning is to model the underlying structure or
distribution in the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning
above, there are no correct answers and there is no teacher. Algorithms are
left to their own devices to discover and present the interesting structure
in the data.
Unsupervised learning problems can be further grouped into clustering
and association problems.
• Clustering: A clustering problem is where you want to discover the

inherent groupings in the data, such as grouping customers by purchasing
behavior.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as people
that buy X also tend to buy Y.
• Clustering: Clustering is the task of dividing the population or data
points into a number of groups such that data points in the same group
are more similar to each other than to those in other
groups. In simple words, the aim is to segregate groups with similar traits
and assign them into clusters.
Clustering can be divided into two subgroups:
Hard Clustering: In hard clustering, each data point either belongs to a
cluster completely or not. For example, in the customer example above each
customer is put into exactly one group out of 10 groups.
Soft Clustering: In soft clustering, instead of putting each data point into
a separate cluster, a probability or likelihood of that data point being in
each cluster is assigned. For example, in the same scenario each
customer is assigned a probability of being in each of the 10 clusters of the
retail store.
K Means Clustering: K means is an iterative clustering algorithm that
moves towards a local optimum in each iteration. This algorithm works in these
6 steps:
1 Specify the desired number of clusters K: Let us choose k=2 for
these 5 data points in 2-D space.
2 Randomly assign each data point to a cluster: Let’s assign three
points to cluster 1, shown using red color, and two points to cluster 2,
shown using grey color.
3 Compute cluster centroids: The centroid of the data points in the red
cluster is shown using a red cross and that of the grey cluster using a grey cross.
4 Re-assign each point to the closest cluster centroid: Note that
the data point at the bottom is assigned to the red cluster even though it is
closer to the centroid of the grey cluster. Thus, we re-assign that data point to the
grey cluster.
5 Re-compute cluster centroids: Now, we re-compute the centroids for
both the clusters.
6 Repeat steps 4 and 5 until no improvements are possible: Similarly,
we repeat the 4th and 5th steps until the algorithm converges, i.e. until
there is no further switching of data points between the two clusters for
two successive repeats. This marks the termination of the algorithm, unless a
different stopping rule is specified explicitly. The result is a local optimum;
k-means is not guaranteed to reach the global optimum.
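The six steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the five 2-D points are made up, the first k points are pinned to distinct clusters so that no cluster starts empty (a simplification of the purely random assignment in step 2), and squared Euclidean distance is used to find the closest centroid.

```python
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: random initial assignment (first k points pinned, see lead-in).
    labels = list(range(k)) + [rng.randrange(k) for _ in points[k:]]
    centroids = []
    for _ in range(max_iter):
        # Steps 3 and 5: (re-)compute the centroid of every cluster.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An emptied cluster keeps its previous centroid.
            new_centroids.append(centroid(members) if members else centroids[c])
        centroids = new_centroids
        # Step 4: re-assign each point to the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 6: stop when no point switches cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

On this well-separated toy data the three points near the origin end up in one cluster and the two distant points in the other; on harder data the result can depend on the initial assignment.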
Hierarchical Clustering: Hierarchical clustering, as the name suggests, is
an algorithm that builds a hierarchy of clusters. This algorithm starts with
all the data points assigned to a cluster of their own. Then the two nearest
clusters are merged into the same cluster, and this step is repeated. In the end, the algorithm
terminates when there is only a single cluster left.
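The bottom-up procedure can be sketched in Python as follows. Two assumptions go beyond the text: the distance between two clusters is taken to be the distance between their closest members (single linkage), and Euclidean distance is used between points. The 1-D sample points are made up.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cluster_distance(c1, c2):
    """Single linkage (an assumption): distance between the closest members."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def agglomerative(points):
    """Merge the two nearest clusters until one is left; return the merge log."""
    clusters = [[p] for p in points]   # every point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        pairs = [
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        dist, i, j = min(pairs)        # the two nearest clusters
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.5,)]
merges = agglomerative(points)
for left, right, dist in merges:
    print(left, right, dist)
```

The recorded merge distances (1.0, 1.5 and 4.0 here) are the heights at which the corresponding branches would join in a dendrogram.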
The results of hierarchical clustering can be shown using a dendrogram.
The dendrogram can be interpreted as follows:
At the bottom, we start with 25 data points, each assigned to a separate
cluster. The two closest clusters are then merged repeatedly until we have just one cluster
at the top. The height in the dendrogram at which two clusters are merged
represents the distance between the two clusters in the data space.
The number of clusters that best depicts the different groups
can be chosen by observing the dendrogram. The best choice of the number of
clusters is the number of vertical lines in the dendrogram cut by a horizontal
line that can traverse the maximum distance vertically without
intersecting a cluster.
In the above example, the best choice of the number of clusters is 4, as the
red horizontal line in the dendrogram covers the maximum vertical
distance AB.
Two important things that you should know about hierarchical clustering
are:
1 This algorithm has been implemented above using the bottom-up
approach. It is also possible to follow a top-down approach, starting with
all data points assigned to the same cluster and recursively performing
splits till each data point is assigned a separate cluster.
2 The decision to merge two clusters is taken on the basis of the
closeness of these clusters. There are multiple metrics for deciding the
closeness of two clusters :
o Euclidean distance: $\|a-b\|_2 = \sqrt{\sum_i (a_i-b_i)^2}$
o Squared Euclidean distance: $\|a-b\|_2^2 = \sum_i (a_i-b_i)^2$
o Manhattan distance: $\|a-b\|_1 = \sum_i |a_i-b_i|$
o Maximum distance: $\|a-b\|_\infty = \max_i |a_i-b_i|$
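The four metrics and the bottom-up merging procedure can be illustrated with a short sketch. The text does not say how the distance between two clusters with several points is measured, so this sketch assumes single linkage (the distance between the closest pair of points); the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def agglomerative(points, n_clusters, metric=euclidean):
    """Bottom-up clustering: merge the two closest clusters until
    only `n_clusters` remain (single linkage is assumed here)."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(metric(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters
```

Recording the distance `d` at every merge would give the heights needed to draw the dendrogram.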
Neural Networks
An Artificial Neural Network (ANN) is a computational model that is
inspired by the way biological neural networks in the human brain
process information.
Single Neuron (Perceptron):
The basic unit of computation in a neural network is the neuron, often
called a node or unit. It receives input from some other nodes, or from an
external source and computes an output. Each input has an associated
weight (w), which is assigned on the basis of its relative importance to
other inputs. The node applies a function f (defined below) to the
weighted sum of its inputs as shown in Figure 1 below:
The above network takes numerical inputs X1 and X2 and has weights w1
and w2 associated with those inputs. Additionally, there is another input 1
with weight b (called the Bias) associated with it. We will learn more
details about the role of the bias later.
The output Y from the neuron is computed as shown in Figure 1. The
function f is non-linear and is called the Activation Function. The
purpose of the activation function is to introduce non-linearity into the
output of a neuron. This is important because most real-world data is
non-linear and we want neurons to learn these non-linear representations.
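As a small illustration of the computation described above, the output of a single neuron with two inputs can be written as Y = f(w1·X1 + w2·X2 + b). The sketch below assumes the sigmoid as the activation function f; the function names are chosen only for this example.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x1, x2, w1, w2, b, f=sigmoid):
    """Apply activation f to the weighted sum of the inputs plus the bias."""
    return f(w1 * x1 + w2 * x2 + b)
```

With `x1=1.0, x2=2.0, w1=0.5, w2=-0.25, b=0.0` the weighted sum is zero, so the sigmoid neuron outputs 0.5.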
Every activation function (or non-linearity) takes a single number and
performs a certain fixed mathematical operation on it [2]. There are
several activation functions you may encounter in practice:
• Sigmoid: takes a real-valued input and squashes it to the range
between 0 and 1
σ(x) = 1 / (1 + exp(−x))
• tanh: takes a real-valued input and squashes it to the range [-1, 1]
tanh(x) = 2σ(2x) − 1
• ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued
input and thresholds it at zero (replaces negative values with zero)
f(x) = max(0, x)
The below figures show each of the above activation functions.
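The three activation functions listed above translate directly into code. This is a minimal sketch using only the standard library; `tanh` is deliberately written through the sigmoid to mirror the identity tanh(x) = 2σ(2x) − 1 given in the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Uses the identity from the text instead of math.tanh.
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    return max(0.0, x)
```

Comparing `tanh(x)` with `math.tanh(x)` for a few values is a quick way to check the identity numerically.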
FeedForward Neural Network: The feedforward neural network was
the first and simplest type of artificial neural network devised. It contains
multiple neurons (nodes) arranged in layers. Nodes from adjacent layers
have connections or edges between them. All these connections have
weights associated with them. In a feedforward network, the information
moves in only one direction – forward – from the input nodes, through the
hidden nodes (if any) and to the output nodes. There are no cycles or
loops in the network (this property of feedforward networks is different
from Recurrent Neural Networks, in which the connections between the
nodes form a cycle).
A feedforward neural network can consist of three types of nodes:
1. Input Nodes – The Input nodes provide information from the
outside world to the network and are together referred to as the
“Input Layer”. No computation is performed in any of the Input
nodes – they just pass on the information to the hidden nodes.
2. Hidden Nodes – The Hidden nodes have no direct connection with
the outside world (hence the name “hidden”). They perform
computations and transfer information from the input nodes to the
output nodes. A collection of hidden nodes forms a “Hidden
Layer”. While a feedforward network will only have a single input
layer and a single output layer, it can have zero or multiple Hidden
Layers.
3. Output Nodes – The Output nodes are collectively referred to as
the “Output Layer” and are responsible for computations and
transferring information from the network to the outside world.
Two examples of feedforward networks are given below:
Single Layer Perceptron – This is the simplest feedforward neural
network and does not contain any hidden layer.
Multi Layer Perceptron – A Multi Layer Perceptron has one or more
hidden layers. We will only discuss Multi Layer Perceptrons below
since they are more useful than Single Layer Perceptrons for practical
applications today.
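The forward flow of information through the layers can be sketched as follows. The representation of a layer as a `(weights, biases)` pair and the function names are assumptions made for this illustration; a list with a single layer corresponds to a Single Layer Perceptron, and a list with two or more layers to a Multi Layer Perceptron.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """One layer: every node applies the activation to its weighted sum.
    weights[j] holds the incoming weights of node j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers, activation=sigmoid):
    """Pass the inputs through each (weights, biases) layer in order.
    Information only moves forward; there are no cycles."""
    values = inputs
    for weights, biases in layers:
        values = layer_forward(values, weights, biases, activation)
    return values
```

For example, `mlp_forward([1.0, 1.0], [([[1, 2], [3, 4]], [0, 1]), ([[1, 1]], [0.5])])` runs a network with two input nodes, two hidden nodes and one output node.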
Backpropagation Algorithm: The Backpropagation algorithm is a
supervised learning method for multilayer feed-forward networks from
the field of Artificial Neural Networks.
Feed-forward neural networks are inspired by the information processing
of one or more neural cells, called neurons. A neuron accepts input
signals via its dendrites, which pass the electrical signal down to the cell
body. The axon carries the signal out to synapses, which are the
connections of a cell’s axon to other cells’ dendrites.
The principle of the backpropagation approach is to model a given
function by modifying the internal weightings of input signals to produce
an expected output signal. The system is trained using a supervised
learning method, where the error between the system’s output and a known
expected output is presented to the system and used to modify its internal
state.
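The principle described above can be illustrated with a very small network. The sketch below is an assumption-laden example, not the code used in the training: it uses one hidden layer of two sigmoid nodes, a squared-error measure, full-batch gradient descent, and the XOR truth table as hypothetical training data. The learning rate, number of epochs and function names are likewise chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_net(n_in, n_hidden, seed=0):
    """Random initial weights; one hidden layer and one output node."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, output) for one input vector x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"])
    return hidden, out


def total_error(net, data):
    return sum(0.5 * (forward(net, x)[1] - y) ** 2 for x, y in data)


def train_epoch(net, data, lr):
    """One pass of backpropagation: propagate the output error backwards,
    accumulate the gradients over all examples, then update the weights."""
    n_hidden, n_in = len(net["W1"]), len(net["W1"][0])
    g_w1 = [[0.0] * n_in for _ in range(n_hidden)]
    g_b1 = [0.0] * n_hidden
    g_w2 = [0.0] * n_hidden
    g_b2 = 0.0
    for x, y in data:
        hidden, out = forward(net, x)
        delta_out = (out - y) * out * (1.0 - out)   # error signal at the output
        g_b2 += delta_out
        for j, h in enumerate(hidden):
            # error signal passed back to hidden node j
            delta_h = delta_out * net["W2"][j] * h * (1.0 - h)
            g_w2[j] += delta_out * h
            g_b1[j] += delta_h
            for i, xi in enumerate(x):
                g_w1[j][i] += delta_h * xi
    for j in range(n_hidden):
        net["W2"][j] -= lr * g_w2[j]
        net["b1"][j] -= lr * g_b1[j]
        for i in range(n_in):
            net["W1"][j][i] -= lr * g_w1[j][i]
    net["b2"] -= lr * g_b2


# Hypothetical training data: the XOR truth table.
XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
net = make_net(2, 2)
error_before = total_error(net, XOR)
for _ in range(2000):
    train_epoch(net, XOR, lr=0.5)
error_after = total_error(net, XOR)
```

Comparing `error_before` with `error_after` shows how the error between the network output and the expected output changes as the weights are adjusted.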
Reinforcement Learning
In reinforcement learning, the goal is to develop a system (agent) that
improves its performance based on interactions with the environment.
Since the information about the current state of the environment typically
also includes a so-called reward signal, we can think of reinforcement
learning as a field related to supervised learning. However, in
reinforcement learning this feedback is not the correct ground truth label
or value, but a measure of how well the action was measured by a reward
function. Through the interaction with the environment, an agent can then
use reinforcement learning to learn a series of actions that maximizes this
reward via an exploratory trial-and-error approach or deliberative
planning.
Consider an example of a child learning to walk.
Let’s formalize the above example. The “problem statement” of the
example is to walk, where the child is an agent trying to manipulate the
environment (which is the surface on which it walks) by taking
actions (viz. walking), and he/she tries to go from one state (viz. each
step he/she takes) to another. The child gets a reward (let’s say
chocolate) when he/she accomplishes a submodule of the task (viz.
taking a couple of steps) and will not receive any chocolate (a.k.a.
negative reward) when he/she is not able to walk. This is a simplified
description of a reinforcement learning problem.
Markov Decision Process: The mathematical framework for defining a
solution in a reinforcement learning scenario is called a Markov Decision
Process. It can be defined by:
• Set of states, S
• Set of actions, A
• Reward function, R
• Policy, π
• Value, V
We have to take an action (A) to transition from our start state to our end
state (S). In return, we get a reward (R) for each action we take. Our
actions can lead to a positive reward or a negative reward.
The set of actions we take defines our policy (π), and the rewards we get
in return define our value (V). Our task here is to maximize our rewards by
choosing the correct policy. So we have to maximize the expected reward
$E(r_t \mid \pi, s_t)$
for all possible values of S for a time t.
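The components listed above can be made concrete with a toy example. Everything in the sketch below is hypothetical: a three-state chain with deterministic transitions, made-up reward values, and no discounting. It only shows how the states, actions, reward function and policy fit together and how the total reward of a policy is obtained.

```python
# Hypothetical toy problem: "start" -> "middle" -> "end".
STATES = ["start", "middle", "end"]
ACTIONS = ["stay", "move"]

# Deterministic transitions (an assumption of this sketch).
TRANSITIONS = {
    ("start", "stay"): "start",
    ("start", "move"): "middle",
    ("middle", "stay"): "middle",
    ("middle", "move"): "end",
}

# Reward function R(s, a): made-up positive and negative rewards.
REWARDS = {
    ("start", "stay"): -2,
    ("start", "move"): -1,
    ("middle", "stay"): -2,
    ("middle", "move"): 10,
}


def policy_value(policy, start="start", terminal="end", max_steps=20):
    """Total reward collected by following `policy` (a dict mapping each
    state to an action) until the terminal state or `max_steps`."""
    state, total = start, 0
    for _ in range(max_steps):
        if state == terminal:
            break
        action = policy[state]
        total += REWARDS[(state, action)]
        state = TRANSITIONS[(state, action)]
    return total
```

Comparing `policy_value({"start": "move", "middle": "move"})` with the value of a policy that stays in place shows why the choice of policy determines the reward collected.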
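Q-learning: Q-learning is a value-based, off-policy learning algorithm
that learns the value Q(s, a) of taking action a in state s. A variant with
a neural network as the function approximator (the Deep Q-Network) was
used by Google DeepMind to beat humans at several Atari games!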
Let’s see a pseudocode of Q-learning:
1. Initialize the Values table ‘Q(s, a)’.
2. Observe the current state ‘s’.
3. Choose an action ‘a’ for that state based on one of the action
selection policies (e.g. epsilon greedy).
4. Take the action, and observe the reward ‘r’ as well as the new state
‘s′’.
5. Update the Value for the state using the observed reward and the
maximum reward possible for the next state. The standard update is
Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)],
where α is the learning rate and γ is the discount factor.
6. Set the state to the new state, and repeat the process until a terminal
state is reached.
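The six steps of the pseudocode can be sketched as a tabular Q-learning loop. The environment here is a hypothetical five-state corridor invented for this example: the agent starts at state 0, can move left or right, and receives a reward of 1 on reaching the terminal state 4. The learning rate, discount factor and epsilon values are assumptions, as is the random tie-breaking between equally good actions.

```python
import random

N_STATES = 5        # states 0..4; state 4 is terminal (hypothetical corridor)
ACTIONS = [0, 1]    # 0 = move left, 1 = move right


def step(state, action):
    """Environment: return (next_state, reward) for the chosen action."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]           # step 1
    for _ in range(episodes):
        state = 0                                        # step 2
        while state != N_STATES - 1:
            action = choose_action(q, state, epsilon, rng)   # step 3
            next_state, reward = step(state, action)         # step 4
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])  # step 5
            state = next_state                               # step 6
    return q
```

After training, the greedy action in each non-terminal state can be read off the table by taking the action with the larger Q value.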
A simple description of Q-learning can be summarized as follows:
Some major domains where RL has been applied are as follows:
• Game Theory and Multi-Agent Interaction
• Robotics
• Computer Networking
• Vehicular Navigation
• Medicine
• Industrial Logistics
Python Machine Learning Packages
Python is often the choice for developers who need to apply statistical
techniques or data analysis in their work, or for data scientists whose
tasks need to be integrated with web apps or production environments. In
particular, Python really shines in the field of machine learning. Its
combination of machine learning libraries and flexibility makes Python
uniquely well-suited to developing sophisticated models and prediction
engines that plug directly into production systems.
One of Python’s greatest assets is its extensive set of libraries. Libraries
are sets of routines and functions that are written in a given language. A
robust set of libraries can make it easier for developers to perform
complex tasks without rewriting many lines of code.
Basic libraries for Machine Learning:
These are the basic libraries that transform Python from a general purpose
programming language into a powerful and robust tool for data analysis
and visualization. Sometimes called the SciPy Stack, they’re the
foundation that the more specialized tools are built on.
1.) NumPy is the foundational library for scientific computing in
Python, and many of the libraries on this list use NumPy arrays as
their basic inputs and outputs. In short, NumPy introduces objects
for multidimensional arrays and matrices, as well as routines that
allow developers to perform advanced mathematical and statistical
functions on those arrays with as little code as possible.
2.) SciPy builds on NumPy by adding a collection of algorithms and
high-level commands for manipulating and visualizing data. This
package includes functions for computing integrals numerically,
solving differential equations, optimization, and more.
3.) Pandas adds data structures and tools that are designed for practical
data analysis in finance, statistics, social sciences, and engineering.
Pandas works well with incomplete, messy, and unlabeled data
(i.e., the kind of data you’re likely to encounter in the real world),
and provides tools for shaping, merging, reshaping, and slicing
datasets.
4.) IPython (Jupyter Notebook) extends the functionality of Python’s
interactive interpreter with a souped-up interactive shell that adds
introspection, rich media, shell syntax, tab completion, and
command history retrieval. It also acts as an embeddable interpreter
for your programs that can be really useful for debugging. If
you’ve ever used Mathematica or MATLAB, you should feel
comfortable with IPython.
5.) matplotlib is the standard Python library for creating 2D plots and
graphs. It’s pretty low-level, meaning it requires more commands
to generate nice-looking graphs and figures than with some more
advanced libraries. However, the flip side of that is flexibility. With
enough commands, you can make just about any kind of graph you
want with matplotlib.
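As a brief illustration of the array routines mentioned for NumPy above, the sketch below builds a small two-dimensional array and applies a few vectorized operations to it. The numbers are made up, and the example assumes NumPy is installed.

```python
import numpy as np

# A small made-up dataset: 3 samples (rows) with 2 features (columns).
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

col_means = data.mean(axis=0)     # mean of every feature
centered = data - col_means       # broadcasting subtracts the mean row-wise
gram = centered.T @ centered      # 2x2 matrix product of the centered data
```

No explicit loops are needed: the mean, the subtraction and the matrix product each operate on whole arrays at once.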
Libraries for Machine Learning:
Machine learning sits at the intersection of Artificial Intelligence and
statistical analysis. By training computers with sets of real-world data,
we’re able to create algorithms that make more accurate and sophisticated
predictions, whether we’re talking about getting better driving directions
or building computers that can identify landmarks just from looking at
pictures. The following libraries give Python the ability to tackle a
number of machine learning tasks, from performing basic regressions to
training complex neural networks.
1. scikit-learn builds on NumPy and SciPy by adding a set of
algorithms for common machine learning and data mining tasks,
including clustering, regression, and classification. As a library,
scikit-learn has a lot going for it. Its tools are well-documented and
its contributors include many machine learning experts. What’s
more, it’s a very curated library, meaning developers won’t have to
choose between different versions of the same algorithm. Its power
and ease of use make it popular with a lot of data-heavy startups,
including Evernote, OKCupid, Spotify, and Birchbox.
2. Theano uses NumPy-like syntax to optimize and evaluate
mathematical expressions. What sets Theano apart is that it takes
advantage of the computer’s GPU in order to make data-intensive
calculations up to 100x faster than the CPU alone. Theano’s speed
makes it especially valuable for deep learning and other
computationally complex tasks.
3. TensorFlow is another high-profile entrant into machine learning,
developed by Google as an open-source successor to DistBelief,
their previous framework for training neural networks. TensorFlow
uses a system of multi-layered nodes that allow you to quickly set
up, train, and deploy artificial neural networks with large datasets.
It’s what allows Google to identify objects in photos or understand
spoken words in its voice-recognition app.
Projects
During the Summer Training, various Machine Learning projects were
done. A short introduction to one of the important projects is given below.
MNIST Handwritten Digit Recognition:
It is a digit recognition task. As such there are 10 digits (0 to 9) or 10
classes to predict. Results are reported using prediction error, which is
nothing more than one minus the classification accuracy.
Images (MNIST Dataset) of digits were taken from a variety of scanned
documents, normalized in size and centered. This makes it an excellent
dataset for evaluating models, allowing the developer to focus on the
machine learning with very little data cleaning or preparation required.
Each image is a 28 by 28 pixel square (784 pixels total). A standard split
of the dataset is used to evaluate and compare models, where 60,000
images are used to train a model and a separate set of 10,000 images is
used to test it.
During the project, the Python scientific computing (NumPy) and data
visualization (Matplotlib) packages were used to explore the dataset and
to visualize the data, in order to see whether there is a relation between
the features of the dataset. Scikit-Learn (a Python machine learning
library) was used to build the machine learning models.
Project Code:
Best estimator learned through GridSearch:
SVC(C=3, cache_size=200, class_weight=None, coef0=0.0, degree=3,
    gamma=0.001, kernel='rbf', max_iter=-1, probability=False,
    random_state=None, shrinking=True, tol=0.001, verbose=False)
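A grid search of this kind can be sketched with scikit-learn as follows. This is not the project's actual code: to keep the example small it assumes the 8 by 8 digits dataset bundled with scikit-learn (`load_digits`) instead of the 28 by 28 MNIST images, and the parameter grid, train/test split and number of cross-validation folds are all hypothetical choices.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the small bundled digits dataset stands in for MNIST here.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Hypothetical grid of candidate values for the RBF-kernel SVM.
param_grid = {"C": [1, 3, 10], "gamma": [0.0001, 0.001, 0.01]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
search.fit(X_train, y_train)

best_model = search.best_estimator_          # refit on the full training set
accuracy = best_model.score(X_test, y_test)
prediction_error = 1.0 - accuracy            # error as defined in the text
```

Printing `search.best_estimator_` gives a summary of the selected model similar to the one shown above, although the chosen values depend on the data and the grid.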
Other projects implemented during the Summer Training:
1.) Sentiment Analysis of Tweets
2.) Face Recognition
3.) Stock Prediction
4.) Music Genre Classification
5.) Image Classification
Future of Machine Learning
Research in Machine Learning Theory is a combination of attacking
established fundamental questions, and developing new frameworks for
modeling the needs of new machine learning applications. While it is
impossible to know where the next breakthroughs will come, a few topics
one can expect the future to hold include:
• Better understanding how auxiliary information, such as unlabeled
data, hints from a user, or previously-learned tasks, can best be used by a
machine learning algorithm to improve its ability to learn new things.
Traditionally, Machine Learning Theory has focused on problems of
learning a task (say, identifying spam) from labeled examples (email
labeled as spam or not). However, often there is additional information
available. One might have access to large quantities of unlabeled data
(email messages not labeled by their type, or discussion-group transcripts
on the web) that could potentially provide useful information. One might
have other hints from the user besides just labels, e.g. highlighting
relevant portions of the email message. Or, one might have previously
learned similar tasks and want to transfer some of that experience to the
job at hand. These are all issues for which a solid theory is only beginning
to be developed.
• Further developing connections to economic theory. As software
agents based on machine learning are used in competitive settings,
“strategic” issues become increasingly important. Most algorithms and
models to date have focused on the case of a single learning algorithm
operating in an environment that, while it may be changing, does not have
its own motivations and strategies. However, if learning algorithms are to
operate in settings dominated by other adaptive algorithms acting in their
own users’ interests, such as bidding on items or performing various kinds
of negotiations, then we have a true merging of computer science and
economic models. In this combination, many of the fundamental issues
are still wide open.
REFERENCES
Books:
Sebastian Raschka (2015). Python Machine Learning.
Richard S. Sutton, Andrew G. Barto (2015 Draft). Reinforcement Learning.
MIT Press.
Jiawei Han, Micheline Kamber, Jian Pei (2000). Data Mining: Concepts
and Techniques, 3rd Edition.
Links:
https://www.medium.com/
https://www.analyticsvidhya.com
http://www.tutorialspoint.com/numpy
http://www.tutorialpoint.com/pandas
