Machine Learning Report
Submitted by
Mohit Pal
00214807322
5E1, ECE
BACHELOR OF TECHNOLOGY IN ECE
ABSTRACT
Present-day computer applications require the representation of huge
amounts of complex knowledge and data in programs, and thus demand a
tremendous amount of work. Our ability to program computers falls short
of the demand for applications. If computers are endowed with the ability
to learn, the burden of programming them is eased, or at least reduced.
This is particularly true for developing expert systems, where the
bottleneck is extracting the expert's knowledge and feeding that
knowledge to computers. Present-day computer programs in general (with
the exception of some machine learning programs) cannot correct
their own errors, improve from past mistakes, or learn to perform a
new task by analogy to a previously seen task. In contrast, human
beings are capable of all of the above. Machine learning will produce
smarter computers capable of this kind of intelligent behavior.
The area of Machine Learning deals with the design of programs that
can learn rules from data, adapt to changes, and improve
performance with experience. In addition to being one of the initial
dreams of Computer Science, Machine Learning has become crucial
as computers are expected to solve increasingly complex problems
and become more integrated into our daily lives. This is a hard
problem, since making a machine learn from its computational tasks
requires work at several levels, and complexities and ambiguities
arise at each of those levels.
Here we study how machine learning takes place, survey its main
methods, discuss the projects implemented during training and their
applications, and review the present and future status of machine learning.
TABLE OF CONTENTS
Introduction to Machine Learning
Neural Networks
Reinforcement Learning
Projects
References
Introduction to Machine Learning
Machine learning is the science of getting computers to act without being
explicitly programmed. In the past decade, machine learning has given us
self-driving cars, practical speech recognition, effective web search, and a
vastly improved understanding of the human genome. Machine learning
is so pervasive today that you probably use it dozens of times a day
without knowing it. Many researchers also think it is the best way to
make progress towards human-level AI.
What is Learning?
Learning is a phenomenon and a process that manifests itself in various
aspects. Roughly speaking, a learning process includes one or more of
the following:
• the acquisition of new declarative knowledge;
• the development of motor and cognitive skills through instruction or practice;
• the organization of new knowledge into general, effective representations;
• the discovery of new facts and theories through observation and experimentation.
So, in order to apply an actual algorithm to the data, we need to turn the
raw, unstructured data into structured, well-shaped data, and for this a
pre-processing ("pre-massaging") step is required through which the data
is passed. Finally, we get a candidate copy of the data that can be
processed by the algorithm to produce the actual golden copy.
After the data is pre-processed, we get well-structured data, and this
data is now the input for machine learning. But is this a one-time job? Of
course not: the process has to be iterative, and it keeps repeating for as
long as new data arrives. In machine learning, the major chunk of time is
spent on this process, that is, on working on the data to make it
structured, clean, ready and available. Once the data is available,
algorithms can be applied to it. Machine learning products offer not only
pre-processing tools but also a large number of machine learning
algorithms. The result of applying an algorithm to the data is a model,
but now the question is whether this is the final model we need.
No, it is a candidate model. A candidate model is the first reasonably
appropriate model we obtain, but it still needs to be refined. And do we
get only one candidate model? Of course not: since this is an iterative
process, we do not actually know what the best candidate model is until
we have produced several candidate models through repeated iterations.
We iterate until we get a model that is good enough to be deployed. Once
the model is deployed, applications start making use of it, so there is
iteration at the small level and at the largest level as well.
We need to repeat the entire process and re-create the model at regular
intervals. The reason is very simple: scenarios and factors change, and
we need to keep our model up to date and realistic at all times. This can
eventually also mean processing new data or applying new algorithms
altogether.
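To make this workflow concrete, here is a minimal sketch of the iterative model-building loop described above, assuming scikit-learn and a toy dataset; names such as candidate_model and the choice of logistic regression are illustrative, not taken from the original project.

# Iterative model building: pre-process, train candidates, keep the best.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Pre-processing: turn raw data into clean, structured input.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

best_score, best_model = 0.0, None
for C in [0.01, 0.1, 1.0, 10.0]:              # iterate over candidate models
    candidate_model = LogisticRegression(C=C, max_iter=1000)
    candidate_model.fit(X_train_s, y_train)
    score = candidate_model.score(X_val_s, y_val)
    if score > best_score:                    # keep the best candidate so far
        best_score, best_model = score, candidate_model

# best_model is the candidate judged good enough to deploy; in practice
# the whole loop is re-run at regular intervals as new data arrives.
print(best_score)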
Classification of Machine Learning Systems
There are some variations in how to define the types of machine learning
systems, but they can commonly be divided into categories according to
their purpose. The main categories are the following:
Supervised Machine Learning: The task of the supervised learner is to
predict the value of the function for any valid input object after having
seen a number of training examples (i.e. pairs of inputs and target
outputs). To achieve this, the learner has to generalize from the
presented data to unseen situations in a "reasonable" way. "Supervised
learning is a machine learning technique whereby the algorithm is first
presented with training data which consists of examples which include
both the inputs and the desired outputs; thus enabling it to learn a
function. The learner should then be able to generalize from the
presented data to unseen examples." — Tom M. Mitchell
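As a minimal sketch of this idea, the snippet below presents a learner with (input, desired output) pairs and checks how well it generalizes to unseen examples; the digits dataset and the k-nearest-neighbours model are illustrative choices, assuming scikit-learn.

# Supervised learning: train on labeled pairs, generalize to unseen data.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)                 # inputs and target outputs
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

learner = KNeighborsClassifier(n_neighbors=3)
learner.fit(X_train, y_train)                       # presented with training pairs
print("accuracy on unseen examples:", learner.score(X_test, y_test))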
Unsupervised Machine Learning: Unsupervised learning is a type of
machine learning where manual labels of inputs are not used. It is
distinguished from supervised learning approaches, which learn how to
perform a task, such as classification or regression, using a set of
human-prepared examples. Unsupervised learning means we are only given
the X (feature vectors); we simply have a training set of vectors without
function values for them. The problem in this case, typically, is to
partition the training set into subsets in some appropriate way. Input
data is not labeled and does not have a known result. A model is prepared
by deducing structures present in the input data. This may be to extract
general rules, it may be through a mathematical process to systematically
reduce redundancy, or it may be to organize data by similarity.
Semi-Supervised Learning: Semi-supervised learning uses both labeled
and unlabeled data to perform an otherwise supervised or unsupervised
learning task. There is a desired prediction problem, but the model must
learn the structures that organize the data as well as make predictions.
The goal is to learn a predictor that predicts future test data better
than a predictor learned from the labeled training data alone.
Semi-supervised learning also finds applications in cognitive psychology
as a computational model for human learning. In human categorization and
concept formation, the environment provides unsupervised data (e.g., a
child watching surrounding objects by herself) in addition to labeled data
from a teacher (e.g., Dad points to an object and says "bird!"). There is
evidence that human beings can combine labeled and unlabeled data to
facilitate learning.
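One common way to exploit unlabeled data is self-training: fit on the labeled points, pseudo-label the unlabeled points the model is confident about, and repeat. Below is a minimal self-training sketch assuming scikit-learn; the 0.95 confidence threshold, the number of rounds, and the logistic regression base model are all illustrative assumptions.

# Self-training: grow the labeled set with confident pseudo-labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=300, random_state=0)
y = y_true.copy()
labeled = np.zeros(len(y), dtype=bool)
labeled[:30] = True                        # only 30 examples carry labels

model = LogisticRegression(max_iter=1000)
for _ in range(5):                         # a few self-training rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95   # confidently predicted points
    if not confident.any():
        break
    idx = np.where(~labeled)[0][confident]
    y[idx] = proba[confident].argmax(axis=1)  # adopt the pseudo-labels
    labeled[idx] = True
print("points labeled after self-training:", labeled.sum())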
In supervised learning, we have input variables (X) and an output
variable (Y), and we use an algorithm to learn the mapping function from
the input to the output:

Y = F(X)

The goal is to approximate the mapping function so well that when you
have new input data (X) you can predict the output variables (Y) for
that data. Because we know the correct answers, the algorithm iteratively
makes predictions on the training data and is corrected by the teacher.
Learning stops when the algorithm achieves an acceptable level of
performance.
For linear regression, the cost function to be minimized is the squared-error cost:

J(θ0, θ1) = (1/2m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²,  summed over i = 1, …, m

Here, m is the number of training examples. To make the math a little bit
easier, a factor of 1/2 is included; it does not change the location of the minimum.
Gradient Descent
Gradient descent is an algorithm that is used to minimize a function.
Gradient descent is used not only in linear regression; it is a more general
algorithm.
We will now learn how the gradient descent algorithm is used to minimize
some arbitrary function f and, later on, we will apply it to the cost
function to determine its minimum.
We start off with some initial guesses for the values of θ0 and θ1 and
then keep changing the values according to the formula:

θⱼ := θⱼ − α · ∂/∂θⱼ f(θ0, θ1)    (for j = 0 and j = 1)

Here, α is called the learning rate, and it determines how big a step is
taken when updating the parameters. The learning rate is always a
positive number.
We want to update θ0 and θ1 simultaneously, that is, calculate the
right-hand side of the above equation for both parameters and only then
update their values to the newly calculated ones. This process is
repeated till convergence is achieved.
If α is too small, then we end up taking tiny baby steps, which
means a lot of steps before we get anywhere near the global minimum.
If α is too large, then there is a possibility that we miss the minimum
entirely; gradient descent may then fail to converge, or it can even diverge.
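The following is a minimal NumPy sketch of the update rule above for a one-variable linear regression h(x) = θ0 + θ1·x; the data points, learning rate, and iteration count are illustrative.

# Gradient descent on the squared-error cost J(theta0, theta1).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.1])
theta0, theta1, alpha = 0.0, 0.0, 0.05    # initial guesses and learning rate
m = len(x)

for _ in range(2000):                     # repeat until convergence
    error = theta0 + theta1 * x - y       # h(x) - y for every example
    grad0 = error.sum() / m               # dJ/dtheta0
    grad1 = (error * x).sum() / m         # dJ/dtheta1
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)                     # approaches (0, 2) for this data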
Logistic Regression: the hypothesis g(ΘᵀX), where g is the sigmoid
function g(z) = 1/(1 + e⁻ᶻ), is interpreted as the probability that y = 1;
as y can take only the values 0 and 1, the probability of the other value
is 1 minus the hypothesis value.
With this interpretation we can safely decide the decision boundary
with the following rule: y = 1 if g(ΘᵀX) ≥ 0.5, else y = 0.
g(ΘᵀX) ≥ 0.5 implies ΘᵀX ≥ 0, and similarly for the less-than
condition.
Cost function
With the modified hypothesis function, taking a squared-error cost
won't work, as it is no longer convex and is tedious to minimize. We
take up a new form of cost function, which is as follows:
E(g(Θ,X), y) = −log(g(Θ,X))      if y = 1
E(g(Θ,X), y) = −log(1 − g(Θ,X))  if y = 0

The parameters are again learned by gradient descent:

βᵢ := βᵢ − p · ∂E/∂βᵢ

where β is equal to Θ, for each i = 1, …, n, and p is the learning rate
at which we move along the slope of the curve to minimize the cost function.
Support Vector Machines: the positive and negative hyperplanes, which run
parallel to the decision boundary, can be expressed as follows:

w0 + wᵀx_pos = 1    (1)
w0 + wᵀx_neg = −1   (2)

If we subtract those two linear equations (1) and (2) from each other, we
get:

wᵀ(x_pos − x_neg) = 2
We can normalize this by the length of the vector w, which is defined as
follows:

‖w‖ = √( Σⱼ wⱼ² )

so that we arrive at:

wᵀ(x_pos − x_neg) / ‖w‖ = 2 / ‖w‖

The left side of the preceding equation can then be interpreted as the
distance between the positive and negative hyperplanes, which is the
so-called margin that we want to maximize. The objective function of
the SVM then becomes the maximization of this margin, i.e. of 2/‖w‖.
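A minimal sketch of margin maximization with scikit-learn's SVC and a linear kernel is shown below; the six data points and C = 1.0 are illustrative. C controls the trade-off between a wide margin and classification errors.

# Linear SVM: fit, then compute the margin width 2/||w||.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_[0]                              # the learned weight vector
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))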
Random Forest: Random forest is an improvement on top of the decision
tree algorithm. The core idea behind random forest is to generate
multiple small decision trees from random subsets of the data (hence the
name "random forest"). Each decision tree gives a biased classifier (as
it only considers a subset of the data), and they each capture different
trends in the data. This ensemble of trees is like a team of experts,
each with a little knowledge of the overall subject but thorough in
their own area of expertise. In the case of classification, the majority
vote is taken to assign a class. In the expert analogy, it is like
asking the same multiple-choice question to each expert and taking as
the answer the option that the most experts vote for. In the case of
regression, we can use the average of all trees as our prediction. In
addition, we can weight the more decisive trees higher relative to the
others by testing on validation data.
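The "team of experts" idea translates directly into scikit-learn's RandomForestClassifier, as in the minimal sketch below: many small trees, each trained on a random subset, combined by majority vote. The dataset and parameter values are illustrative.

# Random forest: an ensemble of small trees combined by majority vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
forest.fit(X_train, y_train)
print("majority-vote accuracy:", forest.score(X_test, y_test))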
Unsupervised Learning Algorithms: Unsupervised learning is where you
only have input data (X) and no corresponding output variables. The
goal of unsupervised learning is to model the underlying structure or
distribution of the data in order to learn more about it.
Soft Clustering: In soft clustering, instead of putting each data point
into exactly one cluster, a probability or likelihood of that data point
belonging to each cluster is assigned. For example, in the retail-store
scenario above, each customer is assigned a probability of belonging to
each of the store's 10 clusters.
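A standard way to obtain such soft assignments is a Gaussian mixture model; the sketch below assumes scikit-learn, and the synthetic two-dimensional data and the choice of 3 components (rather than the 10 clusters of the retail example) are illustrative.

# Soft clustering: each row of predict_proba sums to 1 across clusters.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + rng.choice([0, 5, 10], size=(200, 1))

gm = GaussianMixture(n_components=3, random_state=0).fit(X)
print(gm.predict_proba(X[:5]))   # soft assignments for the first five points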
K-Means Clustering: after each data point has initially been assigned at
random to one of the clusters (shown here using red and grey colour), the
algorithm alternates between two steps; a sketch follows the list.
3. Compute cluster centroids: the centroid of the data points in the red
cluster is shown using a red cross, and that of the grey cluster using a
grey cross.
4. Re-assign each point to the closest cluster centroid: note that the
data point at the bottom was assigned to the red cluster even though it
is closer to the centroid of the grey cluster. Thus, we re-assign that
data point to the grey cluster.
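Here is a minimal NumPy sketch of the two alternating steps described above, re-assigning points to the closest centroid and re-computing the centroids; the two synthetic clusters ("red" and "grey") and the fixed iteration count are illustrative.

# K-means with k = 2: alternate assignment and centroid update.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
centroids = X[rng.choice(len(X), 2, replace=False)]   # random initialisation

for _ in range(10):
    # re-assign each point to the closest cluster centroid
    dist = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
    labels = dist.argmin(axis=1)
    # compute cluster centroids from the current assignment
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(centroids)   # one centroid near (0, 0), the other near (5, 5)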
Single Neuron (Perceptron):
The basic unit of computation in a neural network is the neuron, often
called a node or unit. It receives input from some other nodes, or from an
external source and computes an output. Each input has an associated
weight (w), which is assigned on the basis of its relative importance to
other inputs. The node applies a function f (defined below) to the
weighted sum of its inputs as shown in Figure 1 below:
The above network takes numerical inputs X1 and X2 and has weights w1
and w2 associated with those inputs. Additionally, there is another
input, 1, with weight b (called the bias) associated with it. We will
learn more about the role of the bias later.
The output Y from the neuron is computed as shown in Figure 1. The
function f is non-linear and is called the activation function. The
purpose of the activation function is to introduce non-linearity into the
output of a neuron. This is important because most real-world data is
non-linear and we want neurons to learn these non-linear representations.
Sigmoid: σ(x) = 1 / (1 + exp(−x))
Tanh: tanh(x) = 2σ(2x) − 1
ReLU: f(x) = max(0, x)
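The sketch below computes the single neuron of Figure 1 in NumPy, namely Y = f(w1·X1 + w2·X2 + b), with each of the three activation functions above; the input values, weights, and bias are illustrative.

# A single neuron: weighted sum of inputs plus bias, then activation.
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def relu(x):    return np.maximum(0.0, x)

X1, X2 = 0.5, -1.0           # numerical inputs
w1, w2, b = 0.8, 0.2, 0.1    # weights and bias

z = w1 * X1 + w2 * X2 + b    # weighted sum of the inputs plus bias
for f in (sigmoid, tanh, relu):
    print(f.__name__, f(z))  # the neuron's output Y under each activation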
• Set of states, S
• Set of actions, A
• Reward function, R
• Policy, π
• Value, V
We have to take actions (A) to transition from our start state to our end
state (S), and in return we get rewards (R) for each action we take. Our
actions can lead to a positive reward or a negative reward.
The set of actions we take defines our policy (π), and the rewards we get
in return define our value (V). Our task is to maximize our rewards by
choosing the correct policy, that is, to maximize the total reward
accumulated over all steps.
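A minimal tabular Q-learning sketch of these ideas follows: states S, actions A, rewards R, and a learned policy π that maximizes the total reward. The 5-state corridor environment and the learning-rate, discount, and exploration values are illustrative assumptions.

# Tabular Q-learning on a corridor: move right to reach the rewarding goal.
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))  # value estimates for (state, action)
alpha, gamma, eps = 0.5, 0.9, 0.3    # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(500):                          # episodes
    s = 0                                     # start state
    while s != n_states - 1:                  # end state ends the episode
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0     # positive reward at the goal
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

policy = Q.argmax(axis=1)
print(policy[:-1])   # learned policy pi: move right in every non-terminal state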
These are the basic libraries that transform Python from a general-purpose
programming language into a powerful and robust tool for data analysis
and visualization. Sometimes called the SciPy Stack, they're the
foundation that the more specialized tools are built on.
1.) NumPy is the foundation of the stack, providing fast N-dimensional
array objects and the numerical routines that the other libraries
build on.
2.) SciPy builds on NumPy with a collection of algorithms for scientific
computing, such as optimization, statistics, and signal processing.
3.) Pandas adds data structures and tools that are designed for practical
data analysis in finance, statistics, social sciences, and engineering.
Pandas works well with incomplete, messy, and unlabeled data
(i.e., the kind of data you’re likely to encounter in the real world),
and provides tools for shaping, merging, reshaping, and slicing
datasets.
4.) IPython(Jupyter Notebook) extends the functionality of Python’s
interactive interpreter with a souped-up interactive shell that adds
introspection, rich media, shell syntax, tab completion, and
command history retrieval. It also acts as an embeddable interpreter
for your programs that can be really useful for debugging. If
you’ve ever used Mathematica or MATLAB, you should feel
comfortable with IPython.
5.) matplotlib is the standard Python library for creating 2D plots and
graphs. It’s pretty low-level, meaning it requires more commands
to generate nice-looking graphs and figures than with some more
advanced libraries. However, the flip side of that is flexibility. With
enough commands, you can make just about any kind of graph you
want with matplotlib.
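To illustrate that command-by-command style, here is a minimal matplotlib sketch of a 2D line plot; the sine curve and labels are illustrative.

# A simple 2D plot built from explicit matplotlib commands.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x), label="sin(x)")   # the line itself
plt.xlabel("x")                          # axis labels, added one command at a time
plt.ylabel("sin(x)")
plt.title("A simple 2D plot")
plt.legend()
plt.show()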
Project Code:
The best estimator learned through grid search was:

SVC(C=3, cache_size=200, class_weight=None, coef0=0.0, degree=3,
    gamma=0.001, kernel='rbf', max_iter=-1, probability=False,
    random_state=None, shrinking=True, tol=0.001, verbose=False)
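A hedged sketch of how such a best estimator could have been obtained with scikit-learn's GridSearchCV is shown below; the digits dataset and the exact parameter grid are assumptions, not taken from the original project.

# Grid search over SVC hyperparameters, printing the best estimator found.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
param_grid = {"C": [1, 3, 10], "gamma": [0.001, 0.01], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_estimator_)   # e.g. SVC(C=3, gamma=0.001, kernel='rbf')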
Other Projects Implemented During Summer Training:
REFERENCES
Books:
Links:
https://www.medium.com/
https://www.analyticsvidhya.com
http://www.tutorialspoint.com/numpy
http://www.tutorialspoint.com/pandas