Module 3
4CS1201
Syllabus • MODULE 3
Supervised Learning
• A type of machine learning in which machines are trained using well
"labelled" training data and, on the basis of that data, predict the
output.
• "Labelled" data means that the input data is already tagged with the
correct output.
• Supervised learning is the process of providing input data along with
the correct output data to the machine learning model.
• The aim of a supervised learning algorithm is to find a mapping
function that maps the input variable (x) to the output variable (y), as
in the sketch below.
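A minimal sketch of this setup, assuming scikit-learn is available; the tiny dataset is invented purely for illustration:

```python
# Supervised learning in miniature: labelled pairs (x, y) are given to the
# model, which learns the mapping f(x) -> y and predicts outputs for new inputs.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # input variable (x)
y = [2, 4, 6, 8]           # correct ("labelled") output for each input

model = LinearRegression().fit(X, y)  # learn the mapping from labelled data
print(model.predict([[5]]))           # -> ~[10.], the predicted output for x=5
```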
How Does Supervised Learning Work?
Manhattan Distance
The Manhattan distance metric is generally used when we are interested in the total distance travelled by an object
rather than its displacement. It is calculated by summing the absolute differences between the coordinates of
the points in n dimensions.
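In symbols, for two points p = (p₁, …, pₙ) and q = (q₁, …, qₙ):

\[
d_{\text{Manhattan}}(p, q) = \sum_{i=1}^{n} \lvert p_i - q_i \rvert
\]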
1. K-Nearest Neighbor
• Distance Metric Used in K-NN
Understanding data points and Euclidean Distance
Let X be the training dataset with n data points, where each data point is
represented by a d-dimensional feature vector, and let Y be the corresponding
labels or values for each data point in X. Given a new data point x, the
algorithm calculates the distance between x and each data point in X using
the Euclidean distance, as shown below.
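Writing x^{(j)} for the j-th feature of a point (notation assumed here), the Euclidean distance is:

\[
d(x, x_i) = \sqrt{ \sum_{j=1}^{d} \left( x^{(j)} - x_i^{(j)} \right)^2 }
\]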
1. K-Nearest Neighbor
• How to select the value of k in the K-NN algorithm?
• If the input data has more outliers or noise, a higher value of k is
usually better.
• It is recommended to choose an odd value of k to avoid ties in binary
classification (a from-scratch sketch of the prediction step follows below).
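A from-scratch sketch of K-NN classification under these choices; the height/weight data and labels are hypothetical:

```python
# Minimal K-NN classifier: majority vote among the k nearest training points.
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two d-dimensional points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbors."""
    # Sort training points by distance to x_new and keep the k closest labels.
    neighbors = sorted(zip(X_train, y_train),
                       key=lambda pair: euclidean(pair[0], x_new))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Toy data: [height, weight] -> T-shirt size (illustrative only).
X_train = [[158, 58], [160, 60], [163, 61], [170, 68], [174, 72]]
y_train = ["M", "M", "M", "L", "L"]
print(knn_predict(X_train, y_train, [165, 63], k=3))  # -> "M"
```

Note that k is odd here, so a two-class vote cannot tie.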
1. K-Nearest Neighbor
• 1. Solved Numerical Example of KNN Classifier to classify New Instance IRIS Example by Mahesh Huddar – YouTube
• 2. Solved Example KNN Classifier to classify New Instance Height and Weight Example by Mahesh Huddar (youtube.com)
• 3. K nearest Neighbor Learning Algorithm Lazy Learner Solved Example by Dr. Mahesh Huddar (youtube.com)
• 4. Solved Example K Nearest Neighbors Algorithm Weighted KNN to classify New Instance by Mahesh Huddar (youtube.com)
• KNN Algorithm In Machine Learning | KNN Algorithm Using Python | K Nearest Neighbor | Simplilearn (youtube.com)
Support Vector Machines (SVM)
• Used for both Classification and Regression problems; however, it is
primarily used for Classification problems in Machine Learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes, so that
we can easily place a new data point in the correct category in the
future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the
hyperplane. These extreme cases are called support vectors, and
hence the algorithm is termed a Support Vector Machine.
Support Vector Machines (SVM)
• Hyperplane: There can be multiple lines/decision boundaries to
segregate the classes in n-dimensional space, but we need to find
the best decision boundary that helps classify the data points. This
best boundary is known as the hyperplane of the SVM.
• The dimension of the hyperplane depends on the number of features in
the dataset: if there are 2 features (as shown in the image), the
hyperplane is a straight line, and if there are 3 features, the
hyperplane is a two-dimensional plane.
• We always create the hyperplane with the maximum margin, i.e. the
maximum distance between the hyperplane and the nearest data points of
either class (formalized below).
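In the standard hard-margin formulation, with labels y_i ∈ {−1, +1} and the hyperplane scaled so that the nearest points satisfy |w·x + b| = 1, the margin equals 2/‖w‖, and training solves:

\[
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i,
\]

which is equivalent to maximizing the margin 2/‖w‖.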
Support Vector Machines (SVM)
• Support Vectors: The data points or vectors that are closest to the
hyperplane and affect its position are termed Support Vectors.
• Since these vectors support the hyperplane, they are called support
vectors.
Types of SVM
• There are 2 types:
• Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes using a single straight
line, such data is termed linearly separable, and the classifier used
is called the Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable
data. If a dataset cannot be classified using a straight line, such
data is termed non-linear data, and the classifier used is called the
Non-linear SVM classifier (see the sketch after this list).
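A short sketch contrasting the two, assuming scikit-learn is available; make_circles generates a stock dataset that no straight line can separate:

```python
# Linear vs. non-linear SVM on concentric-circle data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # Linear SVM: one straight line
rbf_svm = SVC(kernel="rbf").fit(X, y)        # Non-linear SVM: RBF kernel

print("linear:", linear_svm.score(X, y))  # low: data is not linearly separable
print("rbf:   ", rbf_svm.score(X, y))     # near 1.0: kernel handles the curved boundary
```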
Support Vector Machines (SVM)
1. Linear SVM
The equation of a hyperplane is w·x + b = 0, where w is a
vector normal to the hyperplane and b is an offset. To classify a point as
negative or positive we need to define a decision rule. Since the projection
of any vector onto another vector is given by their dot product, w·x tells us
which side of the hyperplane a point x falls on.
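Concretely, the decision rule is:

\[
\hat{y} = \operatorname{sign}(w \cdot x + b) =
\begin{cases}
+1 & \text{if } w \cdot x + b \ge 0,\\
-1 & \text{otherwise.}
\end{cases}
\]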
Evaluating classification model performance
• Sensitivity / True Positive Rate
Sensitivity tells us what proportion of the positive class got correctly classified.
The False Negative Rate (FNR) tells us what proportion of the positive class got
incorrectly classified by the classifier.
A higher TPR and a lower FNR are desirable, since we want to classify the positive
class correctly.
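In terms of confusion-matrix counts (TP, FN):

\[
\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FNR} = \frac{FN}{TP + FN} = 1 - \text{TPR}
\]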
• Specificity / True Negative Rate
Specificity tells us what proportion of the negative class got correctly classified.
The False Positive Rate (FPR) tells us what proportion of the negative class got
incorrectly classified by the classifier.
A higher TNR and a lower FPR are desirable, since we want to classify the negative
class correctly.
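In terms of confusion-matrix counts (TN, FP):

\[
\text{TNR} = \frac{TN}{TN + FP}, \qquad \text{FPR} = \frac{FP}{TN + FP} = 1 - \text{TNR}
\]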
Let's understand the ROC curve with an example. Suppose we need to distinguish between
patients with a particular disease (say, phobia) and those who do not have the disease
(non-phobic). The patient populations for these two states form two overlapping normal
distributions, as shown below. The curves overlap, as they almost always will in real
life: some individuals with the disease have test scores or other characteristics
similar to those of individuals without it.
Streiner (2007) examines the ROC with the problem of classifying patients into Phobic (disease-state positive category)
and Non-Phobic (disease-state negative category) using a 10-point test score. Table 1 shows the test scores from 1–10
and a frequency table of the test results categorized by label:
To predict Phobic and Non-Phobic cases, we need to define a cutoff score. Let's do it for different cutoff scores.
Plotting the ROC Curve
• The lower left-hand corner of the curve shows the beginning of the
classification process: no classifications have been made yet.
• Initially the cutoff is 9/10 (a score of 10 is classified Phobic, and a score
of 9 or below is Non-Phobic; see Table 1). Here the classification criterion
is very strict and only strong TP cases are classified, but only a few
examples in the sample meet this criterion (21 instances with score 10).
• Once we relax the cutoff a little, for example to 8/9, we begin
to see some FPs as well.
• The cutoff 7/8 is the one closest to the upper-left corner and hence it
minimizes the overall classification error. At this point, TPR ≈ 0.8 and
FPR ≈ 0.1. The sketch after this list shows how such (FPR, TPR) points are computed.
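A sketch of how each ROC point arises by sweeping the cutoff; the scores and labels below are invented stand-ins for Table 1, not Streiner's actual frequencies:

```python
# Sweep the cutoff from strict to relaxed and report (FPR, TPR) at each step.
import numpy as np

scores = np.array([2, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10])  # test scores
labels = np.array([0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1])    # 1 = Phobic

for cutoff in range(10, 1, -1):               # strict (9/10) -> relaxed
    pred = (scores >= cutoff).astype(int)     # score >= cutoff => predict Phobic
    tp = np.sum((pred == 1) & (labels == 1))  # true positives at this cutoff
    fp = np.sum((pred == 1) & (labels == 0))  # false positives at this cutoff
    tpr = tp / np.sum(labels == 1)            # sensitivity
    fpr = fp / np.sum(labels == 0)            # 1 - specificity
    print(f"cutoff {cutoff:2d}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Plotting FPR on the x-axis against TPR on the y-axis for every cutoff traces out the ROC curve described above.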
Evaluating classification model performance
• Confusion Matrix Solved Example Accuracy Precision Recall F1 Score Prevalence by Mahesh Huddar – YouTube
Thank you