KNN Algorithm & SVM Algorithm

Mithilesh Singh

KNN regression/classifier
- A non-parametric method.
- It approximates the association between the independent variables and the continuous outcome by averaging the observations in the same neighborhood.
- The size of the neighborhood (the K value) must be set by the analyst, or it can be chosen by cross-validation, selecting the size that minimizes the mean-squared error.
Parametric vs. non-parametric methods
Parametric methods assume the data is of sufficient "quality", e.g. linear regression, logistic regression.
- The results can be misleading if these assumptions are wrong.
- Quality is defined in terms of certain properties of the data, such as being normally distributed, having a symmetrical linear distribution, homogeneity of variance, etc.

Non-parametric tests can be used when the data is not of sufficient quality to satisfy the assumptions of a parametric test. Non-parametric tests still have assumptions, but they are less stringent.
Non-parametric tests can also be applied to normally distributed data, but parametric tests have greater power if their assumptions are met.
- Parametric tests are preferred when the assumptions are met because they are more sensitive.
The KNN model classifies points based on proximity, i.e. distance.
Important Distance Metrics in Machine Learning
 Euclidean Distance
 Manhattan Distance
 Minkowski distance

1. Euclidean Distance

Euclidean distance represents the shortest (straight-line) distance between two points; it follows from the Pythagorean theorem.

The formula for Euclidean distance in 2 dimensions:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2)

For an n-dimensional space:

d(p, q) = sqrt( Σ (pi - qi)^2 ), summed over i = 1 … n

Where,
 n = number of dimensions
 pi, qi = the i-th coordinates of the data points p and q
2. Manhattan Distance
Manhattan distance is the sum of absolute differences between two points across all dimensions. It is also called the L1 norm or the taxicab norm.

Manhattan distance is the sum of the absolute distances in the x and y directions. In a 2-dimensional space it is given as:

d(p, q) = |p1 - q1| + |p2 - q2|

And the generalized formula for an n-dimensional space is:

d(p, q) = Σ |pi - qi|, summed over i = 1 … n

Where,
 n = number of dimensions
 pi, qi = the i-th coordinates of the data points p and q
3. Minkowski Distance
This distance measure is the generalized form of the Euclidean and Manhattan distance metrics: with p equal to 2 it reduces to the Euclidean distance, and with p equal to 1 it reduces to the Manhattan distance.

Minkowski distance = ( Σ |pi - qi|^p )^(1/p), summed over i = 1 … n
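A minimal sketch of the three metrics, assuming NumPy and two illustrative points p and q (made-up values, not from the slides):

import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 8.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))            # Minkowski with p = 2
manhattan = np.sum(np.abs(p - q))                    # Minkowski with p = 1
order = 3                                            # any other order of the Minkowski distance
minkowski = np.sum(np.abs(p - q) ** order) ** (1 / order)

print(euclidean, manhattan, minkowski)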

 The KNN model is used for classification (and regression).
 KNN uses distance metrics to find similarities or dissimilarities between points.
 It works off the assumption that similar points can be found near one another.
 The distinction between the two uses is how the neighbors are combined: classification uses "majority voting" among the K neighbors, while regression averages their target values.
Example of KNN (Euclidean distance) in real life

Person |   a |  b |   c |    d | Status (loan approved)
     1 | 160 | 10 |  20 | 2000 | No
     2 | 180 | 20 |  30 | 5000 | Yes
     3 | 200 | 30 |  35 | 8000 | Yes
     4 | 150 | 20 |  25 | 4000 | No
     5 | 350 | 60 | 100 | 6500 | Yes
   ... |     |    |     |      |
   100 | 200 | 50 |  80 | 7500 | Yes
     A | 175 | 35 |  30 | 4500 | ?

(a, b, c, d are the liability features of each person; A is the new applicant.)

Euclidean distances from A to each person:

d(A, 1) = sqrt(15^2 + 25^2 + 10^2 + 2500^2)
d(A, 2) = sqrt(5^2 + 15^2 + 0^2 + 500^2)
d(A, 3) = sqrt(25^2 + 5^2 + 5^2 + 3500^2)
d(A, 4) = sqrt(25^2 + 15^2 + 5^2 + 500^2)
d(A, 5) = sqrt(175^2 + 25^2 + 70^2 + 2000^2)
...
d(A, 100) = ...

The nearest neighbor is the one with the lowest distance.
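A small sketch of this calculation in Python, assuming NumPy and using the feature values and labels for persons 1-5 from the table above:

import numpy as np

# Feature rows (a, b, c, d) for persons 1-5 from the table
X = np.array([
    [160, 10, 20, 2000],   # person 1, No
    [180, 20, 30, 5000],   # person 2, Yes
    [200, 30, 35, 8000],   # person 3, Yes
    [150, 20, 25, 4000],   # person 4, No
    [350, 60, 100, 6500],  # person 5, Yes
])
labels = np.array(["No", "Yes", "Yes", "No", "Yes"])

A = np.array([175, 35, 30, 4500])                # the new applicant
distances = np.sqrt(((X - A) ** 2).sum(axis=1))  # Euclidean distance to each person
nearest = distances.argsort()[:3]                # indices of the 3 nearest neighbors
print(distances)
print(labels[nearest])                           # the majority class among them decides A's status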


KNN method working concept:-

How K-NN works can be explained with the algorithm below:
o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new point to every training point.
o Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
o Step-6: Our model is ready.
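A minimal from-scratch sketch of these steps, assuming NumPy and that X_train, y_train and new_point are NumPy arrays of your own data:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, new_point, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - new_point) ** 2).sum(axis=1))
    # Step 3: indices of the K nearest neighbors
    nearest = distances.argsort()[:k]
    # Steps 4-5: count the categories among them and return the most common one
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]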

Suppose we have a new data point and we need to put it in the required category. Consider the below image:
o Firstly, we choose the number of neighbors, so we choose k=5.
o Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry; it is calculated with the formula given above.
o By calculating the Euclidean distances we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
o As we can see, the 3 nearest neighbors are from category A, hence this new data point must belong to category A.

• KNN can also be used for regression: the continuous target value is computed as the mean of the target values of the k nearest neighbors.

For classification problems:
from sklearn.neighbors import KNeighborsClassifier

For regression problems:
from sklearn.neighbors import KNeighborsRegressor
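A minimal regression sketch, assuming X_train, y_train and X_test already exist as numeric arrays:

from sklearn.neighbors import KNeighborsRegressor

knn_reg = KNeighborsRegressor(n_neighbors=5)   # prediction = mean target of the 5 nearest neighbors
knn_reg.fit(X_train, y_train)
y_pred = knn_reg.predict(X_test)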
How to select the optimal value of k?
 Prefer odd values of k, since even values can produce ties.
 K should not be too small.
 Rule of thumb: k is generally around sqrt(n), where n denotes the total number of data points.
 Further, one can try different values of k and observe the evaluation metrics to decide on the best value, as in the code below.
Python code:-

from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

model_name = 'K-Nearest Neighbor Classifier'
knnClassifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
# preprocessorForFeatures is assumed to be defined earlier (e.g. a ColumnTransformer)
knn_model = Pipeline(steps=[('preprocessor', preprocessorForFeatures),
                            ('classifier', knnClassifier)])
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)

# Find the most effective value of the n_neighbors parameter (k):
accuracy_K = []
for k in range(1, 50):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy_K.append(accuracy_score(y_test, y_pred))

plt.figure(figsize=(12, 8))
plt.xlabel("k values")
plt.ylabel("Accuracy")
plt.plot(range(1, 50), accuracy_K, marker='o', markersize=9)
plt.show()
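Alternatively, as noted earlier, k can be chosen with cross-validation rather than a single train/test split; a minimal sketch, assuming X_train, y_train and the imports above:

from sklearn.model_selection import cross_val_score
import numpy as np

cv_scores = []
for k in range(1, 50):
    knn = KNeighborsClassifier(n_neighbors=k)
    # mean 5-fold cross-validated accuracy for this value of k
    cv_scores.append(cross_val_score(knn, X_train, y_train, cv=5).mean())

best_k = int(np.argmax(cv_scores)) + 1   # +1 because k starts at 1
print("best k:", best_k)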
KNN is a lazy learner model
 Almost all models get trained on the training dataset, but KNN does not really get trained on it.
 When we call knn.fit(X_train, y_train), the model simply 'memorizes' the dataset. It does not try to understand it or learn the underlying trend.
 When we then ask the model to predict a value, it takes a lot of time, because it has to recall all the stored points and compute distances to them before it can predict the correct value.
 Hence, where most models take time during training, this model takes almost no time during training.
 Most models take little time for prediction, but the KNN model takes a lot of time during the prediction stage.
Important points
 Since KNN is a distance-based model, feature scaling is a must (see the sketch below).
 Unlike logistic regression, which is natively a binary classifier, KNN can handle multi-class classification directly.
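A minimal sketch of KNN with feature scaling in a pipeline, assuming numeric features in X_train/X_test and labels in y_train:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

scaled_knn = Pipeline(steps=[
    ('scaler', StandardScaler()),                  # puts all features on a comparable scale
    ('knn', KNeighborsClassifier(n_neighbors=5)),
])
scaled_knn.fit(X_train, y_train)
y_pred = scaled_knn.predict(X_test)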
Advantages of KNN Algorithm:

o It is simple to implement.
o It requires no explicit training phase or model building.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:

o The value of K always needs to be determined, which can sometimes be complex.
o The entire dataset is processed for every prediction, so it is not good for large datasets.
o The computation cost is high because the distance between the new point and every training sample has to be calculated. The computational time complexity of each prediction is roughly O(M·N·log k), where M is the dimension of the data, N is the number of training instances, and k is the number of nearest neighbors selected.
Mithilesh Singh
Support vector machines (SVM)
The early 1990s are sometimes described as an AI winter: the USA had invested huge amounts in military AI without success, and confidence in the field fell.
SVM was invented during that period and became popular and successful; more than 200 papers were published on it.

SVM is also called the maximal margin classifier. It is commonly used and was originally intended for binary classification.

It is often considered one of the best "out of the box" classifiers.

 Used for regression and classification.
 Preferred for medium and small sized datasets.
 It separates the data into two components using a hyperplane, by maximizing the margin (hence it is also called a large margin classifier). The maximal margin classifier tries to find the optimal separating hyperplane.

Hyper-plane
A hyperplane is a flat surface that linearly divides the n-dimensional data points into two components. In 2D a hyperplane is a line, and in 3D it is a plane; in general it is an (n-1)-dimensional flat subspace of the n-dimensional space.

Hyperplane: the line that classifies with the highest margin is the maximum margin hyperplane.
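For reference (a standard formulation, not spelled out on the slides), a separating hyperplane can be written as the set of points x satisfying

w · x + b = 0

where w is the weight (normal) vector and b is the intercept. A new point is assigned to one class if w · x + b > 0 and to the other if w · x + b < 0; the margin is the distance from this plane to the closest points, the support vectors.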

The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM

SVM can be of two types, based on the separating hyperplane:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Support Vectors:

The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.

How does SVM work?

The kernel defines the approach used to find the hyperplane. There are different kernel functions available:
 Linear
 Gaussian (RBF kernel - Radial Basis Function)
 Polynomial
 Sigmoid
Linear Plane and Non Linear Plane
(Scenario-1)
Identify the right hyper-plane: Here, we have
three hyper-planes (A, B, and C). Now, identify the right
hyper-plane to classify stars and circles.

In this scenario, hyper-plane "B" has done this job excellently.
(Scenario-2)
 Identify the right hyper-plane: Here, we have three hyper-planes (A, B, and C) and all of them segregate the classes well. Now, how can we identify the right hyper-plane?

Here, we maximize the margin: the right hyper-plane is C. If we select a hyper-plane with a low margin, there is a high chance of misclassification.
(Scenario-3)
 Identify the right hyper-plane: Hint: use the rules discussed in the previous scenarios to identify the right hyper-plane.

Note: some would pick hyper-plane B, as it has a higher margin than A.
But here is the catch: SVM selects the hyper-plane that classifies the classes accurately before maximizing the margin. Here, hyper-plane B has a classification error and A has classified everything correctly. Therefore, the right hyper-plane is A.
(Scenario-4)

The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, SVM classification is robust to outliers. Similarly, decision trees (as implemented in scikit-learn) are also fairly robust to outliers.
Non-Linear SVM:

If the data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line. Consider the below image:
(Scenario-5)
Find the hyper-plane that segregates the two classes: in the scenario below, we cannot have a linear hyper-plane between the two classes.

Does SVM have a solution?

SVM will add one extra dimension to the data points to make them separable.
This is the kernel trick: converting the data to a higher dimensionality.
To separate these data points, SVM adds one more dimension. For linear data, SVM uses the two dimensions x and y; for non-linear data, SVM adds a third dimension z. It can be calculated as:
z = x^2 + y^2
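A minimal sketch of this idea, assuming NumPy and some made-up 2-D points: after adding z, each point's new coordinate is simply its squared distance from the origin, so an inner cluster and an outer ring become separable by a flat plane in 3-D.

import numpy as np

# Made-up 2-D points: an inner cluster (first three) and an outer ring (last three)
x = np.array([0.5, -0.5, 0.0, 3.0, -3.0, 0.0])
y = np.array([0.0, 0.5, -0.5, 0.0, 1.0, 3.0])

z = x ** 2 + y ** 2   # the extra dimension added by the kernel trick
print(z)              # small values for the inner cluster, large values for the outer ring
# In 3-D (x, y, z) the two groups can now be split by a plane z = constant.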

SVM keeps increasing the dimensions until the classes become separable.
By adding the third dimension, the sample space becomes as shown in the image below:
Python code:-

from sklearn.svm import SVC  # "Support vector classifier"

classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

In the above code, we have used kernel='linear', since here we are creating an SVM for linearly separable data; we can change it for non-linear data. We then fit the classifier to the training dataset (x_train, y_train).

The model performance can be altered by changing the values of the kernel, gamma and C parameters.

Kernel functions transform non-linear spaces into linear ones: they map the data into a higher dimension so that it can be classified.
kernel: various options are available - 'linear', 'rbf', 'poly' and 'sigmoid' (the default is 'rbf'). Here 'rbf' and 'poly' are useful for a non-linear hyper-plane.
gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The higher the value of gamma, the more exactly the model tries to fit the training data set, which hurts generalization and causes over-fitting.
Try different gamma values, such as 0.001, 0.01, 1, 10 or 100.
C: penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
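A minimal sketch of tuning these parameters with a grid search, assuming X_train and y_train exist (the grid values are illustrative):

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = {
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': [0.001, 0.01, 1, 10, 100],
    'C': [0.1, 1, 10],
}
grid = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation over the grid
grid.fit(X_train, y_train)
print(grid.best_params_)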

Pros & cons associated with SVM:

Advantages:
 Works really well when there is a clear margin of separation.
 Effective in high-dimensional spaces.
 Gives accurate results.
 Useful for both linearly separable and non-linearly separable data.

Disadvantages:
 It doesn't perform well on large datasets, because the required training time is very high.

Applications of SVM
 Sentiment analysis
 Spam detection
 Handwritten digit recognition
 Image recognition

from sklearn.svm import SVC  # "Support vector classifier"
from sklearn.svm import SVR  # "Support vector regressor"

# Building a Support Vector Machine on the training data
# (X_train, Y_train and X_test are assumed to be defined;
#  gamma is ignored by the linear kernel)
svc_model = SVC(C=1, kernel='linear', gamma=100)
svc_model.fit(X_train, Y_train)
prediction = svc_model.predict(X_test)
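For regression, a minimal SVR sketch under the same assumptions about X_train, Y_train (here with continuous targets) and X_test:

from sklearn.svm import SVR

svr_model = SVR(kernel='rbf', C=1.0, gamma='scale')   # RBF kernel with default-style settings
svr_model.fit(X_train, Y_train)
y_pred = svr_model.predict(X_test)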
