Tilani Gunawardena
Algorithms: K Nearest Neighbors
Simple Analogy
• Tell me about your friends (who your neighbors are) and I will tell you who you are.
Instance-based Learning
It's very similar to a Desktop!!
KNN – Different names
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
What is KNN?
• A powerful classification algorithm used in pattern recognition.
• K nearest neighbors stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
• One of the top data mining algorithms used today.
• A non-parametric, lazy learning algorithm (an instance-based learning method).
KNN: Classification Approach
• An object (a new instance) is classified by a majority vote of its neighbors' classes.
• The object is assigned to the most common class among its K nearest neighbors (as measured by a distance function).
[Figure: Distance measure. Compute the distance from the test record to all training records, then choose the k “nearest” records.]
Distance Measure for Continuous Variables
Distance Between Neighbors
• Calculate the distance between the new example (E) and all examples in the training set.
• Euclidean distance between two examples:
– X = [x1, x2, x3, ..., xn]
– Y = [y1, y2, y3, ..., yn]
– The Euclidean distance between X and Y is defined as:
$$ D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} $$
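The formula above translates directly into code. This is a minimal Python sketch that assumes both examples are equal-length sequences of numbers. The name `euclidean_distance` is hypothetical. Python 3.8+ also provides `math.dist`, which computes the same quantity.

```python
import math
from typing import Sequence

def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """D(X, Y) = sqrt(sum_i (x_i - y_i)^2) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of attributes")
    # Sum the squared per-attribute differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```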
K-Nearest Neighbor Algorithm
• All instances correspond to points in an n-dimensional feature space.
• Each instance is represented by a set of numerical attributes.
• Each training example consists of a feature vector and the class label associated with it.
• Classification compares the feature vector of the new example E with those of the training examples to find its K nearest points.
• Select the K examples in the training set that are nearest to E.
• Assign E to the most common class among its K nearest neighbors.
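The steps above can be sketched in a few lines of Python. This sketch assumes Euclidean distance (via `math.dist`) and simple majority voting. The function name `knn_classify` and the toy training data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

def knn_classify(train: List[Tuple[Sequence[float], Hashable]],
                 e: Sequence[float], k: int) -> Hashable:
    """Classify example e by majority vote among its k nearest training examples."""
    # Distance from E to every training example.
    dists = [(math.dist(x, e), label) for x, label in train]
    # Keep the k examples with the smallest distances.
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Most common class among the k neighbors. On equal counts, Counter keeps
    # first-encountered order, so the class with the closer neighbor wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters labelled "A" and "B".
train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"), ([0.9, 1.1], "A"),
         ([5.0, 5.0], "B"), ([5.5, 4.5], "B")]
print(knn_classify(train, [1.0, 0.9], k=3))
```

Sorting all distances costs O(n log n) per query. That is fine for illustration, but it also shows why classifying a new example is slow on large training sets.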
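3-KNN: Example (1)
Customer   Age   Income   No. of credit cards   Class
George     35    35K      3                     No
Rachel     22    50K      2                     Yes
Steve      63    200K     1                     No
Tom        59    170K     1                     No
Anne       25    40K      4                     Yes
John       37    50K      2                     ? (predicted: Yes)

Distance from John (income in thousands):
George: sqrt[(35-37)^2 + (35-50)^2 + (3-2)^2] = 15.16
Rachel: sqrt[(22-37)^2 + (50-50)^2 + (2-2)^2] = 15
Steve: sqrt[(63-37)^2 + (200-50)^2 + (1-2)^2] = 152.23
Tom: sqrt[(59-37)^2 + (170-50)^2 + (1-2)^2] = 122
Anne: sqrt[(25-37)^2 + (40-50)^2 + (4-2)^2] = 15.74

The three nearest neighbors are Rachel (15, Yes), George (15.16, No) and Anne (15.74, Yes). With K = 3 the majority vote classifies John as Yes.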
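The worked example can be reproduced in Python. This sketch uses the table values as given, with income in thousands and no scaling. The variable names are hypothetical.

```python
import math
from collections import Counter

# Training data from the slide: (age, income in K, no. of credit cards) -> class
customers = {
    "George": ((35, 35, 3), "No"),
    "Rachel": ((22, 50, 2), "Yes"),
    "Steve": ((63, 200, 1), "No"),
    "Tom": ((59, 170, 1), "No"),
    "Anne": ((25, 40, 4), "Yes"),
}
john = (37, 50, 2)

# Euclidean distance from John to every customer.
distances = {name: math.dist(x, john) for name, (x, _) in customers.items()}
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.2f}")

k = 3
nearest = sorted(distances, key=distances.get)[:k]
prediction = Counter(customers[n][1] for n in nearest).most_common(1)[0][0]
print(nearest, prediction)
```

The `:.2f` format rounds rather than truncates, so it prints 15.17, 15.75 and 152.24 where the slide shows 15.16, 15.74 and 152.23. The ranking and the Yes prediction are the same.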
How to choose K?
• If K is too small, the classifier is sensitive to noise points.
• A larger K smooths out noise, but if K is too large the neighborhood may include a majority of points from other classes.
• A rule of thumb is K < sqrt(n), where n is the number of training examples.
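The slide gives only a rule of thumb. One common way to pick K is leave-one-out cross-validation over odd values of K below sqrt(n). This is shown as an assumption, not the author's method. Odd values avoid two-class voting ties, as the speaker notes point out. `knn_predict`, `choose_k` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training examples nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def choose_k(train):
    """Return (best_k, accuracy) using leave-one-out over odd K < sqrt(n)."""
    n = len(train)
    candidates = [k for k in range(1, n, 2) if k < math.sqrt(n)]
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = 0
        for i, (x, label) in enumerate(train):
            rest = train[:i] + train[i + 1:]  # hold out example i
            correct += knn_predict(rest, x, k) == label
        acc = correct / n
        if acc > best_acc:  # strict '>' keeps the smallest K on ties
            best_k, best_acc = k, acc
    return best_k, best_acc

# Hypothetical toy data: two well-separated groups of 8 points each.
data = [((i, 0), "A") for i in range(8)] + [((i, 10), "B") for i in range(8)]
print(choose_k(data))
```

Leave-one-out refits nothing, because KNN has no training step. Holding out one example at a time is therefore cheap to express, though it still costs O(n^2) distance computations per K.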
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor and (c) 3-nearest neighbor of a record x.]
The k nearest neighbors of a record x are the data points with the k smallest distances to x.
KNN Feature Weighting
• Scale each feature by its importance for
classification
• Can use our prior knowledge about which features are more important
• Can learn the weights w_k using cross-validation (to be covered later)
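One common way to apply such weights is a weighted Euclidean distance, sqrt(sum_k w_k (x_k - y_k)^2). This is an assumption, since the slide does not give a formula. The name `weighted_euclidean` is hypothetical.

```python
import math

def weighted_euclidean(x, y, w):
    """sqrt(sum_k w_k * (x_k - y_k)^2), where w_k >= 0 is the weight of feature k."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    return math.sqrt(sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w)))

print(weighted_euclidean([0, 0], [3, 4], [1, 1]))  # 5.0: all weights 1 is plain Euclidean
print(weighted_euclidean([0, 0], [3, 4], [1, 0]))  # 3.0: second feature ignored
```

A weight of 0 removes a feature entirely. Larger weights make differences in that feature count more when ranking neighbors.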
Feature Normalization
• Distance between neighbors can be dominated by attributes with relatively large values.
– e.g., the income of customers in the previous example.
• This arises when features are on different scales.
• It is important to normalize such features.
– e.g., by mapping values to numbers between 0 and 1.
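Mapping values into [0, 1] is usually done with min-max scaling, (v - min) / (max - min). Here is a minimal sketch. The function name `min_max_scale` and its optional `lo`/`hi` parameters are hypothetical. They let a new example be scaled with the training set's min and max.

```python
def min_max_scale(values, lo=None, hi=None):
    """Map values to [0, 1] via (v - lo) / (hi - lo); lo/hi default to the data's min/max."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

incomes = [35, 50, 200, 170, 40]  # training incomes (in K) from the 3-KNN example
scaled = min_max_scale(incomes)
print([round(s, 3) for s in scaled])
# A new example (John, 50K) is scaled with the training min and max.
print(min_max_scale([50], lo=min(incomes), hi=max(incomes)))
```

A query value outside the training range will map outside [0, 1]. That is acceptable for distance computations.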
Nominal/Categorical Data
• Distance works naturally with numerical attributes.
• Binary categorical attributes can be encoded as 1 or 0.
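The speaker notes describe two encodings for nominal attributes. The first is a per-attribute difference of 0 when values match and 1 otherwise. The second is binary indicator variables for attributes with more than two values. A small sketch of both follows, with hypothetical function names. The bank field is the example from the notes.

```python
def nominal_difference(a, b):
    """Per-attribute difference for nominal values: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1

def one_hot(value, categories):
    """Binary indicator vector for a nominal value with several categories."""
    return [1 if value == c else 0 for c in categories]

banks = ["Bank1", "Bank2", "Bank3"]
print(nominal_difference("Bank1", "Bank3"))  # 1
print(one_hot("Bank2", banks))               # [0, 1, 0]
```

With one-hot vectors and Euclidean distance, two different categories end up sqrt(2) apart rather than 1. The 0/1 difference and one-hot encoding therefore weight a mismatch slightly differently.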
KNN Classification
[Figure: scatter plot of Loan ($0 to $250,000) against Age (0 to 70), with points marked Default and Non-Default.]
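KNN Classification – Distance
Age   Loan       Default   Distance
25    $40,000    N         102000
35    $60,000    N         82000
45    $80,000    N         62000
20    $20,000    N         122000
35    $120,000   N         22000
52    $18,000    N         124000
23    $95,000    Y         47000
40    $62,000    Y         80000
60    $100,000   Y         42000
48    $220,000   Y         78000
33    $150,000   Y         8000
48    $142,000   ?

Distance between two records (x_1, y_1) and (x_2, y_2), where x is Age and y is Loan:
$$ D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} $$

With K = 3 the nearest neighbors are at distances 8000 (Y), 22000 (N) and 42000 (Y), so the prediction is Default = Y. Because Loan values are much larger than Age values, Loan dominates these distances.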
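The unscaled distances and the K = 3 vote can be checked with a short Python sketch. It uses the table's values directly. The variable names are hypothetical.

```python
import math
from collections import Counter

# (age, loan in $) -> default flag, from the table above
train = [
    ((25, 40_000), "N"), ((35, 60_000), "N"), ((45, 80_000), "N"),
    ((20, 20_000), "N"), ((35, 120_000), "N"), ((52, 18_000), "N"),
    ((23, 95_000), "Y"), ((40, 62_000), "Y"), ((60, 100_000), "Y"),
    ((48, 220_000), "Y"), ((33, 150_000), "Y"),
]
query = (48, 142_000)

# Rank training records by raw (unscaled) Euclidean distance to the query.
ranked = sorted(train, key=lambda item: math.dist(item[0], query))
k = 3
for x, label in ranked[:k]:
    print(x, label, round(math.dist(x, query)))
prediction = Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
print("prediction:", prediction)
```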
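KNN Classification – Standardized Distance
Age     Loan   Default   Distance
0.125   0.11   N         0.7652
0.375   0.21   N         0.5200
0.625   0.31   N         0.3160
0       0.01   N         0.9245
0.375   0.50   N         0.3428
0.8     0.00   N         0.6220
0.075   0.38   Y         0.6669
0.5     0.22   Y         0.4437
1       0.41   Y         0.3650
0.7     1.00   Y         0.3861
0.325   0.65   Y         0.3771
0.7     0.61   ?

Each feature is min-max standardized:
$$ X_s = \frac{X - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} $$

With K = 3 the nearest neighbors are now at 0.3160 (N), 0.3428 (N) and 0.3650 (Y), so the majority vote gives Default = N. The unscaled distances vote Y, so standardization changes the prediction.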
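The standardized table can be reproduced by min-max scaling each feature with the training set's min and max. For Age these are 20 and 60; for Loan they are $18,000 and $220,000. The distances are then recomputed. This is a sketch with hypothetical names.

```python
import math
from collections import Counter

# (age, loan in $) -> default flag, from the table on the previous slide
train = [
    ((25, 40_000), "N"), ((35, 60_000), "N"), ((45, 80_000), "N"),
    ((20, 20_000), "N"), ((35, 120_000), "N"), ((52, 18_000), "N"),
    ((23, 95_000), "Y"), ((40, 62_000), "Y"), ((60, 100_000), "Y"),
    ((48, 220_000), "Y"), ((33, 150_000), "Y"),
]
query = (48, 142_000)

# Per-feature min and max over the training set.
mins = [min(x[j] for x, _ in train) for j in range(2)]
maxs = [max(x[j] for x, _ in train) for j in range(2)]

def scale(x):
    """Apply Xs = (X - Min) / (Max - Min) to each feature of x."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(x, mins, maxs))

scaled_train = [(scale(x), label) for x, label in train]
q = scale(query)
ranked = sorted(scaled_train, key=lambda item: math.dist(item[0], q))
k = 3
for x, label in ranked[:k]:
    print(label, round(math.dist(x, q), 4))
prediction = Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
print("prediction:", prediction)
```

The slide's table shows scaled values rounded to two decimals. The distances, however, match the unrounded computation above.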
Strengths of KNN
• Very simple and intuitive.
• Can be applied to data from any distribution (it makes no distributional assumptions).
• Classifies well if the number of samples is large enough.
Weaknesses of KNN
• Classifying a new example takes more time:
– the distance from the new example to every training example must be calculated and compared.
• Choosing k may be tricky.
• Needs a large number of samples for accuracy.



Editor's Notes

  1. Based on a similarity function: “tell me who your neighbors are, and I’ll tell you who you are”. One of the simplest algorithms.
  2. KNN was used in statistical estimation and pattern recognition from the beginning of the 1970s.
  3. Algorithm
  4. Example: weight and height of people, where we assume red = female and blue = male. Given a new measurement of height and weight: with K = 1 the nearest neighbor is female, so we assign class = female; with K = 5 there are 3 females and 2 males, so female wins the vote. Choose K odd: with K = 6 we can have ties.
  5. Although there are other possible choices, most instance-based learners use Euclidean distance.
  6. Although there are other possible choices, most instance-based learners use Euclidean distance. Other distance metrics may be more appropriate in special circumstances. Euclidean distance: sum the squared differences and take the square root. Manhattan distance: sum the absolute differences. Minkowski distance: a generalization that covers both. We use the common one (Euclidean).
  7. n = number of features (the dimension of the feature space).
  8. This single number K is given prior to the testing phase, and it decides how many neighbors influence the classification.
  9. If the sample set is finite: historically, the optimal K for most datasets has been between 3 and 10, which produces much better results than 1-NN.
  10. Some configuration of the data is needed before applying KNN, because different attributes are often measured on different scales. Numerical: the difference between two values is just the numerical difference between them, and it is this difference that is squared and added to yield the distance function. Nominal: the difference between two values that are not the same is often taken to be 1, whereas if the values are the same the difference is 0. No scaling is required in this case because only the values 0 and 1 are used.
  11. For attributes with more than two values, binary indicator variables are used as an implicit solution. Ex: a customer’s Bank field with 3 values: Bank1 = 100, Bank2 = 010, Bank3 = 001.
  12. That’s the price to pay.
  13. Robust: strong and healthy; vigorous.
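Note 6 mentions Manhattan and Minkowski distances alongside Euclidean. A minimal sketch of the Minkowski distance, (sum_i |x_i - y_i|^p)^(1/p), shows how p = 1 gives Manhattan and p = 2 gives Euclidean. The function name `minkowski` is hypothetical.

```python
def minkowski(x, y, p):
    """(sum_i |x_i - y_i|^p)^(1/p); p = 1 gives Manhattan, p = 2 gives Euclidean."""
    if p < 1:
        raise ValueError("p must be >= 1 for a proper distance")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

print(minkowski([0, 0], [3, 4], 1))  # Manhattan: |3| + |4|
print(minkowski([0, 0], [3, 4], 2))  # Euclidean
```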