Machine Learning and Data Mining: Prof. Alexander Ihler
[Figure: given training data of (Feature x, y) pairs, what is y(new) at a new input x(new)? Three panels pose the same question; a final panel shows a 2-D classification version with features X1, X2, class labels 0/1, and a query point "?"]
Nearest neighbor classifier
Predictor:
Given new features: find the nearest training example
Return its value
Typically Euclidean distance: D(x, x') = sqrt( Σ_j (x_j − x'_j)² )
[Figure: 2-D scatter (features X1, X2, labels 0/1) with a query point "?"; which training x is closest?]
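A minimal sketch of this predictor (Python with NumPy assumed; the function name and toy data are illustrative, not from the slides):

    import numpy as np

    def nearest_neighbor_predict(X_train, y_train, x_new):
        # Squared Euclidean distance from x_new to every training example
        dists = np.sum((X_train - x_new) ** 2, axis=1)
        # Index of the closest training example; return its stored value
        return y_train[np.argmin(dists)]

    # Tiny illustrative example
    X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]])
    y_train = np.array([0, 1, 0])
    print(nearest_neighbor_predict(X_train, y_train, np.array([2.9, 3.5])))  # -> 1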
Nearest neighbor classifier
[Figure: the same 2-D scatter (X1, X2), shading all points where we decide class 0]
Nearest neighbor classifier
Voronoi tessellation: each datum is assigned to a region in which all points are closer to it than to any other datum
[Figure: Voronoi regions over the training points (X1, X2); the decision boundary follows the region edges between the two classes]
Nearest neighbor classifier
Nearest neighbor: piecewise linear boundary
[Figure: 2-D scatter (X1, X2) with regions labeled Class 0 and Class 1, separated by a piecewise linear decision boundary]
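One way to see the piecewise linear structure is to evaluate the 1-NN rule over a dense grid and shade the predicted class; a rough sketch assuming NumPy and matplotlib (the random toy data are purely illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    def nn_predict(X_train, y_train, points):
        # Nearest-neighbor prediction for each row of `points`
        preds = np.empty(len(points), dtype=y_train.dtype)
        for i, p in enumerate(points):
            preds[i] = y_train[np.argmin(np.sum((X_train - p) ** 2, axis=1))]
        return preds

    # Illustrative 2-D training data: two noisy clusters, labels 0 and 1
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(20, 2)) + np.array([[-2, -2]] * 10 + [[2, 2]] * 10)
    y_train = np.array([0] * 10 + [1] * 10)

    # Evaluate the classifier on a grid covering the feature space
    xx, yy = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    zz = nn_predict(X_train, y_train, grid).reshape(xx.shape)

    plt.contourf(xx, yy, zz, alpha=0.3)                   # shaded decision regions
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train)  # training points
    plt.xlabel("X1"); plt.ylabel("X2")
    plt.show()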
More Data Points
[Figure: a larger 2-D training set (X1, X2) with points from class 1 and class 2]
More Complex Decision Boundary
In general: the nearest-neighbor classifier produces piecewise linear decision boundaries
[Figure: the class 1 / class 2 data (X1, X2) with the resulting, more complex piecewise linear boundary]
Regression
Usually just average the y-values of the k closest training examples
Classification
Find the k closest training examples; this yields k feature vectors and a set of k class labels
Pick the class label which is most common in this set (vote)
Classify x as belonging to this class
Note: for two-class problems, if k is odd (k = 1, 3, 5, …) there will never be any ties
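A compact sketch of both variants above, under the same assumptions as before (NumPy; function names are illustrative):

    import numpy as np
    from collections import Counter

    def k_nearest(X_train, x_new, k):
        # Indices of the k training examples closest to x_new (Euclidean)
        dists = np.sum((X_train - x_new) ** 2, axis=1)
        return np.argsort(dists)[:k]

    def knn_regress(X_train, y_train, x_new, k):
        # Regression: average the y-values of the k closest training examples
        return y_train[k_nearest(X_train, x_new, k)].mean()

    def knn_classify(X_train, y_train, x_new, k):
        # Classification: majority vote over the k closest examples' labels
        labels = y_train[k_nearest(X_train, x_new, k)]
        return Counter(labels.tolist()).most_common(1)[0][0]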
kNN Decision Boundary
Recall: piecewise linear decision boundary
Increasing k simplifies decision boundary
Majority voting means less emphasis on individual points
[Figure: decision boundaries for K = 1 and K = 3]
kNN Decision Boundary
[Figure: decision boundaries for K = 5 and K = 7]
Error rates and K
[Figure: decision boundary for K = 25; plot of predictive error vs. K (# neighbors), with the best value of K marked]
Complexity & Overfitting
Complex model predicts all training points well
But doesn't generalize to new data points
K=1 : perfect memorization of examples (complex)
K=M : always predict the majority class in the dataset (simple)
Can select K using validation data, etc. (see the sketch below)
[Figure: model complexity vs. K (# neighbors): small K is too complex, large K is simpler]
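A minimal sketch of choosing K on held-out validation data (NumPy assumed; the candidate K values and error function are illustrative choices, not prescribed by the slides):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x_new, k):
        # Majority vote among the k nearest training examples
        dists = np.sum((X_train - x_new) ** 2, axis=1)
        labels = y_train[np.argsort(dists)[:k]]
        return Counter(labels.tolist()).most_common(1)[0][0]

    def validation_error(X_train, y_train, X_val, y_val, k):
        # Fraction of validation points the k-NN rule gets wrong
        preds = np.array([knn_classify(X_train, y_train, x, k) for x in X_val])
        return np.mean(preds != y_val)

    def select_k(X_train, y_train, X_val, y_val, ks=(1, 3, 5, 7, 9, 15, 25)):
        # Keep the K with the lowest validation error
        errors = {k: validation_error(X_train, y_train, X_val, y_val, k) for k in ks}
        return min(errors, key=errors.get)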
K-Nearest Neighbor (kNN) Classifier
Theoretical Considerations
As k increases:
we are averaging over more neighbors
the effective decision boundary becomes smoother
As the number of training examples increases, the optimal k value tends to increase
For k = 1, as the number of examples m grows to infinity: error rate < 2× the optimal (Bayes) error rate
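That last statement is the classic Cover & Hart (1967) asymptotic result; a common form for two classes, writing R* for the Bayes-optimal error rate and R_1NN for the limiting 1-NN error rate, is:

    R^* \;\le\; R_{1\text{-NN}} \;\le\; 2R^*(1 - R^*) \;\le\; 2R^*

So with unlimited data, the 1-NN rule is never worse than twice the best achievable error rate.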