Module3
[Figure: k-NN classification — compute the distance between the test record and the training records, then choose the k "nearest" records]
Nearest-Neighbor Classifiers
Classifying an unknown record requires three things:
– The set of stored records
– Distance Metric to compute
distance between records
– The value of k, the number of
nearest neighbors to retrieve
Compute the distance between two records, e.g. the Euclidean distance:

$$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$$
Nearest Neighbor Classification…
• Scaling issues
• Attributes may have to be scaled to prevent distance
measures from being dominated by one of the
attributes
• Example:
• height of a person may vary from 1.5m to 1.8m
• weight of a person may vary from 90lb to 300lb
• income of a person may vary from $10K to $1M
• Solution: Normalize the vectors to unit length
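A minimal sketch of the scaling idea above: rescaling each attribute (here, min–max scaling to [0, 1]) keeps a wide-range attribute like income from dominating the distance. The function name and sample values are illustrative.

```python
# Sketch: rescale an attribute so no single one dominates the distance.
def min_max_scale(values):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [1.5, 1.65, 1.8]              # metres, range 1.5-1.8
incomes = [10_000, 500_000, 1_000_000]  # dollars, range $10K-$1M

print(min_max_scale(heights))  # all values now lie in [0, 1]
print(min_max_scale(incomes))  # comparable scale to heights
```

After scaling, a 0.15 m height difference and a $495K income difference contribute comparably to the distance, instead of income swamping everything else.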
Nearest Neighbor Classification…
1. Let k be the number of nearest neighbors and D be the set of training examples.
2. for each test example z = (x′, y′) do
   2.1 Compute d(x′, x), the distance between z and every example (x, y) ∈ D.
   2.2 Select Dz ⊆ D, the set of k closest training examples to z.
   2.3 y′ = argmax_v Σ_{(xi, yi) ∈ Dz} I(v = yi)   (majority vote over the labels in Dz)
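The steps above can be sketched in a few lines. This is a minimal illustration (Euclidean distance, unweighted majority vote); the function and variable names are made up for the example, not from any particular library.

```python
import math
from collections import Counter

def knn_classify(D, z, k):
    """D: list of (x, y) training pairs; z: test attribute vector."""
    # 2.1 compute d(z, x) for every (x, y) in D
    dists = [(math.dist(z, x), y) for x, y in D]
    # 2.2 select Dz, the k closest training examples to z
    Dz = sorted(dists)[:k]
    # 2.3 majority vote over the labels in Dz
    return Counter(y for _, y in Dz).most_common(1)[0][0]

D = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
     ((5.0, 5.0), "B"), ((5.1, 4.8), "B")]
print(knn_classify(D, (1.1, 1.0), k=3))  # prints A
```

With k = 3 the two nearby "A" records outvote the single "B" that makes it into the neighbor set.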
General Procedure for Ensemble Methods
Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt
Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct
Step 3: Combine the classifiers into C*
Bagging
• Sampling with replacement: each bootstrap sample has the same size as the original data, so some records appear more than once and others not at all.

Original Data:      1  2  3   4  5  6  7   8   9  10
Bagging (Round 1):  7  8  10  8  2  5  10  10  5  9
Bagging (Round 2):  1  4  9   1  2  3  2   7   3  2
Bagging (Round 3):  1  8  5   10 5  5  9   6   3  7
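The bootstrap step shown in the rounds above can be sketched as follows; drawing with replacement is what makes some records repeat while others drop out. The function name is illustrative.

```python
import random

def bootstrap_sample(records, rng):
    """Draw len(records) items *with replacement* (one bagging round)."""
    return [rng.choice(records) for _ in records]

data = list(range(1, 11))
rng = random.Random(0)  # fixed seed so the draws are repeatable
for r in range(3):
    print(f"Round {r + 1}:", bootstrap_sample(data, rng))
```

On average a bootstrap sample contains about 63% of the distinct original records; the rest are duplicates.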
Example of Bagging
• Refer to the notes for a numerical example.
• Data set used to construct an ensemble of bagging classifiers:

x:  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
y:  1    1    1    -1   -1   -1   -1   1    1    1
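A hedged sketch of bagging on this data set, using one-level "decision stumps" (x ≤ t predicts one label, else the other) as base classifiers — a common choice for this example, though the notes' exact construction may differ. The helper names are made up for illustration.

```python
import random

X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Y = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]

def best_stump(xs, ys):
    """Return (threshold, left_label) minimizing training error."""
    best = None
    for t in [x + 0.05 for x in xs]:       # candidate split points
        for left in (1, -1):
            preds = [left if x <= t else -left for x in xs]
            err = sum(p != y for p, y in zip(preds, ys))
            if best is None or err < best[0]:
                best = (err, t, left)
    return best[1], best[2]

def bagging_predict(stumps, x):
    """Unweighted majority vote over the ensemble's stumps."""
    votes = sum(left if x <= t else -left for t, left in stumps)
    return 1 if votes >= 0 else -1

rng = random.Random(1)
stumps = []
for _ in range(10):                             # 10 bagging rounds
    idx = [rng.randrange(len(X)) for _ in X]    # bootstrap sample
    stumps.append(best_stump([X[i] for i in idx], [Y[i] for i in idx]))

print([bagging_predict(stumps, x) for x in X])
```

No single stump can fit this data (the labels flip twice), which is exactly why combining many stumps trained on different bootstrap samples can do better than any one of them.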
Boosting
• An iterative procedure that adaptively changes the distribution of the training data, focusing more on previously misclassified records
• Initially, all N records are assigned equal weights, 1/N
• Unlike bagging, the weights may change at the end of each boosting round
• From each boosting sample, a classifier is induced (iteratively) and used to classify all training examples
• Misclassified examples are assigned higher weights for the next round
Boosting
Error rate of classifier $C_i$:

$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j\, \delta\big(C_i(x_j) \neq y_j\big)$$

Importance of classifier $C_i$:

$$\alpha_i = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$
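A quick numerical illustration of the importance formula above: an accurate classifier (small error) gets a large positive weight, while one that is no better than a coin flip (error 0.5) gets weight zero.

```python
import math

def alpha(eps):
    """Importance weight: 0.5 * ln((1 - eps) / eps)."""
    return 0.5 * math.log((1 - eps) / eps)

print(alpha(0.1))  # large positive weight for an accurate classifier
print(alpha(0.5))  # -> 0.0: a coin-flip classifier gets no say
```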
Example: AdaBoost
• Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor.
AdaBoost
• Final classification:

$$C^*(x) = \underset{y}{\arg\max} \sum_{j=1}^{T} \alpha_j\, \delta\big(C_j(x) = y\big)$$
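The bookkeeping of one AdaBoost round can be sketched directly from the update rule above: each weight shrinks by exp(−α) if its record was classified correctly, grows by exp(α) otherwise, then all weights are renormalized by Z_j. The function name and the example values are illustrative.

```python
import math

def update_weights(w, correct, alpha_j):
    """One AdaBoost weight update; `correct[i]` says if record i was right."""
    raw = [wi * math.exp(-alpha_j if c else alpha_j)
           for wi, c in zip(w, correct)]
    Z = sum(raw)                        # normalization factor Z_j
    return [r / Z for r in raw]

w = [0.25] * 4                          # N = 4 records, equal initial weights
correct = [True, True, True, False]     # this round's classifier misses record 4
w = update_weights(w, correct, alpha_j=0.8)
print(w)  # the misclassified record now carries more weight
```

After the update the weights still sum to 1 (that is what Z_j enforces), but the misclassified record dominates, so the next round's sample focuses on it.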