Module 3


Agenda

• Nearest Neighbor techniques
• Cost functions and optimization techniques
• Introduction to Gradient Descent and its application to Linear Regression
• Ensemble Learning algorithms – Bagging (Random Forest), Boosting (AdaBoost)
Nearest Neighbor Classifiers
• Basic idea:
• If it walks like a duck and quacks like a duck, then it’s probably a duck.
[Figure: compute the distance from the test record to the training records, then choose the k “nearest” records.]
Nearest-Neighbor Classifiers
• Requires three things:
  – The set of stored records
  – A distance metric to compute the distance between records
  – The value of k, the number of nearest neighbors to retrieve

• To classify an unknown record:
  – Compute its distance to the training records
  – Identify the k nearest neighbors
  – Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a test point x]

• The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
Nearest Neighbor Classification
• Compute the distance between two points:
  • Euclidean distance:

      d(p, q) = √( Σ_i (p_i − q_i)² )
• Determine the class from the nearest-neighbor list:
  • Take the majority vote of the class labels among the k nearest neighbors
  • Optionally weigh each vote according to distance, e.g. with weight factor w = 1/d²
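
A minimal sketch of these two steps in Python (the helpers euclidean and weighted_vote are illustrative names, not from the slides): it computes the Euclidean distance and combines the k nearest neighbors with a distance-weighted vote, w = 1/d².

```python
import math
from collections import defaultdict

def euclidean(p, q):
    # d(p, q) = sqrt( sum_i (p_i - q_i)^2 )
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def weighted_vote(neighbors):
    # neighbors: list of (distance, class_label) pairs for the k nearest records;
    # each vote is weighted by w = 1/d^2, so closer neighbors count more
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d ** 2 + 1e-12)   # small epsilon guards against d = 0
    return max(scores, key=scores.get)

print(euclidean((1.0, 2.0), (4.0, 6.0)))                               # -> 5.0
print(weighted_vote([(0.5, "duck"), (1.0, "duck"), (0.8, "goose")]))   # -> "duck"
```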
Nearest Neighbor Classification…
• Choosing the value of k:
  • If k is too small, the classifier is susceptible to overfitting due to noise points in the training data.
  • If k is too large, the neighborhood may include points from other classes.
Nearest Neighbor Classification…
• Scaling issues
  • Attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes.
  • Example:
    • height of a person may vary from 1.5 m to 1.8 m
    • weight of a person may vary from 90 lb to 300 lb
    • income of a person may vary from $10K to $1M
  • Solution: normalize the attribute vectors, e.g. to unit length (see the sketch below).
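
A short sketch of the scaling step on made-up records with the three attributes above. Min-max scaling of each attribute is shown as one common option alongside the slide’s unit-length normalization; the sample values are purely illustrative.

```python
import numpy as np

# made-up records: [height (m), weight (lb), income ($)]
X = np.array([
    [1.5,  90.0,    10_000.0],
    [1.8, 300.0, 1_000_000.0],
    [1.7, 160.0,    55_000.0],
])

# option 1: min-max scale each attribute to [0, 1] so income no longer dominates the distance
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# option 2 (as on the slide): normalize each record vector to unit length
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

print(X_minmax)
print(X_unit)
```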
Nearest Neighbor Classification…
1. Let k be the number of nearest neighbors and D be the set of training examples.
2. for each test example z = (x’, y’) do
   2.1 Compute d(x’, x), the distance between z and every example (x, y) ∈ D.
   2.2 Select D_z ⊆ D, the set of the k training examples closest to z.
   2.3 y’ = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i)   (majority vote)
   2.4 end for


Nearest Neighbor Classification…
• k-NN classifiers are lazy learners
  • They do not build models explicitly
  • Unlike eager learners such as decision tree induction and rule-based systems
  • Classifying unknown records is therefore relatively expensive
Very Important Topic
• Gradient Descent in Linear Regression:
  https://www.analyticsvidhya.com/blog/2021/04/gradient-descent-in-linear-regression/
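
As a brief illustration of the linked topic, here is a hedged sketch of batch gradient descent for simple linear regression, minimizing the mean squared error cost on made-up synthetic data; the learning rate and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + 4.0 + rng.normal(0, 1, 100)   # synthetic data with a known slope and intercept

w, b, lr = 0.0, 0.0, 0.01                   # start from zero; illustrative learning rate
for _ in range(2000):
    y_hat = w * x + b                       # current predictions
    error = y_hat - y
    dw = 2 * np.mean(error * x)             # dJ/dw for J = mean squared error
    db = 2 * np.mean(error)                 # dJ/db
    w -= lr * dw
    b -= lr * db

print(round(w, 2), round(b, 2))             # should land near the true values 3 and 4
```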
Ensemble Methods/Learning
• Construct a set of classifiers from the training data
• Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
• Improves the classification accuracy
• The predicted outputs of the base classifiers are combined by majority voting
• Build different experts and let them vote.
General Idea
• Step 1: Create multiple data sets D1, D2, …, Dt-1, Dt from the original training data D.
• Step 2: Build multiple classifiers C1, C2, …, Ct-1, Ct, one from each data set.
• Step 3: Combine the classifiers into a single ensemble classifier C*.

(General procedure for ensemble methods)
Bagging
• Each bagging round draws a bootstrap sample (sampling with replacement, same size as the original data):

Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7
Example of Bagging
• Refer to the notes for a numerical example.
• Data set used to construct an ensemble of bagging classifiers (see the sketch below):

x   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.0
y    1    1    1   -1   -1   -1   -1    1    1    1
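
A rough sketch of bagging on this toy data set, assuming one-level decision stumps as base classifiers; the stump learner and the number of rounds are illustrative assumptions, not specified by the slides.

```python
import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

def fit_stump(xs, ys):
    # choose the split threshold and orientation with the fewest errors on (xs, ys)
    best = None
    for t in np.unique(xs):
        for left in (1, -1):
            pred = np.where(xs <= t, left, -left)
            err = np.sum(pred != ys)
            if best is None or err < best[0]:
                best = (err, t, left)
    return best[1], best[2]   # threshold, label for the "x <= t" side

rng = np.random.default_rng(42)
stumps = []
for _ in range(10):                                  # 10 bagging rounds
    idx = rng.integers(0, len(x), size=len(x))       # bootstrap sample, drawn with replacement
    stumps.append(fit_stump(x[idx], y[idx]))

# combine the base classifiers by majority vote (sign of the summed predictions)
votes = sum(np.where(x <= t, left, -left) for t, left in stumps)
print(np.sign(votes))
```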
Boosting
• An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
• Initially, all N records are assigned equal weights, 1/N
• Unlike bagging, the weights may change at the end of each boosting round
• From each boosting sample, a classifier is induced (iteratively) and is used to classify all training examples
• Misclassified examples are assigned higher weights for the next round
Boosting

• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased

Original Data       1  2  3  4  5  6  7  8  9  10
Boosting (Round 1)  7  3  2  8  7  9  4  10 6  3
Boosting (Round 2)  5  4  9  4  2  5  1  7  4  2
Boosting (Round 3)  4  4  8  10 4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in subsequent rounds
• The final ensemble is an aggregate of the base classifiers obtained from each boosting round
How Boosting Works?

Basic Idea
• Suppose there are just 5 training examples {1, 2, 3, 4, 5}
• Initially each example has probability 0.2 (= 1/5) of being sampled
• If the boosting sample for the first round is {2, 4, 4, 3, 2}, a base classifier is built from it
• Suppose 2, 3, 5 are correctly predicted by this classifier and 1, 4 are wrongly predicted:
  • the weights of 1 and 4 are increased
  • the weights of 2, 3 and 5 are decreased
• In the second round of boosting, again 5 samples are drawn, but now 1 and 4 are more likely to be sampled (see the sketch below)
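
A tiny sketch of this re-sampling idea; the up/down weight factors here are made up for illustration (AdaBoost’s actual exponential update appears later).

```python
import numpy as np

rng = np.random.default_rng(0)
examples = np.array([1, 2, 3, 4, 5])
weights = np.full(5, 0.2)                              # initial probability 1/5 each

round1 = rng.choice(examples, size=5, p=weights)       # first boosting sample, e.g. {2, 4, 4, 3, 2}

misclassified = np.isin(examples, [1, 4])              # suppose 1 and 4 were predicted wrongly
weights = np.where(misclassified, weights * 2.0, weights * 0.5)   # illustrative up/down factors
weights /= weights.sum()                               # renormalize to a sampling distribution

round2 = rng.choice(examples, size=5, p=weights)       # 1 and 4 are now more likely to appear
print(round1, weights.round(3), round2)
```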
Boosting
• Boosting is an iterative procedure.
• The distribution of training examples is adaptively changed so that the base classifier in the next iteration focuses more on the examples that were wrongly predicted in the previous iteration.
• Boosting assigns a weight to each example.
• Weights are adaptively changed at the end of each boosting round.
• The weights assigned to the training examples are used in the following ways:
Boosting
1. To draw a set of bootstrap samples from the original data.
2. To let the base classifier learn a model that is biased towards higher-weight examples.

Steps:-
1. Initially the weights of all examples are the same, 1/N.
2. A sample is drawn as per the sampling distribution of the training examples to get a new training set.
3. A classifier is induced from this training set.
Boosting
4. All examples of the original data are classified using this classifier.
5. Wrongly classified examples get an increase in weight; correctly classified examples get a decrease in weight. So, wrongly classified examples receive more focus in subsequent iterations.
6. Repeat steps 2 to 5 k times (k = number of base classifiers).
Boosting
7. As the boosting rounds proceed, wrongly classified examples become more prevalent in the samples.
8. The final ensemble is obtained by aggregating the base classifiers from each boosting round.

Several implementations of the boosting algorithm have been developed. They all differ in terms of:
1) How the weights of the examples are updated.
2) How the predictions of the base classifiers are combined.
Example: AdaBoost

• Error rate of base classifier C_i:

    ε_i = (1/N) Σ_{j=1..N} w_j · I( C_i(x_j) ≠ y_j )

• Importance (weight) of classifier C_i:

    α_i = (1/2) · ln( (1 − ε_i) / ε_i )
Example: AdaBoost


• Weight update after boosting round j:

    w_i^(j+1) = ( w_i^(j) / Z_j ) × exp(−α_j)   if C_j(x_i) = y_i
    w_i^(j+1) = ( w_i^(j) / Z_j ) × exp(+α_j)   if C_j(x_i) ≠ y_i

  where Z_j is the normalization factor (so that the weights sum to 1).
AdaBoost

• Final ensemble classifier:

    C*(x) = argmax_y Σ_{j=1..T} α_j · I( C_j(x) = y )
