Ensemble Classifiers
Prof. Navneet Goyal
Ensemble Classifiers
Introduction & Motivation
Construction of Ensemble Classifiers
Boosting (AdaBoost)
Bagging
Random Forests
Empirical Comparison
Introduction & Motivation
Ensemble Methods
Construct a set of classifiers from the training data
Predict class label of previously unseen records by aggregating predictions made by multiple classifiers
General Idea
Manipulating the training set
Manipulating the input features
Manipulating the class labels
Manipulating the learning algorithm
Ensemble Classifiers
Ensemble methods work better with unstable classifiers
Classifiers that are sensitive to minor perturbations in the training set
Examples:
Decision trees
Rule-based classifiers
Artificial neural networks
Why do ensembles work? Suppose there are 25 independent base classifiers, each with error rate ε = 0.35. The majority vote is wrong only when at least 13 of them are wrong:

$$e_{\text{ensemble}} = \sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}\,(1-\varepsilon)^{25-i} \approx 0.06$$
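As a quick check of this figure, a short Python sketch (the base error rate ε = 0.35 is the assumed value behind the 0.06 result):

```python
from math import comb

# Probability that a majority vote of 25 independent base classifiers errs,
# i.e. that 13 or more of them are wrong, assuming each errs with eps = 0.35.
eps, n = 0.35, 25
ensemble_error = sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
                     for i in range(13, n + 1))
print(round(ensemble_error, 2))   # ~0.06
```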
Examples of Ensemble Methods
How to generate an ensemble of classifiers?
Bagging
Boosting
Random Forests
Bagging
Also known as bootstrap aggregation
Original Data:       1    2    3    4    5    6    7    8    9    10
Bagging (Round 1):   7    8    10   8    2    5    10   10   5    9
Bagging (Round 2):   1    4    9    1    2    3    2    7    3    2
Bagging (Round 3):   1    8    5    10   5    5    9    6    3    7

Each round is a bootstrap sample drawn with replacement, the same size as the original data.
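A minimal sketch of how such bootstrap samples can be drawn (illustrative code; the seed is arbitrary, so the draws will not match the table exactly):

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed for reproducibility
data = np.arange(1, 11)                  # the original records 1..10

# Each bagging round draws a sample of the same size, with replacement,
# so some records repeat while others are left out of that round.
for r in range(1, 4):
    print(f"Round {r}:", rng.choice(data, size=len(data), replace=True))
```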
Bagging
Accuracy of bagging: illustrated on a one-dimensional data set with 10 records
x:   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
y:    1     1     1    -1    -1    -1    -1     1     1     1
Bagging
Decision stump: a single-level binary decision tree
Using entropy, the best split is at x <= 0.35 or x <= 0.75
Either way, accuracy is at most 70% (see the sketch below)
Bagging
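A minimal sketch of this example, assuming scikit-learn is available: a single depth-1 tree (decision stump) tops out at 70% on the ten points above, while bagging votes over many stumps trained on bootstrap samples. Whether the ensemble reaches 100%, as in the slides, depends on the particular bootstrap draws.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# The ten training points from the example above.
X = np.linspace(0.1, 1.0, 10).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y)
print("single stump accuracy:", stump.score(X, y))    # 0.7 at best

# 'estimator=' in recent scikit-learn; older versions call it 'base_estimator='.
bagged = BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                           n_estimators=50, random_state=0)
bagged.fit(X, y)
print("bagged stumps accuracy:", bagged.score(X, y))  # majority vote of 50 stumps
```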
Boosting
An iterative procedure to adaptively change the distribution of the training data by focusing more on previously misclassified records
Initially, all N records are assigned equal weights
Unlike bagging, weights may change at the end of a boosting round
Boosting
Records that are wrongly classified will have their weights increased
Records that are classified correctly will have their weights decreased
Original Data:        1    2    3    4    5    6    7    8    9    10
Boosting (Round 1):   7    3    2    8    7    9    4    10   6    3
Boosting (Round 2):   5    4    9    4    2    5    1    7    4    2
Boosting (Round 3):   4    4    8    10   4    5    4    6    3    4

Record 4 is hard to classify, so its weight grows and it is sampled more and more often in later rounds (a sampling sketch follows).
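A minimal sketch of the weighted sampling behind these rounds (illustrative; np.random.choice takes the current weights as sampling probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed
records = np.arange(1, 11)               # records 1..10
weights = np.full(10, 0.1)               # round 1: equal weights

# Draw with replacement in proportion to the weights; after each round the
# weights of misclassified records grow, so hard records (like record 4
# above) are picked more and more often.
sample = rng.choice(records, size=10, replace=True, p=weights)
print(sample)
```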
Boosting
Equal weights are assigned to each training tuple (1/d for round 1)
After a classifier Mi is learned, the weights are adjusted to allow the subsequent classifier Mi+1 to pay more attention to the tuples that were misclassified by Mi
The final boosted classifier M* combines the votes of each individual classifier
The weight of each classifier's vote is a function of its accuracy
AdaBoost is a popular boosting algorithm
AdaBoost
Input:
Training set D containing d tuples
k, the number of rounds (one classifier is generated per round)
A classification learning scheme
Output:
A composite model
AdaBoost
Data set D contains d class-labeled tuples (X1,y1), (X2,y2), ..., (Xd,yd)
Initially, each tuple is assigned an equal weight of 1/d
To generate k base classifiers, we need k rounds (iterations)
In round i, tuples from D are sampled with replacement to form Di (of size d)
Each tuple's chance of being selected depends on its weight
AdaBoost
Base classifier Mi is derived from the training tuples of Di
The error of Mi is tested using Di
The weights of the training tuples are adjusted depending on how they were classified:
Correctly classified: decrease weight
Incorrectly classified: increase weight
AdaBoost
Some classifiers may be better at classifying some "hard" tuples than others
We finally have a series of classifiers that complement each other!
Error rate of model Mi:

$$error(M_i) = \sum_{j=1}^{d} w_j \cdot err(X_j)$$

where err(Xj) is the misclassification error of tuple Xj: err(Xj) = 1 if Xj is misclassified, and 0 otherwise
If the classifier's error exceeds 0.5, we abandon it
Try again with a new Di and a new Mi derived from it
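For concreteness, a tiny sketch of this computation with made-up weights and predictions:

```python
import numpy as np

w = np.array([0.1, 0.1, 0.2, 0.2, 0.4])   # current tuple weights (sum to 1)
err = np.array([0, 1, 0, 1, 0])           # err(Xj): 1 if Mi misclassified Xj

error_Mi = np.sum(w * err)                # weighted error of Mi
print(round(error_Mi, 2))                 # 0.3 < 0.5, so Mi is kept
```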
AdaBoost
error(Mi) affects how the weights of the training tuples are updated
If a tuple is correctly classified in round i, its weight is multiplied by

$$\frac{error(M_i)}{1 - error(M_i)}$$

(the weights of all tuples are then normalized)
Classifier Mi's weight is

$$\log\frac{1 - error(M_i)}{error(M_i)}$$
AdaBoost
The lower a classifier's error rate, the more accurate it is, and therefore the higher its weight for voting should be
The weight of classifier Mi's vote is

$$\log\frac{1 - error(M_i)}{error(M_i)}$$
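For example, with a hypothetical error rate of error(Mi) = 0.25:

$$\frac{error(M_i)}{1-error(M_i)} = \frac{0.25}{0.75} = \frac{1}{3}, \qquad \log\frac{1-error(M_i)}{error(M_i)} = \log 3$$

so correctly classified tuples keep only a third of their weight, while Mi receives a comparatively large vote.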
Example: AdaBoost
Base classifiers: C1, C2, ..., CT
Error rate of classifier Ci:

$$\varepsilon_i = \frac{1}{N}\sum_{j=1}^{N} w_j\,\delta\big(C_i(x_j) \neq y_j\big)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$$
Example: AdaBoost
Weight update:

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where Zj is the normalization factor
Final classification:

$$C^{*}(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j\,\delta\big(C_j(x) = y\big)$$
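Putting the error rate, importance, and weight-update formulas together, a compact sketch of AdaBoost with decision stumps (assuming scikit-learn; weighted fitting via sample_weight stands in for drawing Di by weighted resampling, which is an equivalent way of making each round focus on the heavier points):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    """Sketch of AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                    # round 1: equal weights 1/N
    classifiers, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # focus on the heavier points
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))          # weighted error rate of this round
        if eps >= 0.5:                         # too weak: abandon this classifier
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # importance
        w = w * np.exp(-alpha * y * pred)      # shrink correct, grow misclassified
        w = w / w.sum()                        # Z_j: normalize so weights sum to 1
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, np.array(alphas)

def adaboost_predict(classifiers, alphas, X):
    # C*(x): the class with the largest alpha-weighted vote (sign for +/-1 labels)
    votes = sum(a * c.predict(X) for c, a in zip(classifiers, alphas))
    return np.sign(votes)
```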
Illustrating AdaBoost
Figure: the data points used for training, shown with their initial (equal) weights
Random Forests
An ensemble method specifically designed for decision tree classifiers
Random Forests grow many classification trees (hence the name!)
An ensemble of unpruned decision trees
Each base classifier classifies a new input vector
The forest chooses the classification having the most votes (over all the trees in the forest)
Random Forests
Two sources of randomness are introduced: bagging and random input vectors
Each tree is grown using a bootstrap sample of the training data
At each node, the best split is chosen from a random sample of mtry variables instead of all variables (see the sketch below)
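A tiny sketch of the per-node feature sampling (illustrative values for the number of variables and mtry):

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
n_variables, mtry = 25, 5            # illustrative sizes

# At each node, only a random subset of mtry candidate variables is
# considered when searching for the best split.
candidate_vars = rng.choice(n_variables, size=mtry, replace=False)
print(candidate_vars)
```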
Random Forests
Random Forest Algorithm
If there are M input variables, a number m << M is specified; at each node, m variables are selected at random out of the M and the best of these m is used to split the node
Random Forest Algorithm
Out-of-bag (OOB) error gives a built-in estimate of generalization error
Good accuracy without over-fitting
Fast algorithm (can be faster than growing/pruning a single tree); easily parallelized
Handles high-dimensional data without much difficulty
Only one tuning parameter, mtry = √p, and results are usually not sensitive to it (see the sketch below)
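As an illustration of these properties, a minimal scikit-learn sketch on synthetic data (the data set and all parameter values are made up): max_features="sqrt" corresponds to mtry = √p, and oob_score=True reports the out-of-bag accuracy without a separate validation set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, purely for illustration (p = 25 input variables).
X, y = make_classification(n_samples=1000, n_features=25, random_state=0)

rf = RandomForestClassifier(n_estimators=200,      # number of unpruned trees
                            max_features="sqrt",   # mtry = sqrt(p) variables per split
                            oob_score=True,        # out-of-bag error estimate
                            n_jobs=-1,             # trees grow in parallel
                            random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
```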