
Ensemble Classifiers
Prof. Navneet Goyal

Ensemble Classifiers
Introduction & Motivation
Construction of Ensemble Classifiers
Boosting (AdaBoost)
Bagging
Random Forests

Empirical Comparison

Introduction & Motivation

Suppose that you are a patient with a set of symptoms.
Instead of taking the opinion of just one doctor (classifier), you decide to take the opinion of a few doctors!
Is this a good idea? Indeed it is.
Consult many doctors and then, based on their combined diagnoses, you can get a fairly accurate diagnosis.
Majority voting - bagging
More weight to the opinions of some good (accurate) doctors - boosting
In bagging, you give equal weight to all classifiers, whereas in boosting you weight each classifier according to its accuracy.

Ensemble Methods
Construct a set of classifiers from the training data
Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
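
As a minimal sketch of the aggregation step (not from the slides; the helper name `majority_vote` is illustrative), the snippet below combines the predictions of several already-trained base classifiers by majority vote:

```python
import numpy as np

def majority_vote(predictions):
    """Combine predictions from several base classifiers by majority vote.

    predictions: array of shape (n_classifiers, n_records) holding class labels.
    Returns an array of shape (n_records,) with the most frequent label per record.
    """
    predictions = np.asarray(predictions)
    n_records = predictions.shape[1]
    combined = np.empty(n_records, dtype=predictions.dtype)
    for j in range(n_records):
        labels, counts = np.unique(predictions[:, j], return_counts=True)
        combined[j] = labels[np.argmax(counts)]
    return combined

# Three base classifiers voting on four unseen records
votes = [[ 1, -1, 1,  1],
         [ 1,  1, 1, -1],
         [-1, -1, 1,  1]]
print(majority_vote(votes))  # [ 1 -1  1  1]
```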

General Idea

Ensemble Classifiers (EC)

An ensemble classifier constructs a set of base classifiers from the training data
Methods for constructing an EC:
Manipulating the training set
Manipulating the input features
Manipulating the class labels
Manipulating the learning algorithm

Ensemble Classifiers (EC)

Manipulating the training set
Multiple training sets are created by resampling the data according to some sampling distribution
The sampling distribution determines how likely it is that an example will be selected for training, and may vary from one trial to another
A classifier is built from each training set using a particular learning algorithm
Examples: Bagging & Boosting
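
A minimal sketch of the resampling step (the helper name `resample` is an illustrative assumption): a training set is drawn with replacement according to a sampling distribution over the examples, uniform for bagging and skewed towards hard examples for boosting.

```python
import numpy as np

rng = np.random.default_rng(0)

def resample(n_examples, weights=None):
    """Draw a training set of size n_examples by sampling indices with replacement.

    weights: sampling distribution over the examples (uniform if None, as in bagging;
    skewed towards hard examples in boosting).
    """
    return rng.choice(n_examples, size=n_examples, replace=True, p=weights)

print(resample(10))                            # uniform sampling, bagging-style
print(resample(10, weights=np.full(10, 0.1)))  # the same distribution given explicitly
```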

Ensemble Classifiers (EC)

Manipulating the input features
A subset of the input features is chosen to form each training set
The subset can be chosen randomly or based on inputs given by domain experts
Good for data that has redundant features
Random Forest is an example, which uses decision trees as its base classifiers
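
A minimal sketch of this random-subspace idea (the helper name `random_feature_subset` is illustrative): each base classifier sees only a randomly chosen subset of the feature columns.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_feature_subset(X, n_selected):
    """Return X restricted to n_selected randomly chosen feature columns."""
    features = rng.choice(X.shape[1], size=n_selected, replace=False)
    return X[:, features], features

X = rng.normal(size=(100, 20))            # 100 examples, 20 features
X_sub, chosen = random_feature_subset(X, n_selected=5)
print(chosen, X_sub.shape)                # 5 chosen feature indices, (100, 5)
```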

Ensemble Classifiers (EC)

Manipulating the class labels
Useful when the number of classes is sufficiently large
The training data is transformed into a binary class problem by randomly partitioning the class labels into two disjoint subsets, A0 & A1
The re-labelled examples are used to train a base classifier
By repeating the class-relabelling and model-building steps several times, an ensemble of base classifiers is obtained
How is a new tuple classified?
Example: error-correcting output coding (pp. 307)
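
A minimal sketch of the relabelling step (assuming integer class labels; the helper name `relabel_binary` is illustrative): the classes are split at random into two disjoint groups A0 and A1, and each example is mapped to 0 or 1 accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)

def relabel_binary(y):
    """Randomly partition the class labels into A0 and A1 and relabel y as 0/1."""
    classes = np.unique(y)
    mask = rng.random(classes.size) < 0.5      # True -> A1, False -> A0
    a1 = set(classes[mask])
    return np.array([1 if label in a1 else 0 for label in y]), a1

y = np.array([0, 3, 2, 1, 4, 2, 0, 3])
y_bin, a1 = relabel_binary(y)
print(a1, y_bin)    # which classes went to A1, and the new binary labels
```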

Ensemble Classifiers (EC)

Manipulating the learning algorithm
Learning algorithms can be manipulated in such a way that applying the algorithm several times on the same training data may result in different models
Example: an ANN can produce different models by changing the network topology or the initial weights of the links between neurons
Example: an ensemble of decision trees can be constructed by introducing randomness into the tree-growing procedure; instead of choosing the best split attribute at each node, we randomly choose one of the top k attributes
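
A minimal sketch of this randomized split choice (the function name and the example gain values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_split_attribute(scores, k=3):
    """Pick the split attribute at a node.

    Instead of always returning the best-scoring attribute, randomly choose one of
    the top-k attributes so that repeated runs grow different trees.
    scores: 1-D array of split scores (e.g. information gain), one per attribute.
    """
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k best attributes
    return rng.choice(top_k)

gains = np.array([0.02, 0.31, 0.12, 0.28, 0.05])
print(choose_split_attribute(gains, k=3))  # one of attributes 1, 3 or 2
```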

Ensemble Classifiers (EC)

The first three approaches are generic and can be applied to any classifier
The fourth approach depends on the type of classifier used
Base classifiers can be generated sequentially or in parallel

Ensemble Classifiers
Ensemble methods work better with unstable classifiers
Classifiers that are sensitive to minor perturbations in the training set
Examples:
Decision trees
Rule-based classifiers
Artificial neural networks

Why does it work?

Suppose there are 25 base classifiers
Each classifier has error rate \varepsilon = 0.35
Assume the classifiers are independent
The ensemble makes a wrong prediction only if a majority (at least 13) of the base classifiers are wrong:

P(\text{ensemble wrong}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06

Check for yourself that this is correct!
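
The 0.06 figure can be checked with a few lines of Python (25 independent base classifiers, each with error rate 0.35; the ensemble errs when at least 13 of them are wrong):

```python
from math import comb

eps, n = 0.35, 25
# P(at least 13 of the 25 independent base classifiers are wrong)
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 2))   # 0.06, as claimed on the slide
```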

Examples of Ensemble Methods
How to generate an ensemble of classifiers?
Bagging
Boosting
Random Forests

Bagging
Also known as bootstrap aggregation
Sampling uniformly with replacement:

Original Data:       1   2   3   4   5   6   7   8   9   10
Bagging (Round 1):   7   8   10  8   2   5   10  10  5   9
Bagging (Round 2):   1   4   9   1   2   3   2   7   3   2
Bagging (Round 3):   1   8   5   10  5   5   9   6   3   7

Build a classifier on each bootstrap sample
0.632 bootstrap:
Each bootstrap sample Di contains approx. 63.2% of the original training data
The remaining 36.8% are used as the test set
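
The 63.2% figure can be checked quickly: a bootstrap sample of size n leaves out any given record with probability (1 - 1/n)^n ≈ e^(-1) ≈ 0.368, so roughly 63.2% of the records appear in the sample. A minimal simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000                                    # size of the original training set
sample = rng.choice(n, size=n, replace=True)  # one bootstrap sample D_i
unique_fraction = np.unique(sample).size / n
print(round(unique_fraction, 3))              # ~0.632: fraction of originals appearing in D_i
```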

Bagging
Accuracy of bagging (.632 bootstrap estimate):

Acc(M) = \sum_{i=1}^{k} \left( 0.632 \cdot Acc(M_i)_{\text{test set}} + 0.368 \cdot Acc(M_i)_{\text{train set}} \right)

Works well for small data sets


Example:

x:   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
y:    1     1     1    -1    -1    -1    -1     1     1     1

Bagging
Decision stump:
A single-level binary decision tree
Entropy-based split at x <= 0.35 or x <= 0.75
Accuracy at most 70%

Bagging

Accuracy of the ensemble classifier: 100%
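
A minimal sketch of this experiment with scikit-learn, assuming the label pattern shown in the table above (whether bagging reaches exactly 100% depends on the bootstrap samples drawn):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data from the slides: ten one-dimensional points with blocks of +1 / -1 labels
X = np.linspace(0.1, 1.0, 10).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])

# A single decision stump (one-level tree) is at most 70% accurate here
stump = DecisionTreeClassifier(max_depth=1)
print(stump.fit(X, y).score(X, y))           # 0.7

# Bagging many stumps can recover the full pattern (often 100% on this data)
bag = BaggingClassifier(DecisionTreeClassifier(max_depth=1),
                        n_estimators=10, random_state=1)
print(bag.fit(X, y).score(X, y))             # frequently 1.0, as on the slide
```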

Bagging - Final Points

Works well if the base classifiers are unstable
Increases accuracy because it reduces the variance of the individual classifiers
Does not focus on any particular instance of the training data
Therefore, less susceptible to model overfitting when applied to noisy data
What if we want to focus on particular instances of the training data?

Boosting
An iterative procedure to adaptively change the distribution of the training data by focusing more on previously misclassified records
Initially, all N records are assigned equal weights
Unlike in bagging, the weights may change at the end of each boosting round

Boosting
Records that are wrongly classified will have their weights increased
Records that are classified correctly will have their weights decreased

Original Data:        1   2   3   4   5   6   7   8   9   10
Boosting (Round 1):   7   3   2   8   7   9   4   10  6   3
Boosting (Round 2):   5   4   9   4   2   5   1   7   4   2
Boosting (Round 3):   4   4   8   10  4   5   4   6   3   4

Example 4 is hard to classify
Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

Boosting
Equal weights are assigned to each training tuple (1/d for round 1)
After a classifier Mi is learned, the weights are adjusted to allow the subsequent classifier Mi+1 to pay more attention to the tuples that were misclassified by Mi
The final boosted classifier M* combines the votes of each individual classifier
The weight of each classifier's vote is a function of its accuracy
AdaBoost - a popular boosting algorithm

AdaBoost
Input:
A training set D containing d tuples
k rounds
A classification learning scheme

Output:
A composite model

AdaBoost
Data set D contains d class-labelled tuples (X1, y1), (X2, y2), (X3, y3), ..., (Xd, yd)
Initially, assign an equal weight of 1/d to each tuple
To generate k base classifiers, we need k rounds or iterations
In round i, tuples from D are sampled with replacement to form Di (of size d)
Each tuple's chance of being selected depends on its weight

AdaBoost
A base classifier Mi is derived from the training tuples of Di
The error of Mi is tested using Di
The weights of the training tuples are adjusted depending on how they were classified:
Correctly classified: decrease weight
Incorrectly classified: increase weight
The weight of a tuple indicates how hard it is to classify (directly proportional)

AdaBoost
Some classifiers may be better at classifying some hard tuples than others
We finally have a series of classifiers that complement each other!
Error rate of model Mi:

error(M_i) = \sum_{j=1}^{d} w_j \cdot err(X_j)

where err(Xj) is the misclassification error of Xj (= 1 if Xj is misclassified, 0 otherwise)
If the classifier's error exceeds 0.5, we abandon it
Try again with a new Di and a new Mi derived from it

AdaBoost
error(Mi) affects how the weights of the training tuples are updated
If a tuple is correctly classified in round i, its weight is multiplied by

\frac{error(M_i)}{1 - error(M_i)}

Adjust the weights of all correctly classified tuples
Then the weights of all tuples (including the misclassified tuples) are normalized:

\text{normalization factor} = \frac{\text{sum of old weights}}{\text{sum of new weights}}

The weight of classifier Mi's vote is

\log \frac{1 - error(M_i)}{error(M_i)}

AdaBoost
The lower a classifier's error rate, the more accurate it is, and therefore the higher its weight for voting should be
The weight of classifier Mi's vote is

\log \frac{1 - error(M_i)}{error(M_i)}

For each class c, sum the weights of every classifier that assigned class c to X (an unseen tuple)
The class with the highest sum is the WINNER!
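
A minimal sketch of this weighted vote (the trained base classifiers, their error rates, and all names here are illustrative assumptions):

```python
import math
from collections import defaultdict

def weighted_vote(classifiers, errors, x):
    """Classify an unseen tuple x by AdaBoost-style weighted voting.

    classifiers: list of trained models, each callable on x and returning a class label.
    errors: list of error(M_i) values, each assumed to lie in (0, 0.5).
    """
    class_weights = defaultdict(float)
    for model, err in zip(classifiers, errors):
        vote_weight = math.log((1 - err) / err)   # weight of M_i's vote
        class_weights[model(x)] += vote_weight
    return max(class_weights, key=class_weights.get)

# Toy usage: three "classifiers" as plain functions, with made-up error rates
models = [lambda x: 1, lambda x: -1, lambda x: 1]
print(weighted_vote(models, errors=[0.10, 0.30, 0.45], x=None))   # 1
```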

Example: AdaBoost
Base classifiers: C1, C2, ..., CT
Error rate of classifier Ci:

\epsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)

Importance of a classifier:

\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \epsilon_i}{\epsilon_i}\right)

Example: AdaBoost
Weight update:

w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}

where Z_j is the normalization factor
If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the re-sampling procedure is repeated
Classification:

C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)
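
A minimal sketch of a boosting loop built directly from these formulas, using scikit-learn decision stumps as base classifiers; sample weights are updated in place rather than by re-sampling, which is a common equivalent variant (the data set here is synthetic and illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy binary data with labels in {-1, +1}
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

T = 10                                  # number of boosting rounds
N = len(y)
w = np.full(N, 1 / N)                   # start with equal weights 1/N
classifiers, alphas = [], []

for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    miss = pred != y
    eps = np.sum(w[miss])                               # weighted error rate epsilon_j
    if eps > 0.5:                                       # abandon the round, reset weights
        w = np.full(N, 1 / N)
        continue
    alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))   # importance alpha_j
    w = w * np.exp(np.where(miss, alpha, -alpha))       # raise/lower tuple weights
    w = w / w.sum()                                     # normalize (Z_j)
    classifiers.append(stump)
    alphas.append(alpha)

# Final classifier: sign of the alpha-weighted sum of base predictions
ensemble_pred = np.sign(sum(a * c.predict(X) for a, c in zip(alphas, classifiers)))
print("training accuracy:", np.mean(ensemble_pred == y))
```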

Illustrating AdaBoost

[Figure: training data points with the initial weights assigned to each data point]

Random Forests
An ensemble method specifically designed for decision tree classifiers
Random Forests grow many classification trees (that is why the name!)
Ensemble of unpruned decision trees
Each base classifier classifies a new vector
The forest chooses the classification having the most votes (over all the trees in the forest)

Random Forests
Introduces two sources of randomness: bagging and random input vectors
Each tree is grown using a bootstrap sample of the training data
At each node, the best split is chosen from a random sample of mtry variables instead of all variables

Random Forests

Random Forest Algorithm
If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node
m is held constant while the forest is grown
Each tree is grown to the largest extent possible
There is no pruning
Bagging using decision trees is a special case of random forests, obtained when m = M
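
A minimal sketch with scikit-learn (the Iris data and the parameter values are illustrative; `max_features` plays the role of m, and leaving `max_depth` unset grows unpruned trees):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 unpruned trees; at each node only a random subset of features ("sqrt" of M) is considered
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.score(X, y))   # accuracy of the forest's majority vote on the training data
```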

Random Forest Algorithm
Out-of-bag (OOB) error estimate (sketched below)
Good accuracy without over-fitting
Fast algorithm (can be faster than growing/pruning a single tree); easily parallelized
Handles high-dimensional data without much problem
Only one tuning parameter, mtry (typically √p for classification); results are usually not sensitive to it
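
A minimal sketch of the OOB estimate with scikit-learn (dataset and parameters illustrative): each tree is evaluated on the roughly 36.8% of tuples left out of its bootstrap sample, giving an error estimate without a separate test set.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                bootstrap=True, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB error estimate:", round(1 - forest.oob_score_, 3))
```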
