Ensembles of Classifiers: Evgueni Smirnov

This document discusses various ensemble methods for combining multiple classifiers to improve predictive accuracy. It describes methods for independently constructing ensembles like majority voting, bagging, and random forests which generate diverse classifiers by training on different subsets of the data. It also covers boosting and stacking which coordinate the construction of classifiers to be complementary. Key advantages of ensembles are reducing bias and variance to improve performance over single classifiers.

Ensembles of Classifiers

Evgueni Smirnov
Outline
• Methods for Independently Constructing Ensembles
– Majority Vote
– Bagging and Random Forest
– Randomness Injection
– Feature-Selection Ensembles
– Error-Correcting Output Coding
• Methods for Coordinated Construction of Ensembles
– Boosting
– Stacking
• Reliable Classification: Meta-Classifier Approach
• Co-Training and Self-Training
Ensembles of Classifiers
• Basic idea is to learn a set of
classifiers (experts) and to allow them
to vote.
• Advantage: improvement in
predictive accuracy.
• Disadvantage: it is difficult to
understand an ensemble of classifiers.
Why do ensembles work?
Dietterich (2002) showed that ensembles overcome three problems:
• The Statistical Problem arises when the hypothesis space is too
large for the amount of available data. Hence, there are many
hypotheses with the same accuracy on the data and the learning
algorithm chooses only one of them! There is a risk that the
accuracy of the chosen hypothesis is low on unseen data!
• The Computational Problem arises when the learning algorithm
cannot guarantee finding the best hypothesis.
• The Representational Problem arises when the hypothesis space
does not contain any good approximation of the target class(es).

The statistical problem and the computational problem result in the
variance component of the error of the classifiers!
The representational problem results in the bias component of the
error of the classifiers!
Methods for Independently
Constructing Ensembles
One way to force a learning algorithm to construct
multiple hypotheses is to run the algorithm several
times and provide it with somewhat different data in
each run. This idea is used in the following methods:
• Majority Voting
• Bagging
• Randomness Injection
• Feature-Selection Ensembles
• Error-Correcting Output Coding.
Majority Vote
[Figure: Step 1 builds multiple classifiers C1, C2, ..., Ct-1, Ct from the original training data D; Step 2 combines them into a single classifier C*.]
Why does Majority Voting work?
• Suppose there are 25 base classifiers
– Each classifier has error rate ε = 0.35
– Assume errors made by classifiers are uncorrelated
– Probability that the ensemble classifier makes a wrong prediction,
where X is the number of base classifiers that err:
$$P(X \ge 13) = \sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}\,(1-\varepsilon)^{25-i} \approx 0.06$$
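The value 0.06 can be checked numerically. The sketch below evaluates the binomial sum directly; the function name `ensemble_error` is an illustrative choice and not part of the original slides.

```python
from math import comb

def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of independent base classifiers err."""
    majority = n_classifiers // 2 + 1  # 13 out of 25
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )

print(round(ensemble_error(25, 0.35), 2))  # → 0.06
```

The result depends on the independence assumption: if the errors of the base classifiers are correlated, the ensemble gains less.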
Bagging
• Employs simplest way of combining predictions that
belong to the same type.
• Combining can be realized with voting or averaging
• Each model receives equal weight
• “Idealized” version of bagging:
– Sample several training sets of size n (instead of just
having one training set of size n)
– Build a classifier for each training set
– Combine the classifiers’ predictions
• This improves performance in almost all cases if the
learning scheme is unstable (e.g. decision trees)
Bagging classifiers
Classifier generation
  Let n be the size of the training set.
  For each of t iterations:
    Sample n instances with replacement from the training set.
    Apply the learning algorithm to the sample.
    Store the resulting classifier.

Classification
  For each of the t classifiers:
    Predict class of instance using classifier.
  Return class that was predicted most often.
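The two procedures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the base learner `learn_1nn` (1-nearest-neighbour on one-dimensional inputs), the toy data, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def learn_1nn(sample):
    """Toy base learner (assumption): 1-nearest-neighbour on 1-D inputs."""
    def classify(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return classify

def bagging_fit(train, learn, t, rng):
    """Classifier generation: t bootstrap samples, one classifier each."""
    n = len(train)
    classifiers = []
    for _ in range(t):
        sample = [rng.choice(train) for _ in range(n)]  # with replacement
        classifiers.append(learn(sample))
    return classifiers

def bagging_predict(classifiers, x):
    """Classification: return the class predicted most often."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical data: class 0 around 0..9, class 1 around 20..29.
train = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(20, 30)]
ensemble = bagging_fit(train, learn_1nn, t=25, rng=random.Random(0))
print(bagging_predict(ensemble, -5.0), bagging_predict(ensemble, 35.0))
```

Any learner with the same calling convention (a list of `(x, class)` pairs in, a function `x -> class` out) can be substituted for `learn_1nn`.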
Why does bagging work?
• Bagging reduces variance by voting/
averaging, thus reducing the overall expected
error
– In the case of classification there are pathological
situations where the overall error might increase
– Usually, the more classifiers the better
Random Forest
Classifier generation
  Let n be the size of the training set.
  For each of t iterations:
    (1) Sample n instances with replacement from the training set.
    (2) Learn a decision tree such that the variable for any new node
        is the best variable among m randomly selected variables.
    (3) Store the resulting decision tree.

Classification
  For each of the t decision trees:
    Predict class of instance.
  Return class that was predicted most often.
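Step (2) is what distinguishes a random forest from plain bagging. The sketch below shows only that step, assuming categorical variables and Gini impurity as the split criterion (the slides fix neither); the function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, labels, var):
    """Weighted Gini impurity after splitting on the values of one variable."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[var], []).append(label)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())

def choose_split_variable(rows, labels, m, rng):
    """Pick the best variable among m randomly selected variables."""
    candidates = rng.sample(range(len(rows[0])), m)
    return min(candidates, key=lambda v: split_impurity(rows, labels, v))

# Hypothetical data: variable 0 determines the class, variables 1 and 2 do not.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = [0, 0, 1, 1]
rng = random.Random(0)
print(choose_split_variable(rows, labels, m=3, rng=rng))  # → 0
```

With m smaller than the number of variables, the best variable is sometimes not among the candidates, which is what makes the trees in the forest differ from one another.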
Bagging and Random Forest
• Bagging usually improves decision trees.
• Random forest usually outperforms
bagging due to the fact that errors of the
decision trees in the forest are less
correlated.
Randomization Injection

• Inject some randomization into a standard
learning algorithm (usually easy):
– Neural network: random initial weights
– Decision tree: when splitting, choose one of the
top N attributes at random (uniformly)
• Dietterich (2000) showed that 200 randomized
trees are statistically significantly better than
C4.5 for over 33 datasets!
Feature-Selection Ensembles
• Key idea: Provide a different subset of the input
features in each call of the learning algorithm.
• Example: Cherkauer (1996), in a project on identifying volcanoes
on Venus, trained an ensemble of 32 neural networks. The 32
networks were based on 8 different subsets of 119
available features and 4 different algorithms. The
ensemble was significantly better than any of the
neural networks!
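Error-Correcting Output Codes
• Exhaustive Codes:
              Binary Classification Problems
Classes   BP1   BP2   BP3   BP4   BP5   BP6   BP7
y1         +1    +1    +1    +1    +1    +1    +1
y2         +1    +1    +1    -1    -1    -1    -1
y3         +1    -1    -1    +1    +1    -1    -1
y4         -1    +1    -1    +1    -1    +1    -1

• We obtain 2^(K-1) - 1 binary classifiers. The
final classification rule is the nearest neighbor: an instance is
assigned to the class whose code word is closest in Hamming distance
to the outputs of the binary classifiers.
Assume that for an instance x we have a code word
[+1, +1, +1, +1, +1, +1, -1]. The nearest class code word is that of
y1 (Hamming distance 1), so x is classified as y1.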
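The nearest-neighbor decoding rule can be sketched as follows. The code words are copied from the exhaustive-code table above; the names `CODE_WORDS`, `hamming` and `decode` are illustrative.

```python
# Class code words of the exhaustive code for K = 4 (rows of the table).
CODE_WORDS = {
    "y1": (+1, +1, +1, +1, +1, +1, +1),
    "y2": (+1, +1, +1, -1, -1, -1, -1),
    "y3": (+1, -1, -1, +1, +1, -1, -1),
    "y4": (-1, +1, -1, +1, -1, +1, -1),
}

def hamming(a, b):
    """Number of positions in which two code words differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(outputs, code_words=CODE_WORDS):
    """Return the class whose code word is nearest to the classifier outputs."""
    return min(code_words, key=lambda c: hamming(code_words[c], outputs))

x_outputs = (+1, +1, +1, +1, +1, +1, -1)
print(decode(x_outputs))  # → y1
```

The output word is at distance 1 from the code word of y1 and at distance 3 from the others, so a single erring binary classifier does not change the final class.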
Error-Correcting Output Codes (ECOC)
• An ECOC matrix M has to satisfy two properties:
– Row separation: any class code word in M should be well-separated
from all other class code words in M
– Column separation: any class-partition code word in M should be
well-separated from all other class-partition code words and their
complements.
              Binary Classification Problems
Classes   BP1   BP2   BP3   BP4   BP5   BP6   BP7
y1         +1    +1    +1    +1    +1    +1    +1
y2         +1    +1    +1    -1    -1    -1    -1
y3         +1    -1    -1    +1    +1    -1    -1
y4         -1    +1    -1    +1    -1    +1    -1

Example. For Exhaustive ECOC:
• the Hamming distance between class code words is 2^(|Y|-2);
• the minimal Hamming distance between class-partition code words is 1.
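Both statements in the example can be verified on the matrix for |Y| = 4. The following is a small check, with the matrix copied from the table and the helper names chosen for illustration.

```python
from itertools import combinations

# Exhaustive ECOC matrix for 4 classes: rows are classes, columns are BP1..BP7.
M = [
    (+1, +1, +1, +1, +1, +1, +1),
    (+1, +1, +1, -1, -1, -1, -1),
    (+1, -1, -1, +1, +1, -1, -1),
    (-1, +1, -1, +1, -1, +1, -1),
]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def column_separation(a, b):
    """Distance from a to the closer of b and the complement of b."""
    complement = tuple(-v for v in b)
    return min(hamming(a, b), hamming(a, complement))

columns = list(zip(*M))
row_distances = [hamming(a, b) for a, b in combinations(M, 2)]
column_distances = [column_separation(a, b) for a, b in combinations(columns, 2)]

print(set(row_distances))     # → {4}, i.e. 2^(4-2)
print(min(column_distances))  # → 1
```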
Error-Correcting Output Codes
• The number K of classes is greater than 2 and
we have only a binary classifier L.
• One-Against-All Strategy:
          Binary Classification Problems
Classes   BP1   BP2   BP3   BP4
y1         +1    -1    -1    -1
y2         -1    +1    -1    -1
y3         -1    -1    +1    -1
y4         -1    -1    -1    +1

• We obtain K binary classifiers. The
final classification rule is majority vote.
Error-Correcting Output Codes
• One-Against-One Strategy:
              Binary Classification Problems
Classes   BP1   BP2   BP3   BP4   BP5   BP6
y1         +1    +1    +1     0     0     0
y2         -1     0     0    +1    +1     0
y3          0    -1     0    -1     0    +1
y4          0     0    -1     0    -1    -1

• We obtain K(K-1)/2 binary classifiers.
The final classification rule is majority vote.
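The one-against-one matrix can be generated mechanically from the list of class pairs. A brief sketch, with the hypothetical helper `one_against_one`: for the pair (a, b), class a gets +1, class b gets -1, and every other class gets 0.

```python
from itertools import combinations

def one_against_one(classes):
    """Return the class pairs and the code word of each class."""
    pairs = list(combinations(classes, 2))
    code_words = {
        c: tuple(+1 if c == a else -1 if c == b else 0 for a, b in pairs)
        for c in classes
    }
    return pairs, code_words

pairs, code_words = one_against_one(["y1", "y2", "y3", "y4"])
print(len(pairs))  # → 6, i.e. K(K-1)/2 for K = 4
for c, word in code_words.items():
    print(c, word)
```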
Error-Correcting Output Codes
• Minimal Codes:
          Binary Classification Problems
Classes   BP1   BP2
y1         +1    -1
y2         +1    +1
y3         -1    -1
y4         -1    +1

• We obtain ⌈log2(K)⌉ binary classifiers. The
code word determines exactly the class.
• Problem: an error of one binary classifier causes an error
of the whole ensemble.
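The problem named in the last bullet is easy to demonstrate: with a minimal code every output word is a valid class code word, so one flipped bit silently yields another class. A small sketch using the table above (the names `MINIMAL_CODE`, `n_classifiers` and `decode` are illustrative):

```python
from math import ceil, log2

# Minimal code for K = 4: every possible output word maps to exactly one class.
MINIMAL_CODE = {
    (+1, -1): "y1",
    (+1, +1): "y2",
    (-1, -1): "y3",
    (-1, +1): "y4",
}

def n_classifiers(k):
    """Number of binary classifiers needed by a minimal code for k classes."""
    return ceil(log2(k))

def decode(outputs):
    return MINIMAL_CODE[tuple(outputs)]

correct = (+1, -1)                   # code word of y1
flipped = (-correct[0], correct[1])  # the first binary classifier errs
print(n_classifiers(4), decode(correct), decode(flipped))  # → 2 y1 y3
```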
Methods for Coordinated
Construction of Ensembles

The key idea is to learn complementary classifiers so
that instance classification is realized by taking a
weighted sum of the classifiers. This idea is used in
two methods:
• Boosting
• Stacking.
Boosting
• Also uses voting/averaging but models are
weighted according to their performance
• Iterative procedure: new models are influenced
by performance of previously built ones
– New model is encouraged to become expert for
instances classified incorrectly by earlier models
– Intuitive justification: models should be experts
that complement each other
• There are several variants of this algorithm
AdaBoost.M1
Classifier generation
  Assign equal weight to each training instance.
  For each of t iterations:
    Learn a classifier from weighted dataset.
    Compute error e of classifier on weighted dataset.
    If e equal to zero, or e greater or equal to 0.5:
      Terminate classifier generation.
    For each instance in dataset:
      If instance classified correctly by classifier:
        Multiply weight of instance by e / (1 - e).
    Normalize weight of all instances.

Classification
  Assign weight of zero to all classes.
  For each of the t classifiers:
    Add -log(e / (1 - e)) to weight of class predicted by the classifier.
  Return class with highest weight.
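The pseudocode above translates almost line by line into Python. The sketch below is one possible reading of it, not a reference implementation: the weighted base learner `learn_stump` (a threshold stump on one-dimensional inputs), the toy data and all function names are assumptions. As in the pseudocode, a classifier with e = 0 or e ≥ 0.5 stops generation and is not added, so the ensemble can be empty if this happens in the first iteration.

```python
import math

def learn_stump(data, weights):
    """Toy weighted base learner (assumption): best threshold stump on 1-D inputs."""
    xs = sorted({x for x, _ in data})
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for thr in thresholds:
        for left, right in ((0, 1), (1, 0)):
            err = sum(w for (x, y), w in zip(data, weights)
                      if (left if x <= thr else right) != y)
            if best is None or err < best[0]:
                best = (err, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x <= thr else right

def adaboost_m1(data, learn, t):
    """Classifier generation: returns a list of (classifier, vote weight)."""
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(t):
        clf = learn(data, weights)
        e = sum(w for (x, y), w in zip(data, weights) if clf(x) != y)
        if e == 0 or e >= 0.5:
            break  # terminate classifier generation
        ensemble.append((clf, -math.log(e / (1 - e))))
        # Down-weight correctly classified instances, then normalize.
        weights = [w * e / (1 - e) if clf(x) == y else w
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def classify(ensemble, x):
    """Classification: weighted vote of the stored classifiers."""
    scores = {}
    for clf, vote in ensemble:
        label = clf(x)
        scores[label] = scores.get(label, 0.0) + vote
    return max(scores, key=scores.get)

# Hypothetical data that no single stump can separate (at best 70% correct).
labels = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
data = [(float(x), y) for x, y in zip(range(1, 11), labels)]
ensemble = adaboost_m1(data, learn_stump, t=3)
print([classify(ensemble, x) for x, _ in data])
```

On this toy data, three boosted stumps classify all ten training instances correctly, which illustrates how simple, high-bias classifiers complement each other.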
Remarks on Boosting
• Boosting can be applied without weights by
resampling with probability determined by the weights;
• The training error decreases exponentially in
the number of iterations;
• Boosting works well if base classifiers are not too
complex and their error doesn’t become too large too
quickly!
• Boosting reduces the bias component of the error of
simple classifiers!
Stacking
• Uses meta learner instead of voting to
combine predictions of base learners
– Predictions of base learners (level-0 models) are
used as input for meta learner (level-1 model)
• Base learners are usually different learning
schemes
• Hard to analyze theoretically: “black magic”
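Stacking
[Figure: instance1 is passed to the base classifiers BC1, BC2, …, BCn, which predict 0, 1, …, 1. The predictions together with the true class form a meta instance.]
meta instances   BC1   BC2   …   BCn   Class
instance1         0     1    …    1      1
Stacking
[Figure: instance2 is passed to the base classifiers BC1, BC2, …, BCn, which predict 1, 0, …, 0. A second meta instance is added.]
meta instances   BC1   BC2   …   BCn   Class
instance1         0     1    …    1      1
instance2         1     0    …    0      0
Stacking
[Figure: the meta classifier is learned from the meta instances.]
meta instances   BC1   BC2   …   BCn   Class
instance1         0     1    …    1      1
instance2         1     0    …    0      0
Stacking
[Figure: to classify a new instance, the predictions of BC1, BC2, …, BCn (0, 1, …, 1) form a meta instance, which the meta classifier maps to the final class (1 in this example).]
meta instance    BC1   BC2   …   BCn
instance          0     1    …    1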
More on stacking
• Predictions on the training data can’t be used to generate
data for the level-1 model! The reason is that the level-1 model
would favor the level-0 classifier that best fits (and possibly
overfits) the training data.
• Thus, a k-fold cross-validation-like scheme is employed! An
example for k = 3:
            Fold 1   Fold 2   Fold 3
Run 1       train    train    test
Run 2       train    test     train
Run 3       test     train    train
Meta data   test     test     test
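The cross-validation-like scheme can be sketched as follows: every instance receives its level-0 predictions from models that were trained without it. The two toy base learners, the data and the function names are assumptions made for illustration.

```python
from collections import Counter

def learn_majority(train):
    """Toy level-0 learner (assumption): always predicts the majority class."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority

def learn_1nn(train):
    """Toy level-0 learner (assumption): 1-nearest-neighbour on 1-D inputs."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]

def make_meta_data(data, base_learners, k=3):
    """Build level-1 training data from out-of-fold level-0 predictions."""
    folds = [data[i::k] for i in range(k)]
    meta = []
    for i, test_fold in enumerate(folds):
        # Train on the other k-1 folds, predict on the held-out fold.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        models = [learn(train) for learn in base_learners]
        for x, y in test_fold:
            meta.append(([model(x) for model in models], y))
    return meta

# Hypothetical 1-D data set.
data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
meta = make_meta_data(data, [learn_majority, learn_1nn], k=3)
for features, label in meta:
    print(features, label)
```

A 1-nearest-neighbour learner illustrates the point of the slide well: evaluated on its own training data it is always right, so a level-1 model built from such predictions would trust it blindly.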

More on stacking
• If base learners can output probabilities it’s
better to use those as input to meta learner
• Which algorithm to use to generate meta
learner?
– In principle, any learning scheme can be
applied
– David Wolpert: “relatively global, smooth”
model
• Base learners do most of the work
• Reduces risk of overfitting
Some Practical Advice
• If the classifier is unstable (high variance), then apply
bagging!
• If the classifier is stable and simple (high bias) then
apply boosting!
• If the classifier is stable and complex then apply
randomization injection!
• If you have many classes and a binary classifier then
try error-correcting codes! If it does not work then use
a complex binary classifier!
Reliable Classification
• Classifiers applied in critical applications with
high misclassification costs need to determine
whether classifications they assign to
individual instances are indeed correct.
• We consider one of the simplest approaches
that is related to ensembles of classifiers:
– Meta-Classifier Approach
The Task of Reliable Classification
Given:
• Instance space X.
• Classifier space H.
• Class set Y.
• Training set D ⊆ X × Y.
Find:
• Classifier h ∈ H, h: X → Y, that correctly classifies
future, unseen instances. If h cannot classify an
instance correctly, the symbol “?” is returned.
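Meta Classifier Approach
[Figure: each training instance is classified by the base classifier BC. Its meta class is 1 if the prediction of BC equals the true class and 0 otherwise.]
              BC   Class   Meta Class
instance1      0     1         0
…
instancen      1     1         1
Meta Classifier Approach
[Figure: the meta classifier is learned from the instances labeled with their meta classes.]
meta instances   Meta Class
instance1            0
…
instancen            1
Meta Classifier Approach
Combined Classifier
[Figure: an instance is passed to both the base classifier BC and the meta classifier.]
The classification of the base classifier BC is output if the
meta classifier decides that the instance is classified correctly.
Otherwise the symbol “?” is returned.
Theorem. The precision of the meta classifier equals the
accuracy of the combined classifier on the classified
instances.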
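The theorem follows from the definitions: an instance is classified exactly when the meta classifier predicts 1, and the output is correct exactly when the true meta class is 1. The sketch below checks this on a small hypothetical set of predictions; the data and function names are made up for illustration, and at least one instance is assumed to be classified.

```python
def combined_classify(bc_prediction, mc_prediction):
    """Output BC's class if the meta classifier trusts it, else '?'."""
    return bc_prediction if mc_prediction == 1 else "?"

def meta_precision(bc_preds, classes, mc_preds):
    """Precision of the meta classifier for meta class 1 ('BC is correct')."""
    meta_classes = [int(p == y) for p, y in zip(bc_preds, classes)]
    predicted_positive = [c for c, m in zip(meta_classes, mc_preds) if m == 1]
    return sum(predicted_positive) / len(predicted_positive)

def combined_accuracy(bc_preds, classes, mc_preds):
    """Accuracy of the combined classifier on the instances it classifies."""
    outputs = [combined_classify(p, m) for p, m in zip(bc_preds, mc_preds)]
    classified = [(o, y) for o, y in zip(outputs, classes) if o != "?"]
    return sum(o == y for o, y in classified) / len(classified)

# Hypothetical predictions for six instances.
bc_preds = [0, 1, 1, 0, 1, 0]   # base classifier outputs
classes  = [1, 1, 1, 0, 0, 0]   # true classes
mc_preds = [0, 1, 1, 1, 1, 0]   # meta classifier: 1 = "BC is correct"

print(meta_precision(bc_preds, classes, mc_preds))     # → 0.75
print(combined_accuracy(bc_preds, classes, mc_preds))  # → 0.75
```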
Co-Training (WWW application)
• Consider the problem of learning to classify pages
of hypertext from the WWW, given labeled training
data consisting of individual web pages along with
their correct classifications.
• The task of classifying a web page can be done by
considering just the words on the web page, and
the words on hyperlinks that point to the web
page.
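Co-Training
[Figure: a hyperlink with the anchor text “my advisor” pointing to the web page of Professor Faloutsos, illustrating the two views of a page: the words on the page and the words on hyperlinks pointing to it.]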
The Co-Training algorithm
• Given:
– Set L of labeled training examples
– Set U of unlabeled examples
• Loop:
– Learn hyperlink-based classifier H from L
– Learn full-text classifier F from L
– Allow H to label p positive and n negative
examples from U
– Allow F to label p positive and n negative examples
from U
– Add these self-labeled examples to L
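A minimal sketch of the loop above. Each example is assumed to be a pair (hyperlink view, full-text view) of numbers, and the learner on each view is a toy nearest-labeled-example scorer; both are assumptions for illustration, as are all names. One simplification relative to the slide: examples labeled by H are removed from U before F labels, so the two classifiers never label the same example. L is assumed to contain at least one example of each class.

```python
def make_learner(view):
    """Toy learner on one view (assumption): higher score = more positive."""
    def learn(labeled):
        pos = [x[view] for x, y in labeled if y == 1]
        neg = [x[view] for x, y in labeled if y == 0]
        return lambda x: (min(abs(x[view] - v) for v in neg)
                          - min(abs(x[view] - v) for v in pos))
    return learn

def label_most_confident(score, unlabeled, p, n):
    """Label the p highest-scoring examples 1 and the n lowest-scoring 0."""
    ranked = sorted(unlabeled, key=score, reverse=True)
    positives = ranked[:p]
    negatives = ranked[p:][-n:] if n else []
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]

def co_training(labeled, unlabeled, learn_h, learn_f, p, n, rounds):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        h = learn_h(labeled)  # hyperlink-based classifier
        f = learn_f(labeled)  # full-text classifier
        for score in (h, f):
            new = label_most_confident(score, unlabeled, p, n)
            labeled += new
            picked = [x for x, _ in new]
            unlabeled = [x for x in unlabeled if x not in picked]
    return labeled, unlabeled

# Hypothetical examples: (hyperlink feature, full-text feature).
L = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
U = [(1.0, 4.0), (4.0, 1.0), (9.0, 6.0), (6.0, 9.0)]
labeled, remaining = co_training(L, U, make_learner(0), make_learner(1),
                                 p=1, n=1, rounds=1)
print(dict(labeled), remaining)
```

In this toy run each classifier labels the examples that are clear in its own view, and those labels then become training data for the other view.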
The Self-Training algorithm
• Given:
– Set L of labeled training examples
– Set U of unlabeled examples
• Loop:
– Learn a classifier H from L
– Allow H to label p positive and n negative
examples from U
– Add these self-labeled examples to L
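Self-training is the same loop with a single classifier. A short sketch under stated assumptions: one-dimensional examples, a toy nearest-labeled-example scorer as the learner, distinct examples in U, and at least one example of each class in L; all names are illustrative.

```python
def learn_nearest(labeled):
    """Toy learner (assumption): higher score = more likely positive."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return lambda x: min(abs(x - v) for v in neg) - min(abs(x - v) for v in pos)

def self_training(labeled, unlabeled, learn, p, n, rounds):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        h = learn(labeled)
        # Most confident positives first, most confident negatives last.
        ranked = sorted(unlabeled, key=h, reverse=True)
        positives = ranked[:p]
        negatives = ranked[p:][-n:] if n else []
        labeled += [(x, 1) for x in positives] + [(x, 0) for x in negatives]
        chosen = positives + negatives
        unlabeled = [x for x in unlabeled if x not in chosen]
    return labeled

# Hypothetical data: negatives lie near 0, positives near 10.
L = [(0.0, 0), (10.0, 1)]
U = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
result = self_training(L, U, learn_nearest, p=1, n=1, rounds=3)
print(sorted(result))
```

Unlike co-training, a wrong self-assigned label is never challenged by a second view, so errors can reinforce themselves.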
Learning to Classify Web using
Co-Training
• Mitchell (1999) reported an experiment to co-train
text classifiers that recognize course home pages.
• In the experiment, he used 16 labeled examples and 800
unlabeled pages.
• Mitchell (1999) found that the Co-Training
algorithm does improve classification accuracy
when learning to classify web pages.
When does Co-Training work?
• When examples are described by
redundantly sufficient features; and
• When the hypothesis spaces corresponding
to the sets of redundantly sufficient features
contain different hypotheses or the learning
algorithms are different.
