A Simple Proof of AdaBoost Algorithm
Yin Zhao
yz_math@hotmail.com
where $\epsilon_t$ is the error rate of base classifier $h_t$. If the error rate is less than $0.5$, we can write $\epsilon_t = 0.5 - \gamma_t$, where $\gamma_t$ measures how much better the classifier is than random guessing (on binary problems). The bound on the training error of the ensemble becomes
$$e_{\mathrm{ensemble}} \le \prod_t \sqrt{1 - 4\gamma_t^2} \le e^{-2\sum_t \gamma_t^2} \qquad (2)$$
Thus if each base classifier is slightly better than random, so that $\gamma_t > \gamma$ for some $\gamma > 0$, then the training error drops exponentially fast. Nevertheless, because of its tendency to focus on training examples that are misclassified, the AdaBoost algorithm can be quite susceptible to over-fitting [2].
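As a quick numerical illustration of the bound (2) (a sketch with assumed toy values, not taken from the paper), take $T = 50$ rounds with a constant edge $\gamma_t = 0.1$:

```python
# Toy illustration of bound (2): with T = 50 and gamma_t = 0.1 for every round,
# the product bound lies below the exponential bound, and both are already small.
import numpy as np

T, gamma = 50, 0.1                                    # assumed toy values
gammas = np.full(T, gamma)
prod_bound = np.prod(np.sqrt(1.0 - 4.0 * gammas**2))  # prod_t sqrt(1 - 4 gamma_t^2)
exp_bound = np.exp(-2.0 * np.sum(gammas**2))          # exp(-2 sum_t gamma_t^2)
print(prod_bound, exp_bound)                          # ~0.36 <= ~0.37
assert prod_bound <= exp_bound
```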
We will give a new, simple proof of (1) and (2); additionally, we try to explain why the parameter
$$\alpha_t = \frac{1}{2}\cdot\log\frac{1-\epsilon_t}{\epsilon_t}$$
is chosen in the boosting algorithm.
AdaBoost Algorithm:
Recall that the boosting algorithm is as follows:
Given $(x_1, y_1), (x_2, y_2), \cdots, (x_m, y_m)$, where $x_i \in X$, $y_i \in Y = \{-1, +1\}$.
Initialize
$$D_1(i) = \frac{1}{m}$$
For $t = 1, 2, \ldots, T$: Train the weak learner using distribution $D_t$.
Get weak hypothesis $h_t : X \to \{-1, +1\}$ with error
$$\epsilon_t = \Pr_{i \sim D_t}\left[h_t(x_i) \ne y_i\right]$$
If $\epsilon_t > 0.5$, then the weights $D_t(i)$ are reverted back to their original uniform values $\frac{1}{m}$.
Choose
$$\alpha_t = \frac{1}{2}\cdot\log\frac{1-\epsilon_t}{\epsilon_t} \qquad (3)$$
Update:
e −αt
½
D t (i ) if h t (x i ) = y i
D t +1 (i ) = × (4)
Zt e αt if h t (x i ) 6= y i
where Z t is a normalization factor.
Output the final hypothesis:
$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t \cdot h_t(x)\right)$$
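The algorithm translates almost line for line into code. The following is a minimal sketch (not from the paper): it assumes decision stumps as the weak learners, since the algorithm above leaves the weak learner unspecified, and it reads the $\epsilon_t > 0.5$ rule as resetting the weights to uniform and discarding that round's hypothesis; the function names (`train_stump`, `adaboost`, `predict`) are illustrative.

```python
# Minimal NumPy sketch of the AdaBoost loop above. Assumptions not stated in the
# paper: decision stumps (single-feature threshold classifiers) as weak learners,
# and the eps_t > 0.5 rule is read as "reset weights to uniform and skip the round".
import numpy as np

def train_stump(X, y, D):
    """Return (weighted error, stump) for the best single-feature threshold classifier."""
    m, d = X.shape
    best_err, best_stump = np.inf, None
    for j in range(d):                              # feature index
        for thr in np.unique(X[:, j]):              # candidate threshold
            for polarity in (+1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = D[pred != y].sum()            # epsilon_t under distribution D
                if err < best_err:
                    best_err, best_stump = err, (j, thr, polarity)
    return best_err, best_stump

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T=20):
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                         # D_1(i) = 1/m
    hypotheses, alphas = [], []
    for t in range(T):
        eps, stump = train_stump(X, y, D)
        if eps > 0.5:                               # revert weights to uniform, skip the round
            D = np.full(m, 1.0 / m)
            continue
        eps = max(eps, 1e-12)                       # guard against a perfect stump
        alpha = 0.5 * np.log((1 - eps) / eps)       # equation (3)
        pred = stump_predict(stump, X)
        D = D * np.exp(-alpha * y * pred)           # update (4), combined exponent form
        D /= D.sum()                                # divide by the normalizer Z_t
        hypotheses.append(stump)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X):
    f = sum(a * stump_predict(h, X) for h, a in zip(hypotheses, alphas))
    return np.sign(f)                               # final hypothesis H(x)
```

The weight update is written in the combined form $D_t(i)\cdot e^{-\alpha_t\cdot y_i\cdot h_t(x_i)}/Z_t$, which is exactly the expression used in the proof below.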
Proof:
Firstly, we prove (1). Note that $D_{t+1}(i)$ is a distribution, so its sum $\sum_i D_{t+1}(i)$ equals $1$; hence
$$Z_t = \sum_i D_{t+1}(i)\cdot Z_t = \sum_i D_t(i) \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \ne y_i \end{cases}$$
$$= \sum_{i:\, h_t(x_i) = y_i} D_t(i)\cdot e^{-\alpha_t} + \sum_{i:\, h_t(x_i) \ne y_i} D_t(i)\cdot e^{\alpha_t}$$
$$= e^{-\alpha_t} \cdot \sum_{i:\, h_t(x_i) = y_i} D_t(i) + e^{\alpha_t} \cdot \sum_{i:\, h_t(x_i) \ne y_i} D_t(i)$$
$$= e^{-\alpha_t}\cdot(1-\epsilon_t) + e^{\alpha_t}\cdot\epsilon_t \qquad (5)$$
In order to find $\alpha_t$ we can minimize $Z_t$ by setting its first-order derivative equal to $0$:
$$\left[e^{-\alpha_t}\cdot(1-\epsilon_t) + e^{\alpha_t}\cdot\epsilon_t\right]' = -e^{-\alpha_t}\cdot(1-\epsilon_t) + e^{\alpha_t}\cdot\epsilon_t = 0$$
$$\Rightarrow\; \alpha_t = \frac{1}{2}\cdot\log\frac{1-\epsilon_t}{\epsilon_t},$$
which is (3) in the boosting algorithm. Substituting $\alpha_t$ back into (5),
$$Z_t = e^{-\alpha_t}\cdot(1-\epsilon_t) + e^{\alpha_t}\cdot\epsilon_t = e^{-\frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}}\cdot(1-\epsilon_t) + e^{\frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}}\cdot\epsilon_t$$
$$= 2\sqrt{\epsilon_t\cdot(1-\epsilon_t)} \qquad (6)$$
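As a sanity check on the last two steps (again a sketch, assuming NumPy and SciPy are available; the error rates below are arbitrary), the closed-form $\alpha_t$ of (3) can be compared with a direct numerical minimization of $Z_t(\alpha)$, and the minimum value with $2\sqrt{\epsilon_t(1-\epsilon_t)}$ from (6):

```python
# For several error rates, the closed-form alpha_t of (3) matches the numerical
# minimizer of Z_t(alpha) = e^{-alpha}(1 - eps) + e^{alpha} eps, and the minimum
# equals 2 * sqrt(eps * (1 - eps)) as in (6).
import numpy as np
from scipy.optimize import minimize_scalar

for eps in (0.1, 0.25, 0.4, 0.49):                   # arbitrary error rates < 0.5
    Z = lambda a, e=eps: np.exp(-a) * (1 - e) + np.exp(a) * e
    alpha_closed = 0.5 * np.log((1 - eps) / eps)     # equation (3)
    alpha_num = minimize_scalar(Z, bounds=(0.0, 10.0), method="bounded").x
    assert abs(alpha_closed - alpha_num) < 1e-4
    assert abs(Z(alpha_closed) - 2 * np.sqrt(eps * (1 - eps))) < 1e-12   # equation (6)
```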
On the other hand, from (4) we have
$$D_{t+1}(i) = \frac{D_t(i)\cdot e^{-\alpha_t\cdot y_i\cdot h_t(x_i)}}{Z_t} = \frac{D_t(i)\cdot e^{K_t}}{Z_t},$$
since the product $y_i \cdot h_t(x_i)$ is $+1$ if $h_t(x_i) = y_i$ and $-1$ if $h_t(x_i) \ne y_i$ (we write $K_t = -\alpha_t\cdot y_i\cdot h_t(x_i)$ for brevity).
Thus we can write down all of the equations:
$$D_1(i) = \frac{1}{m}$$
$$D_2(i) = \frac{D_1(i)\cdot e^{K_1}}{Z_1}$$
$$D_3(i) = \frac{D_2(i)\cdot e^{K_2}}{Z_2}$$
$$\cdots\cdots$$
$$D_{t+1}(i) = \frac{D_t(i)\cdot e^{K_t}}{Z_t}$$
Multiplying all of the equalities above, we obtain
$$D_{t+1}(i) = \frac{1}{m}\cdot\frac{e^{-y_i\cdot f(x_i)}}{\prod_t Z_t}$$
where $f(x_i) = \sum_t \alpha_t\cdot h_t(x_i)$.
Thus
$$\frac{1}{m}\cdot\sum_i e^{-y_i\cdot f(x_i)} = \sum_i D_{t+1}(i)\cdot\prod_t Z_t = \prod_t Z_t \qquad (7)$$
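A short self-contained simulation (with synthetic $\pm 1$ predictions standing in for the weak hypotheses; none of these values come from the paper) makes the telescoping explicit: iterating the update (4) and multiplying the normalization factors reproduces both the closed form for $D_{t+1}(i)$ and identity (7):

```python
# Synthetic check of the telescoping product behind (7).
import numpy as np

rng = np.random.default_rng(0)
m, T = 50, 10
y = rng.choice([-1, 1], size=m)                 # labels
preds = rng.choice([-1, 1], size=(T, m))        # stand-ins for h_t(x_i)

D = np.full(m, 1.0 / m)                         # D_1(i) = 1/m
alphas, Zs = [], []
for t in range(T):
    eps = D[preds[t] != y].sum()                # weighted error of round t
    eps = np.clip(eps, 1e-12, 0.5 - 1e-12)      # keep alpha_t >= 0 (simplification of the reset rule)
    alpha = 0.5 * np.log((1 - eps) / eps)
    w = D * np.exp(-alpha * y * preds[t])       # update (4) in combined form
    Z = w.sum()                                 # normalization factor Z_t
    D = w / Z
    alphas.append(alpha)
    Zs.append(Z)

f = np.sum(np.array(alphas)[:, None] * preds, axis=0)        # f(x_i) = sum_t alpha_t h_t(x_i)
assert np.allclose(D, np.exp(-y * f) / (m * np.prod(Zs)))    # telescoped weights D_{T+1}(i)
assert np.isclose(np.mean(np.exp(-y * f)), np.prod(Zs))      # identity (7)
```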
Note that if $\epsilon_t > 0.5$ the data set will be re-sampled until $\epsilon_t \le 0.5$; in other words, the parameter $\alpha_t \ge 0$ in each valid iteration. The training error of the ensemble can be expressed as
$$e_{\mathrm{ensemble}} = \frac{1}{m}\cdot\sum_i \begin{cases} 1 & \text{if } y_i \ne H(x_i) \\ 0 & \text{if } y_i = H(x_i) \end{cases} = \frac{1}{m}\cdot\sum_i \begin{cases} 1 & \text{if } y_i\cdot f(x_i) \le 0 \\ 0 & \text{if } y_i\cdot f(x_i) > 0 \end{cases}$$
$$\le \frac{1}{m}\cdot\sum_i e^{-y_i\cdot f(x_i)} = \prod_t Z_t \qquad (8)$$
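The inequality step in (8) rests on the pointwise bound $\mathbf{1}[y_i\cdot f(x_i) \le 0] \le e^{-y_i\cdot f(x_i)}$; a two-line check over assumed toy margins:

```python
# The 0/1 indicator of a non-positive margin never exceeds exp(-margin).
import numpy as np

margins = np.linspace(-3.0, 3.0, 25)            # stand-ins for y_i * f(x_i)
assert np.all((margins <= 0).astype(float) <= np.exp(-margins))
```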
To prove (2), write
$$\epsilon_t = \frac{1}{2} - \gamma_t \qquad (9)$$
and use the elementary inequality
$$1 + x \le e^x \qquad (10)$$
or, equivalently, $e^x - x - 1 \ge 0$. To see this, let $g(x) = e^x - x - 1$; then
$$g'(x) = e^x - 1 = 0 \;\Rightarrow\; x = 0.$$
Since $g''(x) = e^x > 0$,
$$g(x)_{\min} = g(0) = 0 \;\Rightarrow\; e^x - x - 1 \ge 0.$$
Here, as in (9), $\gamma_t$ measures how much better the classifier is than random guessing (on binary problems). Based on (10) we have
$$e_{\mathrm{ensemble}} \le \prod_t 2\cdot\sqrt{\epsilon_t\cdot(1-\epsilon_t)}$$
$$= \prod_t \sqrt{1 - 4\gamma_t^2}$$
$$= \prod_t \left[1 + (-4\gamma_t^2)\right]^{\frac{1}{2}}$$
$$\le \prod_t \left(e^{-4\gamma_t^2}\right)^{\frac{1}{2}} = \prod_t e^{-2\gamma_t^2}$$
$$= e^{-2\cdot\sum_t \gamma_t^2}$$
as desired.
References
[1] Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119-139.
[2] Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Boston: Pearson Addison Wesley.