Online ensemble learning
Publisher:
  • University of California, Berkeley
ISBN: 978-0-493-58497-3
Order Number: AAI3044618
Pages: 116
Abstract

This thesis presents online versions of the popular bagging and boosting algorithms. We demonstrate theoretically and experimentally that the online versions perform comparably to their original batch counterparts in terms of classification performance. However, our online algorithms yield the typical practical benefits of online learning algorithms when the amount of training data available is large.

Ensemble learning algorithms have become extremely popular over the last several years because these algorithms, which generate multiple base models using traditional machine learning algorithms and combine them into an ensemble model, have often demonstrated significantly better performance than single models. Bagging and boosting are two of the most popular algorithms because of their good empirical results and theoretical support. However, most ensemble algorithms operate in batch mode, i.e., they repeatedly read and process the entire training set. Typically, they require at least one pass through the training set for every base model to be included in the ensemble. The base model learning algorithms themselves may require several passes through the training set to create each base model. In situations where data is being generated continuously, storing data for batch learning is impractical, which makes using these ensemble learning algorithms impossible. These algorithms are also impractical in situations where the training set is large enough that reading and processing it many times would be prohibitively expensive.
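
To make the cost concrete, here is a minimal sketch of batch bagging, assuming scikit-learn's DecisionTreeClassifier as the base learner (an illustration, not code from the thesis): each base model is fit to a bootstrap sample drawn from the full, stored training set, so building N models means rereading the stored data at least N times.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def batch_bagging(X, y, n_models=10, seed=0):
        # Each base model trains on a bootstrap sample (drawn with
        # replacement) of the ENTIRE stored training set -- one full
        # pass over the data per model.
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=len(X))
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagged_predict(models, x):
        # Ensemble output: unweighted majority vote over base models.
        votes = [m.predict(x.reshape(1, -1))[0] for m in models]
        return max(set(votes), key=votes.count)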

This thesis describes online versions of bagging and boosting. Unlike the batch versions, our online versions require only one pass through the training examples, regardless of the number of base models to be combined. We explain how we derive the online algorithms from their batch counterparts and present theoretical and experimental evidence that they perform comparably to the batch versions in terms of classification performance. We also demonstrate that our online algorithms have the practical advantage of lower running time, especially on larger datasets, which makes them practical for machine learning and data mining tasks where the amount of training data available is very large.
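
As a sketch of the one-pass idea behind online bagging as published alongside this thesis (Oza and Russell, 2001): each arriving example updates every base model k times, with k drawn from a Poisson(1) distribution, which approximates batch bagging's bootstrap resampling as the number of examples grows. The base-learner interface below (update/predict) is an assumed incremental-learning interface for illustration, not a specific library API.

    import math
    import random
    from collections import Counter

    def sample_poisson1():
        # Knuth's method for Poisson(lambda = 1): count uniform draws
        # until their running product falls below e^-1.
        threshold, k, p = math.exp(-1.0), 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                return k
            k += 1

    class OnlineBagging:
        def __init__(self, make_base_model, n_models=10):
            # make_base_model: factory for an incremental learner
            # exposing update(x, y) and predict(x) (assumed interface).
            self.models = [make_base_model() for _ in range(n_models)]

        def update(self, x, y):
            # One pass: each example is seen once; each base model is
            # updated k ~ Poisson(1) times, mimicking how often the
            # example would appear in that model's bootstrap sample.
            for model in self.models:
                for _ in range(sample_poisson1()):
                    model.update(x, y)

        def predict(self, x):
            # Unweighted majority vote, as in batch bagging.
            votes = Counter(model.predict(x) for model in self.models)
            return votes.most_common(1)[0][0]

The corresponding online boosting algorithm in that work keeps the same structure but replaces the fixed Poisson(1) rate with a per-model lambda that is increased for examples the model misclassifies and decreased otherwise, mirroring AdaBoost's example reweighting.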

Cited By

  1. Fall E, Chang K and Chen L (2024). Tree-managed network ensembles for video prediction, Machine Vision and Applications, 35:4, Online publication date: 1-Jul-2024.
  2. Gohar U, Biswas S and Rajan H Towards Understanding Fairness and its Composition in Ensemble Machine Learning Proceedings of the 45th International Conference on Software Engineering, (1533-1545)
  3. Li N, Ma L, Zhang T and He M Multi-objective Evolutionary Ensemble Learning for Disease Classification Advances in Swarm Intelligence, (491-500)
  4. Liu F, Li H, Yang X and Jiang L L3E-HD: A Framework Enabling Efficient Ensemble in High-Dimensional Space for Language Tasks Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, (1844-1848)
  5. Kafiyan-Safari M and Rouhani M (2022). Adaptive one-pass passive-aggressive radial basis function for classification problems, Neurocomputing, 491:C, (91-103), Online publication date: 28-Jun-2022.
  6. Bonab H and Can F (2018). GOOWE, ACM Transactions on Knowledge Discovery from Data, 12:2, (1-33), Online publication date: 30-Apr-2018.
  7. Sharma V and Mahapatra K (2017). MIL based visual object tracking with kernel and scale adaptation, Image Communication, 53:C, (51-64), Online publication date: 1-Apr-2017.
  8. Aldoğan D and Yaslan Y (2017). A comparison study on active learning integrated ensemble approaches in sentiment analysis, Computers and Electrical Engineering, 57:C, (311-323), Online publication date: 1-Jan-2017.
  9. Ahmad A and Brown G (2015). Random Ordinality Ensembles, Information Sciences: an International Journal, 296:C, (75-94), Online publication date: 1-Mar-2015.
  10. Gama J, Žliobaitė I, Bifet A, Pechenizkiy M and Bouchachia A (2014). A survey on concept drift adaptation, ACM Computing Surveys, 46:4, (1-37), Online publication date: 1-Apr-2014.
  11. Binh N Online multiple tasks one-shot learning of object categories and vision Proceedings of the 9th International Conference on Advances in Mobile Computing and Multimedia, (131-138)
  12. Gama J and Cornuéjols A Resource aware distributed knowledge discovery Ubiquitous knowledge discovery, (40-60)
  13. Chen Y and Chen Y Combining incremental Hidden Markov Model and Adaboost algorithm for anomaly intrusion detection Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics, (3-9)
  14. Masip D, Lapedriza À and Vitrià J (2009). Boosted online learning for face recognition, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39:2, (530-538), Online publication date: 1-Apr-2009.
  15. Muhlbaier M and Polikar R An ensemble approach for incremental learning in nonstationary environments Proceedings of the 7th international conference on Multiple classifier systems, (490-500)
  16. Yilmaz A, Javed O and Shah M (2006). Object tracking, ACM Computing Surveys (CSUR), 38:4, (13-es), Online publication date: 25-Dec-2006.
  17. Harrington E Online ranking/collaborative filtering using the perceptron algorithm Proceedings of the Twentieth International Conference on Machine Learning, (250-257)
  18. Oza N Boosting with averaged weight vectors Proceedings of the 4th international conference on Multiple classifier systems, (15-24)
Contributors
  • NASA Ames Research Center
  • University of California, Berkeley
