Online ensemble learning
Publisher:
  • University of California, Berkeley
ISBN: 978-0-493-58497-3
Order Number: AAI3044618
Pages: 116
Abstract

This thesis presents online versions of the popular bagging and boosting algorithms. We demonstrate theoretically and experimentally that the online versions perform comparably to their original batch counterparts in terms of classification performance. However, our online algorithms yield the typical practical benefits of online learning algorithms when the amount of training data available is large.

Ensemble learning algorithms have become extremely popular over the last several years because these algorithms, which generate multiple base models using traditional machine learning algorithms and combine them into an ensemble model, have often demonstrated significantly better performance than single models. Bagging and boosting are two of the most popular algorithms because of their good empirical results and theoretical support. However, most ensemble algorithms operate in batch mode, i.e., they repeatedly read and process the entire training set. Typically, they require at least one pass through the training set for every base model to be included in the ensemble. The base model learning algorithms themselves may require several passes through the training set to create each base model. In situations where data is being generated continuously, storing data for batch learning is impractical, which makes using these ensemble learning algorithms impossible. These algorithms are also impractical in situations where the training set is large enough that reading and processing it many times would be prohibitively expensive.
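
To make the cost concrete, here is a minimal sketch of batch bagging, assuming scikit-learn's DecisionTreeClassifier as the base learner (an illustration, not code from the thesis): each base model is fit to a bootstrap sample drawn from the full, stored training set, so building N models means rereading the stored data at least N times.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def batch_bagging(X, y, n_models=10, seed=0):
        # Each base model trains on a bootstrap sample (drawn with
        # replacement) of the ENTIRE stored training set -- one full
        # pass over the data per model.
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=len(X))
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagged_predict(models, x):
        # Ensemble output: unweighted majority vote over base models.
        votes = [m.predict(x.reshape(1, -1))[0] for m in models]
        return max(set(votes), key=votes.count)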

This thesis describes online versions of bagging and boosting. Unlike the batch versions, our online versions require only one pass through the training examples, regardless of the number of base models to be combined. We explain how we derive the online algorithms from their batch counterparts and present theoretical and experimental evidence that they perform comparably to the batch versions in terms of classification performance. We also demonstrate that our online algorithms have the practical advantage of lower running time, especially on larger datasets, which makes them practical for machine learning and data mining tasks where the amount of training data available is very large.
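
As a sketch of the one-pass idea behind online bagging as published alongside this thesis (Oza and Russell, 2001): each arriving example updates every base model k times, with k drawn from a Poisson(1) distribution, which approximates batch bagging's bootstrap resampling as the number of examples grows. The base-learner interface below (update/predict) is an assumed incremental-learning interface for illustration, not a specific library API.

    import math
    import random
    from collections import Counter

    def sample_poisson1():
        # Knuth's method for Poisson(lambda = 1): count uniform draws
        # until their running product falls below e^-1.
        threshold, k, p = math.exp(-1.0), 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                return k
            k += 1

    class OnlineBagging:
        def __init__(self, make_base_model, n_models=10):
            # make_base_model: factory for an incremental learner
            # exposing update(x, y) and predict(x) (assumed interface).
            self.models = [make_base_model() for _ in range(n_models)]

        def update(self, x, y):
            # One pass: each example is seen once; each base model is
            # updated k ~ Poisson(1) times, mimicking how often the
            # example would appear in that model's bootstrap sample.
            for model in self.models:
                for _ in range(sample_poisson1()):
                    model.update(x, y)

        def predict(self, x):
            # Unweighted majority vote, as in batch bagging.
            votes = Counter(model.predict(x) for model in self.models)
            return votes.most_common(1)[0][0]

The corresponding online boosting algorithm in that work keeps the same structure but replaces the fixed Poisson(1) rate with a per-model lambda that is increased for examples the model misclassifies and decreased otherwise, mirroring AdaBoost's example reweighting.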

Cited By

  1. Fall E, Chang K and Chen L (2024). Tree-managed network ensembles for video prediction, Machine Vision and Applications, 35:4, Online publication date: 1-Jul-2024.
  2. Gohar U, Biswas S and Rajan H Towards Understanding Fairness and its Composition in Ensemble Machine Learning Proceedings of the 45th International Conference on Software Engineering, (1533-1545)
  3. Li N, Ma L, Zhang T and He M Multi-objective Evolutionary Ensemble Learning for Disease Classification Advances in Swarm Intelligence, (491-500)
  4. Liu F, Li H, Yang X and Jiang L L3E-HD: A Framework Enabling Efficient Ensemble in High-Dimensional Space for Language Tasks Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, (1844-1848)
  5. Kafiyan-Safari M and Rouhani M (2022). Adaptive one-pass passive-aggressive radial basis function for classification problems, Neurocomputing, 491:C, (91-103), Online publication date: 28-Jun-2022.
  6. Bonab H and Can F (2018). GOOWE, ACM Transactions on Knowledge Discovery from Data, 12:2, (1-33), Online publication date: 30-Apr-2018.
  7. Sharma V and Mahapatra K (2017). MIL based visual object tracking with kernel and scale adaptation, Image Communication, 53:C, (51-64), Online publication date: 1-Apr-2017.
  8. Aldoğan D and Yaslan Y (2017). A comparison study on active learning integrated ensemble approaches in sentiment analysis, Computers and Electrical Engineering, 57:C, (311-323), Online publication date: 1-Jan-2017.
  9. Ahmad A and Brown G (2015). Random Ordinality Ensembles, Information Sciences: an International Journal, 296:C, (75-94), Online publication date: 1-Mar-2015.
  10. Gama J, Žliobaitė I, Bifet A, Pechenizkiy M and Bouchachia A (2014). A survey on concept drift adaptation, ACM Computing Surveys, 46:4, (1-37), Online publication date: 1-Apr-2014.
  11. Binh N Online multiple tasks one-shot learning of object categories and vision Proceedings of the 9th International Conference on Advances in Mobile Computing and Multimedia, (131-138)
  12. Gama J and Cornuéjols A Resource aware distributed knowledge discovery Ubiquitous knowledge discovery, (40-60)
  13. Chen Y and Chen Y Combining incremental Hidden Markov Model and Adaboost algorithm for anomaly intrusion detection Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics, (3-9)
  14. Masip D, Lapedriza À and Vitrià J (2009). Boosted online learning for face recognition, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39:2, (530-538), Online publication date: 1-Apr-2009.
  15. Muhlbaier M and Polikar R An ensemble approach for incremental learning in nonstationary environments Proceedings of the 7th international conference on Multiple classifier systems, (490-500)
  16. Yilmaz A, Javed O and Shah M (2006). Object tracking, ACM Computing Surveys (CSUR), 38:4, (13-es), Online publication date: 25-Dec-2006.
  17. Harrington E Online ranking/collaborative filtering using the perceptron algorithm Proceedings of the Twentieth International Conference on Machine Learning, (250-257)
  18. Oza N Boosting with averaged weight vectors Proceedings of the 4th international conference on Multiple classifier systems, (15-24)
Contributors
  • NASA Ames Research Center
  • University of California, Berkeley
