
The online performance estimation framework: heterogeneous ensemble learning for data streams

Published: 01 January 2018

Abstract

Ensembles of classifiers are among the best performing classifiers available in many data mining applications, including the mining of data streams. Rather than training one classifier, multiple classifiers are trained, and their predictions are combined according to a given voting scheme. An important prerequisite for ensembles to be successful is that the individual models are diverse. One way to vastly increase the diversity among the models is to build a heterogeneous ensemble, composed of fundamentally different model types. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of base-level classifier, most often Hoeffding Trees. We study the use of heterogeneous ensembles for data streams. We introduce the Online Performance Estimation framework, which dynamically weights the votes of the individual classifiers in an ensemble. Using an internal evaluation on recent training data, it measures how well each ensemble member has performed recently and updates their weights accordingly. Experiments over a wide range of data streams show performance that is competitive with state-of-the-art ensemble techniques, including Online Bagging and Leveraging Bagging, while being significantly faster. All experimental results from this work are easily reproducible and publicly available online.
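
The weighting idea described in the abstract can be sketched in a few lines of code. The Python snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes incremental base learners exposing `predict` and `partial_fit` methods (names chosen for illustration), and it uses a sliding window of per-member 0/1 correctness as the online performance estimate. Each incoming instance is first used to score every member (test-then-train) and only then used for training; a member's vote is weighted by its estimated accuracy over the recent window.

```python
from collections import defaultdict, deque

class OnlinePerformanceEstimationEnsemble:
    """Minimal sketch of a heterogeneous ensemble whose members are
    weighted by their accuracy over a window of recent instances.
    An illustration of the idea, not the paper's implementation."""

    def __init__(self, members, window_size=1000):
        self.members = members          # heterogeneous incremental learners (assumed API)
        # one window of 0/1 correctness flags per ensemble member
        self.windows = [deque(maxlen=window_size) for _ in members]

    def weight(self, i):
        # estimated performance = fraction of recent instances member i got right
        w = self.windows[i]
        return sum(w) / len(w) if w else 1.0   # members with no history get full weight

    def predict(self, x):
        # weighted majority vote over the members' predictions
        votes = defaultdict(float)
        for i, m in enumerate(self.members):
            votes[m.predict(x)] += self.weight(i)
        return max(votes, key=votes.get)

    def partial_fit(self, x, y):
        # test-then-train: score each member on the new instance first,
        # then let it learn from that instance
        for i, m in enumerate(self.members):
            self.windows[i].append(1 if m.predict(x) == y else 0)
            m.partial_fit(x, y)
```

The fixed window size trades responsiveness against stability: a short window tracks concept drift quickly but yields noisy weight estimates, while a long window smooths the estimate at the cost of slower adaptation. A fading-factor (exponential decay) estimate could be substituted for the window to weight the most recent performance more heavily.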

      Published In

Machine Learning, Volume 107, Issue 1
      January 2018
      307 pages

      Publisher

Kluwer Academic Publishers, United States

      Author Tags

      1. Data streams
      2. Ensembles
      3. Meta-learning
