Abstract
Ensemble methods are an effective way to address supervised learning problems, and they are especially popular for learning from evolving data streams. One of the main reasons for this popularity is that concept drift detection and recovery strategies can be incorporated alongside the ensemble algorithm. On top of that, successful ensemble strategies, such as bagging and random forests, can be readily adapted to the streaming setting. In this work, we analyse a novel ensemble method designed specifically to cope with evolving data streams, namely the streaming random patches (SRP) algorithm. SRP combines random subspaces and online bagging to achieve predictive performance that is competitive with other methods. We significantly extend previous theoretical insights and empirical results that illustrate different aspects of SRP. In particular, we explain how the widely adopted incremental Hoeffding trees are not, in fact, unstable learners, unlike their batch counterparts, and how this fact significantly influences the design and performance of ensemble methods. We compare SRP against state-of-the-art ensemble variants for streaming data on a multitude of datasets. The results show that SRP achieves high predictive performance on both real and synthetic datasets. We also show that ensembles of random subspaces can be an efficient and accurate alternative to SRP and leveraging bagging as the number of base learners increases. In addition, we analyse the diversity over time and the average tree depth, which provide insights into the differences between local subspace randomization (as in random forest) and global subspace randomization (as in random subspaces). Finally, we analyse the behaviour of SRP when using Naive Bayes as its base learner instead of Hoeffding trees.
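To make the combination of random subspaces and online bagging concrete, the following is a minimal Python sketch of an SRP-style training loop. It is an illustration under assumptions, not the authors' MOA implementation (available at the repository linked in the notes): the class and method names (StreamingRandomPatchesSketch, MajorityClassLearner, learn_one, predict_one), the default subspace fraction of 60% and lambda = 6 are illustrative choices here, and the drift detection and learner-reset logic that SRP also relies on is omitted.

```python
import numpy as np
from copy import deepcopy
from collections import Counter


class MajorityClassLearner:
    """Trivial stand-in base learner: predicts the most frequent class seen so far."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y, weight=1):
        self.counts[y] += weight

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class StreamingRandomPatchesSketch:
    """Illustrative SRP-style ensemble: each member is trained on a fixed random
    feature subset (random subspaces) and receives each incoming instance
    Poisson(lambda_) times (online bagging). Drift detection is omitted."""

    def __init__(self, base_learner, n_features, ensemble_size=10,
                 subspace_frac=0.6, lambda_=6.0, seed=1):
        self.rng = np.random.default_rng(seed)
        k = max(1, int(subspace_frac * n_features + 0.5))  # round half up
        self.subspaces = [self.rng.choice(n_features, size=k, replace=False)
                          for _ in range(ensemble_size)]
        self.members = [deepcopy(base_learner) for _ in range(ensemble_size)]
        self.lambda_ = lambda_

    def partial_fit(self, x, y):
        # Online bagging: each member sees the instance with weight ~ Poisson(lambda_).
        for member, subspace in zip(self.members, self.subspaces):
            w = self.rng.poisson(self.lambda_)
            if w > 0:
                member.learn_one(x[subspace], y, weight=w)

    def predict(self, x):
        # Unweighted majority vote over the ensemble members.
        votes = [m.predict_one(x[s]) for m, s in zip(self.members, self.subspaces)]
        return Counter(votes).most_common(1)[0][0]


# Toy usage on a synthetic binary stream with 9 features.
rng = np.random.default_rng(0)
srp = StreamingRandomPatchesSketch(MajorityClassLearner(), n_features=9)
for _ in range(1000):
    x = rng.normal(size=9)
    y = int(x[0] + x[1] > 0)
    srp.partial_fit(x, y)
print(srp.predict(rng.normal(size=9)))
```

In the actual algorithm the base learner would be an incremental model such as a Hoeffding tree or Naive Bayes, and each member would additionally carry a drift detector that triggers its replacement.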
Notes
The implementation and instructions are available at https://github.com/hmgomes/StreamingRandomPatches.
A formal definition of concept drift can be found in [48].
The implication holds in one direction only: algorithmic stability is sufficient, but not necessary, for learning.
\(\Lambda \) could comprise values of multiple types; for example, the integer ensemble size M and the real-valued weights \(w_j\) could both be hyperparameters.
GP and c were originally denoted \(n_{\min }\) and \(\delta \) by Domingos and Hulten [15]; however, we keep the acronyms used in the Massive Online Analysis (MOA) framework to facilitate reproducibility.
Results for AGR(A) and AGR(G) are identical for \(k=50\%\) and \(k=60\%\), since \(k=0.5\times 9=4.5\) and \(k=0.6\times 9=5.4\) both round to the nearest integer 5 (see the short computation after these notes).
In DWM [30], we can only set the maximum number of base learners, since DWM dynamically changes the ensemble size during execution.
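The rounding referred to in the subspace-size note above can be reproduced with a small computation. This is a sketch assuming the feature percentage is converted to an integer by rounding halves up; note that Python's built-in round() rounds 4.5 down to 4 (round half to even), so an explicit helper is used.

```python
import math

def subspace_size(frac, n_features):
    """Convert a feature percentage into a subspace size, rounding halves up."""
    return int(math.floor(frac * n_features + 0.5))

# The AGR generators have 9 input features:
print(subspace_size(0.50, 9))  # 0.5 * 9 = 4.5 -> 5
print(subspace_size(0.60, 9))  # 0.6 * 9 = 5.4 -> 5
```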
References
Abdulsalam H, Skillicorn DB, Martin P (2008) Classifying evolving data streams using dynamic streaming random forests. In: International conference on database and expert systems applications. Springer, pp 643–651
Bifet A, Frank E, Holmes G, Pfahringer B (2012) Ensembles of restricted Hoeffding trees. ACM TIST 3(2):30:1–30:20. https://doi.org/10.1145/2089094.2089106
Bifet A, Gavaldà R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: PKDD, pp 135–150
Bousquet O, Elisseeff A (2002) Stability and generalization. J Mach Learn Res 2:499–526
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1023/A:1018054314350
Breiman L (1999) Pasting small votes for classification in large databases and on-line. Mach Learn 36(1–2):85–103
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Brown G, Wyatt J, Harris R, Yao X (2005) Diversity creation methods: a survey and categorisation. J Inf Fusion 6:5–20
Brzezinski D, Stefanowski J (2014) Combining block-based and online methods in learning ensembles from concept drifting data streams. Inf Sci 265:50–67. https://doi.org/10.1016/j.ins.2013.12.011
Chen ST, Lin HT, Lu CJ (2012) An online boosting algorithm with theoretical justifications. In: Proceedings of the international conference on machine learning (ICML)
Da Xu L, He W, Li S (2014) Internet of things in industries: a survey. IEEE Trans Ind Inform 10(4):2233–2243
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80
Domingos PM (2000) A unified bias-variance decomposition for zero-one and squared loss. AAAI 2000:564–569
Freund Y, Schapire RE et al (1996) Experiments with a new boosting algorithm. ICML 96:148–156
Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):44:1–44:37. https://doi.org/10.1145/2523813
Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data stream classification. ACM Comput Surv 50(2):23:1–23:36. https://doi.org/10.1145/3054925
Gomes HM, Barddal JP, Ferreira LEB, Bifet A (2018) Adaptive random forests for data stream regression. In: ESANN
Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 6:1–27. https://doi.org/10.1007/s10994-017-5642-8
Gomes HM, Montiel J, Mastelini SM, Pfahringer B, Bifet A (2020) On ensemble techniques for data stream regression. In: 2020 International joint conference on neural networks (IJCNN). IEEE, pp 1–8
Gomes HM, Read J, Bifet A (2019) Streaming random patches for evolving data stream classification. In: IEEE international conference on data mining. IEEE
Gomes HM, Read J, Bifet A, Barddal JP, Gama J (2019) Machine learning for streaming data: state of the art, challenges, and opportunities. ACM SIGKDD Explor Newsl 21(2):6–22
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer, New York
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
Hoens TR, Chawla NV, Polikar R (2011) Heuristic updatable weighted random subspaces for non-stationary environments. In: 2011 IEEE 11th international conference on data mining (ICDM). IEEE, pp 241–250
Holmes G, Kirkby R, Pfahringer B (2005) Stress-testing Hoeffding trees. Knowl Discov Databases PKDD 2005:495–502. https://doi.org/10.1007/11564126_50
Ikonomovska E, Gama J, Džeroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Discov 23(1):128–168
Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting concepts. J Mach Learn Res 8:2755–2790
Kuncheva LI (2003) That elusive diversity in classifier ensembles. In: Iberian conference on pattern recognition and image analysis. Springer, pp 1126–1138
Kuncheva LI, Rodríguez JJ, Plumpton CO, Linden DE, Johnston SJ (2010) Random subspace ensembles for fMRI classification. IEEE Trans Med Imaging 29(2):531–542
Kutin S, Niyogi P (2002) Almost-everywhere algorithmic stability and generalization error. In: Proceedings of the eighteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann, pp 275–282
Kutin S, Niyogi P (2002) Almost-everywhere algorithmic stability and generalization error. Tech. Rep. TR-2002-03, University of Chicago
Lim N, Durrant RJ (2017) Linear dimensionality reduction in linear time: Johnson-Lindenstrauss-type guarantees for random subspace. arXiv:1705.06408
Lim N, Durrant RJ (2020) A diversity-aware model for majority vote ensemble accuracy. In: International conference on artificial intelligence and statistics. PMLR, pp 4078–4087
Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101(474):578–590
Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inf Comput 108(2):212–261
Liu Y, Yao X (1999) Ensemble learning via negative correlation. Neural Netw 12:1399–1404
Louppe G, Geurts P (2012) Ensembles on random patches. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 346–361
Minku LL, White AP, Yao X (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(5):730–742
Oza N, Russell S (2001) Online bagging and boosting. In: Artificial intelligence and statistics. Morgan Kaufmann, pp 105–112
Panov P, Džeroski S (2007) Combining bagging and random subspaces to create better ensembles. In: International symposium on intelligent data analysis. Springer, pp 118–129
Plumpton CO, Kuncheva LI, Oosterhof NN, Johnston SJ (2012) Naive random subspace ensemble with linear classifiers for real-time classification of fMRI data. Pattern Recognit 45(6):2101–2108
Servedio RA (2003) Smooth boosting and learning with malicious noise. J Mach Learn Res 4:633–648
Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Stapenhurst RJ (2012) Diversity, margins and non-stationary learning. Ph.D. thesis, University of Manchester, UK
Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 30(4):964–994
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101. https://doi.org/10.1023/A:1018046501280
Žliobaite I (2010) Change with delayed labeling: when is it detectable? In: 2010 IEEE international conference on data mining workshops (ICDMW). IEEE, pp 843–850