Abstract
Decision tree ensembles are widely used in practice. In this work, we study the effectiveness, in ensemble settings, of replacing the split strategy of the state-of-the-art online tree learner, Hoeffding Tree, with the rigorous but more eager splitting strategy that we previously published as Hoeffding AnyTime Tree. Hoeffding AnyTime Tree (HATT) uses the Hoeffding test to determine whether the current best candidate split is superior to the current split, with the possibility of later revision, whereas Hoeffding Tree aims to determine whether the top candidate is better than the second best and, once a test is selected, fixes it for all posterity. HATT converges to the ideal batch tree while Hoeffding Tree does not. We find that HATT is an efficacious base learner for online bagging and online boosting ensembles. On UCI and synthetic streams, HATT as a base learner outperforms Hoeffding Tree at the 0.05 significance level for the majority of tested ensembles on what we believe is the largest and most comprehensive set of testbenches in the online learning literature. Our results indicate that HATT is a superior alternative to Hoeffding Tree in a large number of ensemble settings.
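The contrast between the two split rules can be made concrete. The following is a minimal sketch, assuming split-heuristic values (e.g., information gain) with range \(R\) and the Hoeffding bound \(\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}\); the function names and signatures are ours for illustration and do not correspond to any particular implementation such as MOA's.

```python
import math

def hoeffding_bound(r, delta, n):
    """Hoeffding epsilon: with probability at least 1 - delta, the observed
    mean of n observations of a variable with range r lies within epsilon
    of the true mean (Hoeffding 1963)."""
    return math.sqrt(r * r * math.log(1.0 / delta) / (2.0 * n))

def ht_splits(candidate_gains, delta, n, r=1.0):
    """Hoeffding Tree rule: split on the best candidate only when it beats
    the SECOND-best candidate by more than epsilon; once taken, the split
    is fixed for all posterity. Assumes at least two candidates."""
    best, second = sorted(candidate_gains, reverse=True)[:2]
    return best - second > hoeffding_bound(r, delta, n)

def hatt_revises(best_candidate_gain, current_split_gain, delta, n, r=1.0):
    """HATT rule: compare the best candidate against the split CURRENTLY in
    place and replace it when the candidate is better by more than epsilon,
    so every split decision remains open to revision."""
    return best_candidate_gain - current_split_gain > hoeffding_bound(r, delta, n)
```

The practical consequence of the second rule is that an early split made on scant evidence can later be overturned, which is what allows HATT to converge to the ideal batch tree.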
Notes
In the prequential setting, training instances arrive in a sequence, and the true target value for each instance is made available only after the predictor has offered predictions for a block of n instances. The loss function applied is necessarily incremental in nature. Choosing \(n=1\), that is, evaluating and then updating the predictor after every instance, is an obvious transformation of a periodic evaluation process into an instantaneous one (see the sketch below). While not typical of real-world application scenarios, prequential accuracy serves as a useful approximation to them.
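As an illustration, a minimal prequential (interleaved test-then-train) loop with \(n=1\) might look as follows; the `predict`/`learn` interface of the learner is an assumed sketch, not a reference to any specific library.

```python
def prequential_accuracy(stream, learner):
    """Interleaved test-then-train with n = 1: every instance is used first
    to evaluate the current model, then to update it."""
    correct = seen = 0
    for x, y in stream:                  # instances arrive one at a time
        if learner.predict(x) == y:      # test before the true label is used
            correct += 1
        learner.learn(x, y)              # then train on the labelled instance
        seen += 1
    return correct / seen if seen else 0.0
```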
There is a common misconception that an individual random variable “changes”, taking on a number of values during a process; in fact, a process is a sequence of events, each of which corresponds to a distinct random variable that has taken a particular value, and that value is fixed, never to change.
References
Agrawal R, Ghosh S, Imielinski T, Iyer B, Swami A (1992) An interval classifier for database mining applications. In: Proceedings of the 18th international conference on very large data bases (VLDB '92), pp 560–573
Bhatt R, Dhall A (2012) Skin segmentation dataset: UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets/skin+segmentation
Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM, pp 443–448
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 135–150
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009a) CovPokElec dataset from new ensemble methods for evolving data streams, KDD ’09. https://www.openml.org/d/149
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009b) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 139–148
Bifet A, Ikonomovska E (2009) Airlines Dataset. https://www.openml.org/d/1169
Blackard J, Dean D (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comput Electron Agric 24:131–151
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman and Hall, New York
Brown G (2017) Ensemble learning. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning and data mining. Springer, Boston, MA, pp 393–402. https://doi.org/10.1007/978-1-4899-7687-1_252
Burgués J, Jiménez-Soto JM, Marco S (2018) Estimation of the limit of detection in semiconductor gas sensors through linearized calibration models. Anal Chim Acta. https://doi.org/10.1016/j.aca.2018.01.062
Burgués J, Marco S (2018) Multivariate estimation of the limit of detection by orthogonal partial least squares in temperature-modulated MOX sensors. Anal Chim Acta 1019:49–64. https://doi.org/10.1016/j.aca.2018.03.005
Chen S-T, Lin H-T, Lu C-J (2012) An online boosting algorithm with theoretical justifications. arXiv preprint arXiv:1206.6422
de Barros RSM, de Carvalho Santos SGT, Junior PMG (2016) A boosting-like online learning ensemble. In: 2016 international joint conference on neural networks (IJCNN), pp 1871–1878. https://doi.org/10.1109/IJCNN.2016.7727427
de Carvalho Santos SGT, Gonçalves Júnior PM, dos Santos GD, de Barros RSM (2014) Speeding up recovery from concept drifts. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2014, Nancy, France, September 15–19, 2014. Proceedings, Part III. Springer, Berlin, Heidelberg, pp 179–194. https://doi.org/10.1007/978-3-662-44845-8_12
de Mello RF, Manapragada C, Bifet A (2019) Measuring the shattering coefficient of decision tree models. Expert Syst Appl 137:443–452
Dietterich TG (2000) Ensemble methods in machine learning. In: International workshop on multiple classifier systems. Springer, pp 1–15
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 71–80
Dua D, Graff C (2017) UCI machine learning repository. https://archive.ics.uci.edu/ml
Gehrke J, Ramakrishnan R, Ganti V (2000) RainForest–a framework for fast decision tree construction of large datasets. Data Min Knowl Discov 4(2–3):127–162
Gehrke J, Ganti V, Ramakrishnan R, Loh W-Y (1999) BOAT—optimistic decision tree construction. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data. SIGMOD '99. ACM, Philadelphia, Pennsylvania, pp 169–180. https://doi.org/10.1145/304182.304197
Gomes HM, Read J, Bifet A (2019) Streaming random patches for evolving data stream classification. In: Jianyong W, Kyuseok S, Xindong W (eds) 2019 IEEE International conference on data mining, ICDM 2019, Beijing, China, 2019. IEEE, pp 240–249. https://doi.org/10.1109/ICDM.2019.00034
Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106(9–10):1469–1495
Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data stream classification. ACM Comput Surv 50(2):1–36
Harries M, Gama J, Bifet A (2009) NSW Electricity dataset. https://www.openml.org/d/151
Heidrich-Meisner V, Igel C (2009) Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In: Proceedings of the 26th annual international conference on machine learning, pp 401–408
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 97–106
Hunt EB, Marin J, Stone PJ (1966) Experiments in induction. Academic Press. https://books.google.com.au/books?id=60NDAAAAIAAJ
Schlimmer JC, Fisher D (1986) A case study of incremental concept induction. In: Proceedings of the fifth national conference on artificial intelligence (AAAI-86), pp 496–501
Kaluza B, Mirchevska V, Dovgan E, Lustrek M, Gams M (2010) An agent-based approach to care in independent living. Springer, pp 177–186. https://doi.org/10.1007/978-3-642-16917-5_18
Kuncheva LI (2003) That elusive diversity in classifier ensembles. In: Iberian conference on pattern recognition and image analysis. Springer, pp 1126–1138
Kwapisz JR, Weiss GM, Moore SA (2010) Activity recognition using cell phone accelerometers. In: Proceedings of the fourth international workshop on knowledge discovery from sensor data, pp 10–18
Wasserman L (n.d.) Lecture notes 3, review: bounded random variables, Hoeffding's bound. https://www.stat.cmu.edu/~larry/=stat705/Lecture3.pdf
Lyman R (2016) Character font images data set: UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets/Character+Font+Images
Manapragada C, Webb GI, Salehi M (2018) Extremely fast decision tree. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. ACM, pp 1953–1962
Manapragada C, Webb GI, Salehi M, Bifet A (2020) Emergent and unspecified behaviors in streaming decision trees. arXiv:2010.08199 [cs.LG]
Oza NC (2005) Online bagging and boosting. In: Jamshidi M (ed) International conference on systems, man, and cybernetics, special session on ensemble methods for extreme environments. Institute for Electrical and Electronics Engineers, New Jersey, pp 2340–2345
Quinlan JR (1979) Discovering rules by induction from large collections of examples. In: Michie D (ed) Expert systems in the micro-electronic age. Edinburgh University Press
Quinlan JR (1983) Learning efficient classification procedures and their application to chess end games. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning: an artificial intelligence approach. Springer, pp 463–482
Quinlan JR (1992) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo. http://cds.cern.ch/record/2031749
Huerta R, Mosqueiro T, Fonollosa J, Rulkov N (2016) Online decorrelation of humidity and temperature in chemical sensors for continuous monitoring. Chemom Intell Lab Syst 157:169–176
Reiss A, Stricker D (2012) Introducing a new benchmarked dataset for activity monitoring. In: 2012 16th international symposium on wearable computers (ISWC). IEEE, pp 108–109
Roe B, Yang H, Zhu J, Liu Y, Stancu I, McGregor G (2004) Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl Instrum Methods Phys Res Sect A 543. https://doi.org/10.1016/j.nima.2004.12.018
Schapire RE (1990) The strength of weak learnability. Mach Learn 5(2):197–227
Schlimmer J, Granger R (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354. https://doi.org/10.1007/BF00116895
Servedio RA (2003) Smooth boosting and learning with malicious noise. J Mach Learn Res 4:633–648
SIGKDD (2015) 2015 KDD Test of Time Award winners. https://www.kdd.org/awards/view/2015-kdd-test-of-time. Accessed 12/10/2019
Stisen A, Blunck H, Bhattacharya S, Prentow T, Kjaergaard M, Dey A, Sonne T, Jensen M (2015) Smart devices are different: assessing and mitigating mobile sensing heterogeneities for activity recognition. In: Proceedings of the 13th ACM conference on embedded networked sensor systems. SenSys ’15. ACM, Seoul, pp 127–140
Street WN, Kim YS (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 377–382
Ugulino W, Cardador D, Vega K, Velloso E, Milidiu R, Fuks H (2012) Wearable computing: accelerometers’ data classification of body postures and movements. https://doi.org/10.1007/978-3-642-34459-6_6
Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
Visser B, Gouk H (2018) AWS dataset. https://www.openml.org/d/41424
Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 30(4):964–994
Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390. https://doi.org/10.1162/neco.1996.8.7.1341
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evolut Comput 1(1):67–82
Wolpert DH, Macready WG (2005) Coevolutionary free lunches. IEEE Trans Evolut Comput 9(6):721–735
Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) N-BaIoT: network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput 17:12–22. https://doi.org/10.1109/MPRV.2018.03367731
Additional information
Responsible editor: Henrik Boström.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Manapragada, C., Gomes, H.M., Salehi, M. et al. An eager splitting strategy for online decision trees in ensembles. Data Min Knowl Disc 36, 566–619 (2022). https://doi.org/10.1007/s10618-021-00816-x