Abstract
Decision tree ensembles are widely used in practice. In this work, we study the effectiveness, in ensemble settings, of replacing the split strategy of the state-of-the-art online tree learner, Hoeffding Tree, with the rigorous but more eager splitting strategy that we previously published as Hoeffding AnyTime Tree. Hoeffding AnyTime Tree (HATT) uses the Hoeffding test to determine whether the current best candidate split is superior to the current split, with the possibility of later revision, whereas Hoeffding Tree aims to determine whether the top candidate is better than the second best and, once a test is selected, fixes it for all posterity. HATT converges to the ideal batch tree while Hoeffding Tree does not. We find that HATT is an efficacious base learner for online bagging and online boosting ensembles. On UCI and synthetic streams, HATT as a base learner outperforms Hoeffding Tree at the 0.05 significance level for the majority of tested ensembles on what we believe is the largest and most comprehensive set of testbenches in the online learning literature. Our results indicate that HATT is a superior alternative to Hoeffding Tree in a large number of ensemble settings.
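The contrast between the two split rules can be made concrete. The following is a minimal sketch, assuming split-heuristic values (e.g., information gain) with range \(R\) and the Hoeffding bound \(\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}\); the function names and signatures are ours for illustration and do not correspond to any particular implementation such as MOA's.

```python
import math

def hoeffding_bound(r, delta, n):
    """Hoeffding epsilon: with probability at least 1 - delta, the observed
    mean of n observations of a variable with range r lies within epsilon
    of the true mean (Hoeffding 1963)."""
    return math.sqrt(r * r * math.log(1.0 / delta) / (2.0 * n))

def ht_splits(candidate_gains, delta, n, r=1.0):
    """Hoeffding Tree rule: split on the best candidate only when it beats
    the SECOND-best candidate by more than epsilon; once taken, the split
    is fixed for all posterity. Assumes at least two candidates."""
    best, second = sorted(candidate_gains, reverse=True)[:2]
    return best - second > hoeffding_bound(r, delta, n)

def hatt_revises(best_candidate_gain, current_split_gain, delta, n, r=1.0):
    """HATT rule: compare the best candidate against the split CURRENTLY in
    place and replace it when the candidate is better by more than epsilon,
    so every split decision remains open to revision."""
    return best_candidate_gain - current_split_gain > hoeffding_bound(r, delta, n)
```

The practical consequence of the second rule is that an early split made on scant evidence can later be overturned, which is what allows HATT to converge to the ideal batch tree.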
Notes
In the prequential setting, training instances arrive in a sequence, and the true target value for each instance is made available only after the predictor has offered predictions for a block of n instances. The loss function applied is necessarily incremental in nature. Choosing \(n=1\), that is, evaluating and then updating the predictor after every instance, is an obvious transformation of a periodic evaluation process into an instantaneous one (see the sketch below). While not typical of real-world application scenarios, prequential accuracy serves as a useful approximation to them.
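As an illustration, a minimal prequential (interleaved test-then-train) loop with \(n=1\) might look as follows; the `predict`/`learn` interface of the learner is an assumed sketch, not a reference to any specific library.

```python
def prequential_accuracy(stream, learner):
    """Interleaved test-then-train with n = 1: every instance is used first
    to evaluate the current model, then to update it."""
    correct = seen = 0
    for x, y in stream:                  # instances arrive one at a time
        if learner.predict(x) == y:      # test before the true label is used
            correct += 1
        learner.learn(x, y)              # then train on the labelled instance
        seen += 1
    return correct / seen if seen else 0.0
```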
There is a common misconception that an individual random variable “changes”, taking on a number of values during a process; in fact, a process is a sequence of events, each of which corresponds to a distinct random variable that has taken a particular value, and that value is fixed, never to change.
References
Agrawal R, Ghosh S, Imielinski T, Iyer B, Swami A (1992) An interval classifier for database mining applications. In: Proceedings of the 18th international conference on very large data bases (VLDB '92), pp 560–573
Bhatt R, Dhall A (2012) Skin segmentation dataset: UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets/skin+segmentation
Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM, pp 443–448
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 135–150
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009a) CovPokElec dataset from new ensemble methods for evolving data streams, KDD ’09. https://www.openml.org/d/149
Bifet A, Holmes G, Pfahringer B, Kirkby R, Gavaldà R (2009b) New ensemble methods for evolving data streams. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 139–148
Bifet A, Ikonomovska E (2009) Airlines Dataset. https://www.openml.org/d/1169
Blackard J, Dean D (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Comput Electron Agric 24:131–151
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman and Hall, New York
Brown G (2017) Ensemble learning. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning and data mining. Springer, Boston, MA, pp 393–402. https://doi.org/10.1007/978-1-4899-7687-1_252
Burgués J, Jiménez-Soto JM, Marco S (2018) Estimation of the limit of detection in semiconductor gas sensors through linearized calibration models. Anal Chim Acta. https://doi.org/10.1016/j.aca.2018.01.062
Burgués J, Marco S (2018) Multivariate estimation of the limit of detection by orthogonal partial least squares in temperature-modulated MOX sensors. Anal Chim Acta 1019:49–64. https://doi.org/10.1016/j.aca.2018.03.005
Chen S-T, Lin H-T, Lu C-J (2012) An online boosting algorithm with theoretical justifications. arXiv preprint arXiv:1206.6422
de Barros RSM, de Carvalho Santos SGT, Junior PMG (2016) A boosting-like online learning ensemble. In: 2016 international joint conference on neural networks (IJCNN), pp 1871–1878. https://doi.org/10.1109/IJCNN.2016.7727427
de Carvalho Santos SGT, Gonçalves Júnior PM, dos Santos GD, de Barros RSM (2014) Speeding up recovery from concept drifts. In: Calders T, Esposito F, Hüllermeier E, Meo R (eds) Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2014, Nancy, France, September 15–19, 2014. Proceedings, Part III. Springer, Berlin, Heidelberg, pp 179–194. https://doi.org/10.1007/978-3-662-44845-8_12
de Mello RF, Manapragada C, Bifet A (2019) Measuring the shattering coefficient of decision tree models. Expert Syst Appl 137:443–452
Dietterich TG (2000) Ensemble methods in machine learning. In: International workshop on multiple classifier systems. Springer, pp 1–15
Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 71–80
Dua D, Graff C (2017) UCI machine learning repository. https://archive.ics.uci.edu/ml
Gehrke J, Ramakrishnan R, Ganti V (2000) RainForest–a framework for fast decision tree construction of large datasets. Data Min Knowl Discov 4(2–3):127–162
Gehrke J, Ganti V, Ramakrishnan R, Loh W-Y (1999) BOAT—optimistic decision tree construction. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data. SIGMOD '99. ACM, Philadelphia, Pennsylvania, pp 169–180. https://doi.org/10.1145/304182.304197
Gomes HM, Read J, Bifet A (2019) Streaming random patches for evolving data stream classification. In: Jianyong W, Kyuseok S, Xindong W (eds) 2019 IEEE International conference on data mining, ICDM 2019, Beijing, China, 2019. IEEE, pp 240–249. https://doi.org/10.1109/ICDM.2019.00034
Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106(9–10):1469–1495
Gomes HM, Barddal JP, Enembreck F, Bifet A (2017) A survey on ensemble learning for data stream classification. ACM Comput Surv 50(2):1–36
Harries M, Gama J, Bifet A (2009) NSW Electricity dataset. https://www.openml.org/d/151
Heidrich-Meisner V, Igel C (2009) Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In: Proceedings of the 26th annual international conference on machine learning, pp 401–408
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 97–106
Hunt EB, Marin J, Stone PJ (1966) Experiments in induction. Academic Press. https://books.google.com.au/books?id=60NDAAAAIAAJ
Schlimmer JC, Fisher D (1986) A case study of incremental concept induction. In: Proceedings of the fifth national conference on artificial intelligence (AAAI-86), pp 496–501
Kaluza B, Mirchevska V, Dovgan E, Lustrek M, Gams M (2010) An agent-based approach to care in independent living. Springer, pp 177–186. https://doi.org/10.1007/978-3-642-16917-5_18
Kuncheva LI (2003) That elusive diversity in classifier ensembles. In: Iberian conference on pattern recognition and image analysis. Springer, pp 1126–1138
Kwapisz JR, Weiss GM, Moore SA (2010) Activity recognition using cell phone accelerometers. In: Proceedings of the fourth international workshop on knowledge discovery from sensor data, pp 10–18
Wasserman L (n.d.) Lecture notes 3, review: bounded random variables, Hoeffding's bound. https://www.stat.cmu.edu/~larry/=stat705/Lecture3.pdf
Lyman R (2016) Character font images data set: UCI machine learning repository. https://archive.ics.uci.edu/ml/datasets/Character+Font+Images
Manapragada C, Webb GI, Salehi M (2018) Extremely fast decision tree. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. ACM, pp 1953–1962
Manapragada C, Webb GI, Salehi M, Bifet A (2020) Emergent and unspecified behaviors in streaming decision trees. arXiv:2010.08199 [cs.LG]
Oza NC (2005) Online bagging and boosting. In: Jamshidi M (ed) International conference on systems, man, and cybernetics, special session on ensemble methods for extreme environments. Institute for Electrical and Electronics Engineers, New Jersey, pp 2340–2345
Quinlan JR (1979) Discovering rules by induction from large collections of examples. In: Michie D (ed) Expert systems in the micro-electronic age. Edinburgh University Press
Quinlan JR (1983) Learning efficient classification procedures and their application to chess end games. In: Michalski RS, Carbonell JG, Mitchell TM (eds) Machine learning: an artificial intelligence approach. Springer, pp 463–482
Quinlan JR (1992) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo. http://cds.cern.ch/record/2031749
Huerta R, Mosqueiro T, Fonollosa J, Rulkov N (2016) Online decorrelation of humidity and temperature in chemical sensors for continuous monitoring. Chemom Intell Lab Syst 157:169–176
Reiss A, Stricker D (2012) Introducing a new benchmarked dataset for activity monitoring. In: 2012 16th international symposium on wearable computers (ISWC). IEEE, pp 108–109
Roe B, Yang H, Zhu J, Liu Y, Stancu I, McGregor G (2004) Boosted decision trees as an alternative to artificial neural networks for particle identification. Nucl Instrum Methods Phys Res Sect A 543. https://doi.org/10.1016/j.nima.2004.12.018
Schapire RE (1990) The strength of weak learnability. Mach Learn 5(2):197–227
Schlimmer J, Granger R (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354. https://doi.org/10.1007/BF00116895
Servedio RA (2003) Smooth boosting and learning with malicious noise. J Mach Learn Res 4:633–648
SIGKDD (2015) 2015 KDD Test of Time Award winners. https://www.kdd.org/awards/view/2015-kdd-test-of-time. Accessed 12/10/2019
Stisen A, Blunck H, Bhattacharya S, Prentow T, Kjaergaard M, Dey A, Sonne T, Jensen M (2015) Smart devices are different: assessing and mitigating mobile sensing heterogeneities for activity recognition. In: Proceedings of the 13th ACM conference on embedded networked sensor systems. SenSys ’15. ACM, Seoul, pp 127–140
Street WN, Kim YS (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 377–382
Ugulino W, Cardador D, Vega K, Velloso E, Milidiu R, Fuks H (2012) Wearable computing: accelerometers’ data classification of body postures and movements. https://doi.org/10.1007/978-3-642-34459-6_6
Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
Visser B, Gouk H (2018) AWS dataset. https://www.openml.org/d/41424
Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 30(4):964–994
Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390. https://doi.org/10.1162/neco.1996.8.7.1341
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evolut Comput 1(1):67–82
Wolpert DH, Macready WG (2005) Coevolutionary free lunches. IEEE Trans Evolut Comput 9(6):721–735
Meidan Y, Bohadana M, Mathov Y, Mirsky Y, Shabtai A, Breitenbacher D, Elovici Y (2018) N-BaIoT: network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Comput 17:12–22. https://doi.org/10.1109/MPRV.2018.03367731
Additional information
Responsible editor: Henrik Boström.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Manapragada, C., Gomes, H.M., Salehi, M. et al. An eager splitting strategy for online decision trees in ensembles. Data Min Knowl Disc 36, 566–619 (2022). https://doi.org/10.1007/s10618-021-00816-x