DOI: 10.1145/3077136.3080815

Stacking Bagged and Boosted Forests for Effective Automated Classification

Published: 07 August 2017

Abstract

Random Forest (RF) is one of the most successful strategies for automated classification tasks. Motivated by this success, recently proposed RF-based classification approaches leverage the central RF idea of aggregating a large number of low-correlated trees, which are inherently parallelizable and provide exceptional generalization capabilities. In this context, this work brings several new contributions to this line of research. First, we propose a new RF-based strategy (BERT) that applies the boosting technique to bags of extremely randomized trees. Second, we empirically demonstrate that this new strategy, as well as the recently proposed BROOF and LazyNN_RF classifiers, complement each other, motivating us to stack them to produce an even more effective classifier. To the best of our knowledge, this is the first strategy to effectively combine the three main ensemble strategies: stacking, bagging (the cornerstone of RFs), and boosting. Finally, we exploit an efficient and unbiased stacking strategy based on out-of-bag (OOB) samples to considerably speed up the very costly training process of the stacking procedure. Our experiments on several datasets covering two high-dimensional and noisy domains, topic and sentiment classification, provide strong evidence in favor of the benefits of our RF-based solutions. We show that BERT is among the top performers in the vast majority of analyzed cases, while retaining the unique benefits of RF classifiers (explainability, parallelization, ease of parameterization).
We also show that stacking only the recently proposed RF-based classifiers and BERT using our OOB-based strategy is not only significantly faster than recently proposed stacking strategies (up to six times) but also much more effective, with gains of up to 21% and 17% in MacroF1 and MicroF1, respectively, over the best base method, and of 5% and 6% over a stacking of traditional methods, while performing no worse than a complete stacking of all methods at a much lower computational cost.
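As a rough illustration of the BERT idea described above (boosting applied inside bags of extremely randomized trees), the following sketch runs a simple AdaBoost-style weight update over scikit-learn `ExtraTreesClassifier` forests within bootstrap bags and aggregates the bags by voting. The toy data, hyperparameters, and the `predict` helper are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Toy data standing in for a real text-classification matrix (assumption).
rng = np.random.RandomState(0)
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

n_bags, n_boost_rounds = 3, 4          # illustrative sizes, not the paper's
bag_forests, bag_alphas = [], []
for b in range(n_bags):
    idx = rng.randint(0, len(X), len(X))   # bootstrap sample = one bag
    Xb, yb = X[idx], y[idx]
    w = np.full(len(Xb), 1.0 / len(Xb))    # boosting weights inside the bag
    forests, alphas = [], []
    for t in range(n_boost_rounds):
        f = ExtraTreesClassifier(n_estimators=10, random_state=100 * b + t)
        f.fit(Xb, yb, sample_weight=w)
        miss = f.predict(Xb) != yb
        err = max(np.dot(w, miss), 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # AdaBoost-style learner weight
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
        w /= w.sum()                            # re-normalize the weights
        forests.append(f)
        alphas.append(alpha)
    bag_forests.append(forests)
    bag_alphas.append(alphas)

def predict(Xq):
    # Weighted vote inside each bag, then majority vote across bags.
    votes = np.zeros((len(Xq), 2))
    for forests, alphas in zip(bag_forests, bag_alphas):
        score = np.zeros((len(Xq), 2))
        for f, a in zip(forests, alphas):
            score[np.arange(len(Xq)), f.predict(Xq)] += a
        votes[np.arange(len(Xq)), score.argmax(axis=1)] += 1
    return votes.argmax(axis=1)

print("training accuracy:", (predict(X) == y).mean())
```

The manual weight-update loop is used here only to make the boosting-inside-a-bag structure explicit; any boosting wrapper that accepts a forest base learner would serve the same illustrative purpose.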
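The OOB-based stacking shortcut can likewise be sketched with off-the-shelf components: scikit-learn forests expose `oob_decision_function_`, so each training sample's meta-features come only from trees that never saw it, avoiding the usual cross-validation pass over the base learners. The choice of base learners and meta-learner below is an assumption for illustration, not the paper's exact setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Base RF-style learners trained once on the full set, with OOB scoring on.
bases = [
    RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1),
    ExtraTreesClassifier(n_estimators=100, bootstrap=True, oob_score=True,
                         random_state=1),
]
for m in bases:
    m.fit(X, y)

# Each sample's OOB probabilities come only from trees that never saw it,
# so they act as (nearly) unbiased meta-features -- no k-fold retraining.
meta_X = np.hstack([m.oob_decision_function_ for m in bases])
meta = LogisticRegression(max_iter=1000).fit(meta_X, y)

# At prediction time the meta-learner consumes ordinary class probabilities.
stacked = meta.predict(np.hstack([m.predict_proba(X) for m in bases]))
print("stacked training accuracy:", (stacked == y).mean())
```

The key point is that the base learners are fit exactly once; the meta-training matrix falls out of bookkeeping the forests already do, which is where the claimed speedup over cross-validation-based stacking comes from.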



Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN: 9781450350228
DOI: 10.1145/3077136

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bagging
  2. boosting
  3. classification
  4. ensemble
  5. random forests
  6. stacking

Qualifiers

  • Research-article

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 31
  • Downloads (last 6 weeks): 3

Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) ChatGPT as a Text Annotation Tool to Evaluate Sentiment Analysis on South African Financial Institutions. IEEE Access 12, 144017-144043. DOI: 10.1109/ACCESS.2024.3464374
  • (2023) Machine learning predictive model for aspiration screening in hospitalized patients with acute stroke. Scientific Reports 13:1 (15 May 2023). DOI: 10.1038/s41598-023-34999-8
  • (2023) Classification of Yoga Poses Using Integration of Deep Learning and Machine Learning Techniques. Proceedings of International Conference on Recent Trends in Computing (21 Mar 2023), 417-428. DOI: 10.1007/978-981-19-8825-7_36
  • (2023) Parallel Extremely Randomized Decision Forests on Graphics Processors for Text Classification. Parallel Processing and Applied Mathematics (28 Apr 2023), 83-94. DOI: 10.1007/978-3-031-30442-2_7
  • (2020) A cross-entropy based stacking method in ensemble learning. Journal of Intelligent & Fuzzy Systems 39:3 (7 Oct 2020), 4677-4688. DOI: 10.3233/JIFS-200600
  • (2020) A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information. International Journal on Digital Libraries 21:1 (1 Mar 2020), 61-73. DOI: 10.1007/s00799-018-0260-z
  • (2019) A Semantics Aware Random Forest for Text Classification. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (3 Nov 2019), 1061-1070. DOI: 10.1145/3357384.3357891
  • (2019) Bridging Text Visualization and Mining: A Task-Driven Survey. IEEE Transactions on Visualization and Computer Graphics 25:7 (1 Jul 2019), 2482-2504. DOI: 10.1109/TVCG.2018.2834341
  • (2019) Document Performance Prediction for Automatic Text Classification. Advances in Information Retrieval (7 Apr 2019), 132-139. DOI: 10.1007/978-3-030-15719-7_17
  • (2018) Semantically-Enhanced Topic Modeling. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (17 Oct 2018), 893-902. DOI: 10.1145/3269206.3271797
