Abstract
The random forest is a popular and effective classification method. It combines bootstrap resampling with random subspace sampling to construct an ensemble of decision trees, whose predictions are then averaged to give the final prediction. In this paper, we propose a potential improvement on the random forest that can be thought of as applying a weight to each tree before averaging. The method is motivated by the potential instability of averaging the predictions of trees whose quality may vary widely; to address this, we replace the regular average with a Cesáro average. We provide a theoretical analysis that gives exact conditions under which the new approach outperforms the traditional random forest, and a numerical analysis showing that the new approach is competitive when training classification models on numerous realistic data sets.
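To make the "weight on each tree" reading concrete, the following is a minimal sketch and not the authors' implementation. It assumes the trees have already been placed in some order (how they are ordered is not stated in the abstract) and that each tree contributes one numeric score for a given sample. The Cesáro mean of a sequence is the mean of its running means, which is algebraically the same as a weighted average whose weights are tail sums of the harmonic series divided by the number of trees. The function names `cesaro_weights` and `cesaro_average` are hypothetical.

```python
from itertools import accumulate


def cesaro_weights(n_trees):
    """Weights implied by a Cesáro mean over n_trees ordered scores.

    Tree i (1-indexed) receives (1/i + 1/(i+1) + ... + 1/n) / n,
    so earlier trees in the ordering get larger weights.
    """
    reciprocals = [1.0 / k for k in range(1, n_trees + 1)]
    # Tail sums of the harmonic series, aligned with tree order.
    tails = list(accumulate(reversed(reciprocals)))[::-1]
    return [t / n_trees for t in tails]


def cesaro_average(scores):
    """Cesáro mean of a sequence: the mean of its running means."""
    running_means = [s / k for k, s in enumerate(accumulate(scores), start=1)]
    return sum(running_means) / len(running_means)


# Illustrative per-tree scores for one sample (made-up numbers).
scores = [0.9, 0.6, 0.2]
weights = cesaro_weights(len(scores))
weighted = sum(w * s for w, s in zip(weights, scores))

# Both routes give the same value, and the weights sum to one.
assert abs(weighted - cesaro_average(scores)) < 1e-12
assert abs(sum(weights) - 1.0) < 1e-12
```

Under this reading, the plain random forest average corresponds to equal weights of 1/n, while the Cesáro average shifts weight toward trees that appear earlier in the chosen ordering.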
Pham, H., Olafsson, S. On Cesáro Averages for Weighted Trees in the Random Forest. J Classif 37, 223–236 (2020). https://doi.org/10.1007/s00357-019-09322-8