A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF)

Abstract

Tree ensemble algorithms such as RandomForest and GradientBoosting are currently the dominant methods for modeling discrete or tabular data. However, they are unable to perform the hierarchical representation learning from raw data that NeuralNetworks achieve thanks to their multi-layered structure, which is a key feature for DeepLearning problems and for modeling unstructured data. This limitation stems from the fact that tree algorithms cannot be trained with back-propagation because of their mathematical nature. In this work, however, we demonstrate that the mathematical formulations of bagging and boosting can be combined to define a graph-structured tree-ensemble algorithm whose representation learning is naturally distributed among the trees (without using back-propagation). We call this novel approach Distributed Gradient Boosting Forest (DGBF) and we demonstrate that both RandomForest and GradientBoosting can be expressed as particular graph architectures of DGBF. Finally, we show that the distributed learning outperforms both RandomForest and GradientBoosting in 7 out of 9 datasets.

Data Availability

All the data and materials are available at https://doi.org/10.5281/zenodo.7236216.

Notes

  1. https://doi.org/10.5281/zenodo.7236216

References

  1. Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G (2022) Deep Neural Networks and Tabular Data: A Survey. IEEE Trans Neural Netw Learn Syst 1–21. https://doi.org/10.1109/TNNLS.2022.3229161

  2. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? J Mach Learn Res 15(90):3133–3181

  3. Breiman L, Friedman JH, Olshen RA, Stone CJ (1983) Classification and Regression Trees

  4. Bengio Y, Mesnil G, Dauphin Y, Rifai S (2013) Better Mixing via Deep Representations. In: Dasgupta S, McAllester D, editors. Proceedings of the 30th International Conference on Machine Learning. vol. 28 of Proceedings of Machine Learning Research. Atlanta, Georgia, USA: PMLR; p. 552–560. Available from: https://proceedings.mlr.press/v28/bengio13.html

  5. Bengio Y, Courville A, Vincent P (2013) Representation Learning: A Review and New Perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828. https://doi.org/10.1109/TPAMI.2013.50

  6. Kontschieder P, Fiterau M, Criminisi A, Bulo SR (2015) Deep Neural Decision Forests. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV)

  7. Biau G, Scornet E, Welbl J (2016) Neural Random Forests. Sankhya A 81. https://doi.org/10.1007/s13171-018-0133-y

  8. Breiman L (2001) Random Forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324

  9. Friedman JH (2000) Greedy Function Approximation: A Gradient Boosting Machine. Ann Stat 29:1189–1232

  10. Dorogush AV, Gulin A, Gusev G, Kazeev N, Prokhorenkova LO, Vorobev A (2017) Fighting biases with dynamic boosting. CoRR. arXiv:1706.09516

  11. Zhang G, Lu Y (2012) Bias-corrected random forests in regression. J Appl Stat 39(1):151–160. https://doi.org/10.1080/02664763.2011.578621

  12. Mentch L, Hooker G (2016) Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests. J Mach Learn Res 17(1):841–881

  13. Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning. Springer Series in Statistics. New York, NY, USA: Springer New York Inc

  14. Pavlov DY, Gorodilov A, Brunk CA (2010) BagBoo: A Scalable Hybrid Bagging-the-Boosting Model. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management. CIKM’10. New York, NY, USA: Association for Computing Machinery; p. 1897–1900

  15. Jafarzadeh H, Mahdianpari M, Gill E, Mohammadimanesh F, Homayouni S (2021) Bagging and Boosting Ensemble Classifiers for Classification of Multispectral, Hyperspectral and PolSAR Data: A Comparative Evaluation. Remote Sensing. 13(21). https://doi.org/10.3390/rs13214405

  16. Ghosal I, Hooker G (2021) Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and Its Variance Estimate. J Comput Graph Stat 30(2):493–502. https://doi.org/10.1080/10618600.2020.1820345

  17. Chatterjee S, Das A (2022) An ensemble algorithm integrating consensus clustering with feature weighting based ranking and probabilistic fuzzy logic-multilayer perceptron classifier for diagnosis and staging of breast cancer using heterogeneous datasets. Appl Intell. https://doi.org/10.1007/s10489-022-04157-0

  18. Rashid M, Kamruzzaman J, Imam T, Wibowo S, Gordon S (2022) A tree-based stacking ensemble technique with feature selection for network intrusion detection. Appl Intell 52(9):9768–9781. https://doi.org/10.1007/s10489-021-02968-1

  19. Feng J, Yu Y, Zhou ZH (2018) Multi-Layered Gradient Boosting Decision Trees. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18. Red Hook, NY, USA: Curran Associates Inc. p. 3555–3565

  20. Morid MA, Kawamoto K, Ault T, Dorius J, Abdelrahman S (2018) Supervised Learning Methods for Predicting Healthcare Costs: Systematic Literature Review and Empirical Evaluation. AMIA Annu Symp Proc 2017:1312–1321

  21. Yang H, Luo Y, Ren X, Wu M, He X, Peng B et al (2021) Risk Prediction of Diabetes: Big data mining with fusion of multifarious physical examination indicators. Information Fusion. https://doi.org/10.1016/j.inffus.2021.02.015

  22. Iwendi C, Bashir AK, Peshkar A, Sujatha R, Chatterjee JM, Pasupuleti S et al (2020) COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm. Frontiers in Public Health. 8. https://doi.org/10.3389/fpubh.2020.00357

  23. Hew KF, Hu X, Qiao C, Tang Y (2020) What predicts student satisfaction with MOOCs: A gradient boosting trees supervised machine learning and sentiment analysis approach. Comput Educ 145:103724. https://doi.org/10.1016/j.compedu.2019.103724

  24. Lu H, Cheng F, Ma X, Hu G (2020) Short-term prediction of building energy consumption employing an improved extreme gradient boosting model: A case study of an intake tower. Energy 203:117756. https://doi.org/10.1016/j.energy.2020.117756

  25. Karasu S, Altan A (2019) Recognition Model for Solar Radiation Time Series based on Random Forest with Feature Selection Approach. In: 2019 11th International Conference on Electrical and Electronics Engineering (ELECO) p. 8–11

  26. Lee TH, Ullah A, Wang R (2020) In: Fuleky P, editor. Bootstrap Aggregating and Random Forest. Cham: Springer International Publishing p. 389–429. Available from: https://doi.org/10.1007/978-3-030-31150-6_13

  27. Carmona P, Climent F, Momparler A (2019) Predicting failure in the U.S. banking sector: An extreme gradient boosting approach. Int Rev Econ Finance 61:304–323. https://doi.org/10.1016/j.iref.2018.03.008

  28. Delgado-Panadero Á, Hernández-Lorca B, García-Ordás MT, Benítez-Andrades JA (2022) Implementing local-explainability in Gradient Boosting Trees: Feature Contribution. Inf Sci 589:199–212. https://doi.org/10.1016/j.ins.2021.12.111

  29. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth International Group

Acknowledgements

I want to thank Sara San Luís Rodriguez for her selfless support and her perfectionism, and also Bea Hernández Lorca, because the help and questions she planted two years ago are the seeds of today's trees.

Author information

Corresponding author

Correspondence to José Alberto Benítez-Andrades.

Ethics declarations

Competing interests

The authors declare that they have no conflicts of interest related to this work. The people involved in the experiment were informed and formally gave their consent.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

José Alberto Benítez-Andrades and María Teresa García-Ordás contributed equally to this work.

Appendices

Appendix A

Appendix B: Background on CART, RandomForest and GradientBoosting bias

B.1 CART bias

Bias type 1

During the learning process, node regions are computed iteratively until an end condition is reached. Because of its exhaustive nature, the predictions tend to be biased, producing high-variance predictions that overfit the dataset \(\{x_i\}\). This bias can be defined as

$$\begin{aligned} Bias_1(x) = E[Y \mid X=x]-E[y_i \mid x_i \in R_j] \mid _{x \in R_j} . \end{aligned}$$
(16)

Bias type 2

During the learning process, the splitting thresholds that produce the nodes are computed using only the middle points between two contiguous points from the dataset; no other value can be a threshold candidate. This produces a bias in the prediction for two reasons: first, a lack of learning capacity in underpopulated areas, and second, high variance in overpopulated areas. This can be visually understood in Fig. 6.
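
To make the threshold restriction concrete, the following minimal sketch (Python; the helper name midpoint_threshold_candidates is ours, not from the paper) enumerates the only values a CART split can choose from for a single feature: the midpoints between contiguous sorted training values. A sparse gap such as (0.3, 10.0] therefore gets a single, coarse candidate.

import numpy as np

def midpoint_threshold_candidates(x):
    # Candidate split thresholds for one feature: only the midpoints between
    # contiguous distinct training values are considered by CART.
    x_sorted = np.unique(x)                      # distinct values, sorted
    return (x_sorted[:-1] + x_sorted[1:]) / 2    # midpoints between neighbours

# Toy example: a dense region around 0 and a single far-away point at 10.
x = np.array([0.0, 0.1, 0.2, 0.3, 10.0])
print(midpoint_threshold_candidates(x))          # [0.05 0.15 0.25 5.15]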

B.2 Ensemble algorithms

Ensemble algorithms mitigate the biased predictions of CARTs by combining the predictions of many models trained separately. There are several ensemble algorithms, but the two main ones are GradientBoosting and RandomForest.

RandomForest-Bagging

One way of combining the predictions of many trees to reduce bias is to aggregate the results by taking the mean. Given a dataset, the learning process of a tree is deterministic, so to make multiple trees produce different predictions, each of them is trained on a different bootstrap subsample of the dataset. This technique is called \(bagging\) [8].

$$\begin{aligned} F(x) = \frac{1}{n_{trees}} \sum _{j=0}^{n_{trees}} h_j(x) , \end{aligned}$$
(17)

where \(h_j(x)\) is trained to minimize the loss function, \(L(y,f(x))\), over a \(bagging\) subsample \(\{x\}_j\)

$$\begin{aligned} h_j(x) = \underset{h}{\mathrm {arg\ min}}\ L(y_i, h(x_i)) / x_i \in \{x\}_j . \end{aligned}$$
(18)

This technique is based on the central limit theorem, by which we expect the variance arising from the CART biases to be reduced by averaging over enough tree predictions. All the trees learn in parallel; this kind of learning is called "horizontal learning".
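
As a concrete reading of Eqs. (17) and (18), the sketch below (Python, using scikit-learn's DecisionTreeRegressor as the base CART; the function names are ours and not part of the paper's code) fits each tree on its own bootstrap subsample and averages the predictions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagging_ensemble(X, y, n_trees=100, random_state=0):
    # Eq. (18): every tree h_j minimizes the loss over its own bootstrap subsample {x}_j.
    rng = np.random.default_rng(random_state)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))    # sample with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def predict_bagging_ensemble(trees, X):
    # Eq. (17): F(x) = (1 / n_trees) * sum_j h_j(x)
    return np.mean([tree.predict(X) for tree in trees], axis=0)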

GradientBoosting-Boosting

In contrast, in GradientBoosting each tree does not learn over a different dataset sample to minimize the loss; instead, the algorithm follows a "stage-wise" optimization approach in which each tree is added to the ensemble to reduce the global loss of the previous trees. The final prediction is the sum of all the trees in the ensemble. This process is called \(boosting\) [9]

$$\begin{aligned} F(x) = \sum _{m=0}^M h_m(x) , \end{aligned}$$
(19)

where \(M\) is the number of trees in the ensemble. To reduce the loss function of the previous trees, \(L(y,F_{m-1}(x))\), each new tree is fitted to the gradient of the loss function of the previous predictors:

$$\begin{aligned} h_m(x) = \rho_m g_m(x) , \end{aligned}$$
(20)
$$\begin{aligned} g_m(x)&= E_y \left[ \frac{\partial L(y,F(x))}{\partial F(x)}\right] _{F=F_{m-1}} , \\ \rho_m&= \underset{\rho '}{\mathrm {arg\ min}}\ E_{y,x} \left[ L\left( y,F_{m-1}(x)-\rho ' g_m(x)\right) \right] \end{aligned}$$
(21)

In the previous formula, the value of \(E_y[\cdot]\) can only be computed knowing the density function \(P(y \mid x)\). In real-world problems we never have this density function; consequently, \(boosting\) is approximated using finite data by assuming regularity in the \(P(y \mid x)\) distribution. To do so, each tree is trained on pseudo-responses that are computed (in the case of the \(RMSE\) loss) as the difference between the target and the predictions of the current ensemble of trees (the residual errors).

$$\begin{aligned} h_m(x) \simeq \underset{h'}{\mathrm {arg\ min}}\ E \left[ L(y-F_{m-1}(x),h'(x))\right] , \end{aligned}$$
(22)

While CART tries to reduce the loss during training by splitting leaf nodes into child nodes, the \(boosting\) algorithm tries to reduce the loss of each tree by adding another tree. However, \(boosting\) generalizes better than the within-tree optimization because, at each stage, the trees optimize the loss function using the entire dataset and not only the subsample of the previous node.
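
As a concrete reading of Eqs. (19)-(22) for the \(RMSE\) loss, the sketch below (Python with scikit-learn trees; the shrinkage factor learning_rate and the function names are our own simplifications and stand in for the line search over \(\rho_m\)) fits each new tree on the residual pseudo-responses of the current ensemble and adds it to the prediction.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    # Stage-wise boosting for the RMSE loss: each tree is fitted on the
    # pseudo-responses y - F_{m-1}(x), i.e. the residuals of the current ensemble.
    F = np.full(len(y), y.mean())                 # constant initial predictor F_0
    trees = []
    for _ in range(n_trees):
        residuals = y - F                         # pseudo-responses (Eq. (22), RMSE case)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F = F + learning_rate * tree.predict(X)   # additive update of Eq. (19)
        trees.append(tree)
    return y.mean(), trees

def predict_gradient_boosting(base_value, trees, X, learning_rate=0.1):
    return base_value + learning_rate * sum(tree.predict(X) for tree in trees)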

B.3 Bagging and boosting bias

Both \(bagging\) and \(boosting\) are ensemble techniques that rely on different mathematical approaches. \(Bagging\) generalizes by reducing variance: it averages the predictions of multiple predictors trained to predict the same response. Meanwhile, \(boosting\) reduces the global training loss of the previous trees by training a new tree on the pseudo-responses. Even though these techniques are usually applied separately, the two approaches are not incompatible.

GradientBoosting Bias

In the finite-data approximation, we assume that training on the pseudo-responses is representative of the gradient. However, computing the pseudo-responses on the same training data from one tree to the next produces biased pseudo-responses. It is demonstrated in [10] that the bias induced by the finite-data approximation is:

$$\begin{aligned} Bias_{GB}(x) \equiv E[F_{t-1}(x')]_{x'=x} - E[F_{t-1}(x') \mid x'=x_k]_{x'=x} , \end{aligned}$$
(23)

RandomForest Bias

The reduction of the CART bias by the RandomForest ensemble is based on reducing the prediction variance by averaging the predictions of a large number of trees. However, averaging over all the trees does not ensure convergence to a global minimum of the loss function everywhere (i.e. in all areas of the feature space). This is because individual decision trees are robust enough to learn outlier regions, a capability that is lost when averaging with trees that have not been trained on those outliers.

$$\begin{aligned} Bias_{RF}(x) \equiv E \left[ h(x) \mid x \in R_j \right] -E \left[ h'(x) \mid x \in R'_j \right] , \end{aligned}$$
(24)

where \(h(x)\) and \(h'(x)\) are trees trained with different subsamples. If these subsamples have different statistics (for instance, because of a small outlier region), the ensemble is not flexible enough to learn the characteristic behavior of that region.
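
A tiny numerical illustration of this effect (Python with scikit-learn; the dataset and numbers are synthetic and invented for this example, not taken from the paper): with a single outlier point, roughly a fraction \(e^{-1}\approx 0.37\) of the bootstrap samples do not contain it, so those trees predict the bulk value at the outlier location and the bagged average is pulled well below the outlier's target.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# 200 "bulk" points with y ~ 0 plus one outlier region: a single point at x = 10 with y = 5.
X = np.concatenate([rng.normal(0.0, 1.0, 200), [10.0]]).reshape(-1, 1)
y = np.concatenate([rng.normal(0.0, 0.1, 200), [5.0]])

preds = []
for _ in range(500):
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap subsample
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    preds.append(tree.predict([[10.0]])[0])           # prediction at the outlier location

# Trees whose subsample misses the outlier predict ~0 there, so the average lands far from 5.
print(np.mean(preds))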

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Delgado-Panadero, Á., Benítez-Andrades, J.A. & García-Ordás, M.T. A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF). Appl Intell 53, 22991–23003 (2023). https://doi.org/10.1007/s10489-023-04735-w

  • DOI: https://doi.org/10.1007/s10489-023-04735-w
