DOI: 10.1145/3447548.3467278 · Research Article

Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression

Published: 14 August 2021

Abstract

    Gradient Boosting Machines (GBM) are hugely popular for solving tabular data problems. However, practitioners are not only interested in point predictions, but also in probabilistic predictions in order to quantify the uncertainty of the predictions. Creating such probabilistic predictions is difficult with existing GBM-based solutions: they either require training multiple models or they become too computationally expensive to be useful for large-scale settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner. PGBM approximates the leaf weights in a decision tree as a random variable, and approximates the mean and variance of each sample in a dataset via stochastic tree ensemble update equations. These learned moments allow us to subsequently sample from a specified distribution after training. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods: (i) PGBM enables probabilistic estimates without compromising on point performance in a single model, (ii) PGBM learns probabilistic estimates via a single model only (and without requiring multi-parameter boosting), and thereby offers a speedup of up to several orders of magnitude over existing state-of-the-art methods on large datasets, and (iii) PGBM achieves accurate probabilistic estimates in tasks with complex differentiable loss functions, such as hierarchical time series problems, where we observed up to 10% improvement in point forecasting performance and up to 300% improvement in probabilistic forecasting performance.
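The mechanism described in the abstract (leaf weights treated as random variables whose first two moments are propagated through the additive boosting update, with sampling from a chosen distribution deferred until after training) can be illustrated in a few lines. The following is a minimal sketch, not the authors' implementation: the names `leaf_mu`, `leaf_var`, and `tree_correlation` are illustrative assumptions, and the treatment of dependence between trees is a simplification.

```python
import numpy as np

def ensemble_moments(leaf_mu, leaf_var, learning_rate=0.1, tree_correlation=0.0):
    """Propagate per-leaf means/variances through the additive boosting update.

    leaf_mu, leaf_var: arrays of shape (K, n); for each of K trees, the mean
    and variance of the leaf each of the n samples falls into.
    (Hypothetical inputs; a real GBM would estimate these during training.)
    """
    K, n = leaf_mu.shape
    mu = np.zeros(n)   # running mean of the ensemble prediction
    var = np.zeros(n)  # running variance of the ensemble prediction
    for k in range(K):
        # First moment: the usual additive boosting update.
        mu += learning_rate * leaf_mu[k]
        # Second moment: variance of a sum of random variables. With
        # tree_correlation = 0 the trees are treated as independent; a
        # nonzero value adds a covariance term between the running
        # prediction and the new tree.
        cov = tree_correlation * np.sqrt(var * leaf_var[k])
        var += learning_rate**2 * leaf_var[k] + 2 * learning_rate * cov
    return mu, var

# After training, probabilistic predictions come from sampling a
# distribution parameterized by the learned moments, e.g. a Gaussian:
rng = np.random.default_rng(0)
mu, var = ensemble_moments(rng.random((100, 5)), 0.1 * rng.random((100, 5)))
samples = rng.normal(mu, np.sqrt(var), size=(1000, mu.shape[0]))
quantiles = np.quantile(samples, [0.1, 0.5, 0.9], axis=0)
```

Because the output distribution is only imposed at sampling time, the same learned moments can be reused with a different distribution (say, a heavier-tailed one) without retraining, which is what makes the single-model approach inexpensive relative to methods that require multi-parameter boosting.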

    Supplementary Material

    MP4 File (KDD21-rst1763.mp4)
    Creating probabilistic predictions is difficult with existing Gradient Boosting Machines (GBM) solutions: they either require training multiple models or they become too computationally expensive to be useful for large-scale settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods: (i) PGBM enables probabilistic estimates without compromising on point performance, (ii) PGBM learns probabilistic estimates via a single model only, thereby offering a speedup of up to several orders of magnitude over existing state-of-the-art methods on large datasets, and (iii) PGBM achieves accurate probabilistic estimates in tasks with complex differentiable loss functions, such as hierarchical time series problems, where we observed up to 300% improvement in probabilistic forecasting performance.



      Published In

      KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
      August 2021
      4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548

Publisher

Association for Computing Machinery, New York, NY, United States
