doi:10.5555/3294996.3295074

LightGBM: a highly efficient gradient boosting decision tree

Published: 04 December 2017

Abstract

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.
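The two techniques described in the abstract can be sketched in a few lines. The following is an illustrative sketch only, not LightGBM's actual implementation: the function names, the parameters `a` and `b` (the large-gradient keep rate and small-gradient sampling rate), and `max_conflicts` (the allowed bundle conflict budget) are assumptions chosen for exposition.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Gradient-based One-Side Sampling (GOSS), sketched.

    Keep the top `a` fraction of instances by |gradient| and randomly
    sample a `b` fraction of the rest, up-weighting the sampled
    small-gradient instances by (1 - a) / b so the information-gain
    estimate stays approximately unbiased.
    Returns (selected_indices, instance_weights).
    """
    rng = np.random.default_rng(rng)
    n = len(gradients)
    top_k, rand_k = int(a * n), int(b * n)
    order = np.argsort(-np.abs(gradients))     # descending by |gradient|
    top, rest = order[:top_k], order[top_k:]
    sampled = rng.choice(rest, size=rand_k, replace=False)
    idx = np.concatenate([top, sampled])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b            # amplify small-gradient samples
    return idx, weights

def efb_bundle(feature_nonzero, max_conflicts=0):
    """Exclusive Feature Bundling (EFB), greedy sketch.

    `feature_nonzero` is a list of sets: the nonzero row indices of each
    feature. Each feature is greedily placed into the first bundle with
    which it shares at most `max_conflicts` nonzero rows, mirroring the
    greedy approximation to the NP-hard optimal bundling.
    Returns a list of bundles (lists of feature indices).
    """
    # Process denser features first (a common ordering heuristic).
    order = sorted(range(len(feature_nonzero)),
                   key=lambda j: len(feature_nonzero[j]), reverse=True)
    bundles, bundle_rows = [], []
    for j in order:
        for b_idx, rows in enumerate(bundle_rows):
            if len(rows & feature_nonzero[j]) <= max_conflicts:
                bundles[b_idx].append(j)
                rows |= feature_nonzero[j]
                break
        else:
            bundles.append([j])
            bundle_rows.append(set(feature_nonzero[j]))
    return bundles
```

With `a=0.2, b=0.1`, GOSS keeps 30% of the data per iteration while weighting the sampled 10% by a factor of 8, which is how the gain estimate remains close to the full-data value despite the much smaller sample.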



Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017, 7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


