doi:10.5555/3294996.3295074

LightGBM: a highly efficient gradient boosting decision tree

Published: 04 December 2017

Abstract

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time consuming. To tackle this problem, we propose two novel techniques: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously), to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB LightGBM. Our experiments on multiple public datasets show that, LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.
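The two techniques described in the abstract can be sketched in a few lines. The following is an illustrative sketch only, not LightGBM's actual implementation: the function names, the parameters `a` and `b` (the large-gradient keep rate and small-gradient sampling rate), and `max_conflicts` (the allowed bundle conflict budget) are assumptions chosen for exposition.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Gradient-based One-Side Sampling (GOSS), sketched.

    Keep the top `a` fraction of instances by |gradient| and randomly
    sample a `b` fraction of the rest, up-weighting the sampled
    small-gradient instances by (1 - a) / b so the information-gain
    estimate stays approximately unbiased.
    Returns (selected_indices, instance_weights).
    """
    rng = np.random.default_rng(rng)
    n = len(gradients)
    top_k, rand_k = int(a * n), int(b * n)
    order = np.argsort(-np.abs(gradients))     # descending by |gradient|
    top, rest = order[:top_k], order[top_k:]
    sampled = rng.choice(rest, size=rand_k, replace=False)
    idx = np.concatenate([top, sampled])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b            # amplify small-gradient samples
    return idx, weights

def efb_bundle(feature_nonzero, max_conflicts=0):
    """Exclusive Feature Bundling (EFB), greedy sketch.

    `feature_nonzero` is a list of sets: the nonzero row indices of each
    feature. Each feature is greedily placed into the first bundle with
    which it shares at most `max_conflicts` nonzero rows, mirroring the
    greedy approximation to the NP-hard optimal bundling.
    Returns a list of bundles (lists of feature indices).
    """
    # Process denser features first (a common ordering heuristic).
    order = sorted(range(len(feature_nonzero)),
                   key=lambda j: len(feature_nonzero[j]), reverse=True)
    bundles, bundle_rows = [], []
    for j in order:
        for b_idx, rows in enumerate(bundle_rows):
            if len(rows & feature_nonzero[j]) <= max_conflicts:
                bundles[b_idx].append(j)
                rows |= feature_nonzero[j]
                break
        else:
            bundles.append([j])
            bundle_rows.append(set(feature_nonzero[j]))
    return bundles
```

With `a=0.2, b=0.1`, GOSS keeps 30% of the data per iteration while weighting the sampled 10% by a factor of 8, which is how the gain estimate remains close to the full-data value despite the much smaller sample.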



Published In

NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems
December 2017, 7104 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


