Parallel boosted regression trees for web search ranking

Published: 28 March 2011

Abstract

Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web-search ranking, a domain notorious for very large data sets. In this paper, we propose a novel method for parallelizing the training of GBRT. Our technique parallelizes the construction of the individual regression trees and operates under the master-worker paradigm as follows. The data are partitioned among the workers. At each iteration, each worker summarizes its data partition using histograms. The master processor uses these histograms to build one layer of a regression tree, and then sends this layer to the workers, allowing them to build the histograms for the next layer. Our algorithm carefully orchestrates overlap between communication and computation to achieve good performance.
Since this approach is based on data partitioning and requires only a small amount of communication, it generalizes to distributed and shared-memory machines, as well as clouds. We present experimental results on both shared-memory machines and clusters for two large-scale web-search ranking data sets. We demonstrate that the loss in accuracy induced by the histogram approximation in the regression-tree construction can be compensated for through slightly deeper trees. As a result, we see no significant loss in accuracy on the Yahoo data sets and a very small reduction in accuracy for the Microsoft LETOR data. In addition, on shared-memory machines we obtain almost perfect linear speed-up with up to about 48 cores on the large data sets. On distributed-memory machines, we get a speedup of 25 with 32 processors. Due to data partitioning, our approach can scale to even larger data sets, on which one can reasonably expect even higher speedups.
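The worker/master exchange described in the abstract can be sketched in miniature: each worker compresses its partition into per-bin counts and gradient sums, and the master merges these histograms to pick the best split for a tree node. This is a minimal single-feature illustration, not the authors' implementation; the function names, the fixed bin edges, and the squared-loss gain formula are assumptions made for the example.

```python
import numpy as np

def worker_histogram(x, g, bins):
    # Worker side: summarize a data partition as per-bin example counts
    # and gradient (residual) sums over a fixed set of candidate bin edges.
    idx = np.digitize(x, bins)                      # bin index for each example
    n_bins = len(bins) + 1
    counts = np.bincount(idx, minlength=n_bins)
    grad_sums = np.bincount(idx, weights=g, minlength=n_bins)
    return counts, grad_sums

def master_best_split(histograms, bins):
    # Master side: merge the workers' histograms, then scan the bin edges
    # for the split that most reduces the squared error of the residuals.
    counts = sum(c for c, _ in histograms)
    grads = sum(s for _, s in histograms)
    total_n, total_g = counts.sum(), grads.sum()
    best_thr, best_gain = None, -np.inf
    left_n = left_g = 0.0
    for b in range(len(bins)):
        left_n += counts[b]
        left_g += grads[b]
        right_n, right_g = total_n - left_n, total_g - left_g
        if left_n == 0 or right_n == 0:
            continue
        # Gain of predicting each side's mean residual vs. the overall mean.
        gain = left_g**2 / left_n + right_g**2 / right_n - total_g**2 / total_n
        if gain > best_gain:
            best_thr, best_gain = bins[b], gain
    return best_thr, best_gain
```

In the paper's setting this exchange happens once per tree layer (and per feature), with the master broadcasting the new layer back to the workers; a single feature and one split suffice here to show the mechanics of the histogram approximation.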


Published In

WWW '11: Proceedings of the 20th International Conference on World Wide Web
March 2011
840 pages
ISBN:9781450306324
DOI:10.1145/1963405
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. boosted regression trees
  2. boosting
  3. distributed computing
  4. machine learning
  5. parallel computing
  6. ranking
  7. web search

Qualifiers

  • Research-article

Conference

WWW '11
WWW '11: 20th International World Wide Web Conference
March 28 - April 1, 2011
Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
