DOI: 10.1145/3209978.3210048
SIGIR '18 Conference Proceedings · Research article

Selective Gradient Boosting for Effective Learning to Rank

Published: 27 June 2018

Abstract

Learning an effective ranking function from a large number of query-document examples is a challenging task. Indeed, training sets where queries are associated with a few relevant documents and a large number of irrelevant ones are required to model real scenarios of Web search production systems, where a query can possibly retrieve thousands of matching documents, but only a few of them are actually relevant. In this paper, we propose Selective Gradient Boosting (SelGB), an algorithm addressing the Learning-to-Rank task by focusing on those irrelevant documents that are most likely to be mis-ranked and thus to severely hinder the quality of the learned model. SelGB exploits a novel technique minimizing the mis-ranking risk, i.e., the probability that two randomly drawn instances are ranked incorrectly, within a gradient boosting process that iteratively generates an additive ensemble of decision trees. Specifically, at every iteration and on a per-query basis, SelGB selectively chooses among the training instances a small sample of negative examples that enhances the discriminative power of the learned model. Reproducible and comprehensive experiments conducted on a publicly available dataset show that SelGB exploits the diversity and variety of the selected negative examples to train tree ensembles that outperform models generated by state-of-the-art algorithms, achieving NDCG@10 improvements of up to 3.2%.
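To make the selection step concrete, here is a minimal sketch in Python of the per-query negative sampling described above, assuming NumPy and scikit-learn are available. The names (select_examples, selgb_fit), the sampling fraction p, and the squared-error objective are illustrative assumptions rather than the authors' implementation, which embeds the selection in a ranking-oriented gradient boosting learner.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def select_examples(scores, labels, qids, p=0.01):
        """Per query, keep every relevant document plus the fraction p of
        non-relevant documents the current model scores highest, i.e. the
        negatives most likely to be mis-ranked (assumes labels > 0 mark
        relevant documents)."""
        keep = np.zeros(len(labels), dtype=bool)
        for q in np.unique(qids):
            idx = np.flatnonzero(qids == q)
            pos, neg = idx[labels[idx] > 0], idx[labels[idx] == 0]
            k = max(1, int(np.ceil(p * len(neg))))
            keep[pos] = True
            keep[neg[np.argsort(-scores[neg])[:k]]] = True  # top-scored negatives
        return keep

    def selgb_fit(X, y, qids, n_trees=100, lr=0.1, p=0.01):
        """Boosting loop with squared loss for brevity: each tree is fit
        on pseudo-residuals, but only over the selected examples."""
        ensemble, scores = [], np.zeros(len(y), dtype=float)
        for _ in range(n_trees):
            keep = select_examples(scores, y, qids, p)  # re-select each iteration
            tree = DecisionTreeRegressor(max_depth=4)
            tree.fit(X[keep], (y - scores)[keep])
            scores += lr * tree.predict(X)
            ensemble.append(tree)
        return ensemble

Scoring documents with the learned ensemble is the usual additive sum, e.g. sum(lr * t.predict(X) for t in ensemble). The point of the selection is that re-sampling the highest-scored negatives at each iteration keeps the trees focused on the hardest, most discriminative examples instead of the mass of easy negatives.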



Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018 · 1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. boosting
  2. learning to rank
  3. multiple additive regression trees

Qualifiers

  • Research-article

Funding Sources

  • Horizon 2020 Framework Programme

Conference

SIGIR '18

Acceptance Rates

SIGIR '18 paper acceptance rate: 86 of 409 submissions (21%)
Overall acceptance rate: 792 of 3,983 submissions (20%)


Cited By

  • (2024) Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2307-2317. DOI: 10.1145/3637528.3671883 (25 Aug 2024)
  • (2024) From Classical to Quantum: Evolution of Information Retrieval Systems. Trends in Sustainable Computing and Machine Intelligence, 299-312. DOI: 10.1007/978-981-99-9436-6_21 (9 Mar 2024)
  • (2024) An In-Depth Comparison of Neural and Probabilistic Tree Models for Learning-to-rank. Advances in Information Retrieval, 468-476. DOI: 10.1007/978-3-031-56063-7_39 (23 Mar 2024)
  • (2023) LambdaRank Gradients are Incoherent. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1777-1786. DOI: 10.1145/3583780.3614948 (21 Oct 2023)
  • (2023) On the Effect of Low-Ranked Documents: A New Sampling Function for Selective Gradient Boosting. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, 646-652. DOI: 10.1145/3555776.3577597 (27 Mar 2023)
  • (2022) Filtering out Outliers in Learning to Rank. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 214-222. DOI: 10.1145/3539813.3545127 (23 Aug 2022)
  • (2022) The Istella22 Dataset. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3099-3107. DOI: 10.1145/3477495.3531740 (6 Jul 2022)
  • (2022) ReNeuIR: Reaching Efficiency in Neural Information Retrieval. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3462-3465. DOI: 10.1145/3477495.3531704 (6 Jul 2022)
  • (2022) Provable randomized rounding for minimum-similarity diversification. Data Mining and Knowledge Discovery 36(2), 709-738. DOI: 10.1007/s10618-021-00811-2 (4 Jan 2022)
  • (2022) Context-aware ranking refinement with attentive semi-supervised autoencoders. Soft Computing 26(24), 13941-13952. DOI: 10.1007/s00500-022-07433-w (25 Aug 2022)
