DOI: 10.1145/3209978.3210048
SIGIR '18 Conference Proceedings · Research article

Selective Gradient Boosting for Effective Learning to Rank

Published: 27 June 2018

Abstract

Learning an effective ranking function from a large number of query-document examples is a challenging task. Indeed, training sets where queries are associated with a few relevant documents and a large number of irrelevant ones are required to model real scenarios of Web search production systems, where a query can possibly retrieve thousands of matching documents, but only a few of them are actually relevant. In this paper, we propose Selective Gradient Boosting (SelGB), an algorithm addressing the Learning-to-Rank task by focusing on those irrelevant documents that are most likely to be mis-ranked and thus to severely hinder the quality of the learned model. SelGB exploits a novel technique minimizing the mis-ranking risk, i.e., the probability that two randomly drawn instances are ranked incorrectly, within a gradient boosting process that iteratively generates an additive ensemble of decision trees. Specifically, at every iteration and on a per-query basis, SelGB selectively chooses among the training instances a small sample of negative examples that enhances the discriminative power of the learned model. Reproducible and comprehensive experiments conducted on a publicly available dataset show that SelGB exploits the diversity and variety of the selected negative examples to train tree ensembles that outperform models generated by state-of-the-art algorithms, achieving NDCG@10 improvements of up to 3.2%.
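To make the selection step concrete, here is a minimal sketch in Python of the per-query negative sampling described above, assuming NumPy and scikit-learn are available. The names (select_examples, selgb_fit), the sampling fraction p, and the squared-error objective are illustrative assumptions rather than the authors' implementation, which embeds the selection in a ranking-oriented gradient boosting learner.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def select_examples(scores, labels, qids, p=0.01):
        """Per query, keep every relevant document plus the fraction p of
        non-relevant documents the current model scores highest, i.e. the
        negatives most likely to be mis-ranked (assumes labels > 0 mark
        relevant documents)."""
        keep = np.zeros(len(labels), dtype=bool)
        for q in np.unique(qids):
            idx = np.flatnonzero(qids == q)
            pos, neg = idx[labels[idx] > 0], idx[labels[idx] == 0]
            k = max(1, int(np.ceil(p * len(neg))))
            keep[pos] = True
            keep[neg[np.argsort(-scores[neg])[:k]]] = True  # top-scored negatives
        return keep

    def selgb_fit(X, y, qids, n_trees=100, lr=0.1, p=0.01):
        """Boosting loop with squared loss for brevity: each tree is fit
        on pseudo-residuals, but only over the selected examples."""
        ensemble, scores = [], np.zeros(len(y), dtype=float)
        for _ in range(n_trees):
            keep = select_examples(scores, y, qids, p)  # re-select each iteration
            tree = DecisionTreeRegressor(max_depth=4)
            tree.fit(X[keep], (y - scores)[keep])
            scores += lr * tree.predict(X)
            ensemble.append(tree)
        return ensemble

Scoring documents with the learned ensemble is the usual additive sum, e.g. sum(lr * t.predict(X) for t in ensemble). The point of the selection is that re-sampling the highest-scored negatives at each iteration keeps the trees focused on the hardest, most discriminative examples instead of the mass of easy negatives.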



Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018 · 1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. boosting
  2. learning to rank
  3. multiple additive regression trees

Qualifiers

  • Research-article

Funding Sources

  • Horizon 2020 Framework Programme

Conference

SIGIR '18

Acceptance Rates

SIGIR '18 paper acceptance rate: 86 of 409 submissions (21%)
Overall acceptance rate: 792 of 3,983 submissions (20%)


Cited By

  • (2024) Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2307-2317. DOI: 10.1145/3637528.3671883 (25 Aug 2024)
  • (2024) From Classical to Quantum: Evolution of Information Retrieval Systems. Trends in Sustainable Computing and Machine Intelligence, 299-312. DOI: 10.1007/978-981-99-9436-6_21 (9 Mar 2024)
  • (2024) An In-Depth Comparison of Neural and Probabilistic Tree Models for Learning-to-rank. Advances in Information Retrieval, 468-476. DOI: 10.1007/978-3-031-56063-7_39 (23 Mar 2024)
  • (2023) LambdaRank Gradients are Incoherent. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1777-1786. DOI: 10.1145/3583780.3614948 (21 Oct 2023)
  • (2023) On the Effect of Low-Ranked Documents: A New Sampling Function for Selective Gradient Boosting. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, 646-652. DOI: 10.1145/3555776.3577597 (27 Mar 2023)
  • (2022) Filtering out Outliers in Learning to Rank. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, 214-222. DOI: 10.1145/3539813.3545127 (23 Aug 2022)
  • (2022) The Istella22 Dataset. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3099-3107. DOI: 10.1145/3477495.3531740 (6 Jul 2022)
  • (2022) ReNeuIR: Reaching Efficiency in Neural Information Retrieval. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3462-3465. DOI: 10.1145/3477495.3531704 (6 Jul 2022)
  • (2022) Provable randomized rounding for minimum-similarity diversification. Data Mining and Knowledge Discovery 36(2), 709-738. DOI: 10.1007/s10618-021-00811-2 (4 Jan 2022)
  • (2022) Context-aware ranking refinement with attentive semi-supervised autoencoders. Soft Computing 26(24), 13941-13952. DOI: 10.1007/s00500-022-07433-w (25 Aug 2022)
