AdaRank: a boosting algorithm for information retrieval

Authors:

Hang Li

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 391 - 398

https://doi.org/10.1145/1277741.1277809

Published: 23 July 2007

Abstract

In this paper we address the issue of learning to rank for document retrieval. In this task, a model is automatically created from training data and then used to rank documents. The quality of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally, a learning algorithm would train a ranking model that directly optimizes the performance measures with respect to the training data. Existing methods, however, can only train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To address this problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function defined directly on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally combines the weak rankers linearly to make ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
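
The boosting loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the details, so several choices here are assumptions made for illustration: each weak ranker is a single feature, the combination weight follows an AdaBoost-style formula on the per-query measure, queries are reweighted by the exponential of the negated measure, and NDCG@k uses the common `2^label - 1` gain. The names `ndcg`, `score`, and `adarank` are hypothetical and are not taken from the paper.

```python
import math

def ndcg(scores, labels, k=10):
    """NDCG@k for one query; documents are ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    def dcg(ls):
        return sum((2 ** l - 1) / math.log2(r + 2) for r, l in enumerate(ls[:k]))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg([labels[i] for i in order]) / ideal if ideal > 0 else 0.0

def score(w, X):
    """Linear model: one score per document (row of X)."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]

def adarank(queries, n_features, rounds=10, measure=ndcg):
    """queries: list of (X, y), X = feature vectors, y = relevance labels.
    Returns one weight per feature for the combined linear ranker."""
    m = len(queries)
    p = [1.0 / m] * m          # distribution over training queries
    w = [0.0] * n_features     # linear combination of weak rankers
    for _ in range(rounds):
        # Assumption: a weak ranker ranks documents by one feature.
        def perf(k):
            return [measure([x[k] for x in X], y) for X, y in queries]
        best_k = max(range(n_features),
                     key=lambda k: sum(pi * e for pi, e in zip(p, perf(k))))
        e = perf(best_k)
        num = sum(pi * (1 + ei) for pi, ei in zip(p, e))
        den = sum(pi * (1 - ei) for pi, ei in zip(p, e))
        if den <= 0:           # weak ranker is already perfect on every query
            w[best_k] += 1.0
            break
        w[best_k] += 0.5 * math.log(num / den)
        # Queries that the combined model ranks poorly receive more weight.
        e_model = [measure(score(w, X), y) for X, y in queries]
        z = sum(math.exp(-v) for v in e_model)
        p = [math.exp(-v) / z for v in e_model]
    return w

# Toy data: feature 0 orders documents by relevance, feature 1 does not.
queries = [
    ([[3.0, 0.1], [2.0, 0.9], [1.0, 0.5]], [2, 1, 0]),
    ([[0.9, 0.3], [0.1, 0.8]], [1, 0]),
]
weights = adarank(queries, n_features=2, rounds=5)
```

Because the measure is evaluated per query on the full ranked list, the reweighting step focuses later rounds on whole queries rather than on document pairs, which is the contrast the abstract draws with Ranking SVM and RankBoost.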

Index Terms

AdaRank: a boosting algorithm for information retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval

July 2007

946 pages

ISBN: 9781595935977

DOI: 10.1145/1277741

General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)

Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR07

SIGIR07: The 30th Annual International SIGIR Conference

July 23 - 27, 2007

Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
