DOI: 10.1145/1277741.1277809
Article

AdaRank: a boosting algorithm for information retrieval

Published: 23 July 2007

Abstract

In this paper we address the issue of learning to rank for document retrieval. In the task, a model is automatically created with some training data and then is utilized for ranking of documents. The goodness of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally a learning algorithm would train a ranking model that could directly optimize the performance measures with respect to the training data. Existing methods, however, are only able to train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To deal with the problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
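
The reweight-and-combine loop described in the abstract can be sketched roughly as follows. The abstract does not give the update rules, so the specifics here are assumptions drawn from how the algorithm is commonly described and should be checked against the full paper: each weak ranker is a single feature, the combination weight is 0.5 * ln(sum_i P(i)(1 + E_i) / sum_i P(i)(1 - E_i)), and query weights are renormalised values of exp(-E_i), where E_i is the performance measure on query i. Average precision stands in for the measure, and all function and variable names are hypothetical.

```python
import math

def average_precision(scores, labels):
    """AP of the ranking obtained by sorting documents by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def model_scores(model, X):
    """Score each document with the linear combination of chosen features."""
    return [sum(alpha * x[k] for k, alpha in model) for x in X]

def adarank(queries, n_features, rounds=10, measure=average_precision):
    """queries: list of (X, y) with X a list of feature vectors, y binary labels.
    Returns a list of (feature_index, alpha) pairs (assumed model form)."""
    m = len(queries)
    weights = [1.0 / m] * m          # start with uniform weights over queries
    model = []
    for _ in range(rounds):
        # Weak ranker (assumption): the single feature with the best
        # weighted performance under the current query weights.
        best = None
        for k in range(n_features):
            perf = [measure([x[k] for x in X], y) for X, y in queries]
            weighted = sum(w * e for w, e in zip(weights, perf))
            if best is None or weighted > best[0]:
                best = (weighted, k, perf)
        _, k, perf = best
        num = sum(w * (1 + e) for w, e in zip(weights, perf))
        den = sum(w * (1 - e) for w, e in zip(weights, perf))
        if den <= 1e-12:             # weak ranker already perfect: stop early
            model.append((k, 1.0))
            break
        model.append((k, 0.5 * math.log(num / den)))
        # Reweight: queries the combined model ranks poorly gain weight.
        combined = [measure(model_scores(model, X), y) for X, y in queries]
        z = sum(math.exp(-e) for e in combined)
        weights = [math.exp(-e) / z for e in combined]
    return model
```

Because the measure is passed in as a function, an NDCG implementation could be swapped for `average_precision` without touching the loop, which is the sense in which the loss is defined directly on the performance measure.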

    Published In

    SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2007
    946 pages
    ISBN:9781595935977
    DOI:10.1145/1277741
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. boosting
    2. information retrieval
    3. learning to rank

    Qualifiers

    • Article

    Conference

    SIGIR07: The 30th Annual International SIGIR Conference
    July 23 - 27, 2007
    Amsterdam, The Netherlands

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
