LETOR: A benchmark collection for research on learning to rank for information retrieval

Published: 01 August 2010

Abstract

LETOR is a benchmark collection for research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performance, and discuss the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition to algorithm comparison. We hope that this paper can help readers gain a deeper understanding of LETOR, and enable more interesting research projects on learning to rank and related topics.
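
The LETOR feature files are plain text, one query-document pair per line, in an SVMlight-style layout: a relevance label, a qid:<id> field, then <feature_id>:<value> pairs, with meta information after a # comment. As a minimal sketch of how one fold might be loaded and evaluated (assuming that layout and NDCG as the measure; the Fold1/test.txt path and the single-feature scoring rule below are hypothetical placeholders, not the paper's method):

    import math
    from collections import defaultdict

    def parse_letor_line(line):
        """Parse '<label> qid:<id> 1:<v1> 2:<v2> ... # meta' into (qid, label, features)."""
        body = line.split("#", 1)[0].split()
        label = int(body[0])
        qid = body[1].split(":", 1)[1]
        features = {int(f): float(v) for f, v in (tok.split(":", 1) for tok in body[2:])}
        return qid, label, features

    def ndcg_at_k(ranked_labels, k=10):
        """NDCG@k with gain 2^label - 1 and a log2 position discount."""
        def dcg(labels):
            return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
        ideal = dcg(sorted(ranked_labels, reverse=True))
        return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

    # Group the judged documents of each query together.
    by_query = defaultdict(list)
    with open("Fold1/test.txt") as f:  # hypothetical path into one LETOR fold
        for line in f:
            qid, label, feats = parse_letor_line(line)
            by_query[qid].append((label, feats))

    # Placeholder ranker: score each document by a single feature value.
    def score(feats):
        return feats.get(1, 0.0)

    # Rank each query's documents by score and average NDCG@10 over queries.
    ndcgs = []
    for docs in by_query.values():
        ranked = sorted(docs, key=lambda d: score(d[1]), reverse=True)
        ndcgs.append(ndcg_at_k([label for label, _ in ranked], k=10))
    print(f"Mean NDCG@10 over {len(ndcgs)} queries: {sum(ndcgs) / len(ndcgs):.4f}")

Replacing the placeholder score with a learned model's output gives the standard query-level evaluation loop used with collections of this kind.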

Published In

Information Retrieval, Volume 13, Issue 4 (August 2010), 92 pages

Publisher

Kluwer Academic Publishers, United States

Publication History

Published: 01 August 2010
Accepted: 01 December 2009
Received: 29 April 2009

Author Tags

1. Learning to rank
2. Information retrieval
3. Benchmark datasets
4. Feature extraction

Qualifiers

• Research-article
