DOI: 10.1145/2600428.2609542

What makes data robust: a data analysis in learning to rank

Published: 03 July 2014

Abstract

When learning to rank algorithms are applied in real search applications, noise in human-labeled training data is an inevitable problem that affects their performance. Previous work mainly focused on how noise affects ranking algorithms and how to design robust ranking algorithms. In this work, we investigate what inherent characteristics make training data robust to label noise. Our motivation comes from an interesting observation: the same ranking algorithm may show very different sensitivities to label noise on different data sets. We therefore investigate the underlying reason for this observation using two typical kinds of learning to rank algorithms (pairwise and listwise methods) and three public data sets (OHSUMED, TD2003 and MSLR-WEB10K). We find that as label noise increases in the training data, it is the document pair noise ratio (pNoise) rather than the document noise ratio (dNoise) that explains the performance degradation of a ranking algorithm.
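The exact definitions of dNoise and pNoise are given in the full paper rather than on this page; the short Python sketch below is only an illustration, under the assumption that dNoise is the fraction of documents whose labels were corrupted and pNoise is the fraction of originally ordered document pairs whose preference is tied or reversed by the noisy labels. It also suggests why the two quantities can diverge: the same number of flipped labels can break very different numbers of pairwise preferences, depending on how the labels are distributed across the documents of a query.

    # Minimal sketch (not the authors' code): assumed definitions of dNoise and
    # pNoise for one query, given both the clean and the noisy graded labels.
    from itertools import combinations

    def d_noise(clean, noisy):
        """Fraction of documents whose label was corrupted (assumed dNoise)."""
        return sum(c != n for c, n in zip(clean, noisy)) / len(clean)

    def p_noise(clean, noisy):
        """Fraction of originally ordered document pairs whose preference is
        tied or reversed under the noisy labels (assumed pNoise)."""
        ordered, broken = 0, 0
        for i, j in combinations(range(len(clean)), 2):
            if clean[i] == clean[j]:
                continue  # clean labels express no preference for this pair
            ordered += 1
            if (noisy[i] - noisy[j]) * (clean[i] - clean[j]) <= 0:
                broken += 1  # the pair is now tied or reversed
        return broken / ordered if ordered else 0.0

    # Toy example: five documents of one query with graded labels 0-2.
    clean = [2, 1, 0, 0, 1]
    noisy = [2, 0, 0, 1, 1]       # two mislabeled documents
    print(d_noise(clean, noisy))  # 0.4   (2 of 5 documents corrupted)
    print(p_noise(clean, noisy))  # 0.375 (3 of 8 ordered pairs broken)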




    Published In

    SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
    July 2014
    1330 pages
    ISBN:9781450322577
    DOI:10.1145/2600428
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 July 2014


    Author Tags

    1. label noise
    2. learning to rank
    3. robust data

    Qualifiers

    • Poster

    Conference

    SIGIR '14

    Acceptance Rates

    SIGIR '14 Paper Acceptance Rate: 82 of 387 submissions, 21%
    Overall Acceptance Rate: 792 of 3,983 submissions, 20%


    Article Metrics

    • Downloads (Last 12 months): 4
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 09 Nov 2024

    Cited By
    • Context-aware ranking refinement with attentive semi-supervised autoencoders. Soft Computing 26(24):13941-13952, 25 Aug 2022. DOI: 10.1007/s00500-022-07433-w
    • Learning to Rank with Deep Autoencoder Features. 2018 International Joint Conference on Neural Networks (IJCNN), pages 1-8, July 2018. DOI: 10.1109/IJCNN.2018.8489646
    • Learning to rank using multiple loss functions. International Journal of Machine Learning and Cybernetics, 12 Oct 2017. DOI: 10.1007/s13042-017-0730-4
    • From Tf-Idf to Learning-to-Rank. Business Intelligence, pages 1245-1292, 2016. DOI: 10.4018/978-1-4666-9562-7.ch063
    • From Tf-Idf to Learning-to-Rank. Handbook of Research on Innovations in Information Retrieval, Analysis, and Management, pages 62-109, 2016. DOI: 10.4018/978-1-4666-8833-9.ch003
    • A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank. Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 103-112, 17 Oct 2015. DOI: 10.1145/2806416.2806482
    • Supervised topic models with word order structure for document classification and retrieval learning. Information Retrieval Journal 18(4):283-330, 4 Jun 2015. DOI: 10.1007/s10791-015-9254-2
