DOI: 10.1145/1390334.1390445
Research article

Evaluation over thousands of queries

Published: 20 July 2008

Abstract

Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are available. In light of this, it should be possible to evaluate over many more queries without much more total judging effort. The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries. We present results of the track, along with deeper analysis: investigating tradeoffs between the number of queries and number of judgments shows that, up to a point, evaluation over more queries with fewer judgments is more cost-effective and as reliable as fewer queries with more judgments. Total assessor effort can be reduced by 95% with no appreciable increase in evaluation errors.
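
The abstract's central claim is that, for a fixed total judging budget, spreading effort over many queries with few judgments each can rank systems as reliably as concentrating effort on a few heavily judged queries. The following toy Monte Carlo sketch illustrates that tradeoff; it is not the track's MTC or statAP estimators, and the score distribution, per-topic gap, and noise model (estimation error shrinking like 1/sqrt(judgments)) are illustrative assumptions only.

    import random
    import statistics

    # Toy model (illustrative assumptions, not the paper's estimators):
    # under a fixed total judging budget, how often does a comparison of two
    # hypothetical systems identify the truly better one?

    def correct_ordering_rate(num_queries, judgments_per_query,
                              mean_gap=0.03, gap_sd=0.1, trials=2000, seed=0):
        rng = random.Random(seed)
        # Per-query estimation noise assumed to shrink like 1/sqrt(n).
        noise_sd = 0.3 / judgments_per_query ** 0.5
        wins = 0
        for _ in range(trials):
            a_scores, b_scores = [], []
            for _ in range(num_queries):
                difficulty = rng.betavariate(2, 5)     # shared per-topic difficulty
                gap = rng.gauss(mean_gap, gap_sd)      # per-topic gap between systems
                a_scores.append(difficulty + rng.gauss(0, noise_sd))
                b_scores.append(difficulty + gap + rng.gauss(0, noise_sd))
            if statistics.mean(b_scores) > statistics.mean(a_scores):
                wins += 1
        return wins / trials

    budget = 10_000  # total judgments, split evenly across queries
    for q in (50, 200, 1000):
        n = budget // q
        rate = correct_ordering_rate(q, n)
        print(f"{q:>4} queries x {n:>3} judgments/query: "
              f"better system identified in {rate:.1%} of trials")

Under these assumptions, the many-query configurations separate the two systems at least as reliably as the 50-query configuration for the same total budget, mirroring the tradeoff the paper quantifies with real TREC judgments.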




    Published In

    SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
    July 2008, 934 pages
    ISBN: 9781605581644
    DOI: 10.1145/1390334

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. evaluation
    2. information retrieval
    3. million query track
    4. test collections


    Conference

    SIGIR '08

    Acceptance Rates

    Overall acceptance rate: 792 of 3,983 submissions, 20%

