DOI: 10.1145/1321440.1321529
Research article

Inferring document relevance from incomplete information

Published: 06 November 2007

Abstract

Recent work has shown that average precision can be accurately estimated from a small random sample of judged documents. Unfortunately, such "random pools" cannot be used to evaluate retrieval measures in any standard way. In this work, we show that given such estimates of average precision, one can accurately infer the relevances of the remaining unjudged documents, thus obtaining a fully judged pool that can be used in standard ways for system evaluation of all kinds. Using TREC data, we demonstrate that our inferred judged pools are well correlated with assessor judgments, and we further demonstrate that our inferred pools can be used to accurately infer precision-recall curves and all commonly used measures of retrieval performance.
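
The abstract describes a concrete inference task: given a partially judged ranked list and an externally estimated average precision (AP), fill in the missing relevance labels. Purely as an illustration of that task, and not as the authors' actual (statistical) method, the Python sketch below (all names hypothetical) completes a judgment pool greedily: an unjudged document is flipped to relevant only when the flip moves the list's AP closer to the estimate.

    def average_precision(relevance):
        # AP of a ranked list given 0/1 labels; normalized here by the
        # number of relevant documents retrieved (a common simplification).
        hits, total = 0, 0.0
        for rank, rel in enumerate(relevance, start=1):
            if rel:
                hits += 1
                total += hits / rank
        return total / hits if hits else 0.0

    def infer_relevance(ranked_ids, judged, estimated_ap):
        # judged maps a sampled subset of ranked_ids to known 0/1 labels.
        # Unjudged documents start as non-relevant; each is flipped to
        # relevant only if that shrinks the gap to the AP estimate.
        labels = {d: judged.get(d, 0) for d in ranked_ids}

        def gap():
            ap = average_precision([labels[d] for d in ranked_ids])
            return abs(ap - estimated_ap)

        for d in ranked_ids:              # consider top-ranked documents first
            if d in judged:
                continue                  # never overwrite an assessor judgment
            before = gap()
            labels[d] = 1
            if gap() >= before:           # no improvement: undo the flip
                labels[d] = 0
        return labels

    # Hypothetical usage: five ranked documents, two sampled judgments,
    # and an AP estimate of 0.8 obtained by some external estimator.
    ranked = ["d1", "d2", "d3", "d4", "d5"]
    print(infer_relevance(ranked, judged={"d1": 1, "d4": 0}, estimated_ap=0.8))

In the paper, the completed pool would then feed standard evaluation (precision-recall curves and other measures); the sketch only illustrates the fitting step on a single ranked list.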

References

[1]
J. A. Aslam, V. Pavlu, and R. Savell. A unified model for metasearch, pooling, and system evaluation. In O. Frieder, J. Hammer, S. Qureshi, and L. Seligman, editors, Proceedings of the Twelfth International Conference on Information and Knowledge Management, pages 484--491. ACM Press, November 2003.
[2]
J. A. Aslam, V. Pavlu, and E. Yilmaz. A statistical method for system evaluation using incomplete judgments. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 541--548. ACM Press, August 2006.
[3]
J. A. Aslam and E. Yilmaz. Inferring document relevance via average precision. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 601--602. ACM Press, August 2006.
[4]
J. A. Aslam, E. Yilmaz, and V. Pavlu. The maximum entropy method for analyzing retrieval measures. In G. Marchionini, A. Moffat, J. Tait, R. Baeza-Yates, and N. Ziviani, editors, Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 27--34. ACM Press, August 2005.
[5]
B. Carterette, J. Allan, and R. Sitaraman. Minimal test collections for retrieval evaluation. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 268--275, 2006.
[6]
G. V. Cormack, C. R. Palmer, and C. L. A. Clarke. Efficient construction of large test collections. In Croft et al. [7], pages 282--289.
[7]
W. B. Croft, A. Moffat, C. J. van Rijsbergen, R. Wilkinson, and J. Zobel, editors. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 1998.
[8]
D. Harman. Overview of the third Text REtrieval Conference (TREC-3). In D. Harman, editor, Overview of the Third Text REtrieval Conference (TREC-3), pages 1--19. U.S. Government Printing Office, April 1995.
[9]
E. M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 315--323. ACM Press, 1998.
[10]
E. Yilmaz and J. A. Aslam. Estimating average precision with incomplete and imperfect judgments. In Proceedings of the Fifteenth ACM International Conference on Information and Knowledge Management, pages 102--111. ACM Press, November 2006.
[11]
J. Zobel. How reliable are the results of large-scale retrieval experiments? In Croft et al. [7], pages 307--314.



    Published In

CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
November 2007, 1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440

    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. average precision
    2. incomplete judgments
    3. relevance judgments


    Conference

CIKM '07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



Cited By
• (2023) Bootstrapped nDCG Estimation in the Presence of Unjudged Documents. Advances in Information Retrieval, pages 313--329. DOI: 10.1007/978-3-031-28244-7_20. Online publication date: 17-Mar-2023.
• (2022) An Analysis of Variations in the Effectiveness of Query Performance Prediction. Advances in Information Retrieval, pages 215--229. DOI: 10.1007/978-3-030-99736-6_15. Online publication date: 5-Apr-2022.
• (2019) Constructing Test Collections using Multi-armed Bandits and Active Learning. The World Wide Web Conference, pages 3158--3164. DOI: 10.1145/3308558.3313675. Online publication date: 13-May-2019.
• (2019) Correlation, Prediction and Ranking of Evaluation Metrics in Information Retrieval. Advances in Information Retrieval, pages 636--651. DOI: 10.1007/978-3-030-15712-8_41. Online publication date: 7-Apr-2019.
• (2016) Incorporating Clicks, Attention and Satisfaction into a Search Engine Result Page Evaluation Model. Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pages 175--184. DOI: 10.1145/2983323.2983829. Online publication date: 24-Oct-2016.
• (2016) A Short Survey on Online and Offline Methods for Search Quality Evaluation. Information Retrieval, pages 38--87. DOI: 10.1007/978-3-319-41718-9_3. Online publication date: 26-Jul-2016.
• (2014) Towards Robust & Reusable Evaluation for Novelty & Diversity. Proceedings of the 7th Workshop on Ph.D Students, pages 9--17. DOI: 10.1145/2663714.2668045. Online publication date: 3-Nov-2014.
• (2013) Evaluation in Music Information Retrieval. Journal of Intelligent Information Systems, 41(3):345--369. DOI: 10.1007/s10844-013-0249-4. Online publication date: 1-Dec-2013.
• (2012) Extended expectation maximization for inferring score distributions. Proceedings of the 34th European Conference on Advances in Information Retrieval, pages 293--304. DOI: 10.1007/978-3-642-28997-2_25. Online publication date: 1-Apr-2012.
• (2010) Measuring the reusability of test collections. Proceedings of the Third ACM International Conference on Web Search and Data Mining, pages 231--240. DOI: 10.1145/1718487.1718516. Online publication date: 4-Feb-2010.
