DOI: 10.1145/3485447.3511960
research-article

Preferences on a Budget: Prioritizing Document Pairs when Crowdsourcing Relevance Judgments

Published: 25 April 2022

Abstract

In Information Retrieval (IR) evaluation, preference judgments are collected by presenting assessors with a pair of documents and asking them to select which of the two, if any, is the more relevant. This is an alternative to the classic relevance judgment approach, in which human assessors judge the relevance of a single document on a scale; it allows relative rather than absolute judgments of relevance. While preference judgments are easier for human assessors to perform, the number of possible document pairs is usually so large that judging them all is infeasible. Thus, following an idea similar to pooling strategies for single-document relevance judgments, where the goal is to sample the most useful documents to judge, in this work we analyze alternative ways to sample document pairs so as to maximize the value of a fixed number of preference judgments that can feasibly be collected. This value is defined as how well we can evaluate IR systems given a budget, that is, a fixed number of human preference judgments that may be collected. Relying on several datasets featuring relevance judgments gathered from experts and through crowdsourcing, we experimentally compare alternative strategies for selecting document pairs and show how different strategies lead to different levels of IR evaluation quality. Our results show that, with an appropriate procedure, it is possible to achieve good IR evaluation results with a limited number of preference judgments, confirming the feasibility of using preference judgments to build IR evaluation collections.
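
The workflow the abstract describes (fix a judgment budget, choose which document pairs to send to assessors, then aggregate the collected preferences) can be made concrete with a small example. The following Python sketch is illustrative only, not the paper's actual prioritization strategies: the sample_pairs and rank_by_wins helpers, the "top_first" heuristic, and the toy assessor oracle are all assumptions introduced here.

    # Illustrative sketch only (assumption, not the authors' procedure): pick a
    # budget-limited set of document pairs to judge, then aggregate the collected
    # preferences into a per-topic document ordering by counting wins.
    import itertools
    import random

    def sample_pairs(doc_ids, budget, strategy="random", priority=None):
        """Return at most `budget` unordered document pairs to send to assessors."""
        pairs = list(itertools.combinations(doc_ids, 2))
        if strategy == "random":            # uniform sample of all possible pairs
            random.shuffle(pairs)
        elif strategy == "top_first":       # prefer pairs of highly ranked documents,
            rank = {d: i for i, d in enumerate(priority)}  # e.g. from pooled system runs
            pairs.sort(key=lambda p: rank[p[0]] + rank[p[1]])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        return pairs[:budget]

    def rank_by_wins(preferences):
        """Aggregate (winner, loser) preference pairs into a document ranking."""
        wins = {}
        for winner, loser in preferences:
            wins[winner] = wins.get(winner, 0) + 1
            wins.setdefault(loser, 0)       # losers still appear in the ranking
        return sorted(wins, key=wins.get, reverse=True)

    if __name__ == "__main__":
        docs = [f"d{i}" for i in range(10)]
        pairs = sample_pairs(docs, budget=15, strategy="top_first", priority=docs)
        # Toy assessor oracle: always prefers the document with the lower index.
        prefs = [(a, b) if int(a[1:]) < int(b[1:]) else (b, a) for a, b in pairs]
        print(rank_by_wins(prefs))

Under a sketch like this, the quality of a pair-selection strategy would be judged by how closely the system ranking induced by the aggregated preferences at a given budget agrees with the ranking obtained from a full set of judgments.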


Cited By

  • (2024) Reliable Information Retrieval Systems Performance Evaluation: A Review. IEEE Access 12, 51740–51751. https://doi.org/10.1109/ACCESS.2024.3377239
  • (2023) A Preference Judgment Tool for Authoritative Assessment. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3100–3104. https://doi.org/10.1145/3539618.3591801
  • (2023) Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 729–738. https://doi.org/10.1145/3539618.3591685
  • (2023) Implications and New Directions for IR Research and Practices. In A Behavioral Economics Approach to Interactive Information Retrieval, 181–201. https://doi.org/10.1007/978-3-031-23229-9_7
  • (2022) Human Preferences as Dueling Bandits. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 567–577. https://doi.org/10.1145/3477495.3531991

      Published In

      WWW '22: Proceedings of the ACM Web Conference 2022
      April 2022
      3764 pages
ISBN: 9781450390965
DOI: 10.1145/3485447

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 25 April 2022

      Author Tags

      1. Crowdsourcing
      2. Preference Judgments
      3. Relevance Assessment

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • ARC Discovery Project
      • ARC Training Centre for Information Resilience

      Conference

      WWW '22
      Sponsor:
      WWW '22: The ACM Web Conference 2022
      April 25 - 29, 2022
      Virtual Event, Lyon, France

      Acceptance Rates

      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

