DOI: 10.1145/2484028.2484071

Click model-based information retrieval metrics

Published: 28 July 2013

Abstract

In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions.
One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
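To make the central idea concrete, here is a minimal sketch (not the paper's actual construction) of how a click model can induce an offline metric: under a simple position-based model, the expected number of clicks on a ranked list serves as a utility-style score, computed from per-rank examination probabilities and a relevance-to-attractiveness mapping. The examination curve and the `attract` mapping below are hypothetical illustration values, not trained parameters.

```python
import math

def dcg(relevances):
    """Traditional DCG with a log2 rank discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def pbm_expected_clicks(relevances, exam_probs, attract=lambda rel: rel / 2.0):
    """Utility-style metric induced by a position-based click model:
    P(click at rank r) = P(examine rank r) * P(attractive | relevance).
    Both `exam_probs` and `attract` are illustrative stand-ins here;
    in practice they would be estimated from click logs."""
    return sum(e * attract(rel) for e, rel in zip(exam_probs, relevances))

ranking = [2, 0, 1, 1, 0]           # graded relevance labels (0..2)
exam = [1.0, 0.8, 0.6, 0.4, 0.3]    # hypothetical examination probabilities

print(round(dcg(ranking), 3))       # traditional offline score
print(pbm_expected_clicks(ranking, exam))  # → 1.5
```

Because the induced metric weights each document by how likely a user is to actually examine it, two rankings with identical DCG can receive different model-based scores, which is one way such metrics can track online outcomes more closely.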




Published In

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
July 2013, 1188 pages
ISBN: 9781450320344
DOI: 10.1145/2484028

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. click models
    2. evaluation
    3. information retrieval measures
    4. user behavior

    Qualifiers

    • Research-article

Conference

SIGIR '13

Acceptance Rates

SIGIR '13 Paper Acceptance Rate: 73 of 366 submissions (20%)
Overall Acceptance Rate: 792 of 3,983 submissions (20%)


    Cited By

• (2024) Online and Offline Evaluation in Search Clarification. ACM Transactions on Information Systems 43(1), 1-30. DOI: 10.1145/3681786
• (2024) Whole Page Unbiased Learning to Rank. Proceedings of the ACM Web Conference 2024, 1431-1440. DOI: 10.1145/3589334.3645474
• (2024) LaQuE: Enabling Entity Search at Scale. Advances in Information Retrieval, 270-285. DOI: 10.1007/978-3-031-56060-6_18
• (2023) Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality. DOI: 10.1145/3623640
• (2023) A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance. ACM Transactions on Information Systems 42(1), 1-36. DOI: 10.1145/3589763
• (2023) Metric-agnostic Ranking Optimization. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2669-2680. DOI: 10.1145/3539618.3591935
• (2023) LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3086-3094. DOI: 10.1145/3539618.3591921
• (2023) Practice and Challenges in Building a Business-oriented Search Engine Quality Metric. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3295-3299. DOI: 10.1145/3539618.3591841
• (2023) An Offline Metric for the Debiasedness of Click Models. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 558-568. DOI: 10.1145/3539618.3591639
• (2023) A Synthetic Search Session Generator for Task-Aware Information Seeking and Retrieval. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1156-1159. DOI: 10.1145/3539597.3573041
