DOI: 10.1145/2484028.2484071

Click model-based information retrieval metrics

Published: 28 July 2013

Abstract

In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions.
One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
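To make the central idea concrete, here is a minimal sketch (not the paper's actual construction) of how a click model can induce an offline metric: under a simple position-based model, the expected number of clicks on a ranked list serves as a utility-style score, computed from per-rank examination probabilities and a relevance-to-attractiveness mapping. The examination curve and the `attract` mapping below are hypothetical illustration values, not trained parameters.

```python
import math

def dcg(relevances):
    """Traditional DCG with a log2 rank discount."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def pbm_expected_clicks(relevances, exam_probs, attract=lambda rel: rel / 2.0):
    """Utility-style metric induced by a position-based click model:
    P(click at rank r) = P(examine rank r) * P(attractive | relevance).
    Both `exam_probs` and `attract` are illustrative stand-ins here;
    in practice they would be estimated from click logs."""
    return sum(e * attract(rel) for e, rel in zip(exam_probs, relevances))

ranking = [2, 0, 1, 1, 0]           # graded relevance labels (0..2)
exam = [1.0, 0.8, 0.6, 0.4, 0.3]    # hypothetical examination probabilities

print(round(dcg(ranking), 3))       # traditional offline score
print(pbm_expected_clicks(ranking, exam))  # → 1.5
```

Because the induced metric weights each document by how likely a user is to actually examine it, two rankings with identical DCG can receive different model-based scores, which is one way such metrics can track online outcomes more closely.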




Published In

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
July 2013, 1188 pages
ISBN: 9781450320344
DOI: 10.1145/2484028

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. click models
    2. evaluation
    3. information retrieval measures
    4. user behavior

    Qualifiers

    • Research-article

Conference

SIGIR '13

Acceptance Rates

SIGIR '13 Paper Acceptance Rate: 73 of 366 submissions (20%)
Overall Acceptance Rate: 792 of 3,983 submissions (20%)


    Cited By

• (2024) Online and Offline Evaluation in Search Clarification. ACM Transactions on Information Systems 43(1), 1-30. DOI: 10.1145/3681786
• (2024) Whole Page Unbiased Learning to Rank. Proceedings of the ACM Web Conference 2024, 1431-1440. DOI: 10.1145/3589334.3645474
• (2024) LaQuE: Enabling Entity Search at Scale. Advances in Information Retrieval, 270-285. DOI: 10.1007/978-3-031-56060-6_18
• (2023) Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality. DOI: 10.1145/3623640
• (2023) A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance. ACM Transactions on Information Systems 42(1), 1-36. DOI: 10.1145/3589763
• (2023) Metric-agnostic Ranking Optimization. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2669-2680. DOI: 10.1145/3539618.3591935
• (2023) LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3086-3094. DOI: 10.1145/3539618.3591921
• (2023) Practice and Challenges in Building a Business-oriented Search Engine Quality Metric. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3295-3299. DOI: 10.1145/3539618.3591841
• (2023) An Offline Metric for the Debiasedness of Click Models. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 558-568. DOI: 10.1145/3539618.3591639
• (2023) A Synthetic Search Session Generator for Task-Aware Information Seeking and Retrieval. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1156-1159. DOI: 10.1145/3539597.3573041
