
A Family of Rank Similarity Measures Based on Maximized Effectiveness Difference

Published: 01 November 2015

Abstract

Rank similarity measures provide a method for quantifying differences between search engine results without the need for relevance judgments. For example, the providers of a search service might use such measures to estimate the impact of a proposed algorithmic change across a large number of queries (perhaps millions), identifying those queries where the impact is greatest. In this paper, we propose and validate a family of rank similarity measures, each derived from an associated effectiveness measure. Each member of the family is based on the maximization of effectiveness difference under this associated measure. Computing this maximized effectiveness difference (MED) requires the solution of an optimization problem that varies in difficulty, depending on the associated measure. We present solutions for several standard effectiveness measures, including nDCG, AP, and ERR. Through an experimental validation, we show that MED reveals meaningful differences between retrieval runs. Mathematically, MED is a metric, regardless of the associated measure. Prior work has established a number of other desiderata for rank similarity in the context of search, and we demonstrate that MED satisfies these requirements. Unlike previous proposals, MED allows us to directly translate assumptions about user behavior from any established effectiveness measure to create a corresponding rank similarity measure. In addition, MED cleanly accommodates partial relevance judgments, and if complete relevance information is available, it reduces to a simple difference between effectiveness values.
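To make the construction concrete, here is a minimal sketch (our illustration under stated assumptions, not the authors' implementation) of MED when the associated effectiveness measure is precision@k. For this measure the maximization has a simple greedy solution: a document appearing in only one of the two top-k lists is assigned the relevance value that widens the difference, unless an existing judgment fixes it. The function name and interface are hypothetical; for measures such as nDCG, AP, and ERR, the corresponding optimization problems are harder, as the paper discusses.

```python
# Illustrative sketch of maximized effectiveness difference (MED) with
# precision@k as the associated effectiveness measure. Hypothetical code,
# not the implementation from the paper.

def med_precision_at_k(run_a, run_b, k, judgments):
    """MED between two ranked lists of document ids under precision@k.

    judgments maps judged document ids to 0/1 relevance; documents
    absent from it are unjudged and may take either value.
    """
    top_a, top_b = set(run_a[:k]), set(run_b[:k])

    def max_signed_diff(favored, other):
        # Maximize P@k(favored) - P@k(other) over relevance assignments
        # consistent with the judgments. Documents in both top-k sets
        # contribute equally to both sides and cancel, so only the
        # symmetric difference matters.
        diff = 0
        for doc in favored ^ other:
            # Unjudged documents take the value that widens the gap:
            # relevant if they appear only in `favored`, nonrelevant if
            # they appear only in `other`.
            rel = judgments.get(doc, 1 if doc in favored else 0)
            diff += rel if doc in favored else -rel
        return diff / k

    # MED is the larger of the two one-sided maxima, hence non-negative.
    return max(max_signed_diff(top_a, top_b), max_signed_diff(top_b, top_a))


# Example: with only d1 judged (relevant), the unjudged documents are free
# to differ between the runs, and MED = 2/3.
print(med_precision_at_k(["d1", "d2", "d3"], ["d4", "d2", "d5"], 3, {"d1": 1}))
```

When judgments are complete, no relevance values are left free, the two one-sided maxima collapse to the signed difference and its negation, and the function returns the plain absolute difference in precision@k, consistent with the reduction described in the abstract.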



        Published In

        IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 11
        Nov. 2015
        287 pages

        Publisher

        IEEE Educational Activities Department

        United States


        Author Tags

        1. search engines
        2. search
        3. rank similarity
        4. information retrieval
        5. effectiveness measures

        Qualifiers

        • Research-article



        Cited By

        • (2024) How do Ties Affect the Uncertainty in Rank-Biased Overlap? Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 125–134. DOI: 10.1145/3673791.3698422. Online publication date: 8-Dec-2024.
        • (2024) Rank-Biased Quality Measurement for Sets and Rankings. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 135–144. DOI: 10.1145/3673791.3698405. Online publication date: 8-Dec-2024.
        • (2024) The Treatment of Ties in Rank-Biased Overlap. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 251–260. DOI: 10.1145/3626772.3657700. Online publication date: 10-Jul-2024.
        • (2023) Preference-Based Offline Evaluation. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pp. 1248–1251. DOI: 10.1145/3539597.3572725. Online publication date: 27-Feb-2023.
        • (2020) Offline Evaluation without Gain. Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, pp. 185–192. DOI: 10.1145/3409256.3409816. Online publication date: 14-Sep-2020.
        • (2018) When Rank Order Isn't Enough. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 397–406. DOI: 10.1145/3269206.3271751. Online publication date: 17-Oct-2018.
        • (2018) Enhanced Performance Prediction of Fusion-based Retrieval. Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 195–198. DOI: 10.1145/3234944.3234950. Online publication date: 10-Sep-2018.
        • (2018) Dynamic Shard Cutoff Prediction for Selective Search. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 85–94. DOI: 10.1145/3209978.3210005. Online publication date: 27-Jun-2018.
        • (2018) Query Driven Algorithm Selection in Early Stage Retrieval. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 396–404. DOI: 10.1145/3159652.3159676. Online publication date: 2-Feb-2018.
        • (2017) Managing Tail Latencies in Large Scale IR Systems. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, p. 1369. DOI: 10.1145/3077136.3084152. Online publication date: 7-Aug-2017.
