
A Family of Rank Similarity Measures Based on Maximized Effectiveness Difference

Published: 01 November 2015

Abstract

Rank similarity measures provide a method for quantifying differences between search engine results without the need for relevance judgments. For example, the providers of a search service might use such measures to estimate the impact of a proposed algorithmic change across a large number of queries (perhaps millions), identifying those queries where the impact is greatest. In this paper, we propose and validate a family of rank similarity measures, each derived from an associated effectiveness measure. Each member of the family is based on the maximization of effectiveness difference under this associated measure. Computing this maximized effectiveness difference (MED) requires the solution of an optimization problem that varies in difficulty, depending on the associated measure. We present solutions for several standard effectiveness measures, including nDCG, AP, and ERR. Through an experimental validation, we show that MED reveals meaningful differences between retrieval runs. Mathematically, MED is a metric, regardless of the associated measure. Prior work has established a number of other desiderata for rank similarity in the context of search, and we demonstrate that MED satisfies these requirements. Unlike previous proposals, MED allows us to directly translate assumptions about user behavior from any established effectiveness measure to create a corresponding rank similarity measure. In addition, MED cleanly accommodates partial relevance judgments, and if complete relevance information is available, it reduces to a simple difference between effectiveness values.
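To make the construction concrete, here is a minimal sketch (our illustration under stated assumptions, not the authors' implementation) of MED when the associated effectiveness measure is precision@k. For this measure the maximization has a simple greedy solution: a document appearing in only one of the two top-k lists is assigned the relevance value that widens the difference, unless an existing judgment fixes it. The function name and interface are hypothetical; for measures such as nDCG, AP, and ERR, the corresponding optimization problems are harder, as the paper discusses.

```python
# Illustrative sketch of maximized effectiveness difference (MED) with
# precision@k as the associated effectiveness measure. Hypothetical code,
# not the implementation from the paper.

def med_precision_at_k(run_a, run_b, k, judgments):
    """MED between two ranked lists of document ids under precision@k.

    judgments maps judged document ids to 0/1 relevance; documents
    absent from it are unjudged and may take either value.
    """
    top_a, top_b = set(run_a[:k]), set(run_b[:k])

    def max_signed_diff(favored, other):
        # Maximize P@k(favored) - P@k(other) over relevance assignments
        # consistent with the judgments. Documents in both top-k sets
        # contribute equally to both sides and cancel, so only the
        # symmetric difference matters.
        diff = 0
        for doc in favored ^ other:
            # Unjudged documents take the value that widens the gap:
            # relevant if they appear only in `favored`, nonrelevant if
            # they appear only in `other`.
            rel = judgments.get(doc, 1 if doc in favored else 0)
            diff += rel if doc in favored else -rel
        return diff / k

    # MED is the larger of the two one-sided maxima, hence non-negative.
    return max(max_signed_diff(top_a, top_b), max_signed_diff(top_b, top_a))


# Example: with only d1 judged (relevant), the unjudged documents are free
# to differ between the runs, and MED = 2/3.
print(med_precision_at_k(["d1", "d2", "d3"], ["d4", "d2", "d5"], 3, {"d1": 1}))
```

When judgments are complete, no relevance values are left free, the two one-sided maxima collapse to the signed difference and its negation, and the function returns the plain absolute difference in precision@k, consistent with the reduction described in the abstract.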



        Published In

        IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 11
        Nov. 2015
        287 pages

        Publisher

        IEEE Educational Activities Department

        United States


        Author Tags

        1. search engines
        2. search
        3. rank similarity
        4. information retrieval
        5. effectiveness measures

        Qualifiers

        • Research-article



        Cited By

        • (2024) How do Ties Affect the Uncertainty in Rank-Biased Overlap? Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 125–134. DOI: 10.1145/3673791.3698422. Online publication date: 8-Dec-2024.
        • (2024) Rank-Biased Quality Measurement for Sets and Rankings. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 135–144. DOI: 10.1145/3673791.3698405. Online publication date: 8-Dec-2024.
        • (2024) The Treatment of Ties in Rank-Biased Overlap. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 251–260. DOI: 10.1145/3626772.3657700. Online publication date: 10-Jul-2024.
        • (2023) Preference-Based Offline Evaluation. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pp. 1248–1251. DOI: 10.1145/3539597.3572725. Online publication date: 27-Feb-2023.
        • (2020) Offline Evaluation without Gain. Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval, pp. 185–192. DOI: 10.1145/3409256.3409816. Online publication date: 14-Sep-2020.
        • (2018) When Rank Order Isn't Enough. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 397–406. DOI: 10.1145/3269206.3271751. Online publication date: 17-Oct-2018.
        • (2018) Enhanced Performance Prediction of Fusion-based Retrieval. Proceedings of the 2018 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 195–198. DOI: 10.1145/3234944.3234950. Online publication date: 10-Sep-2018.
        • (2018) Dynamic Shard Cutoff Prediction for Selective Search. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 85–94. DOI: 10.1145/3209978.3210005. Online publication date: 27-Jun-2018.
        • (2018) Query Driven Algorithm Selection in Early Stage Retrieval. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 396–404. DOI: 10.1145/3159652.3159676. Online publication date: 2-Feb-2018.
        • (2017) Managing Tail Latencies in Large Scale IR Systems. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, p. 1369. DOI: 10.1145/3077136.3084152. Online publication date: 7-Aug-2017.
