
Information Retrieval Evaluation Measures Defined on Some Axiomatic Models of Preferences

Published: 30 December 2023

Abstract

Information retrieval (IR) evaluation measures are essential for capturing the relevance of documents to topics and for determining how well retrieval systems perform their task. Studying IR evaluation measures through their formal properties gives a better understanding of their suitability for a specific task. Some works have modeled the effectiveness of retrieval measures with axioms, heuristics, or desirable properties, which induce order relations on the set where the measures are defined. Each of these ordering structures constitutes an axiomatic model of preferences (AMP), which can be regarded as an “ideal” retrieval scenario. Based on lattice theory and the representational theory of measurement, this work formally explores numeric, metric, and scale properties of some effectiveness measures defined on AMPs. In some of these scenarios, retrieval measures are completely determined by the scores of a subset of document rankings: the join-irreducible elements. All possible metrics and pseudometrics defined on these structures are expressed in terms of the join-irreducible elements. The deduced scale properties of precision, recall, F-measure, rank-biased precision (RBP), discounted cumulative gain (DCG), and average precision (AP) confirm some recent results in the IR field.
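To make the role of join-irreducible elements concrete, here is a minimal Python sketch. It is not the article's construction. It assumes rankings of fixed length n with binary relevance, ordered componentwise, so that the ordering structure is the Boolean lattice {0,1}^n. Under that assumption the join-irreducible elements are the rankings with exactly one relevant document, and additive measures such as DCG and RBP can be recovered from their scores on those rankings alone. The function names, the log2 discount for DCG, and the persistence value p = 0.8 are choices made for illustration.

```python
import math
from itertools import product


def dcg(r):
    """DCG with binary gains and a log2(rank + 1) discount (ranks start at 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(r))


def rbp(r, p=0.8):
    """Rank-biased precision with persistence p (assumed value, for illustration)."""
    return (1 - p) * sum(rel * p**i for i, rel in enumerate(r))


def join_irreducibles(n):
    """Unit vectors: the join-irreducibles of {0,1}^n under the componentwise order."""
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]


def leq(a, b):
    """Componentwise order on binary relevance vectors."""
    return all(x <= y for x, y in zip(a, b))


def from_join_irreducibles(measure, r):
    """Rebuild the score of r from the scores of the join-irreducibles below r."""
    return sum(measure(e) for e in join_irreducibles(len(r)) if leq(e, r))


# Check the reconstruction on every ranking of length 4.
n = 4
all_match = all(
    math.isclose(m(r), from_join_irreducibles(m, r), abs_tol=1e-12)
    for r in product((0, 1), repeat=n)
    for m in (dcg, rbp)
)
print(all_match)
```

The reconstruction works here because both measures are sums of independent per-rank contributions. A measure without that additive form would need a different treatment, which this sketch does not attempt.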


    Published In

    ACM Transactions on Information Systems, Volume 42, Issue 3
    May 2024
    721 pages
    EISSN:1558-2868
    DOI:10.1145/3618081
    Editor: Min Zhang

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 December 2023
    Online AM: 08 November 2023
    Accepted: 24 October 2023
    Revised: 23 August 2023
    Received: 23 August 2022
    Published in TOIS Volume 42, Issue 3


    Author Tags

    1. Information retrieval
    2. evaluation metric
    3. lattice theory

    Qualifiers

    • Research-article
