research-article

Information Retrieval Evaluation Measures Defined on Some Axiomatic Models of Preferences

Author:

Fernando Giner

ACM Transactions on Information Systems, Volume 42, Issue 3

Article No.: 69, Pages 1–35

https://doi.org/10.1145/3632171

Published: 30 December 2023

Abstract

Information retrieval (IR) evaluation measures are essential for capturing the relevance of documents to topics and for assessing how well retrieval systems perform a task. Studying IR evaluation measures through their formal properties gives a better understanding of their suitability for a specific task. Several works have modeled the effectiveness of retrieval measures with axioms, heuristics, or desirable properties, which induce order relations on the set where the measures are defined. Each of these ordering structures constitutes an axiomatic model of preferences (AMP), which can be regarded as an “ideal” retrieval scenario. Building on lattice theory and on the representational theory of measurement, this work formally explores the numeric, metric, and scale properties of some effectiveness measures defined on AMPs. In some of these scenarios, retrieval measures are completely determined by their scores on a subset of document rankings: the join-irreducible elements. All possible metrics and pseudometrics defined on these structures are expressed in terms of the join-irreducible elements. The deduced scale properties of precision, recall, F-measure, rank-biased precision (RBP), discounted cumulative gain (DCG), and average precision (AP) confirm some recent results in the IR field.
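The idea that a measure can be "completely determined" by its scores on join-irreducible elements can be illustrated with a deliberately simple ordering. The sketch below is an assumption for illustration, not the article's construction or any of its AMPs: it takes binary relevance vectors of a fixed length n ordered componentwise, which form the Boolean lattice {0, 1}^n whose join-irreducible elements are the vectors with exactly one relevant document. For distributive lattices it is a standard fact that a valuation v (one satisfying v(r ∨ s) + v(r ∧ s) = v(r) + v(s)) with v(⊥) = 0 is the sum of its values on the join-irreducibles below r, and that d(r, s) = v(r ∨ s) − v(r ∧ s) is a pseudometric when v is monotone. All function names, the persistence p = 0.8, and the choice of DCG discount are hypothetical illustrative choices.

```python
from itertools import product
from math import isclose, log2

# Toy model (an assumption, not the article's AMP): binary relevance vectors
# of length n, rank 1 first, ordered componentwise: the Boolean lattice {0,1}^n.


def join(r, s):
    """Least upper bound in {0,1}^n: elementwise max."""
    return tuple(max(a, b) for a, b in zip(r, s))


def meet(r, s):
    """Greatest lower bound in {0,1}^n: elementwise min."""
    return tuple(min(a, b) for a, b in zip(r, s))


def join_irreducibles(n):
    """In {0,1}^n these are the atoms: a single relevant document at rank i + 1."""
    return [tuple(1 if k == i else 0 for k in range(n)) for i in range(n)]


def rbp(r, p=0.8):
    """Rank-biased precision (1 - p) * sum_i r_i * p^(i-1); p = 0.8 is illustrative."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(r))


def dcg(r):
    """One common binary-relevance DCG variant: sum_i r_i / log2(i + 1), ranks from 1."""
    return sum(rel / log2(i + 2) for i, rel in enumerate(r))


def average_precision(r, num_relevant):
    """AP: sum of precision@i over ranks i holding a relevant document, / num_relevant."""
    hits, total = 0, 0.0
    for i, rel in enumerate(r, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / num_relevant


def is_modular(measure, n):
    """Check v(r v s) + v(r ^ s) == v(r) + v(s) for every pair in {0,1}^n."""
    vectors = list(product((0, 1), repeat=n))
    return all(
        isclose(measure(join(r, s)) + measure(meet(r, s)),
                measure(r) + measure(s), abs_tol=1e-12)
        for r in vectors for s in vectors
    )


def from_join_irreducibles(measure, r):
    """Rebuild measure(r) from the atoms below r (valid if modular and v(bottom) = 0)."""
    return sum(measure(a) for a, rel in zip(join_irreducibles(len(r)), r) if rel)


def valuation_distance(measure, r, s):
    """d(r, s) = v(r v s) - v(r ^ s); a pseudometric when v is a monotone valuation."""
    return measure(join(r, s)) - measure(meet(r, s))
```

With these definitions, `is_modular(rbp, 4)` and `is_modular(dcg, 4)` hold, so both scores can be rebuilt from the atoms with `from_join_irreducibles`, whereas `is_modular(lambda r: average_precision(r, 4), 4)` fails: the vectors (1, 0, 0, 0) and (0, 1, 0, 0) have AP 0.25 and 0.125, but their join and meet have AP 0.5 and 0. Under a different ordering, such as one of the AMPs studied in the article, the join-irreducible elements and the set of valuations would in general differ.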

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Retrieval effectiveness

Published In

ACM Transactions on Information Systems Volume 42, Issue 3

May 2024

721 pages

EISSN:1558-2868

DOI:10.1145/3618081

Editor:
Min Zhang
Tsinghua University, China

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 December 2023

Online AM: 08 November 2023

Accepted: 24 October 2023

Revised: 23 August 2023

Received: 23 August 2022

Published in TOIS Volume 42, Issue 3

Bibliometrics

Total citations: 0
Total downloads: 181
Downloads (last 12 months): 181
Downloads (last 6 weeks): 14

Reflects downloads up to 12 Sep 2024
