
Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems

  • Conference paper
  • In: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2021)

Abstract

The evaluation of information retrieval (IR) systems follows the Cranfield paradigm, in which several IR systems are evaluated on a common evaluation environment (test collection and evaluation settings). The Cranfield paradigm requires the evaluation environments (EEs) to be strictly identical in order to compare system performance. For cases where this paradigm cannot be applied, e.g. when we do not have access to the systems' code, we consider an evaluation framework that allows for slight changes in the EEs, such as the evolution of the document corpus or of the topics. To do so, we propose to compare systems evaluated in different environments using a reference system, called the pivot. In this paper, we present and validate a method for selecting a pivot, which is then used to construct a correct ranking of systems evaluated in different environments. We test our framework on the TREC-COVID test collection, which consists of five rounds of growing sets of topics, documents, and relevance judgments. Our experimental results show that the pivot strategy can produce a correct ranking of systems evaluated on an evolving test collection.
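The abstract only sketches how the pivot is used. As a rough illustration, the minimal sketch below shows one way a shared pivot system could anchor comparisons across evaluation environments: each system's effectiveness is expressed relative to the pivot evaluated in the same environment, and systems are then ranked by that relative score. The function name, the example data, and the simple score difference are all hypothetical assumptions, not the paper's actual formulation.

# Minimal sketch of the pivot idea (illustrative names and data; the paper's
# exact pivot-selection and normalisation procedures are defined in the full text).
def pivot_ranking(system_scores, pivot_scores):
    # system_scores: {system: (environment_id, effectiveness)} -- each system is
    #                measured in its own evaluation environment (EE), e.g. one
    #                TREC-COVID round.
    # pivot_scores:  {environment_id: effectiveness} -- the same pivot system is
    #                measured in every EE.
    # Assumption: a simple difference to the pivot; other normalisations are possible.
    relative = {
        system: score - pivot_scores[env]
        for system, (env, score) in system_scores.items()
    }
    return sorted(relative, key=relative.get, reverse=True)

# Hypothetical usage: three runs evaluated on different TREC-COVID rounds.
runs = {"run_A": ("round2", 0.61), "run_B": ("round4", 0.58), "run_C": ("round5", 0.64)}
pivot = {"round2": 0.50, "round4": 0.44, "round5": 0.55}
print(pivot_ranking(runs, pivot))  # ['run_B', 'run_A', 'run_C']

Under this reading, the pivot acts as a common yardstick that absorbs differences between environments, so that only each system's gap to the pivot within its own environment is compared across environments.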


Notes

  1. https://ir.nist.gov/covidSubmit/archive.html.
  2. https://www.semanticscholar.org/cord19.
  3. https://ir.nist.gov/covidSubmit/archive/round4/UPrrf38rrf3-r4.pdf.
  4. https://ir.nist.gov/covidSubmit/archive/round5/UPrrf102-wt-r5.pdf.


Acknowledgements

This work was supported by the ANR Kodicare bilateral project, grant ANR-19-CE23-0029 of the French Agence Nationale de la Recherche, and by the Austrian Science Fund (FWF).

Author information


Corresponding author

Correspondence to Gabriela Nicole González-Sáez.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

González-Sáez, G.N., Mulhem, P., Goeuriot, L. (2021). Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science(), vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_8


  • DOI: https://doi.org/10.1007/978-3-030-85251-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1

  • eBook Packages: Computer Science, Computer Science (R0)
