Abstract
The evaluation of information retrieval (IR) systems follows the Cranfield paradigm, in which several IR systems are evaluated on a common evaluation environment (test collection and evaluation settings). The Cranfield paradigm requires the evaluation environments (EEs) to be strictly identical for systems' performances to be comparable. For cases where this paradigm cannot be applied, e.g. when we do not have access to the systems' code, we consider an evaluation framework that allows for slight changes in the EEs, such as the evolution of the document corpus or of the topics. To do so, we propose to compare systems evaluated on different environments through a reference system, called the pivot. In this paper, we present and validate a method to select a pivot, which is then used to construct a correct ranking of systems evaluated in different environments. We test our framework on the TREC-COVID test collection, which consists of five rounds of growing topics, documents and relevance judgments. The results of our experiments show that the pivot strategy can produce a correct ranking of systems evaluated on an evolving test collection.
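To make the pivot idea concrete, the sketch below ranks systems from different evaluation environments by their score relative to a shared pivot system. This is only an illustrative assumption of how such a comparison could be implemented; the system names, scores, and the difference-to-pivot rule are hypothetical and not the authors' exact formulation.

# Minimal sketch of the pivot strategy described in the abstract.
# Systems evaluated on different EEs are made comparable through a shared
# reference ("pivot") system present in every EE. All names and the
# delta-to-pivot scoring rule are illustrative assumptions.

def rank_with_pivot(scores_by_ee, pivot="BM25_baseline"):
    """scores_by_ee: {ee_id: {system_name: effectiveness_score}}.
    Each EE must contain the pivot system. Systems are scored by their
    difference to the pivot within their own EE, which is what makes the
    values comparable across EEs under this sketch's assumptions."""
    relative = {}
    for ee_id, scores in scores_by_ee.items():
        pivot_score = scores[pivot]
        for system, score in scores.items():
            if system != pivot:
                relative[(system, ee_id)] = score - pivot_score
    # Larger gain over the pivot ranks first.
    return sorted(relative.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Two EEs, e.g. two TREC-COVID rounds with different documents and topics.
    scores = {
        "round1": {"BM25_baseline": 0.42, "sysA": 0.51, "sysB": 0.39},
        "round2": {"BM25_baseline": 0.47, "sysC": 0.55},
    }
    for (system, ee), delta in rank_with_pivot(scores):
        print(f"{system} ({ee}): {delta:+.3f} vs. pivot")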
Acknowledgements
This work was supported by the ANR Kodicare bi-lateral project, grant ANR-19-CE23-0029 of the French Agence Nationale de la Recherche, and by the Austrian Science Fund (FWF).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
González-Sáez, G.N., Mulhem, P., Goeuriot, L. (2021). Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems. In: Candan, K.S., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_8
DOI: https://doi.org/10.1007/978-3-030-85251-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1