
Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems

  • Conference paper
  • In: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2021)

Abstract

The evaluation of information retrieval (IR) systems follows the Cranfield paradigm, in which several IR systems are evaluated on a common evaluation environment (test collection and evaluation settings). The Cranfield paradigm requires the evaluation environments (EEs) to be strictly identical in order to compare system performance. For cases where this paradigm cannot be applied, e.g. when we do not have access to the systems' code, we consider an evaluation framework that allows for slight changes in the EEs, such as the evolution of the document corpus or of the topics. To do so, we propose to compare systems evaluated in different environments using a reference system, called the pivot. In this paper, we present and validate a method for selecting a pivot, which is then used to construct a correct ranking of systems evaluated in different environments. We test our framework on the TREC-COVID test collection, which consists of five rounds of growing sets of topics, documents, and relevance judgments. Our experimental results show that the pivot strategy can produce a correct ranking of systems evaluated on an evolving test collection.
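The abstract only sketches how the pivot is used. As a rough illustration, the minimal sketch below shows one way a shared pivot system could anchor comparisons across evaluation environments: each system's effectiveness is expressed relative to the pivot evaluated in the same environment, and systems are then ranked by that relative score. The function name, the example data, and the simple score difference are all hypothetical assumptions, not the paper's actual formulation.

# Minimal sketch of the pivot idea (illustrative names and data; the paper's
# exact pivot-selection and normalisation procedures are defined in the full text).
def pivot_ranking(system_scores, pivot_scores):
    # system_scores: {system: (environment_id, effectiveness)} -- each system is
    #                measured in its own evaluation environment (EE), e.g. one
    #                TREC-COVID round.
    # pivot_scores:  {environment_id: effectiveness} -- the same pivot system is
    #                measured in every EE.
    # Assumption: a simple difference to the pivot; other normalisations are possible.
    relative = {
        system: score - pivot_scores[env]
        for system, (env, score) in system_scores.items()
    }
    return sorted(relative, key=relative.get, reverse=True)

# Hypothetical usage: three runs evaluated on different TREC-COVID rounds.
runs = {"run_A": ("round2", 0.61), "run_B": ("round4", 0.58), "run_C": ("round5", 0.64)}
pivot = {"round2": 0.50, "round4": 0.44, "round5": 0.55}
print(pivot_ranking(runs, pivot))  # ['run_B', 'run_A', 'run_C']

Under this reading, the pivot acts as a common yardstick that absorbs differences between environments, so that only each system's gap to the pivot within its own environment is compared across environments.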


Notes

  1. https://ir.nist.gov/covidSubmit/archive.html.
  2. https://www.semanticscholar.org/cord19.
  3. https://ir.nist.gov/covidSubmit/archive/round4/UPrrf38rrf3-r4.pdf.
  4. https://ir.nist.gov/covidSubmit/archive/round5/UPrrf102-wt-r5.pdf.


Acknowledgements

This work was supported by the ANR Kodicare bilateral project, grant ANR-19-CE23-0029 of the French Agence Nationale de la Recherche, and by the Austrian Science Fund (FWF).

Author information


Corresponding author

Correspondence to Gabriela Nicole González-Sáez.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

González-Sáez, G.N., Mulhem, P., Goeuriot, L. (2021). Towards the Evaluation of Information Retrieval Systems on Evolving Datasets with Pivot Systems. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science(), vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_8


  • DOI: https://doi.org/10.1007/978-3-030-85251-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1

  • eBook Packages: Computer Science, Computer Science (R0)
