DOI: 10.1145/3331184.3331399

TrecTools: an Open-source Python Library for Information Retrieval Practitioners Involved in TREC-like Campaigns

Published: 18 July 2019

Abstract

This paper introduces TrecTools, a Python library for assisting Information Retrieval (IR) practitioners with TREC-like campaigns. IR practitioners tasked with activities such as building test collections, evaluating systems, or analysing results from empirical experiments commonly have to resort to using a number of different software tools and scripts, each of which performs a single function; at times they even have to implement ad hoc scripts of their own. TrecTools aims to provide a unified environment for performing these common activities.
Written in Python, the most popular programming language for data science, TrecTools offers an object-oriented, easily extensible library. Existing systems, e.g., trec_eval, pose a considerable barrier to entry when it comes to modifying or extending them. Furthermore, many existing IR measures and tools are implemented independently of each other, in different programming languages. TrecTools seeks to lower the barrier to entry and to unify existing tools, frameworks and activities under one common umbrella. Widespread adoption of a centralised solution for developing, evaluating, and analysing TREC-like campaigns will ease the burden on organisers and provide participants and users with a standard environment for common IR experimental activities.
TrecTools is distributed as an open-source library under the MIT license at https://github.com/joaopalotti/trectools.
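
As an illustration of the workflow the abstract describes, here is a minimal sketch of evaluating a single run against a set of relevance judgements with TrecTools. The class and method names follow the library's documented interface, but the input file names are hypothetical placeholders; consult the repository README for the current API.

    # Minimal evaluation sketch. "myrun.txt" and "qrels.txt" are
    # hypothetical placeholder files in the standard TREC run and
    # qrels formats, respectively.
    from trectools import TrecQrel, TrecRun, TrecEval

    run = TrecRun("myrun.txt")      # system output: topic, docid, rank, score, ...
    qrels = TrecQrel("qrels.txt")   # relevance judgements: topic, docid, relevance

    te = TrecEval(run, qrels)
    print("MAP:  %.4f" % te.get_map())
    print("P@10: %.4f" % te.get_precision(depth=10))

The same TrecRun and TrecQrel objects can be reused across the library's other modules, which is the unification of tools and activities the abstract argues for.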


      Published In

      SIGIR '19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2019, 1512 pages
      ISBN: 9781450361729
      DOI: 10.1145/3331184
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. information retrieval campaigns
      2. information retrieval evaluation
      3. python
      4. retrieval effectiveness
      5. test collections
      6. trec_eval
      7. trectools

      Qualifiers

      • Research-article

      Funding Sources

      • Australian Research Council DECRA Research Fellowship

      Conference

      SIGIR '19

      Acceptance Rates

      SIGIR '19 paper acceptance rate: 84 of 426 submissions (20%)
      Overall acceptance rate: 705 of 3,463 submissions (20%)

      Cited By

      • (2024) Developing an ICD-10 Coding Assistant: Pilot Study Using RoBERTa and GPT-4 for Term Extraction and Description-Based Code Selection (Preprint). JMIR Formative Research. DOI: 10.2196/60095. Online publication date: 1-May-2024.
      • (2024) Browsing and Searching Metadata of TREC. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 313-323. DOI: 10.1145/3626772.3657873. Online publication date: 10-Jul-2024.
      • (2024) Performance of Traditional and Dense Vector Information Retrieval Models in Code Search. 2024 2nd International Conference on Software Engineering and Information Technology (ICoSEIT), 52-57. DOI: 10.1109/ICoSEIT60086.2024.10497512. Online publication date: 28-Feb-2024.
      • (2024) Enhancing RAG's Retrieval via Query Backtranslations. Web Information Systems Engineering – WISE 2024, 270-285. DOI: 10.1007/978-981-96-0579-8_20. Online publication date: 29-Nov-2024.
      • (2023) Learned Text Representation for Amharic Information Retrieval and Natural Language Processing. Information 14(3), 195. DOI: 10.3390/info14030195. Online publication date: 20-Mar-2023.
      • (2023) Overview of Touché 2023: Argument and Causal Retrieval. Experimental IR Meets Multilinguality, Multimodality, and Interaction, 507-530. DOI: 10.1007/978-3-031-42448-9_31. Online publication date: 18-Sep-2023.
      • (2023) Bootstrapped nDCG Estimation in the Presence of Unjudged Documents. Advances in Information Retrieval, 313-329. DOI: 10.1007/978-3-031-28244-7_20. Online publication date: 17-Mar-2023.
      • (2022) ranx.fuse: A Python Library for Metasearch. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 4808-4812. DOI: 10.1145/3511808.3557207. Online publication date: 17-Oct-2022.
      • (2022) Overview of Touché 2022: Argument Retrieval. Experimental IR Meets Multilinguality, Multimodality, and Interaction, 311-336. DOI: 10.1007/978-3-031-13643-6_21. Online publication date: 5-Sep-2022.
      • (2022) Streamlining Evaluation with ir-measures. Advances in Information Retrieval, 305-310. DOI: 10.1007/978-3-030-99739-7_38. Online publication date: 5-Apr-2022.
