
Evaluation-as-a-Service for the Computational Sciences: Overview and Outlook

Published: 29 October 2018

Abstract

Evaluation in empirical computer science is essential to demonstrate progress and to assess the technologies developed. Several research domains, such as information retrieval, have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted to this day. In recent years, however, several new challenges have emerged that do not fit this paradigm well: extremely large data sets, confidential data sets such as those found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way industry approaches problem-solving: companies now organize challenges, particularly in the field of machine learning, and hand out monetary awards to incentivize people to work on their problems.
This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants for local processing, but instead keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other means of shipping executables to the data (a minimal sketch of this access model follows the abstract). The objectives of this article are to summarize and compare the current approaches, consolidate the experience gained with them, and outline the next steps for EaaS, particularly toward sustainable research infrastructures.
The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. It also summarizes the many factors influencing EaaS and the motivations of the various stakeholders, from funding agencies and challenge organizers to researchers, participants, and industry partners interested in supplying real-world problems for which they require solutions.
EaaS solves many problems of the current research environment, in which data sets are often inaccessible to many researchers and executables of published tools are frequently unavailable, making results impossible to reproduce. EaaS, in contrast, creates reusable and citable data sets as well as available executables. Many challenges remain, but such a research framework can also foster more collaboration between researchers, potentially increasing the speed at which research results are obtained.
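
To make the access model described above concrete, the following is a minimal, hypothetical sketch of how a participant might interact with a centrally hosted evaluation service over HTTP: the executable (here referenced as a container image) is shipped to the organizer, the data never leave the organizer's infrastructure, and only aggregate scores are returned. Every endpoint, field, and identifier below is illustrative and does not correspond to any specific EaaS platform discussed in the article.

    # Hypothetical sketch of the "ship the executable to the data" model:
    # the test data stay on the organizer's infrastructure; the participant only
    # registers a containerized system and retrieves aggregate scores over HTTP.
    import requests

    API_BASE = "https://eaas.example.org/api/v1"   # hypothetical evaluation service
    API_KEY = "participant-token"                  # issued by the challenge organizer

    def submit_run(task_id: str, docker_image: str) -> str:
        """Register a containerized system; the organizer executes it next to the data."""
        resp = requests.post(
            f"{API_BASE}/tasks/{task_id}/runs",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"image": docker_image},   # the executable is shipped, the data stay central
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["run_id"]

    def fetch_scores(task_id: str, run_id: str) -> dict:
        """Retrieve only aggregate evaluation measures, never the raw (possibly confidential) data."""
        resp = requests.get(
            f"{API_BASE}/tasks/{task_id}/runs/{run_id}/scores",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()   # e.g. {"MAP": 0.31, "P@10": 0.42}

    if __name__ == "__main__":
        run = submit_run("example-task-2018", "registry.example.org/my-team/system:1.0")
        print(fetch_scores("example-task-2018", run))

The platforms discussed in the article typically realize this idea with virtual machine or container submissions rather than a bare HTTP scoring API, but the separation is the same: participants ship code, organizers keep the data and return only evaluation results.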


        Published In

        Journal of Data and Information Quality, Volume 10, Issue 4
        Reproducibility in Information Retrieval: Tools and Infrastructures
        December 2018
        106 pages
        ISSN: 1936-1955
        EISSN: 1936-1963
        DOI: 10.1145/3289400

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 29 October 2018
        Accepted: 01 July 2018
        Revised: 01 April 2018
        Received: 01 October 2017
        Published in JDIQ Volume 10, Issue 4

        Author Tags

        1. Evaluation-as-a-service
        2. benchmarking
        3. information access systems

        Funding Sources

        • European Science Foundation via its Research Network Program “Evaluating Information Access Systems” (ELIAS)
        • European Commission via the FP7 project VISCERAL
