
Report on the Dagstuhl Seminar on Frontiers of Information Access Experimentation for Research and Education

Published: 04 December 2023

Abstract

This report documents the program and the outcomes of Dagstuhl Seminar 23031, "Frontiers of Information Access Experimentation for Research and Education", which brought together 38 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices that lead to more valid results, both for research and for scientific education.
The seminar featured a series of long and short talks delivered by participants, which helped to establish common ground and to surface the topics of interest to be explored as the main output of the seminar. This led to the definition of five groups that investigated challenges, opportunities, and next steps in the following areas: reality check, i.e., conducting real-world studies; human-machine-collaborative relevance judgment frameworks; overcoming methodological challenges in information retrieval and recommender systems through awareness and education; results-blind reviewing; and guidance for authors.
Date: 15--20 January 2023.
Website: https://www.dagstuhl.de/23031.

Published In

ACM SIGIR Forum, Volume 57, Issue 1, June 2023, 129 pages. ISSN: 0163-5840. DOI: 10.1145/3636341.
Publisher: Association for Computing Machinery, New York, NY, United States.
