research-article

Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access

Authors:

Nolwenn Bernard,

Krisztian Balog

ICTIR '24: Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval

Pages 185 - 193

https://doi.org/10.1145/3664190.3672529

Published: 05 August 2024

Abstract

User simulation is a promising approach for automatically training and evaluating conversational information access agents, enabling the generation of synthetic dialogues and facilitating reproducible experiments at scale. However, the objectives of user simulation for the different uses remain loosely defined, hindering the development of effective simulators. In this work, we formally characterize the distinct objectives for user simulators: training aims to maximize behavioral similarity to real users, while evaluation focuses on the accurate prediction of real-world conversational agent performance. Through an empirical study, we demonstrate that optimizing for one objective does not necessarily lead to improved performance on the other. This finding underscores the need for tailored design considerations depending on the intended use of the simulator. By establishing clear objectives and proposing concrete measures to evaluate user simulators against those objectives, we pave the way for the development of simulators that are specifically tailored to their intended use, ultimately leading to more effective conversational agents.
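The two objectives named in the abstract can be made concrete with a small sketch. The measures below are assumptions chosen for illustration and are not taken from the paper: behavioral similarity is approximated by the Jensen-Shannon divergence between dialogue-act distributions (lower is better for training), and predictive accuracy by Kendall's tau between the agent ranking obtained with simulated users and the one obtained with real users (higher is better for evaluation). All names and numbers are hypothetical toy values, not experimental results.

```python
from itertools import combinations
from math import log2


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions
    given as dicts mapping an outcome to its probability."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(a || M); terms with zero probability contribute nothing.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists (no tie correction)."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / len(pairs)


# Hypothetical dialogue-act distributions (assumed labels and values).
real_users = {"inquire": 0.5, "disclose": 0.3, "reject": 0.2}
simulator_a = {"inquire": 0.5, "disclose": 0.3, "reject": 0.2}
simulator_b = {"inquire": 0.8, "disclose": 0.1, "reject": 0.1}

# Hypothetical performance of three agents, measured with each user population.
perf_real = [0.62, 0.71, 0.55]
perf_sim_a = [0.60, 0.50, 0.70]
perf_sim_b = [0.40, 0.45, 0.30]

for name, acts, perf in [("A", simulator_a, perf_sim_a),
                         ("B", simulator_b, perf_sim_b)]:
    print(name,
          "JS divergence to real users:", round(js_divergence(real_users, acts), 3),
          "Kendall tau vs. real ranking:", kendall_tau(perf_real, perf))
```

In this toy setup, simulator A matches real users' behavior more closely yet ranks the agents in the opposite order, while simulator B behaves less like real users but reproduces the real ranking. This is only meant to show that the two objectives are measured differently and need not agree.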


Index Terms

1. Computing methodologies
  1. Modeling and simulation
2. Information systems
  1. Information retrieval
    1. Users and interactive retrieval

Published In

ICTIR '24: Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval

August 2024

267 pages

ISBN:9798400706813

DOI:10.1145/3664190

General Chair: Harrie Oosterhuis (Radboud University)

Program Chairs: Hannah Bast (University of Freiburg), Chenyan Xiong (Carnegie Mellon University)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Honorable Mention

Qualifiers

Research-article

Funding Sources

Research Council of Norway

Conference

ICTIR '24

Sponsor:

SIGIR

ICTIR '24: The 2024 ACM SIGIR International Conference on the Theory of Information Retrieval

July 13, 2024

Washington DC, USA

Acceptance Rates

ICTIR '24 Paper Acceptance Rate 26 of 45 submissions, 58%;

Overall Acceptance Rate 235 of 527 submissions, 45%
