DOI: 10.1145/3539618.3591683
Research article · Open access

Exploiting Simulated User Feedback for Conversational Search: Ranking, Rewriting, and Beyond

Published: 18 July 2023

Abstract

This research explores methods for assessing user feedback in mixed-initiative conversational search (CS) systems. While CS systems have seen substantial advancements across multiple aspects, recent research has largely failed to incorporate feedback from users. One of the main reasons is the lack of system-user conversational interaction data. To this end, we propose a user-simulator-based framework for multi-turn interactions with a variety of mixed-initiative CS systems. Specifically, we develop a user simulator, dubbed ConvSim, that, once initialized with an information need description, is capable of providing feedback on a system's responses as well as answering potential clarifying questions. Our experiments on a wide variety of state-of-the-art passage retrieval and neural re-ranking models show that effective utilization of user feedback can lead to a 16% increase in retrieval performance in terms of nDCG@3. Moreover, we observe consistent improvements as the number of feedback rounds increases (a 35% relative improvement in nDCG@3 after three rounds). This points to a research gap in the development of dedicated feedback-processing modules and opens the potential for significant advancements in CS. To support further research on the topic, we release over 30,000 transcripts of system-simulator interactions based on well-established CS datasets.
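To make the setup concrete, here is a minimal, hypothetical Python sketch of the multi-turn loop the abstract describes: a simulated user is initialized with an information need, then either answers the system's clarifying questions or gives feedback on the ranked passages, and the system folds that feedback into its next query rewrite; retrieval quality is tracked per round with nDCG@3. All interface names (system.retrieve, simulator.answer, the relevance attribute, and so on) are illustrative assumptions, not the authors' ConvSim API or released code.

```python
import math
from typing import List

def ndcg_at_k(relevances: List[int], k: int = 3) -> float:
    """nDCG@k computed from graded relevance labels of the ranked passages."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def run_session(system, simulator, information_need: str, rounds: int = 3) -> List[float]:
    """Hypothetical multi-turn loop: retrieve, collect simulated feedback, rewrite, repeat."""
    simulator.initialize(information_need)               # seed the simulator with the need description
    query = information_need
    per_round_ndcg = []
    for _ in range(rounds):
        ranking = system.retrieve(query)                 # passage retrieval + neural re-ranking
        per_round_ndcg.append(ndcg_at_k([p.relevance for p in ranking]))
        question = system.ask_clarifying_question(ranking)
        if question is not None:                         # mixed initiative: the system may ask
            feedback = simulator.answer(question)        # simulator answers the clarifying question
        else:
            feedback = simulator.give_feedback(ranking)  # simulator comments on the retrieved results
        query = system.rewrite_query(query, feedback)    # incorporate feedback into the next turn
    return per_round_ndcg
```

Under a loop of this shape, the reported gains would appear as later entries of per_round_ndcg exceeding the first-round score (roughly, a 35% relative improvement after three feedback rounds corresponds to a final score about 1.35 times the initial one).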

    Information

    Published In

    SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2023
    3567 pages
    ISBN: 9781450394086
    DOI: 10.1145/3539618
    This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 July 2023

    Author Tags

    1. conversational information seeking
    2. mixed-initiative
    3. user simulation

    Qualifiers

    • Research-article

    Conference

    SIGIR '23

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 602
    • Downloads (last 6 weeks): 47
    Reflects downloads up to 10 Feb 2025

    Cited By

    • (2025) Proactive Conversational AI: A Comprehensive Survey of Advancements and Opportunities. ACM Transactions on Information Systems. DOI: 10.1145/3715097. Online publication date: 24 Jan 2025.
    • (2025) User Behavior Simulation with Large Language Model-based Agents. ACM Transactions on Information Systems 43(2), 1-37. DOI: 10.1145/3708985. Online publication date: 28 Jan 2025.
    • (2024) Simulating Conversational Search Users with Parameterized Behavior. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 72-81. DOI: 10.1145/3673791.3698425. Online publication date: 8 Dec 2024.
    • (2024) Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 185-193. DOI: 10.1145/3664190.3672529. Online publication date: 2 Aug 2024.
    • (2024) Analysing Utterances in LLM-Based User Simulation for Conversational Search. ACM Transactions on Intelligent Systems and Technology 15(3), 1-22. DOI: 10.1145/3650041. Online publication date: 17 May 2024.
    • (2024) GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4951-4961. DOI: 10.1145/3637528.3671622. Online publication date: 25 Aug 2024.
    • (2024) Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 8-17. DOI: 10.1145/3616855.3635856. Online publication date: 4 Mar 2024.
    • (2024) Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 322-331. DOI: 10.1145/3616855.3635822. Online publication date: 4 Mar 2024.
    • (2024) Leveraging User Simulation to Develop and Evaluate Conversational Information Access Agents. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 1136-1138. DOI: 10.1145/3616855.3635730. Online publication date: 4 Mar 2024.
    • (2024) Tutorial on User Simulation for Evaluating Information Access Systems on the Web. Companion Proceedings of the ACM Web Conference 2024, 1254-1257. DOI: 10.1145/3589335.3641243. Online publication date: 13 May 2024.
