DOI: 10.1145/3543829.3544529
Short paper
Open access

Do We Still Need Human Assessors? Prompt-Based GPT-3 User Simulation in Conversational AI

Published: 15 September 2022

Abstract

Scarcity of user data continues to be a problem in research on conversational user interfaces and often hinders or slows down technical innovation. In the past, different ways of synthetically generating data, such as data augmentation techniques, have been explored. With the rise of ever-improving pre-trained language models, we ask whether we can go beyond such methods by simply providing appropriate prompts to these general-purpose models to generate data. We explore the feasibility and cost-benefit trade-offs of using synthetic data from non-fine-tuned models to train classification algorithms for conversational agents. We compare this synthetically generated data with real user data and evaluate the performance of classifiers trained on different combinations of synthetic and real data. We conclude that, although classifiers trained on such synthetic data perform much better than random baselines, they do not match the performance of classifiers trained on even very small amounts of real user data, largely because the synthetic data lacks much of the variability found in user-generated data. Nevertheless, we show that in situations where very little data and few resources are available, classifiers trained on such synthetically generated data might be preferable to the collection and annotation of naturalistic data.
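The pipeline the abstract describes, prompting a general-purpose language model to generate labeled training utterances and then training an intent classifier on the result, can be sketched as follows. This is a hypothetical illustration, not the authors' code: `prompt_llm`, the prompt templates, and the intent labels are all made up for the example, `prompt_llm` returns canned completions instead of calling a real model, and the toy bag-of-words scorer stands in for whatever classification algorithm the paper actually evaluated.

```python
from collections import Counter, defaultdict

# Hypothetical prompt templates, one per intent label. In a real setup these
# would be sent to a large language model and the completions collected.
PROMPTS = {
    "order_food": "Write a sentence a user might say to order food:",
    "get_recipe": "Write a sentence a user might say to ask for a recipe:",
}

def prompt_llm(prompt, n=3):
    """Stand-in for an LLM call: returns canned 'synthetic' utterances."""
    canned = {
        "order_food": [
            "I would like to order a large pizza please",
            "Can I get a burger and fries delivered",
            "Please order me some sushi for tonight",
        ],
        "get_recipe": [
            "How do I make a vegetable curry",
            "Can you give me a recipe for pancakes",
            "What is a good way to cook salmon",
        ],
    }
    key = "order_food" if "order" in prompt else "get_recipe"
    return canned[key][:n]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Build per-label word counts (a minimal bag-of-words model)."""
    model = defaultdict(Counter)
    for label, text in examples:
        model[label].update(tokenize(text))
    return model

def classify(model, text):
    """Score each label by word overlap with its training counts."""
    words = tokenize(text)
    return max(model, key=lambda lbl: sum(model[lbl][w] for w in words))

# Generate synthetic training data from prompts, then classify held-out text.
synthetic = [(label, utt) for label, p in PROMPTS.items() for utt in prompt_llm(p)]
model = train(synthetic)
print(classify(model, "please order a pizza for me"))  # order_food
print(classify(model, "how can I cook pasta"))         # get_recipe
```

The paper's finding would then correspond to comparing this classifier against one trained on (combinations of) real user utterances, where the canned completions above illustrate exactly the limitation the abstract notes: synthetic utterances tend to be fluent but far less varied than real user language.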




    Published In

    CUI '22: Proceedings of the 4th Conference on Conversational User Interfaces
    July 2022
    289 pages
    ISBN:9781450397391
    DOI:10.1145/3543829
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. conversational ai
    2. datasets
    3. nlp
    4. text generation

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    CUI 2022: 4th Conference on Conversational User Interfaces
    July 26 - 28, 2022
    Glasgow, United Kingdom

    Acceptance Rates

    CUI '22 Paper Acceptance Rate: 12 of 33 submissions, 36%
    Overall Acceptance Rate: 34 of 100 submissions, 34%


    Article Metrics

    • Downloads (Last 12 months)890
    • Downloads (Last 6 weeks)106
    Reflects downloads up to 15 Oct 2024


    Cited By

    • (2024) Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction. Information 15(5), 264. https://doi.org/10.3390/info15050264. Online: 6 May 2024.
    • (2024) Improving Training Dataset Balance with ChatGPT Prompt Engineering. Electronics 13(12), 2255. https://doi.org/10.3390/electronics13122255. Online: 8 June 2024.
    • (2024) Analysing Utterances in LLM-Based User Simulation for Conversational Search. ACM Transactions on Intelligent Systems and Technology 15(3), 1–22. https://doi.org/10.1145/3650041. Online: 5 March 2024.
    • (2024) ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–18. https://doi.org/10.1145/3613904.3641899. Online: 11 May 2024.
    • (2024) ChatOps for microservice systems: A low-code approach using service composition and large language models. Future Generation Computer Systems 161, 518–530. https://doi.org/10.1016/j.future.2024.07.029. Online: December 2024.
    • (2024) Exploring large language models for the generation of synthetic training samples for aspect-based sentiment analysis in low resource settings. Expert Systems with Applications, 125514. https://doi.org/10.1016/j.eswa.2024.125514. Online: October 2024.
    • (2024) Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents. Artificial Intelligence and Law. https://doi.org/10.1007/s10506-023-09388-1. Online: 15 February 2024.
    • (2024) Leveraging LLM-Generated Data for Detecting Depression Symptoms on Social Media. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, 193–204. https://doi.org/10.1007/978-3-031-71736-9_14. Online: 14 September 2024.
    • (2023) A Comprehensive Study of ChatGPT: Advancements, Limitations, and Ethical Considerations in Natural Language Processing and Cybersecurity. Information 14(8), 462. https://doi.org/10.3390/info14080462. Online: 16 August 2023.
    • (2023) Human Experts' Perceptions of Auto-Generated Summarization Quality. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, 95–98. https://doi.org/10.1145/3594806.3594828. Online: 5 July 2023.
