DOI: 10.1145/3543829.3544529
Short paper
Open access

Do We Still Need Human Assessors? Prompt-Based GPT-3 User Simulation in Conversational AI

Published: 15 September 2022

Abstract

Scarcity of user data continues to be a problem in research on conversational user interfaces and often hinders or slows down technical innovation. In the past, different ways of synthetically generating data, such as data augmentation techniques, have been explored. With the rise of ever-improving pre-trained language models, we ask whether we can go beyond such methods by simply providing appropriate prompts to these general-purpose models to generate data. We explore the feasibility and cost-benefit trade-offs of using synthetic data from non-fine-tuned models to train classification algorithms for conversational agents. We compare this synthetically generated data with real user data and evaluate the performance of classifiers trained on different combinations of synthetic and real data. We conclude that, although classifiers trained on such synthetic data perform much better than random baselines, they do not match the performance of classifiers trained on even very small amounts of real user data, largely because the synthetic data lacks much of the variability found in user-generated data. Nevertheless, we show that in situations where very little data and few resources are available, classifiers trained on such synthetically generated data might be preferable to the collection and annotation of naturalistic data.
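The pipeline the abstract describes, prompting a general-purpose language model to generate labeled training utterances and then training an intent classifier on the result, can be sketched as follows. This is a hypothetical illustration, not the authors' code: `prompt_llm`, the prompt templates, and the intent labels are all made up for the example, `prompt_llm` returns canned completions instead of calling a real model, and the toy bag-of-words scorer stands in for whatever classification algorithm the paper actually evaluated.

```python
from collections import Counter, defaultdict

# Hypothetical prompt templates, one per intent label. In a real setup these
# would be sent to a large language model and the completions collected.
PROMPTS = {
    "order_food": "Write a sentence a user might say to order food:",
    "get_recipe": "Write a sentence a user might say to ask for a recipe:",
}

def prompt_llm(prompt, n=3):
    """Stand-in for an LLM call: returns canned 'synthetic' utterances."""
    canned = {
        "order_food": [
            "I would like to order a large pizza please",
            "Can I get a burger and fries delivered",
            "Please order me some sushi for tonight",
        ],
        "get_recipe": [
            "How do I make a vegetable curry",
            "Can you give me a recipe for pancakes",
            "What is a good way to cook salmon",
        ],
    }
    key = "order_food" if "order" in prompt else "get_recipe"
    return canned[key][:n]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Build per-label word counts (a minimal bag-of-words model)."""
    model = defaultdict(Counter)
    for label, text in examples:
        model[label].update(tokenize(text))
    return model

def classify(model, text):
    """Score each label by word overlap with its training counts."""
    words = tokenize(text)
    return max(model, key=lambda lbl: sum(model[lbl][w] for w in words))

# Generate synthetic training data from prompts, then classify held-out text.
synthetic = [(label, utt) for label, p in PROMPTS.items() for utt in prompt_llm(p)]
model = train(synthetic)
print(classify(model, "please order a pizza for me"))  # order_food
print(classify(model, "how can I cook pasta"))         # get_recipe
```

The paper's finding would then correspond to comparing this classifier against one trained on (combinations of) real user utterances, where the canned completions above illustrate exactly the limitation the abstract notes: synthetic utterances tend to be fluent but far less varied than real user language.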




    Published In

    CUI '22: Proceedings of the 4th Conference on Conversational User Interfaces
    July 2022
    289 pages
    ISBN:9781450397391
    DOI:10.1145/3543829
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. conversational ai
    2. datasets
    3. nlp
    4. text generation

    Qualifiers

    • Short-paper
    • Research
    • Refereed limited

    Conference

    CUI 2022: 4th Conference on Conversational User Interfaces
    July 26 - 28, 2022
    Glasgow, United Kingdom

    Acceptance Rates

    CUI '22 Paper Acceptance Rate: 12 of 33 submissions, 36%
    Overall Acceptance Rate: 34 of 100 submissions, 34%


    Article Metrics

    • Downloads (Last 12 months)890
    • Downloads (Last 6 weeks)106
    Reflects downloads up to 15 Oct 2024


    Cited By

    • (2024) Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction. Information 15(5), 264. https://doi.org/10.3390/info15050264. Online: 6 May 2024.
    • (2024) Improving Training Dataset Balance with ChatGPT Prompt Engineering. Electronics 13(12), 2255. https://doi.org/10.3390/electronics13122255. Online: 8 June 2024.
    • (2024) Analysing Utterances in LLM-Based User Simulation for Conversational Search. ACM Transactions on Intelligent Systems and Technology 15(3), 1–22. https://doi.org/10.1145/3650041. Online: 5 March 2024.
    • (2024) ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–18. https://doi.org/10.1145/3613904.3641899. Online: 11 May 2024.
    • (2024) ChatOps for microservice systems: A low-code approach using service composition and large language models. Future Generation Computer Systems 161, 518–530. https://doi.org/10.1016/j.future.2024.07.029. Online: December 2024.
    • (2024) Exploring large language models for the generation of synthetic training samples for aspect-based sentiment analysis in low resource settings. Expert Systems with Applications, 125514. https://doi.org/10.1016/j.eswa.2024.125514. Online: October 2024.
    • (2024) Combining prompt-based language models and weak supervision for labeling named entity recognition on legal documents. Artificial Intelligence and Law. https://doi.org/10.1007/s10506-023-09388-1. Online: 15 February 2024.
    • (2024) Leveraging LLM-Generated Data for Detecting Depression Symptoms on Social Media. In Experimental IR Meets Multilinguality, Multimodality, and Interaction, 193–204. https://doi.org/10.1007/978-3-031-71736-9_14. Online: 14 September 2024.
    • (2023) A Comprehensive Study of ChatGPT: Advancements, Limitations, and Ethical Considerations in Natural Language Processing and Cybersecurity. Information 14(8), 462. https://doi.org/10.3390/info14080462. Online: 16 August 2023.
    • (2023) Human Experts' Perceptions of Auto-Generated Summarization Quality. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments, 95–98. https://doi.org/10.1145/3594806.3594828. Online: 5 July 2023.
