Herding AI Cats: Lessons from Designing a Chatbot by Prompting GPT-3

Authors:

J.D. Zamfirescu-Pereira,

Bjoern Hartmann,

Qian Yang

DIS '23: Proceedings of the 2023 ACM Designing Interactive Systems Conference

Pages 2206 - 2220

https://doi.org/10.1145/3563657.3596138

Published: 10 July 2023

Abstract

Prompting Large Language Models (LLMs) is an exciting new approach to designing chatbots. But can it improve LLM’s user experience (UX) reliably enough to power chatbot products? Our attempt to design a robust chatbot by prompting GPT-3/4 alone suggests: not yet. Prompts made achieving “80%” UX goals easy, but not the remaining 20%. Fixing the few remaining interaction breakdowns resembled herding cats: We could not address one UX issue or test one design solution at a time; instead, we had to handle everything everywhere all at once. Moreover, because no prompt could make GPT reliably say “I don’t know” when it should, the user-GPT conversations had no guardrails after a breakdown occurred, often leading to UX downward spirals. These risks incentivized us to design highly prescriptive prompts and scripted bots, counter to the promises of LLM-powered chatbots. This paper describes this case study, unpacks prompting’s fickleness and its impact on UX design processes, and discusses implications for LLM-based design methods and tools.

Index Terms

Human-centered computing → Interaction design

Published In

DIS '23: Proceedings of the 2023 ACM Designing Interactive Systems Conference

July 2023

2717 pages

ISBN:9781450398930

DOI:10.1145/3563657

Editors:
Daragh Byrne, Carnegie Mellon University
Nikolas Martelaro, Carnegie Mellon University
Andy Boucher, Northumbria University
David Chatting, Goldsmiths, University of London
Sarah Fdili Alaoui, LISN-Université Paris Saclay
Sarah Fox, Carnegie Mellon University
Iohanna Nicenboim, Delft University of Technology
Cayley MacArthur, University of Waterloo

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

USAF / DARPA - SDCPS Program

Conference

DIS '23

Sponsor:

SIGCHI

DIS '23: Designing Interactive Systems Conference

July 10 - 14, 2023

Pittsburgh, PA, USA

Acceptance Rates

Overall Acceptance Rate 1,158 of 4,684 submissions, 25%
