ABSTRACT
Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation helpful for understanding their medical situation and alleviating their concerns. Through qualitative coding of the conversation transcripts, we provide insight into the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings.
Footnotes
⁎ Both authors contributed equally.
† Both authors advised equally.
Supplemental Material
A.1 Participant pre-interaction survey
Index Terms
- Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM