DOI: 10.1145/3640794.3665886
extended-abstract

Using Large Language Models for Robot-Assisted Therapeutic Role-Play: Factuality is not enough!

Published: 08 July 2024 Publication History

Abstract

Robot-assisted social role-play can help neurodivergent individuals practice social skills in a safe environment. Large language models (LLMs) facilitate the implementation of such agents. However, high quality standards must be ensured in this sensitive setting. This article argues that current methods for evaluating generated language are insufficient because they are grounded in a view of language as an external code for describing the world (the referential functions of language). We argue that non-referential functions of language must be part of the evaluation of LLM-generated language when LLMs engage in social interactions with users. We test the feasibility of our approach in a pilot implementation of a platform for robot-assisted social role-play. Our proposed evaluation framework helps to systematically assess referential and non-referential functions of LLM-generated language. We argue that the evaluation framework can also be applied to multimodal interaction.

      Published In

      CUI '24: Proceedings of the 6th ACM Conference on Conversational User Interfaces
      July 2024
      616 pages
      ISBN:9798400705113
      DOI:10.1145/3640794
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Social conversational agents
      2. interaction quality assessment
      3. large language models
      4. non-referential functions of language
      5. robot-assisted therapy

      Qualifiers

      • Extended-abstract
      • Research
      • Refereed limited

      Funding Sources

      • FNR Luxembourg

      Conference

      CUI '24
      CUI '24: ACM Conversational User Interfaces 2024
      July 8 - 10, 2024
      Luxembourg, Luxembourg

      Acceptance Rates

      Overall Acceptance Rate 34 of 100 submissions, 34%
