DOI: 10.1145/3581641.3584045
Research Article (Public Access)

The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents

Published: 27 March 2023
Abstract

    Previous studies regarding the perception of emotions for embodied virtual agents have shown the effectiveness of using virtual characters in conveying emotions through interactions with humans. However, creating an autonomous embodied conversational agent with expressive behaviors presents two major challenges. The first challenge is the difficulty of synthesizing the conversational behaviors for each modality that are as expressive as real human behaviors. The second challenge is that the affects are modeled independently, which makes it difficult to generate multimodal responses with consistent emotions across all modalities. In this work, we propose a conceptual framework, ACTOR (Affect-Consistent mulTimodal behaviOR generation), that aims to increase the perception of affects by generating multimodal behaviors conditioned on a consistent driving affect. We have conducted a user study with 199 participants to assess how the average person judges the affects perceived from multimodal behaviors that are consistent and inconsistent with respect to a driving affect. The results show that, among all model conditions, our affect-consistent framework receives the highest Likert scores for the perception of driving affects. Our statistical analysis suggests that making a modality affect-inconsistent significantly decreases the perception of driving affects. We also observe that multimodal behaviors conditioned on consistent affects are more expressive compared to behaviors with inconsistent affects. Therefore, we conclude that multimodal emotion conditioning and affect consistency are vital to enhancing the perception of affects for embodied conversational agents.

    Supplementary Material

    PDF File (appendix.pdf)
    Appendix and Demo Video.
    MP4 File (demo_video.mp4)
    Appendix and Demo Video.



    Published In

    IUI '23: Proceedings of the 28th International Conference on Intelligent User Interfaces
    March 2023
    972 pages
    ISBN: 9798400701061
    DOI: 10.1145/3581641


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. affect consistency
    2. embodied conversational agents
    3. emotion conditioning
    4. multimodal behavior generation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited


    Acceptance Rates

    Overall acceptance rate: 746 of 2,811 submissions (27%)

