DOI: 10.1145/3652988.3673931

Modifying Gesture Style with Impression Words

Published: 26 December 2024

Abstract

When people form impressions of others in face-to-face communication, gesture style (i.e., the way of gesturing) affects those impressions, such as appearing well-mannered, honest, or enthusiastic. As a mechanism for changing gesture style, we trained a GAN-based style transfer model on a collection of video clips of speakers. We then collected a new speaker dataset from YouTube videos representing three different countries, fed it to the style encoder of our style transfer model, and created a gesture style latent space. However, it is difficult to select, from a large number of candidates, a style that synthesizes motions conveying a specific impression. To assist users, we propose a method for automatically selecting an appropriate style by fine-tuning a Large Language Model (LLM) that takes a list of impression words as input. An evaluation study found that the gesture transfer model effectively changes the impression a gesture makes, and that styles chosen by the selection mechanism produced motions whose impressions were closer to those of the ground-truth styles than randomly selected styles were.
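The pipeline the abstract describes (encode speaker clips into a style latent space, then map impression words to one of the candidate styles) can be illustrated with a minimal toy sketch. All names here are hypothetical stand-ins: the paper uses a GAN-based style encoder and a fine-tuned LLM, whereas this sketch replaces them with a mean-centering "encoder" and a cosine-similarity selector purely to show the data flow.

```python
# Toy sketch of the described pipeline. encode_style and select_style are
# illustrative stand-ins, NOT the paper's models.

def encode_style(clip_features):
    """Hypothetical style encoder: maps per-clip features to a style vector
    (here, simply centered around the clip's mean)."""
    mean = sum(clip_features) / len(clip_features)
    return [f - mean for f in clip_features]

def select_style(impression_vec, style_space):
    """Stand-in for the fine-tuned LLM selector: returns the index of the
    candidate style closest (by cosine similarity) to the impression embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    return max(range(len(style_space)), key=lambda i: cos(impression_vec, style_space[i]))

# Build a tiny "style latent space" from three mock speaker clips.
clips = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
style_space = [encode_style(c) for c in clips]

# A mock embedding of an impression-word list selects the nearest style.
idx = select_style([-1.0, 0.0, 1.0], style_space)
```

The selected style vector would then condition the transfer model that re-synthesizes the source speaker's gestures in the chosen style.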


    Published In

    cover image ACM Conferences
    IVA '24: Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents
    September 2024
    337 pages

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. LLM
    2. gesture generation
    3. gesture impression
    4. impression words
    5. style transfer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • JST Moonshot R&D
    • JST AIP Trilateral AI Research

    Conference

    IVA '24
    Sponsor:
    IVA '24: ACM International Conference on Intelligent Virtual Agents
    September 16 - 19, 2024
    Glasgow, United Kingdom

    Acceptance Rates

    Overall Acceptance Rate 53 of 196 submissions, 27%
