DOI: 10.1145/3652988.3673931

Modifying Gesture Style with Impression Words

Published: 26 December 2024

Abstract

When people form impressions of others in face-to-face communication, gesture style (i.e., the way of gesturing) affects those impressions, such as appearing well-mannered, honest, or enthusiastic. As a mechanism for changing gesture style, we trained a GAN-based style transfer model on a collection of video clips of speakers. We then collected a new speaker dataset from YouTube videos representing three different countries, fed it to the style encoder of our style transfer model, and created a gesture style latent space. However, it is difficult to select, from a large number of candidates, a style that synthesizes motions conveying a specific impression. To assist users, we propose a method for automatically selecting an appropriate style by fine-tuning a Large Language Model (LLM) that takes a list of impression words as input. An evaluation study found that the gesture transfer model effectively changes the impression a gesture makes, and that styles chosen by the selection mechanism produced motions whose impressions were closer to those of the ground-truth styles than randomly selected styles were.
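The pipeline the abstract describes (encode speaker clips into a style latent space, then map impression words to one of the candidate styles) can be illustrated with a minimal toy sketch. All names here are hypothetical stand-ins: the paper uses a GAN-based style encoder and a fine-tuned LLM, whereas this sketch replaces them with a mean-centering "encoder" and a cosine-similarity selector purely to show the data flow.

```python
# Toy sketch of the described pipeline. encode_style and select_style are
# illustrative stand-ins, NOT the paper's models.

def encode_style(clip_features):
    """Hypothetical style encoder: maps per-clip features to a style vector
    (here, simply centered around the clip's mean)."""
    mean = sum(clip_features) / len(clip_features)
    return [f - mean for f in clip_features]

def select_style(impression_vec, style_space):
    """Stand-in for the fine-tuned LLM selector: returns the index of the
    candidate style closest (by cosine similarity) to the impression embedding."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    return max(range(len(style_space)), key=lambda i: cos(impression_vec, style_space[i]))

# Build a tiny "style latent space" from three mock speaker clips.
clips = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
style_space = [encode_style(c) for c in clips]

# A mock embedding of an impression-word list selects the nearest style.
idx = select_style([-1.0, 0.0, 1.0], style_space)
```

The selected style vector would then condition the transfer model that re-synthesizes the source speaker's gestures in the chosen style.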


    Published In

    cover image ACM Conferences
    IVA '24: Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents
    September 2024
    337 pages

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. LLM
    2. gesture generation
    3. gesture impression
    4. impression words
    5. style transfer

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • JST Moonshot R&D
    • JST AIP Trilateral AI Research

    Conference

    IVA '24
    Sponsor:
    IVA '24: ACM International Conference on Intelligent Virtual Agents
    September 16 - 19, 2024
    Glasgow, United Kingdom

    Acceptance Rates

    Overall Acceptance Rate 53 of 196 submissions, 27%
