DOI: 10.1145/3395035.3425244

Music-Driven Animation Generation of Expressive Musical Gestures

Published: 27 December 2020

Abstract

While audio-driven face and gesture motion synthesis has been studied before, to our knowledge no research has yet addressed the automatic generation of musical gestures for virtual humans. Existing work focuses either on the precise 3D finger movements required to play an instrument or on expressive musical gestures derived from 2D video data. In this paper, we propose a music-driven piano performance generation method based on 3D motion capture data and recurrent neural networks. Our results show that it is feasible to automatically generate expressive musical gestures for piano playing from various audio and musical features, although it is not yet clear which features work best for which type of music. In future work we aim to test further datasets, deep learning methods, and musical instruments using both objective and subjective evaluations.
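The approach described in the abstract maps audio and musical features extracted from a recording to 3D motion-capture poses with a recurrent network. The exact features and architecture are given in the full text; the sketch below is only a minimal illustration of that general idea, assuming MFCC features extracted with librosa and a hypothetical two-layer LSTM regressing per-frame joint rotations. The feature set, network size, joint count, and rotation parameterization are placeholder assumptions, not the authors' configuration.

# Minimal sketch (not the authors' implementation): regress per-frame 3D
# joint rotations from per-frame audio features with a recurrent network.
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_audio_features(wav_path, sr=22050, hop_length=512, n_mfcc=20):
    # Per-frame MFCCs; the paper mentions "various audio and musical
    # features", so MFCCs are only one plausible choice here.
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    return mfcc.T.astype(np.float32)  # shape: (frames, n_mfcc)

class AudioToGesture(nn.Module):
    # LSTM regressor from audio features to joint rotations. Hidden size,
    # layer count, and joint count are assumptions for illustration only.
    def __init__(self, n_features=20, hidden=256, n_joints=25):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 3)  # e.g. Euler angles per joint

    def forward(self, x):  # x: (batch, frames, n_features)
        h, _ = self.rnn(x)
        return self.head(h)  # (batch, frames, n_joints * 3)

# Example usage (hypothetical file name):
# features = extract_audio_features("piano_performance.wav")
# model = AudioToGesture(n_features=features.shape[1])
# motion = model(torch.from_numpy(features).unsqueeze(0))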

Supplementary Material

ZIP File (lbr1021aux.zip)





    Published In

    ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
    October 2020
    548 pages
    ISBN: 9781450380027
    DOI: 10.1145/3395035
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 December 2020


    Author Tags

    1. gesture animation
    2. music-driven animation
    3. neural networks

    Qualifiers

    • Short-paper

    Conference

    ICMI '20
    Sponsor:
    ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
    October 25 - 29, 2020
    Virtual Event, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Article Metrics

    • Downloads (Last 12 months): 57
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 24 Dec 2024



    Cited By

    • (2024) Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum, 43:2. DOI: 10.1111/cgf.15065. Online publication date: 30-Apr-2024
    • (2024) A Survey on Realistic Virtual Human Animations: Definitions, Features and Evaluations. Computer Graphics Forum, 43:2. DOI: 10.1111/cgf.15064. Online publication date: 30-Apr-2024
    • (2023) A Music-Driven Deep Generative Adversarial Model for Guzheng Playing Animation. IEEE Transactions on Visualization and Computer Graphics, 29:2, 1400-1414. DOI: 10.1109/TVCG.2021.3115902. Online publication date: 1-Feb-2023
    • (2023) Audio-Driven Facial Landmark Generation in Violin Performance using 3DCNN Network with Self Attention Model. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. DOI: 10.1109/ICASSP49357.2023.10096358. Online publication date: 4-Jun-2023
    • (2022) Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology, 37:3, 539-558. DOI: 10.1007/s11390-022-2030-z. Online publication date: 1-Jun-2022
    • (2021) Exploiting Serialized Fine-Grained Action Recognition Using WiFi Sensing. Mobile Information Systems, 2021. DOI: 10.1155/2021/4770143. Online publication date: 1-Jan-2021
