DOI: 10.1145/3395035.3425244

Music-Driven Animation Generation of Expressive Musical Gestures

Published: 27 December 2020

Abstract

While audio-driven face and gesture motion synthesis has been studied before, to our knowledge no research has yet addressed the automatic generation of musical gestures for virtual humans. Existing work focuses either on the precise 3D finger movements required to play an instrument or on expressive musical gestures derived from 2D video data. In this paper, we propose a music-driven piano performance generation method based on 3D motion capture data and recurrent neural networks. Our results show that it is feasible to automatically generate expressive musical gestures for piano playing from various audio and musical features, although it is not yet clear which features work best for which type of music. In future work we aim to test further datasets, deep learning methods, and musical instruments using both objective and subjective evaluations.
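The approach described in the abstract maps audio and musical features extracted from a recording to 3D motion-capture poses with a recurrent network. The exact features and architecture are given in the full text; the sketch below is only a minimal illustration of that general idea, assuming MFCC features extracted with librosa and a hypothetical two-layer LSTM regressing per-frame joint rotations. The feature set, network size, joint count, and rotation parameterization are placeholder assumptions, not the authors' configuration.

# Minimal sketch (not the authors' implementation): regress per-frame 3D
# joint rotations from per-frame audio features with a recurrent network.
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_audio_features(wav_path, sr=22050, hop_length=512, n_mfcc=20):
    # Per-frame MFCCs; the paper mentions "various audio and musical
    # features", so MFCCs are only one plausible choice here.
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    return mfcc.T.astype(np.float32)  # shape: (frames, n_mfcc)

class AudioToGesture(nn.Module):
    # LSTM regressor from audio features to joint rotations. Hidden size,
    # layer count, and joint count are assumptions for illustration only.
    def __init__(self, n_features=20, hidden=256, n_joints=25):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 3)  # e.g. Euler angles per joint

    def forward(self, x):  # x: (batch, frames, n_features)
        h, _ = self.rnn(x)
        return self.head(h)  # (batch, frames, n_joints * 3)

# Example usage (hypothetical file name):
# features = extract_audio_features("piano_performance.wav")
# model = AudioToGesture(n_features=features.shape[1])
# motion = model(torch.from_numpy(features).unsqueeze(0))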

Supplementary Material

ZIP File (lbr1021aux.zip)





    Published In

    ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
    October 2020
    548 pages
    ISBN: 9781450380027
    DOI: 10.1145/3395035
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 December 2020


    Author Tags

    1. gesture animation
    2. music-driven animation
    3. neural networks

    Qualifiers

    • Short-paper

    Conference

    ICMI '20
    Sponsor:
    ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
    October 25 - 29, 2020
    Virtual Event, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%

    Article Metrics

    • Downloads (Last 12 months): 57
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 24 Dec 2024



    Cited By

    • (2024) Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum, 43:2. DOI: 10.1111/cgf.15065. Online publication date: 30-Apr-2024
    • (2024) A Survey on Realistic Virtual Human Animations: Definitions, Features and Evaluations. Computer Graphics Forum, 43:2. DOI: 10.1111/cgf.15064. Online publication date: 30-Apr-2024
    • (2023) A Music-Driven Deep Generative Adversarial Model for Guzheng Playing Animation. IEEE Transactions on Visualization and Computer Graphics, 29:2, 1400-1414. DOI: 10.1109/TVCG.2021.3115902. Online publication date: 1-Feb-2023
    • (2023) Audio-Driven Facial Landmark Generation in Violin Performance using 3DCNN Network with Self Attention Model. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. DOI: 10.1109/ICASSP49357.2023.10096358. Online publication date: 4-Jun-2023
    • (2022) Self-Supervised Music Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology, 37:3, 539-558. DOI: 10.1007/s11390-022-2030-z. Online publication date: 1-Jun-2022
    • (2021) Exploiting Serialized Fine-Grained Action Recognition Using WiFi Sensing. Mobile Information Systems, 2021. DOI: 10.1155/2021/4770143. Online publication date: 1-Jan-2021
