
Transflower: probabilistic autoregressive dance generation with multimodal attention

Published: 10 December 2021

Abstract

Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution and the ability to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.
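
The overall idea can be illustrated with a minimal PyTorch-style sketch: a multimodal transformer encodes the recent pose history together with the surrounding music features into a single conditioning vector, and a conditional normalizing flow maps a Gaussian latent sample to a sample of the next pose, which a full generator would feed back autoregressively. This is a hypothetical illustration under assumed names and dimensions (TransflowerSketch, AffineCoupling, pose_dim=66, music_dim=80, and the single affine-coupling stack are all assumptions), not the authors' implementation; the paper's model uses a Glow-style flow and a larger encoder.

```python
# Hypothetical sketch of a Transflower-style model (assumes PyTorch).
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One affine coupling layer of a conditional normalizing flow (illustrative)."""

    def __init__(self, pose_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)),
        )

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Transform half of the latent, conditioned on the other half and the context.
        z1, z2 = z[:, : self.half], z[:, self.half:]
        log_s, t = self.net(torch.cat([z1, cond], dim=-1)).chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(log_s) + t], dim=-1)


class TransflowerSketch(nn.Module):
    """Multimodal transformer encoder feeding a conditional flow over the next pose."""

    def __init__(self, pose_dim=66, music_dim=80, d_model=128, n_flow_layers=4):
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, d_model)    # motion tokens
        self.music_proj = nn.Linear(music_dim, d_model)  # music tokens
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.flow = nn.ModuleList(
            [AffineCoupling(pose_dim, d_model) for _ in range(n_flow_layers)])
        self.pose_dim = pose_dim

    def sample_next_pose(self, past_poses, music_context):
        # past_poses:    (batch, T_motion, pose_dim)  recent motion history
        # music_context: (batch, T_music, music_dim)  audio features around "now"
        tokens = torch.cat(
            [self.pose_proj(past_poses), self.music_proj(music_context)], dim=1)
        cond = self.encoder(tokens).mean(dim=1)              # pooled context vector
        z = torch.randn(past_poses.size(0), self.pose_dim)   # Gaussian latent sample
        for layer in self.flow:                              # push it through the flow
            z = layer(z, cond)
        return z                                             # sampled next pose


if __name__ == "__main__":
    model = TransflowerSketch()
    poses = torch.randn(1, 10, 66)   # 10 past frames of a 66-D pose vector
    music = torch.randn(1, 20, 80)   # 20 frames of e.g. mel-spectrogram features
    print(model.sample_next_pose(poses, music).shape)  # torch.Size([1, 66])
```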

Supplementary Material

ZIP File (a195-valle-perez.zip)
Supplemental files.
MP4 File (a195-valle-perez.mp4)

      Published In

      ACM Transactions on Graphics, Volume 40, Issue 6
      December 2021, 1351 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3478513
      This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 December 2021
      Published in TOG Volume 40, Issue 6

      Author Tags

      1. dance
      2. generative models
      3. glow
      4. machine learning
      5. normalising flows
      6. transformers

      Qualifiers

      • Research-article

      Funding Sources

      • The Swedish Research Council (Vetenskapsrådet)
      • The Knut and Alice Wallenberg Foundation
      • The Marianne and Marcus Wallenberg Foundation
      • GENCI

      Cited By

      • (2024) Categorical Codebook Matching for Embodied Character Controllers. ACM Transactions on Graphics 43(4), 1-14. https://doi.org/10.1145/3658209. Online publication date: 19-Jul-2024.
      • (2024) BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval, 11-19. https://doi.org/10.1145/3652583.3658045. Online publication date: 30-May-2024.
      • (2024) CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit. Proceedings of the 2024 International Conference on Multimedia Retrieval, 675-683. https://doi.org/10.1145/3652583.3657998. Online publication date: 30-May-2024.
      • (2024) DanceGen: Supporting Choreography Ideation and Prototyping with Generative AI. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 920-938. https://doi.org/10.1145/3643834.3661594. Online publication date: 1-Jul-2024.
      • (2024) ADAPT: AI-Driven Artefact Purging Technique for IMU Based Motion Capture. Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 1-13. https://doi.org/10.1111/cgf.15172. Online publication date: 21-Aug-2024.
      • (2024) A Two-Part Transformer Network for Controllable Motion Synthesis. IEEE Transactions on Visualization and Computer Graphics 30(8), 5047-5062. https://doi.org/10.1109/TVCG.2023.3284402. Online publication date: Aug-2024.
      • (2024) Keyframe Control of Music-Driven 3D Dance Generation. IEEE Transactions on Visualization and Computer Graphics 30(7), 3474-3486. https://doi.org/10.1109/TVCG.2023.3235538. Online publication date: 1-Jul-2024.
      • (2024) Human Motion Generation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(4), 2430-2449. https://doi.org/10.1109/TPAMI.2023.3330935. Online publication date: Apr-2024.
      • (2024) POPDG: Popular 3D Dance Generation with PopDanceSet. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 26974-26983. https://doi.org/10.1109/CVPR52733.2024.02548. Online publication date: 16-Jun-2024.
      • (2024) DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7892-7901. https://doi.org/10.1109/CVPR52733.2024.00754. Online publication date: 16-Jun-2024.
