
Transflower: probabilistic autoregressive dance generation with multimodal attention

Published: 10 December 2021

Abstract

Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the currently largest 3D dance-motion dataset, obtained with a variety of motion-capture technologies and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines, via objective metrics and a user study, and show that both the ability to model a probability distribution and the ability to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.
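
The overall idea can be illustrated with a minimal PyTorch-style sketch: a multimodal transformer encodes the recent pose history together with the surrounding music features into a single conditioning vector, and a conditional normalizing flow maps a Gaussian latent sample to a sample of the next pose, which a full generator would feed back autoregressively. This is a hypothetical illustration under assumed names and dimensions (TransflowerSketch, AffineCoupling, pose_dim=66, music_dim=80, and the single affine-coupling stack are all assumptions), not the authors' implementation; the paper's model uses a Glow-style flow and a larger encoder.

```python
# Hypothetical sketch of a Transflower-style model (assumes PyTorch).
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One affine coupling layer of a conditional normalizing flow (illustrative)."""

    def __init__(self, pose_dim: int, cond_dim: int, hidden: int = 256):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (pose_dim - self.half)),
        )

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Transform half of the latent, conditioned on the other half and the context.
        z1, z2 = z[:, : self.half], z[:, self.half:]
        log_s, t = self.net(torch.cat([z1, cond], dim=-1)).chunk(2, dim=-1)
        return torch.cat([z1, z2 * torch.exp(log_s) + t], dim=-1)


class TransflowerSketch(nn.Module):
    """Multimodal transformer encoder feeding a conditional flow over the next pose."""

    def __init__(self, pose_dim=66, music_dim=80, d_model=128, n_flow_layers=4):
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, d_model)    # motion tokens
        self.music_proj = nn.Linear(music_dim, d_model)  # music tokens
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.flow = nn.ModuleList(
            [AffineCoupling(pose_dim, d_model) for _ in range(n_flow_layers)])
        self.pose_dim = pose_dim

    def sample_next_pose(self, past_poses, music_context):
        # past_poses:    (batch, T_motion, pose_dim)  recent motion history
        # music_context: (batch, T_music, music_dim)  audio features around "now"
        tokens = torch.cat(
            [self.pose_proj(past_poses), self.music_proj(music_context)], dim=1)
        cond = self.encoder(tokens).mean(dim=1)              # pooled context vector
        z = torch.randn(past_poses.size(0), self.pose_dim)   # Gaussian latent sample
        for layer in self.flow:                              # push it through the flow
            z = layer(z, cond)
        return z                                             # sampled next pose


if __name__ == "__main__":
    model = TransflowerSketch()
    poses = torch.randn(1, 10, 66)   # 10 past frames of a 66-D pose vector
    music = torch.randn(1, 20, 80)   # 20 frames of e.g. mel-spectrogram features
    print(model.sample_next_pose(poses, music).shape)  # torch.Size([1, 66])
```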

Supplementary Material

ZIP File (a195-valle-perez.zip)
Supplemental files.
MP4 File (a195-valle-perez.mp4)

      Published In

      ACM Transactions on Graphics, Volume 40, Issue 6
      December 2021, 1351 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3478513
      This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 December 2021
      Published in TOG Volume 40, Issue 6

      Author Tags

      1. dance
      2. generative models
      3. glow
      4. machine learning
      5. normalising flows
      6. transformers

      Qualifiers

      • Research-article

      Funding Sources

      • The Swedish Research Council (Vetenskapsrådet)
      • The Knut and Alice Wallenberg Foundation
      • The Marianne and Marcus Wallenberg Foundation
      • GENCI

      Cited By

      • (2024) Categorical Codebook Matching for Embodied Character Controllers. ACM Transactions on Graphics 43(4), 1-14. https://doi.org/10.1145/3658209. Online publication date: 19-Jul-2024.
      • (2024) BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval, 11-19. https://doi.org/10.1145/3652583.3658045. Online publication date: 30-May-2024.
      • (2024) CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit. Proceedings of the 2024 International Conference on Multimedia Retrieval, 675-683. https://doi.org/10.1145/3652583.3657998. Online publication date: 30-May-2024.
      • (2024) DanceGen: Supporting Choreography Ideation and Prototyping with Generative AI. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 920-938. https://doi.org/10.1145/3643834.3661594. Online publication date: 1-Jul-2024.
      • (2024) ADAPT: AI-Driven Artefact Purging Technique for IMU Based Motion Capture. Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 1-13. https://doi.org/10.1111/cgf.15172. Online publication date: 21-Aug-2024.
      • (2024) A Two-Part Transformer Network for Controllable Motion Synthesis. IEEE Transactions on Visualization and Computer Graphics 30(8), 5047-5062. https://doi.org/10.1109/TVCG.2023.3284402. Online publication date: Aug-2024.
      • (2024) Keyframe Control of Music-Driven 3D Dance Generation. IEEE Transactions on Visualization and Computer Graphics 30(7), 3474-3486. https://doi.org/10.1109/TVCG.2023.3235538. Online publication date: 1-Jul-2024.
      • (2024) Human Motion Generation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(4), 2430-2449. https://doi.org/10.1109/TPAMI.2023.3330935. Online publication date: Apr-2024.
      • (2024) POPDG: Popular 3D Dance Generation with PopDanceSet. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 26974-26983. https://doi.org/10.1109/CVPR52733.2024.02548. Online publication date: 16-Jun-2024.
      • (2024) DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7892-7901. https://doi.org/10.1109/CVPR52733.2024.00754. Online publication date: 16-Jun-2024.
