Human dynamics from monocular video with dynamic camera movements

Published: 10 December 2021

Abstract

We propose a new method that reconstructs 3D human motion from in-the-wild video by making full use of prior knowledge of the laws of physics. Previous studies focus on reconstructing joint angles and positions in the body's local coordinate frame. Body translations and rotations in the global reference frame are reconstructed only partially, and only when the video has a static camera view. We aim to overcome this static-view limitation and handle dynamic-view videos, in which the camera may pan, tilt, and zoom to track the moving subject. Since we place no restrictions on camera movement, the body translations and rotations observed in the video do not correspond to absolute positions in the reference frame. The key technical challenge is inferring global body translations and rotations from a sequence of 3D full-body poses that carries no root motion. This inference is possible because human motion obeys the laws of physics. Our reconstruction algorithm produces a control policy that simulates 3D human motion imitating the motion in the video. Our algorithm is particularly useful for reconstructing highly dynamic movements, such as sports, dance, gymnastics, and parkour actions.
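To illustrate why physics alone can constrain global motion, consider a toy case that is not the paper's algorithm (the paper learns a simulated control policy). While a body is airborne, its center of mass follows a ballistic arc, so the airtime, which can be counted in frames regardless of camera movement, determines the vertical trajectory. The sketch below assumes takeoff and landing at the same center-of-mass height and no air resistance; the function names are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def apex_height(flight_time):
    """Peak center-of-mass rise (m) for a flight phase lasting flight_time seconds.

    Assumes takeoff and landing at the same height, so the vertical takeoff
    speed is v0 = G * T / 2 and the peak rise is v0^2 / (2 G) = G * T^2 / 8.
    """
    if flight_time <= 0:
        raise ValueError("flight_time must be positive")
    return G * flight_time ** 2 / 8.0


def ballistic_com_heights(flight_time, num_samples):
    """Center-of-mass height above takeoff level at evenly spaced times."""
    if flight_time <= 0 or num_samples < 2:
        raise ValueError("need positive flight_time and at least 2 samples")
    v0 = 0.5 * G * flight_time            # vertical takeoff speed
    dt = flight_time / (num_samples - 1)  # time between samples
    return [v0 * (i * dt) - 0.5 * G * (i * dt) ** 2 for i in range(num_samples)]


if __name__ == "__main__":
    # Hypothetical jump: 15 airborne frames at 30 fps gives 0.5 s of airtime.
    airtime = 15 / 30.0
    print(apex_height(airtime))
    print(ballistic_com_heights(airtime, 5))
```

A full reconstruction also needs horizontal translation, rotation, and contact phases, which this sketch does not address.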

Supplementary Material

MP4 File (a208-yu.mp4)


Published In

ACM Transactions on Graphics, Volume 40, Issue 6
December 2021
1351 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3478513
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. motion reconstruction
  2. physics-based simulation
  3. video processing

Qualifiers

  • Research-article

Funding Sources

  • Korea government (MSIT)
