QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars

Authors: Alexander Winkler, Yuting Ye

SA '22: SIGGRAPH Asia 2022 Conference Papers

Article No.: 2, Pages 1 - 8

https://doi.org/10.1145/3550469.3555411

Published: 30 November 2022

Abstract

Real-time tracking of human body motion is crucial for interactive and immersive experiences in AR/VR. However, standalone wearable devices such as HMDs (Head Mounted Devices) or AR glasses provide very limited sensor data about the body. In this work, we present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers and simulates plausible, physically valid full-body motions. Using high-quality full-body motion as dense supervision during training, a simple policy network can learn to output appropriate torques for the character to balance, walk, and jog while closely following the input signals. Our results demonstrate leg motions surprisingly similar to ground truth without any observations of the lower body, even when the input is only the 6D transformations of the HMD. We also show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.
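
To make the input/output structure described in the abstract concrete, the following is a minimal illustrative sketch of a policy that maps sparse tracker signals to joint torques. It is not the authors' implementation. Every name and number in it is an assumption made for illustration: the class `SparseTrackingPolicy`, the layer sizes, the character-state dimension, the torque limit, and the choice to encode each 6D transformation as a position plus a quaternion. The weights are random, so the sketch shows only the data flow, not a trained controller.

```python
import math
import random

# Hypothetical sizes chosen only for illustration (not taken from the paper).
NUM_TRACKERS = 3          # HMD + two controllers
TRACKER_DIM = 7           # position (3) + orientation quaternion (4)
CHARACTER_STATE_DIM = 20  # placeholder for the simulated character's own state
NUM_JOINT_DOFS = 12       # placeholder number of actuated degrees of freedom
HIDDEN_DIM = 32           # placeholder hidden-layer width
MAX_TORQUE = 100.0        # placeholder torque limit


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation):
    """Apply one dense layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class SparseTrackingPolicy:
    """Tiny untrained MLP: sparse tracker poses + character state -> torques."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.obs_dim = NUM_TRACKERS * TRACKER_DIM + CHARACTER_STATE_DIM
        self.hidden = init_layer(self.obs_dim, HIDDEN_DIM, rng)
        self.out = init_layer(HIDDEN_DIM, NUM_JOINT_DOFS, rng)

    def act(self, tracker_poses, character_state):
        # Flatten the tracker poses and append the simulated character state.
        obs = [v for pose in tracker_poses for v in pose] + list(character_state)
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} inputs, got {len(obs)}")
        h = dense(self.hidden, obs, math.tanh)
        # tanh bounds each output to [-1, 1]; scale to the assumed torque limit.
        return [MAX_TORQUE * t for t in dense(self.out, h, math.tanh)]
```

In a training loop of the kind the abstract describes, a reinforcement learning algorithm would adjust the weights so that the simulated character tracks the reference motion; here, calling `SparseTrackingPolicy().act(poses, state)` simply returns one bounded torque value per assumed degree of freedom.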

Supplemental Material

MP4 File: presentation (978.10 MB)

MP4 File: Supplemental video (258.27 MB)

PDF File: Appendix (246.87 KB)

Published In

SA '22: SIGGRAPH Asia 2022 Conference Papers

November 2022

482 pages

ISBN:9781450394703

DOI:10.1145/3550469

Editors: Soon Ki Jung (Kyungpook National University, South Korea), Jehee Lee (Seoul National University, South Korea), Adam Bargteil (University of Maryland Baltimore County, USA)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Data Availability

presentation https://dl.acm.org/doi/10.1145/3550469.3555411#3550469.3555411.mp4

Supplemental video https://dl.acm.org/doi/10.1145/3550469.3555411#quest_sim.mp4

Appendix https://dl.acm.org/doi/10.1145/3550469.3555411#quest_sim_appendix.pdf

Conference

SA '22

Sponsor: SIGGRAPH

SA '22: SIGGRAPH Asia 2022

December 6 - 9, 2022

Daegu, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 178 of 869 submissions, 20%

Bibliometrics

Total Citations: 72
Total Downloads: 2,303
Downloads (Last 12 months): 859
Downloads (Last 6 weeks): 91

Reflects downloads up to 22 Feb 2025

Citations

Cited By

Kim D, Yeo H, Park K (2025). Effects of an Avatar Control on VR Embodiment. Bioengineering 12:1 (32). Online publication date: 3-Jan-2025
https://doi.org/10.3390/bioengineering12010032
Fang J, Song H, Zuo C, Gao X, Chen X, Guo S, Qin Y, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024). SuDA. Proceedings of the 41st International Conference on Machine Learning (22042-22061). Online publication date: 21-Jul-2024
https://dl.acm.org/doi/10.5555/3692070.3692955
Cho Y, Son W, Bak J, Lee Y, Lim H, Cha Y (2024). Full-Body Pose Estimation of Humanoid Robots Using Head-Worn Cameras for Digital Human-Augmented Robotic Telepresence. Mathematics 12:19 (3039). Online publication date: 28-Sep-2024
https://doi.org/10.3390/math12193039
Li X, He W, Jin S, Gugenheimer J, Hui P, Liang H, Kristensson P (2024). Investigating Creation Perspectives and Icon Placement Preferences for On-Body Menus in Virtual Reality. Proceedings of the ACM on Human-Computer Interaction 8:ISS (236-254). Online publication date: 24-Oct-2024
https://dl.acm.org/doi/10.1145/3698136
Jang D, Yang D, Jang D, Choi B, Lee S, Shin D (2024). ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling. ACM Transactions on Graphics 43:6 (1-14). Online publication date: 19-Dec-2024
https://dl.acm.org/doi/10.1145/3687991
Tessler C, Guo Y, Nabati O, Chechik G, Peng X (2024). MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting. ACM Transactions on Graphics 43:6 (1-21). Online publication date: 19-Dec-2024
https://dl.acm.org/doi/10.1145/3687951
Kimmel S, Landwehr E, Heuten W (2024). Kinetic Connections: Exploring the Impact of Realistic Body Movements on Social Presence in Collaborative Virtual Reality. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-30). Online publication date: 8-Nov-2024
https://dl.acm.org/doi/10.1145/3686910
Starke S, Starke P, He N, Komura T, Ye Y (2024). Categorical Codebook Matching for Embodied Character Controllers. ACM Transactions on Graphics 43:4 (1-14). Online publication date: 19-Jul-2024
https://dl.acm.org/doi/10.1145/3658209
Hu H, Yi X, Cao Z, Yong J, Xu F (2024). Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics. ACM SIGGRAPH 2024 Conference Papers (1-10). Online publication date: 13-Jul-2024
https://dl.acm.org/doi/10.1145/3641519.3657505
Juravsky J, Guo Y, Fidler S, Peng X (2024). SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation. ACM SIGGRAPH 2024 Conference Papers (1-11). Online publication date: 13-Jul-2024
https://dl.acm.org/doi/10.1145/3641519.3657492
