Article

HULC: 3D HUman Motion Capture with Pose Manifold SampLing and Dense Contact Guidance

Authors:

Vladislav Golyanik,

Patrick Pérez,

Christian Theobalt

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII

Pages 516-533

https://doi.org/10.1007/978-3-031-20047-2_30

Published: 23 October 2022

Abstract

Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation. Due to the inherent depth ambiguity of monocular settings, 3D motions captured with existing methods often contain severe artefacts such as incorrect body-scene inter-penetrations, jitter and body floating. To tackle these issues, we propose HULC, a new approach for 3D human MoCap which is aware of the scene geometry. HULC estimates 3D poses and dense body-environment surface contacts for improved 3D localisations, as well as the absolute scale of the subject. Furthermore, we introduce a 3D pose trajectory optimisation based on a novel pose manifold sampling that resolves erroneous body-environment inter-penetrations. Although the proposed method requires less structured inputs compared to existing scene-aware monocular MoCap algorithms, it produces more physically-plausible poses: HULC significantly and consistently outperforms the existing approaches in various experiments and on different metrics. Project page: https://vcai.mpi-inf.mpg.de/projects/HULC/.
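The abstract says that body-environment inter-penetrations are resolved by sampling on a pose manifold. The toy sketch below illustrates only that general idea and is not HULC's implementation. Its assumptions are stated plainly: the scene is a single ground plane, the "manifold" is a hypothetical linear decoder (`decode`) rather than a learned generative model, and candidates are scored by a hypothetical cost. That cost adds closeness to the initial estimate to a weighted squared penetration depth. All names (`penetration_depth`, `decode`, `sample_and_select`, `w_pen`, `sigma`) are illustrative.

```python
import random


def penetration_depth(point, normal, offset):
    """Depth of `point` below the plane n.x + offset = 0 (0.0 if on or above it).

    Assumption: `normal` is a unit vector pointing from the floor into free space.
    """
    signed = sum(p * n for p, n in zip(point, normal)) + offset
    return max(0.0, -signed)


def decode(latent, basis, mean):
    """Hypothetical linear 'pose manifold' decoder: latent code -> list of 3D points.

    basis[i][j] is the displacement of point j per unit of latent dimension i.
    """
    points = [list(p) for p in mean]
    for z_i, direction in zip(latent, basis):
        for point, d in zip(points, direction):
            for k in range(3):
                point[k] += z_i * d[k]
    return [tuple(p) for p in points]


def cost(points, reference, normal, offset, w_pen):
    """Hypothetical cost: squared distance to the initial estimate plus weighted penetration."""
    data = sum((a - b) ** 2 for p, q in zip(points, reference) for a, b in zip(p, q))
    pen = sum(penetration_depth(p, normal, offset) ** 2 for p in points)
    return data + w_pen * pen


def sample_and_select(z0, basis, mean, normal, offset,
                      n_samples=256, sigma=0.5, w_pen=10.0, seed=0):
    """Draw Gaussian samples around latent z0 and keep the cheapest decoded candidate."""
    rng = random.Random(seed)
    reference = decode(z0, basis, mean)
    # The unperturbed code is evaluated first, so the result is never worse than z0.
    best_z, best_cost = list(z0), cost(reference, reference, normal, offset, w_pen)
    for _ in range(n_samples):
        z = [z_i + rng.gauss(0.0, sigma) for z_i in z0]
        c = cost(decode(z, basis, mean), reference, normal, offset, w_pen)
        if c < best_cost:
            best_z, best_cost = z, c
    return best_z, best_cost
```

For example, with a floor at y = 0 and one latent dimension that lifts two points, starting with one point 0.2 below the floor, the selected code moves the points upward. Because only strictly cheaper candidates replace the initial code, the returned cost never exceeds that of the initial estimate.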

Cited By

Shimada S, Golyanik V, Pérez P, Theobalt C (2023). Decaf: Monocular Deformation Capture for Face and Hand Interactions. ACM Transactions on Graphics 42(6), 1-16. DOI: 10.1145/3618329. Online publication date: 5 Dec 2023.
https://dl.acm.org/doi/10.1145/3618329
Han X, Senderling B, To S, Kumar D, Whiting E, Saito J (2023). GroundLink: A Dataset Unifying Human Body Movement and Ground Reaction Dynamics. SIGGRAPH Asia 2023 Conference Papers, 1-10. DOI: 10.1145/3610548.3618247. Online publication date: 10 Dec 2023.
https://dl.acm.org/doi/10.1145/3610548.3618247

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
  2. Computer graphics
    1. Animation
    2. Graphics systems and interfaces
      1. Virtual reality
2. Human-centered computing
  1. Human computer interaction (HCI)

Index terms have been assigned to the content through auto-classification.

Information

Published In

Guide Proceedings

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII

Oct 2022

827 pages

ISBN:978-3-031-20046-5

DOI:10.1007/978-3-031-20047-2

Editors:
Shai Avidan, Tel Aviv University, Tel Aviv, Israel
Gabriel Brostow, University College London, London, UK
Moustapha Cissé, Google AI, Accra, Ghana
Giovanni Maria Farinella, University of Catania, Catania, Italy
Tal Hassner, Facebook (United States), Menlo Park, CA, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022.

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 18 Jan 2025