Article, DOI: 10.1007/978-3-031-20047-2_30

HULC: 3D HUman Motion Capture with Pose Manifold SampLing and Dense Contact Guidance

Published: 23 October 2022

Abstract

Marker-less monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant for extended reality, robotics and virtual avatar generation. Due to the inherent depth ambiguity of monocular settings, 3D motions captured with existing methods often contain severe artefacts such as incorrect body-scene inter-penetrations, jitter and body floating. To tackle these issues, we propose HULC, a new approach for 3D human MoCap which is aware of the scene geometry. HULC estimates 3D poses and dense body-environment surface contacts for improved 3D localisations, as well as the absolute scale of the subject. Furthermore, we introduce a 3D pose trajectory optimisation based on a novel pose manifold sampling that resolves erroneous body-environment inter-penetrations. Although the proposed method requires less structured inputs compared to existing scene-aware monocular MoCap algorithms, it produces more physically-plausible poses: HULC significantly and consistently outperforms the existing approaches in various experiments and on different metrics. Project page: https://vcai.mpi-inf.mpg.de/projects/HULC/.
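
The abstract describes the inter-penetration fix only at a high level. As a rough illustration of the general idea (sample around a pose in a low-dimensional pose space and keep a sample whose decoded pose no longer intersects the scene), here is a toy sketch. Everything in it is an assumption made for illustration: the linear `decode` function stands in for a learned pose decoder, the scene is a flat floor at z = 0, and the names `penetration_depth` and `resolve_by_sampling` are hypothetical. It should not be read as HULC's actual algorithm.

```python
import random

# Hypothetical stand-in for a learned pose decoder: maps a 2-D latent code
# to three "joints" (x, y, z). A real system would use a neural network.
def decode(latent):
    a, b = latent
    return [
        (0.0, 0.0, 1.0 + 0.1 * a),  # head
        (0.0, 0.0, 0.5 + 0.1 * b),  # hip
        (0.1 * a, 0.0, b),          # foot; its height is driven directly by b
    ]

def penetration_depth(joints, floor_z=0.0):
    """Total depth by which joints sink below a flat floor (assumed scene)."""
    return sum(max(0.0, floor_z - z) for _, _, z in joints)

def resolve_by_sampling(latent, n_samples=200, sigma=0.2, seed=0):
    """Draw Gaussian samples around `latent` and return the penetration-free
    candidate closest to it; if none exists, return the candidate with the
    smallest penetration depth."""
    rng = random.Random(seed)
    candidates = [tuple(latent)] + [
        tuple(v + rng.gauss(0.0, sigma) for v in latent)
        for _ in range(n_samples)
    ]

    def squared_distance(candidate):
        return sum((c - l) ** 2 for c, l in zip(candidate, latent))

    valid = [c for c in candidates if penetration_depth(decode(c)) == 0.0]
    if valid:
        return min(valid, key=squared_distance)
    return min(candidates, key=lambda c: penetration_depth(decode(c)))

# Usage: the initial pose has its foot 0.1 units below the floor.
start = (0.0, -0.1)
fixed = resolve_by_sampling(start)
print(penetration_depth(decode(start)), penetration_depth(decode(fixed)))
```

Choosing the valid sample closest to the starting latent code keeps the corrected pose near the original estimate; a pose that is already penetration-free is returned unchanged because the starting code is itself one of the candidates.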

Cited By

  • (2023) Decaf: Monocular Deformation Capture for Face and Hand Interactions. ACM Transactions on Graphics 42:6 (1-16). DOI: 10.1145/3618329. Online publication date: 5-Dec-2023
  • (2023) GroundLink: A Dataset Unifying Human Body Movement and Ground Reaction Dynamics. SIGGRAPH Asia 2023 Conference Papers (1-10). DOI: 10.1145/3610548.3618247. Online publication date: 10-Dec-2023

Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII
Oct 2022
827 pages
ISBN: 978-3-031-20046-5
DOI: 10.1007/978-3-031-20047-2

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. 3D Human MoCap
2. Dense contact estimations
3. Sampling
