Expressive Whole-Body 3D Gaussian Avatar

Published: 17 November 2024

Abstract

Facial expressions and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most 3D human avatars modeled from a casually captured video support only body motions, without facial expressions or hand motions. In this work, we present ExAvatar, an expressive whole-body 3D human avatar learned from a short monocular video. We design ExAvatar as a combination of the whole-body parametric mesh model SMPL-X and 3D Gaussian Splatting (3DGS). The main challenges are 1) the limited diversity of facial expressions and poses in the video and 2) the absence of 3D observations, such as 3D scans and RGBD images. The limited diversity makes animation with novel facial expressions and poses non-trivial, and the absence of 3D observations causes significant ambiguity in human parts that are not observed in the video, which can result in noticeable artifacts under novel motions. To address these challenges, we introduce a hybrid representation of the mesh and 3D Gaussians: each 3D Gaussian is treated as a vertex on the surface, with pre-defined connectivity information (i.e., triangle faces) between Gaussians following the mesh topology of SMPL-X. This makes ExAvatar animatable with novel facial expressions, driven by the facial expression space of SMPL-X. In addition, connectivity-based regularizers significantly reduce artifacts under novel facial expressions and poses.
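
To make the connectivity-based regularization concrete, the sketch below shows one plausible form of such a term in PyTorch: a smoothness loss that penalizes differences of per-Gaussian attributes between Gaussians joined by an SMPL-X mesh edge. This is a minimal illustration under our own assumptions; the function names (faces_to_edges, connectivity_regularizer) are hypothetical, and the actual ExAvatar losses may differ.

```python
import torch

def faces_to_edges(faces: torch.Tensor) -> torch.Tensor:
    # faces: (F, 3) long tensor of triangle vertex indices (e.g. SMPL-X faces).
    # Each triangle (a, b, c) contributes undirected edges (a,b), (b,c), (c,a).
    e = torch.cat([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]], dim=0)
    e = torch.sort(e, dim=1).values   # canonical (min, max) vertex order
    return torch.unique(e, dim=0)     # (E, 2) unique undirected edges

def connectivity_regularizer(attrs: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
    # attrs: (V, C) per-Gaussian parameters (e.g. position offsets or scales),
    #        one row per mesh vertex, since each Gaussian sits on a vertex.
    # edges: (E, 2) output of faces_to_edges.
    diff = attrs[edges[:, 0]] - attrs[edges[:, 1]]
    return diff.pow(2).sum(dim=-1).mean()  # mean squared difference across edges
```

In training, a term like this would be added, with a weight, to the photometric rendering loss, so that body parts never observed in the video (e.g., the back of the head) inherit plausible Gaussian parameters from their connected, observed neighbors.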



Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part XLI
September 2024, 585 pages
ISBN: 978-3-031-72939-3
DOI: 10.1007/978-3-031-72940-9
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg
