TAVA: Template-free Animatable Volumetric Actors

Li, Ruilong; Tanke, Julian; Vo, Minh; Zollhöfer, Michael; Gall, Jürgen; Kanazawa, Angjoo; Lassner, Christoph

doi:10.1007/978-3-031-19824-3_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13692)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Coordinate-based volumetric representations have the potential to generate photo-realistic virtual avatars from images. However, virtual avatars need to be controllable and renderable in novel poses that may not have been observed. Traditional techniques, such as linear blend skinning (LBS), provide such a controlling function, yet they usually require a hand-designed body template, 3D scan data, and surface-based appearance models. On the other hand, neural representations have been shown to be powerful in representing visual details, but are under-explored in dynamic and articulated settings. In this paper, we propose TAVA, a method to create Template-free Animatable Volumetric Actors, based on neural representations. We rely solely on multi-view data and a tracked skeleton to create a volumetric model of an actor, which can be animated at test time given novel poses. Since TAVA does not require a body template, it is applicable to humans as well as other creatures such as animals. Furthermore, TAVA is designed such that it can recover accurate dense correspondences, making it amenable to content-creation and editing tasks. Through extensive experiments, we demonstrate that the proposed method generalizes well to novel poses as well as unseen views, and we showcase basic editing capabilities. The code is available at https://github.com/facebookresearch/tava.
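For reference, the abstract contrasts TAVA with linear blend skinning (LBS). The sketch below shows the textbook LBS formulation, in which every point is moved by a weighted blend of per-bone rigid transforms. It is not TAVA's method; the function name `linear_blend_skinning`, the array shapes, and the toy two-bone rig are illustrative assumptions.

```python
import numpy as np


def linear_blend_skinning(points, weights, bone_transforms):
    """Deform rest-pose points with standard linear blend skinning (LBS).

    points:          (N, 3) rest-pose positions
    weights:         (N, B) skinning weights; each row should sum to 1
    bone_transforms: (B, 4, 4) rigid bone transforms, rest pose -> target pose
    returns:         (N, 3) posed positions
    """
    n = points.shape[0]
    # Lift points to homogeneous coordinates: (N, 4).
    homo = np.concatenate([points, np.ones((n, 1))], axis=1)
    # Blend the per-bone matrices with each point's weights: (N, 4, 4).
    blended = np.einsum("nb,bij->nij", weights, bone_transforms)
    # Apply each point's blended matrix, then drop the homogeneous coordinate.
    posed = np.einsum("nij,nj->ni", blended, homo)
    return posed[:, :3]


if __name__ == "__main__":
    # Hypothetical toy rig: bone 0 stays put, bone 1 translates by +1 along x.
    transforms = np.stack([np.eye(4), np.eye(4)])
    transforms[1, 0, 3] = 1.0
    pts = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
    w = np.array([[1.0, 0.0], [0.5, 0.5]])
    # The first point follows bone 0 only; the second moves halfway (+0.5 in x).
    print(linear_blend_skinning(pts, w, transforms))
```

Because the matrices are averaged before being applied, the blended transform is generally not rigid, which is the source of the well-known LBS artifacts near joints. LBS also presupposes skinning weights defined on some template, which is the dependency the abstract says TAVA avoids.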

Work done partially while Ruilong and Julian were at Meta Reality Labs Research.

Notes

1. For NARF, our re-implementation achieves better performance than its official implementation. Please refer to the supplementary material for further details.
2. Pose-NeRF, A-NeRF and NARF all query the color and density of $(\mathbf{x}_v, \mathbf{P})$ in a higher-dimensional ($>3$) space, so for these methods we perform nearest-neighbor matching using our approach as described in Sect. 3.5. Please refer to the supplementary material for further details.
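Note 2 refers to nearest-neighbor matching in a space with more than three dimensions. The approach of Sect. 3.5 is not reproduced on this page, so the sketch below only illustrates the generic operation: a brute-force Euclidean nearest-neighbor search over D-dimensional vectors. The function name `nearest_neighbor_match`, the Euclidean metric, and the toy data are assumptions, not the paper's procedure.

```python
import numpy as np


def nearest_neighbor_match(queries, references):
    """Brute-force Euclidean nearest-neighbor matching.

    queries:    (M, D) array of query vectors (D may be larger than 3)
    references: (K, D) array of reference vectors
    returns:    (M,) array of indices into `references`
    """
    # Pairwise squared distances via broadcasting: (M, K).
    diff = queries[:, None, :] - references[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # The column with the smallest distance in each row is that query's match.
    return np.argmin(sq_dist, axis=1)


if __name__ == "__main__":
    refs = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
    qs = np.array([[0.1, 0.0, 0.0, 0.0], [0.9, 1.0, 1.0, 1.2]])
    print(nearest_neighbor_match(qs, refs))  # first query -> 0, second -> 1
```

The brute-force version stores an M x K distance matrix; for large point sets, a spatial index such as a k-d tree is the usual replacement.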

References

Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. In: ACM SIGGRAPH 2005 Papers, pp. 408–416 (2005)
Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields. In: International Conference on Computer Vision (2021)
Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-NeRF 360: unbounded anti-aliased neural radiance fields. In: CVPR (2022)
Borshukov, G., Piponi, D., Larsen, O., Lewis, J.P., Tempelaar-Lietz, C.: Universal capture - image-based facial animation for “The Matrix Reloaded”. In: SIGGRAPH 2005 Courses (2005)
Carranza, J., Theobalt, C., Magnor, M.A., Seidel, H.P.: Free-viewpoint video of human actors. Trans. Graphics 22, 569–577 (2003)
Casas, D., Volino, M., Collomosse, J., Hilton, A.: 4D video textures for interactive character appearance. In: Computer Graphics Forum (2014)
Chen, X., Zheng, Y., Black, M.J., Hilliges, O., Geiger, A.: SNARF: differentiable forward skinning for animating non-rigid neural implicit shapes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11594–11604 (2021)
Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Conference on Computer Vision and Pattern Recognition (2019)
Collet, A., et al.: High-quality streamable free-viewpoint video. Trans. Graphics 34, 1–13 (2015)
De Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H.P., Thrun, S.: Performance capture from sparse multi-view video. In: ACM SIGGRAPH 2008 papers, pp. 1–10 (2008)
Deng, B., et al.: NASA: neural articulated shape approximation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 612–628. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_36
Guo, K., et al.: The relightables: volumetric performance capture of humans with realistic relighting. ACM Trans. Graphics (ToG) 38(6), 1–19 (2019)
Hasler, N., Thormählen, T., Rosenhahn, B., Seidel, H.P.: Learning skeletons for shape and pose. In: SIGGRAPH Symposium on Interactive 3D Graphics and Games (2010)
Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: ARCH: animatable reconstruction of clothed humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3093–3102 (2020)
James, D.L., Twigg, C.D.: Skinning mesh animations. Trans. Graphics 24, 399–407 (2005)
Jiang, B., Zhang, J., Cai, J., Zheng, J.: Disentangled human body embedding based on deep hierarchical neural network. Trans. Visual. Comput. Graphics 26, 2560–2575 (2020)
Li, H., et al.: Temporally coherent completion of dynamic shapes. ACM Trans. Graphics (TOG) 31(1), 1–11 (2012)
Li, K., et al.: SPA: sparse photorealistic animation using a single RGB-D camera. Trans. Circuits Syst. Video Technol. 27, 771–783 (2016)
Li, R., et al.: Learning formation of physically-based face attributes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3410–3419 (2020)
Li, R., Xiu, Y., Saito, S., Huang, Z., Olszewski, K., Li, H.: Monocular real-time volumetric performance capture. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 49–67. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_4
Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Conference on Computer Vision and Pattern Recognition (2021)
Liu, L., Habermann, M., Rudnev, V., Sarkar, K., Gu, J., Theobalt, C.: Neural actor: neural free-view synthesis of human actors with pose control. ACM Trans. Graphics (TOG) 40(6), 1–16 (2021)
Liu, S., Li, T., Chen, W., Li, H.: A general differentiable mesh renderer for image-based 3D reasoning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 50–62 (2020)
Lombardi, S., Simon, T., Saragih, J., Schwartz, G., Lehrmann, A., Sheikh, Y.: Neural volumes: Learning dynamic renderable volumes from images. ACM Trans. Graph. 38(4), 65:1-65:14 (2019)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. Trans. Graphics 34, 1–16 (2015)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Conference on Computer Vision and Pattern Recognition (2019)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 405–421. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_24
Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Occupancy flow: 4D reconstruction by learning particle dynamics. In: International Conference on Computer Vision (2019)
Noguchi, A., Sun, X., Lin, S., Harada, T.: Neural articulated radiance field. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5762–5772 (2021)
Osman, A.A.A., Bolkart, T., Black, M.J.: STAR: sparse trained articulated human body regressor. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 598–613. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_36
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Conference on Computer Vision and Pattern Recognition (2019)
Park, K., et al.: Nerfies: deformable neural radiance fields. In: International Conference on Computer Vision (2021)
Park, K., et al.: HyperNeRF: a higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph. 40(6) (2021)
Peng, S., et al.: Animatable neural radiance fields for modeling dynamic human bodies. In: International Conference on Computer Vision (2021)
Peng, S., et al.: Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9054–9063 (2021)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: neural radiance fields for dynamic scenes. In: Conference on Computer Vision and Pattern Recognition (2021)
Raj, A., Tanke, J., Hays, J., Vo, M., Stoll, C., Lassner, C.: ANR: articulated neural rendering for virtual avatars. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3722–3731 (2021)
Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2304–2314 (2019)
Saito, S., Yang, J., Ma, Q., Black, M.J.: SCANimate: weakly supervised learning of skinned clothed avatar networks. In: Conference on Computer Vision and Pattern Recognition (2021)
Sitzmann, V., Zollhöfer, M., Wetzstein, G.: Scene representation networks: continuous 3D-structure-aware neural scene representations. In: Advances in Neural Information Processing Systems (2019)
Starck, J., Hilton, A.: Surface capture for performance-based animation. IEEE Comput. Graphics Appl. 27(3), 21–31 (2007)
Su, S.Y., Yu, F., Zollhöfer, M., Rhodin, H.: A-NeRF: articulated neural radiance fields for learning human shape, appearance, and pose. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Tretschk, E., Tewari, A., Golyanik, V., Zollhöfer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In: International Conference on Computer Vision. IEEE (2021)
Volino, M., Casas, D., Collomosse, J.P., Hilton, A.: Optimal representation of multi-view video. In: British Machine Vision Conference (2014)
Weng, C.Y., Curless, B., Srinivasan, P.P., Barron, J.T., Kemelmacher-Shlizerman, I.: HumanNeRF: free-viewpoint rendering of moving people from monocular video. In: CVPR (2022)
Xu, F., et al.: Video-based characters: creating new human performances from a multi-view video database. In: ACM SIGGRAPH 2011 papers (2011)
Xu, Z., Zhou, Y., Kalogerakis, E., Landreth, C., Singh, K.: RigNet: neural rigging for articulated characters. Trans. Graphics (2020)
Yu, A., Fridovich-Keil, S., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: radiance fields without neural networks. In: Conference on Computer Vision and Pattern Recognition (2022)
Yu, A., Li, R., Tancik, M., Li, H., Ng, R., Kanazawa, A.: PlenOctrees for real-time rendering of neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5752–5761 (2021)
Zhi, T., Lassner, C., Tung, T., Stoll, C., Narasimhan, S.G., Vo, M.: TexMesh: reconstructing detailed human texture and geometry from RGB-D video. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 492–509. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_29
Zhou, K., Bhatnagar, B.L., Pons-Moll, G.: Unsupervised shape and pose disentanglement for 3D meshes. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 341–357. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_21

Acknowledgements

Ruilong Li’s work at UC Berkeley is partly supported by the CONIX Research Center, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Author information

Authors and Affiliations

UC Berkeley, Berkeley, USA
Ruilong Li & Angjoo Kanazawa
University of Bonn, Bonn, Germany
Julian Tanke & Jürgen Gall
Meta Reality Labs Research, Sausalito, USA
Ruilong Li, Julian Tanke, Minh Vo, Michael Zollhöfer & Christoph Lassner

Corresponding author

Correspondence to Ruilong Li.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Supplementary material 1 (pdf 1806 KB)

About this paper

Cite this paper

Li, R. et al. (2022). TAVA: Template-free Animatable Volumetric Actors. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_25

DOI: https://doi.org/10.1007/978-3-031-19824-3_25
Published: 11 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer Science, Computer Science (R0)