Abstract
Coordinate-based volumetric representations have the potential to generate photo-realistic virtual avatars from images. However, virtual avatars need to be controllable and be rendered in novel poses that may not have been observed. Traditional techniques, such as LBS, provide such a controlling function; yet it usually requires a hand-designed body template, 3D scan data, and surface-based appearance models. On the other hand, neural representations have been shown to be powerful in representing visual details, but are under-explored in dynamic and articulated settings. In this paper, we propose TAVA, a method to create Template-free Animatable Volumetric Actors, based on neural representations. We rely solely on multi-view data and a tracked skeleton to create a volumetric model of an actor, which can be animated at test time given novel poses. Since TAVA does not require a body template, it is applicable to humans as well as other creatures such as animals. Furthermore, TAVA is designed such that it can recover accurate dense correspondences, making it amenable to content-creation and editing tasks. Through extensive experiments, we demonstrate that the proposed method generalizes well to novel poses as well as unseen views and showcase basic editing capabilities. The code is available at https://github.com/facebookresearch/tava.
Work done partially while Ruilong and Julian were at Meta Reality Labs Research.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
For NARF, our re-implementation achieves better performance than it’s official implementation. Please refer to the supplmental material for further details.
- 2.
Pose-NeRF, A-NeRF and NARF all query the color and density of \({(\textbf{x}_v, \textbf{P})}\) in a higher dimensional (\(>3\)) space, where we do the nearest neighbor matching for them using our approach as described in Sect. 3.5. Please refer to the supp.mat. for further details.
References
Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: Scape: shape completion and animation of people. In: ACM SIGGRAPH 2005 Papers, pp. 408–416 (2005)
Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: a multiscale representation for anti-aliasing neural radiance fields. In: International Conference on Computer Vision (2021)
Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: CVPR (2022)
Borshukov, G., Piponi, D., Larsen, O., Lewis, J.P., Tempelaar-Lietz, C.: Universal capture-image-based facial animation for “the matrix reloaded”. In: SIGGRAPH 2005 Courses (2005)
Carranza, J., Theobalt, C., Magnor, M.A., Seidel, H.P.: Free-viewpoint video of human actors. Trans. Graphics 22, 569–577 (2003)
Casas, D., Volino, M., Collomosse, J., Hilton, A.: 4D video textures for interactive character appearance. In: Computer Graphics Forum (2014)
Chen, X., Zheng, Y., Black, M.J., Hilliges, O., Geiger, A.: Snarf: Differentiable forward skinning for animating non-rigid neural implicit shapes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 11594–11604 (2021)
Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Conference on Computer Vision and Pattern Recognition (2019)
Collet, A., et al.: High-quality streamable free-viewpoint video. Trans. Graphics 34, 1–13 (2015)
De Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H.P., Thrun, S.: Performance capture from sparse multi-view video. In: ACM SIGGRAPH 2008 papers, pp. 1–10 (2008)
Deng, B., et al.: NASA neural articulated shape approximation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 612–628. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_36
Guo, K., et al.: The relightables: volumetric performance capture of humans with realistic relighting. ACM Trans. Graphics (ToG) 38(6), 1–19 (2019)
Hasler, N., Thormählen, T., Rosenhahn, B., Seidel, H.P.: Learning skeletons for shape and pose. In: SIGGRAPH Symposium on Interactive 3D Graphics and Games (2010)
Huang, Z., Xu, Y., Lassner, C., Li, H., Tung, T.: Arch: animatable reconstruction of clothed humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3093–3102 (2020)
James, D.L., Twigg, C.D.: Skinning mesh animations. Trans. Graphics 24, 399–407 (2005)
Jiang, B., Zhang, J., Cai, J., Zheng, J.: Disentangled human body embedding based on deep hierarchical neural network. Trans. Visual. Comput. Graphics 26, 2560–2575 (2020)
Li, H., et al.: Temporally coherent completion of dynamic shapes. ACM Trans. Graphics (TOG) 31(1), 1–11 (2012)
Li, K., et al.: SPA: sparse photorealistic animation using a single RGB-D camera. Trans. Circuits Syst. Video Technol. 27, 771–783 (2016)
Li, R., et al.: Learning formation of physically-based face attributes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3410–3419 (2020)
Li, R., Xiu, Y., Saito, S., Huang, Z., Olszewski, K., Li, H.: Monocular real-time volumetric performance capture. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 49–67. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_4
Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Conference on Computer Vision and Pattern Recognition (2021)
Liu, L., Habermann, M., Rudnev, V., Sarkar, K., Gu, J., Theobalt, C.: Neural actor: neural free-view synthesis of human actors with pose control. ACM Trans. Graphics (TOG) 40(6), 1–16 (2021)
Liu, S., Li, T., Chen, W., Li, H.: A general differentiable mesh renderer for image-based 3D reasoning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 50–62 (2020)
Lombardi, S., Simon, T., Saragih, J., Schwartz, G., Lehrmann, A., Sheikh, Y.: Neural volumes: Learning dynamic renderable volumes from images. ACM Trans. Graph. 38(4), 65:1-65:14 (2019)
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. Trans. Graphics 34, 1–16 (2015)
Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.: Occupancy networks: learning 3D reconstruction in function space. In: Conference on Computer Vision and Pattern Recognition (2019)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 405–421. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_24
Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Occupancy flow: 4D reconstruction by learning particle dynamics. In: International Conference on Computer Vision (2019)
Noguchi, A., Sun, X., Lin, S., Harada, T.: Neural articulated radiance field. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5762–5772 (2021)
Osman, A.A.A., Bolkart, T., Black, M.J.: STAR: sparse trained articulated human body regressor. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12351, pp. 598–613. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58539-6_36
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: learning continuous signed distance functions for shape representation. In: Conference on Computer Vision and Pattern Recognition (2019)
Park, K., et al.: Nerfies: deformable neural radiance fields. In: International Conference on Computer Vision (2021)
Park, K., et al.: Hypernerf: a higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph. 40(6) (2021)
Peng, S., et al.: Animatable neural radiance fields for modeling dynamic human bodies. In: International Conference on Computer Vision (2021)
Peng, S., et al.: Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9054–9063 (2021)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-nerf: neural radiance fields for dynamic scenes. In: Conference on Computer Vision and Pattern Recognition (2021)
Raj, A., Tanke, J., Hays, J., Vo, M., Stoll, C., Lassner, C.: ANR: articulated neural rendering for virtual avatars. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3722–3731 (2021)
Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: PIFU: pixel-aligned implicit function for high-resolution clothed human digitization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2304–2314 (2019)
Saito, S., Yang, J., Ma, Q., Black, M.J.: Scanimate: weakly supervised learning of skinned clothed avatar networks. In: Conference on Computer Vision and Pattern Recognition (2021)
Sitzmann, V., Zollhöfer, M., Wetzstein, G.: Scene representation networks: continuous 3d-structure-aware neural scene representations. In: Advances in Neural Information Processing Systems (2019)
Starck, J., Hilton, A.: Surface capture for performance-based animation. IEEE Comput. Graphics Appl. 27(3), 21–31 (2007)
Su, S.Y., Yu, F., Zollhöfer, M., Rhodin, H.: A-nerf: articulated neural radiance fields for learning human shape, appearance, and pose. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Tretschk, E., Tewari, A., Golyanik, V., Zollhöfer, M., Lassner, C., Theobalt, C.: Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In: International Conference on Computer Vision. IEEE (2021)
Volino, M., Casas, D., Collomosse, J.P., Hilton, A.: Optimal representation of multi-view video. In: British Machine Vision Conference (2014)
Weng, C.Y., Curless, B., Srinivasan, P.P., Barron, J.T., Kemelmacher-Shlizerman, I.: Humannerf: free-viewpoint rendering of moving people from monocular video. In: CVPR (2022)
Xu, F., et al.: Video-based characters: creating new human performances from a multi-view video database. In: ACM SIGGRAPH 2011 papers (2011)
Xu, Z., Zhou, Y., Kalogerakis, E., Landreth, C., Singh, K.: RigNet: neural rigging for articulated characters. Trans. Graphics (2020)
Yu, A., Fridovich-Keil, S., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: radiance fields without neural networks. In: Conference on Computer Vision and Pattern Recognition (2022)
Yu, A., Li, R., Tancik, M., Li, H., Ng, R., Kanazawa, A.: Plenoctrees for real-time rendering of neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5752–5761 (2021)
Zhi, T., Lassner, C., Tung, T., Stoll, C., Narasimhan, S.G., Vo, M.: TexMesh: reconstructing detailed human texture and geometry from RGB-D video. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 492–509. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_29
Zhou, K., Bhatnagar, B.L., Pons-Moll, G.: Unsupervised shape and pose disentanglement for 3D meshes. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 341–357. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_21
Acknowledgements
Ruilong Li’s work at UC Berkeley is partly supported by the CONIX Research Center, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, R. et al. (2022). TAVA: Template-free Animatable Volumetric Actors. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13692. Springer, Cham. https://doi.org/10.1007/978-3-031-19824-3_25
Download citation
DOI: https://doi.org/10.1007/978-3-031-19824-3_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19823-6
Online ISBN: 978-3-031-19824-3
eBook Packages: Computer ScienceComputer Science (R0)