Learning to disentangle latent physical factors of deformable faces

Ha, Inwoo; Chang, Hyun Sung; Son, Minjung; Yoon, Sung-eui

doi:10.1007/s00371-023-02948-1

Original article
Published: 20 July 2023

Volume 39, pages 3481–3494, (2023)
The Visual Computer

Inwoo Ha¹,², Hyun Sung Chang¹, Minjung Son¹ & Sung-eui Yoon²

Abstract

We propose a monocular image disentanglement framework based on a compositional model. Our model disentangles an input image into its constituent components: albedo, depth, deformation, pose, and illumination. Instead of relying on handcrafted priors, we train a deep neural network to understand the physical meaning of each component by mimicking real-world operations, which allows it to reconstruct images in a self-supervised manner. Trained on multi-frame images of each subject, the model demonstrates a better understanding of the objects without requiring any supervision or strong model assumptions. We use a deformation-free canonical space to align the multi-frame images, so that information from different frames can be interpreted in a common space. Our experiments show that the approach accurately disentangles the physical elements of deformable faces from images with the wide variations found in the wild.
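To make the idea of a compositional, self-supervised reconstruction concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the shading model, so the Lambertian model with an ambient and a diffuse term, the finite-difference normals, the L1 loss, and all function names (`normals_from_depth`, `render`, `reconstruction_loss`) are assumptions made for illustration. Deformation and pose are omitted; only albedo, depth, and illumination are composed.

```python
import math


def normals_from_depth(depth):
    """Per-pixel unit normals from a depth (height) map via finite differences.

    Assumption: depth is a height field, so the normal is (-dz/dx, -dz/dy, 1).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clamp neighbours at the border (one-sided difference there).
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            n = (-dzdx, -dzdy, 1.0)
            length = math.sqrt(sum(c * c for c in n))
            row.append(tuple(c / length for c in n))
        normals.append(row)
    return normals


def render(albedo, depth, light_dir, ambient=0.2, diffuse=0.8):
    """Compose image = albedo * (ambient + diffuse * max(0, n . l))."""
    length = math.sqrt(sum(c * c for c in light_dir))
    light = tuple(c / length for c in light_dir)
    normals = normals_from_depth(depth)
    image = []
    for albedo_row, normal_row in zip(albedo, normals):
        row = []
        for a, n in zip(albedo_row, normal_row):
            cosine = max(0.0, sum(nc * lc for nc, lc in zip(n, light)))
            row.append(a * (ambient + diffuse * cosine))
        image.append(row)
    return image


def reconstruction_loss(rendered, target):
    """Mean absolute (L1) difference between rendered and observed images."""
    diffs = [abs(r - t)
             for r_row, t_row in zip(rendered, target)
             for r, t in zip(r_row, t_row)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    albedo = [[0.5, 0.5, 0.5] for _ in range(3)]
    tilted = [[float(x) for x in range(3)] for _ in range(3)]  # plane z = x
    image = render(albedo, tilted, light_dir=(0.0, 0.0, 1.0))
    print(reconstruction_loss(image, albedo))
```

In a self-supervised setting of this kind, the predicted factors would be produced by a network and the reconstruction loss would be the training signal; here the factors are supplied by hand purely to show how they compose into an image.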

Data availability

The data that support the findings of this study are openly available: VoxCeleb2 [9] at www.robots.ox.ac.uk/~vgg/data/voxceleb/vox2.html and the Basel Face Model [37] at https://faces.dmi.unibas.ch/bfm.

References

Abrevaya, V.F., Boukhayma, A., Torr, P.H., Boyer, E.: Cross-modal deep face normals with deactivable skip connections. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4979–4989 (2020)
Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. In: International Conference on Computer Vision (2015)
Barron, J.T., Malik, J.: Shape, illumination, and reflectance from shading. IEEE Trans. Pattern Anal. Mach. Intell. 37(8), 1670–1687 (2015)
Barrow, H.: Recovering intrinsic scene characteristics from images. Comput. Vis. Syst. pp. 3–26 (1978)
Bell, S., Bala, K., Snavely, N.: Intrinsic images in the wild. ACM Trans. Graph. (2014). https://doi.org/10.1145/2601097.2601206
Blanz, V., Basso, C., Poggio, T., Vetter, T.: Reanimating faces in images and video. Comput. Graph. Forum (2003). https://doi.org/10.1111/1467-8659.t01-1-00712
Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: Annual Conference on Computer Graphics and Interactive Techniques (Proc. SIGGRAPH 1999), pp. 187–194 (1999)
Burkov, E., Pasechnik, I., Grigorev, A., Lempitsky, V.: Neural head reenactment with latent pose descriptors. In: IEEE Conference on Computer Vision and Pattern Recognition (2020)
Chung, J.S., Nagrani, A., Zisserman, A.: VoxCeleb2: deep speaker recognition. In: INTERSPEECH (2018)
Daněček, R., Black, M.J., Bolkart, T.: Emoca: Emotion driven monocular face capture and animation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20,311–20,322 (2022)
Fan, H., Su, H., Guibas, L.J.: A point set generation network for 3D object reconstruction from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 605–613 (2017)
Fan, Q., Yang, J., Hua, G., Chen, B., Wipf, D.: Revisiting deep intrinsic image decompositions. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8944–8952 (2018). https://doi.org/10.1109/CVPR.2018.00932
Feng, Y., Feng, H., Black, M.J., Bolkart, T.: Learning an animatable detailed 3D face model from in-the-wild images. ACM Trans. Graph. (ToG) 40(4), 1–13 (2021)
Geiger, A., Ziegler, J., Stiller, C.: StereoScan: dense 3D reconstruction in real-time. In: IEEE Intelligent Vehicles Symposium (IV), pp. 963–968 (2011)
Georgoulis, S., Rematas, K., Ritschel, T., Gavves, E., Fritz, M., Van Gool, L., Tuytelaars, T.: Reflectance and natural illumination from single-material specular objects using deep learning. IEEE Trans. Pattern Anal. Mach. Intell. 40(8), 1932–1947 (2018). https://doi.org/10.1109/TPAMI.2017.2742999
Goel, S., Kanazawa, A., Malik, J.: Shape and viewpoint without keypoints. In: European Conference on Computer Vision (2020)
Henderson, P., Ferrari, V.: Learning to generate and reconstruct 3D meshes with only 2D supervision. arXiv preprint arXiv:1807.09259 (2018)
Horn, B.K.P.: Obtaining shape from shading information. In: Winston, P.H. (ed.) The Psychology of Computer Vision. McGraw-Hill (1975)
Insafutdinov, E., Dosovitskiy, A.: Unsupervised learning of shape and pose with differentiable point clouds. In: Advances in Neural Information Processing Systems (2018)
Kanazawa, A., Tulsiani, S., Efros, A.A., Malik, J.: Learning category-specific mesh reconstruction from image collections. In: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (eds.) Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XV, Lecture Notes in Computer Science, vol. 11219, pp. 386–402. Springer (2018). https://doi.org/10.1007/978-3-030-01267-0_23
Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
Kim, H., Garrido, P., Tewari, A., Xu, W., Thies, J., Niessner, M., Pérez, P., Richardt, C., Zollhöfer, M., Theobalt, C.: Deep video portraits. ACM Trans. Graph. (Proc. SIGGRAPH 2018) 37(4), 1–14 (2018)
Kim, H., Zollhöfer, M., Tewari, A., Thies, J., Richardt, C., Theobalt, C.: Inversefacenet: Deep monocular inverse face rendering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Kovacs, B., Bell, S., Snavely, N., Bala, K.: Shading annotations in the wild. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 850–859 (2017). https://doi.org/10.1109/CVPR.2017.97
Liu, F., Liu, X.: 2D GANs meet unsupervised single-view 3D reconstruction. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I, pp. 497–514. Springer (2022)
Liu, S., Li, T., Chen, W., Li, H.: Soft rasterizer: a differentiable renderer for image-based 3D reasoning. In: The IEEE International Conference on Computer Vision (ICCV) (2019)
Lombardi, S., Nishino, K.: Reflectance and illumination recovery in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 129–141 (2016). https://doi.org/10.1109/TPAMI.2015.2430318
Meka, A., Haene, C., Pandey, R., Zollhoefer, M., Fanello, S., Fyffe, G., Kowdle, A., Yu, X., Busch, J., Dourgarian, J., Denny, P., Bouaziz, S., Lincoln, P., Whalen, M., Harvey, G., Taylor, J., Izadi, S., Tagliasacchi, A., Debevec, P., Theobalt, C., Valentin, J., Rhemann, C.: Deep reflectance fields—high-quality facial reflectance field inference from color gradient illumination. ACM Trans. Graph. (Proceedings SIGGRAPH) 38(4), 1–12 (2019). https://doi.org/10.1145/3306346.3323027
Meka, A., Maximov, M., Zollhoefer, M., Chatterjee, A., Seidel, H.P., Richardt, C., Theobalt, C.: Lime: Live intrinsic material estimation. In: Proceedings of Computer Vision and Pattern Recognition (CVPR) (2018). http://gvv.mpi-inf.mpg.de/projects/LIME/
Mobahi, H., Liu, C., Freeman, W.T.: A compositional model for low-dimensional image set representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
Nestmeyer, T., Lalonde, J.F., Matthews, I., Lehrmann, A.: Learning physics-guided face relighting under directional light. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Novotny, D., Larlus, D., Vedaldi, A.: Learning 3D object categories by looking around them. In: International Conference on Computer Vision (2017)
Ondrúška, P., Kohli, P., Izadi, S.: MobileFusion: real-time volumetric surface reconstruction and dense tracking on mobile phones. IEEE Trans. Vis. Comput. Graph. 21(11), 1251–1258 (2015)
Pan, X., Dai, B., Liu, Z., Loy, C.C., Luo, P.: Do 2D GANs know 3D shape? Unsupervised 3D shape reconstruction from 2D image GANs. In: International Conference on Learning Representations (2021)
Pan, X., Dai, B., Liu, Z., Loy, C.C., Luo, P.: Do 2D GANs know 3D shape? Unsupervised 3D shape reconstruction from 2D image GANs. In: International Conference on Learning Representations (2021)
Paysan, P., Knothe, R., Amberg, B., Romdhani, S., Vetter, T.: A 3D face model for pose and illumination invariant face recognition. In: 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 296–301. IEEE (2009)
Ramamoorthi, R., Hanrahan, P.: An efficient representation for irradiance environment maps. ACM Trans. Graph. (Proc. SIGGRAPH 2001) 20(3), 497–500 (2001)
Sengupta, S., Kanazawa, A., Castillo, C.D., Jacobs, D.W.: SfSNet: Learning shape, reflectance and illuminance of faces ‘in the wild’. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6296–6305 (2018)
Shang, J., Shen, T., Li, S., Zhou, L., Zhen, M., Fang, T., Quan, L.: Self-supervised monocular 3D face reconstruction by occlusion-aware multi-view geometry consistency. arXiv preprint arXiv:2007.12494 (2020)
Shu, Z., Sahasrabudhe, M., Güler, R.A., Samaras, D., Paragios, N., Kokkinos, I.: Deforming autoencoders: unsupervised disentangling of shape and appearance. In: Proceedings of the European conference on computer vision, pp. 650–665 (2018)
Shu, Z., Yumer, E., Hadap, S., Sunkavalli, K., Shechtman, E., Samaras, D.: Neural face editing with intrinsic image disentangling. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, pp. 5444–5453. IEEE Computer Society (2017). https://doi.org/10.1109/CVPR.2017.578
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
Sun, T., Barron, J.T., Tsai, Y.T., Xu, Z., Yu, X., Fyffe, G., Rhemann, C., Busch, J., Debevec, P., Ramamoorthi, R.: Single image portrait relighting. ACM Trans. Graph. (2019). https://doi.org/10.1145/3306346.3323008
Tewari, A., Bernard, F., Garrido, P., Bharaj, G., Elgharib, M., Seidel, H.P., Pérez, P., Zollhöfer, M., Theobalt, C.: Fml: Face model learning from videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10,812–10,822 (2019)
Tewari, A., Zollhofer, M., Kim, H., Garrido, P., Bernard, F., Perez, P., Theobalt, C.: Mofa: model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops (2017)
Tran, L., Liu, X.: Nonlinear 3D face morphable model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Tran, L., Liu, X.: Nonlinear 3D face morphable model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Tran, L., Liu, X.: On learning 3D face morphable model from in-the-wild images. IEEE Trans. Pattern Anal. Mach. Intell. 43, 157–171 (2019)
Tran, L., Liu, X.: On learning 3D face morphable model from in-the-wild images. IEEE Trans. Pattern Anal. Mach. Intell. 43(1), 157–171 (2021). https://doi.org/10.1109/TPAMI.2019.2927975
Tulsiani, S., Efros, A.A., Malik, J.: Multi-view consistency as supervisory signal for learning shape and pose prediction. In: Computer Vision and Pattern Recognition (CVPR) (2018)
Tulsiani, S., Zhou, T., Efros, A.A., Malik, J.: Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Ummenhofer, B., Zhou, H., Uhrig, J., Mayer, N., Ilg, E., Dosovitskiy, A., Brox, T.: DeMoN: Depth and motion network for learning monocular stereo. In: IEEE Conf. Comput. Vis. Pattern Recog., pp. 5038–5047 (2017)
Wang, C., Buenaposada, J.M., Zhu, R., Lucey, S.: Learning depth from monocular videos using direct methods. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
Wen, Y., Liu, W., Raj, B., Singh, R.: Self-supervised 3D face reconstruction via conditional estimation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13,289–13,298 (2021)
Wiles, O., Gkioxari, G., Szeliski, R., Johnson, J.: Synsin: end-to-end view synthesis from a single image. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 7467–7477 (2020)
Woodham, R.J.: Photometric method for determining surface orientation from multiple images. Opt. Eng. 19(1), 139–144 (1980)
Wu, S., Rupprecht, C., Vedaldi, A.: Unsupervised learning of probably symmetric deformable 3D objects from images in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–10 (2020)
Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. In: Advances in Neural Information Processing Systems, pp. 1696–1704 (2016)
Zakharov, E., Shysheya, A., Burkov, E., Lempitsky, V.: Few-shot adversarial learning of realistic neural talking head models. In: International Conference on Computer Vision (2019)
Zhang, K., Zhang, Z., Li, Z., Yu, Q.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M.: Shape from shading: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 21(8), 690–706 (1999). https://doi.org/10.1109/34.784284
Zhang, Z., Ge, Y., Tai, Y., Cao, W., Chen, R., Liu, K., Tang, H., Huang, X., Wang, C., Xie, Z., et al.: Physically-guided disentangled implicit rendering for 3D face modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20,353–20,363 (2022)
Zhou, H., Hadap, S., Sunkavalli, K., Jacobs, D.W.: Deep single-image portrait relighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)

Author information

Authors and Affiliations

SAIT (Samsung Advanced Institute of Technology), Suwon-si, South Korea
Inwoo Ha, Hyun Sung Chang & Minjung Son
KAIST, Daejeon, South Korea
Inwoo Ha & Sung-eui Yoon

Corresponding author

Correspondence to Sung-eui Yoon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ha, I., Chang, H.S., Son, M. et al. Learning to disentangle latent physical factors of deformable faces. Vis Comput 39, 3481–3494 (2023). https://doi.org/10.1007/s00371-023-02948-1

Accepted: 09 June 2023
Published: 20 July 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00371-023-02948-1
