Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation

Published: 08 March 2024
Abstract

Head generation with diverse identities is an important task in computer vision and computer graphics, with wide use in multimedia applications. However, current full-head generation methods require a large number of three-dimensional (3D) scans or multi-view images to train the model, resulting in expensive data acquisition costs. To address this issue, we propose Head3D, a method that generates full 3D heads from limited multi-view images. Specifically, our approach first extracts facial priors represented by the tri-planes learned in EG3D, a 3D-aware generative model, and then applies feature distillation to transfer these 3D frontal-face priors into complete heads without compromising head integrity. To mitigate the domain gap between the face and head models, we present a dual discriminator that separately guides frontal and back head generation. Our model achieves cost-efficient and diverse complete-head generation with photo-realistic renderings and high-quality geometry representations. Extensive experiments demonstrate the effectiveness of the proposed Head3D, both qualitatively and quantitatively.
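The abstract describes two mechanisms: distilling tri-plane face features from a pretrained 3D-aware face model (EG3D) into a full-head generator, and a dual discriminator that supervises frontal and back views separately. The PyTorch sketch below illustrates how such a training step could be wired together; the module shapes, the frontal-face crop, the placeholder renderer, and the loss weights are all illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of tri-plane feature distillation with a dual
# discriminator. Shapes, the frontal-face crop, the placeholder renderer,
# and the loss weights are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneGenerator(nn.Module):
    """Toy generator mapping a latent code to three feature planes (XY, XZ, YZ)."""
    def __init__(self, z_dim=64, plane_ch=16, plane_res=32):
        super().__init__()
        self.plane_ch, self.plane_res = plane_ch, plane_res
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * plane_ch * plane_res * plane_res),
        )

    def forward(self, z):
        planes = self.net(z)
        return planes.view(-1, 3, self.plane_ch, self.plane_res, self.plane_res)

class PatchDiscriminator(nn.Module):
    """Small conv discriminator; one instance per view (frontal / back)."""
    def __init__(self, in_ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, 1, 0),
        )

    def forward(self, img):
        return self.net(img)

def frontal_face_region(planes):
    """Crop the sub-region of each plane assumed to cover the frontal face,
    so distillation constrains the face while leaving the rest of the head free."""
    r = planes.shape[-1]
    return planes[..., r // 4 : 3 * r // 4, r // 4 : 3 * r // 4]

def render_view(planes, pose):
    """Stand-in for volume rendering the tri-planes from a camera pose;
    a real pipeline would ray-march the planes. Returns a (B, C, H, W) map."""
    return planes.mean(dim=1)

teacher = TriPlaneGenerator()   # frozen face prior (stands in for EG3D)
student = TriPlaneGenerator()   # full-head generator being trained
for p in teacher.parameters():
    p.requires_grad_(False)

d_front, d_back = PatchDiscriminator(), PatchDiscriminator()
opt_g = torch.optim.Adam(student.parameters(), lr=2e-4)

z = torch.randn(4, 64)
s_planes, t_planes = student(z), teacher(z)

# (1) Feature distillation: match the frozen teacher's tri-plane features
# on the frontal-face region only.
loss_distill = F.l1_loss(frontal_face_region(s_planes),
                         frontal_face_region(t_planes))

# (2) Dual discriminator: non-saturating GAN loss per view. A full training
# loop would also render real/fake pairs and update d_front and d_back.
loss_adv = (F.softplus(-d_front(render_view(s_planes, "front"))).mean()
            + F.softplus(-d_back(render_view(s_planes, "back"))).mean())

loss = loss_distill + 0.1 * loss_adv
opt_g.zero_grad()
loss.backward()
opt_g.step()
```

Restricting the distillation term to a face crop is one plausible way to transfer the face prior without over-constraining the back of the head; the paper's actual method distills EG3D's learned tri-plane features and renders views with a neural radiance field rather than the toy renderer used here.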

    Cited By

    • (2024) Recent advances in implicit representation-based 3D shape generation. Visual Intelligence 2(1). DOI: 10.1007/s44267-024-00042-1. Online publication date: 25 March 2024.

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 6
      June 2024, 715 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3613638
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 08 March 2024
      Online AM: 25 January 2024
      Accepted: 21 November 2023
      Revised: 15 October 2023
      Received: 13 July 2023
      Published in TOMM Volume 20, Issue 6

      Author Tags

      1. Head generation
      2. neural radiance field
      3. generative adversarial network
      4. limited data

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Shanghai Municipal Science and Technology Major Project
      • CCF-Alibaba Innovative Research Fund For Young Scholars
