research-article

Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation

Published: 08 March 2024

Abstract

Head generation with diverse identities is an important task in computer vision and computer graphics, with wide use in multimedia applications. However, current full-head generation methods require a large number of three-dimensional (3D) scans or multi-view images for training, which makes data acquisition expensive. To address this issue, we propose Head3D, a method that generates full 3D heads from limited multi-view images. Specifically, our approach first extracts facial priors, represented as tri-planes, from EG3D, a 3D-aware generative model, and then uses feature distillation to transfer the 3D frontal faces into complete heads without compromising head integrity. To mitigate the domain gap between the face and head models, we present a dual discriminator that guides the generation of both the front and the back of the head. Our model achieves cost-efficient and diverse complete head generation with photo-realistic renderings and high-quality geometry representations. Extensive experiments demonstrate the effectiveness of Head3D, both qualitatively and quantitatively.
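
The abstract names two ingredients, tri-plane features and feature distillation, without giving formulas. The sketch below is only an illustration of the general idea, not the authors' implementation: it queries a 3D point against three axis-aligned feature planes with bilinear interpolation and sums the results, then computes a squared-error distillation loss restricted to a region mask. The function names, the summation choice, the mask, and the loss form are all assumptions made for this example.

```python
from typing import List, Sequence

Plane = List[List[List[float]]]  # one feature plane, indexed [channel][row][col]


def sample_plane(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly sample a feature plane at (u, v), both in [-1, 1]."""
    height, width = len(plane[0]), len(plane[0][0])
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (width - 1)
    y = (v + 1.0) * 0.5 * (height - 1)
    # Top-left corner of the enclosing cell, clamped so x0 + 1 and y0 + 1 exist.
    x0 = min(max(int(x), 0), width - 2)
    y0 = min(max(int(y), 0), height - 2)
    wx, wy = x - x0, y - y0
    features = []
    for channel in plane:
        top = channel[y0][x0] * (1 - wx) + channel[y0][x0 + 1] * wx
        bottom = channel[y0 + 1][x0] * (1 - wx) + channel[y0 + 1][x0 + 1] * wx
        features.append(top * (1 - wy) + bottom * wy)
    return features


def query_triplane(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Project a 3D point onto the XY, XZ and YZ planes and sum the features.

    Summation is an assumption here; averaging would work the same way.
    """
    x, y, z = point
    xy = sample_plane(planes[0], x, y)
    xz = sample_plane(planes[1], x, z)
    yz = sample_plane(planes[2], y, z)
    return [a + b + c for a, b, c in zip(xy, xz, yz)]


def masked_distillation_loss(student: Sequence[Plane],
                             teacher: Sequence[Plane],
                             mask: List[List[float]]) -> float:
    """Mean squared feature difference over cells where mask is nonzero.

    The mask (hypothetical) marks the region whose teacher features the
    student should reproduce, e.g. the frontal face.
    """
    total, count = 0.0, 0.0
    for s_plane, t_plane in zip(student, teacher):
        for s_channel, t_channel in zip(s_plane, t_plane):
            for s_row, t_row, m_row in zip(s_channel, t_channel, mask):
                for s, t, m in zip(s_row, t_row, m_row):
                    total += m * (s - t) ** 2
                    count += m
    return total / count if count > 0 else 0.0


def constant_planes(value: float, channels: int = 2, size: int = 4) -> List[Plane]:
    """Helper for the demo: three planes filled with a single value."""
    return [[[[value] * size for _ in range(size)] for _ in range(channels)]
            for _ in range(3)]


if __name__ == "__main__":
    teacher = constant_planes(1.0)
    student = constant_planes(2.0)
    center_mask = [[1.0 if 1 <= r <= 2 and 1 <= c <= 2 else 0.0 for c in range(4)]
                   for r in range(4)]
    print(query_triplane(teacher, (0.3, -1.0, 1.0)))
    print(masked_distillation_loss(student, teacher, center_mask))
```

In a real system the planes would be tensors produced by a generator and the loss would be one term among several, including the adversarial terms from the dual discriminator; none of that is modeled here.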


Cited By

  • (2024) 3D-Aware Face Editing via Warping-Guided Latent Direction Learning. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 916–926. DOI: 10.1109/CVPR52733.2024.00093. Online publication date: 16-Jun-2024
  • (2024) Recent advances in implicit representation-based 3D shape generation. Visual Intelligence 2:1. DOI: 10.1007/s44267-024-00042-1. Online publication date: 25-Mar-2024


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 6
    June 2024
    715 pages
    EISSN: 1551-6865
    DOI: 10.1145/3613638
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 March 2024
    Online AM: 25 January 2024
    Accepted: 21 November 2023
    Revised: 15 October 2023
    Received: 13 July 2023
    Published in TOMM Volume 20, Issue 6


    Author Tags

    1. Head generation
    2. neural radiance field
    3. adversarial generative network
    4. limited data

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Shanghai Municipal Science and Technology Major Project
    • CCF-Alibaba Innovative Research Fund For Young Scholars


