DOI: 10.1145/3591106.3592273
Research Article

Unlocking Potential of 3D-aware GAN for More Expressive Face Generation

Published: 12 June 2023

Abstract

Since style-based image generators achieve feature disentanglement by mapping the latent vector space to a style vector space, numerous efforts have been made to enhance the controllability of the latent space. However, existing controllable models have limited ability to precisely generate high-resolution faces with large expressions. This degradation stems from dependence on the training data, as high-resolution face datasets contain few strongly expressive images. To tackle this challenge, we propose a robust training framework for 3D-aware generative adversarial networks that learns to generate high-quality, more expressive faces through a signed distance field. First, we propose a novel 3D enforcement loss that encourages more expressive images in an unsupervised manner. Second, we introduce a partial training method that fine-tunes the network on multiple datasets without loss of image resolution. Finally, we propose a ray-scaling scheme for the volume renderer to represent a face at arbitrary scales. Through the proposed framework, the network learns 3D face priors, such as the expression shapes of a parametric facial model, to generate detailed faces. Experimental results show that our method outperforms the state of the art, with clear benefits in generating high-resolution facial expressions.
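The abstract mentions a ray-scaling scheme that lets the volume renderer depict a face at arbitrary scales. As a rough, hypothetical illustration of the general idea only (not the authors' implementation, whose details appear in the full paper), the Python sketch below scales the effective focal length when generating pinhole-camera rays, so the face fills a larger or smaller portion of a fixed-resolution frame; the function name and arguments are assumptions for illustration.

    import numpy as np

    def generate_scaled_rays(H, W, focal, cam2world, scale=1.0):
        """Generate pinhole-camera rays for an H x W image.

        scale > 1 enlarges the effective focal length (narrower field of
        view), so the subject fills more of the frame at the same output
        resolution; scale < 1 zooms out. Generic sketch only, not the
        paper's ray-scaling scheme.
        """
        # Pixel grid (i: column index, j: row index), each of shape (H, W).
        i, j = np.meshgrid(np.arange(W, dtype=np.float64),
                           np.arange(H, dtype=np.float64), indexing="xy")
        # Camera-space ray directions through each pixel, using the scaled focal length.
        dirs = np.stack([(i - 0.5 * W) / (focal * scale),
                         -(j - 0.5 * H) / (focal * scale),
                         -np.ones_like(i)], axis=-1)          # (H, W, 3)
        # Rotate directions into world space; all rays share the camera origin.
        rays_d = dirs @ cam2world[:3, :3].T                   # (H, W, 3)
        rays_o = np.broadcast_to(cam2world[:3, 3], rays_d.shape)
        return rays_o, rays_d

Feeding such rays to an SDF-based volume renderer with, say, scale=2.0 would render a two-times close-up of the face region at the same output resolution, which is the kind of arbitrary-scale rendering the scheme targets.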



Published In

ICMR '23: Proceedings of the 2023 ACM International Conference on Multimedia Retrieval
June 2023, 694 pages
ISBN: 9798400701788
DOI: 10.1145/3591106

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. 3D-aware GAN
2. controllable generative model
3. facial generation
4. inversion
5. style-based generator

Funding Sources

• National Research Foundation of Korea (NRF)

