Identity-Preserving Face Swapping via Dual Surrogate Generative Models

Published: 09 August 2024

Abstract

In this study, we revisit the fundamental setting of face-swapping models and show that training with only implicit supervision makes it difficult for advanced methods to preserve the source identity. We propose a novel reverse pseudo-input generation approach that supplies supplemental training data for face-swapping models and thereby addresses this issue. Unlike the traditional pseudo-label-based training strategy, we assume that an arbitrary real facial image can serve as the ground-truth output of the face-swapping network and generate the corresponding <source, target> input pair. Specifically, we employ a source-creating surrogate that alters the attributes of the real image while keeping its identity, and a target-creating surrogate that synthesizes attribute-preserving target images with different identities. Our framework uses this proxy-paired data as explicit supervision to direct the face-swapping training process, partially realizing a credible and effective optimization direction for boosting identity preservation. We design explicit and implicit adaptation strategies to better exploit this explicit supervision for face swapping. Quantitative and qualitative experiments on FF++, FFHQ, and in-the-wild images show that our framework improves the performance of various face-swapping pipelines in terms of visual fidelity and identity preservation. Furthermore, we demonstrate applications of our method to re-aging, swappable attribute customization, cross-domain face swapping, and video face swapping. Code is available at https://github.com/ICTMCG/CSCS.
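The reverse pseudo-input idea described above can be sketched in a few lines: treat a real image as the ground-truth swap output, let one surrogate perturb its attributes (pseudo source) and another replace its identity (pseudo target), then supervise the swap network against the original image. The following is a hypothetical toy illustration only, with images reduced to identity/attribute codes; `make_pseudo_pair`, `toy_swap`, and `explicit_loss` are invented names, not the authors' implementation.

```python
import random

def make_pseudo_pair(real_img, rng):
    """Reverse pseudo-input generation: treat a real image as the
    ground-truth swap output and synthesize a <source, target> pair."""
    # Source-creating surrogate: keep the identity, perturb the attributes.
    pseudo_source = {"identity": real_img["identity"],
                     "attributes": rng.randrange(1000)}
    # Target-creating surrogate: keep the attributes, swap in a new identity.
    pseudo_target = {"identity": rng.randrange(1000),
                     "attributes": real_img["attributes"]}
    return pseudo_source, pseudo_target

def toy_swap(source, target):
    """Idealized face-swap network: source identity onto target attributes."""
    return {"identity": source["identity"], "attributes": target["attributes"]}

def explicit_loss(pred, ground_truth):
    """Mismatch count standing in for a pixel/ID reconstruction loss."""
    return sum(pred[k] != ground_truth[k] for k in ("identity", "attributes"))

rng = random.Random(0)
real = {"identity": 42, "attributes": 7}     # the real image = ground truth
src, tgt = make_pseudo_pair(real, rng)       # explicit supervision pair
loss = explicit_loss(toy_swap(src, tgt), real)
```

An ideal swapper reconstructs the real image exactly (zero loss here); in the actual framework the analogous discrepancy is what drives the explicit supervision signal.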

Supplementary Material

Supplementary video: tog-23-0102-file003.mp4


Published In

ACM Transactions on Graphics, Volume 43, Issue 5 (October 2024), 176 pages
EISSN: 1557-7368
DOI: 10.1145/3613708

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 August 2024
Online AM: 01 July 2024
Accepted: 24 June 2024
Revised: 02 April 2024
Received: 21 September 2023

Author Tags

1. Face swapping
2. image editing
3. digital face synthesis

Funding Sources

• National Natural Science Foundation of China
• Beijing Science and Technology Plan Project
• 242 project
• National Science and Technology Council
