Identity-Preserving Face Swapping via Dual Surrogate Generative Models

Published: 09 August 2024

Abstract

In this study, we revisit the fundamental setting of face-swapping models and show that training with only implicit supervision makes it difficult for advanced methods to preserve the source identity. We propose a novel reverse pseudo-input generation approach that supplies supplemental training data for face-swapping models and thereby addresses this issue. Unlike the traditional pseudo-label-based training strategy, we assume that an arbitrary real facial image can serve as the ground-truth output of the face-swapping network and generate the corresponding <source, target> input pair. Specifically, we employ a source-creating surrogate that alters the attributes of the real image while keeping its identity, and a target-creating surrogate that synthesizes attribute-preserving target images with different identities. Our framework uses this proxy-paired data as explicit supervision to direct the face-swapping training process, partially realizing a credible and effective optimization direction for boosting identity preservation. We design explicit and implicit adaptation strategies to better exploit this explicit supervision for face swapping. Quantitative and qualitative experiments on FF++, FFHQ, and in-the-wild images show that our framework improves the performance of various face-swapping pipelines in terms of visual fidelity and identity preservation. Furthermore, we demonstrate applications of our method to re-aging, swappable attribute customization, cross-domain face swapping, and video face swapping. Code is available at https://github.com/ICTMCG/CSCS.
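The reverse pseudo-input idea described above can be sketched in a few lines: treat a real image as the ground-truth swap output, let one surrogate perturb its attributes (pseudo source) and another replace its identity (pseudo target), then supervise the swap network against the original image. The following is a hypothetical toy illustration only, with images reduced to identity/attribute codes; `make_pseudo_pair`, `toy_swap`, and `explicit_loss` are invented names, not the authors' implementation.

```python
import random

def make_pseudo_pair(real_img, rng):
    """Reverse pseudo-input generation: treat a real image as the
    ground-truth swap output and synthesize a <source, target> pair."""
    # Source-creating surrogate: keep the identity, perturb the attributes.
    pseudo_source = {"identity": real_img["identity"],
                     "attributes": rng.randrange(1000)}
    # Target-creating surrogate: keep the attributes, swap in a new identity.
    pseudo_target = {"identity": rng.randrange(1000),
                     "attributes": real_img["attributes"]}
    return pseudo_source, pseudo_target

def toy_swap(source, target):
    """Idealized face-swap network: source identity onto target attributes."""
    return {"identity": source["identity"], "attributes": target["attributes"]}

def explicit_loss(pred, ground_truth):
    """Mismatch count standing in for a pixel/ID reconstruction loss."""
    return sum(pred[k] != ground_truth[k] for k in ("identity", "attributes"))

rng = random.Random(0)
real = {"identity": 42, "attributes": 7}     # the real image = ground truth
src, tgt = make_pseudo_pair(real, rng)       # explicit supervision pair
loss = explicit_loss(toy_swap(src, tgt), real)
```

An ideal swapper reconstructs the real image exactly (zero loss here); in the actual framework the analogous discrepancy is what drives the explicit supervision signal.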

Supplementary Material

Supplementary video: tog-23-0102-file003.mp4


Published In

ACM Transactions on Graphics, Volume 43, Issue 5 (October 2024), 176 pages
EISSN: 1557-7368
DOI: 10.1145/3613708

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 August 2024
Online AM: 01 July 2024
Accepted: 24 June 2024
Revised: 02 April 2024
Received: 21 September 2023

Author Tags

1. Face swapping
2. image editing
3. digital face synthesis

Funding Sources

• National Natural Science Foundation of China
• Beijing Science and Technology Plan Project
• 242 project
• National Science and Technology Council
