DOI: 10.1145/3674399.3674457
Research article · Open access

Temporal Optimization for Face Swapping Video based on Consistency Inheritance

Published: 30 July 2024

Abstract

Applying existing face swapping algorithms independently to each video frame typically leads to temporal inconsistency. We analyze this inconsistency in the generated results and model inter-frame inconsistency as time-domain noise. We propose a face swapping mapper network that inherits identity and suppresses this noise. Our training strategy combines a primary perceptual loss to learn the face swapping information of the reference face, an optical flow loss to impose temporal constraints, and an identity loss to transfer identity information. In addition, we introduce a 3D face disentanglement model that regresses FLAME parameters and precisely guides the optimization toward facial detail consistency. Only a pair of original and swapped videos is needed for training, eliminating the need for a large dataset. Experiments demonstrate that our method improves the temporal and detail consistency of the results and enhances the generation quality of face swapping methods at the video level.
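The abstract describes a training objective that combines a perceptual loss, an optical flow loss, and an identity loss. As an illustration only (the paper's exact formulation is not reproduced here), the optical flow term can be sketched as a warping loss: the previous output frame is warped along the estimated flow and compared against the current output, so that changes between consecutive frames follow scene motion rather than noise. The function names and the nearest-neighbour warp below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def warp(frame, flow):
    # Backward-warp `frame` along per-pixel optical `flow` (x, y offsets).
    # Nearest-neighbour sampling for brevity; bilinear sampling is typical
    # in practice.
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def temporal_loss(frame_t, frame_prev, flow_prev_to_t):
    # Penalise deviation between the current output frame and the previous
    # output frame warped forward along the optical flow.
    warped = warp(frame_prev, flow_prev_to_t)
    return float(np.mean((frame_t - warped) ** 2))
```

In a full objective this term would typically be weighted and summed with the perceptual and identity losses (e.g. L = λ_p·L_perc + λ_f·L_flow + λ_id·L_id), with weights chosen empirically.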



    Published In

    ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024
    July 2024
    261 pages
    ISBN:9798400710117
    DOI:10.1145/3674399
    This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. 3D face disentanglement
    2. Deepfake
    3. face swapping
    4. optical flow
    5. temporal consistency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Natural Science Foundation of China
• Key Research and Development Program of Anhui Province

