DOI: 10.1145/3674399.3674457
Research article · Open access

Temporal Optimization for Face Swapping Video based on Consistency Inheritance

Published: 30 July 2024

Abstract

Applying existing face swapping algorithms independently to each video frame typically leads to temporal inconsistency. We analyze this inconsistency in the generated results and model inter-frame inconsistency as time-domain noise. We propose a face swapping mapper network that inherits identity and suppresses this noise. Our training strategy combines a primary perceptual loss to learn the face swapping information of the reference face, an optical flow loss to impose temporal constraints, and an identity loss to transfer identity information. In addition, we introduce a 3D face disentanglement model that regresses FLAME parameters and precisely guides the optimization toward facial detail consistency. Only a pair of original and swapped videos is needed for training, eliminating the need for a large dataset. Experiments demonstrate that our method improves the temporal and detail consistency of the results and enhances the generation quality of face swapping methods at the video level.
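The abstract describes a training objective that combines a perceptual loss, an optical flow loss, and an identity loss. As an illustration only (the paper's exact formulation is not reproduced here), the optical flow term can be sketched as a warping loss: the previous output frame is warped along the estimated flow and compared against the current output, so that changes between consecutive frames follow scene motion rather than noise. The function names and the nearest-neighbour warp below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def warp(frame, flow):
    # Backward-warp `frame` along per-pixel optical `flow` (x, y offsets).
    # Nearest-neighbour sampling for brevity; bilinear sampling is typical
    # in practice.
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def temporal_loss(frame_t, frame_prev, flow_prev_to_t):
    # Penalise deviation between the current output frame and the previous
    # output frame warped forward along the optical flow.
    warped = warp(frame_prev, flow_prev_to_t)
    return float(np.mean((frame_t - warped) ** 2))
```

In a full objective this term would typically be weighted and summed with the perceptual and identity losses (e.g. L = λ_p·L_perc + λ_f·L_flow + λ_id·L_id), with weights chosen empirically.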



    Published In

    ACM-TURC '24: Proceedings of the ACM Turing Award Celebration Conference - China 2024
    July 2024
    261 pages
    ISBN:9798400710117
    DOI:10.1145/3674399
    This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. 3D face disentanglement
    2. Deepfake
    3. face swapping
    4. optical flow
    5. temporal consistency

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • the Natural Science Foundation of China
• Key Research and Development Program of Anhui Province

