Research Article
DOI: 10.1145/3474085.3475257

One-stage Context and Identity Hallucination Network

Published: 17 October 2021

Abstract

Face swapping aims to synthesize a face image in which the facial identity is faithfully transplanted from the source image while the context (e.g., hairstyle, head posture, facial expression, lighting, and background) remains consistent with the reference image. Prior work mainly accomplishes the task in two stages: generating the inner face with the source identity, and then stitching the generated face to the complementary part of the reference image with image-blending techniques. The blending mask, usually obtained from an additional face segmentation model, is common practice for photo-realistic face swapping. However, artifacts often appear at the blending boundary, especially in areas occluded by hair, eyeglasses, accessories, etc. To address this problem, rather than struggling with the blending mask in the two-stage routine, we develop a novel one-stage context and identity hallucination network, which learns a series of hallucination maps that softly divide the image into context areas and identity areas. For context areas, the features are fully exploited by a multi-level context encoder. For identity areas, we design a novel two-cascading AdaIN to transfer the identity while retaining the context. Moreover, with the help of the hallucination maps, we introduce an improved reconstruction loss that effectively exploits unlimited unpaired face images for training. Our network performs well on both context areas and identity areas without any dependency on post-processing. Extensive qualitative and quantitative experiments demonstrate the superiority of our network.
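
The page carries no code, but the abstract's core mechanism, a learned soft hallucination map that decides per spatial location how much identity-stylized feature to inject into the context features, can be sketched in a few lines of PyTorch. Everything below (the module name, the way the identity embedding is mapped to AdaIN parameters, the single-channel sigmoid map) is an illustrative assumption, not the authors' actual two-cascading AdaIN design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftHallucinationBlend(nn.Module):
    """Illustrative sketch only: AdaIN-style identity injection gated by
    a learned soft hallucination map. Names and structure are assumed,
    not taken from the paper."""

    def __init__(self, channels: int, id_dim: int):
        super().__init__()
        # Map the identity embedding (e.g. from a face recognizer such
        # as ArcFace) to per-channel AdaIN scale and shift parameters.
        self.to_scale = nn.Linear(id_dim, channels)
        self.to_shift = nn.Linear(id_dim, channels)
        # Predict a one-channel soft map in [0, 1]: ~1 in identity
        # areas, ~0 in context areas (hair, glasses, background, ...).
        self.to_map = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, id_emb: torch.Tensor) -> torch.Tensor:
        # AdaIN: instance-normalize the context features, then modulate
        # them with statistics derived from the source identity.
        normed = F.instance_norm(feat)
        scale = self.to_scale(id_emb)[:, :, None, None]
        shift = self.to_shift(id_emb)[:, :, None, None]
        identity_feat = scale * normed + shift
        # Soft blend instead of a hard segmentation mask: the learned
        # map replaces the two-stage pipeline's explicit blending step.
        m = self.to_map(feat)
        return m * identity_feat + (1.0 - m) * feat

# Hypothetical usage on random tensors:
block = SoftHallucinationBlend(channels=256, id_dim=512)
feat = torch.randn(2, 256, 32, 32)   # reference (context) feature map
id_emb = torch.randn(2, 512)         # source identity embedding
out = block(feat, id_emb)            # same shape as feat
```

Because such a map would be learned end-to-end with the generator rather than produced by a separate face segmentation model, occluders like hair or eyeglasses can receive intermediate weights instead of a hard blending boundary; the same maps also suggest how a reconstruction loss restricted to context areas could make unpaired images usable for training, as the abstract claims.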

Supplementary Material

ZIP File (mfp0519aux.zip)
mfp0519_Supplementary.pdf is supplemental material that includes additional results, failure cases, and the detailed network structure.

Cited By

  • (2024) AdapMTL: Adaptive Pruning Framework for Multitask Learning Model. In Proceedings of the 32nd ACM International Conference on Multimedia, 5121-5130. DOI: 10.1145/3664647.3681426. Online publication date: 28-Oct-2024.

      Published In

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN: 9781450386517
      DOI: 10.1145/3474085
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 October 2021

      Author Tags

      1. face generation
      2. face swapping
      3. self-adaptive learning

      Qualifiers

      • Research-article

      Funding Sources

      • The National Key R&D Program of China

      Conference

      MM '21: ACM Multimedia Conference
      October 20-24, 2021
      Virtual Event, China

      Acceptance Rates

      Overall acceptance rate: 2,145 of 8,556 submissions (25%)
