Research Article
DOI: 10.1145/3474085.3475257

One-stage Context and Identity Hallucination Network

Published: 17 October 2021

Abstract

Face swapping aims to synthesize a face image in which the facial identity is faithfully transplanted from the source image while the context (e.g., hairstyle, head posture, facial expression, lighting, and background) remains consistent with the reference image. Prior work mainly accomplishes the task in two stages: generating the inner face with the source identity, and then stitching the generated face to the complementary part of the reference image with image-blending techniques. The blending mask, usually obtained from an additional face segmentation model, is common practice for photo-realistic face swapping. However, artifacts often appear at the blending boundary, especially in areas occluded by hair, eyeglasses, accessories, etc. To address this problem, rather than struggling with the blending mask in the two-stage routine, we develop a novel one-stage context and identity hallucination network, which learns a series of hallucination maps that softly divide the image into context areas and identity areas. For context areas, the features are fully exploited by a multi-level context encoder. For identity areas, we design a novel two-cascading AdaIN to transfer the identity while retaining the context. Moreover, with the help of the hallucination maps, we introduce an improved reconstruction loss that effectively exploits unlimited unpaired face images for training. Our network performs well on both context areas and identity areas without any dependency on post-processing. Extensive qualitative and quantitative experiments demonstrate the superiority of our network.
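
The page carries no code, but the abstract's core mechanism, a learned soft hallucination map that decides per spatial location how much identity-stylized feature to inject into the context features, can be sketched in a few lines of PyTorch. Everything below (the module name, the way the identity embedding is mapped to AdaIN parameters, the single-channel sigmoid map) is an illustrative assumption, not the authors' actual two-cascading AdaIN design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftHallucinationBlend(nn.Module):
    """Illustrative sketch only: AdaIN-style identity injection gated by
    a learned soft hallucination map. Names and structure are assumed,
    not taken from the paper."""

    def __init__(self, channels: int, id_dim: int):
        super().__init__()
        # Map the identity embedding (e.g. from a face recognizer such
        # as ArcFace) to per-channel AdaIN scale and shift parameters.
        self.to_scale = nn.Linear(id_dim, channels)
        self.to_shift = nn.Linear(id_dim, channels)
        # Predict a one-channel soft map in [0, 1]: ~1 in identity
        # areas, ~0 in context areas (hair, glasses, background, ...).
        self.to_map = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, id_emb: torch.Tensor) -> torch.Tensor:
        # AdaIN: instance-normalize the context features, then modulate
        # them with statistics derived from the source identity.
        normed = F.instance_norm(feat)
        scale = self.to_scale(id_emb)[:, :, None, None]
        shift = self.to_shift(id_emb)[:, :, None, None]
        identity_feat = scale * normed + shift
        # Soft blend instead of a hard segmentation mask: the learned
        # map replaces the two-stage pipeline's explicit blending step.
        m = self.to_map(feat)
        return m * identity_feat + (1.0 - m) * feat

# Hypothetical usage on random tensors:
block = SoftHallucinationBlend(channels=256, id_dim=512)
feat = torch.randn(2, 256, 32, 32)   # reference (context) feature map
id_emb = torch.randn(2, 512)         # source identity embedding
out = block(feat, id_emb)            # same shape as feat
```

Because such a map would be learned end-to-end with the generator rather than produced by a separate face segmentation model, occluders like hair or eyeglasses can receive intermediate weights instead of a hard blending boundary; the same maps also suggest how a reconstruction loss restricted to context areas could make unpaired images usable for training, as the abstract claims.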

Supplementary Material

ZIP File (mfp0519aux.zip)
mfp0519_Supplementary.pdf is supplemental material that includes additional results, failure cases, and the detailed network structure.

Cited By

  • (2024) AdapMTL: Adaptive Pruning Framework for Multitask Learning Model. In Proceedings of the 32nd ACM International Conference on Multimedia, 5121-5130. DOI: 10.1145/3664647.3681426. Online publication date: 28-Oct-2024.

      Published In

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN: 9781450386517
      DOI: 10.1145/3474085
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 October 2021

      Author Tags

      1. face generation
      2. face swapping
      3. self-adaptive learning

      Qualifiers

      • Research-article

      Funding Sources

      • The National Key R&D Program of China

      Conference

      MM '21: ACM Multimedia Conference
      October 20-24, 2021
      Virtual Event, China

      Acceptance Rates

      Overall acceptance rate: 2,145 of 8,556 submissions (25%)
