Wavelet-enhanced Weakly Supervised Local Feature Learning for Face Forgery Detection

Authors:

Yongdong Zhang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 1299 - 1308

https://doi.org/10.1145/3503161.3547832

Published: 10 October 2022

Abstract

Face forgery detection is attracting increasing attention because of the security threats posed by forged faces. Recently, local patch-based approaches have achieved strong results by attending effectively to local details. However, two notable problems remain: a) local feature learning requires patch-level labels to circumvent label noise, which is impractical in real-world scenarios; b) the commonly used DCT (FFT) transform discards all spatial information, which makes local details difficult to handle. To address these limitations, this paper proposes a novel wavelet-enhanced weakly supervised local feature learning framework. Specifically, to supervise the learning of local features with only image-level labels, two modules are devised based on the idea of multi-instance learning: a local relation constraint module (LRCM) and a category knowledge-guided local feature aggregation module (CKLFA). LRCM constrains the maximum distance between local features of forged face images to be greater than that of real face images. CKLFA adaptively aggregates local features based on their correlation with a global embedding that contains global category information. Together, these two modules encourage the network to learn discriminative local features supervised only by image-level labels. In addition, a multi-level wavelet-powered feature enhancement module is developed to help the network mine local forgery artifacts from the spatio-frequency domain, which benefits the learning of discriminative local features. Extensive experiments show that our approach outperforms previous state-of-the-art methods when only image-level labels are available and achieves comparable or even better performance than counterparts that use patch-level labels.
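The three ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the hinge form of the LRCM-style loss, the `margin` parameter, the softmax over scaled dot products in the CKLFA-style pooling, and the choice of a single-level Haar wavelet are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (height and width must be even).

    Unlike a global DCT/FFT, each output coefficient depends on one 2x2 block,
    so the subbands keep a coarse spatial layout.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0       # low-pass in both directions
    row_diff = (a + b - c - d) / 2.0     # top minus bottom
    col_diff = (a - b + c - d) / 2.0     # left minus right
    diag = (a - b - c + d) / 2.0         # diagonal detail
    return approx, row_diff, col_diff, diag


def max_pairwise_distance(feats):
    """Largest Euclidean distance between any two rows of feats, shape (N, D)."""
    diff = feats[:, None, :] - feats[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).max())


def relation_margin_loss(fake_feats, real_feats, margin=1.0):
    """Assumed hinge loss: zero once the forged image's maximum local-feature
    distance exceeds the real image's by at least `margin`."""
    gap = max_pairwise_distance(real_feats) - max_pairwise_distance(fake_feats)
    return max(0.0, margin + gap)


def aggregate_local_features(local_feats, global_emb):
    """Assumed attention pooling: weight each local feature by a softmax over
    its scaled dot product with the global embedding."""
    scores = local_feats @ global_emb / np.sqrt(local_feats.shape[1])
    scores = scores - scores.max()       # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ local_feats, weights
```

For example, an image that is zero everywhere except one pixel produces detail coefficients at exactly one subband location, which shows the spatial localization that a global frequency transform lacks.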

Supplementary Material

MP4 File (MM22-fp370.mp4)

presentation video-short version

Index Terms

Wavelet-enhanced Weakly Supervised Local Feature Learning for Face Forgery Detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Information & Contributors

Information

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN:9781450392037

DOI:10.1145/3503161

General Chairs:
João Magalhães, NOVA University of Lisbon, Portugal
Alberto del Bimbo, University of Florence, Italy
Shin'ichi Satoh, National Institute of Informatics, Japan
Nicu Sebe, University of Trento, Italy

Program Chairs:
Xavier Alameda-Pineda, Inria, Grenoble, France
Qin Jin, Renmin University of China, China
Vincent Oria, New Jersey Institute of Technology, USA
Laura Toni, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Fundamental Research Funds for the Central Universities
Hefei Postdoctoral Research Activities Foundation
Youth Innovation Promotion Association, Chinese Academy of Sciences

Conference

MM '22

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
