DOI: 10.1145/3503161.3547832
research-article

Wavelet-enhanced Weakly Supervised Local Feature Learning for Face Forgery Detection

Published: 10 October 2022

Abstract

Face forgery detection is attracting increasing attention because of the security threats posed by forged faces. Recently, local patch-based approaches have achieved strong results thanks to their effective attention to local details. However, two notable problems remain: a) local feature learning requires patch-level labels to circumvent label noise, which is impractical in real-world scenarios; b) the commonly used DCT (FFT) transform loses all spatial information, which makes it difficult to handle local details. To address these limitations, this paper proposes a novel wavelet-enhanced weakly supervised local feature learning framework. Specifically, to supervise the learning of local features with only image-level labels, two modules are devised based on the idea of multi-instance learning: a local relation constraint module (LRCM) and a category knowledge-guided local feature aggregation module (CKLFA). LRCM constrains the maximum distance between local features of a forged face image to be greater than that of a real face image. CKLFA adaptively aggregates local features based on their correlation with a global embedding that contains global category information. Combining these two modules encourages the network to learn discriminative local features supervised only by image-level labels. In addition, a multi-level wavelet-powered feature enhancement module is developed to help the network mine local forgery artifacts from the spatio-frequency domain, which benefits the learning of discriminative local features. Extensive experiments show that our approach outperforms previous state-of-the-art methods when only image-level labels are available and achieves comparable or even better performance than counterparts that use patch-level labels.
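
To illustrate why a wavelet transform keeps the spatial information that a global DCT or FFT discards, the following is a minimal sketch of a single-level 2D Haar decomposition in plain Python. It is not the paper's enhancement module: the abstract does not name the wavelet family or the number of levels, so the Haar filter, the function name `haar_dwt2`, and the sub-band names are assumptions chosen for illustration. Each output coefficient depends on exactly one 2x2 block of the input, so a localized artifact shows up only at the matching position of each sub-band.

```python
def haar_dwt2(image):
    """Single-level 2D Haar transform of a list-of-lists image.

    Illustrative only: the Haar filter and the sub-band names are
    assumptions, not taken from the paper. Height and width must be even.
    Returns a dict of four sub-bands, each half the input size.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")

    names = ("approx", "diff_cols", "diff_rows", "diff_diag")
    bands = {name: [] for name in names}

    for r in range(0, rows, 2):
        out = {name: [] for name in names}
        for c in range(0, cols, 2):
            # The four pixels of one 2x2 block:
            #   a b
            #   d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # Orthonormal Haar: each 1D filter scales by 1/sqrt(2),
            # so the 2D coefficients scale by 1/2.
            out["approx"].append((a + b + d + e) / 2)     # local average
            out["diff_cols"].append((a - b + d - e) / 2)  # left minus right
            out["diff_rows"].append((a + b - d - e) / 2)  # top minus bottom
            out["diff_diag"].append((a - b - d + e) / 2)  # diagonal contrast
        for name in names:
            bands[name].append(out[name])
    return bands


# A 4x4 image that is flat except for one bright pixel in the
# bottom-right 2x2 block.
image = [[0.0] * 4 for _ in range(4)]
image[2][3] = 2.0

bands = haar_dwt2(image)
print(bands["approx"])  # [[0.0, 0.0], [0.0, 1.0]]
```

Only position (1, 1) of each sub-band is non-zero, which is the block containing the altered pixel. A global DCT or FFT of the same image would spread that single pixel across all frequency coefficients. A multi-level decomposition would apply the same step again to the `approx` sub-band.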

Supplementary Material

MP4 File (MM22-fp370.mp4)
presentation video-short version



    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022

    Author Tags

    1. face forgery detection
    2. multi-instance learning
    3. wavelet transform

    Qualifiers

    • Research-article

    Funding Sources

    • the National Nature Science Foundation of China
    • the Fundamental Research Funds for the Central Universities
    • the Hefei Postdoctoral Research Activities Foundation
    • the Youth Innovation Promotion Association Chinese Academy of Sciences

    Conference

    MM '22
    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 150
    • Downloads (Last 6 weeks): 10
    Reflects downloads up to 26 Jan 2025

    Cited By

    • (2025) Generalizable Deepfake Detection With Phase-Based Motion Analysis. IEEE Transactions on Image Processing 34, 100-112. DOI: 10.1109/TIP.2024.3441821. Online publication date: 2025
    • (2025) Face Forgery Detection Based on Fine-Grained Clues and Noise Inconsistency. IEEE Transactions on Artificial Intelligence 6:1, 144-158. DOI: 10.1109/TAI.2024.3455311. Online publication date: Jan-2025
    • (2024) IEIRNet: Inconsistency Exploiting Based Identity Rectification for Face Forgery Detection. IEEE Transactions on Multimedia 26, 11232-11245. DOI: 10.1109/TMM.2024.3453066. Online publication date: 1-Jan-2024
    • (2024) Improving Generalization of Deepfake Detectors by Imposing Gradient Regularization. IEEE Transactions on Information Forensics and Security 19, 5345-5356. DOI: 10.1109/TIFS.2024.3396064. Online publication date: 2024
    • (2024) WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection. International Journal of Computer Vision 132:10, 4746-4767. DOI: 10.1007/s11263-024-02116-5. Online publication date: 23-May-2024
    • (2024) SA³WT: Adaptive Wavelet-Based Transformer with Self-Paced Auto Augmentation for Face Forgery Detection. International Journal of Computer Vision 132:10, 4417-4439. DOI: 10.1007/s11263-024-02091-x. Online publication date: 16-May-2024
    • (2024) ST-SBV: Spatial-Temporal Self-Blended Videos for Deepfake Detection. Pattern Recognition and Computer Vision, 274-288. DOI: 10.1007/978-981-97-8620-6_19. Online publication date: 20-Oct-2024
    • (2024) Dual-Task Cascaded for Proactive Deepfake Detection Using QPCET Watermarking. Pattern Recognition and Computer Vision, 132-147. DOI: 10.1007/978-981-97-8490-5_10. Online publication date: 7-Nov-2024
    • (2024) Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos. Computer Vision – ECCV 2024, 209-228. DOI: 10.1007/978-3-031-72633-0_12. Online publication date: 22-Nov-2024
    • (2023) Discrepancy-guided reconstruction learning for image forgery detection. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 1387-1395. DOI: 10.24963/ijcai.2023/154. Online publication date: 19-Aug-2023
