Research article

Face Forgery Detection via Symmetric Transformer

Published: 10 October 2022
DOI: 10.1145/3503161.3547806

Abstract

Deep learning-based face forgery detection is a novel yet challenging task. Although impressive results have been achieved, existing methods still have limitations. For example, previous methods struggle to maintain consistent predictions across consecutive frames, even when all of those frames are forged. We propose a symmetric transformer for channel and spatial feature extraction, motivated by the premise that the channel and spatial features of a robust forgery detector should be consistent in the temporal domain. The symmetric transformer adopts newly designed attention-based strategies that take channel variance and spatial gradients as the key features, which greatly improves the robustness of deepfake video detection. Moreover, this symmetric structure acts on temporal and spatial features separately, which ensures robust detection from two different aspects. Our symmetric transformer is an end-to-end optimized network. Experiments under various settings show that the proposed method significantly improves prediction robustness and outperforms state-of-the-art methods on different datasets.
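The abstract names two attention cues, channel variance and spatial gradients, combined in a symmetric structure, but it does not give the layer definitions. The following is therefore only a minimal pure-Python sketch under stated assumptions: each channel is reweighted by a softmax over its spatial variance, each spatial position is reweighted by a sigmoid of the Sobel gradient magnitude (a standard 3x3 image-gradient filter) of the channel-mean map, and the two branches are summed. The names `channel_variance_attention`, `spatial_gradient_attention` and `symmetric_block`, the softmax/sigmoid scoring, and the additive fusion are all hypothetical illustration choices, not the authors' design.

```python
import math
from statistics import pvariance

# Hypothetical layout: C channels, each an H x W grid stored as a list of rows.
FeatureMap = list[list[list[float]]]

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_variance_attention(fmap: FeatureMap) -> FeatureMap:
    """Assumed channel cue: weight each channel by a softmax over its spatial variance."""
    variances = [pvariance([v for row in ch for v in row]) for ch in fmap]
    weights = softmax(variances)
    return [[[w * v for v in row] for row in ch] for w, ch in zip(weights, fmap)]


def sobel_magnitude(grid: list[list[float]]) -> list[list[float]]:
    """Sobel gradient magnitude of one H x W grid, zero-padded at the borders."""
    h, w = len(grid), len(grid[0])

    def px(i: int, j: int) -> float:
        return grid[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    out = []
    for i in range(h):
        row = []
        for j in range(w):
            gx = sum(SOBEL_X[a][b] * px(i + a - 1, j + b - 1) for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * px(i + a - 1, j + b - 1) for a in range(3) for b in range(3))
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def spatial_gradient_attention(fmap: FeatureMap) -> FeatureMap:
    """Assumed spatial cue: weight each position by sigmoid(Sobel magnitude of the channel mean)."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mean_map = [[sum(ch[i][j] for ch in fmap) / c for j in range(w)] for i in range(h)]
    grad = sobel_magnitude(mean_map)
    # Magnitudes are non-negative, so every weight lies between 0.5 and 1.
    att = [[1.0 / (1.0 + math.exp(-g)) for g in row] for row in grad]
    return [[[att[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in fmap]


def symmetric_block(fmap: FeatureMap) -> FeatureMap:
    """Hypothetical fusion: run both branches on the same input and add their outputs."""
    a = channel_variance_attention(fmap)
    b = spatial_gradient_attention(fmap)
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)] for ca, cb in zip(a, b)]


fmap = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # flat channel (zero variance)
    [[0.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 0.0]],  # textured channel
]
out = symmetric_block(fmap)
```

Under these assumptions the flat channel receives a lower channel weight than the textured one, and positions with a larger gradient magnitude keep more of their activation in the spatial branch. A practical version would operate on framework tensors with learned projections rather than nested lists; those details are not specified in the abstract.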

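The motivating problem in the abstract, inconsistent predictions on consecutive frames of a video whose frames are all forged, can be quantified in several ways. The sketch below assumes per-frame fake probabilities and a 0.5 decision threshold, and computes three hypothetical measures: the label flip rate between consecutive frames, the mean absolute score change between consecutive frames, and the fraction of frames that agree with the video-level label. These are illustrative metrics, not necessarily the ones used in the paper's experiments.

```python
def label_flip_rate(scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of consecutive frame pairs whose thresholded labels differ."""
    if len(scores) < 2:
        return 0.0
    labels = [s >= threshold for s in scores]
    flips = sum(a != b for a, b in zip(labels, labels[1:]))
    return flips / (len(labels) - 1)


def mean_score_jitter(scores: list[float]) -> float:
    """Mean absolute change in the fake probability between consecutive frames."""
    if len(scores) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(scores, scores[1:])) / (len(scores) - 1)


def frame_agreement(scores: list[float], video_is_fake: bool, threshold: float = 0.5) -> float:
    """Fraction of frames whose predicted label matches the video-level ground truth."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum((s >= threshold) == video_is_fake for s in scores) / len(scores)


# Hypothetical per-frame fake probabilities for a video in which every frame is forged.
scores = [0.9, 0.4, 0.8, 0.3, 0.95]
print(label_flip_rate(scores))        # 1.0
print(frame_agreement(scores, True))  # 0.6
print(mean_score_jitter(scores))
```

A detector whose scores stayed above the threshold on every frame of the same video would reach a flip rate of 0.0 and an agreement of 1.0; the jitter measure additionally penalizes large score swings that never cross the threshold.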
Supplementary Material

MP4 File (MM22-fp280.mp4)
Presentation video for paper 280, Face Forgery Detection via Symmetric Transformer.


Cited By

  • (2024) WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection. International Journal of Computer Vision. DOI: 10.1007/s11263-024-02116-5. Online publication date: 23-May-2024.
  • (2023) Spatio-Temporal Catcher: A Self-Supervised Transformer for Deepfake Video Detection. Proceedings of the 31st ACM International Conference on Multimedia, 8707-8718. DOI: 10.1145/3581783.3613842. Online publication date: 26-Oct-2023.
  • (2023) UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization. Proceedings of the 31st ACM International Conference on Multimedia, 8749-8759. DOI: 10.1145/3581783.3613767. Online publication date: 26-Oct-2023.

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN: 9781450392037
    DOI: 10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deepfake video detection
    2. symmetric transformer

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 155
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 02 Sep 2024

    Citations

    • (2023) Hierarchical Forgery Classifier on Multi-Modality Face Forgery Clues. IEEE Transactions on Multimedia 26, 2894-2905. DOI: 10.1109/TMM.2023.3304913. Online publication date: 14-Aug-2023.
    • (2023) Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario. IEEE Transactions on Circuits and Systems for Video Technology 33:12, 7943-7956. DOI: 10.1109/TCSVT.2023.3281475. Online publication date: 30-May-2023.
    • (2023) Transformer-Auxiliary Neural Networks for Image Manipulation Localization by Operator Inductions. IEEE Transactions on Circuits and Systems for Video Technology 33:9, 4907-4920. DOI: 10.1109/TCSVT.2023.3251444. Online publication date: 1-Sep-2023.
    • (2023) Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 20782-20792. DOI: 10.1109/ICCV51070.2023.01905. Online publication date: 1-Oct-2023.
    • (2023) Structure Invariant Transformation for better Adversarial Transferability. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 4584-4596. DOI: 10.1109/ICCV51070.2023.00425. Online publication date: 1-Oct-2023.
    • (2023) Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3994-4004. DOI: 10.1109/CVPR52729.2023.00389. Online publication date: Jun-2023.
    • (2023) Testing human ability to detect ‘deepfake’ images of human faces. Journal of Cybersecurity 9:1. DOI: 10.1093/cybsec/tyad011. Online publication date: 23-Jun-2023.
