research-article
Free access
Just Accepted

Domain-invariant and Patch-discriminative Feature Learning for General Deepfake Detection

Online AM: 27 April 2024
  • Abstract

    Hyper-realistic avatars in the metaverse have already raised security concerns about deepfake techniques: a deepfake that presents generated video as a “recording” may be mistaken for a real recording of the people it depicts. As a result, deepfake detection has drawn considerable attention in the multimedia forensics community. Although existing deepfake detection methods perform fairly well in the intra-dataset scenario, many of them yield unsatisfactory results in cross-dataset testing, which has more practical value and in which the forged faces in the training and testing datasets come from different domains. To tackle this issue, we propose a novel Domain-Invariant and Patch-Discriminative feature learning framework, DI&PD. For image-level feature learning, single-side adversarial domain generalization is introduced to eliminate domain variance and learn domain-invariant features from training samples produced by different manipulation methods, along with a global and local random crop augmentation strategy that generates more data views of forged images at various scales. A graph structure is then built by splitting the learned image-level feature maps, with each spatial location corresponding to a local patch, which facilitates patch representation learning through message passing among similar nodes. Two types of center loss are used to learn more discriminative features in both the image-level and patch-level embedding spaces. Extensive experimental results on several datasets demonstrate the effectiveness and generalization ability of the proposed method compared with other state-of-the-art methods.
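
    The patch-graph and center-loss ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the k-nearest-neighbour rule based on cosine similarity, the mean aggregation step, the fixed class centers, and all function names are assumptions chosen for illustration, and the adversarial domain generalization part is left out.

```python
import math

def feature_map_to_nodes(fmap):
    """Flatten an H x W grid of C-dim vectors into H*W patch nodes."""
    return [list(vec) for row in fmap for vec in row]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def knn_neighbors(nodes, k):
    """Assumed graph rule: link each node to its k most similar other nodes."""
    neighbors = []
    for i, u in enumerate(nodes):
        scored = [(cosine(u, v), j) for j, v in enumerate(nodes) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first
        neighbors.append([j for _, j in scored[:k]])
    return neighbors

def message_pass(nodes, neighbors):
    """One round of mean aggregation over each node and its neighbours."""
    updated = []
    for i, nbrs in enumerate(neighbors):
        group = [nodes[i]] + [nodes[j] for j in nbrs]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated

def center_loss(features, labels, centers):
    """Mean squared Euclidean distance from each feature to its class center."""
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((a - c) ** 2 for a, c in zip(f, centers[y]))
    return total / len(features)

# Toy 2 x 2 feature map with 2 channels; each spatial location is one patch.
fmap = [[[1.0, 0.0], [0.9, 0.1]],
        [[0.0, 1.0], [0.1, 0.9]]]
nodes = feature_map_to_nodes(fmap)
neighbors = knn_neighbors(nodes, k=1)
updated = message_pass(nodes, neighbors)

# Hypothetical patch labels and fixed class centers (in training, centers
# would normally be learned or updated alongside the features).
labels = [0, 0, 1, 1]
centers = {0: [0.95, 0.05], 1: [0.05, 0.95]}
loss_before = center_loss(nodes, labels, centers)
loss_after = center_loss(updated, labels, centers)
```

    In this toy setting, averaging each patch with its most similar neighbour pulls the two patches of each class onto a shared point, so the center loss drops after one round of message passing. Real feature maps are far larger and the aggregation would be a learned layer rather than a plain mean.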

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Just Accepted
    ISSN:1551-6857
    EISSN:1551-6865
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 27 April 2024
    Accepted: 05 April 2024
    Revised: 21 March 2024
    Received: 29 December 2023

    Author Tags

    1. Deepfake detection
    2. Adversarial domain generalization
    3. Patch-based graph learning
    4. Center loss

    Qualifiers

    • Research-article
