research-article

Detecting Deepfake Videos using Spatiotemporal Trident Network

Authors:

Yangyang Mei

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 11

Article No.: 340, Pages 1 - 20

https://doi.org/10.1145/3623639

Published: 12 September 2024

Abstract

The widespread dissemination of Deepfakes on social networks poses serious security risks and calls for effective Deepfake detection techniques. Video-based detectors have so far been explored less extensively than image-based detectors. Most existing video-based methods consider only temporal features without combining them with spatial features, and they do not mine deeper-level subtle forgeries, which limits detection performance. In this paper, we propose a novel spatiotemporal trident network (STN) that detects both spatial and temporal inconsistencies in Deepfake videos. Because Deepfake video frames contain a large amount of redundant information, we add a convolutional block attention module (CBAM) to the I3D network and optimize the structure so that the network focuses on the meaningful information in the input video. To address the defects in deeper-level subtle forgeries, we design three feature extraction modules (FEMs), for RGB, optical flow, and noise, that extract deeper information from video frames. Extensive experiments on several well-known datasets show that our method performs well, surpassing several state-of-the-art Deepfake video detection methods.
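
As an illustration of the attention step mentioned in the abstract, the following is a minimal pure-Python sketch of the channel-attention half of CBAM in its published form: average- and max-pooled channel descriptors pass through a shared two-layer MLP, are summed, and are squashed by a sigmoid. It is not the authors' implementation. The abstract does not say where the module sits in I3D or how the structure is optimized, so the feature layout (C channels, each flattened over T×H×W positions), the reduction ratio, and the random weights are all assumptions, and the spatial-attention half is omitted.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def shared_mlp(vec, w1, w2):
    """Two-layer bottleneck MLP with ReLU, shared by both pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(features, w1, w2):
    """Re-weight channels of a feature map.

    features: list of C channels, each a flat list of T*H*W activations
              (an assumed layout for a 3D-convolutional feature map).
    Returns the re-weighted features and the per-channel weights.
    """
    avg_desc = [sum(ch) / len(ch) for ch in features]  # global average pooling
    max_desc = [max(ch) for ch in features]            # global max pooling
    logits = [a + m for a, m in zip(shared_mlp(avg_desc, w1, w2),
                                    shared_mlp(max_desc, w1, w2))]
    weights = [sigmoid(z) for z in logits]
    refined = [[w * x for x in ch] for w, ch in zip(weights, features)]
    return refined, weights


# Illustrative sizes only: channels, T*H*W positions, reduction ratio.
random.seed(0)
C, N, R = 8, 16 * 4 * 4, 2
features = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(C)]
w1 = [[random.gauss(0.0, 0.5) for _ in range(C)] for _ in range(C // R)]
w2 = [[random.gauss(0.0, 0.5) for _ in range(C // R)] for _ in range(C)]

refined, weights = channel_attention(features, w1, w2)
print([round(w, 3) for w in weights])
```

Each weight lies strictly between 0 and 1, so the module can only attenuate a channel, never amplify it. In a trained network the MLP weights would be learned so that channels carrying redundant information receive small weights.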

Cited By

Vyas R, Nappi M, del Bimbo A, Bakshi S (2024). Introduction to Special Issue on “Recent trends in Multimedia Forensics”. ACM Transactions on Multimedia Computing, Communications, and Applications. Online publication date: 2-Aug-2024.
https://dl.acm.org/doi/10.1145/3678473
Ahmed H, Traore I, Saad S, Mamun M (2024). Effect of Text Augmentation and Adversarial Training on Fake News Detection. IEEE Transactions on Computational Social Systems 11:4 (4775-4789). Online publication date: Aug-2024.
https://doi.org/10.1109/TCSS.2023.3344597

Index Terms

Detecting Deepfake Videos using Spatiotemporal Trident Network
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
2. Security and privacy
  1. Software and application security
    1. Social network security and privacy

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 11

November 2024

333 pages

EISSN:1551-6865

DOI:10.1145/3613730

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 September 2024

Online AM: 13 September 2023

Accepted: 04 September 2023

Revised: 30 July 2023

Received: 21 February 2023

Published in TOMM Volume 20, Issue 11

Funding Sources

National Key Research and Development Plan
National Natural Science Foundation of China
Major Key Project of PCL
DongGuan Innovative Research Team Program
Guangzhou Key Research and Development Plan
Guangdong Higher Education Innovation Group
Guangzhou Higher Education Innovation Group
Key Laboratory of the Education Department of Guangdong Province
Scientific and Technological Planning Projects of Guangdong Province
Key Construction Discipline Scientific Research Capacity Improvement Project of Guangdong Province
Postgraduate Education Innovation Plan Project of Guangdong Province
Guangzhou University Graduate Innovation Research Funding Program
