research-article

Detecting Deepfake Videos using Spatiotemporal Trident Network

Published: 12 September 2024

Abstract

The widespread dissemination of Deepfakes on social networks poses serious security risks and calls for effective Deepfake detection techniques. Video-based detectors have so far been explored less extensively than image-based detectors. Most existing video-based methods consider only temporal features without combining them with spatial features, and they do not mine deeper-level, subtle forgery traces, which limits detection performance. In this paper, a novel spatiotemporal trident network (STN) is proposed to detect both spatial and temporal inconsistencies in Deepfake videos. Because Deepfake video frames contain a large amount of redundant information, we add a convolutional block attention module (CBAM) to the I3D network and optimize its structure so that the network focuses on the meaningful information in the input video. To capture deeper-level, subtle forgery traces, we design three feature extraction modules (FEMs), for RGB, optical flow, and noise, that extract deeper information from the video frames. Extensive experiments on several well-known datasets demonstrate that our method achieves promising performance, surpassing several state-of-the-art Deepfake video detection methods.
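As an illustration of the attention idea mentioned in the abstract, the following is a minimal NumPy sketch of CBAM-style channel attention applied to a video feature map. It follows the general formulation of the CBAM paper (a shared two-layer MLP over average-pooled and max-pooled descriptors) and is not the authors' optimized structure. The function name `channel_attention`, the `(C, T, H, W)` layout, the reduction ratio, and the random weights are all assumptions made for this example; CBAM's spatial attention branch is omitted.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM-style channel attention for one video feature map.

    feat: array of shape (C, T, H, W), e.g. an intermediate 3D-CNN activation.
    w1:   array of shape (C // r, C), first layer of the shared MLP.
    w2:   array of shape (C, C // r), second layer of the shared MLP.
    Returns the reweighted feature map and the per-channel weights.
    """
    # Squeeze time and space into two per-channel descriptors.
    avg_desc = feat.mean(axis=(1, 2, 3))  # shape (C,)
    max_desc = feat.max(axis=(1, 2, 3))   # shape (C,)

    def shared_mlp(v):
        # Two-layer MLP with ReLU, shared by both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    # Sum the two MLP outputs and squash them to the range [0, 1].
    scale = sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))

    # Broadcast the per-channel weights over T, H, and W.
    return feat * scale[:, None, None, None], scale


# Hypothetical usage with random placeholder data and weights.
rng = np.random.default_rng(0)
channels, reduction = 8, 4
clip_features = rng.standard_normal((channels, 4, 6, 6))
w1 = 0.1 * rng.standard_normal((channels // reduction, channels))
w2 = 0.1 * rng.standard_normal((channels, channels // reduction))
refined, weights = channel_attention(clip_features, w1, w2)
print(refined.shape, weights.round(3))
```

In a trained network the MLP weights would be learned, so that channels carrying redundant information receive weights closer to 0 and informative channels receive weights closer to 1.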

Cited By

  • (2024) Introduction to Special Issue on “Recent trends in Multimedia Forensics”. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3678473. Online publication date: 2-Aug-2024.
  • (2024) Effect of Text Augmentation and Adversarial Training on Fake News Detection. IEEE Transactions on Computational Social Systems 11:4 (4775-4789). DOI: 10.1109/TCSS.2023.3344597. Online publication date: Aug-2024.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 11
November 2024
333 pages
EISSN: 1551-6865
DOI: 10.1145/3613730

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 September 2024
Online AM: 13 September 2023
Accepted: 04 September 2023
Revised: 30 July 2023
Received: 21 February 2023
Published in TOMM Volume 20, Issue 11

Author Tags

  1. Deepfake
  2. spatiotemporal
  3. trident networks
  4. feature extraction module

Qualifiers

  • Research-article

Funding Sources

  • National Key Research and Development Plan
  • National Natural Science Foundation of China
  • Major Key Project of PCL
  • DongGuan Innovative Research Team Program
  • Guangzhou Key Research and Development Plan
  • Guangdong Higher Education Innovation Group
  • Guangzhou Higher Education Innovation Group
  • Key Laboratory of the Education Department of Guangdong Province
  • Scientific and Technological Planning Projects of Guangdong Province
  • Key Construction Discipline Scientific Research Capacity Improvement Project of Guangdong Province
  • Postgraduate Education Innovation Plan Project of Guangdong Province
  • Guangzhou University Graduate Innovation Research Funding Program

Article Metrics

  • Downloads (Last 12 months): 537
  • Downloads (Last 6 weeks): 74

Reflects downloads up to 15 Oct 2024.
