DOI:10.1145/3474085.3475693
research-article

Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection

Published: 17 October 2021 Publication History

Abstract

Detecting abnormal activities in real-world surveillance videos is an important yet challenging task, as prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to address this problem, few of them capture normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at the frame level and the global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components: a convolutional encoder to capture the spatial information of the input video clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame. Next, a dual discriminator based adversarial training procedure is employed to enhance the future frame prediction. It jointly considers an image discriminator that maintains the local consistency at the frame level and a video discriminator that enforces the global coherence of temporal dynamics. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, i.e., UCSD Ped2, CUHK Avenue, and ShanghaiTech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
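
Two steps named in the abstract, temporal self-attention over per-frame features and scoring frames by prediction error, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each frame has already been reduced to a flat feature vector, uses the features directly as queries, keys, and values (no learned projections, no convolutions, a single head), and assumes mean squared error with min-max normalization as the anomaly score. The abstract does not specify any of these details, and all function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Scaled dot-product self-attention across time.

    frames: list of T feature vectors (lists of D floats), one per frame.
    Assumption: features serve as queries, keys, and values unchanged.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        logits = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(logits)
        # Weighted sum of all frame features.
        attended.append(
            [sum(w * f[j] for w, f in zip(weights, frames)) for j in range(dim)]
        )
    return attended


def frame_errors(predicted, actual):
    """Mean squared prediction error per frame (frames are flat pixel lists)."""
    return [
        sum((p - a) ** 2 for p, a in zip(pred, real)) / len(real)
        for pred, real in zip(predicted, actual)
    ]


def anomaly_scores(errors):
    """Min-max normalize errors to [0, 1]; higher means more anomalous.

    Assumption: normalization is done per video; this is an illustrative choice.
    """
    low, high = min(errors), max(errors)
    if high == low:
        return [0.0] * len(errors)
    return [(e - low) / (high - low) for e in errors]


# Toy usage: three predicted frames compared with three observed frames.
predicted = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
observed = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
scores = anomaly_scores(frame_errors(predicted, observed))  # -> [0.0, 0.25, 1.0]
```

In this toy setup, the frame whose prediction deviates most from the observation receives the highest score, which mirrors the idea that poorly predicted frames are flagged as abnormal.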

Supplementary Material

MP4 File (ct-d2gan-presentation.mp4)
In this presentation video, we present our work on the convolutional transformer based dual discriminator GAN framework (CT-D2GAN) for video anomaly detection. The presentation starts with the context and related work, and then goes over the technical details of the proposed framework, specifically the convolutional transformer, the temporal self-attention module, and the dual discriminator GAN. Next, we present qualitative and quantitative experimental results, and conclude with a summary of the contributions and innovations. The proposed framework is shown to be an effective and efficient spatio-temporal modeling framework, with potential for further improvement and more general use cases.


Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. generative adversarial networks
  3. spatio-temporal modeling
  4. transformer model
  5. video anomaly detection

Qualifiers

  • Research-article

Conference

MM '21
Sponsor:
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 105
  • Downloads (Last 6 weeks): 12
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) MTFL: multi-timescale feature learning for weakly-supervised anomaly detection in surveillance videos. Seventeenth International Conference on Machine Vision (ICMV 2024). DOI:10.1117/12.3055069 (14). Online publication date: 25-Feb-2025
  • (2025) Scene-Dependent Prediction in Latent Space for Video Anomaly Detection and Anticipation. IEEE Transactions on Pattern Analysis and Machine Intelligence 47:1 (224-239). DOI:10.1109/TPAMI.2024.3461718. Online publication date: Jan-2025
  • (2025) SSIM over MSE: A new perspective for video anomaly detection. Neural Networks 185 (107115). DOI:10.1016/j.neunet.2024.107115. Online publication date: May-2025
  • (2025) Video anomaly behavior detection method based on attention-enhanced graph convolution and normalizing flows. Signal, Image and Video Processing 19:5. DOI:10.1007/s11760-025-03903-4. Online publication date: 5-Mar-2025
  • (2025) Deep crowd anomaly detection: state-of-the-art, challenges, and future research directions. Artificial Intelligence Review 58:5. DOI:10.1007/s10462-024-11092-8. Online publication date: 20-Feb-2025
  • (2024) Enhancing Anomaly Detection for Cultural Heritage via Long Short-Term Memory with Attention Mechanism. Electronics 13:7 (1254). DOI:10.3390/electronics13071254. Online publication date: 28-Mar-2024
  • (2024) Spatiotemporal Masked Autoencoder with Multi-Memory and Skip Connections for Video Anomaly Detection. Electronics 13:2 (353). DOI:10.3390/electronics13020353. Online publication date: 14-Jan-2024
  • (2024) Abnormal activities identification using Deep Q Network from IoT Surveillance Systems. Proceedings of the 2024 Sixteenth International Conference on Contemporary Computing (398-406). DOI:10.1145/3675888.3676077. Online publication date: 8-Aug-2024
  • (2024) Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts. Proceedings of the 32nd ACM International Conference on Multimedia (9301-9310). DOI:10.1145/3664647.3681442. Online publication date: 28-Oct-2024
  • (2024) Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models. ACM Computing Surveys 56:7 (1-38). DOI:10.1145/3645101. Online publication date: 9-Apr-2024
