Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection

Authors:

Haifeng Chen

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 5546 - 5554

https://doi.org/10.1145/3474085.3475693

Published: 17 October 2021

Abstract

Detecting abnormal activities in real-world surveillance videos is an important yet challenging task because prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to address this problem, few of them can capture the normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at the frame level and the global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components: a convolutional encoder to capture the spatial information of the input video clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame. Next, a dual discriminator based adversarial training procedure is employed to enhance the future frame prediction. It jointly considers an image discriminator that maintains the local consistency at the frame level and a video discriminator that enforces the global coherence of temporal dynamics. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, i.e., UCSD Ped2, CUHK Avenue, and Shanghai Tech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
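
To make two steps of the pipeline described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It illustrates (a) self-attention across the time axis of per-frame features and (b) turning per-frame prediction error into an anomaly score. Everything specific here is an assumption made for illustration: the function names are hypothetical, the attention uses the raw features as queries, keys, and values (a trained model would apply learned projections and operate on convolutional feature maps), and the score is a per-video min-max normalised mean squared error, since the abstract does not state the exact scoring formula.

```python
import math


def temporal_self_attention(features):
    """Scaled dot-product self-attention across time steps.

    `features` is a list of T per-frame feature vectors (lists of floats).
    Assumption: queries, keys, and values are the raw features themselves.
    """
    dim = len(features[0])
    attended = []
    for query in features:
        # Similarity of this frame to every frame in the clip.
        logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        # Numerically stable softmax over the time axis.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted sum of all frames' features.
        attended.append([sum(w * frame[d] for w, frame in zip(weights, features))
                         for d in range(dim)])
    return attended


def mean_squared_error(predicted, actual):
    """Mean squared error between two frames given as flat lists of pixels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def anomaly_scores(predicted_frames, actual_frames):
    """Min-max normalise per-frame prediction errors to [0, 1] within one video."""
    errors = [mean_squared_error(p, a)
              for p, a in zip(predicted_frames, actual_frames)]
    low, high = min(errors), max(errors)
    if high == low:
        # All frames are predicted equally well; nothing stands out.
        return [0.0 for _ in errors]
    return [(e - low) / (high - low) for e in errors]


# Toy usage: the third frame is predicted poorly, so it gets the top score.
actual = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
predicted = [[0.0, 0.0], [0.1, 0.1], [0.0, 0.0]]
scores = anomaly_scores(predicted, actual)
flagged = [i for i, s in enumerate(scores) if s > 0.5]  # threshold is arbitrary
```

In this toy example only the frame at index 2 exceeds the threshold. In practice the threshold would be chosen on held-out data, and frame-level detection is commonly evaluated with a threshold-free metric such as AUC.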

Supplementary Material

MP4 File (ct-d2gan-presentation.mp4)

In this presentation video, we present our work on the convolutional transformer based dual discriminator GAN framework (CT-D2GAN) for video anomaly detection. The presentation starts with the context and related work, and then goes over the technical details of the proposed framework, specifically the convolutional transformer, the temporal self-attention module, and the dual discriminator GAN. Next, we present qualitative and quantitative experimental results, and conclude with a summary of the contributions and innovations. The proposed framework is demonstrated to be an effective and efficient spatio-temporal modeling framework, with potential for further improvement and more general use cases.
