A deep learning-assisted visual attention mechanism for anomaly detection in videos

Shoaib, Muhammad; Shah, Babar; Hussain, Tariq; Yang, Bailin; Ullah, Asad; Khan, Jahangir; Ali, Farman

doi:10.1007/s11042-023-17770-z

1232: Human-centric Multimedia Analysis
Published: 05 December 2023

Volume 83, pages 73363–73390, (2024)

Multimedia Tools and Applications

Muhammad Shoaib^1,2,
Babar Shah^3,
Tariq Hussain^4,5,
Bailin Yang^4,5,
Asad Ullah^2,
Jahangir Khan^2 &
…
Farman Ali^6

Abstract

Ensuring public safety in urban areas is crucial to maintaining a good quality of life. The successful deployment of video surveillance systems depends heavily on acquiring and processing large volumes of urban data to derive meaningful insights. Manually monitoring and analyzing anomalous activities in surveillance footage is time-consuming and error-prone, and it does not scale to urban environments with heavy foot and vehicular traffic. Moreover, traditional surveillance systems cannot process real-time data at scale, which can result in missed or delayed detection of potential security threats. This paper addresses the problem by proposing an automatic anomaly detection method based on an attention mechanism. The attention area is identified with a background subtraction (BG) algorithm, which locates motion regions in the video frames. This information is then passed to a 3D convolutional neural network (3D CNN), which classifies events as normal or anomalous. The method was evaluated on the publicly available UCF-Crime dataset, where it achieved an accuracy of 96.89%, demonstrating its effectiveness relative to state-of-the-art methods. When an anomaly is detected, an alert is sent to the nearest authorities so that they can take immediate action to prevent further harm or damage.
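
The first stage described in the abstract, using background subtraction to find the motion region that serves as the attention area, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple running-average background model as a stand-in for the paper's background subtraction algorithm, grayscale frames stored as plain Python lists so that the sketch has no dependencies, and hypothetical names and values (`attention_region`, `crop_clip`, `alpha`, `threshold`). The 3D CNN that would consume the cropped clip is omitted.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]          # grayscale frame: rows of pixel intensities
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive


def update_background(background: Frame, frame: Frame, alpha: float) -> Frame:
    """Blend the new frame into the running-average background model."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def motion_mask(background: Frame, frame: Frame, threshold: float) -> List[List[bool]]:
    """Mark pixels that differ from the background by more than `threshold`."""
    return [
        [abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Box]:
    """Return the tightest box around all moving pixels, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, moving in enumerate(row) if moving]
    return rows[0], min(cols), rows[-1], max(cols)


def attention_region(frames: List[Frame], alpha: float = 0.05,
                     threshold: float = 25.0) -> Optional[Box]:
    """Union of the per-frame motion boxes over a clip (assumed attention area)."""
    background = frames[0]  # assumption: the first frame initialises the model
    box: Optional[Box] = None
    for frame in frames[1:]:
        found = bounding_box(motion_mask(background, frame, threshold))
        if found is not None:
            box = found if box is None else (
                min(box[0], found[0]), min(box[1], found[1]),
                max(box[2], found[2]), max(box[3], found[3]),
            )
        background = update_background(background, frame, alpha)
    return box


def crop_clip(frames: List[Frame], box: Box) -> List[Frame]:
    """Crop every frame to the attention box, giving a (time, height, width) clip."""
    top, left, bottom, right = box
    return [[row[left:right + 1] for row in frame[top:bottom + 1]] for frame in frames]


def frame_with_object(col: int) -> Frame:
    """Synthetic 6x8 frame: dark scene with a bright 2x2 object at rows 2-3."""
    frame = [[10.0] * 8 for _ in range(6)]
    for r in (2, 3):
        for c in (col, col + 1):
            frame[r][c] = 200.0
    return frame


# An empty scene followed by an object moving one pixel to the right per frame.
clip = [[[10.0] * 8 for _ in range(6)]] + [frame_with_object(c) for c in (1, 2, 3)]
box = attention_region(clip)
print(box)  # (2, 1, 3, 4)
if box is not None:
    cropped = crop_clip(clip, box)
    print(len(cropped), len(cropped[0]), len(cropped[0][0]))  # 4 2 4
```

Cropping to a single box per clip keeps the spatial size constant across frames, which is what a 3D convolution over a stacked clip would need. A real pipeline would more likely use array libraries and a stronger background model than this running average.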

Data availability

The datasets used in this research are publicly available online at the following link: https://www.crcv.ucf.edu/projects/real-world/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62172366) and the "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01150). This research was also supported by Cluster grant R20143 of Zayed University, UAE.

Author information

Authors and Affiliations

Department of Computer Science, CECOS University of IT and Emerging Sciences, Peshawar, 25000, Pakistan
Muhammad Shoaib
Department of Computer Science and Information Technology, Sarhad University of Science & Information Technology, Peshawar, 25000, Pakistan
Muhammad Shoaib, Asad Ullah & Jahangir Khan
College of Technological Innovation, Zayed University, 19282, Dubai, United Arab Emirates
Babar Shah
School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, 310018, China
Tariq Hussain & Bailin Yang
School of Mathematics and Statistics, Zhejiang Gongshang University, Hangzhou, 310018, China
Tariq Hussain & Bailin Yang
Department of Computer Science and Engineering, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul, 03063, South Korea
Farman Ali

Corresponding authors

Correspondence to Bailin Yang or Farman Ali.

Ethics declarations

Conflicts of interest

The authors of this manuscript declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Muhammad Shoaib and Farman Ali contributed equally to this work and are co-first authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shoaib, M., Shah, B., Hussain, T. et al. A deep learning-assisted visual attention mechanism for anomaly detection in videos. Multimed Tools Appl 83, 73363–73390 (2024). https://doi.org/10.1007/s11042-023-17770-z

Received: 11 April 2023
Revised: 25 October 2023
Accepted: 28 November 2023
Published: 05 December 2023
Issue Date: September 2024
DOI: https://doi.org/10.1007/s11042-023-17770-z
