Abstract
Ensuring public safety in urban areas is crucial to maintaining a good quality of life. The successful deployment of video surveillance systems depends heavily on acquiring and processing large volumes of urban data to derive meaningful insights. Manual monitoring and analysis of anomalous activities in surveillance footage is both time-consuming and error-prone, and it does not scale to urban environments with heavy foot and vehicular traffic. Moreover, traditional surveillance systems cannot process real-time data at scale, which can result in missed or delayed detection of potential security threats. This paper addresses the problem by proposing an automatic anomaly detection method based on an attention mechanism. The attention area is identified using a background subtraction (BG) algorithm, which locates motion regions in the video frames. These regions are then passed through a 3D convolutional neural network (3D CNN) that classifies events as normal or anomalous. The proposed method was evaluated on the publicly available UCF-Crime dataset, where it achieved an accuracy of 96.89%, demonstrating its effectiveness relative to state-of-the-art methods. When an anomaly is detected, an alert is sent to the nearest authorities so that they can take immediate action to prevent further harm or damage.
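The attention step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which background subtraction variant is used, so a simple running-average background model is swapped in here purely for illustration; it is not the authors' implementation. All function names, the `alpha` learning rate, and the `threshold` value are hypothetical, and the 3D CNN classifier itself is omitted: the sketch only produces the cropped clip that such a network might receive.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]          # grayscale frame: rows of pixel intensities
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive


def update_background(background: Frame, frame: Frame, alpha: float) -> Frame:
    """Blend the new frame into a running-average background (assumed model)."""
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
            for bg_row, fr_row in zip(background, frame)]


def motion_mask(background: Frame, frame: Frame, threshold: float) -> List[List[bool]]:
    """Mark pixels that differ from the background by more than the threshold."""
    return [[abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
            for bg_row, fr_row in zip(background, frame)]


def attention_box(mask: List[List[bool]]) -> Optional[Box]:
    """Bounding box around all motion pixels, or None if nothing moved."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, moved in enumerate(row) if moved]
    if not rows:
        return None
    return rows[0], min(cols), rows[-1], max(cols)


def attention_clip(frames: List[Frame], alpha: float = 0.05,
                   threshold: float = 25.0) -> List[Frame]:
    """Crop every frame to the union of the motion regions seen in the clip."""
    background = [row[:] for row in frames[0]]  # assumption: first frame is background
    boxes: List[Box] = []
    for frame in frames[1:]:
        box = attention_box(motion_mask(background, frame, threshold))
        if box is not None:
            boxes.append(box)
        background = update_background(background, frame, alpha)
    if not boxes:
        return []  # no motion: nothing to hand to the classifier
    top = min(b[0] for b in boxes)
    left = min(b[1] for b in boxes)
    bottom = max(b[2] for b in boxes)
    right = max(b[3] for b in boxes)
    return [[row[left:right + 1] for row in f[top:bottom + 1]] for f in frames]


def synthetic_frame(size: int, blob_at: Optional[Tuple[int, int]]) -> Frame:
    """Flat background of intensity 10 with an optional 2x2 bright blob."""
    frame = [[10.0] * size for _ in range(size)]
    if blob_at is not None:
        r, c = blob_at
        for dr in (0, 1):
            for dc in (0, 1):
                frame[r + dr][c + dc] = 200.0
    return frame


# A blob moving diagonally across a 6x6 scene.
frames = [synthetic_frame(6, None), synthetic_frame(6, (1, 1)),
          synthetic_frame(6, (2, 2)), synthetic_frame(6, (3, 3))]
clip = attention_clip(frames)
print(len(clip), len(clip[0]), len(clip[0][0]))  # frames, height, width of the crop
```

In a full pipeline the cropped clip would be resized to the classifier's fixed input size and stacked along the time axis before being passed to the 3D CNN; those steps depend on details the abstract does not give.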
Data availability
The dataset used in this research is publicly available online at https://www.crcv.ucf.edu/projects/real-world/.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62172366) and the "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01150). This research was also supported by Cluster Grant R20143 of Zayed University, UAE.
Ethics declarations
Conflicts of interest
The authors of this manuscript declare no conflicts of interest.
Additional information
Muhammad Shoaib and Farman Ali contributed equally to this work and are co-first authors.
About this article
Cite this article
Shoaib, M., Shah, B., Hussain, T. et al. A deep learning-assisted visual attention mechanism for anomaly detection in videos. Multimed Tools Appl 83, 73363–73390 (2024). https://doi.org/10.1007/s11042-023-17770-z
DOI: https://doi.org/10.1007/s11042-023-17770-z