A deep learning-assisted visual attention mechanism for anomaly detection in videos

  • 1232: Human-centric Multimedia Analysis
Multimedia Tools and Applications

Abstract

Ensuring public safety in urban areas is crucial to maintaining a good quality of life. The successful deployment of video surveillance systems depends heavily on acquiring and processing large volumes of urban data to derive meaningful insights. Manual monitoring and analysis of anomalous activities in surveillance footage is both time-consuming and error-prone, and it does not scale to urban environments with high levels of foot and vehicular traffic. Moreover, traditional surveillance systems cannot process real-time data at scale, which can result in missed or delayed detection of potential security threats. This paper tackles the problem by proposing an automatic anomaly detection method based on an attention mechanism. The attention area is identified using a background subtraction (BG) algorithm, which detects motion regions in the video frames. This information is then passed through a 3D convolutional neural network (3D CNN) that classifies events as normal or anomalous. The proposed method was evaluated through experiments and analysis on the publicly available UCF-Crime dataset, where it achieved an accuracy of 96.89%, comparing favorably with state-of-the-art methods. When an anomaly is detected, an alert is sent to the nearest authorities so that they can take immediate action to prevent further harm or damage.
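The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which background subtraction algorithm is used, so a simple running-average background model stands in for it here, and the function names (motion_mask, attention_box, attention_clip) and the parameters alpha and threshold are hypothetical. The 3D CNN itself is omitted; the sketch stops at the cropped clip tensor that such a network would take as input.

```python
import numpy as np


def motion_mask(frames, alpha=0.05, threshold=25.0):
    """Running-average background subtraction (a simplified stand-in).

    frames: grayscale clip of shape (T, H, W).
    Returns a boolean array of the same shape that is True where a pixel
    differs from the background model by more than `threshold`.
    The first frame is assumed to show only background.
    """
    frames = np.asarray(frames, dtype=np.float32)
    background = frames[0].copy()
    masks = np.zeros(frames.shape, dtype=bool)
    for t in range(1, len(frames)):
        masks[t] = np.abs(frames[t] - background) > threshold
        # Blend the new frame in slowly so moving objects are not absorbed at once.
        background = (1.0 - alpha) * background + alpha * frames[t]
    return masks


def attention_box(masks):
    """Box (top, bottom, left, right) covering all motion in the clip, or None."""
    rows = np.flatnonzero(masks.any(axis=(0, 2)))
    cols = np.flatnonzero(masks.any(axis=(0, 1)))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def attention_clip(frames, box):
    """Crop the clip to the attention box and add a channel axis: (1, T, h, w)."""
    top, bottom, left, right = box
    return np.asarray(frames, dtype=np.float32)[None, :, top:bottom, left:right]


# Synthetic clip: an empty first frame, then a bright 4x4 square moving right.
video = np.zeros((8, 32, 32), dtype=np.float32)
for t in range(1, 8):
    video[t, 10:14, 5 + t:9 + t] = 255.0

box = attention_box(motion_mask(video))
clip = attention_clip(video, box)
print(box, clip.shape)
```

Cropping to the motion region means a downstream 3D CNN would only see the part of the scene where something changes, which is the intuition behind using motion as an attention cue. A production system would more likely use an adaptive per-pixel model than this running average.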

Data availability

The datasets used in this research are publicly available online at https://www.crcv.ucf.edu/projects/real-world/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62172366) and the "Pioneer" and "Leading Goose" R&D Program of Zhejiang Province (2023C01150). This research was also supported by Cluster grant R20143 of Zayed University, UAE.

Author information

Corresponding authors

Correspondence to Bailin Yang or Farman Ali.

Ethics declarations

Conflicts of interest

The authors of this manuscript declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Muhammad Shoaib and Farman Ali contributed equally to this work and are co-first authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shoaib, M., Shah, B., Hussain, T. et al. A deep learning-assisted visual attention mechanism for anomaly detection in videos. Multimed Tools Appl 83, 73363–73390 (2024). https://doi.org/10.1007/s11042-023-17770-z

  • DOI: https://doi.org/10.1007/s11042-023-17770-z

Keywords