Abstract
Detection of violence and weaponized violence in closed-circuit television (CCTV) footage requires a comprehensive approach. In this work, we introduce the Smart-City CCTV Violence Detection (SCVD) dataset, specifically designed to facilitate the learning of weapon distribution in surveillance videos. To tackle the complexities of analyzing 3D surveillance video for violence recognition tasks, we propose a novel technique called SSIVD-Net (Salient-Super-Image for Violence Detection). Our method reduces the complexity, dimensionality, and information loss of 3D video data while improving inference, performance, and explainability through Salient-Super-Image representations. Considering the scalability and sustainability requirements of future smart cities, we also introduce the Salient-Classifier, a novel architecture combining a kernelized approach with a residual learning strategy. We evaluate variations of SSIVD-Net and the Salient-Classifier on our SCVD dataset and benchmark them against state-of-the-art (SOTA) models commonly employed in violence detection. Our approach exhibits significant improvements in detecting both weaponized and non-weaponized violence instances. By advancing the SOTA in violence detection, our work offers a practical and scalable solution suitable for real-world applications. The proposed methodology not only addresses the challenges of violence detection in CCTV footage but also contributes to the understanding of weapon distribution in smart surveillance. Ultimately, our research findings should enable smarter and more secure cities, as well as enhance public safety measures.
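The abstract does not spell out how a Salient-Super-Image is built, so the following is only a minimal sketch of the general super-image idea it relies on: frames sampled from a clip are tiled into one 2D grid, so that an ordinary image classifier can process the video. The saliency step of SSIVD-Net is not shown, and the function name `make_super_image`, the `num_frames` parameter, the uniform sampling, and the square grid layout are all assumptions made for illustration.

```python
import math

import numpy as np


def make_super_image(video: np.ndarray, num_frames: int = 9) -> np.ndarray:
    """Tile uniformly sampled frames of a (T, H, W, C) clip into one 2D grid image.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    t, h, w, c = video.shape
    # Uniformly spaced frame indices across the clip (assumed sampling scheme).
    idx = np.linspace(0, t - 1, num_frames).round().astype(int)
    frames = video[idx]
    # Smallest square grid that holds all sampled frames; unused cells stay black.
    side = math.ceil(math.sqrt(num_frames))
    grid = np.zeros((side * h, side * w, c), dtype=video.dtype)
    for k, frame in enumerate(frames):
        row, col = divmod(k, side)
        grid[row * h:(row + 1) * h, col * w:(col + 1) * w] = frame
    return grid


# Synthetic 30-frame clip of 32x48 RGB frames, standing in for CCTV footage.
video = np.random.randint(0, 256, size=(30, 32, 48, 3), dtype=np.uint8)
super_image = make_super_image(video, num_frames=9)
print(super_image.shape)  # (96, 144, 3)
```

With nine frames the result is a 3 × 3 grid, so the temporal axis is folded into the spatial layout and the input to the classifier becomes a single 2D image rather than a 3D volume.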
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Aremu, T., Zhiyuan, L., Alameeri, R., Khan, M., Saddik, A.E. (2024). SSIVD-Net: A Novel Salient Super Image Classification and Detection Technique for Weaponized Violence. In: Arai, K. (eds) Intelligent Computing. SAI 2024. Lecture Notes in Networks and Systems, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-62269-4_2
DOI: https://doi.org/10.1007/978-3-031-62269-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62268-7
Online ISBN: 978-3-031-62269-4
eBook Packages: Intelligent Technologies and Robotics (R0)