DOI: 10.1145/3582649.3582653
research-article

Dual-Channel Improved ShuffleNet (DCISN) for Real-time Violence Detection

Published: 07 April 2023

Abstract

In this paper, we propose a lightweight deep learning network architecture, named dual-channel improved ShuffleNet (DCISN), for real-time violence detection in videos. The proposed network extracts space-time features using two parallel channels, in the manner of SlowFast networks, and adopts newly designed ShuffleNet units to construct lightweight stage modules. Cross-stage connections are introduced to boost the accuracy of the DCISN network. Cascaded depth-wise convolution layers and a Squeeze-and-Excitation (SE) block are employed in the newly designed ShuffleNet units to lower computation cost while maintaining good accuracy. A DCISN model has been designed and evaluated on three recognized benchmark datasets: Hockey-Fight, Movies-Fight and RWF-2000. The DCISN model has 0.168M parameters and requires only 0.253 GFLOPs of computation. Experimental results suggest that, in comparison with previously reported schemes, the DCISN model achieves competitive accuracy at much lower computation cost.
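
The abstract attributes the low computation cost partly to depth-wise convolution. As a rough illustration of why that helps, the sketch below compares the weight count of a dense 3D convolution with that of a depth-wise convolution followed by a point-wise (1 x 1 x 1) convolution. The 3D kernel shape, the channel widths, the clip size and all function names are assumptions chosen for the example; the abstract does not give the actual DCISN layer configuration, so this is not the authors' design.

```python
def standard_conv3d_params(c_in, c_out, k):
    """Weights in a dense k x k x k 3D convolution (bias omitted)."""
    return c_in * c_out * k ** 3


def depthwise_separable_conv3d_params(c_in, c_out, k):
    """Weights in a depth-wise k x k x k convolution (one filter per input
    channel) followed by a 1 x 1 x 1 point-wise convolution (bias omitted)."""
    depthwise = c_in * k ** 3
    pointwise = c_in * c_out
    return depthwise + pointwise


def conv_macs(params, frames, height, width):
    """Multiply-accumulates for a stride-1, same-padded layer: every weight
    is applied once per output position."""
    return params * frames * height * width


# Hypothetical layer: 64 -> 64 channels, 3 x 3 x 3 kernel, 8 x 56 x 56 output.
dense = standard_conv3d_params(64, 64, 3)
separable = depthwise_separable_conv3d_params(64, 64, 3)
print("dense weights:    ", dense)
print("separable weights:", separable)
print("dense MACs:       ", conv_macs(dense, 8, 56, 56))
print("separable MACs:   ", conv_macs(separable, 8, 56, 56))
```

Under these assumed sizes the separable form needs far fewer weights and multiply-accumulates than the dense layer, which is the general effect the abstract relies on; the exact savings in DCISN depend on layer shapes that are only given in the full paper.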


    Published In

    ACM Other conferences
    ICIGP '23: Proceedings of the 2023 6th International Conference on Image and Graphics Processing
    January 2023
    246 pages
    ISBN:9781450398572
    DOI:10.1145/3582649

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Deep learning
    2. Depth-wise convolution
    3. Lightweight model
    4. Point-wise convolution
    5. Violence detection

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICIGP 2023
