DOI: 10.1145/3664647.3680592
Research article

A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection

Published: 28 October 2024

Abstract

Human behavior anomaly detection aims to identify unusual human actions and plays a crucial role in intelligent surveillance and other areas. Current mainstream methods still rely on reconstruction or future-frame prediction. However, reconstructing or predicting low-level pixel features tends to give the network overly strong generalization ability, so anomalies are reconstructed or predicted as well as normal data. Unlike these methods, and inspired by the Student-Teacher Network, we propose a novel framework called the Multilevel Guidance-Exploration Network (MGENet), which detects anomalies through the difference in high-level representations between the Guidance and Exploration networks. Specifically, we first use a Normalizing Flow that takes skeletal keypoints as input to guide an RGB encoder, which takes unmasked RGB frames as input, in exploring latent motion features. The RGB encoder then guides the mask encoder, which takes masked RGB frames as input, in exploring latent appearance features. Additionally, we design a Behavior-Scene Matching Module to detect scene-related behavioral anomalies. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on the ShanghaiTech and UBnormal datasets, with AUCs of 86.9% and 74.3%, respectively. The code is available at https://github.com/molu-ggg/GENet.
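
The scoring idea in the abstract, flagging a sample when the guidance and exploration networks disagree in feature space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `discrepancy_score` and `roc_auc`, the use of squared L2 distance on unit-normalized features, and the toy feature vectors are all assumptions made for illustration. The AUC helper only shows how a frame-level metric like the one reported could be computed from scores and labels.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def discrepancy_score(guidance_feat: Sequence[float],
                      exploration_feat: Sequence[float]) -> float:
    """Hypothetical anomaly score: squared L2 distance between the
    unit-normalized features of a guidance (teacher-like) network and an
    exploration (student-like) network. A larger value means the two
    networks disagree more, which is read as "more anomalous"."""
    if len(guidance_feat) != len(exploration_feat):
        raise ValueError("feature vectors must have the same length")
    g = l2_normalize(guidance_feat)
    e = l2_normalize(exploration_feat)
    return sum((a - b) ** 2 for a, b in zip(g, e))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    anomalous sample (label 1) scores higher than a random normal sample
    (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy (guidance, exploration) feature pairs; the last one disagrees.
    pairs = [
        ([1.0, 0.0], [0.9, 0.1]),
        ([0.0, 1.0], [0.1, 0.9]),
        ([1.0, 0.0], [0.0, 1.0]),
    ]
    labels = [0, 0, 1]
    scores = [discrepancy_score(g, e) for g, e in pairs]
    print(scores, roc_auc(scores, labels))
```

In a student-teacher setup of this kind, the exploration network is trained only on normal data, so its features are expected to match the guidance network's on normal behavior and drift on unseen behavior; the distance above is one simple way to turn that drift into a score.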


    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024
    11719 pages
    ISBN:9798400706868
    DOI:10.1145/3664647

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. human anomaly detection
    2. multimodal features
    3. one-class

    Qualifiers

    • Research-article

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne VIC, Australia

    Acceptance Rates

    MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
