DOI: 10.1145/3508546.3508547
research-article

Multi-scale Multi-clue Crowd Counting Network

Published: 25 February 2022

Abstract

Crowd counting against complex backgrounds remains challenging, yet it is an important task for public safety. To address this problem, we propose a multi-scale multi-clue crowd counting network (MMNet), composed of a feature encoder backbone and four stacked multi-clue crowd estimation modules (MCEMs) that act as decoders at multiple scales. Each module consists of three predictors: a shared attention predictor (SAP), a density map predictor (DMP), and a local counting map predictor (LCMP). DMP uses the information in every pixel of the image, while LCMP divides the image into patches and predicts the number of people in each patch. From the perspective of the training target, these two predictors address inaccurate counting against complex backgrounds by training the model on pixel-level (microscopic) and patch-level (macroscopic) information, respectively. From the perspective of feature extraction, SAP generates multi-scale shared attention maps that help both predictors concentrate on the head regions in the image. Furthermore, we design a multi-task joint training strategy that automatically adjusts the loss weights of the different tasks to ease training and improve the robustness of the model. Extensive experiments on three challenging datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF) demonstrate the superior performance of MMNet.
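
The relationship between the two training targets described above can be illustrated with a small sketch. It assumes that a local counting map is obtained by summing a density map over non-overlapping square patches; the abstract does not say how MMNet builds its targets, so the function name `local_counting_map`, the patch size, and the toy values are all hypothetical.

```python
def local_counting_map(density, patch):
    """Sum a 2D density map over non-overlapping patch x patch blocks.

    Assumption: the per-patch count is the sum of the density values
    inside the patch. This is an illustration, not MMNet's actual code.
    """
    rows, cols = len(density), len(density[0])
    if rows % patch or cols % patch:
        raise ValueError("map size must be divisible by the patch size")
    counts = []
    for r in range(0, rows, patch):
        counts_row = []
        for c in range(0, cols, patch):
            # Add up every pixel that falls inside the current patch.
            block_sum = sum(
                density[i][j]
                for i in range(r, r + patch)
                for j in range(c, c + patch)
            )
            counts_row.append(block_sum)
        counts.append(counts_row)
    return counts


# Toy 4x4 density map: one person spread over the top-left 2x2 block,
# two people spread over the bottom-right 2x2 block.
density = [
    [0.25, 0.25, 0.0, 0.0],
    [0.25, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]

counts = local_counting_map(density, 2)
total_from_density = sum(map(sum, density))
total_from_patches = sum(map(sum, counts))
```

Under this assumption both maps integrate to the same total count; they differ only in the granularity at which errors are penalized during training, which matches the pixel-level versus patch-level distinction drawn in the abstract.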

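The abstract does not specify how the loss weights of the different tasks are adjusted automatically. One common way to do this, shown here purely as an assumed stand-in and not as the authors' method, is uncertainty-style weighting: each task loss L_i is scaled by exp(-s_i) and a penalty s_i is added, where s_i is a learnable scalar per task. The function name `weighted_joint_loss` and the example values are hypothetical.

```python
import math


def weighted_joint_loss(task_losses, log_vars):
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i.

    Assumption: s_i is a learnable log-variance per task. A large s_i
    down-weights task i, while the additive s_i term keeps the weights
    from collapsing to zero. This is a stand-in technique, not MMNet's.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    return sum(
        math.exp(-s) * loss + s
        for loss, s in zip(task_losses, log_vars)
    )


# Hypothetical losses for the attention, density map and local
# counting map tasks, in that order.
losses = [2.0, 4.0, 1.0]

# With all s_i = 0, every weight is 1 and the result is the plain sum.
equal_weighting = weighted_joint_loss(losses, [0.0, 0.0, 0.0])

# Raising s for the second task scales its loss by 1/4.
down_weighted = weighted_joint_loss(losses, [0.0, math.log(4.0), 0.0])
```

In a real training loop the s_i values would be optimized jointly with the network parameters, so the balance between the three predictors is learned rather than hand-tuned.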

Published In

ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
December 2021
699 pages
ISBN: 9781450385053
DOI: 10.1145/3508546

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowd counting
  2. attention
  3. density map
  4. local counting map

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Science and Technology Commission of Shanghai Municipality
  • Science and Technology Major Project of Commission of Science and Technology of Shanghai
  • research of High-tech industry and technological innovation special project in Lingang New Area on ecological environment monitoring system based on 5G Internet of Things and edge computing

Conference

ACAI'21

Acceptance Rates

Overall Acceptance Rate 173 of 395 submissions, 44%
