research-article

Attention Mixture Network for Crowd Counting via Binarization Transfer

Published: 28 October 2024
DOI: 10.1145/3688867.3690172

Abstract

Crowd counting aims to estimate the number of people in an image of a crowd. In recent years, the field has advanced steadily, driven by the integration of attention mechanisms. However, these methods have mostly relied on either binary or non-binary attention maps in isolation. A binary attention map improves performance by separating crowd regions from a cluttered background, whereas a non-binary attention map captures the density gradient within crowd regions. To exploit both maps at once, we propose a novel Binarization Transfer Module (BTM), which handles binarization during network training, and an Attention Mixture Network (AMNet) built on BTM. The distinctive feature of AMNet is that it exploits binary and non-binary attention maps jointly. Through BTM, it also mitigates interference from cluttered backgrounds. We evaluated our method on four popular crowd-counting datasets (ShanghaiTech Part A and Part B, UCF_CC_50, WorldExpo'10, and UCF-QNRF); AMNet significantly improves counting accuracy and outperforms state-of-the-art methods.
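The abstract does not say how BTM binarizes attention or how AMNet mixes the two maps, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. It assumes that a sigmoid produces the non-binary map, that a fixed threshold (hypothetical `threshold=0.5`) produces the binary map, that gradients pass through the non-differentiable threshold via a straight-through estimator (a standard technique swapped in here), and that the two maps are mixed by element-wise product. The count is taken as the sum of an attention-weighted density map, the usual density-map formulation. All function names are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function, mapping logits into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def binarize(soft: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Forward pass: hard 0/1 mask separating crowd (1) from background (0).

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return (soft >= threshold).astype(soft.dtype)


def binarize_grad_ste(grad_out: np.ndarray) -> np.ndarray:
    """Backward pass with a straight-through estimator (STE).

    The step function has zero gradient almost everywhere, so the STE
    treats it as the identity and passes the upstream gradient through.
    This is an assumed stand-in for whatever BTM does during training.
    """
    return grad_out.copy()


def mix_attention(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Combine binary and non-binary attention (hypothetical mixing rule).

    The binary mask zeroes background pixels; the soft map keeps the
    density gradient inside crowd regions.
    """
    soft = sigmoid(logits)             # non-binary attention in (0, 1)
    hard = binarize(soft, threshold)   # binary attention in {0, 1}
    return hard * soft


def estimate_count(density: np.ndarray, attention: np.ndarray) -> float:
    """Crowd count as the sum of the attention-weighted density map."""
    return float((density * attention).sum())


if __name__ == "__main__":
    logits = np.array([[-3.0, 0.0], [1.0, 4.0]])  # toy attention logits
    density = np.full((2, 2), 0.5)                # toy density map
    att = mix_attention(logits)
    print(att)
    print(estimate_count(density, att))
```

In this toy example the top-left pixel (logit -3.0) falls below the threshold and is treated as background, so it contributes nothing to the count even though the density map is nonzero there; the remaining pixels keep their graded soft-attention weights.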



Published In

McGE '24: Proceedings of the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice
October 2024
77 pages
ISBN:9798400711947
DOI:10.1145/3688867
Program Chairs: Cheng Jin, Liang He, Mingli Song, Rui Wang
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binarization transfer module
  2. binary attention maps
  3. crowd counting
  4. deep learning
  5. non-binary attention maps

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia
