research-article

Attention Mixture Network for Crowd Counting via Binarization Transfer

Authors:

Kun Yu

McGE '24: Proceedings of the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice

Pages 45 - 53

https://doi.org/10.1145/3688867.3690172

Published: 28 October 2024

Abstract

Crowd counting estimates the number of people in an image of a crowd. In recent years, attention mechanisms have driven steady progress in the field. However, existing methods have concentrated on either binary or non-binary attention maps in isolation. A binary attention map improves model performance by separating the crowd from a complex background, whereas a non-binary attention map captures the density gradient within the crowd region. To exploit both kinds of attention map together, we propose a novel Binarization Transfer Module (BTM), which handles the binarization process during network training, and the Attention Mixture Net (AMNet), which is built on BTM. AMNet uses binary and non-binary attention maps simultaneously and, through the BTM, mitigates the interference of cluttered backgrounds. We evaluate our method on four popular crowd-counting datasets (ShanghaiTech Part A and Part B, UCF_CC_50, WorldExpo'10, and UCF-QNRF); AMNet achieves significant improvements in crowd-counting accuracy and outperforms state-of-the-art methods.
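To make the distinction between the two attention maps concrete, the following is a minimal NumPy sketch and not the paper's BTM or AMNet, whose internals the abstract does not describe. It assumes the binary map is obtained by thresholding the non-binary map at a fixed cutoff and that both maps are applied to the features by element-wise multiplication; the function name `mix_attention` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def mix_attention(features, soft_attention, threshold=0.5):
    """Apply a non-binary attention map and a binarized copy of it to features.

    features:        (C, H, W) feature map
    soft_attention:  (H, W) values in [0, 1] (non-binary attention)
    threshold:       hypothetical cutoff for binarization (an assumption,
                     not taken from the paper)
    """
    # Binary map: 1 where a crowd is assumed present, 0 for background.
    binary_attention = (soft_attention >= threshold).astype(features.dtype)
    # The binary map zeroes out background positions; the soft map then
    # reweights the remaining positions by relative density.
    mixed = features * binary_attention * soft_attention
    return mixed, binary_attention

features = np.ones((2, 2, 2), dtype=np.float32)
soft = np.array([[0.9, 0.2], [0.6, 0.4]], dtype=np.float32)
mixed, binary = mix_attention(features, soft)
```

In this toy input, the two positions with attention below the cutoff are treated as background and suppressed entirely, while the other two keep their soft weights. A hard threshold like this is not differentiable, which is presumably why binarization during training needs dedicated handling.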

Index Terms

Attention Mixture Network for Crowd Counting via Binarization Transfer
1. Human-centered computing
  1. Visualization
    1. Visualization application domains
      1. Visual analytics
    2. Visualization techniques
      1. Graph drawings
      2. Heat maps

Published In

McGE '24: Proceedings of the 2nd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice

October 2024

77 pages

ISBN:9798400711947

DOI:10.1145/3688867

Program Chairs:
Cheng Jin, Professor, Fudan University, China
Liang He, Professor, East China Normal University, China
Mingli Song, Professor, Zhejiang University, China
Rui Wang, Professor, IIE, Chinese Academy of Sciences, China

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia
