Crowd Counting via Segmentation Guided Attention Networks and Curriculum Loss

Published: 01 September 2022

Abstract

Automatic crowd behaviour analysis is an important task for intelligent transportation systems, enabling effective flow control and dynamic route planning for varying road participants. Crowd counting is one of the keys to automatic crowd behaviour analysis, and crowd counting with deep convolutional neural networks (CNN) has achieved encouraging progress in recent years. Researchers have devoted much effort to designing variant CNN architectures, most of which are based on the pre-trained VGG16 model. Owing to its limited expressive capacity, the VGG16 backbone is usually followed by another cumbersome network specially designed for good counting performance. Although VGG models have been outperformed by Inception models in image classification tasks, existing crowd counting networks built with Inception modules still contain only a small number of layers using basic Inception modules. To fill this gap, in this paper we first benchmark the baseline Inception-v3 model on commonly used crowd counting datasets and achieve surprisingly good performance, comparable with or better than that of most existing crowd counting models. We then push this work further by proposing a Segmentation Guided Attention Network (SGANet), with Inception-v3 as the backbone, and a novel curriculum loss for crowd counting. We conduct thorough experiments to compare SGANet with prior art; the proposed model achieves state-of-the-art performance with MAE of 57.6, 6.3 and 87.6 on ShanghaiTechA, ShanghaiTechB and UCF_QNRF, respectively.
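For context, the MAE figures quoted above follow the standard density-map evaluation protocol: the predicted density map is integrated into a scalar count, and the mean absolute error is taken against the ground-truth head counts. The sketch below illustrates this protocol in PyTorch; it is not the authors' released code, and the model and loader objects are hypothetical stand-ins for a trained density-map regressor (such as SGANet) and a test-set data loader.

    import torch

    @torch.no_grad()
    def evaluate_counting(model, loader, device="cuda"):
        # Standard crowd-counting evaluation: integrate each predicted density
        # map into a count, then compute MAE and RMSE against the ground truth.
        model.eval()
        abs_err, sq_err, n = 0.0, 0.0, 0
        for images, gt_counts in loader:                    # gt_counts: per-image head counts
            density = model(images.to(device))              # predicted density map, (B, 1, H, W)
            pred_counts = density.sum(dim=(1, 2, 3)).cpu()  # count = sum over the density map
            diff = pred_counts - gt_counts.float()
            abs_err += diff.abs().sum().item()
            sq_err += (diff ** 2).sum().item()
            n += images.size(0)
        mae = abs_err / n                                   # mean absolute error, as reported above
        rmse = (sq_err / n) ** 0.5
        return mae, rmse

Density maps are often predicted at a reduced resolution; as long as they are normalised so that their integral equals the annotated head count, the same evaluation applies.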

Published In

IEEE Transactions on Intelligent Transportation Systems, Volume 23, Issue 9, September 2022, 2944 pages.
Publisher: IEEE Press
