Crowd Counting via Segmentation Guided Attention Networks and Curriculum Loss

Published: 01 September 2022

Abstract

Automatic crowd behaviour analysis is an important task for intelligent transportation systems, enabling effective flow control and dynamic route planning for varying road participants. Crowd counting is one of the keys to automatic crowd behaviour analysis, and crowd counting with deep convolutional neural networks (CNN) has achieved encouraging progress in recent years. Researchers have devoted much effort to designing variant CNN architectures, most of which are based on the pre-trained VGG16 model. Owing to its limited expressive capacity, the VGG16 backbone is usually followed by another cumbersome network specially designed for good counting performance. Although VGG models have been outperformed by Inception models in image classification tasks, existing crowd counting networks built with Inception modules still contain only a small number of layers using basic Inception modules. To fill this gap, in this paper we first benchmark the baseline Inception-v3 model on commonly used crowd counting datasets and achieve surprisingly good performance, comparable with or better than that of most existing crowd counting models. We then push this work further by proposing a Segmentation Guided Attention Network (SGANet), with Inception-v3 as the backbone, and a novel curriculum loss for crowd counting. We conduct thorough experiments to compare SGANet with prior art; the proposed model achieves state-of-the-art performance with MAE of 57.6, 6.3 and 87.6 on ShanghaiTechA, ShanghaiTechB and UCF_QNRF, respectively.
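For context, the MAE figures quoted above follow the standard density-map evaluation protocol: the predicted density map is integrated into a scalar count, and the mean absolute error is taken against the ground-truth head counts. The sketch below illustrates this protocol in PyTorch; it is not the authors' released code, and the model and loader objects are hypothetical stand-ins for a trained density-map regressor (such as SGANet) and a test-set data loader.

    import torch

    @torch.no_grad()
    def evaluate_counting(model, loader, device="cuda"):
        # Standard crowd-counting evaluation: integrate each predicted density
        # map into a count, then compute MAE and RMSE against the ground truth.
        model.eval()
        abs_err, sq_err, n = 0.0, 0.0, 0
        for images, gt_counts in loader:                    # gt_counts: per-image head counts
            density = model(images.to(device))              # predicted density map, (B, 1, H, W)
            pred_counts = density.sum(dim=(1, 2, 3)).cpu()  # count = sum over the density map
            diff = pred_counts - gt_counts.float()
            abs_err += diff.abs().sum().item()
            sq_err += (diff ** 2).sum().item()
            n += images.size(0)
        mae = abs_err / n                                   # mean absolute error, as reported above
        rmse = (sq_err / n) ** 0.5
        return mae, rmse

Density maps are often predicted at a reduced resolution; as long as they are normalised so that their integral equals the annotated head count, the same evaluation applies.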

Published In

IEEE Transactions on Intelligent Transportation Systems, Volume 23, Issue 9, September 2022, 2944 pages.
Publisher: IEEE Press
