Abstract
Estimating crowd counts automatically with computer vision has attracted considerable attention because of its many practical applications. Crowd counting poses many challenges, and one of the main difficulties is scale variation: the sizes of people's heads vary dramatically both across images and between different regions of the same image. In this paper, we tackle this problem by proposing a novel scale-aware counting model named FPN-LDA Net. In this model, the Feature Pyramid Network (FPN) handles scale variation by fusing multi-scale feature maps from different depth levels of the network, and the Local Difference Attention (LDA) module captures the local differences between the multi-scale pyramid pooling features at a specific location and in its neighborhood. To handle head scale variation within the same image, the dynamically learned difference scores are used as weights that adaptively highlight the scale-varying head regions of the crowd that need attention and filter out irrelevant background regions. We conduct extensive experiments on three widely adopted benchmark datasets, UCF-QNRF, ShanghaiTech and UCF_CC_50, and the experimental results demonstrate the superiority of the proposed method.
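The abstract describes the LDA module only at a high level. The following is a minimal, hypothetical sketch in plain Python of the general idea of turning local differences into attention weights; it is not the authors' implementation. Several assumptions are made for illustration: a single-channel feature map stands in for the multi-channel pyramid features, stride-1 average pooling with window sizes 3, 5 and 7 stands in for pyramid pooling, and a fixed `gain` and `bias` stand in for the learned parameters that produce the difference scores. The names `local_mean` and `local_difference_attention` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def local_mean(fmap: Grid, k: int) -> Grid:
    """Stride-1 average over a k x k window centred on each cell.

    Windows are clipped at the borders, so edge cells average fewer values.
    """
    if k % 2 == 0:
        raise ValueError("window size k must be odd")
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [fmap[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def local_difference_attention(fmap: Grid,
                               scales: Sequence[int] = (3, 5, 7),
                               gain: float = 1.0,
                               bias: float = 0.0) -> Tuple[Grid, Grid]:
    """Hypothetical LDA-style reweighting of a single-channel feature map.

    Returns (weights, reweighted_map). The fixed gain/bias are placeholders
    for parameters that a real network would learn.
    """
    h, w = len(fmap), len(fmap[0])
    # Multi-scale neighborhood summaries (a stand-in for pyramid pooling).
    pooled = [local_mean(fmap, k) for k in scales]
    weights, out = [], []
    for i in range(h):
        w_row, o_row = [], []
        for j in range(w):
            # Difference between the location and each neighborhood summary.
            diffs = [abs(fmap[i][j] - p[i][j]) for p in pooled]
            # Average difference across scales, squashed into (0, 1).
            score = sigmoid(gain * sum(diffs) / len(diffs) + bias)
            w_row.append(score)
            o_row.append(fmap[i][j] * score)
        weights.append(w_row)
        out.append(o_row)
    return weights, out


if __name__ == "__main__":
    peak = [[0.0] * 5 for _ in range(5)]
    peak[2][2] = 1.0  # one location that stands out from its surroundings
    weights, _ = local_difference_attention(peak)
    print(round(weights[2][2], 3), round(weights[0][0], 3))
```

In a uniform, background-like region every difference is zero, so the weight settles at sigmoid(bias), which is 0.5 with the default bias, while a location that differs from its neighborhood at several scales receives a weight closer to 1. In the network described above, such scores would be produced by learned layers and applied to the multi-channel FPN features rather than to a single toy map.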
Ethics declarations
The manuscript has not been published before and is not being considered for publication elsewhere. All authors contributed important intellectual content to this manuscript and have read and approved the final version. The authors declare that they have no conflict of interest.
Additional information
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant U19B2037.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Q., Zhang, S., Liu, X. et al. Scale-aware local difference attention on pyramidal features for crowd counting. Multimed Tools Appl 83, 5165–5180 (2024). https://doi.org/10.1007/s11042-023-15366-1
DOI: https://doi.org/10.1007/s11042-023-15366-1