CSFNet: A novel counting network based on context features and multi-scale information

Xiong, Liyan; Li, Zhida; Huang, Xiaohui; Wang, Heng

doi:10.1007/s00530-024-01603-6

Regular Paper
Published: 11 December 2024

Volume 31, article number 7 (2025)

Multimedia Systems

Abstract

Crowd counting aims to estimate the number of people in an image or video accurately and in real time. Deep learning has improved counting accuracy in recent years, but accuracy in crowded scenes with large-scale variations still needs improvement. To address this, this paper proposes a novel crowd-counting network, the Context-Scaled Fusion Network (CSFNet). Its contributions are: (1) the Multi-Scale Receptive Field Fusion Module (MRFF Module), which employs multiple dilated convolutional layers with different dilation rates and a fusion mechanism to obtain multi-scale hybrid information and generate higher-quality feature maps; (2) the Contextual Space Attention Module (CSA Module), which obtains pixel-level contextual information and combines it with the attention map so that the model learns to focus on important regions, thereby reducing counting error. The model is trained and evaluated on five datasets: ShanghaiTech, UCF_CC_50, WorldExpo'10, BEIJING-BRT, and Mall. The experimental results show that CSFNet outperforms many state-of-the-art (SOTA) methods on these datasets, demonstrating its counting ability and robustness.
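
The abstract's two ideas, parallel dilated convolutions whose outputs are fused and an attention map that reweights features, can be illustrated with a minimal sketch. This is not the authors' implementation: it works on a 1D signal in plain Python for brevity, the fusion rule (a plain average across branches) and the sigmoid gating are assumptions, and every function name here is hypothetical. The abstract does not specify how MRFF fuses its branches or how CSA builds its attention map.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation."""
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        total = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                total += weight * signal[idx]
        output.append(total)
    return output


def fuse_branches(signal, kernel, dilations):
    """Run one branch per dilation rate, then fuse by averaging.

    Averaging is an assumed stand-in for the paper's fusion mechanism.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) / len(values) for values in zip(*branches)]


def apply_attention(features, scores):
    """Scale each feature by sigmoid(score), an assumed attention gate."""
    return [f / (1.0 + math.exp(-s)) for f, s in zip(features, scores)]
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 samples without adding weights, which is why mixing several rates exposes the network to objects at several scales. Calling `fuse_branches([0, 0, 0, 1, 0, 0, 0], [1, 1, 1], (1, 2))` spreads a single impulse over both the narrow and the wide neighbourhood.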

Data availability

The annotated dataset used in this paper can be requested from the corresponding author. No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62067002, 61967006, and 62462031, in part by the Science and Technology Project of the Transportation Department of Jiangxi Province, China (No. 2022X0040), and in part by the Natural Science Foundation of Jiangxi Province under Grant 20242BAB26023.

Author information

Authors and Affiliations

School of Information and Software Engineering, East China Jiaotong University, Nanchang, 330013, China
Liyan Xiong, Zhida Li, Xiaohui Huang & Heng Wang

Contributions

Liyan Xiong and Zhida Li completed the entire manuscript, Xiaohui Huang optimized the manuscript, and Heng Wang ran and recorded the experimental results. All authors participated in writing and checking the manuscript.

Corresponding author

Correspondence to Zhida Li.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests related to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, L., Li, Z., Huang, X. et al. CSFNet: A novel counting network based on context features and multi-scale information. Multimedia Systems 31, 7 (2025). https://doi.org/10.1007/s00530-024-01603-6

Received: 15 June 2024
Accepted: 25 November 2024
Published: 11 December 2024
DOI: https://doi.org/10.1007/s00530-024-01603-6
