CSFNet: A novel counting network based on context features and multi-scale information

  • Regular Paper
Multimedia Systems

Abstract

Crowd counting aims to estimate the number of people in an image or video accurately and in real time. Deep learning has improved counting accuracy in recent years, but accuracy in crowded scenes with large scale variations still needs improvement. To address this, this paper proposes a novel crowd-counting network, the Context-Scaled Fusion Network (CSFNet). Its two main components are: (1) the Multi-Scale Receptive Field Fusion Module (MRFF Module), which applies multiple dilated convolutional layers with different dilation rates and uses a fusion mechanism to combine the resulting multi-scale information into higher-quality feature maps; and (2) the Contextual Space Attention Module (CSA Module), which obtains pixel-level contextual information and combines it with an attention map so that the model learns to focus on important regions, thereby reducing counting error. The model is trained and evaluated on five datasets: ShanghaiTech, UCF_CC_50, WorldExpo'10, BEIJING-BRT, and Mall. The experimental results show that CSFNet outperforms many state-of-the-art (SOTA) methods on these datasets, demonstrating its superior counting ability and robustness.
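
The two ideas named in the abstract, parallel dilated convolutions with different dilation rates followed by fusion, and attention-based reweighting of positions, can be illustrated with a small one-dimensional toy in plain Python. This is only a sketch under stated assumptions and not the authors' implementation: the real modules operate on 2D feature maps inside a CNN, and their layer layout is not given in this preview. The function names, the dilation rates (1, 2, 3), the averaging fusion rule, and the sigmoid gate are all hypothetical stand-ins chosen for illustration.

```python
import math


def effective_kernel(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(x, w, d):
    """Same-size 1D dilated cross-correlation with zero padding (odd k)."""
    k = len(w)
    pad = (effective_kernel(k, d) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i - pad + j * d  # taps are spaced d samples apart
            if 0 <= idx < len(x):
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def fuse_multi_scale(x, w, rates=(1, 2, 3)):
    """Run one branch per dilation rate and average the branch outputs.

    Averaging is a placeholder fusion rule (assumption), not the MRFF rule.
    """
    branches = [dilated_conv1d(x, w, d) for d in rates]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def spatial_attention(features):
    """Reweight each position by a sigmoid gate of its own activation.

    A stand-in for an attention map (assumption), not the CSA Module.
    """
    gates = [1.0 / (1.0 + math.exp(-f)) for f in features]
    return [f * g for f, g in zip(features, gates)]


signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # a single impulse
kernel = [1.0, 1.0, 1.0]
fused = fuse_multi_scale(signal, kernel)
attended = spatial_attention(fused)
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 samples, so each branch responds to the impulse over a different neighbourhood. The fused output is strongest where all branches agree (the impulse position), and the gate then suppresses weak positions relative to strong ones.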

Data availability

The annotated dataset used in this paper can be requested from the corresponding author. No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant (Nos. 62067002, 61967006, and 62462031), in part by the Science and Technology Project of the Transportation Department of Jiangxi Province, China (No. 2022X0040) and in part by the Natural Science Foundation of Jiangxi Province under Grant 20242BAB26023.

Author information

Contributions

Liyan Xiong and Zhida Li completed the entire manuscript, Xiaohui Huang optimized the manuscript, and Heng Wang ran and recorded the experimental results. All authors participated in writing and checking the manuscript.

Corresponding author

Correspondence to Zhida Li.

Ethics declarations

Conflict of interest

The authors declare that there are no competing interests related to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, L., Li, Z., Huang, X. et al. CSFNet: A novel counting network based on context features and multi-scale information. Multimedia Systems 31, 7 (2025). https://doi.org/10.1007/s00530-024-01603-6

  • DOI: https://doi.org/10.1007/s00530-024-01603-6