Abstract
In recent years, crowd counting in still images has attracted considerable research interest because of its applications in public safety. However, it remains a challenging task owing to perspective and scale variations. In this paper, we propose an effective Skip-connection Convolutional Neural Network (SCNN) for crowd counting that addresses the issue of scale variations. The proposed SCNN architecture consists of several multi-scale units that extract multi-scale features. Each multi-scale unit contains three convolutional layers and builds connections between its input and each convolutional layer. In addition, we propose a scale-related training method to improve the accuracy and robustness of crowd counting. We evaluate our method on three crowd counting benchmarks. Experimental results verify the efficiency of the proposed method, which achieves superior performance compared with other methods.
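To make the wiring of a multi-scale unit concrete, the following is a minimal toy sketch in plain Python, not the architecture as published. It works on 1-D signals instead of images, and it assumes that the connection from the unit input to each convolutional layer is a channel concatenation; the abstract does not say whether concatenation or addition is used. The names `conv1d_same` and `multi_scale_unit`, the kernel widths, and the channel counts are all hypothetical.

```python
# Toy 1-D sketch of a multi-scale unit: three convolutional layers, each of
# which also receives the unit input. Concatenation, kernel widths, and
# channel counts are assumptions for illustration only.

def conv1d_same(channels, kernels):
    """Apply one convolutional layer followed by ReLU.

    channels: list of input channels, each a list of floats of equal length.
    kernels:  kernels[o][i] is the odd-length kernel that links input
              channel i to output channel o.
    """
    length = len(channels[0])
    out = []
    for per_input in kernels:
        acc = [0.0] * length
        for signal, kernel in zip(channels, per_input):
            half = len(kernel) // 2
            for t in range(length):
                for k, w in enumerate(kernel):
                    s = t + k - half
                    if 0 <= s < length:  # zero padding keeps the length
                        acc[t] += w * signal[s]
        out.append([max(0.0, v) for v in acc])  # ReLU
    return out


def multi_scale_unit(x, layers):
    """Run three conv layers; layers 2 and 3 also see the unit input x."""
    k1, k2, k3 = layers
    y1 = conv1d_same(x, k1)
    y2 = conv1d_same(x + y1, k2)  # list "+" concatenates channels
    y3 = conv1d_same(x + y2, k3)
    return y3


# One input channel of length 4.
x = [[1.0, -2.0, 3.0, 0.5]]
identity, zero = [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]
k1 = [[zero]]            # layer 1 outputs all zeros
k2 = [[identity, zero]]  # layer 2 reads only the skipped input
k3 = [[identity, zero]]  # layer 3 reads only the skipped input
out = multi_scale_unit(x, (k1, k2, k3))
print(out)
```

In this example the first layer outputs only zeros, yet the unit output still carries the rectified input, because the later layers read the input directly through the skip path.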
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China under grant No. 61233003 and in part by the Equipment Pre-research Fund under grant No. 61403120201.
Cite this article
Wang, L., Yin, B., Guo, A. et al. Skip-connection convolutional neural network for still image crowd counting. Appl Intell 48, 3360–3371 (2018). https://doi.org/10.1007/s10489-018-1150-1
DOI: https://doi.org/10.1007/s10489-018-1150-1