Abstract
Object co-detection methods have demonstrated that exploiting contextual information across multiple images can significantly improve detection performance. In this paper, we propose a pedestrian co-detection method that combines the strengths of convolutional neural networks (CNNs) and locality-constrained linear coding (LLC) in a unified conditional random field (CRF) model. First, we obtain object candidates using the region proposal network (RPN) of Faster R-CNN. Second, we build a fully connected CRF consisting of unary potentials on individual object candidates and two types of pairwise potentials on pairs of object candidates. The unary potential of each object candidate is computed independently by the baseline method. The two pairwise potentials, one based on multiscale CNN representations and one based on LLC representations, capture the relationships among object candidates across all the test images. Finally, we jointly predict the category labels of all object candidates through mean-field inference in the CRF. We evaluated the proposed method on the ETH, Caltech, and INRIA Pedestrian datasets. The experimental results demonstrate the effectiveness of the proposed method compared with the baseline method.
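The joint labelling step can be illustrated with naive mean-field inference in a fully connected CRF. The sketch below is not the authors' implementation; it assumes details the abstract does not specify: binary labels (0 = background, 1 = pedestrian), unary energies taken as negative log-probabilities from the baseline detector, one Gaussian kernel over each candidate's CNN descriptor and one over its LLC code, and a Potts label-compatibility function, following the general form of Krähenbühl and Koltun's fully connected CRF. Because a set of test images yields only a modest number of candidates, the dense O(N²) messages are computed directly instead of with the permutohedral-lattice filtering used for pixel-level CRFs. The function names, weights (`w_cnn`, `w_llc`), and bandwidths (`theta_cnn`, `theta_llc`) are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gaussian_kernel(feats, bandwidth):
    """Dense affinity k(f_i, f_j) = exp(-||f_i - f_j||^2 / (2 * bandwidth^2)).

    The diagonal is zeroed so that a candidate sends no message to itself.
    """
    feats = np.asarray(feats, dtype=float)
    sq = np.sum(feats ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T, 0.0)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)
    return k


def mean_field(unary, cnn_feats, llc_feats, w_cnn=1.0, w_llc=1.0,
               theta_cnn=1.0, theta_llc=1.0, n_iters=5):
    """Naive mean-field inference with a Potts model (hypothetical sketch).

    unary:     (N, L) unary energies, e.g. -log p from the baseline detector.
    cnn_feats: (N, D1) multiscale CNN descriptors of the candidates (assumed input).
    llc_feats: (N, D2) LLC codes of the candidates (assumed input).
    Returns Q, an (N, L) array of approximate marginals.
    """
    unary = np.asarray(unary, dtype=float)
    # Pairwise affinity: weighted sum of the CNN-based and LLC-based kernels.
    k = (w_cnn * gaussian_kernel(cnn_feats, theta_cnn)
         + w_llc * gaussian_kernel(llc_feats, theta_llc))
    q = softmax(-unary)  # initialise the marginals from the unaries alone
    for _ in range(n_iters):
        # Potts penalty for label l at i: sum_j k_ij * (1 - Q_j(l)),
        # i.e. similar candidates that prefer another label push i away from l.
        pairwise = k @ (1.0 - q)
        q = softmax(-unary - pairwise)
    return q


if __name__ == "__main__":
    # Three toy candidates: 0 and 1 look alike, 2 is far away in feature space.
    probs = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
    feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
    q = mean_field(-np.log(probs), feats, feats)
    print(np.round(q, 3))
```

In this toy run the uncertain candidate 1 is pulled towards the pedestrian label by its confident, similar neighbour 0, while the isolated candidate 2 keeps essentially its baseline score; this is the kind of cross-image context the pairwise potentials are meant to capture.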
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61673274, 61375008, and 61075106.
About this article
Cite this article
Jiang, L., Ji, J., Zhong, W. et al. Exploiting context based on CNN and coding representations for pedestrian co-detection. Multimed Tools Appl 79, 4277–4296 (2020). https://doi.org/10.1007/s11042-018-6806-7
DOI: https://doi.org/10.1007/s11042-018-6806-7