Abstract
Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in an image and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians are mutually informative: estimating the visibility of one pedestrian facilitates estimating the visibility of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The deep model learns the visibility relationship among pedestrians in order to recognize co-existing pedestrians, and the evidence of co-existing pedestrians is then used to improve single-pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset and the ETH dataset. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection, yielding 6–15% improvements on multiple benchmark datasets.
Notes
‘Lena’ and ‘Richard’ are used as placeholder names in this paper.
Acknowledgments
This work is supported by the General Research Fund sponsored by the Research Grants Council of Hong Kong (Project No. CUHK 417110 and CUHK 417011), National Natural Science Foundation of China (Project No. 61005057), and Guangdong Innovative Research Team Program (No. 201001D0104648280).
Additional information
Communicated by Deva Ramanan.
Cite this article
Ouyang, W., Zeng, X. & Wang, X. Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model. Int J Comput Vis 120, 14–27 (2016). https://doi.org/10.1007/s11263-016-0890-9
DOI: https://doi.org/10.1007/s11263-016-0890-9