Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model

Ouyang, Wanli; Zeng, Xingyu; Wang, Xiaogang

doi:10.1007/s11263-016-0890-9

Published: 02 March 2016

Volume 120, pages 14–27, (2016)

International Journal of Computer Vision

Abstract

Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in an image and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians are mutually informative: estimating the visibility of one pedestrian facilitates estimating the visibility of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The deep model learns the visibility relationship among pedestrians in order to recognize co-existing pedestrians, and the evidence of co-existing pedestrians is then used to improve single-pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset and the ETH dataset. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection, yielding 6–15% improvements on multiple benchmark datasets.
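
For readers unfamiliar with the "average miss rate" figure quoted above, the following is a minimal sketch of how a log-average miss rate is commonly computed on pedestrian benchmarks such as Caltech: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, and the samples are averaged geometrically. The function name `log_average_miss_rate`, its arguments, and the handling of curves that start above the lowest reference point are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI values.

    Hypothetical helper: `fppi` and `miss_rate` are the operating points of a
    detector's miss-rate-versus-FPPI curve, in any order.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)

    # Sort the curve by increasing FPPI so it can be searched.
    order = np.argsort(fppi)
    fppi, miss_rate = fppi[order], miss_rate[order]

    # Reference FPPI values: evenly spaced in log space over [1e-2, 1e0].
    refs = np.logspace(-2.0, 0.0, num_points)

    sampled = []
    for ref in refs:
        # Take the operating point with the largest FPPI not exceeding ref.
        idx = np.searchsorted(fppi, ref, side="right") - 1
        # Assumption: if the curve has no point at or below ref, count the
        # miss rate there as 1.0 (every pedestrian missed).
        sampled.append(miss_rate[idx] if idx >= 0 else 1.0)

    # Guard against log(0) before averaging in log space.
    sampled = np.maximum(np.array(sampled), 1e-10)
    return float(np.exp(np.mean(np.log(sampled))))


# Usage with a made-up curve (values are illustrative, not from the paper).
example = log_average_miss_rate([0.001, 0.1, 1.0], [0.6, 0.4, 0.2])
```

A lower value is better. Averaging in log space keeps a single FPPI range from dominating the summary.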

Notes

'Lena' and 'Richard' are used as placeholder names in this paper.
http://www.cvg.rdg.ac.uk/PETS2009/a.html.
http://www.ee.cuhk.edu.hk/~xgwang/2DBNped.html.

Acknowledgments

This work is supported by the General Research Fund sponsored by the Research Grants Council of Hong Kong (Project No. CUHK 417110 and CUHK 417011), the National Natural Science Foundation of China (Project No. 61005057), and the Guangdong Innovative Research Team Program (No. 201001D0104648280).

Author information

Authors and Affiliations

Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, People’s Republic of China
Wanli Ouyang, Xingyu Zeng & Xiaogang Wang

Corresponding author

Correspondence to Wanli Ouyang.

Additional information

Communicated by Deva Ramanan.

About this article

Cite this article

Ouyang, W., Zeng, X. & Wang, X. Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model. Int J Comput Vis 120, 14–27 (2016). https://doi.org/10.1007/s11263-016-0890-9

Received: 07 May 2014
Accepted: 15 February 2016
Published: 02 March 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s11263-016-0890-9
