
Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model

Published in: International Journal of Computer Vision

Abstract

Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation: estimating the visibility of one pedestrian facilitates estimating the visibility of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned by the deep model for recognizing co-existing pedestrians. The evidence of co-existing pedestrians is then used to improve single-pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach achieves the lowest average miss rate on the Caltech-Train and ETH datasets. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection, yielding 6–15% improvements on multiple benchmark datasets.
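The abstract describes joint visibility estimation only at a high level, and the paper's actual deep model is not reproduced on this page. As a loose illustration of the core idea that one pedestrian's visibility estimate can inform another's, here is a minimal NumPy sketch using mean-field-style updates; the part scores, the coupling matrix, and all function names are hypothetical and illustrative, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_visibility(s1, s2, coupling, n_iters=10):
    """Jointly infer per-part visibility probabilities for two
    overlapping pedestrians: each pedestrian's current visibility
    estimate feeds back into the other's as extra evidence."""
    v1, v2 = sigmoid(s1), sigmoid(s2)        # independent estimates
    for _ in range(n_iters):
        v1 = sigmoid(s1 + coupling @ v2)     # pedestrian 2 informs 1
        v2 = sigmoid(s2 + coupling.T @ v1)   # pedestrian 1 informs 2
    return v1, v2

# Toy example: 3 parts per pedestrian (e.g. head, torso, legs).
# Pedestrian 1's third part scores low, as if partially occluded.
s1 = np.array([2.0, 1.5, -1.0])
s2 = np.array([1.0, 2.0, 1.5])
coupling = 0.5 * np.eye(3)                   # hypothetical positive coupling
v1, v2 = joint_visibility(s1, s2, coupling)
```

With a positive coupling, the joint estimate for the weakly scored part is pulled above its independent estimate `sigmoid(-1.0)`, mirroring the claim that evidence from a co-existing pedestrian facilitates visibility estimation.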

[Figs. 1–10: not included in this preview]


Notes

  1. 'Lena' and 'Richard' are used as placeholder names in this paper.

  2. http://www.cvg.rdg.ac.uk/PETS2009/a.html.

  3. http://www.ee.cuhk.edu.hk/~xgwang/2DBNped.html.


Acknowledgments

This work is supported by the General Research Fund sponsored by the Research Grants Council of Hong Kong (Project No. CUHK 417110 and CUHK 417011), National Natural Science Foundation of China (Project No. 61005057), and Guangdong Innovative Research Team Program (No. 201001D0104648280).

Author information

Correspondence to Wanli Ouyang.

Communicated by Deva Ramanan.


About this article


Cite this article

Ouyang, W., Zeng, X. & Wang, X. Learning Mutual Visibility Relationship for Pedestrian Detection with a Deep Model. Int J Comput Vis 120, 14–27 (2016). https://doi.org/10.1007/s11263-016-0890-9
