Estimating human pose from occluded images

Authors:

Ming-Hsuan Yang

ACCV'09: Proceedings of the 9th Asian conference on Computer Vision - Volume Part I

Pages 48 - 60

https://doi.org/10.1007/978-3-642-12307-8_5

Published: 23 September 2009

Abstract

We address the problem of recovering 3D human pose from single 2D images, formulating pose estimation as a direct nonlinear regression from image observations to 3D joint positions. One key issue that has not been addressed in the literature is how to estimate 3D pose when humans in the scene are partially or heavily occluded. When occlusions occur, features extracted from image observations (e.g., silhouette-based shape features, histograms of oriented gradients, etc.) are seriously corrupted, and consequently a regressor trained on un-occluded images is unable to estimate pose states correctly. In this paper, we present a method that handles occlusions using sparse signal representations, in which each test sample is represented as a compact linear combination of training samples. The sparsest solution can then be obtained efficiently by solving a convex optimization problem with certain norms (such as the l₁-norm). The corrupted test image can be recovered as a sparse linear combination of un-occluded training images, which can then be used to estimate human pose correctly (as if no occlusions existed). We also show that the proposed approach implicitly performs relevant feature selection with un-occluded test images. Experimental results on synthetic and real data sets bear out our theory that, with sparse representation, 3D human pose can be robustly estimated when humans are partially or heavily occluded in the scenes.
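The recovery step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption: the occluded feature vector y is modelled as A x + e, where the columns of A are un-occluded training features, x is a sparse coefficient vector, and e is a sparse occlusion error; the l₁-regularized least-squares problem is solved with plain ISTA (a generic solver chosen here for brevity, not necessarily the one used in the paper). The function names `soft_threshold`, `ista_l1`, and `recover_unoccluded`, the regularization weight `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_l1(B, y, lam=0.01, n_iter=2000):
    """Minimise 0.5 * ||y - B w||_2^2 + lam * ||w||_1 with plain ISTA."""
    # Step size 1/L, where L is the squared spectral norm of B.
    step = 1.0 / np.linalg.norm(B, 2) ** 2
    w = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def recover_unoccluded(A, y, lam=0.01, n_iter=2000):
    """Split y into A @ x (clean part) and a sparse error e.

    Assumed model: y = A x + e, with both x and e sparse.
    Returns the recovered feature A @ x together with x and e.
    """
    d, n = A.shape
    # Augment the training dictionary with an identity block that absorbs
    # the corrupted feature dimensions.
    B = np.hstack([A, np.eye(d)])
    w = ista_l1(B, y, lam, n_iter)
    x, e = w[:n], w[n:]
    return A @ x, x, e

# Toy data (hypothetical): 30 training feature vectors of dimension 60.
rng = np.random.default_rng(0)
d, n = 60, 30
A = rng.standard_normal((d, n))
A /= np.linalg.norm(A, axis=0)      # unit-norm columns

x_true = np.zeros(n)
x_true[[3, 17]] = [1.0, 0.5]        # test sample built from two training samples
clean = A @ x_true

occluded = clean.copy()
occluded[:6] += 2.0                 # corrupt 10% of the feature dimensions

recovered, x_hat, e_hat = recover_unoccluded(A, occluded)
err_before = np.linalg.norm(occluded - clean)
err_after = np.linalg.norm(recovered - clean)
print(f"feature error before recovery: {err_before:.3f}")
print(f"feature error after recovery:  {err_after:.3f}")
```

In a pipeline of the kind the abstract outlines, the recovered feature `A @ x` would then be passed to a regressor trained on un-occluded data in place of the corrupted observation.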

Index Terms

Estimating human pose from occluded images
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
        Reconstruction
      2. Computer vision tasks
        Scene understanding
  2. Machine learning
    1. Machine learning algorithms
      1. Feature selection

Published In

ACCV'09: Proceedings of the 9th Asian conference on Computer Vision - Volume Part I

September 2009

383 pages

ISBN:3642123066

Editors:
Hongbin Zha, Department of Machine Intelligence, Peking University, Beijing, China
Rin-ichiro Taniguchi, Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan
Stephen Maybank, Birkbeck College, Department of Computer Science, University of London, London, UK

Sponsors

NSF of China: National Natural Science Foundation of China
Fujitsu
Microsoft Research
Key Laboratory of Machine Perception (MOE), Peking University
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics

Total Citations: 4
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Nov 2024

Cited By

Hu L, Ma X, He C, Wang L, Cheng J (2023). Autoencoder and Masked Image Encoding-Based Attentional Pose Network. Pattern Recognition and Computer Vision, pp. 221-233. Online publication date: 13-Oct-2023
https://dl.acm.org/doi/10.1007/978-981-99-8432-9_18
Monszpart A, Guerrero P, Ceylan D, Yumer E, Mitra N (2019). iMapper. ACM Transactions on Graphics 38:4, pp. 1-15. Online publication date: 12-Jul-2019
https://dl.acm.org/doi/10.1145/3306346.3322961
Su Y, Feng Z, Zhang J, Peng W, Xing M (2018). Sequential Articulated Motion Reconstruction from a Monocular Image Sequence. ACM Transactions on Multimedia Computing, Communications, and Applications 14:1s, pp. 1-21. Online publication date: 26-Mar-2018
https://dl.acm.org/doi/10.1145/3180420
Sarafianos N, Boteanu B, Ionescu B, Kakadiaris I (2016). 3D Human pose estimation. Computer Vision and Image Understanding 152:C, pp. 1-20. Online publication date: 1-Nov-2016
https://dl.acm.org/doi/10.1016/j.cviu.2016.09.002
