An Image Cues Coding Approach for 3D Human Pose Estimation

Authors:

Jianhai Zhang

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 4

Article No.: 113, Pages 1 - 20

https://doi.org/10.1145/3368066

Published: 16 December 2019

Abstract

Although Deep Convolutional Neural Networks (DCNNs) have advanced 3D human pose estimation, ambiguity remains the most challenging problem in this task. Inspired by the Human Perception Mechanism (HPM), we propose an image-to-pose coding method that bridges the gap between image cues and 3D poses, thereby alleviating the ambiguity of 3D human pose estimation. First, we divide the whole 3D pose space into multiple subregions named pose codes, turning the disambiguation problem into a classification problem. The proposed coding mechanism covers multiple camera views and provides a complete description of the 3D pose space. Second, the articulated structure of the human body lies on a sophisticated product manifold, and error accumulation along the chain structure degrades coding performance. Therefore, in image space, we extract the image cues from independent local image patches rather than from the whole image. A set of DCNNs establishes the mapping between image cues and 3D pose codes, so that the image-to-pose coding method transforms implicit image cues into explicit constraints. Finally, the image-to-pose coding method is integrated into a linear matching mechanism to construct a 3D pose estimation method that effectively alleviates the ambiguity. We conduct extensive experiments on widely used public benchmarks. The experimental results show that our method effectively alleviates the ambiguity in 3D pose recovery and is robust to view variations.
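
To make the idea of turning disambiguation into classification concrete, the following is a minimal sketch and not the coding scheme of the article, which the abstract does not specify. It assumes a scaled-orthographic camera with unit scale, a known limb length, and a hypothetical one-bit code per limb that records whether the child joint lies in front of or behind its parent. The function names `limb_code` and `lift_limb` are illustrative only. In the article the code would be predicted from a local image patch by a DCNN; here it is computed from a known 3D pose so that the example stays self-contained.

```python
import numpy as np


def limb_code(parent_xyz, child_xyz):
    """Hypothetical 1-bit pose code for one limb (an assumption, not the paper's code).

    Returns 1 if the child joint is at least as far from the camera as the
    parent (dz >= 0), otherwise 0.
    """
    return int(child_xyz[2] - parent_xyz[2] >= 0.0)


def lift_limb(parent_xyz, child_xy, limb_length, code):
    """Lift a child joint from 2D to 3D under orthographic projection (scale 1).

    The 2D offset and the limb length fix |dz|, but not its sign. That sign is
    the depth ambiguity; the code selects one of the two candidates.
    """
    dx = child_xy[0] - parent_xyz[0]
    dy = child_xy[1] - parent_xyz[1]
    # Clamp at zero so noisy 2D input cannot produce a negative radicand.
    dz = np.sqrt(max(limb_length ** 2 - dx ** 2 - dy ** 2, 0.0))
    sign = 1.0 if code == 1 else -1.0
    return np.array([child_xy[0], child_xy[1], parent_xyz[2] + sign * dz])


# Usage: both candidates share the same 2D projection; only the code differs.
parent = np.array([0.0, 0.0, 0.0])
child_true = np.array([0.3, 0.0, -0.4])  # limb length 0.5, pointing toward the camera
code = limb_code(parent, child_true)
print("code:", code)
print("with correct code:", lift_limb(parent, child_true[:2], 0.5, code))
print("with flipped code:", lift_limb(parent, child_true[:2], 0.5, 1 - code))
```

With the correct code the true joint is recovered, while the flipped code yields the mirrored candidate that projects to the same 2D point. A finer partition of pose space would use more than two classes per limb, but the principle is the same: a discrete label predicted from image cues acts as an explicit constraint on an otherwise ambiguous 3D reconstruction.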

Cited By

S. Zhang, C. Wang, W. Dong, and B. Fan. 2022. A Survey on Depth Ambiguity of 3D Human Pose Estimation. Applied Sciences 12, 20 (2022), 10591. Online publication date: 20-Oct-2022.
https://doi.org/10.3390/app122010591
B. Zhang, Y. Xiao, F. Xiong, C. Wu, Z. Cao, P. Liu, and J. Zhou. 2022. 3D human pose estimation with cross-modality training and multi-scale local refinement. Applied Soft Computing 122 (2022), 108950. Online publication date: Jun-2022.
https://doi.org/10.1016/j.asoc.2022.108950
N. Navghare and L. Gladence. 2021. End to End Learning Human Pose Detection Using Convolutional Neural Networks. In Machine Learning and Information Processing, 135--142. Online publication date: 3-Apr-2021.
https://doi.org/10.1007/978-981-33-4859-2_13

Index Terms

An Image Cues Coding Approach for 3D Human Pose Estimation
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 4

November 2019

322 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3376119

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 December 2019

Accepted: 01 August 2019

Revised: 01 May 2019

Received: 01 October 2018

Published in TOMM Volume 15, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Shenzhen Science and Technology Foundation

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 311
Downloads (Last 12 months): 45
Downloads (Last 6 weeks): 3

Reflects downloads up to 26 Sep 2024
