research-article

Sequential Articulated Motion Reconstruction from a Monocular Image Sequence

Published: 26 March 2018

Abstract

In this article, we present a sequential approach for estimating articulated motion from a 2D skeleton sequence. The task is challenging because of the complexity of human movement and the inherent depth ambiguities. The proposed approach models human movement on a kinematic manifold together with its tangent bundle, which is a natural geometric representation of articulated motion. Combining this representation with a second-order stochastic dynamic model based on the Markov hypothesis, we generalize the Extended Rauch-Tung-Striebel smoother to a Riemannian manifold to simulate the process of human movement. The human motor system may violate the Markov hypothesis when the body is subject to external forces, so a refinement stage is introduced to correct the estimation error. Specifically, the current estimate is refined within a feasible solution region spanned by a set of local estimates. This region is called a simplex, and each of its elements can be represented as a convex combination of the local estimates. We prove that the refinement problem can be converted into a convex optimization problem with a simplicial constraint. Because the proposed formulation conforms to the kinematic and spatio-temporal continuity of articulated motion, the reconstruction ambiguity is substantially alleviated. The proposed algorithm is evaluated on multiple synthetic sequences from the CMU and HDM05 MoCap databases. The results show that, without requiring any training data, the proposed approach achieves greater accuracy than state-of-the-art baselines. Furthermore, the proposed approach outperforms two baselines on real sequences from the Human3.6M MoCap database.
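
The refinement idea described in the abstract, a search restricted to convex combinations of local estimates, can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not state the objective or the solver, so the least-squares objective, the projected-gradient solver, and the names `project_to_simplex`, `combine`, and `refine_on_simplex` are all assumptions made for illustration only.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    ordered = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        shift = (running_sum - 1.0) / k
        if value - shift > 0.0:
            theta = shift  # the last k that passes gives the final shift
    return [max(x - theta, 0.0) for x in v]


def combine(candidates, weights):
    """Weighted sum of candidate vectors; a convex combination when
    the weights lie on the simplex."""
    dim = len(candidates[0])
    return [sum(w * c[d] for w, c in zip(weights, candidates))
            for d in range(dim)]


def refine_on_simplex(candidates, target, iterations=500):
    """Hypothetical refinement: minimize ||sum_i w_i c_i - target||^2
    over simplex weights w using projected gradient descent.

    `candidates` stands in for the local estimates and `target` for
    whatever quantity the refined estimate should match; both are
    placeholders, not quantities defined in the article.
    """
    n = len(candidates)
    weights = [1.0 / n] * n  # start at the simplex barycenter
    # Upper bound on the gradient's Lipschitz constant (Frobenius bound).
    lipschitz = 2.0 * sum(x * x for c in candidates for x in c)
    step = 1.0 / lipschitz if lipschitz > 0.0 else 1.0
    for _ in range(iterations):
        blend = combine(candidates, weights)
        residual = [b - t for b, t in zip(blend, target)]
        gradient = [2.0 * sum(x * r for x, r in zip(c, residual))
                    for c in candidates]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, gradient)])
    return weights, combine(candidates, weights)


if __name__ == "__main__":
    # Three toy "local estimates" in 2D and a target inside their hull.
    local_estimates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    weights, refined = refine_on_simplex(local_estimates, [0.25, 0.25])
    print(weights, refined)
```

Because the objective is convex and the simplex is a convex set, any method for constrained convex problems would serve here; projected gradient descent was chosen only because it needs no external solver.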

Supplementary Material

su (su.zip)
Supplemental movie and image files for Sequential Articulated Motion Reconstruction from a Monocular Image Sequence

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 1s
Special Section on Representation, Analysis and Recognition of 3D Humans and Special Section on Multimedia Computing and Applications of Socio-Affective Behaviors in the Wild
March 2018
234 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3190503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2018
Accepted: 01 December 2017
Revised: 01 December 2017
Received: 01 June 2017
Published in TOMM Volume 14, Issue 1s

Author Tags

  1. Articulated motion
  2. Extended Rauch-Tung-Striebel smoother
  3. Riemannian manifold
  4. convex hull
  5. simplex

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Spatio-Temporal Articulation & Coordination Co-attention Graph Network for human motion prediction. Signal Processing 223 (109551). DOI: 10.1016/j.sigpro.2024.109551. Online publication date: Oct-2024.
  • (2024) MTAN: Multi-degree Tail-aware Attention Network for human motion prediction. Internet of Things 25 (101134). DOI: 10.1016/j.iot.2024.101134. Online publication date: Apr-2024.
  • (2024) Differential motion attention network for efficient action recognition. The Visual Computer. DOI: 10.1007/s00371-024-03478-0. Online publication date: 13-Jun-2024.
  • (2019) An Image Cues Coding Approach for 3D Human Pose Estimation. ACM Transactions on Multimedia Computing, Communications, and Applications 15:4 (1-20). DOI: 10.1145/3368066. Online publication date: 16-Dec-2019.
