Cross Refinement Techniques for Markerless Human Motion Capture

Authors:

Xinguo Liu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 16, Issue 1

Article No.: 6, Pages 1 - 18

https://doi.org/10.1145/3372207

Published: 04 March 2020

Abstract

This article presents a global 3D human pose estimation method for markerless motion capture. Given two calibrated images of a person, the method first obtains the 2D joint locations in each image using a pre-trained 2D pose CNN, then reconstructs the 3D pose by stereo triangulation. To improve the accuracy and stability of the system, we propose two efficient optimization techniques for the joints. The first, called cross-view refinement, optimizes the joints based on epipolar geometry. The second, called cross-joint refinement, optimizes the joints using bone-length constraints. Our method automatically detects and corrects unreliable joints, and is consequently robust against heavy occlusion, symmetry ambiguity, motion blur, and highly distorted poses. We evaluate our method on a number of benchmark datasets covering indoor and outdoor scenes; the results show that it performs better than or on par with state-of-the-art methods. As an application, we create a 3D human pose dataset using the proposed motion capture system, which contains about 480K images of both indoor and outdoor scenes, and demonstrate the usefulness of the dataset for human pose estimation.
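
The geometric ingredients named in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not give the refinement objectives, so the code below shows only the textbook building blocks they rest on, namely linear (DLT) triangulation of one joint from two calibrated views, the distance of a 2D joint from its epipolar line as a cross-view consistency measure, and the deviation of a bone's length from a reference length as a cross-joint consistency measure. The function names, the reference bone lengths, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two 3x4 projection matrices."""
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # right singular vector of the smallest singular value
    return X[:3] / X[3]        # back to inhomogeneous coordinates

def fundamental_from_projections(P1, P2):
    """Fundamental matrix F with x2^T F x1 = 0, using F = [e2]_x P2 P1^+."""
    _, _, vt = np.linalg.svd(P1)
    C1 = vt[-1]                # camera centre of view 1 (null vector of P1)
    e2 = P2 @ C1               # epipole in view 2
    e2_cross = np.array([[0.0, -e2[2], e2[1]],
                         [e2[2], 0.0, -e2[0]],
                         [-e2[1], e2[0], 0.0]])
    return e2_cross @ P2 @ np.linalg.pinv(P1)

def epipolar_distance(F, x1, x2):
    """Pixel distance from x2 to the epipolar line induced by x1 in view 2."""
    line = F @ np.append(x1, 1.0)
    return abs(line @ np.append(x2, 1.0)) / np.hypot(line[0], line[1])

def bone_length_errors(joints, bones, ref_lengths):
    """Absolute deviation of each bone's 3D length from a reference length.

    `bones` is a list of (parent, child) joint indices; `ref_lengths` are
    hypothetical per-subject reference lengths in the same unit as `joints`.
    """
    return np.array([
        abs(np.linalg.norm(joints[a] - joints[b]) - ref)
        for (a, b), ref in zip(bones, ref_lengths)
    ])
```

In a pipeline of this kind, a joint whose epipolar distance or bone-length error exceeds a chosen threshold could be flagged as unreliable and re-estimated; how the article actually detects and corrects such joints is described in its full text, not here.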

Index Terms

Computing methodologies → Computer graphics → Animation → Motion capture

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 16, Issue 1

February 2020

363 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3384216

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2020

Accepted: 01 November 2019

Revised: 01 November 2019

Received: 01 May 2019

Published in TOMM Volume 16, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

NSFC
FaceUnity Technology

Cited By

Yang W, Li Y, Xing S, Cai J, Wang X. 2023. Lightweight multi-person motion capture system in the wild. SCIENTIA SINICA Informationis 53, 11 (2230). Online publication date: 31-Oct-2023. https://doi.org/10.1360/SSI-2022-0397

Liu G, Wang J, Zhang Z, Liu Q, Ren Y, Zhang M, Chen S, Bai P. 2023. A Novel Model for Intelligent Pull-Ups Test Based on Key Point Estimation of Human Body and Equipment. Mobile Information Systems 2023. Online publication date: 1-Jan-2023. https://dl.acm.org/doi/10.1155/2023/3649217

Chen X, Jiang X, Zhan L, Guo S, Ruan Q, Luo G, Liao M, Qin Y. 2022. Full-body Human Motion Reconstruction with Sparse Joint Tracking Using Flexible Sensors. ACM Transactions on Multimedia Computing, Communications, and Applications. Online publication date: 29-Sep-2022. https://doi.org/10.1145/3564700

Badiola-Bengoa A, Mendez-Zorrilla A. 2021. A Systematic Review of the Application of Camera-Based Human Pose Estimation in the Field of Sport and Physical Exercise. Sensors 21, 18 (5996). Online publication date: 7-Sep-2021. https://doi.org/10.3390/s21185996
