research-article

Conditional Directed Graph Convolution for 3D Human Pose Estimation

Authors: Changgong Zhang, Tien-Tsin Wong

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 602 - 611

https://doi.org/10.1145/3474085.3475219

Published: 17 October 2021

Abstract

Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph. However, this representation fails to reflect the articulated nature of the human skeleton, because the hierarchical order among the joints is not explicitly represented. In this paper, we propose to represent the human skeleton as a directed graph, with the joints as nodes and the bones as edges directed from parent joints to child joints. In this way, the edge directions explicitly reflect the hierarchical relationships among the nodes. Based on this representation, we further propose a spatial-temporal conditional directed graph convolution that leverages varying non-local dependence for different poses by conditioning the graph topology on the input poses. Altogether, we form a U-shaped network, named U-shaped Conditional Directed Graph Convolutional Network, for 3D human pose estimation from monocular videos. To evaluate the effectiveness of our method, we conducted extensive experiments on two challenging large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Both quantitative and qualitative results show that our method achieves top performance. In addition, ablation studies show that directed graphs exploit the hierarchy of articulated human skeletons better than undirected graphs, and that the conditional connections yield adaptive graph topologies for different poses.
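The two ideas in the abstract, bones as parent-to-child directed edges and a graph topology that depends on the input pose, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five-joint skeleton, the names `PARENTS`, `directed_adjacency`, and `conditional_adjacency`, and the distance-based softmax rule are hypothetical and are not taken from the paper, which does not state its exact formulation on this page.

```python
import math

# Hypothetical 5-joint skeleton (an assumption, not the Human3.6M layout).
# PARENTS[j] is the parent joint of joint j; -1 marks the root.
JOINT_NAMES = ["pelvis", "spine", "head", "l_hip", "l_knee"]
PARENTS = [-1, 0, 1, 0, 3]


def directed_adjacency(parents):
    """Return A with A[p][c] = 1.0 when a bone points from parent p to child c."""
    n = len(parents)
    adj = [[0.0] * n for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[parent][child] = 1.0
    return adj


def conditional_adjacency(static_adj, pose):
    """Add a pose-dependent term to the static directed adjacency.

    Toy rule (an assumption): each row receives a softmax over negative
    squared joint distances, so joints that are close in the current pose
    are connected more strongly, even if no bone links them.
    """
    n = len(pose)
    out = []
    for i in range(n):
        scores = [
            -sum((a - b) ** 2 for a, b in zip(pose[i], pose[j]))
            for j in range(n)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([static_adj[i][j] + exps[j] / total for j in range(n)])
    return out


A = directed_adjacency(PARENTS)
pose_2d = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (-0.5, 0.0), (-0.5, -1.0)]
A_cond = conditional_adjacency(A, pose_2d)
```

In this sketch `A` is asymmetric (the pelvis points to the spine, but not the reverse), which is what separates it from the undirected representation. `A_cond` changes whenever `pose_2d` changes, while the bone edges in `A` stay fixed.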

Index Terms

Conditional Directed Graph Convolution for 3D Human Pose Estimation
1. Computing methodologies
  1. Computer graphics
    1. Animation
      1. Motion capture

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science & Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
