DOI: 10.1145/3474085.3475219
research-article

Conditional Directed Graph Convolution for 3D Human Pose Estimation

Published: 17 October 2021

Abstract

Graph convolutional networks have significantly improved 3D human pose estimation by representing the human skeleton as an undirected graph. However, this representation fails to reflect the articulated nature of the human skeleton, because the hierarchical order among the joints is not explicitly represented. In this paper, we propose to represent the human skeleton as a directed graph, with joints as nodes and bones as edges directed from parent joints to child joints. In this way, the edge directions explicitly reflect the hierarchical relationships among the nodes. Based on this representation, we further propose a spatial-temporal conditional directed graph convolution that leverages varying non-local dependencies for different poses by conditioning the graph topology on the input poses. Altogether, we form a U-shaped network, named U-shaped Conditional Directed Graph Convolutional Network, for 3D human pose estimation from monocular videos. To evaluate the effectiveness of our method, we conducted extensive experiments on two challenging large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Both quantitative and qualitative results show that our method achieves top performance. Also, ablation studies show that directed graphs exploit the hierarchy of articulated human skeletons better than undirected graphs, and that the conditional connections yield adaptive graph topologies for different poses.
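
The directed-graph representation described in the abstract (joints as nodes, bones as edges pointing from parent to child) can be illustrated with a minimal sketch. The 17-joint parent list below is a hypothetical layout chosen for illustration, not the skeleton definition used in the paper, and the helper names `directed_adjacency`, `undirected_adjacency`, and `depth` are assumptions introduced here. The sketch covers only the fixed skeleton topology; the conditional, pose-dependent connections are not modeled.

```python
# Hypothetical 17-joint skeleton: PARENTS[j] is the index of joint j's parent,
# and -1 marks the root. This layout is an illustrative assumption, not the
# paper's exact joint ordering.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]


def directed_adjacency(parents):
    """Return adj where adj[p][c] == 1 iff there is a bone from parent p to child c."""
    n = len(parents)
    adj = [[0] * n for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:  # the root has no incoming bone
            adj[parent][child] = 1
    return adj


def undirected_adjacency(parents):
    """Symmetrize the directed graph; the parent/child direction is discarded."""
    adj = directed_adjacency(parents)
    n = len(adj)
    return [[adj[i][j] | adj[j][i] for j in range(n)] for i in range(n)]


def depth(parents, joint):
    """Number of bones between a joint and the root, following parent links."""
    d = 0
    while parents[joint] >= 0:
        joint = parents[joint]
        d += 1
    return d
```

In the directed matrix, an entry is set only in the parent-to-child direction, so the hierarchy can be read off the graph itself; the symmetrized version keeps the same bones but no longer distinguishes a parent from its child.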

    Published In

    MM '21: Proceedings of the 29th ACM International Conference on Multimedia
    October 2021
    5796 pages
    ISBN:9781450386517
    DOI:10.1145/3474085
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D human pose
    2. conditional directed graph convolution

    Qualifiers

    • Research-article

    Conference

    MM '21: ACM Multimedia Conference
    October 20 - 24, 2021
    Virtual Event, China

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 71
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 12 Jan 2025

    Cited By

    • (2025) ESCAPE: Energy-based Selective Adaptive Correction for Out-of-distribution 3D Human Pose Estimation. Neurocomputing 611 (128605). DOI: 10.1016/j.neucom.2024.128605. Online publication date: Jan-2025
    • (2025) Spatio-Temporal Dynamic Interlaced Network for 3D human pose estimation in video. Computer Vision and Image Understanding 251 (104258). DOI: 10.1016/j.cviu.2024.104258. Online publication date: Feb-2025
    • (2025) ST-LineNet: A spatiotemporal network for real-time 3D Pose estimation in martial arts training. Alexandria Engineering Journal 117 (136-147). DOI: 10.1016/j.aej.2024.12.097. Online publication date: Apr-2025
    • (2024) STAPFormer: A New 3D Human Pose Estimation Framework in Sports and Health. Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (1-10). DOI: 10.1145/3698587.3701367. Online publication date: 22-Nov-2024
    • (2024) SATPose: Improving Monocular 3D Pose Estimation with Spatial-aware Ground Tactility. Proceedings of the 32nd ACM International Conference on Multimedia (6192-6201). DOI: 10.1145/3664647.3681654. Online publication date: 28-Oct-2024
    • (2024) ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. Proceedings of the 32nd ACM International Conference on Multimedia (1514-1523). DOI: 10.1145/3664647.3680881. Online publication date: 28-Oct-2024
    • (2024) Towards Practical Human Motion Prediction with LiDAR Point Clouds. Proceedings of the 32nd ACM International Conference on Multimedia (7629-7638). DOI: 10.1145/3664647.3680720. Online publication date: 28-Oct-2024
    • (2024) A Novel Auxiliary Task Framework in 3D Human Pose Estimation for Opera Videos. Proceedings of the 2024 International Conference on Multimedia Retrieval (202-210). DOI: 10.1145/3652583.3658066. Online publication date: 30-May-2024
    • (2024) Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging. ACM SIGGRAPH 2024 Conference Papers (1-11). DOI: 10.1145/3641519.3657465. Online publication date: 13-Jul-2024
    • (2024) Virtual Instrument Performances (VIP): A Comprehensive Review. Computer Graphics Forum 43:2. DOI: 10.1111/cgf.15065. Online publication date: 30-Apr-2024
