Abstract
The skeleton of the human body is naturally an undirected graph, and graph convolutional networks (GCNs) have achieved good results when applied to 3D human pose estimation. However, the vanilla GCN ignores both the differences between joints and the connections between joints at different distances. To address these two problems, we propose the High-order Local Connection Network (HLCN) for 3D human pose estimation. On the one hand, different filters are assigned to different joints to produce different weights. On the other hand, HLCN aggregates the features of multi-hop joints. Furthermore, we study different methods of fusing these multi-hop features and compare their performance. The new network not only takes the differences between joints in the human skeleton into account but also captures long-range dependencies between joints. Experiments suggest that this method is superior to the vanilla GCN and achieves state-of-the-art performance, with an average error of 50.9 mm on the Human3.6M (H36M) dataset.
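The two ideas named in the abstract, joint-specific filters and multi-hop feature aggregation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`k_hop_adjacency`, `row_normalise`, `multi_hop_layer`), the 5-joint chain skeleton, the row normalisation, and the choice of plain summation to fuse the hop features are all assumptions made for illustration.

```python
import numpy as np

def k_hop_adjacency(adj, k):
    """Mark joint pairs whose shortest-path distance is exactly k (hypothetical helper)."""
    n = adj.shape[0]
    if k == 0:
        return np.eye(n)                      # each joint is its own 0-hop neighbour
    step = adj.astype(int) + np.eye(n, dtype=int)
    within = np.eye(n, dtype=bool)            # joints reachable within 0 hops
    within_prev = within
    for _ in range(k):
        within_prev = within                  # reachable within (hops - 1)
        within = (within.astype(int) @ step) > 0
    return (within & ~within_prev).astype(float)

def row_normalise(mat):
    """Divide each row by its sum; rows with no neighbours stay zero."""
    sums = mat.sum(axis=1, keepdims=True)
    return np.divide(mat, sums, out=np.zeros_like(mat), where=sums > 0)

def multi_hop_layer(x, hop_mats, weights):
    """x: (J, C_in); hop_mats: K+1 matrices of shape (J, J);
    weights: (K+1, J, C_in, C_out), i.e. one filter per hop and per joint."""
    out = 0.0
    for a_k, w_k in zip(hop_mats, weights):
        gathered = a_k @ x                                  # average k-hop neighbour features
        out = out + np.einsum("jc,jcd->jd", gathered, w_k)  # joint-specific filter
    return out                                              # fusion by summation (assumed)

# Toy skeleton: a chain of 5 joints, 0-1-2-3-4 (assumed, not the H36M skeleton).
chain = np.zeros((5, 5), dtype=int)
for i in range(4):
    chain[i, i + 1] = chain[i + 1, i] = 1

hops = [row_normalise(k_hop_adjacency(chain, k)) for k in range(3)]
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 2))            # e.g. 2D joint coordinates
filters = rng.normal(size=(3, 5, 2, 4))       # hops x joints x C_in x C_out
output = multi_hop_layer(features, hops, filters)   # shape (5, 4)
```

Indexing the filter tensor by joint is what distinguishes this sketch from a vanilla GCN layer, which would share a single `(C_in, C_out)` matrix across all joints and use only the 1-hop adjacency.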
Acknowledgements
This work was supported in part by the Key Program of NSFC (Grant No. U1908214), Dalian University Scientific Research Platform Project (No. 202101YB03), Special Project of Central Government Guiding Local Science and Technology Development (Grant No. 2021JH6/10500140), Program for the Liaoning Distinguished Professor, Program for Innovative Research Team in University of Liaoning Province, Dalian and Dalian University, and in part by the Science and Technology Innovation Fund of Dalian (Grant No. 2020JJ25CY001).
Cite this article
Wu, W., Zhou, D., Zhang, Q. et al. High-order local connection network for 3D human pose estimation based on GCN. Appl Intell 52, 15690–15702 (2022). https://doi.org/10.1007/s10489-022-03312-x
DOI: https://doi.org/10.1007/s10489-022-03312-x