Abstract
Because a human pose can be naturally represented as a graph, graph convolutional networks (GCNs) have recently been proposed for 3D human pose estimation and have achieved promising results. However, most GCN-based methods use vanilla graph convolution, which aggregates features only from 1-hop neighbors, so long-range dependencies between joints can be captured only by stacking multiple graph convolution layers. To alleviate this problem, we propose a multi-scale graph convolution that aggregates features of neighbors at different distances, and we apply it to nodes with specified neighbor types. We further propose a hierarchical-body-pooling layer to aggregate and share body-level and body-part-level context information. Based on these components, we develop a lightweight GCN for 3D pose lifting by repeatedly stacking a residual block of multi-scale graph convolution and a hierarchical-body-pooling layer. Experimental results on the Human3.6M dataset indicate that our network achieves state-of-the-art performance with much lower model complexity.
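The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The 5-joint chain, the scalar per-hop weights (standing in for learned weight matrices), the use of mean pooling, and the function names `hop_distances`, `multi_scale_aggregate`, and `hierarchical_pool` are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import deque

# Hypothetical 5-joint chain standing in for a skeleton: 0-1-2-3-4.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def hop_distances(num_nodes, edges):
    """Return dist[i][j], the shortest-path length between joints i and j (BFS)."""
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[None] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def multi_scale_aggregate(features, dist, hop_weights):
    """For each joint, average the features of joints exactly k hops away,
    scale by hop_weights[k], and sum over k (k = 0 is the joint itself).
    Assumption: scalar weights replace the learned per-scale transforms."""
    num_nodes, dim = len(features), len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        for k, w in enumerate(hop_weights):
            group = [j for j in range(num_nodes) if dist[i][j] == k]
            if not group:
                continue  # no joint lies exactly k hops from joint i
            for d in range(dim):
                mean = sum(features[j][d] for j in group) / len(group)
                out[i][d] += w * mean
    return out


def hierarchical_pool(features, parts):
    """Return (body_context, part_contexts) as mean-pooled feature vectors.
    Assumption: mean pooling; the abstract does not name the operator."""
    dim = len(features[0])

    def mean_of(indices):
        return [sum(features[j][d] for j in indices) / len(indices)
                for d in range(dim)]

    body = mean_of(range(len(features)))
    return body, {name: mean_of(idx) for name, idx in parts.items()}


feats = [[1.0], [2.0], [3.0], [4.0], [5.0]]
dist = hop_distances(NUM_JOINTS, EDGES)
mixed = multi_scale_aggregate(feats, dist, hop_weights=[1.0, 0.5, 0.25])
body_ctx, part_ctx = hierarchical_pool(feats, {"lower": [0, 1], "upper": [2, 3, 4]})
```

With three hop weights, a single call already mixes information from joints up to two hops away, whereas a vanilla 1-hop convolution would need two stacked layers to reach the same joints.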
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (U20B2063), the Sichuan Science and Technology Program (2020YFS0057), and the Fundamental Research Funds for the Central Universities (ZYGX2019Z015).
Cite this article
Huang, K., Sui, T. & Wu, H. 3D human pose estimation with multi-scale graph convolution and hierarchical body pooling. Multimedia Systems 28, 403–412 (2022). https://doi.org/10.1007/s00530-021-00808-3
DOI: https://doi.org/10.1007/s00530-021-00808-3