Article, DOI: 10.1007/978-3-030-58568-6_30

SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

Published: 23 August 2020

Abstract

Human poses that are rare or unseen in a training set are challenging for a network to predict. Similar to the long-tailed distribution problem in visual recognition, the small number of examples for such poses limits the ability of networks to model them. Interestingly, local pose distributions suffer less from the long-tail problem, i.e., local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to take advantage of this fact for better generalization to rare and unseen poses. To be specific, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint’s position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
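
The split-and-recombine idea described in the abstract can be illustrated with a minimal, untrained sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 17-joint skeleton, the grouping in `GROUPS`, the context size `CONTEXT_DIM`, the single hidden layer, and the use of a plain linear projection to compress the rest of the body. Each branch sees the full 2D coordinates of its own joints plus a low-dimensional vector summarizing all other joints, and predicts 3D coordinates only for its own joints.

```python
import random

NUM_JOINTS = 17   # assumed skeleton size
CONTEXT_DIM = 2   # assumed size of the low-dimensional global context

# Hypothetical grouping of joints into local body regions.
GROUPS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}


def matvec(vec, mat):
    """Multiply a length-n vector by an n x m matrix (list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, hidden=16):
    """One set of random (untrained) weights per branch."""
    params = {}
    for name, idx in GROUPS.items():
        n_local = 2 * len(idx)                 # 2D coords of local joints
        n_rest = 2 * (NUM_JOINTS - len(idx))   # 2D coords of all other joints
        params[name] = {
            "w_ctx": rand_matrix(rng, n_rest, CONTEXT_DIM),
            "w_hid": rand_matrix(rng, n_local + CONTEXT_DIM, hidden),
            "w_out": rand_matrix(rng, hidden, 3 * len(idx)),
        }
    return params


def split_and_recombine(pose_2d, params):
    """Map a 2D pose (17 x 2) to a 3D pose (17 x 3), one branch per region."""
    pose_3d = [None] * NUM_JOINTS
    for name, idx in GROUPS.items():
        p = params[name]
        # Split: the branch keeps its own joints at full dimensionality.
        local = [c for j in idx for c in pose_2d[j]]
        # Recombine: the rest of the body is compressed to CONTEXT_DIM values.
        rest = [c for j in range(NUM_JOINTS) if j not in idx
                for c in pose_2d[j]]
        context = matvec(rest, p["w_ctx"])
        hidden = [max(0.0, h) for h in matvec(local + context, p["w_hid"])]
        out = matvec(hidden, p["w_out"])
        for k, j in enumerate(idx):
            pose_3d[j] = out[3 * k: 3 * k + 3]
    return pose_3d


rng = random.Random(0)
params = init_params(rng)
pose_2d = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(NUM_JOINTS)]
pose_3d = split_and_recombine(pose_2d, params)
```

Because the other joints reach a branch only through `CONTEXT_DIM` values, the input a branch sees is dominated by its local joint configuration, which is the property the abstract relies on for rare global poses.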

Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV
Aug 2020
842 pages
ISBN: 978-3-030-58567-9
DOI: 10.1007/978-3-030-58568-6

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

  1. Human pose estimation
  2. 2D to 3D
  3. Long-tailed distribution

Cited By

  • (2024) Deep learning methods for single camera based clinical in-bed movement action recognition. Image and Vision Computing 143:C. DOI: 10.1016/j.imavis.2024.104928. Online publication date: 1-Mar-2024
  • (2024) Human Pose Recognition via Occlusion-Preserving Abstract Images. Computer Vision – ECCV 2024, pp. 304-321. DOI: 10.1007/978-3-031-73007-8_18. Online publication date: 29-Sep-2024
  • (2023) A single 2D pose with context is worth hundreds for 3D human pose estimation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 27394-27413. DOI: 10.5555/3666122.3667315. Online publication date: 10-Dec-2023
  • (2023) A Global-Part-Local Approach for 3D Human Pose Estimation from Single-View Images. Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, pp. 443-448. DOI: 10.1145/3652628.3652701. Online publication date: 17-Nov-2023
  • (2023) Optimising 2D Pose Representations: Improving Accuracy, Stability and Generalisability Within Unsupervised 2D-3D Human Pose Estimation. Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production, pp. 1-9. DOI: 10.1145/3626495.3626505. Online publication date: 30-Nov-2023
  • (2022) A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion. Proceedings of the 30th ACM International Conference on Multimedia, pp. 5123-5131. DOI: 10.1145/3503161.3547796. Online publication date: 10-Oct-2022
  • (2022) Uncertainty-Aware 3D Human Pose Estimation from Monocular Video. Proceedings of the 30th ACM International Conference on Multimedia, pp. 5102-5113. DOI: 10.1145/3503161.3547773. Online publication date: 10-Oct-2022
  • (2022) A Survey of Recent Advances on Two-Step 3D Human Pose Estimation. Intelligent Systems, pp. 266-281. DOI: 10.1007/978-3-031-21689-3_20. Online publication date: 28-Nov-2022
  • (2022) HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling. Computer Vision – ECCV 2022, pp. 557-577. DOI: 10.1007/978-3-031-20071-7_33. Online publication date: 23-Oct-2022
  • (2022) DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation. Computer Vision – ECCV 2022, pp. 436-453. DOI: 10.1007/978-3-031-20068-7_25. Online publication date: 23-Oct-2022