Article

SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

Authors:

Stephen Lin

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV

Pages 507 - 523

https://doi.org/10.1007/978-3-030-58568-6_30

Published: 23 August 2020

Abstract

Human poses that are rare or unseen in a training set are challenging for a network to predict. Similar to the long-tailed distribution problem in visual recognition, the small number of examples for such poses limits the ability of networks to model them. Interestingly, local pose distributions suffer less from the long-tail problem, i.e., local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to take advantage of this fact for better generalization to rare and unseen poses. To be specific, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint’s position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
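The split-and-recombine idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 17-joint grouping, the context size `CONTEXT_DIM`, the single hidden layer, and the random untrained weights are all assumptions chosen only to show the data flow. Each branch sees the full 2D coordinates of its own local joint group, while the remaining joints are compressed into a low-dimensional context vector before being concatenated back in.

```python
import numpy as np

# Hypothetical grouping of a 17-joint skeleton into local regions
# (an assumption for illustration, not the grouping used in the paper).
GROUPS = {
    "torso": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_arm": [11, 12, 13],
    "right_arm": [14, 15, 16],
}
NUM_JOINTS = 17
CONTEXT_DIM = 4  # assumed size of the low-dimensional global context vector


def init_branch(rng, group, hidden=32):
    """Create random, untrained weights for one branch (illustrative only)."""
    n_local = 2 * len(group)                 # 2D coordinates of the local joints
    n_rest = 2 * (NUM_JOINTS - len(group))   # 2D coordinates of all other joints
    return {
        "w_ctx": rng.standard_normal((n_rest, CONTEXT_DIM)) * 0.1,
        "w_hid": rng.standard_normal((n_local + CONTEXT_DIM, hidden)) * 0.1,
        "w_out": rng.standard_normal((hidden, 3 * len(group))) * 0.1,
    }


def split_and_recombine(pose_2d, params):
    """Lift a (17, 2) 2D pose to a (17, 3) 3D pose, one branch per local group."""
    pose_3d = np.zeros((NUM_JOINTS, 3))
    for name, group in GROUPS.items():
        rest = [j for j in range(NUM_JOINTS) if j not in group]
        # Split: the branch keeps its local joints at full dimensionality.
        local = pose_2d[group].reshape(-1)
        # Recombine: the rest of the body enters as a low-dimensional vector.
        context = pose_2d[rest].reshape(-1) @ params[name]["w_ctx"]
        features = np.concatenate([local, context])
        hidden = np.maximum(features @ params[name]["w_hid"], 0.0)  # ReLU
        pose_3d[group] = (hidden @ params[name]["w_out"]).reshape(len(group), 3)
    return pose_3d


rng = np.random.default_rng(0)
params = {name: init_branch(rng, group) for name, group in GROUPS.items()}
pose_2d = rng.standard_normal((NUM_JOINTS, 2))
pose_3d = split_and_recombine(pose_2d, params)
print(pose_3d.shape)  # (17, 3)
```

In this sketch, the compression to `CONTEXT_DIM` values is what limits how much each branch can depend on distant joints, which is the property the abstract attributes to the reduced dimensionality of less relevant body areas.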

Information & Contributors

Information

Published In

Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV

Aug 2020

842 pages

ISBN:978-3-030-58567-9

DOI:10.1007/978-3-030-58568-6

Editors:
Andrea Vedaldi, University of Oxford, Oxford, UK
Horst Bischof, Graz University of Technology, Graz, Austria
Thomas Brox, University of Freiburg, Freiburg im Breisgau, Germany
Jan-Michael Frahm, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

© Springer Nature Switzerland AG 2020.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 23 August 2020

Qualifiers

Article

Cited By

Karácsony T, Jeni L, De la Torre F, Cunha J (2024) Deep learning methods for single camera based clinical in-bed movement action recognition. Image and Vision Computing 143:C. Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1016/j.imavis.2024.104928
Manzur S, Hayes W (2024) Human Pose Recognition via Occlusion-Preserving Abstract Images. Computer Vision – ECCV 2024, pp. 304-321. Online publication date: 29-Sep-2024
https://dl.acm.org/doi/10.1007/978-3-031-73007-8_18
Zhao Q, Zheng C, Liu M, Chen C, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023) A single 2D pose with context is worth hundreds for 3D human pose estimation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 27394-27413. Online publication date: 10-Dec-2023
https://dl.acm.org/doi/10.5555/3666122.3667315
Xie Y, Hong C, Xie R, Li J (2023) A Global-Part-Local Approach for 3D Human Pose Estimation from Single-View Images. Proceedings of the 4th International Conference on Artificial Intelligence and Computer Engineering, pp. 443-448. Online publication date: 17-Nov-2023
https://dl.acm.org/doi/10.1145/3652628.3652701
Hardy P, Dasmahapatra S, Kim H (2023) Optimising 2D Pose Representations: Improving Accuracy, Stability and Generalisability Within Unsupervised 2D-3D Human Pose Estimation. Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production, pp. 1-9. Online publication date: 30-Nov-2023
https://dl.acm.org/doi/10.1145/3626495.3626505
Jiang J, Chen J, Guo Y, Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L (2022) A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion. Proceedings of the 30th ACM International Conference on Multimedia, pp. 5123-5131. Online publication date: 10-Oct-2022
https://dl.acm.org/doi/10.1145/3503161.3547796
Zhang J, Chen Y, Tu Z, Magalhães J, del Bimbo A, Satoh S, Sebe N, Alameda-Pineda X, Jin Q, Oria V, Toni L (2022) Uncertainty-Aware 3D Human Pose Estimation from Monocular Video. Proceedings of the 30th ACM International Conference on Multimedia, pp. 5102-5113. Online publication date: 10-Oct-2022
https://dl.acm.org/doi/10.1145/3503161.3547773
Manesco J, Marana A (2022) A Survey of Recent Advances on Two-Step 3D Human Pose Estimation. Intelligent Systems, pp. 266-281. Online publication date: 28-Nov-2022
https://dl.acm.org/doi/10.1007/978-3-031-21689-3_20
Cai Z, Ren D, Zeng A, Lin Z, Yu T, Wang W, Fan X, Gao Y, Yu Y, Pan L, Hong F, Zhang M, Loy C, Yang L, Liu Z (2022) HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling. Computer Vision – ECCV 2022, pp. 557-577. Online publication date: 23-Oct-2022
https://dl.acm.org/doi/10.1007/978-3-031-20071-7_33
Huang L, Liang J, Deng W (2022) DH-AUG: DH Forward Kinematics Model Driven Augmentation for 3D Human Pose Estimation. Computer Vision – ECCV 2022, pp. 436-453. Online publication date: 23-Oct-2022
https://dl.acm.org/doi/10.1007/978-3-031-20068-7_25
