Hand pose estimation based on regression method from monocular RGB cameras for handling occlusion

Multimedia Tools and Applications

Abstract

Hand pose estimation is a significant research topic for various computer vision applications. Nonetheless, reliable and robust pose estimation with existing methods remains challenging due to the complex anatomy of the hand and the wide variation in hand shape and size. Traditional approaches relied on depth sensors or multi-camera setups. However, with the advent of deep learning, there has been a shift towards using deep neural networks to learn, grasp, and manipulate objects accurately. In this paper, we propose an end-to-end framework called the “ResUnet network” that can efficiently detect a human hand and estimate its position from a monocular RGB image. Our proposal aims to handle the occlusion issue during hand-object interaction in real time. The ResUnet architecture includes three modules: feature extraction, 2D pose regression, and 3D hand estimation. The first module extracts the feature maps of the cropped hand to generate 2D heatmaps. The second module uses these outputs to regress the 2D pose coordinates by employing a Latent Heatmaps Representation (LHR). The last module concatenates the intermediate features with the upsampling block to perform 3D regression and predict the 3D bones using a tree structure of the hand. Quantitative and qualitative results on three datasets, GANerated, SynthHands, and the Stereo Hand Pose Tracking Benchmark (STB), consistently demonstrate that our regression approach outperforms current state-of-the-art hand pose estimation methods.
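As an illustration of the last two steps described in the abstract, the sketch below shows one common way to regress 2D coordinates from a latent heatmap (a spatial softmax followed by an expectation, often called soft-argmax) and one way to recover 3D joints from bone vectors along a hand tree. This is a minimal sketch, not the authors' implementation: the function names, the temperature `beta`, and the 21-joint layout (a wrist plus five four-joint finger chains) are all assumptions that the abstract does not specify.

```python
import math

def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) location under a spatial softmax of `heatmap`.

    `heatmap` is a list of rows of raw (latent) scores; `beta` is a
    hypothetical temperature that sharpens the distribution.
    """
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y

# Assumed 21-joint layout: joint 0 is the wrist; joints 1-4, 5-8, ..., 17-20
# form one chain per finger, with the first joint of each chain attached to the wrist.
PARENTS = [-1] + [0 if j % 4 == 1 else j - 1 for j in range(1, 21)]

def joints_from_bones(root, bones, parents=PARENTS):
    """Accumulate bone vectors (child minus parent) down the tree to get 3D joints.

    `bones[j]` is the 3D offset from the parent of joint j to joint j;
    the entry for the root joint is ignored.
    """
    joints = [None] * len(parents)
    for j, p in enumerate(parents):  # in this ordering, parents precede children
        if p < 0:
            joints[j] = tuple(root)
        else:
            joints[j] = tuple(a + b for a, b in zip(joints[p], bones[j]))
    return joints
```

For example, `soft_argmax_2d` applied to a heatmap with a single dominant score returns a location very close to that peak's column and row, and `joints_from_bones((0, 0, 0), bones)` places each fingertip at the sum of the four bone vectors in its chain. Predicting bones rather than absolute joint positions keeps each finger's joints tied to its parent, which is one plausible reason to use a tree structure.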

Data Availability Statement

The three datasets used in this article [24, 26, 46] are publicly available on the web.

References

  1. Athitsos, V., Sclaroff, S.: Estimating 3d hand pose from a cluttered image. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., vol. 2, p. 432 (2003). IEEE

  2. Baek, S., Kim, K.I., Kim, T.-K.: Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1067–1076 (2019)

  3. Boukhayma, A., Bem, R.d., Torr, P.H.: 3d hand shape and pose from images in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10843–10852 (2019)

  4. Cai, Y., Ge, L., Cai, J., Yuan, J.: Weakly-supervised 3d hand pose estimation from monocular rgb images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 666–682 (2018)

  5. Chen, L., Lin, S.-Y., Xie, Y., Tang, H., Xue, Y., Xie, X., Lin, Y.-Y., Fan, W.: Generating realistic training images based on tonality-alignment generative adversarial networks for hand pose estimation. arXiv preprint arXiv:1811.09916 (2018)

  6. Dibra, E., Melchior, S., Balkis, A., Wolf, T., Oztireli, C., Gross, M.: Monocular rgb hand pose inference from unsupervised refinable nets. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1075–1085 (2018)

  7. Feng Q, Shum HP, Morishima S (2020) Resolving hand-object occlusion for mixed reality with joint deep learning and model optimization. Computer Animation and Virtual Worlds 31(4–5):1956

  8. Gao C, Yang Y, Li W (2022) 3d interacting hand pose and shape estimation from a single rgb image. Neurocomputing 474:25–36

  9. Ge, L., Liang, H., Yuan, J., Thalmann, D.: 3d convolutional neural networks for efficient and robust hand pose estimation from single depth images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1991–2000 (2017)

  10. Ge, L., Liang, H., Yuan, J., Thalmann, D.: Robust 3d hand pose estimation in single depth images: from single-view cnn to multi-view cnns. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3593–3601 (2016)

  11. Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., Yuan, J.: 3d hand shape and pose estimation from a single rgb image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10833–10842 (2019)

  12. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2020) Generative adversarial networks. Communications of the ACM 63(11):139–144

  13. Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M.J., Laptev, I., Schmid, C.: Learning joint reconstruction of hands and manipulated objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11807–11816 (2019)

  14. He, Y., Hu, W., Yang, S., Qu, X., Wan, P., Guo, Z.: 3d hand pose estimation in the wild via graph refinement under adversarial learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  16. Iqbal, U., Molchanov, P., Breuel, T., Gall, J., Kautz, J.: Hand pose estimation via latent 2.5d heatmap regression. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 118–134 (2018)

  17. Kourbane, I., Genc, Y.: A graph-based approach for absolute 3d hand pose estimation using a single rgb image. Applied Intelligence, 1–16 (2022)

  18. Kulon, D., Wang, H., Güler, R.A., Bronstein, M., Zafeiriou, S.: Single image 3d hand reconstruction with mesh convolutions. arXiv preprint arXiv:1905.01326 (2019)

  19. Le, V.-H., Nguyen, T.-T., Tran, N.-A., Pham, T.-C.: Openpose’s evaluation in the video traditional martial arts presentation. In: 2019 19th International Symposium on Communications and Information Technologies (ISCIT), pp. 76–81 (2019). IEEE

  20. Li, S., Wang, H., Lee, D.: Hand pose estimation for hand-object interaction cases using augmented autoencoder. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 993–999 (2020). IEEE

  21. Li M, Wang J, Sang N (2021) Latent distribution-based 3d hand pose estimation from monocular rgb images. IEEE Transactions on Circuits and Systems for Video Technology 31(12):4883–4894

  22. Mofarreh-Bonab M, Seyedarabi H, Mozaffari Tazehkand B, Kasaei S (2022) 3d hand pose estimation using rgbd images and hybrid deep learning networks. The Visual Computer 38(6):2023–2032

  23. Moon, G., Chang, J.Y., Lee, K.M.: V2v-posenet: Voxel-to-voxel prediction network for accurate 3d hand and human pose estimation from a single depth map. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5079–5088 (2018)

  24. Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., Theobalt, C.: Ganerated hands for real-time 3d hand tracking from monocular rgb. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–59 (2018)

  25. Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., Theobalt, C.: Ganerated hands for real-time 3d hand tracking from monocular rgb. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 49–59 (2018)

  26. Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., Theobalt, C.: Real-time hand tracking under occlusion from an egocentric rgb-d sensor. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1154–1163 (2017)

  27. Oberweger M, Wohlhart P, Lepetit V (2019) Generalized feedback loop for joint hand-object pose estimation. IEEE transactions on pattern analysis and machine intelligence 42(8):1898–1912

  28. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)

  29. Qian, C., Sun, X., Wei, Y., Tang, X., Sun, J.: Realtime and robust hand tracking from depth. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1106–1113 (2014)

  30. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)

  31. Romero, J., Kjellström, H., Kragic, D.: Hands in action: real-time 3d reconstruction of hands in interaction with objects. In: 2010 IEEE International Conference on Robotics and Automation, pp. 458–463 (2010). IEEE

  32. Romero, J., Tzionas, D., Black, M.J.: Embodied hands: Modeling and capturing hands and bodies together. arXiv preprint arXiv:2201.02610 (2022)

  33. Spurr, A., Song, J., Park, S., Hilliges, O.: Cross-modal deep variational hand pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 89–98 (2018)

  34. Sridhar, S., Oulasvirta, A., Theobalt, C.: Interactive markerless articulated hand motion tracking using rgb and depth data. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2456–2463 (2013)

  35. Su, Y., Rambach, J., Minaskan, N., Lesur, P., Pagani, A., Stricker, D.: Deep multi-state object pose estimation for augmented reality assembly. In: 2019 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 222–227 (2019). IEEE

  36. Tekin, B., Bogo, F., Pollefeys, M.: H+ o: Unified egocentric recognition of 3d hand-object poses and interactions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4511–4520 (2019)

  37. Tekin, B., Bogo, F., Pollefeys, M.: H+ o: Unified egocentric recognition of 3d hand-object poses and interactions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4511–4520 (2019)

  38. Wan, C., Yao, A., Gool, L.V.: Hand pose estimation from local surface normals. In: European Conference on Computer Vision, pp. 554–569 (2016). Springer

  39. Wu M-Y, Ting P-W, Tang Y-H, Chou E-T, Fu L-C (2020) Hand pose estimation in object-interaction based on deep learning for virtual reality applications. Journal of Visual Communication and Image Representation 70:102802

  40. Wu M-Y, Ting P-W, Tang Y-H, Chou E-T, Fu L-C (2020) Hand pose estimation in object-interaction based on deep learning for virtual reality applications. Journal of Visual Communication and Image Representation 70:102802

  41. Xiang, D., Joo, H., Sheikh, Y.: Monocular total capture: Posing face, body, and hands in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10965–10974 (2019)

  42. Ye, Q., Kim, T.-K.: Occlusion-aware hand pose estimation using hierarchical mixture density network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–817 (2018)

  43. Yeo H-S, Lee B-G, Lim H (2015) Hand tracking and gesture recognition system for human-computer interaction using low-cost hardware. Multimedia Tools and Applications 74(8):2687–2715

  44. Yuan, S., Stenger, B., Kim, T.-K.: Rgb-based 3d hand pose estimation via privileged learning with depth images. arXiv preprint arXiv:1811.07376 (2018)

  45. Zhang, X., Li, Q., Mo, H., Zhang, W., Zheng, W.: End-to-end hand mesh recovery from a monocular rgb image. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2354–2364 (2019)

  46. Zhang, Y., Xu, C., Cheng, L.: Learning to search on manifolds for 3d pose estimation of articulated objects. arXiv preprint arXiv:1612.00596 (2016)

  47. Zhou, Y., Lu, J., Du, K., Lin, X., Sun, Y., Ma, X.: Hbe: Hand branch ensemble network for real-time 3d hand pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 501–516 (2018)

  48. Zimmermann, C., Brox, T.: Learning to estimate 3d hand pose from single rgb images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4903–4911 (2017)

Funding

No funding is provided for the preparation of the manuscript.

Author information

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Bekiri Roumaissa.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Conflict of interest

The authors have no conflicts of interest to declare relevant to this article’s content.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Roumaissa, B., Mohamed Chaouki, B. Hand pose estimation based on regression method from monocular RGB cameras for handling occlusion. Multimed Tools Appl 83, 21497–21523 (2024). https://doi.org/10.1007/s11042-023-16384-9

  • DOI: https://doi.org/10.1007/s11042-023-16384-9
