research-article

Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition

Authors:

Mashrur M. Morshed,

Md. Kamrul Hasan

The Visual Computer, Volume 40, Issue 1

Pages 11 - 25

https://doi.org/10.1007/s00371-022-02762-1

Published: 04 January 2023

Abstract

An existing approach to dynamic hand gesture recognition applies multimodal-fusion CRNNs (Convolutional Recurrent Neural Networks) to depth images and the corresponding 2D hand skeleton coordinates. An underlying problem with this approach, however, is that raw depth images have very low contrast in the hand ROI (region of interest). They fail to highlight details that are important for fine-grained hand gesture recognition, such as finger orientation, overlap between the fingers and the palm, or overlap among multiple fingers. To address this issue, we propose generating quantized depth images as an alternative input modality to raw depth images. Quantization creates sharp relative contrasts between key parts of the hand, which improves gesture recognition performance. In addition, we explore several ways to tackle the high-variance problem in previously researched multimodal-fusion CRNN architectures. We obtained accuracies of 90.82% and 89.21% (14 and 28 gestures, respectively) on the DHG-14/28 dataset and 93.81% and 90.24% (14 and 28 gestures, respectively) on the SHREC-2017 dataset, a significant improvement over previous multimodal-fusion CRNNs.
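The abstract describes quantized depth images only at a high level. The following is a minimal Python sketch of one plausible reading of the idea, not the authors' published procedure. It assumes that background pixels are 0, that the hand's own depth range is split into `levels` equal-width bins, and that nearer bins are drawn brighter. The function name `quantize_depth`, the binning rule, and the default of 4 levels are all illustrative assumptions.

```python
from typing import List

DepthImage = List[List[int]]


def quantize_depth(depth: DepthImage, levels: int = 4, background: int = 0) -> DepthImage:
    """Map the hand's depth range onto `levels` evenly spaced gray values.

    Hypothetical scheme (not necessarily the paper's): pixels equal to
    `background` are treated as non-hand, and the remaining depth range
    [d_min, d_max] is split into `levels` equal-width bins.
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    hand = [d for row in depth for d in row if d != background]
    if not hand:
        # No hand pixels: return an all-background image of the same shape.
        return [[0 for _ in row] for row in depth]
    d_min, d_max = min(hand), max(hand)
    span = max(d_max - d_min, 1)  # avoid division by zero for a flat hand
    out: DepthImage = []
    for row in depth:
        out_row = []
        for d in row:
            if d == background:
                out_row.append(0)
                continue
            # Bin index in [0, levels - 1]; the nearest pixel gets the highest index.
            b = min(int((d_max - d) * levels / span), levels - 1)
            # Spread bins over (0, 255] so every hand pixel differs from background.
            out_row.append(round(255 * (b + 1) / levels))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Toy 2x3 depth patch in millimetres (hypothetical values); 0 = background.
    print(quantize_depth([[0, 500, 510], [520, 530, 0]], levels=4))
```

With four levels, a hand spanning only 30 mm of depth is spread across four clearly separated gray values (64, 128, 191, 255). By contrast, under a hypothetical linear 8-bit rescaling over a 0 to 4000 mm sensor range, the same four hand pixels would fall between gray values of roughly 32 and 34, which illustrates the low-contrast problem described above.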


Cited By

Singh, R., Singh, L.: Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods. The Visual Computer: International Journal of Computer Graphics 41(1), 41–51 (2025). Online publication date: 1 Jan 2025. https://dl.acm.org/doi/10.1007/s00371-024-03307-4

Yang, W., Xie, L., Qian, W., Wu, C., Yang, H.: Coarse-to-fine cascaded 3D hand reconstruction based on SSGC and MHSA. The Visual Computer: International Journal of Computer Graphics 41(1), 11–24 (2025). Online publication date: 1 Jan 2025. https://dl.acm.org/doi/10.1007/s00371-024-03305-6

Index Terms

Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition
1. Computing methodologies
2. Human-centered computing
  1. Human computer interaction (HCI)

Index terms have been assigned to the content through auto-classification.

Published In

The Visual Computer: International Journal of Computer Graphics Volume 40, Issue 1

Jan 2024

439 pages

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 04 January 2023

Accepted: 20 December 2022

Qualifiers

Research-article

Bibliometrics & Citations

Article Metrics

Total Citations: 2
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 08 Feb 2025
