Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition

Published: 04 January 2023

Abstract

An existing approach to dynamic hand gesture recognition is to apply multimodal-fusion CRNNs (convolutional recurrent neural networks) to depth images and the corresponding 2D hand skeleton coordinates. However, an underlying problem with this method is that raw depth images have very low contrast in the hand ROI (region of interest). They do not highlight details that are important for fine-grained hand gesture recognition, such as finger orientation, the overlap between the fingers and the palm, or the overlap between multiple fingers. To address this issue, we propose generating quantized depth images as an alternative input modality to raw depth images. This creates sharp relative contrasts between key parts of the hand, which improves gesture recognition performance. In addition, we explore some ways to tackle the high-variance problem in previously researched multimodal-fusion CRNN architectures. We obtained accuracies of 90.82% and 89.21% (14 and 28 gestures, respectively) on the DHG-14/28 dataset and accuracies of 93.81% and 90.24% (14 and 28 gestures, respectively) on the SHREC-2017 dataset, which is a significant improvement over previous multimodal-fusion CRNNs.
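
The abstract does not state how the quantized depth images are generated. Purely as an illustration of the general idea, the sketch below assumes uniform quantization of the hand's own depth range into a small number of gray levels, with zero-valued pixels treated as background. The function name `quantize_depth`, the `num_levels` parameter, and the background convention are hypothetical and are not taken from the paper.

```python
import numpy as np


def quantize_depth(depth, num_levels=8, background=0):
    """Uniformly quantize non-background depth values into `num_levels` gray levels.

    Assumption (not from the paper): pixels equal to `background` are not part
    of the hand and are mapped to 0 in the output.
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = depth != background
    out = np.zeros(depth.shape, dtype=np.uint8)
    if not valid.any():
        return out

    # Normalize relative to the depth range of the valid (hand) pixels only,
    # so the full output range is spent on the hand rather than the scene.
    d_min = depth[valid].min()
    d_max = depth[valid].max()
    span = max(d_max - d_min, 1e-9)
    norm = (depth[valid] - d_min) / span

    # Bin index in 0..num_levels-1; the farthest pixel falls in the last bin.
    bins = np.minimum((norm * num_levels).astype(np.int64), num_levels - 1)

    # Spread the bins over 1..255 so that 0 stays reserved for background.
    out[valid] = ((bins + 1) * 255 // num_levels).astype(np.uint8)
    return out


# Usage on a toy 2x3 "depth crop" (values in arbitrary depth units).
toy = np.array([[0, 500, 510],
                [520, 530, 540]])
quantized = quantize_depth(toy, num_levels=4)
```

With few levels, neighbouring depth bands map to clearly separated gray values, which is one plausible way to obtain the sharper relative contrast the abstract describes.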


Cited By

  • (2025) Dyhand: dynamic hand gesture recognition using BiLSTM and soft attention methods. The Visual Computer: International Journal of Computer Graphics 41:1 (41-51). DOI: 10.1007/s00371-024-03307-4. Online publication date: 1-Jan-2025
  • (2025) Coarse-to-fine cascaded 3D hand reconstruction based on SSGC and MHSA. The Visual Computer: International Journal of Computer Graphics 41:1 (11-24). DOI: 10.1007/s00371-024-03305-6. Online publication date: 1-Jan-2025

Published In

The Visual Computer: International Journal of Computer Graphics, Volume 40, Issue 1
Jan 2024
439 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 04 January 2023
Accepted: 20 December 2022

Author Tags

  1. Convolutional recurrent neural networks
  2. Dynamic hand gesture recognition
  3. Multimodal-fusion networks
  4. Depth image
  5. Hand skeleton joint points

Qualifiers

  • Research-article
