
Realtime Recognition of Dynamic Hand Gestures in Practical Applications

Published: 26 September 2023
    Abstract

    A dynamic hand gesture acting as a semaphoric gesture is a practical and intuitive mid-air interface. Benefiting from the development of deep convolutional networks, gesture recognition has reached high accuracy. However, when a dynamic hand gesture such as a direction command is performed, unintentional actions are easily misrecognized because of the similarity of the hand poses. This hinders the application of dynamic hand gestures and cannot be solved merely by improving the accuracy of the applied algorithm on public datasets, so the problem must also be studied from the perspective of human-computer interaction. In this article, two methods are proposed to avoid misrecognition: introducing an activation delay and adopting an asymmetric gesture design. First, the temporal process of a dynamic hand gesture is decomposed and redefined; then a realtime dynamic hand gesture recognition system is built on a two-dimensional convolutional neural network. To investigate the influence of activation delay and asymmetric gesture design on system performance, a user study is conducted, and the experimental results show that the two proposed methods effectively avoid misrecognition. These methods can provide valuable guidance for researchers designing realtime recognition systems for practical applications.
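
    The activation-delay idea can be made concrete with a short sketch. The following Python snippet is not the authors' implementation; it is a minimal, hypothetical filter (the class name ActivationDelayFilter and the parameters delay_s and min_confidence are assumptions) placed between a per-frame gesture classifier and the command dispatcher, so that a command fires only after the same label has been predicted with sufficient confidence for a continuous hold period, and brief, unintentional poses are discarded.

    import time

    # A sketch of activation delay (assumed names, not the paper's code):
    # a gesture command is emitted only after the classifier has reported
    # the same label, above a confidence floor, for delay_s continuous seconds.
    class ActivationDelayFilter:
        def __init__(self, delay_s=0.3, min_confidence=0.8):
            self.delay_s = delay_s
            self.min_confidence = min_confidence
            self._label = None     # label currently being held
            self._since = 0.0      # time the current hold started
            self._fired = False    # whether this hold already emitted a command

        def update(self, label, confidence, now=None):
            """Feed one per-frame prediction; return the label once it activates, else None."""
            now = time.monotonic() if now is None else now
            if confidence < self.min_confidence:
                self._label = None          # an uncertain frame breaks the hold
                return None
            if label != self._label:
                self._label, self._since, self._fired = label, now, False
                return None                 # a new label restarts the timer
            if not self._fired and now - self._since >= self.delay_s:
                self._fired = True          # emit at most once per continuous hold
                return label
            return None

    In a capture loop the filter would simply wrap the classifier's per-frame output (command = filt.update(label, conf)); increasing delay_s trades responsiveness for robustness against transitional poses, which is exactly the trade-off the reported user study examines.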



    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 2
    February 2024
    548 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3613570
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 September 2023
    Online AM: 08 September 2022
    Accepted: 01 September 2022
    Revised: 29 June 2022
    Received: 15 December 2021
    Published in TOMM Volume 20, Issue 2


    Author Tags

    1. Dynamic gesture recognition
    2. activation delay
    3. asymmetric gesture design
    4. convolutional neural network
    5. human-computer interaction

    Qualifiers

    • Research-article

