
Realtime Recognition of Dynamic Hand Gestures in Practical Applications

Published: 26 September 2023
    Abstract

    A dynamic hand gesture acting as a semaphoric gesture is a practical and intuitive mid-air interface. Benefiting from the development of deep convolutional networks, gesture recognition has reached high accuracy. However, when a dynamic hand gesture such as a direction command is performed, unintentional actions are easily misrecognized because of the similarity of the hand poses. This hinders the application of dynamic hand gestures and cannot be solved merely by improving the accuracy of the applied algorithm on public datasets, so the problem must also be studied from the perspective of human-computer interaction. In this article, two methods are proposed to avoid misrecognition: introducing an activation delay and adopting an asymmetric gesture design. First, the temporal process of a dynamic hand gesture is decomposed and redefined; then a realtime dynamic hand gesture recognition system is built on a two-dimensional convolutional neural network. To investigate the influence of activation delay and asymmetric gesture design on system performance, a user study is conducted, and the experimental results show that the two proposed methods effectively avoid misrecognition. These methods can provide valuable guidance for researchers designing realtime recognition systems for practical applications.
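
    The activation-delay idea can be made concrete with a short sketch. The following Python snippet is not the authors' implementation; it is a minimal, hypothetical filter (the class name ActivationDelayFilter and the parameters delay_s and min_confidence are assumptions) placed between a per-frame gesture classifier and the command dispatcher, so that a command fires only after the same label has been predicted with sufficient confidence for a continuous hold period, and brief, unintentional poses are discarded.

    import time

    # A sketch of activation delay (assumed names, not the paper's code):
    # a gesture command is emitted only after the classifier has reported
    # the same label, above a confidence floor, for delay_s continuous seconds.
    class ActivationDelayFilter:
        def __init__(self, delay_s=0.3, min_confidence=0.8):
            self.delay_s = delay_s
            self.min_confidence = min_confidence
            self._label = None     # label currently being held
            self._since = 0.0      # time the current hold started
            self._fired = False    # whether this hold already emitted a command

        def update(self, label, confidence, now=None):
            """Feed one per-frame prediction; return the label once it activates, else None."""
            now = time.monotonic() if now is None else now
            if confidence < self.min_confidence:
                self._label = None          # an uncertain frame breaks the hold
                return None
            if label != self._label:
                self._label, self._since, self._fired = label, now, False
                return None                 # a new label restarts the timer
            if not self._fired and now - self._since >= self.delay_s:
                self._fired = True          # emit at most once per continuous hold
                return label
            return None

    In a capture loop the filter would simply wrap the classifier's per-frame output (command = filt.update(label, conf)); increasing delay_s trades responsiveness for robustness against transitional poses, which is exactly the trade-off the reported user study examines.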



    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 2
    February 2024
    548 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3613570
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 September 2023
    Online AM: 08 September 2022
    Accepted: 01 September 2022
    Revised: 29 June 2022
    Received: 15 December 2021
    Published in TOMM Volume 20, Issue 2


    Author Tags

    1. Dynamic gesture recognition
    2. activation delay
    3. asymmetric gesture design
    4. convolutional neural network
    5. human-computer interaction

    Qualifiers

    • Research-article

