research-article

Attention Based Dual Branches Fingertip Detection Network and Virtual Key System

Published: 12 October 2020
DOI: 10.1145/3394171.3413685

    Abstract

    Gestures and fingertips are becoming increasingly important media for human-computer interaction (HCI), and algorithms for gesture recognition and fingertip detection have therefore been studied extensively. However, two main problems remain: how to balance speed and accuracy, and how to cope with complex interaction environments. To address these problems, this paper proposes an attention-based dual-branch network that efficiently performs both fingertip detection and gesture recognition. To handle complex interaction environments, we incorporate both channel-wise and spatial-wise attention into the fingertip detection model. Extensive experiments demonstrate that the model is both effective and efficient: it achieves an average fingertip detection error of about 2.8 pixels on 640×480 video frames and an average recognition accuracy of 99% over eight gestures, with an average forward-pass time of about 8 ms. Owing to its lightweight design, the model also runs efficiently on a CPU. In addition, we design a virtual key system based on the proposed model that lets users perform a "clicking" operation naturally in a virtual environment. The system works with a single ordinary RGB camera and requires no pre-processing (e.g., image segmentation or contour extraction), which significantly reduces the complexity of the interaction system.
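
    The abstract says the fingertip branch combines channel-wise and spatial-wise attention but does not specify the module on this page. As a rough illustration only, the NumPy sketch below implements a generic attention block in the style of CBAM (Woo et al., ECCV 2018): channel attention computed from average- and max-pooled descriptors through a shared MLP, followed by spatial attention computed from channel-pooled maps. It is not the authors' architecture. The reduction ratio `r`, the weight shapes, and the scalar (1x1) spatial mixing, where CBAM itself uses a 7x7 convolution, are assumptions. The helper `mean_fingertip_error` is likewise hypothetical and assumes the reported 2.8-pixel error is a mean Euclidean distance between predicted and ground-truth fingertip coordinates.

```python
import numpy as np


def sigmoid(x):
    # Logistic function; maps any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Reweight each channel of feat (C, H, W) by a score in (0, 1).

    Assumed shapes: w1 is (C // r, C) and w2 is (C, C // r), where r is a
    hypothetical reduction ratio for the shared bottleneck MLP.
    """
    avg = feat.mean(axis=(1, 2))  # (C,) global average pooling
    mx = feat.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        # Shared two-layer MLP with a ReLU bottleneck.
        return w2 @ np.maximum(w1 @ v, 0.0)

    scores = sigmoid(mlp(avg) + mlp(mx))  # (C,)
    return feat * scores[:, None, None]


def spatial_attention(feat, w_avg, w_max, bias):
    """Reweight each spatial position of feat (C, H, W) by a score in (0, 1).

    Simplification (assumption): the two channel-pooled maps are mixed with
    scalar weights, i.e. a 1x1 convolution, instead of CBAM's 7x7 convolution.
    """
    avg = feat.mean(axis=0)  # (H, W) average over channels
    mx = feat.max(axis=0)    # (H, W) max over channels
    scores = sigmoid(w_avg * avg + w_max * mx + bias)  # (H, W)
    return feat * scores[None, :, :]


def mean_fingertip_error(pred, gt):
    """Mean Euclidean distance in pixels between predicted and true (x, y) points."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, r = 8, 6, 6, 2  # toy sizes, not the paper's
    feat = rng.standard_normal((C, H, W))
    w1 = 0.1 * rng.standard_normal((C // r, C))
    w2 = 0.1 * rng.standard_normal((C, C // r))

    # Channel attention first, then spatial attention (CBAM ordering).
    out = spatial_attention(channel_attention(feat, w1, w2), 1.0, 1.0, 0.0)
    print(out.shape)  # (8, 6, 6)

    # Two fingertips: one off by (3, 4) pixels, one exact, so (5 + 0) / 2.
    print(mean_fingertip_error([[100, 100], [200, 50]], [[103, 104], [200, 50]]))  # 2.5
```

    Because every attention score lies in (0, 1), the block only rescales features and never flips their sign; in a trained network the weights would be learned rather than sampled at random.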

    Supplementary Material

    MP4 File (3394171.3413685.mp4)
    This video briefly introduces the paper "Attention Based Dual Branches Fingertip Detection Network and Virtual Key System". We begin with the background and motivation of the paper, then introduce the architecture of the proposed model, which addresses the weaknesses identified earlier, and compare its performance with that of existing competitive models to demonstrate the advantages of the proposed method. Finally, we present two gesture-based interactive applications built on the proposed model: Air Writing and Virtual Clicking. In summary, the video presents the idea of performing both fingertip detection and gesture recognition with a single model and demonstrates the advantages of this kind of model for fingertip detection and practical applications.

    Cited By

    • (2024) Skeleton-Based Gesture Recognition With Learnable Paths and Signature Features. IEEE Transactions on Multimedia 26, 3951-3961. DOI: 10.1109/TMM.2023.3318242. Online publication date: 1 January 2024.
    • (2022) Two-Stream Spatial-Temporal Fusion Graph Convolutional Network for Dynamic Gesture Recognition. 2022 8th International Conference on Virtual Reality (ICVR), 444-453. DOI: 10.1109/ICVR55215.2022.9847803. Online publication date: 26 May 2022.
    • (2021) Air-Text. Proceedings of the 29th ACM International Conference on Multimedia, 1267-1274. DOI: 10.1145/3474085.3475694. Online publication date: 17 October 2021.

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. attention mechanism
    2. fingertip detection
    3. gesture recognition
    4. human-computer interaction
    5. virtual key system

    Qualifiers

    • Research-article

    Funding Sources

    • Natural Science Foundation of Guangdong Province
    • Science and Technology Program of Guangzhou
    • Guangdong Science and Technology Department (GDSTP)

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 15
    • Downloads (last 6 weeks): 0
    Reflects downloads up to