Decoding Surface Touch Typing from Hand-Tracking

Authors: Mark Richardson, Matt Durasoff, Robert Wang

UIST '20: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology

Pages 686 - 696

https://doi.org/10.1145/3379337.3415816

Published: 20 October 2020


Abstract

We propose a novel text decoding method that enables touch typing on an uninstrumented flat surface. Rather than relying on physical keyboards or capacitive touch, our method takes as input the typist's hand motion, obtained through hand-tracking, and decodes this motion directly into text. We use a temporal convolutional network as a motion model that maps the hand motion, represented as a sequence of hand pose features, to text characters. Without the haptic feedback of a physical keyboard, the fingers drift and typing motion becomes more erratic. To address this, we incorporate a language model as a text prior and use beam search to efficiently combine our motion and language models, decoding text from erratic or ambiguous hand motion. We collected a dataset from 20 touch typists and evaluated our model against several baselines, including contact-based text decoding and typing on a physical keyboard. Our proposed method leverages continuous hand pose information to decode text more accurately than contact-based methods, and an offline study shows parity (73 WPM, 2.38% UER) with typing on a physical keyboard. Our results show that hand-tracking has the potential to enable rapid text entry in mobile environments.
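To illustrate how a beam search can combine a motion model with a language-model prior, here is a minimal Python sketch. It is not the authors' implementation. It assumes, for simplicity, that the motion model has already produced one character distribution per keystroke, and it replaces the language model with a hypothetical toy bigram table. The names `beam_search`, `toy_lm_log_prob`, `MOTION`, and `lm_weight` are invented for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def beam_search(
    motion_log_probs: Sequence[Dict[str, float]],
    lm_log_prob: Callable[[str, str], float],
    beam_width: int = 3,
    lm_weight: float = 1.0,
) -> str:
    """Return the highest-scoring string under motion score + weighted LM score.

    Assumption: each entry of motion_log_probs is the motion model's
    log-probability over characters for one keystroke.
    """
    beams: List[Tuple[str, float]] = [("", 0.0)]
    for step in motion_log_probs:
        candidates: List[Tuple[str, float]] = []
        for prefix, score in beams:
            for ch, motion_lp in step.items():
                total = score + motion_lp + lm_weight * lm_log_prob(prefix, ch)
                candidates.append((prefix + ch, total))
        # Keep only the best few hypotheses to bound the search cost.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


# Hypothetical toy bigram prior: "ca" and "at" are likely, everything else rare.
BIGRAMS = {("c", "a"): 0.5, ("a", "t"): 0.4}


def toy_lm_log_prob(prefix: str, ch: str) -> float:
    prev = prefix[-1] if prefix else "^"  # "^" marks the start of the text
    return math.log(BIGRAMS.get((prev, ch), 0.02))


# Hypothetical motion output: the second keystroke is ambiguous between the
# adjacent keys "a" and "s", with the motion evidence slightly favoring "s".
MOTION = [
    {"c": math.log(0.9), "v": math.log(0.1)},
    {"a": math.log(0.45), "s": math.log(0.55)},
    {"t": math.log(0.8), "r": math.log(0.2)},
]

if __name__ == "__main__":
    print(beam_search(MOTION, toy_lm_log_prob, lm_weight=1.0))
    print(beam_search(MOTION, toy_lm_log_prob, lm_weight=0.0))
```

In this toy setup, the motion evidence alone (`lm_weight=0.0`) decodes "cst", while adding the language prior (`lm_weight=1.0`) recovers "cat". This mirrors the abstract's point that a text prior can disambiguate erratic hand motion.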


