Parakeet: a continuous speech recognition system for mobile touch-screen devices

Authors: Keith Vertanen, Per Ola Kristensson

IUI '09: Proceedings of the 14th international conference on Intelligent user interfaces

Pages 237 - 246

https://doi.org/10.1145/1502650.1502685

Published: 08 February 2009

Abstract

We present Parakeet, a system for continuous speech recognition on mobile touch-screen devices. The design of Parakeet was guided by computational experiments and validated by a user study. Participants had an average text entry rate of 18 words-per-minute (WPM) while seated indoors and 13 WPM while walking outdoors. In an expert pilot study, we found that speech recognition has the potential to be a highly competitive mobile text entry method, particularly in an actual mobile setting where users are walking around while entering text.

Index Terms

Parakeet: a continuous speech recognition system for mobile touch-screen devices
1. Hardware
  1. Communication hardware, interfaces and storage
    1. Sound-based input / output
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Sound-based input / output

Published In

IUI '09: Proceedings of the 14th international conference on Intelligent user interfaces

February 2009, 522 pages

ISBN: 9781605581682

DOI: 10.1145/1502650

General Chairs: Cristina Conati (University of British Columbia, Canada), Mathias Bauer (mineway GmbH, Germany)

Program Chairs: Nuria Oliver (Telefonica Research, Spain), Dan Weld (University of Washington, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

IUI09: 14th International Conference on Intelligent User Interfaces

February 8 - 11, 2009

Sanibel Island, Florida, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%
