research-article

Parakeet: a continuous speech recognition system for mobile touch-screen devices

Published: 08 February 2009

Abstract

We present Parakeet, a system for continuous speech recognition on mobile touch-screen devices. The design of Parakeet was guided by computational experiments and validated by a user study. Participants had an average text entry rate of 18 words-per-minute (WPM) while seated indoors and 13 WPM while walking outdoors. In an expert pilot study, we found that speech recognition has the potential to be a highly competitive mobile text entry method, particularly in an actual mobile setting where users are walking around while entering text.

Published In

IUI '09: Proceedings of the 14th international conference on Intelligent user interfaces
February 2009
522 pages
ISBN: 9781605581682
DOI: 10.1145/1502650
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. continuous speech recognition
  2. error correction
  3. mobile text entry
  4. predictive keyboard
  5. speech input
  6. text input
  7. touch-screen interface
  8. word confusion network

Qualifiers

  • Research-article

Conference

IUI09: 14th International Conference on Intelligent User Interfaces
February 8 - 11, 2009
Sanibel Island, Florida, USA

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%

Cited By

  • (2023) Security and privacy concerns in assisted living environments. Journal of Smart Cities and Society 2:2 (99-121). DOI: 10.3233/SCS-230015. Online publication date: 23-Aug-2023
  • (2023) Understanding Adoption Barriers to Dwell-Free Eye-Typing: Design Implications from a Qualitative Deployment Study and Computational Simulations. Proceedings of the 28th International Conference on Intelligent User Interfaces (607-620). DOI: 10.1145/3581641.3584093. Online publication date: 27-Mar-2023
  • (2021) Probabilistic Text Entry—Case Study 3. Intelligent Computing for Interactive System Design (277-320). DOI: 10.1145/3447404.3447420. Online publication date: 23-Feb-2021
  • (2021) Text Entry in Virtual Environments using Speech and a Midair Keyboard. IEEE Transactions on Visualization and Computer Graphics 27:5 (2648-2658). DOI: 10.1109/TVCG.2021.3067776. Online publication date: May-2021
  • (2020) The Effect of Context on Small Screen and Wearable Device Users’ Performance - A Systematic Review. ACM Computing Surveys 53:3 (1-44). DOI: 10.1145/3386370. Online publication date: 28-May-2020
  • (2020) What is "intelligent" in intelligent user interfaces? Proceedings of the 25th International Conference on Intelligent User Interfaces (477-487). DOI: 10.1145/3377325.3377500. Online publication date: 17-Mar-2020
  • (2020) Comparing Smartphone Speech Recognition and Touchscreen Typing for Composition and Transcription. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (1-11). DOI: 10.1145/3313831.3376861. Online publication date: 21-Apr-2020
  • (2019) VelociWatch. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3290605.3300821. Online publication date: 2-May-2019
  • (2019) Situationally-Induced Impairments and Disabilities. Web Accessibility (59-92). DOI: 10.1007/978-1-4471-7440-0_5. Online publication date: 4-Jun-2019
  • (2018) How Much Faster Can You Type by Speaking in Hindi? Proceedings of the 9th Indian Conference on Human-Computer Interaction (20-28). DOI: 10.1145/3297121.3297123. Online publication date: 16-Dec-2018
