
Using confidence scores to improve hands-free speech based navigation in continuous dictation systems

Published: 01 December 2004

Abstract

Speech recognition systems have improved dramatically, but recent studies confirm that error correction activities still account for 66--75% of the users' time, and 50% of that time is spent just getting to the errors that need to be corrected. While researchers have suggested that confidence scores could prove useful during the error correction process, the focus is typically on error detection. More importantly, empirical studies have failed to confirm any measurable benefits when confidence scores are used in this way within dictation-oriented applications. In this article, we provide data that explains why confidence scores are unlikely to be useful for error detection. We propose a new navigation technique for use when speech-only interactions are strongly preferred and common, desktop-sized displays are available. The results of an empirical study that highlights the potential of this new technique are reported. An informal comparison between the current study and previous research suggests the new technique reduces time spent on navigation by 18%. Future research should include additional studies that compare the proposed technique to previous non-speech and speech-based navigation solutions.
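The abstract's claim about error detection can be made concrete with a small sketch. The snippet below (all words, scores, and the 0.6 threshold are illustrative assumptions, not data from the article) flags words whose recognition confidence falls below a threshold and compares the flagged set against known errors. A genuine misrecognition is caught, but a correctly recognized word is flagged as well, which illustrates the false-alarm problem that limits confidence-based error detection.

```python
def flag_low_confidence(words, threshold):
    """Return indices of recognized words whose confidence is below threshold."""
    return [i for i, (_, conf) in enumerate(words) if conf < threshold]


def detection_rates(flagged, actual_errors, total):
    """Hit rate: fraction of true errors that were flagged.
    False-alarm rate: fraction of correctly recognized words that were flagged."""
    flagged, actual_errors = set(flagged), set(actual_errors)
    hits = len(flagged & actual_errors) / len(actual_errors)
    correct_words = total - len(actual_errors)
    false_alarms = len(flagged - actual_errors) / correct_words
    return hits, false_alarms


# Hypothetical recognizer output: (word, confidence) pairs.
# "knew" (index 3) is a real misrecognition of "new"; "save" (index 1)
# is correct but happens to receive a low confidence score.
recognized = [("please", 0.91), ("save", 0.42), ("the", 0.88),
              ("knew", 0.55), ("document", 0.95)]

flagged = flag_low_confidence(recognized, threshold=0.6)   # [1, 3]
hits, fa = detection_rates(flagged, actual_errors={3},
                           total=len(recognized))           # 1.0, 0.25
```

Here the threshold catches the one true error (hit rate 1.0) but also flags one of the four correct words (false-alarm rate 0.25); because confidence correlates only loosely with correctness, any threshold trades misses against false alarms in this way.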




Published In

ACM Transactions on Computer-Human Interaction, Volume 11, Issue 4
December 2004
143 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/1035575

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Confidence score
  2. navigation
  3. simulation
  4. speech recognition

Cited By

  • (2022) Exploring Gaze Movement Gesture Recognition Method for Eye-Based Interaction Using Eyewear with Infrared Distance Sensor Array. Electronics 11:10 (1637). DOI: 10.3390/electronics11101637. Online publication date: 20-May-2022.
  • (2022) Augmenting Ear Accessories for Facial Gesture Input Using Infrared Distance Sensor Array. Electronics 11:9 (1480). DOI: 10.3390/electronics11091480. Online publication date: 5-May-2022.
  • (2021) Probabilistic Text Entry—Case Study 3. In Intelligent Computing for Interactive System Design, 277-320. DOI: 10.1145/3447404.3447420. Online publication date: 23-Feb-2021.
  • (2020) HeadCross. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 4:1 (1-22). DOI: 10.1145/3380983. Online publication date: 14-Sep-2020.
  • (2020) The Study of Two Novel Speech-Based Selection Techniques in Voice-User Interfaces. IEEE Access 8 (217024-217032). DOI: 10.1109/ACCESS.2020.3041649. Online publication date: 2020.
  • (2019) The State of Speech in HCI: Trends, Themes and Challenges. Interacting with Computers 31:4 (349-371). DOI: 10.1093/iwc/iwz016. Online publication date: 11-Sep-2019.
  • (2018) HeadGesture. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2:4 (1-23). DOI: 10.1145/3287076. Online publication date: 27-Dec-2018.
  • (2018) Vocal Programming for People with Upper-Body Motor Impairments. Proceedings of the 15th International Web for All Conference, 1-10. DOI: 10.1145/3192714.3192821. Online publication date: 23-Apr-2018.
  • (2018) Vibrational Artificial Subtle Expressions. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 1-9. DOI: 10.1145/3173574.3174052. Online publication date: 21-Apr-2018.
  • (2018) Screen navigation system for visually impaired people. Journal of Enabling Technologies 12:3 (114-128). DOI: 10.1108/JET-04-2018-0019. Online publication date: 17-Sep-2018.
