Dynamic pronunciation models for automatic speech recognition

January 1999

Author: John Eric Fosler-Lussier
Chair: Nelson Morgan

Publisher: University of California, Berkeley

ISBN: 978-0-599-71154-9

Order Number: AAI9966379

Pages: 228

Abstract

As of this writing, the automatic recognition of spontaneous speech by computer is fraught with errors; many systems transcribe one out of every three to five words incorrectly, whereas humans can transcribe spontaneous speech with one error in twenty words or better. This high error rate is due in part to the poor modeling of pronunciations within spontaneous speech. This dissertation examines how pronunciations vary in this speaking style, and how speaking rate and word predictability can be used to predict when greater pronunciation variation can be expected. It includes an investigation of the relationship between speaking rate, word predictability, pronunciations, and errors made by speech recognition systems. The results of these studies suggest that for spontaneous speech, it may be appropriate to build models for syllables and words that can dynamically change the pronunciations used in the speech recognizer based on the extended context (including surrounding words, phones, speaking rate, etc.). Implementation of new pronunciation models automatically derived from data within the ICSI speech recognition system has shown a 4–5% relative improvement on the Broadcast News recognition task. Roughly two thirds of these gains can be attributed to static baseform improvements; adding the ability to dynamically adjust pronunciations within the recognizer provides the other third of the improvement. The Broadcast News task also allows for comparison of performance on different styles of speech: the new pronunciation models do not help for pre-planned speech, but they provide a significant gain for spontaneous speech. Not only do the automatically learned pronunciation models capture some of the linguistic variation due to the speaking style, but they also represent variation in the acoustic model due to channel effects. The largest improvement was seen in the telephone speech condition, in which 12% of the errors produced by the baseline system were corrected.
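The improvements quoted in the abstract are relative: they describe the fraction of the baseline system's errors that were removed, not a drop in percentage points. The following is a minimal sketch of that arithmetic. The baseline word error rate of 25% is a hypothetical value, chosen only because it falls within the "one out of every three to five words" range mentioned above; the abstract does not state the actual baseline, and the function names are illustrative.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction of baseline errors removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def wer_after(baseline_wer: float, relative_gain: float) -> float:
    """Return the word error rate left after a given relative improvement."""
    return baseline_wer * (1.0 - relative_gain)


if __name__ == "__main__":
    # Hypothetical baseline (not from the dissertation): one word in four wrong.
    baseline = 0.25

    # A 4-5% relative improvement, as reported for Broadcast News overall.
    for gain in (0.04, 0.05):
        print(f"relative gain {gain:.0%}: WER {wer_after(baseline, gain):.4f}")

    # The telephone speech condition: 12% of baseline errors corrected.
    print(f"relative gain 12%: WER {wer_after(baseline, 0.12):.4f}")
```

Under this assumed baseline, a 4% relative improvement corresponds to an absolute reduction of one percentage point, which illustrates why relative and absolute figures should not be confused when comparing recognizers.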

Contributors

John Eric Fosler-Lussier
University of California, Berkeley
Nelson Morgan
International Computer Science Institute

Recommendations

Improving continuous speech recognition with automatic multiple pronunciation support
Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

This article presents an approach for the automatic recognition of non-native speech. Some non-native speakers tend to pronounce phonemes as they would in their native language. Model adaptation can improve the recognition rate for non-native speakers, ...
Pronunciation modeling for large vocabulary speech recognition

Cited By

Improving continuous speech recognition with automatic multiple pronunciation support

Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling

Pronunciation modeling for large vocabulary speech recognition