Robust speech recognition in a car using a microphone array

January 2006

Author:
Bowon Lee
University of Illinois at Urbana-Champaign
Adviser:
Mark A. Hasegawa-Johnson
University of Illinois at Urbana-Champaign

Publisher:

University of Illinois at Urbana-Champaign
Champaign, IL
United States

Order Number: AAI3250275

Pages:

121


Abstract

The performance of automatic speech recognition relies on a vast amount of training speech data, mostly recorded with little or no background noise. Performance degrades significantly with background noise, which increases the type mismatch between training and test environments. Speech enhancement techniques can reduce the amount of type mismatch.

At very low SNR with nonstationary noise, the enhanced speech may still contain significant noise in either noise-only segments or speech segments. The former masquerades as nonexistent speech and the latter as distorted speech. Both significantly degrade the performance of the automatic speech recognizer. This encourages the use of voice activity detection (VAD) algorithms to determine the regions where speech is present. To use only the reliable speech features, we need to further determine whether the features from the speech region are mainly from speech or from nonstationary noises masking the speech. For more robust speech recognition, this thesis proposes a three-hypothesis VAD consisting of H_0: noise-only region; H_S: speech-dominant speech region; and H_N: noise-dominant speech region.
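The three-way labeling can be sketched in a few lines. This is a minimal illustration, not the thesis's detector: the two per-frame inputs (`frame_snr_db` and `target_power_ratio`), the thresholds, and all function names are hypothetical, chosen only to show how a speech/no-speech decision and a speech-versus-noise dominance decision combine into H_0, H_S and H_N.

```python
from enum import Enum


class Hypothesis(Enum):
    H_0 = "noise-only region"
    H_S = "speech-dominant speech region"
    H_N = "noise-dominant speech region"


def classify_frame(frame_snr_db, target_power_ratio,
                   snr_threshold_db=3.0, ratio_threshold=0.5):
    """Label one frame with one of the three hypotheses.

    frame_snr_db: hypothetical estimate of the frame's SNR in dB.
    target_power_ratio: hypothetical share (0..1) of the frame's power
        that appears to come from the talker's location.
    Both thresholds are illustrative values, not taken from the thesis.
    """
    if frame_snr_db < snr_threshold_db:
        return Hypothesis.H_0   # no activity detected above the noise floor
    if target_power_ratio >= ratio_threshold:
        return Hypothesis.H_S   # activity dominated by the talker
    return Hypothesis.H_N       # activity dominated by another source


def reliable_frames(labels):
    """Indices of frames whose features would be passed on as reliable."""
    return [i for i, label in enumerate(labels) if label is Hypothesis.H_S]


labels = [classify_frame(snr, ratio)
          for snr, ratio in [(0.5, 0.9), (12.0, 0.8), (10.0, 0.2)]]
```

Here only the second frame would be kept as reliable; the third is flagged as speech masked by noise rather than silently dropped or trusted.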

Spectrum-based VAD uses knowledge of the noise spectrum to detect voice activity using the nonstationary nature of speech. This thesis proposes a method of estimating the instantaneous noise spectrum for VAD. The spectrum-based VAD, however, cannot distinguish speech from nonstationary noise because both appear nonstationary to the VAD, and thus look like speech. A microphone array can determine the noise-corrupted speech region when the nonstationary noise is from a location other than that of the speech source. This thesis proposes a method of distinguishing H_S from H_N based on the steered response power (SRP) method, which estimates power from any location.
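As a rough illustration of steered response power, the sketch below computes the power of a time-domain delay-and-sum beamformer steered to a candidate location. It assumes integer sample delays and a synthetic two-microphone signal; the function name, the delays and the test signal are all illustrative, and the abstract does not say which SRP variant or array geometry the thesis uses.

```python
import random


def steered_response_power(channels, delays):
    """Mean power of a delay-and-sum beam steered to one candidate location.

    channels: list of equal-length sample lists, one per microphone.
    delays: non-negative integer arrival delays (in samples) that the
        candidate location would produce at each microphone.
    """
    n_samples = len(channels[0])
    usable = n_samples - max(delays)
    total = 0.0
    for t in range(usable):
        # Advance each channel by its delay so that a source at the
        # candidate location adds up coherently.
        beam = sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        total += beam * beam
    return total / usable


# Synthetic example: one source that reaches microphone B 3 samples late.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
lag = 3
mic_a = source
mic_b = [0.0] * lag + source[:-lag]

power_at_talker = steered_response_power([mic_a, mic_b], [0, lag])
power_elsewhere = steered_response_power([mic_a, mic_b], [0, 0])
```

Steering to the delays that match the source aligns the two channels, so `power_at_talker` comes out larger than `power_elsewhere`. Comparing such powers across locations is one way a detector could tell whether the talker or another source dominates a frame.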

Phonemic restoration is a phenomenon in which humans claim to hear missing phonemes that have been replaced by noise. Given strong nonstationary noises occasionally masking the speech region, as well as knowledge of H_S and H_N, this thesis proposes a phoneme restoration approach for automatic speech recognition in the hidden Markov model framework.
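The abstract does not describe how restoration is carried out inside the HMM, so the following is only one plausible reading, borrowed from common missing-data practice: frames labeled H_N contribute no acoustic evidence during Viterbi decoding, and the model's transition structure fills in the masked states. The three-state left-to-right model, the probabilities and every name below are made up for illustration.

```python
import math

NEG_INF = float("-inf")


def log_prob(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_with_masked_frames(log_init, log_trans, log_emit, reliable):
    """Most likely state path; unreliable frames get a flat emission score."""
    n_states = len(log_init)

    def emit(t, s):
        # A masked frame scores 0.0 for every state: no acoustic evidence.
        return log_emit[t][s] if reliable[t] else 0.0

    score = [log_init[s] + emit(0, s) for s in range(n_states)]
    back = []
    for t in range(1, len(log_emit)):
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + emit(t, s))
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy left-to-right model with three phoneme-like states 0 -> 1 -> 2.
log_init = [log_prob(p) for p in (1.0, 0.0, 0.0)]
log_trans = [[log_prob(p) for p in row]
             for row in ((0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.0, 0.0, 1.0))]
# Frame 2 is corrupted by noise that happens to resemble state 0.
emissions = [(0.9, 0.05, 0.05), (0.9, 0.05, 0.05), (0.98, 0.001, 0.019),
             (0.05, 0.05, 0.9), (0.05, 0.05, 0.9)]
log_emit = [[log_prob(p) for p in frame] for frame in emissions]

trusting_all = viterbi_with_masked_frames(
    log_init, log_trans, log_emit, [True] * 5)
masking_noise = viterbi_with_masked_frames(
    log_init, log_trans, log_emit, [True, True, False, True, True])
```

In this toy case, trusting the corrupted frame keeps the decoder in state 0 for three frames, while masking it lets the model place the middle state under the noise, which is the restoration-like behavior the sketch is meant to show.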

The proposed approach has two steps: speech enhancement as a preprocessor of noisy speech signals, followed by phoneme restoration for speech recognition that is robust against nonstationary noises, given knowledge of H_S and H_N.

Cited By

Chen A and Hasegawa-Johnson M (2014). Mixed stereo audio classification using a stereo-input mixed-to-panned level feature, IEEE/ACM Transactions on Audio, Speech and Language Processing, 22:12, (2025-2033), Online publication date: 1-Dec-2014.



Recommendations

Robust mandarin speech recognition for car navigation interface
PCM'06: Proceedings of the 7th Pacific Rim conference on Advances in Multimedia Information Processing

This paper presents a robust automatic speech recognition (ASR) system as multimedia interface for car navigation. In front-end, we use the minimum-mean square error (MMSE) enhancement to suppress the background in-car noise and then compensate the ...

Improving Throat Microphone Speech Recognition by Joint Analysis of Throat and Acoustic Microphone Recordings

We present a new framework for joint analysis of throat and acoustic microphone (TAM) recordings to improve throat microphone only speech recognition. The proposed analysis framework aims to learn joint sub-phone patterns of throat and acoustic ...

Microphone array driven speech recognition: influence of localization on the word error rate
MLMI'05: Proceedings of the Second international conference on Machine Learning for Multimodal Interaction

Interest within the automatic speech recognition (ASR) research community has recently focused on the recognition of speech captured with one or more microphones located in the far field, rather than being mounted on a headset and positioned next to the ...
