Tandem Connectionist Feature Extraction for Conversational Speech Recognition

Zhu, Qifeng; Chen, Barry; Morgan, Nelson; Stolcke, Andreas

doi:10.1007/978-3-540-30568-2_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction

Abstract

Multi-Layer Perceptrons (MLPs) can be used in automatic speech recognition in many ways. One application that has received particular attention over the last few years is the Tandem approach, as described in [7] and other more recent publications. Here we discuss the characteristics of the MLP-based features used in the Tandem approach, and conclude with a report on their application to conversational speech recognition. The paper shows that MLP transformations yield variables with regular distributions, which can be further modified by taking the logarithm so that the distribution is easier to model with a Gaussian-HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. We also report recognition results showing that MLP features can significantly improve recognition performance on the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even for complex systems incorporating MMIE training and other enhancements.
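The two feature operations named in the abstract, taking the logarithm of MLP outputs and combining feature vectors without increasing their dimension, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the MLP outputs are treated as softmax posteriors over three hypothetical phone classes, the combination rule is a plain frame-wise average, and the function names and activation values are invented for illustration. The paper's actual combination rule and any later decorrelation step may differ.

```python
import math

def softmax(activations):
    """Convert MLP output activations to posterior-like probabilities."""
    peak = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def combine_posteriors(streams):
    """Average several posterior vectors element-wise (assumed rule).

    The result has the same dimension as each input stream.
    """
    n = len(streams)
    return [sum(values) / n for values in zip(*streams)]

def log_features(posteriors, floor=1e-10):
    """Take the log of each posterior, flooring to avoid log(0)."""
    return [math.log(max(p, floor)) for p in posteriors]

# Two hypothetical MLP streams scoring the same frame over three classes.
stream_a = softmax([2.0, 0.5, -1.0])
stream_b = softmax([1.5, 1.0, -0.5])

combined = combine_posteriors([stream_a, stream_b])  # still three values
features = log_features(combined)                    # input to a Gaussian HMM
```

Averaging keeps the combined vector a valid probability distribution, so the logarithm can be applied to it exactly as to a single stream. Posteriors tend to pile up near 0 and 1, and the logarithm spreads those values out, which is the usual motivation for applying it before Gaussian modeling.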

References

1. Andreou, A., Kamm, T., Cohen, J.: Experiments in Vocal Tract Normalization. In: Proc. CAIP Workshop: Frontiers in Speech Recognition II (1994)
2. Benitez, C., Burget, L., Chen, B., Dupont, S., Garudadri, H., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., Sivadas, S.: Robust ASR front-end using spectral based and discriminant features: experiments on the Aurora task. In: Eurospeech (2001)
3. Bourlard, H., Wellekens, C.: Links between Markov models and multilayer perceptrons. IEEE Trans. Pattern Anal. Machine Intell. 12, 1167–1178 (1990)
4. Chen, B., Zhu, Q., Morgan, N.: Learning long term temporal features in LVCSR using neural networks. In: ICSLP (2004) (submitted)
5. Gales, M.J.F.: Semi-tied covariance matrices for hidden Markov models. IEEE Trans. Speech and Audio Processing 7, 272–281 (1999)
6. Gao, X., Zhu, W., Shi, Q.: The IBM LVCSR System Used for 1998 Mandarin Broadcast News Transcription Evaluation. In: Proc. DARPA Broadcast News Workshop (1999)
7. Hermansky, H., Ellis, D.P.W., Sharma, S.: Tandem connectionist feature extraction for conventional HMM systems. In: Proc. ICASSP 2000, pp. 1635–1638 (2000)
8. Hermansky, H., Sharma, S.: TRAPS - Classifiers of Temporal Patterns. In: Proc. ICSLP (1998)
9. Misra, H., Bourlard, H., Tyagi, V.: New entropy based combination rules in HMM/ANN multi-stream ASR. In: Proc. ICASSP (2003)
10. Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Processing Magazine 12(3), 24 (1995)
11. Morgan, N., Chen, B., Zhu, Q., Stolcke, A.: TRAPping Conversational Speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In: ICASSP (2004)
12. Reyes-Gomez, M., Ellis, D.P.W.: Error visualization for Tandem acoustic modeling on the Aurora task. In: ICASSP (2002)
13. Robinson, A.J., Cook, G.D., Ellis, D.P.W., Fosler-Lussier, E., Renals, S.J., Williams, D.A.G.: Connectionist speech recognition of Broadcast News. Speech Communication 37(1-2), 27–45 (2002)
14. Stolcke, A., Bratt, H., Butzberger, J., Franco, H., Rao Gadde, V.R., Plauche, M., Richey, C., Shriberg, E., Sonmez, K., Weng, F., Zheng, J.: The SRI March 2000 Hub-5 conversational speech transcription system. In: Proc. NIST Transcription Workshop (2000)

Author information

Authors and Affiliations

International Computer Science Institute
Qifeng Zhu, Barry Chen, Nelson Morgan & Andreas Stolcke
University of California, Berkeley
Barry Chen & Nelson Morgan
SRI International
Andreas Stolcke

Editor information

Editors and Affiliations

IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
IDIAP Research Institute, CH-1920, Martigny, Switzerland
Hervé Bourlard

Cite this paper

Zhu, Q., Chen, B., Morgan, N., Stolcke, A. (2005). Tandem Connectionist Feature Extraction for Conversational Speech Recognition. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_19

DOI: https://doi.org/10.1007/978-3-540-30568-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)
