Abstract
Multi-Layer Perceptrons (MLPs) can be used in automatic speech recognition in many ways. One application that has received particular attention over the last few years is the Tandem approach, as described in [7] and other more recent publications. Here we discuss the characteristics of the MLP-based features used in the Tandem approach, and conclude with a report on their application to conversational speech recognition. The paper shows that MLP transformations yield variables with regular distributions, which can be further modified by taking the logarithm so that the distribution is easier to model with a Gaussian-HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. We also report recognition results showing that MLP features can significantly improve recognition performance on the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even for complex systems incorporating MMIE training and other enhancements.
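The two operations mentioned in the abstract, taking the logarithm of MLP outputs and combining feature vectors without increasing their dimension, can be illustrated with a minimal sketch. The abstract does not say which combination rule the paper uses, so the frame-wise averaging below is an assumption chosen for illustration, and the function names, the floor value, and the posterior values are all hypothetical.

```python
import math


def average_streams(frame_a, frame_b):
    """Combine two posterior vectors for one frame by averaging.

    Averaging is an assumed rule for illustration only; the output has
    the same dimension as each input.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("posterior vectors must have the same dimension")
    return [(a + b) / 2.0 for a, b in zip(frame_a, frame_b)]


def log_posteriors(posteriors, floor=1e-10):
    """Take the element-wise log of a posterior vector.

    The floor (a hypothetical value) avoids log(0) for classes with
    zero posterior probability.
    """
    return [math.log(max(p, floor)) for p in posteriors]


# Hypothetical MLP posteriors for one frame over three phone classes.
stream_a = [0.90, 0.08, 0.02]
stream_b = [0.70, 0.20, 0.10]

combined = average_streams(stream_a, stream_b)  # still three values
features = log_posteriors(combined)             # log-domain features
```

Posteriors are typically concentrated near 0 and 1, and the logarithm spreads them out, which is one reason such features may be easier to fit with Gaussian mixtures. Averaging keeps the vector length fixed, whereas concatenating the two streams would double it.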
References
Andreou, A., Kamm, T., Cohen, J.: Experiments in Vocal Tract Normalization. In: Proc. CAIP Workshop: Frontiers in Speech Recognition II (1994)
Benitez, C., Burget, L., Chen, B., Dupont, S., Garudadri, H., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., Sivadas, S.: Robust ASR front-end using spectral based and discriminant features: experiments on the Aurora task. In: Eurospeech (2001)
Bourlard, H., Wellekens, C.: Links between Markov models and multilayer perceptrons. IEEE Trans. Pattern Anal. Machine Intell. 12, 1167–1178 (1990)
Chen, B., Zhu, Q., Morgan, N.: Learning long term temporal features in LVCSR using neural networks. In: ICSLP (2004) (submitted)
Gales, M.J.F.: Semi-tied covariance matrices for hidden Markov models. IEEE Trans. Speech and Audio Processing 7, 272–281 (1999)
Gao, X., Zhu, W., Shi, Q.: The IBM LVCSR System Used for 1998 Mandarin Broadcast News Transcription Evaluation. In: Proc. DARPA Broadcast News Workshop (1999)
Hermansky, H., Ellis, D.P.W., Sharma, S.: Tandem connectionist feature extraction for conventional HMM systems. In: Proc. ICASSP 2000, pp. 1635–1638 (2000)
Hermansky, H., Sharma, S.: TRAPS - Classifiers of Temporal Patterns. In: Proc. ICSLP (1998)
Misra, H., Bourlard, H., Tyagi, V.: New entropy based combination rules in HMM/ANN multi-stream ASR. In: Proc. ICASSP (2003)
Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Processing Magazine 12(3), 24 (1995)
Morgan, N., Chen, B., Zhu, Q., Stolcke, A.: TRAPping Conversational Speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In: ICASSP (2004)
Reyes-Gomez, M., Ellis, D.P.W.: Error visualization for Tandem acoustic modeling on the Aurora task. In: ICASSP (2002)
Robinson, A.J., Cook, G.D., Ellis, D.P.W., Fosler-Lussier, E., Renals, S.J., Williams, D.A.G.: Connectionist speech recognition of Broadcast News. Speech Communication 37(1-2), 27–45 (2002)
Stolcke, A., Bratt, H., Butzberger, J., Franco, H., Rao Gadde, V.R., Plauche, M., Richey, C., Shriberg, E., Sonmez, K., Weng, F., Zheng, J.: The SRI March 2000 Hub-5 conversational speech transcription system. In: Proc. NIST Transcription Workshop (2000)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, Q., Chen, B., Morgan, N., Stolcke, A. (2005). Tandem Connectionist Feature Extraction for Conversational Speech Recognition. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_19
DOI: https://doi.org/10.1007/978-3-540-30568-2_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)