Tandem Connectionist Feature Extraction for Conversational Speech Recognition

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2004)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Abstract

Multi-Layer Perceptrons (MLPs) can be used in automatic speech recognition in many ways. A particular application of this tool over the last few years has been the Tandem approach, as described in [7] and other more recent publications. Here we discuss the characteristics of the MLP-based features used for the Tandem approach, and conclude with a report on their application to conversational speech recognition. The paper shows that MLP transformations yield variables that have regular distributions, which can be further modified by taking the logarithm to make the distributions easier to model with a Gaussian HMM. Two or more vectors of these features can easily be combined without increasing the feature dimension. We also report recognition results showing that MLP features can significantly improve recognition performance on the NIST 2001 Hub-5 evaluation set with models trained on the Switchboard Corpus, even for complex systems incorporating MMIE training and other enhancements.
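
As a rough illustration of the processing summarized in the abstract, the Python sketch below takes the logarithm of MLP posteriors, decorrelates the result, and combines two posterior streams so that the feature dimension stays the same. The frame-wise averaging rule, the PCA decorrelation step, all function names, and the random stand-in posteriors are assumptions made for illustration; they are not details taken from the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, used here only to fake MLP posterior outputs."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def combine_posteriors(p_a, p_b):
    """Assumed combination rule: frame-wise average of two posterior streams.
    The output has the same shape as each input, so the dimension does not grow."""
    return 0.5 * (p_a + p_b)

def tandem_features(posteriors, eps=1e-10):
    """Take the log of the posteriors, then decorrelate with PCA (an assumed step)."""
    log_p = np.log(posteriors + eps)           # log makes the skewed posteriors more Gaussian-like
    centered = log_p - log_p.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]           # sort components by decreasing variance
    return centered @ eigvecs[:, order]

# Hypothetical stand-ins for the outputs of two MLPs over 200 frames and 5 classes.
rng = np.random.default_rng(0)
n_frames, n_classes = 200, 5
p_a = softmax(rng.normal(size=(n_frames, n_classes)) * 3.0)
p_b = softmax(rng.normal(size=(n_frames, n_classes)) * 3.0)

combined = combine_posteriors(p_a, p_b)
features = tandem_features(combined)
print(combined.shape, features.shape)
```

Averaging keeps each frame a valid probability vector, which is why the same log and decorrelation steps can be applied to the combined stream as to a single one. Other combination rules, such as the entropy-based weighting of [9], would fit the same interface.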

References

  1. Andreou, A., Kamm, T., Cohen, J.: Experiments in Vocal Tract Normalization. In: Proc. CAIP Workshop: Frontiers in Speech Recognition II (1994)

  2. Benitez, C., Burget, L., Chen, B., Dupont, S., Garudadri, H., Hermansky, H., Jain, P., Kajarekar, S., Morgan, N., Sivadas, S.: Robust ASR front-end using spectral based and discriminant features: experiments on the Aurora task. In: Eurospeech (2001)

  3. Bourlard, H., Wellekens, C.: Links between Markov models and multilayer perceptrons. IEEE Trans. Pattern Anal. Machine Intell. 12, 1167–1178 (1990)

  4. Chen, B., Zhu, Q., Morgan, N.: Learning long term temporal features in LVCSR using neural networks. In: ICSLP (2004) (submitted)

  5. Gales, M.J.F.: Semi-tied covariance matrices for hidden Markov models. IEEE Trans. Speech and Audio Processing 7, 272–281 (1999)

  6. Gao, X., Zhu, W., Shi, Q.: The IBM LVCSR System Used for 1998 Mandarin Broadcast News Transcription Evaluation. In: Proc. DARPA Broadcast News Workshop (1999)

  7. Hermansky, H., Ellis, D.P.W., Sharma, S.: Tandem connectionist feature extraction for conventional HMM systems. In: Proc. ICASSP 2000, pp. 1635–1638 (2000)

  8. Hermansky, H., Sharma, S.: TRAPS - Classifiers of Temporal Patterns. In: Proc. ICSLP (1998)

  9. Misra, H., Bourlard, H., Tyagi, V.: New entropy based combination rules in HMM/ANN multi-stream ASR. In: Proc. ICASSP (2003)

  10. Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Processing Magazine 12(3), 24 (1995)

  11. Morgan, N., Chen, B., Zhu, Q., Stolcke, A.: TRAPping Conversational Speech: Extending TRAP/Tandem approaches to conversational telephone speech recognition. In: ICASSP (2004)

  12. Reyes-Gomez, M., Ellis, D.P.W.: Error visualization for Tandem acoustic modeling on the Aurora task. In: ICASSP (2002)

  13. Robinson, A.J., Cook, G.D., Ellis, D.P.W., Fosler-Lussier, E., Renals, S.J., Williams, D.A.G.: Connectionist speech recognition of Broadcast News. Speech Communication 37(1-2), 27–45 (2002)

  14. Stolcke, A., Bratt, H., Butzberger, J., Franco, H., Rao Gadde, V.R., Plauche, M., Richey, C., Shriberg, E., Sonmez, K., Weng, F., Zheng, J.: The SRI March 2000 Hub-5 conversational speech transcription system. In: Proc. NIST Transcription Workshop (2000)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, Q., Chen, B., Morgan, N., Stolcke, A. (2005). Tandem Connectionist Feature Extraction for Conversational Speech Recognition. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_19

  • DOI: https://doi.org/10.1007/978-3-540-30568-2_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24509-4

  • Online ISBN: 978-3-540-30568-2

  • eBook Packages: Computer Science, Computer Science (R0)
