On the Use of MLP Features for Broadcast News Transcription

Fousek, Petr; Lamel, Lori; Gauvain, Jean-Luc

doi:10.1007/978-3-540-87391-4_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Multi-Layer Perceptron (MLP) features have recently attracted growing interest for automatic speech recognition because they are complementary to cepstral features. This paper evaluates MLP features in a large vocabulary continuous speech recognition task, exploring different types of MLP features and their combination. Cepstral features and three types of Bottle-Neck MLP features were first evaluated with and without unsupervised model adaptation, using models with the same number of parameters. When used with MLLR adaptation on an Arabic broadcast news transcription task, Bottle-Neck MLP features perform as well as or even slightly better than a standard 39-dimensional PLP-based front-end. The paper also explores different combination schemes (feature concatenation, cross adaptation, and hypothesis combination). Extending the feature vector by combining various feature sets led to a 9% relative word error rate reduction over the PLP baseline. Significant gains are also reported with both ROVER hypothesis combination and cross-model adaptation. Feature concatenation appears to be the most efficient combination method, providing the best gain at the lowest decoding cost.
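
To make two terms from the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that feature concatenation means appending the MLP feature vector of each frame to the PLP vector of the same frame, and that relative word error rate (WER) reduction is computed as 100 * (baseline - new) / baseline. The function names, the 3-dimensional stand-in MLP vectors, and the WER values are hypothetical and are not taken from the paper.

```python
def concatenate_features(plp_frames, mlp_frames):
    """Append each MLP vector to the PLP vector of the same frame.

    Both inputs are lists of per-frame feature vectors (lists of floats).
    Assumption: the two streams are already frame-synchronous.
    """
    if len(plp_frames) != len(mlp_frames):
        raise ValueError("both streams must have the same number of frames")
    return [list(p) + list(m) for p, m in zip(plp_frames, mlp_frames)]


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Toy data: two frames of 39-dimensional PLP vectors and
# 3-dimensional stand-in MLP vectors (dimension chosen for illustration).
plp = [[0.0] * 39, [1.0] * 39]
mlp = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
combined = concatenate_features(plp, mlp)
print(len(combined), len(combined[0]))

# Hypothetical WERs, NOT results from the paper.
print(relative_wer_reduction(20.0, 18.2))
```

With these made-up numbers, an absolute drop of 1.8 WER points from a 20.0% baseline corresponds to a reduction of about 9% relative, which is the kind of figure the abstract reports for the extended feature vector.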

This work was supported in part under the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-C-0022, and in part by OSEO under the Quaero program.

Author information

Authors and Affiliations

Spoken Language Processing Group, LIMSI-CNRS, France
Petr Fousek, Lori Lamel & Jean-Luc Gauvain

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

About this paper

Cite this paper

Fousek, P., Lamel, L., Gauvain, JL. (2008). On the Use of MLP Features for Broadcast News Transcription. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_39

DOI: https://doi.org/10.1007/978-3-540-87391-4_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
