Open Source German Distant Speech Recognition: Corpus and Acoustic Model

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2015)

Abstract

We present a new freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus has been recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 180 different speakers with a total of 36 hours of audio recordings. We show recognition results with the open source toolkits Kaldi (20.5% WER) and PocketSphinx (39.6% WER), making a complete open source solution for German distant speech recognition possible.
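
The WER figures quoted above follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The short Python sketch below is not taken from the paper; the function name and the example sentences are illustrative only. It computes this score for a single reference/hypothesis pair.

    # Minimal illustration of the word error rate (WER) metric reported above.
    # Not code from the paper; names and the example sentences are made up.

    def wer(reference: str, hypothesis: str) -> float:
        """(substitutions + deletions + insertions) / number of reference words."""
        ref = reference.split()
        hyp = hypothesis.split()

        # Dynamic-programming table of word-level edit distances.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # deleting i reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # inserting j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    if __name__ == "__main__":
        # Hypothetical German reference/hypothesis pair, for illustration only.
        reference = "heute ist das wetter in darmstadt sehr schön"
        hypothesis = "heute ist wetter in darmstadt sehr schon"
        # 1 deletion + 1 substitution over 8 reference words -> 25.0%
        print(f"WER = {wer(reference, hypothesis):.1%}")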



Author information


Corresponding author

Correspondence to Stephan Radeck-Arneth.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Radeck-Arneth, S. et al. (2015). Open Source German Distant Speech Recognition: Corpus and Acoustic Model. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_54

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_54

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
