Open Source German Distant Speech Recognition: Corpus and Acoustic Model

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2015)

Abstract

We present a new freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus has been recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 180 different speakers with a total of 36 hours of audio recordings. We show recognition results with the open source toolkits Kaldi (20.5% WER) and PocketSphinx (39.6% WER), making a complete open source solution for German distant speech recognition possible.
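
The WER figures quoted above follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of words in the reference. The short Python sketch below is not taken from the paper; the function name and the example sentences are illustrative only. It computes this score for a single reference/hypothesis pair.

    # Minimal illustration of the word error rate (WER) metric reported above.
    # Not code from the paper; names and the example sentences are made up.

    def wer(reference: str, hypothesis: str) -> float:
        """(substitutions + deletions + insertions) / number of reference words."""
        ref = reference.split()
        hyp = hypothesis.split()

        # Dynamic-programming table of word-level edit distances.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # deleting i reference words
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # inserting j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    if __name__ == "__main__":
        # Hypothetical German reference/hypothesis pair, for illustration only.
        reference = "heute ist das wetter in darmstadt sehr schön"
        hypothesis = "heute ist wetter in darmstadt sehr schon"
        # 1 deletion + 1 substitution over 8 reference words -> 25.0%
        print(f"WER = {wer(reference, hypothesis):.1%}")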



Author information


Corresponding author

Correspondence to Stephan Radeck-Arneth.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Radeck-Arneth, S. et al. (2015). Open Source German Distant Speech Recognition: Corpus and Acoustic Model. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_54

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_54

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
