Abstract
We present a new, freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus was recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 36 hours of audio recordings from 180 different speakers. We report recognition results for the open source toolkits Kaldi (20.5% WER) and PocketSphinx (39.6% WER), which makes a complete open source solution for German distant speech recognition possible.
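For readers unfamiliar with the metric, the following minimal sketch shows how WER is commonly defined: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. It is an illustration only and is not the scoring procedure used for the Kaldi or PocketSphinx results above; the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Illustrative sketch: tokens are split on whitespace, with no text
    normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is dropped.
print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.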
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Radeck-Arneth, S. et al. (2015). Open Source German Distant Speech Recognition: Corpus and Acoustic Model. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_54
DOI: https://doi.org/10.1007/978-3-319-24033-6_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science (R0)