Abstract
In this work, we study a morpheme-based speech recognition system for Basque, a highly inflected language that is an official language in the Basque Country (northern Spain). Two different techniques are presented for decomposing words into their morphological units. These units are then integrated into an automatic speech recognition (ASR) system, and the resulting systems are compared with a word-based approach in terms of accuracy and processing speed. Results on a weather-forecast task show that, while the morpheme-based approaches perform similarly in terms of accuracy, they can be significantly faster than the word-based system.
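The abstract does not describe the two decomposition techniques in detail, so the following is only a toy sketch of the general idea behind morpheme-based recognition units, not the authors' method. It assumes a greedy longest-suffix stripper over a small, hypothetical list of Basque endings (`SUFFIXES`), an assumed minimum stem length (`MIN_STEM`), and a hypothetical convention of marking non-final units with `+` so that the recognised unit sequence can be joined back into words. The function names `segment` and `join` are illustrative as well.

```python
# Toy illustration only: NOT the segmentation used in the paper.
# SUFFIXES is a hypothetical, incomplete list of Basque case/number endings;
# a real system would use a morphological analyser or a data-driven segmenter.
SUFFIXES = sorted(["a", "ak", "an", "ra", "tik"], key=len, reverse=True)
MIN_STEM = 3  # assumed minimum stem length, to avoid over-segmenting short words


def segment(word: str) -> list[str]:
    """Split a word into stem + ending by greedy longest-suffix matching.

    The stem gets a trailing '+' (hypothetical boundary marker) so that the
    unit sequence can be rebuilt into words after decoding.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return [word[: -len(suffix)] + "+", suffix]
    return [word]  # no known ending: keep the whole word as a single unit


def join(units: list[str]) -> list[str]:
    """Rebuild words from units: a unit ending in '+' is glued to the next one."""
    words, buffer = [], ""
    for unit in units:
        if unit.endswith("+"):
            buffer += unit[:-1]
        else:
            words.append(buffer + unit)
            buffer = ""
    if buffer:  # dangling stem at the end of a hypothesis
        words.append(buffer)
    return words


# Two stems ("etxe" = house, "mendi" = mountain) with five endings each.
corpus = ("etxea etxean etxera etxetik etxeak "
          "mendia mendian mendira menditik mendiak").split()
units = [u for w in corpus for u in segment(w)]

print(len(set(corpus)), "word types ->", len(set(units)), "unit types")
# 10 word types -> 7 unit types
assert join(units) == corpus
```

The point of the example is the growth rate: the number of inflected word forms grows roughly as stems × endings, whereas the number of units grows as stems + endings, which illustrates how sub-word units can keep the recognition vocabulary of a highly inflected language compact.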
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guijarrubia, V.G., Torres, M.I., Justo, R. (2009). Morpheme-Based Automatic Speech Recognition of Basque. In: Araujo, H., Mendonça, A.M., Pinho, A.J., Torres, M.I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2009. Lecture Notes in Computer Science, vol 5524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02172-5_50
DOI: https://doi.org/10.1007/978-3-642-02172-5_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02171-8
Online ISBN: 978-3-642-02172-5
eBook Packages: Computer Science, Computer Science (R0)