Abstract
In this work, we compare and evaluate speech recognition models for the Turkic state languages, namely Azerbaijani, Kazakh, Kyrgyz, Turkish, Turkmen, and Uzbek. To this end, we conduct experimental studies of neural speech recognition with three openly available models: Whisper, the ASR system by OpenAI; TurkicASR, developed by ISSAI; and the model of the Massively Multilingual Speech (MMS) project, Facebook AI's initiative. This work represents a key step towards streamlining the recording and processing of meeting minutes in diverse Turkic languages. The scientific contribution of this article is the comparative analysis and selection of speech recognition models for the Turkic state languages based on the conducted experimental studies.
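To illustrate the kind of evaluation reported here, the following minimal Python sketch transcribes one recording with the openly released Whisper model and scores the output against a reference transcript using word error rate. It assumes the openai-whisper and jiwer packages; the file name audio.wav, the placeholder reference string, and the choice of the Kazakh language code are illustrative assumptions and do not reproduce the paper's own evaluation pipeline or data.

# Hedged sketch: transcribe one recording with Whisper and compute WER.
# "audio.wav" and the reference transcript are placeholders.
import whisper  # pip install openai-whisper
import jiwer    # pip install jiwer

# Load a multilingual Whisper checkpoint (tiny, base, small, medium, large).
model = whisper.load_model("small")

# Transcribe, forcing the language to Kazakh ("kk"); other Turkic state
# languages such as Azerbaijani ("az"), Turkish ("tr"), or Uzbek ("uz")
# can be selected the same way where the checkpoint supports them.
result = model.transcribe("audio.wav", language="kk")
hypothesis = result["text"]

# Reference transcript for the same recording (placeholder).
reference = "placeholder reference transcript"

# Word error rate, the standard metric for comparing ASR models.
print("WER:", jiwer.wer(reference, hypothesis))

TurkicASR and MMS are loaded through their own toolkits, but the same WER computation can be applied to their hypotheses.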
References
Musaev, M., Mussakhojayeva, S., Khujayorov, I., Khassanov, Y., Ochilov, M., Varol, H.A.: USC: an open-source Uzbek speech corpus and initial speech recognition experiments. arXiv preprint arXiv:2107.14419 (2021)
Mussakhojayeva, S., Janaliyeva, A., Mirzakhmetov, A., Khassanov, Y., Varol, H.A.: KazakhTTS: an open-source Kazakh text-to-speech synthesis dataset. In: Proceedings of Interspeech 2021, pp. 2786–2790 (2021). https://doi.org/10.21437/Interspeech.2021-2124
Mamyrbayev, O., Alimhan, K., Zhumazhanov, B., Turdalykyzy, T., Gusmanova, F.: End-to-End Speech Recognition in Agglutinative Languages. In: Nguyen, N.T., Jearanaitanakij, K., Selamat, A., Trawiński, B., Chittayasothorn, S. (eds.) Intelligent Information and Database Systems: 12th Asian Conference, ACIIDS 2020, Phuket, Thailand, March 23–26, 2020, Proceedings, Part II, pp. 391–401. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-42058-1_33
Mamyrbayev, O., Alimhan, K., Oralbekova, D., Bekarystankyzy, A., Zhumazhanov, B.: Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level. Eastern-Eur. J. Enterp. Technol. 1(9(115)), 84–92 (2022). https://doi.org/10.15587/1729-4061.2022.252801
Mamyrbayev, O.Z., Oralbekova, D.O., Alimkhan, K., Othman, M., Zhumazhanov, B.: Application of a hybrid integral model for Kazakh speech recognition (in Russian). In: News of the National Academy of Sciences of the Republic of Kazakhstan, vol. 1, no. 341, pp. 58–68 (2022)
Khassanov, Y., Mussakhojayeva, S., Mirzakhmetov, A., Adiyev, A., Nurpeiissov, M., Varol, H.A.: A crowdsourced open-source Kazakh speech corpus and initial speech recognition baseline. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 697–706. Association for Computational Linguistics (2021)
Mussakhojayeva, S., Dauletbek, K., Yeshpanov, R., Varol, H.A.: Multilingual speech recognition for Turkic languages. Information 14, 74 (2023). https://doi.org/10.3390/info14020074
Balabekova, T., Kairatuly, B., Tukeyev, U.: Kazakh-Uzbek speech cascade machine translation on complete set of endings. In: Nguyen, N.T., Botzheim, J., Gulyás, L., Nunez, M., Treur, J., Vossen, G., Kozierkiewicz, A. (eds.) Advances in Computational Collective Intelligence: 15th International Conference, ICCCI 2023, Budapest, Hungary, September 27–29, 2023, Proceedings, pp. 430–442. Springer Nature Switzerland, Cham (2023). https://doi.org/10.1007/978-3-031-41774-0_34
Radford, A., Kim, J.W., et al.: Robust speech recognition via large-scale weak supervision. In: Proceedings of the 40th International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, vol. 202, pp. 28492–28518. PMLR (2023)
Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., et al.: ESPnet: end-to-end speech processing toolkit. In: Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018, pp. 2207–2211 (2018)
Ardila, R., et al.: Common voice: a massively-multilingual speech corpus. In: Proceedings of the Language Resources and Evaluation Conference (LREC), Marseille, France, 11–16 May 2020; European Language Resources Association: Marseille, France, pp. 4218–4222 (2020)
Russian Open Speech-to-Text Dataset. https://github.com/snakers4/open_stt
Pratap, V., et al.: Scaling speech technology to 1,000+ languages. arXiv:2305.13516 (2023)
Baevski, A., Zhou, H., Mohamed, A., Auli, M.: wav2vec 2.0: a framework for self-supervised learning of speech representations. arXiv preprint arXiv:2006.11477 (2020). https://doi.org/10.48550/arXiv.2006.11477
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nurmaganbet, D., Tukeyev, U., Shormakova, A., Zhumanov, Z. (2024). Comparative Analysis of Models for Neural Machine Speech-to-Text Translation for Turkic State Languages. In: Nguyen, N.T., et al. (eds.) Intelligent Information and Database Systems. ACIIDS 2024. Lecture Notes in Computer Science, vol. 14796. Springer, Singapore. https://doi.org/10.1007/978-981-97-4985-0_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-4984-3
Online ISBN: 978-981-97-4985-0
eBook Packages: Computer Science, Computer Science (R0)