Automatic Speech Recognition Improvement for Kazakh Language with Enhanced Language Model

Bekarystankyzy, Akbayan; Mamyrbayev, Orken; Mendes, Mateus; Oralbekova, Dina; Zhumazhanov, Bagashar; Fazylzhanova, Anar

doi:10.1007/978-3-031-42430-4_44

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1863)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Recent years have brought remarkable results in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR). As a result, anyone can use smart search tools such as ChatGPT and smart voice assistants such as Siri and Alexa. These opportunities, however, are available only to people who use English or other widely spoken languages; for speakers of low-resource languages, such products are not available. Because collecting transcribed data is a time-consuming and expensive process, researchers look for ways to build reliable ASR models for low-resource languages. One method for improving ASR when data are scarce is to build an external language model on a text corpus larger than the text of the ASR dataset itself and to use this language model in the decoding process. Since Kazakh is a low-resource language, this approach was tested for Kazakh with different language models, such as a sequential RNNLM and a Transformer LM. Including a language model trained on a larger dataset reduced the error rates, especially the Word Error Rate (WER). The best result was obtained with the Transformer LM, which reduced the WER to 7.2%.
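
To make the idea of using an external language model in decoding more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a simplified setting in which the ASR model has already produced a short list of candidate transcripts with log-probabilities, and an external language model score is added with a weight before the best candidate is chosen. The function names (word_error_rate, rescore, toy_lm_logprob), the weight value, the English toy sentences, and all scores are hypothetical and chosen only for illustration; toolkits typically apply the language model inside beam search rather than on a finished candidate list.

```python
from typing import Callable, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def rescore(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.3,
) -> str:
    """Pick the candidate maximizing asr_logprob + lm_weight * lm_logprob."""
    best_text, _ = max(
        nbest, key=lambda item: item[1] + lm_weight * lm_logprob(item[0])
    )
    return best_text


# Hypothetical scores: the acoustic model slightly prefers the wrong transcript,
# while the toy language model strongly prefers the fluent one.
TOY_LM = {
    "wreck a nice beach today": -9.0,
    "recognize speech today": -4.0,
}


def toy_lm_logprob(sentence: str) -> float:
    """Stand-in for an external language model (assumption, not a real model)."""
    return TOY_LM[sentence]


reference = "recognize speech today"
nbest = [("wreck a nice beach today", -3.5), ("recognize speech today", -3.9)]

without_lm = rescore(nbest, toy_lm_logprob, lm_weight=0.0)
with_lm = rescore(nbest, toy_lm_logprob, lm_weight=0.3)

print("WER without LM:", word_error_rate(reference, without_lm))
print("WER with LM:   ", word_error_rate(reference, with_lm))
```

In this toy case the language model term changes which candidate is selected, which is the mechanism by which a language model trained on a larger text corpus can lower the WER. The weight on the language model score is a tunable parameter; the value used here is arbitrary.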



Acknowledgement

This research is funded by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. BR11765619).

Author information

Authors and Affiliations

Satbayev University, Satbayev Str. 22, Almaty, Kazakhstan
Akbayan Bekarystankyzy
Institute of Information and Computational Technologies CS MES RK, Shevchenko Str. 28, Almaty, Kazakhstan
Orken Mamyrbayev & Bagashar Zhumazhanov
Polytechnic Institute of Coimbra, ISEC, Coimbra, Portugal
Mateus Mendes
University of Coimbra, ISR, Coimbra, Portugal
Mateus Mendes
Almaty University of Power Engineering and Telecommunications, Almaty, Kazakhstan
Dina Oralbekova
Committee of Science of the Ministry of Science and Higher Education of the RK, Institute of Linguistics named after Akhmet Baitursynuly, Almaty, Kazakhstan
Anar Fazylzhanova

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Siridech Boonsang
Iwate Prefectural University, Iwate, Japan
Hamido Fujita
Wrocław University of Science and Technology, Wrocław, Poland
Bogumiła Hnatkowska
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
King Mongkut's Institute of Technology, Ladkrabang, Thailand
Kitsuchart Pasupa
Malaysia Japan International Institute of Technology, Kuala Lumpur, Malaysia
Ali Selamat

Cite this paper

Bekarystankyzy, A., Mamyrbayev, O., Mendes, M., Oralbekova, D., Zhumazhanov, B., Fazylzhanova, A. (2023). Automatic Speech Recognition Improvement for Kazakh Language with Enhanced Language Model. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_44

DOI: https://doi.org/10.1007/978-3-031-42430-4_44
Published: 29 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4
eBook Packages: Computer Science, Computer Science (R0)
