Abstract
This work is an effort towards building a neural speech recognition (NSR) system for Quranic recitations that can be used effectively by anyone, regardless of gender or age. Although many recitations are available online, most are recorded by professional adult male reciters, so an ASR system trained on such datasets is unlikely to work well for female or child reciters. We address this gap by adopting a benchmark dataset of Quranic recitation audio recorded by reciters of both genders and of different ages. Using this dataset, we build several speaker-independent NSR systems based on the DeepSpeech model and evaluate them with the word error rate (WER). The goal is to show how an NSR system trained and tuned on recitations of one gender performs on a test set from the other gender. Because the number of female recitations in our dataset is much smaller than the number of male recitations, in the first set of experiments we avoid the imbalance issue by down-sampling the male part to match the female part. On this small subset of our dataset, the results are striking: the system achieves 0.968 WER when trained on male recitations and tested on female recitations, compared with 0.406 WER when tested on male recitations. Conversely, training the system on female recitations and testing it on male recitations gives 0.966 WER, while testing it on female recitations gives 0.608 WER.
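As a reference for the evaluation metric used throughout, WER is the word-level edit distance (substitutions, deletions, and insertions) between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. The following is a minimal illustrative sketch of this computation, not the authors' actual evaluation code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains more errors than the reference has words, which is why values such as 0.968 indicate a system that is effectively unusable for the mismatched gender.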
Al-Issa, S., Al-Ayyoub, M., Al-Khaleel, O. et al. Building a neural speech recognizer for quranic recitations. Int J Speech Technol 26, 1131–1151 (2023). https://doi.org/10.1007/s10772-022-09988-3