
Building a neural speech recognizer for quranic recitations

Published in: International Journal of Speech Technology

Abstract

This work is an effort toward building neural speech recognizer (NSR) systems for Quranic recitations that can be used effectively by anyone, regardless of gender and age. Although many recitations are available online, most are recorded by professional adult male reciters, which means that an automatic speech recognition (ASR) system trained on such datasets would not work well for female or child reciters. We address this gap by adopting a benchmark dataset of audio recordings of Quranic recitations by reciters of both genders and different ages. Using this dataset, we build several speaker-independent NSR systems based on the DeepSpeech model and evaluate them using word error rate (WER). The goal is to show how an NSR system trained and tuned on a dataset from one gender performs on a test set from the other gender. Unfortunately, the number of female recitations in our dataset is rather small, while the number of male recitations is much larger. In the first set of experiments, we avoid the imbalance between the two genders by down-sampling the male part to match the female part. For this small subset of our dataset, the results are interesting: the system achieves 0.968 WER when trained on male recitations and tested on female recitations, and 0.406 WER when tested on male recitations. Conversely, training the system on female recitations and testing it on male recitations gives 0.966 WER, while testing it on female recitations gives 0.608 WER.
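For reference, the WER metric used above is the word-level edit distance (substitutions, insertions, and deletions) between the reference transcript and the system output, normalized by the reference length. A minimal sketch of this computation (illustrative only, not the paper's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference gives WER = 0.25.
print(wer("a b c d", "a x c d"))  # 0.25
```

Note that WER can exceed 1.0 when the hypothesis diverges badly from the reference, which is why values such as 0.968 indicate near-total transcription failure rather than 96.8% of an upper bound.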





Author information

Corresponding author

Correspondence to Osama Al-Khaleel.


Cite this article

Al-Issa, S., Al-Ayyoub, M., Al-Khaleel, O. et al. Building a neural speech recognizer for quranic recitations. Int J Speech Technol 26, 1131–1151 (2023). https://doi.org/10.1007/s10772-022-09988-3
