Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach

AbuZeina, Dia; Al-Khatib, Wasfi; Elshafei, Moustafa; Al-Muhtaseb, Husni

doi:10.1007/s10772-011-9122-4


Published: 14 October 2011

Volume 15, pages 65–75 (2012)

International Journal of Speech Technology


Abstract

Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond the forms listed in the pronunciation dictionary, producing a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method performs phoneme recognition and then aligns the observed phoneme sequence generated by the phoneme recognizer with the reference phoneme sequence obtained from the pronunciation dictionary. The unique variants collected in this way are added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine. The baseline system is based on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations, and achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not yield an appreciable improvement, the word error rate is significantly reduced by 2.22% when the variants are also represented within the language model.
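The alignment-and-collection step described in the abstract can be illustrated with a small sketch. The abstract does not say which alignment algorithm or cost scheme the authors used, so the code below assumes a plain unit-cost dynamic-programming (edit-distance) alignment. The function names `align` and `collect_variants`, the rule that an inserted phoneme is attached to the word owning the next reference phoneme, and the toy words and phoneme symbols are all hypothetical and chosen only for illustration. It is not the authors' implementation.

```python
from collections import defaultdict

GAP = "-"  # placeholder for an insertion or deletion in the alignment


def align(ref, obs):
    """Unit-cost global alignment of two phoneme lists.

    Returns (ref_phone, obs_phone) pairs; GAP marks a deleted reference
    phoneme or an inserted observed phoneme.
    """
    n, m = len(ref), len(obs)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal path.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])):
            pairs.append((ref[i - 1], obs[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], GAP))  # reference phoneme deleted
            i -= 1
        else:
            pairs.append((GAP, obs[j - 1]))  # extra observed phoneme
            j -= 1
    pairs.reverse()
    return pairs


def collect_variants(words, lexicon, observed):
    """Collect per-word observed pronunciations that differ from the lexicon.

    words:    transcript of one utterance (list of words)
    lexicon:  word -> canonical phoneme list
    observed: phoneme list produced by a phoneme recognizer
    """
    ref = [p for w in words for p in lexicon[w]]
    # owner[r] is the index of the word that reference phoneme r belongs to.
    owner = [k for k, w in enumerate(words) for _ in lexicon[w]]
    per_word = [[] for _ in words]
    r = 0
    for ref_p, obs_p in align(ref, observed):
        if ref_p != GAP:
            k = owner[r]
            r += 1
        elif ref:
            # Assumption: an inserted phoneme joins the upcoming word
            # (or the last word if the reference is exhausted).
            k = owner[min(r, len(ref) - 1)]
        else:
            continue
        if obs_p != GAP:
            per_word[k].append(obs_p)

    variants = defaultdict(set)
    for w, phones in zip(words, per_word):
        if phones and phones != lexicon[w]:
            variants[w].add(tuple(phones))
    return variants


# Toy data: invented words and phoneme strings, not taken from the paper.
toy_lexicon = {"kitab": ["k", "i", "t", "a:", "b"],
               "jadid": ["j", "a", "d", "i:", "d"]}
toy_observed = ["k", "i", "t", "a", "b", "j", "a", "d", "i:", "d"]
print(dict(collect_variants(["kitab", "jadid"], toy_lexicon, toy_observed)))
```

Storing variants in a set per word mirrors the abstract's emphasis on unique variants: a pronunciation observed many times in the corpus is added only once. Writing those variants into a Sphinx3 dictionary or a language model is a separate step not covered by this sketch.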



Author information

Authors and Affiliations

King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
Dia AbuZeina, Wasfi Al-Khatib, Moustafa Elshafei & Husni Al-Muhtaseb


Corresponding author

Correspondence to Dia AbuZeina.


About this article

Cite this article

AbuZeina, D., Al-Khatib, W., Elshafei, M. et al. Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach. Int J Speech Technol 15, 65–75 (2012). https://doi.org/10.1007/s10772-011-9122-4


Received: 30 July 2011
Accepted: 28 September 2011
Published: 14 October 2011
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10772-011-9122-4
