Dr. Eiman Mustafawi is currently an Associate Professor in the Department of English Literature and Linguistics and the Vice President for Student Affairs at Qatar University. She is a former Dean of the College of Arts and Sciences. She received her PhD in Linguistics from the University of Ottawa in Canada in 2006, specializing in Theoretical Phonology. Her PhD dissertation, "An Optimality Theoretic Approach to Variable Phonological Alternations in Qatari Arabic", was nominated for two prestigious awards: the Governor General’s Gold Medal and the Pierre Laberge Prize for outstanding doctoral dissertations. Her MA studies focused on the linguistic constraints on the outcomes of bilingualism. She worked as a Research Assistant and a Teaching Assistant from 2000 to 2006 in the Department of Linguistics at the University of Ottawa. She holds a BA in English/Education from Qatar University. Dr. Mustafawi joined Qatar University in 2006 as an Assistant Professor in Linguistics. In 2007, she was appointed Associate Dean for Faculty Affairs at the College of Arts and Sciences. In 2009, she was appointed Associate Dean for Academic Affairs within the college, and in 2011 she was promoted to the post of Dean. She served as Dean of the College of Arts and Sciences until June 2016, when she returned to the faculty to focus on research in Arabic linguistics and in education.
AFFRICATION IN NORTH ARABIC REVISITED. Eiman Mustafawi, Qatar University. 1. Introduction. One of the characteristics of North Arabic varieties is the affrication of the voiced velar stop [g] to Q] (Johnstone 1967: 2), a process that is generally assumed to be triggered in the ...
In this research, we propose a hybrid approach to acoustic and pronunciation modeling for Arabic speech recognition. The hybrid approach benefits from both vocalized and non-vocalized Arabic resources, based on the fact that non-vocalized resources are always more plentiful than vocalized ones. Two baseline speech recognition systems were built: phonemic and graphemic. The two baseline acoustic models were trained independently and then fused to create a hybrid acoustic model. Pronunciation modeling was also hybrid, generating graphemic pronunciation variants as well as phonemic variants. Different techniques are proposed for pronunciation modeling to reduce model complexity. Experiments were conducted in a large-vocabulary broadcast news speech domain. The proposed hybrid approach has shown a relative reduction in WER of 8.8% to 12.6%, depending on the pronunciation modeling settings and the supervision in the baseline systems.
Talk presented at the 47th annual meeting of the North Atlantic Conference on Afroasiatic Linguistics (NACAL), 24-26 June 2019, Paris, France.
Talk presented at Brill's Journal of Afroasiatic Languages and Linguistics International Conference, 14-16 November 2018, Nantes, France.
Poster presented at the 11th Annual Meeting of the Society for the Neurobiology of Language, 20-22 August 2019, Helsinki, Finland.
This study investigates the linguistic attitudes and perceptions of Qatar University students regarding the utility and vitality of the two languages that define the education and communication scenes in Qatar, namely Arabic and English. It also reports on the predictors of these attitudes in terms of demographic traits. A total of 861 students completed a questionnaire divided into the following sections: Media Language Preference (MLP); Value and Symbolism of Arabic (VSA); Arabic in Education and Society (AES); Medium of Instruction (MOI); Impact of Al-Jazeera Network (IJN); English in Scientific and Professional Communication (ESPC); Qatari Cultural Identity (QCI); Arabic Books (AB); English in Society and Work (ESW); Language in Workplace (LIW); Arabic in Employment (AE); Status of Arabic (SA); and Manifestations of Sociocultural Identity (MSI). Results showed that Arabic received higher ratings for MLP, VSA, AES, MOI, QCI, and MSI, while English was perceived as more useful than Arabic in ESPC. Correlation...
Poster presented at the 25th Architectures and Mechanisms of Language Processing (AMLaP) meeting, September 2019, Moscow, Russia.
In this paper, a framework for long audio alignment for conversational Arabic speech is proposed. Accurate alignments help in many speech processing tasks, such as audio indexing, speech recognizer acoustic model (AM) training, and audio summarization and retrieval. We have collected more than 1,400 hours of conversational Arabic along with the corresponding human-generated non-aligned transcriptions. Automatic audio segmentation is performed using a split-and-merge approach. A biased language model (LM) is trained using the corresponding text after a pre-processing stage. Because of the dominance of non-standard Arabic in conversational speech, a graphemic pronunciation model (PM) is utilized. The proposed alignment approach is performed in two passes. In the first pass, a generic standard Arabic AM is used along with the biased LM and the graphemic PM in a fast speech recognition pass. In the second pass, a more restricted LM is generated for each audio segment, and unsupervised acoustic model ad...
The Arabic language is characterized by the existence of many different colloquial varieties that differ significantly from the standard Arabic form. In this paper, we propose a state-of-the-art speech recognition system for Levantine Colloquial Arabic (LCA). A fully continuous, context-dependent acoustic model was trained using 50 hours of speech from the BBN DARPA Babylon corpus. Pronunciation modeling was initially grapheme-based due to the absence of diacritic marks in the transcriptions. Acoustic model parameters, including the number of senones and Gaussians, have been optimized. To improve speech recognition accuracy, a cross-lingual hybrid acoustic and pronunciation modeling approach is proposed, in which an MSA phoneme-based acoustic model is adapted using a small amount of LCA speech data. The adapted AM was then combined with the initial grapheme-based model to create a hybrid acoustic model.
A major problem in dialectal Arabic speech recognition is the sparsity of speech resources. In this paper, we propose a transfer learning framework that jointly uses a large amount of Modern Standard Arabic (MSA) data and a small amount of dialectal Arabic data to improve acoustic and language modeling. We have chosen the Qatari Arabic (QA) dialect as a typical example of an under-resourced Arabic dialect. A wide-band speech corpus has been collected and transcribed from several Qatari TV series and talk-show programs. A large-vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique was performed by applying orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach can achieve more than 28% relative reduction in WER.
Idrissi, Muralikrishnan et al. (2018). Poster presented at the CUNY Conference on Sentence Processing, University of California, Davis, USA.
We propose a framework for long audio alignment for conversational Arabic speech. Accurate alignments help in many speech processing tasks, such as audio indexing, speech recognizer acoustic model (AM) training, and audio summarization and retrieval. In this work, we have collected more than 1,400 hours of conversational Arabic along with the corresponding non-aligned text transcriptions. Automatic segmentation is applied using a split-and-merge approach. A biased language model (LM) is trained using the corresponding text after a pre-processing stage. Because of the dominance of non-standard Arabic in conversational speech, a graphemic pronunciation model (PM) is utilized. The proposed alignment approach is performed in two passes. In the first pass, a generic standard Arabic AM is used along with the biased LM and the graphemic PM in a fast speech recognition pass applied to the current episode's segments. In the second pass, a more restricted LM is generated for each segment, and unsupervised a...
Empirical research has shown that such single items constitute the majority of the other-language material in most bilingual discourse, so grouping them with the wrong category may obscure the patterns of behavior of the true members of that category. For example, if lone words are categorized with codeswitches, their patterns of behavior may skew those of the true codeswitches, giving rise to theories of codeswitching that account poorly for the data (Ghafar-Samar & Meechan 1998, p. 206). Therefore, it is important to keep the status of lone words ambiguous until their patterns of behavior show similarity to either established loanwords (borrowings) or unambiguous codeswitches (CSs). There are three different views in the field with respect to lone words. The first, as reflected in the work of Mahootian (1993), Eliasson (1990), and Myers-Scotton (1992; 1993), does not distinguish between borrowing and codeswitching and attributes them to the same mechanism. The ...