2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), 2019
Schizophrenia is one of the mental disorders that impacts a person's thinking, speech, and actions. It can reduce a person's ability to process auditory information and make decisions. Analyzing this disorder correctly is important because it might help with different ways of reducing its negative effects on its patients. Linguists and psychiatrists have been investigating language impairments and speech disorder in people with schizophrenia, which can be challenging. In this study, we attempt to address this issue by analyzing linguistic features, i.e., cohesion, in the writings and speech scripts of schizophrenia patients. Our results show that using referential cohesion with text easability or situation model features provides the best performance for speech, whereas for the writing dataset, readability or a combination of situation model and readability yields the best performance.
Our paper addresses the problem of annotation projection for semantic role labeling for resource-poor languages using supervised annotations from a resource-rich language through parallel data. We propose a transfer method that employs information from source and target syntactic dependencies as well as word alignment density to improve the quality of an iterative bootstrapping method. Our experiments yield a 3.5 absolute labeled F-score improvement over a standard annotation projection method.
This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). VMWEs were annotated according to the universal guidelines in 19 languages. The corpora are provided in the cupt format, inspired by the CoNLL-U format. The corpora were used in the 1.1 edition of the PARSEME Shared Task (2018). For most languages, morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools us...
In this paper, we present a word sense disambiguation (WSD) based system for multilingual lexical substitution. Our method depends on having a WSD system for English and an automatic word alignment method. Crucially the approach relies on having parallel corpora. For Task 2 (Sinha et al., 2009) we apply a supervised WSD system to derive the English word senses. For Task 3 (Lefever & Hoste, 2009), we apply an unsupervised approach to the training and test data. Both of our systems that participated in Task 2 achieve a decent ranking among the participating systems. For Task 3 we achieve the highest ranking on several of the language pairs: French, German and Italian.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
Qatar Foundation Annual Research Conference Proceedings Volume 2016 Issue 1, 2016
One of the characteristics of writing in Modern Standard Arabic (MSA) is that the commonly used orthography is mostly consonantal and does not provide full vocalization of the text. It sometimes includes optional diacritical marks (henceforth, diacritics or vowels). Arabic script consists of two classes of symbols: letters and diacritics. Letters comprise long vowels such as A, y, w as well as consonants. Diacritics, on the other hand, comprise short vowels, gemination markers, nunation markers, as well as other markers (such as hamza, the glottal stop which appears in conjunction with a small number of letters, dots on letters, elongation and emphatic markers) which in all, if present, render a more or less exact reading of a word. In this study, we are mostly addressing three types of diacritical marks: short vowels, nunation, and shadda (gemination). Diacritics are extremely useful for text readability and understanding. Their absence in Arabic text adds another layer of lexi...
Qatar Foundation Annual Research Conference Proceedings Volume 2018 Issue 4, 2018
Language ambiguity is an inherent characteristic of natural languages. It refers to the phenomenon where an instance can be interpreted in multiple ways. Ambiguity is at the core of the problems faced by natural language processing applications (Obeid et al. 2013). Although humans have the ability to resolve such ambiguity based on their prior knowledge and context, there are instances (sentences, words, etc.) that require multiple readings to resolve it within a context (Hawwari et al. 2013; Diab et al. 2008). The problem of natural language ambiguity is further exacerbated by conventional orthographic decisions where not all phonemes are explicitly represented (Maamouri et al. 2010; Maamouri et al. 2012). Arabic is one such language: its standard orthography is underspecified for some of the characters such as short vowels, gemination, glottal stops, etc., which are collectively represented as diacritics (Zaghouani et al. 2012; Zaghouani et al. 2016). Most typical text in Ara...
Transactions of the Association for Computational Linguistics, 2018
Most existing methods for automatic bilingual dictionary induction rely on prior alignments between the source and target languages, such as parallel corpora or seed dictionaries. For many language pairs, such supervised alignments are not readily available. We propose an unsupervised approach for learning a bilingual dictionary for a pair of languages given their independently-learned monolingual word embeddings. The proposed method exploits local and global structures in monolingual vector spaces to align them such that similar words are mapped to each other. We show empirically that the performance of bilingual correspondents that are learned using our proposed unsupervised method is comparable to that of using supervised bilingual correspondents from a seed dictionary.
Measuring Verb Similarity. Philip Resnik and Mona Diab, Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA, {resnik,mdiab}@umiacs.umd.edu. Abstract: The way we model semantic similarity is closely tied to our understanding of linguistic representations. We present several models of semantic similarity, based on differing representational assumptions, and investigate their properties via comparison with human ratings of verb similarity. The results offer insight into the bases for human similarity judgments and provide a testbed for further investigation of the interactions among syntactic properties, semantic structure, and semantic content. Introduction: The way we model semantic similarity is closely tied to our understanding of how linguistic representations are acquired and used. Some models of similarity, such as Tversky's (1977), assume an explicit set of features over which a similarity measure can be computed, a...
Papers by Mona Diab