Baye Mekonnen

In this paper, we describe an ongoing project to develop a treebank for Amharic. The main objective of the treebank is to serve as input for the development of a parser. Morphologically rich languages such as Arabic, Amharic and other Semitic languages present challenges to the state of the art in parsing. In such languages, morphemes play important roles in both morphology and syntax. In addition to the high lexical variation caused by its morphology, Amharic has a number of clitics that are not indicated by any special marker in the orthography. Considering the status of Amharic resources and the challenges the language poses to existing parsing approaches, we propose developing a treebank in which clitics are separated manually from content words and annotated semi-automatically for part-of-speech, morphological features and syntactic relations.

Publisher: CLiF

Publication Date: 2016

Research Interests:
Computer Science, Artificial Intelligence, Natural Language Processing, Parsing, Treebanks, Amharic, and Morphologically-rich language

In this paper, we describe the process of creating an Amharic Dependency Treebank, the first attempt to apply Universal Dependencies (UD) to Amharic. Amharic is a morphologically rich and less-resourced language of the Semitic family. In Amharic, an orthographic word may bundle information beyond its morphology: clitics with grammatical functions attach to words of the major lexical categories. We first explain the segmentation of clitics, which are difficult to recover from the orthographic word because of morpheme co-occurrence restrictions, assimilation and clitic ambiguity. Then, we describe the annotation processes for POS tagging, morphological information and dependency relations. Following this process, we have created a treebank of 1,096 sentences.
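As a rough illustration of how segmented clitics can sit alongside the orthographic word, the sketch below reads one sentence in the standard CoNLL-U format used by UD, where a surface token that spans several syntactic words is given an ID range such as `1-2`. It assumes the treebank follows that general convention; the function name `split_conllu_sentence` is hypothetical, and the sample rows are a Spanish fragment chosen only for illustration, not data from the Amharic treebank.

```python
from typing import List, Tuple

Surface = Tuple[str, str]             # (ID range, orthographic form)
Word = Tuple[int, str, str, int, str]  # (id, form, UPOS, head, deprel)


def split_conllu_sentence(lines: List[str]) -> Tuple[List[Surface], List[Word]]:
    """Separate multiword-token ranges from syntactic words in one sentence."""
    surface: List[Surface] = []
    words: List[Word] = []
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        tok_id, form, upos, head, deprel = cols[0], cols[1], cols[3], cols[6], cols[7]
        if "-" in tok_id:
            # A range line: the unsegmented orthographic token.
            surface.append((tok_id, form))
        elif "." in tok_id:
            continue  # empty nodes are ignored in this sketch
        else:
            # A syntactic word: the unit that carries POS and dependency info.
            words.append((int(tok_id), form, upos, int(head), deprel))
    return surface, words


# Illustrative sample only (Spanish "al" = "a" + "el"), not treebank data.
sample = [
    "# text = al mar",
    "1-2\tal\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\ta\ta\tADP\t_\t_\t3\tcase\t_\t_",
    "2\tel\tel\tDET\t_\t_\t3\tdet\t_\t_",
    "3\tmar\tmar\tNOUN\t_\t_\t0\troot\t_\t_",
]
surface, words = split_conllu_sentence(sample)
```

Here `surface` keeps the written token while `words` holds the segmented units, so POS tags and dependency relations attach to the segments rather than to the orthographic word.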

Publisher: LREC

Publication Date: 2018

Research Interests:
Computer Science, Treebanks, Amharic, and Dependency Parsing

In this paper, we compare four state-of-the-art neural network dependency parsers for the Semitic language Amharic. As Amharic is a morphologically rich and less-resourced language, the out-of-vocabulary (OOV) problem is more severe when we develop data-driven models. This limits the development of neural network parsers, because neural networks require large quantities of training data. We empirically evaluate neural network parsers trained on a small Amharic treebank. In our experiment, we obtain a LAS of 83.79 using the UDPipe system. Accuracy improves when the neural parsing system uses external resources such as word embeddings: with them, the LAS for UDPipe rises to 85.26. Our experiment shows that neural networks can learn dependency relations comparatively well from limited data, whereas segmentation and POS tagging require much more data.
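For readers unfamiliar with the metric, the labeled attachment score (LAS) is the percentage of tokens whose predicted head and dependency label both match the gold annotation; the unlabeled variant (UAS) checks the head only. The following is a minimal sketch of that definition, assuming gold and predicted analyses share the same tokenization. It is not the evaluation script used in the paper, and the name `attachment_scores` and the toy arcs are hypothetical.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over token-aligned analyses."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    n = len(gold)
    # UAS counts tokens with the correct head only.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens with both the correct head and the correct label.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * head_ok / n, 100.0 * both_ok / n


# Toy four-token sentence (invented values, for illustration only).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
```

In the toy sentence, the third token has the right head but the wrong label and the fourth has the wrong head, so three of four heads and two of four labeled arcs are correct.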

Publisher: RAIL

Publication Date: 2020

Research Interests:
Computer Science

Research Interests:
Inductive Logic Programming, Morphology Learning, and Amharic Morphology

Publication Date: Dec 8, 2012

Research Interests:
Inductive Logic Programming, Morphology Learning, and Amharic Morphology
