Nuguse Final Research
By
NUGUSE NEGESE
To
Of
In
Computer Science
2023
ACCEPTANCE
Examine Members
_________________________________________________________
Examiner Committee
DECLARATION
I, the undersigned, declare that this thesis is my original work, prepared under the
supervision of my advisor, Dr. Michael Melese, and that all resources used for the research
have been duly acknowledged. I further confirm that the thesis has not been submitted to any
other higher education institution for the purpose of obtaining a degree.
Name Signature
DEDICATED
This research is dedicated to all the people who helped me succeed and
made my life comfortable.
ACKNOWLEDGMENT
First and foremost, I would like to express my heartfelt thanks to Almighty God for giving me
the strength, determination, endurance and wisdom to bring this thesis to completion. I am
deeply grateful to my advisor, Dr. Michael Melese, for their constructive suggestions, review and
concern. I also thank my mother, Mimi Abera, who has always been with me, supporting, helping
and encouraging me on my journey to become a better man.
I would also like to thank all my family for their love, endless support and encouragement
throughout my academic life. I would like to thank Getachow Mamo and Damme Lemma, who supported
me financially. Many people helped me, and although I cannot list them all, I thank every one of
them. I also thank my friends (the Oromia Bank IT Infra Team), who supported me in data
collection.
Table of Contents
DECLARATION ......................................................................................................................................... III
DEDICATED............................................................................................................................................... IV
ACKNOWLEDGMENT............................................................................................................................... V
LIST OF TABLES ....................................................................................................................................... IX
LIST OF FIGURES........................................................................................................................................ X
ABBREVIATIONS LIST ............................................................................................................................ XI
Abstract ....................................................................................................................................................... XII
CHAPTER ONE ........................................................................................................................................... 1
INTRODUCTION .................................................................................................................................... 1
1.1 Background of Study .................................................................................................................... 1
1.2 Motivation ..................................................................................................................................... 3
1.3 Statement of the Problem .................................................................................................................... 4
1.4 Research question ............................................................................................................................... 5
1.5 Objective of the Study ....................................................................................................................... 5
1.5.1 General Objective of the Study .................................................................................................... 5
1.5.2 Specific Objective of the Study ................................................................................................... 5
1.6 Scope and limitation of the Study ...................................................................................................... 6
1.6.1 Scope ............................................................................................................................................ 6
1.6.2 Limitation ..................................................................................................................................... 6
1.7 Methodology ....................................................................................................................................... 6
1.7.1. Research Design.......................................................................................................................... 7
1.7.2. Literature Review........................................................................................................................ 7
1.7.3. Data collection ............................................................................................................................ 7
1.7.4. Implementation Tool ................................................................................................................... 7
1.7.5. Evaluation ................................................................................................................................... 8
1.8 Significance of the Study .................................................................................................................. 10
1.9 Organization of the paper................................................................................................................. 10
CHAPTER TWO ....................................................................................................................................... 12
LITERATURE REVIEW ............................................................................................................................ 12
2.1 Overview ........................................................................................................................................... 12
2.2 Question and Answering ................................................................................................................... 12
2.3 Question and Answering System Content ........................................................................................ 13
2.3.1 Question Processing Content ..................................................................................................... 14
2.3.2 Document Processing .............................................................................................................. 15
2.3.3 Answer Processing Module ....................................................................................................... 16
2.4 Speech Recognition .......................................................................................................................... 17
2.4.1 Automatic speech recognition approach .................................................................................... 18
2.4.2 Hidden Markov Model ............................................................................................................... 19
2.5 Related Work .................................................................................................................................... 21
2.5.1 Afaan Oromo and Other Language Text Based QAS ................................................................ 21
2.5.2 Speech Based Question and Answer ................................................................................... 22
2.5 Summary analysis and Gaps from Related works............................................................................. 24
CHAPTER THREE .................................................................................................................................... 26
AFAAN OROMO LANGUAGE ................................................................................................................ 26
3.1 Overview ........................................................................................................................................... 26
3.2 Basics of Afaan Oromo Language ........................................................................................................ 26
3.3 Afaan Oromo Writing System and Phonetics ................................................................................... 27
3.3.1 Afaan Oromo Writing Systems .................................................................................................. 27
3.3.2 Afaan Oromo Phonetics ............................................................................................................. 30
3.4 Script and Orthography ..................................................................................................................... 31
3.5 Morphology................................................................................................................................. 32
3.6 Summary ........................................................................................................................................... 34
CHAPTER FOUR....................................................................................................................................... 35
SYSTEM DESIGN AND IMPLEMENTATION ....................................................................................... 35
4.1. Overview .............................................................................................................................................. 35
4.2. System Architecture ......................................................................................................................... 35
4.3 Afaan Oromo speech recognition ..................................................................................................... 37
4.3.1 Speech Capturing ................................................................................................................ 38
4.3.2 Feature Extraction ...................................................................................................................... 38
4.3.3 Acoustic Model .......................................................................................................................... 39
4.3.4 Language Model ........................................................................................................................ 42
4.3.5 Lexical Model. ........................................................................................................................... 43
4.3.6 Decoder ...................................................................................................................................... 43
4.3.7 Output Text ................................................................................................................................ 43
4.3.8 Recognize Question ................................................................................................................... 44
4.4 Question and Answering ................................................................................................................... 45
4.4.1 Question Analysis ...................................................................................................................... 46
4.4.2 Question Classification .............................................................................................................. 46
4.4.3 Morphological Analysis ............................................................................................................. 49
4.4.4 Query Generation ....................................................................................................................... 49
4.4.5 Document Pre-processing .......................................................................................................... 50
4.5 software tools used for development ................................................................................................ 52
4.5.1 System interface ......................................................................................................................... 54
CHAPTER FIVE ........................................................................................................................................ 56
EXPERIMENT RESULT AND DISCUSSION ......................................................................................... 56
5.1 OVERVIEW ......................................................................................................................................... 56
5.2 Experimentation Environment .......................................................................................................... 56
5.3 Speech Recognition System Prototype Components ........................................................................ 57
5.3.1 Sound Recognition ..................................................................................................................... 57
5.4 Experimentation and Evaluation for Speech Recognition ................................................................ 58
5.4.1 Performance ............................................................................................................................... 58
5.4.2 Our Evaluation and Experiments ............................................................................................... 59
5.4.3 Experiments ............................................................................................................................... 60
5.5 Experimentation and Evaluation for factoid Question answering System ........................................ 61
5.5.1 Question Classification Evaluation ............................................................................................ 61
5.5.2 Question and Answering Evaluation.......................................................................................... 62
5.6 Discussion and Challenges................................................................................................................ 66
CHAPTER SIX ........................................................................................................................................... 68
RECOMMENDATION AND CONCLUSION .......................................................................................... 68
6.1 Conclusion ........................................................................................................................................ 68
6.2 Recommendation .............................................................................................................................. 69
References ............................................................................................................................................... 70
Appendix I .............................................................................................................................................. 75
Appendix II ............................................................................................................................................. 76
Appendix III ............................................................................................................................................ 76
Annex ...................................................................................................................................................... 81
LIST OF TABLES
Table 2. 1 Related Work analysis and Gaps identified .................................................................................. 24
Table 3. 1 Afaan Oromo consonants (dubbifaama) (extracted from: Oromo phonology) .......................... 29
Table 3. 2 Personal pronoun in afaan Oromo [59]. ..................................................................................... 32
Table 3. 3 Afaan Oromo regular verb identification ................................................................................... 33
LIST OF FIGURES
Figure 4. 1. System architecture for Afaan Oromo speech based question and answering. ....................... 36
Figure 4. 2 parts of automatic speech recognition of afaan Oromo language ............................................ 38
Figure 4. 3 MFCC flow for feature extraction. ......................................................................................... 39
Figure 4. 4 training database other. ............................................................................................................. 41
Figure 4. 5 text to language model change steps toolkit [55] ..................................................................... 42
Figure 4. 6 Speech Recognition Output samples ........................................................................................ 44
Figure 4. 7 Question and answering overall Architecture........................................................................... 45
Figure 4. 8 Rule-based QAS Classification Algorithm to determine question types ................................ 49
Figure 4. 9 Question and answering user interface templates ..................................................................... 55
ABBREVIATIONS LIST
ASR Automatic Speech Recognition
IE Information Extraction
IR Information Retrieval
QA Question Answering
SOV Subject-Object-Verb
ABSTRACT
Question answering is an information retrieval discipline that aims to extract accurate answers
to a given question from a massive collection of documents. Our research therefore concentrated
on developing an interactive model: an interface that integrates Afaan Oromo speech recognition
with factoid question answering. This project aims to design and build an automatic question
classification system for speech-based questions for Afaan Oromo question answering. The study
thus integrates both speech recognition and question-answering techniques. Several tools were
used to construct the system's prototype, among them Cygwin, Python, Perl, and NetBeans 8.0 for
Java coding. The study uses a large number of Afaan Oromo documents for speech training and
testing and for answer extraction in question answering. The corpus was collected from online
Afaan Oromo newspapers (such as Fana, Bariisaa, Bakkalcha and Ethiopres) and from the internet.
We used a dataset of 2,152 items for question answering to evaluate the system's quality. The
speech-based question corpus was recorded by 21 different people (13 men and 8 women, giving
1,344 training utterances in total) who can speak and read Afaan Oromo, and the system was tested
both by speakers who took part in training and by speakers who did not. Each individual read 64
questions aloud, and the question types concern places and persons. The model provided a
recognition accuracy of 80.2% with a word error rate (WER) of 19.8%. The speech recognition
system's experimental findings showed an accuracy of 78.4%. Question classification without
question answering classified the person and place question types with 98% and 96% accuracy,
respectively. With question answering, the rule-based question classification achieved 89.1%
precision, 91.6% recall and a 90.3% F-measure. Overall, speech-based questions combined with
automatic question classification for Afaan Oromo question answering achieve 71.45% accuracy.
The main challenge with this research is that it did not parse queries using synonyms. As a
result, semantic similarity using an ontology-based structure is needed to improve the
performance of the speech-based question and Afaan Oromo question answering classification
system.
Keywords: Afaan Oromo question answering, speech recognition, question classification.
CHAPTER ONE
INTRODUCTION
1.1 Background of Study
Afaan Oromo, the Oromo language, is a Cushitic language spoken by about 50 million people in
Ethiopia, Kenya, Somalia, and Egypt, and it is the 3rd largest language in Africa [1]. Nowadays a
huge volume of data is available on the web, and this data can satisfy most information needs.
Without appropriate search facilities, however, it is difficult to get the required information
from web documents. Search engines such as Google and Yahoo help users find concepts across
different documents on the web: they return a ranked list of documents that contain the concepts
the user requested, and users must then go through the returned list themselves to filter out the
content that satisfies their needs [2]. Because information retrieval systems cannot completely
comprehend users' requests, they, like most search engines, may return documents that are
irrelevant to the query. Users send a query to the search engine, and the search engine ranks and
returns content linked to the query words. Users must then manually read the returned documents
and pick out the material of interest, which takes time, and the information provided by search
engines is not tailored to the query [3]. Two mechanisms make this access easier: Information
Retrieval (IR) and Information Extraction (IE).
The IE approach uses NLP technologies to pinpoint the relevant text accurately. IE involves deep
analysis of queries (i.e., user questions) to comprehend the users' intent, as well as deep
analysis of the content (sentences or passages) to derive proper answers. Users were dissatisfied
with the search engines' performance because they could only return ranked lists of relevant
documents [4]. To offer accurate responses to questions posed in natural language, relevant
documents should instead be processed by a question answering system. Question answering (QA) is
one of the IE techniques used to extract precise answers to a specific question.
The types of questions asked in Question Answering (QA) systems directly affect the answers.
Questions are organized into different categories: factoid questions (what, which, when, who,
how), list questions (a list of facts or entities as the answer), confirmation questions (yes or
no), causal questions (why or how) and hypothetical questions (about any hypothetical event, with
no specific answer) [5, 6].
Question answering systems are classified into two categories: open-domain and closed-domain
question answering systems [6]. Open-domain QA systems are not restricted to any specific domain
and provide a short answer to a question posed in natural language. In a closed-domain QA
system, there is a restriction of domain, and the questions relate to that specific domain. A
closed-domain QA system has a limited repository of domain-specific questions and can answer
only a limited number of questions. An open-domain QA system supports questions from any domain,
with answers collected from different sources such as the internet, reporters, newspapers, and
articles. Open-domain questions can be about almost anything [7].
A question answering (QA) system consists of question processing, document processing, and
answer extraction components. The question processing module is in charge of determining the
question type, the expected answer type and the question focus, and of producing the appropriate
query to send to the document retrieval component. Rule-based question classification determines
the question type, i.e., what the question is about, and the question focus is used in this
classification. The expected answer type is linked to the question type and the focus of the
question. The most important purpose of defining the expected answer type is to let the answer
extraction module quickly extract the correct answer. In addition, the question processing
module provides an appropriate query that helps retrieve relevant documents.
The document retrieval component is in charge of locating relevant documents within a collection.
It is similar to how IR systems, such as search engines, present relevant documents to users
depending on the query they have entered. The importance of the document retrieval component is
obvious, as an irrelevant document results in an incorrect answer or no answer. Depending on the
needs and procedures of the QA system, the document retrieval component may include paragraph or
sentence retrieval. The answer extraction module, which is a critical component of question
answering systems, employs a variety of methods and strategies to extract the precise answer
[8, 9].
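The three components described above can be sketched as a minimal pipeline. This is an illustrative sketch under stated assumptions, not the implementation used in this study: the English interrogatives stand in for the Afaan Oromo rule set, and the stopword list, the two sample documents, the gazetteer, and all function names are hypothetical. Simple term-overlap scoring stands in for the document retrieval step.

```python
import re
from collections import Counter

# Hypothetical rules: English stand-ins for the interrogatives of the two
# question types (person and place) covered in this study.
QUESTION_RULES = {"who": "PERSON", "where": "PLACE"}
STOPWORDS = {"who", "where", "is", "was", "the", "a", "an", "of", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def process_question(question):
    """Question processing: expected answer type plus a keyword query."""
    tokens = tokenize(question)
    answer_type = next(
        (QUESTION_RULES[t] for t in tokens if t in QUESTION_RULES), None
    )
    query = [t for t in tokens if t not in STOPWORDS]
    return answer_type, query


def retrieve_documents(query, documents, top_k=1):
    """Document retrieval: rank documents by how often query terms occur."""
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def extract_answer(answer_type, query, passages, gazetteer):
    """Answer extraction: first entity of the expected type in a passage."""
    for passage in passages:
        for entity, entity_type in gazetteer.items():
            # Skip entities that were already named in the question itself.
            in_question = all(tok in query for tok in tokenize(entity))
            if entity_type == answer_type and entity in passage and not in_question:
                return entity
    return None  # corresponds to a "no answer" response


def answer_question(question, documents, gazetteer):
    answer_type, query = process_question(question)
    passages = retrieve_documents(query, documents)
    return extract_answer(answer_type, query, passages, gazetteer)


# Toy data, invented for illustration only.
DOCUMENTS = [
    "The African Union headquarters is in Addis Ababa.",
    "Abebe Bikila was a marathon runner.",
]
GAZETTEER = {"Addis Ababa": "PLACE", "Abebe Bikila": "PERSON"}

print(answer_question("Who was a marathon runner?", DOCUMENTS, GAZETTEER))
print(answer_question("Where is the African Union headquarters?", DOCUMENTS, GAZETTEER))
```

When no document shares a term with the query, the retrieval step returns nothing and the pipeline yields no answer, which mirrors the observation above that an irrelevant or missing document leads to an incorrect answer or no answer.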
This research focuses on a factoid question answering system integrated with speech-based
questions recognized from user input. There is thus a need to design and develop a question
answering prototype that simplifies searching these huge document collections.
1.2 Motivation
The technique known as automatic speech recognition (ASR) enables people to use their voices to
communicate with computer interfaces in a way that, in its most advanced forms, closely resembles
natural human conversation. Through the use of computational linguistics, speech-to-text software
can recognize spoken language and convert it into text; it is also called speech recognition or
computer speech recognition. Question answering, the task of finding succinct, specific responses
to users' questions, is a difficult endeavor.
We are constantly looking for information. However, information and knowledge are not the same.
We can easily get pertinent information thanks to the development of information retrieval and
web search. A specialized type of knowledge-seeking information retrieval is question answering.
Not only are we interested in finding the pertinent pages, but we are also interested in finding
answers to specific questions. NLP, IR, and rule-based model representation all come together in
question and answering.
Today, some NLP applications have been built for Afaan Oromo [16]. QA technology will become
more and more crucial as it gets harder to get answers on the web using traditional search
engines. Factoid questions also make up a large portion of the real queries entered into search
engines. Instead of providing a short answer to the query, current search engines return links
and full-length documents, and the user must spend time identifying the documents in which the
answer occurs. NLP is therefore the technology that motivates a solution for analyzing the
documents and selecting a short, precise answer to a factoid question; handling language for a
variety of tasks requires higher levels of analysis.
The art and science of question-answering systems coexist in their very nature. There is a global
need for question-answering systems. The demand for technological aid is present in every area of
life. As a result, it is worthwhile to investigate the fascinating topic of question answering.
1.3 Statement of the Problem
Oromo is the language of the Oromo people, spoken in Northeast Africa, primarily in Ethiopia and
Kenya, as well as in parts of Somalia and Egypt. As a macrolanguage, Oromo is estimated to be
spoken by as many as 50,000,000 people [12]. Oromo is the 3rd most widely spoken African language
after Arabic, Hausa and Swahili. It belongs to the Cushitic branch of the Afro-Asiatic language
family and is also used as a lingua franca by non-Oromo groups in Ethiopia, Kenya and Somalia
[10]. Numerous publications, including newspapers, magazines, educational materials, government
documents, and religious texts, have been produced in the language, and this information can be
accessed electronically both online and offline [11]. As a result, each question must be
processed carefully, with QAS methods suited to the language, in order to provide the
appropriate answer.
Since the conception of QAS, numerous studies have been conducted in numerous languages with
good results. In our country, some research has been carried out on local languages such as
Amharic, Tigrigna, Hadiy, etc. Speech-based question answering has also been done for Amharic
documents. Afaan Oromo, however, is a Cushitic language written in a Latin-based script called
“Qubee”, and it differs from these languages in its words, sentences, questions, answers and
pronunciation [1, 11]. The previously developed systems therefore do not work for Afaan Oromo,
because the languages differ morphologically. Some research has been done on question answering
for Afaan Oromo and other local languages.
Much research has been done in various languages to address question answering, including the
Afaan Oromo Question Answering System for Factoid Questions [14], the Definition Question
Answering System for Afaan Oromo Language by Dejene Hundesa [2], the Afaan Oromo List, Definition
and Description Question Answering System by Chaltu Fita [15, 16], and Amharic Question
Answering for Factoid Questions [17, 18]. The systems listed above take text questions and
provide answers in text form.
The number of Afaan Oromo documents produced electronically is increasing quickly in our
country, as journal, newspaper, and research publishers begin disseminating their products
online. Asking questions is part of human nature, and the answers to many of these questions can
be found in such documents. Consequently, there is a growing need for a system that can take
questions, look up answers in the knowledge base, and give a straightforward answer; without
one, users waste time identifying the relevant documents.
It is therefore very important to construct a speech-based question and question answering
classification system that accepts a spoken question through speech recognition technology and
passes it to question answering. The QAS analyses and classifies the question, then ranks and
retrieves the answer, to close the gaps described above.
In order to accomplish this, the following research questions are investigated and answered in this
study:
To review literature to understand the state of the art in the area of QAS and the Afaan Oromo
language, besides identifying the research gaps.
To collect and prepare a representative dataset for Afaan Oromo speech recognition and
QAS.
To design the general architecture of Afaan Oromo speech-based questions for QAS.
To examine and identify the study's component methodologies, strategies, and tools, such as
speech recognition and question classification for the question answering system.
To develop prototypes for speech recognition and question answering classification for the
Afaan Oromo language.
To integrate speech-based questions with question answering to provide the correct answer
for question answering.
To report the findings of the study for future research.
To evaluate the system's performance.
1.6.2 Limitation
In this research on speech-based questions and QAS for Afaan Oromo, the main challenge is that a
large-scale Afaan Oromo document corpus is a critical component of the research. The system does
not return a correct answer for every question asked. The main problem is the structure matching
technique; in addition, when questions are unclear, and because WordNet-compatible semantic
similarity is absent, the system may report no answer or retrieve a false answer. The other
limitation of the research is that question classification focuses only on person and place,
because collecting a large corpus is difficult: the language has no standard corpora.
1.7 Methodology
Methodology refers to the broad plan and justification for a research effort. It entails
studying the theories and ideas that underpin the procedures employed in a field in order to
create a strategy that is in line with the research goals. A variety of approaches have been
used to achieve the study's general and specific objectives.
1.7.1. Research Design
This study used the design science research methodology. It integrates design and development
and is commonly used in fields such as software engineering, computer science, information
science, and information technology. It is a group of research approaches that employ design and
development to understand underlying processes.
tools, because it is reliable and platform independent. Lucene libraries were additionally used
for answering questions.
Cygwin, PocketSphinx, SphinxTrain, and Sphinx4 were the tools used for the training phase of the
speech recognition system. Most of these tools offer a straightforward process for creating the
speech-to-text (STT) component of this system. The rule-based model is built, and questions are
classified automatically, using Java programming language libraries. WaveSurfer was also employed
as a speech analysis tool to examine the speech and identify its formants and other pertinent data
from the speech file. Using the above tools, the researcher designed and built a prototype with
three distinct components that are combined using Java NetBeans 8.0 on a Windows 11 computer.
1.7.5. Evaluation
The system evaluation process has two general steps: system performance evaluation first, and
testing second. Question data prepared for the person and place types are used for question
classification in the training part, with 20 questions for each type. Additionally, the test dataset
included 20 question sentences spoken by two speakers. During the recognition phase, two users
verified the system's naturalness and understandability as well as user acceptance.
The Afaan Oromo speech recognition system and the Afaan Oromo question answering system are
tested independently to evaluate the accuracy of each. Speech recognition is analysed with the
Sphinx tools to determine whether words or sentences are matched accurately. Precision, recall,
and F-measure are used to evaluate the accuracy of the question answering model. In the final
stage, the two components are tested and evaluated together as one system.
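$$\text{Precision} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Positive (FP)}}$$

$$\text{Recall} = \frac{\text{Correctly retrieved answers}}{\text{Total correct} + \text{Not displayed}}$$
Or
$$\text{Recall} = \frac{\text{True Positive (TP)}}{\text{True Positive (TP)} + \text{False Negative (FN)}}$$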
Where TP stands for true positive, FP for false positive and FN for false negative.
The F-measure (or F-score) is the harmonic mean of precision and recall and indicates a system's
accuracy. It reaches its best value at 1 and its worst at 0: it is 0 if no relevant answers have
been retrieved, and it is 1 if exactly the correct answers have been retrieved.
$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
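As an illustration of how precision, recall, and F-measure follow from the TP, FP, and FN counts, the short Python sketch below computes all three. The function name and the example counts are hypothetical and are not taken from the experiments in this study.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts.

    tp: correct answers that were retrieved
    fp: wrong answers that were retrieved
    fn: correct answers that were not retrieved
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, used only to exercise the formulas.
p, r, f = precision_recall_f(tp=15, fp=5, fn=3)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

The guards against a zero denominator are a design choice of this sketch; they return 0 when nothing is retrieved or nothing is relevant.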
A popular metric for evaluating speech recognition and machine translation systems is the word
error rate (WER). Assessing performance is generally difficult because the recognized word
sequence can differ in length from the reference word sequence, which is taken to be the correct
one. The WER operates at the word level rather than the phoneme level. To handle the length
mismatch, dynamic string alignment is first used to align the recognized word sequence with the
reference (spoken) word sequence.
$$WER = \frac{S + D + I}{N}$$
Where:
S is the number of substitutions, D is the number of deletions, and I is the number of insertions
N is the number of words in the reference
A speech recognition system's performance can also be described by the word recognition rate,
which is based on the number of correctly recognized words, N - (S + D).
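The alignment step behind WER can be sketched with a standard word-level edit distance. The Python code below is a minimal illustration, not the Sphinx evaluation tool used in this study; the function name and the placeholder word strings are assumptions made for the example.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, computed by dynamic string alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + substitution,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


# Placeholder tokens: one substitution and one deletion against four words.
print(word_error_rate("w1 w2 w3 w4", "w1 x3 w4"))
```

Because insertions are counted, the WER can exceed 1 when the recognizer outputs many more words than the reference contains.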
The study offers a solution for users who find typing difficult and who prefer Afaan Oromo speech
as a means of accessing Afaan Oromo documents. Furthermore, the study is significant because it
may be applied to a variety of business applications that use speech to communicate with machines.
The outcomes of this study may aid the understanding of speech recognition components and
question classification approaches in question answering systems, all of which are critical for
system design.
The study's findings will be useful to a variety of government, multilateral, and bilateral
development partners. Typing takes far longer than talking: a document can be dictated
three times faster than it can be typed. Dictation and transcription software lowers
transcription costs and gives a much simpler workflow. Speech recognition software can be used by
any industry, for example for educational purposes and in developing future research.
Chapter two examines the literature in order to gain a thorough grasp of the subject and to
identify gaps. Information retrieval and information extraction are compared and contrasted.
The chapter also discusses automatic speech recognition, speech synthesis, and the Afaan Oromo
language in relation to question answering systems, and it explains what question answering
entails, with specifics on factoid questions.
Chapter three discusses the basics of Afaan Oromo, the components of the proposed architecture
for a speech-based Afaan Oromo question answering system, and their interaction. Chapter four
covers the system's detailed design (implementation prototype) and goes through the algorithms
used to achieve the system's goals for each component. Chapter five deals with the tests performed
on each component, the results obtained, and explanations of how those results arise. Finally,
conclusions and research recommendations are given in chapter six.
CHAPTER TWO
LITERATURE REVIEW
2.1 Overview
This chapter reviews the literature and related works in order to identify their gaps and the
models they used. It starts with question answering and its components: question processing,
information retrieval, and answer extraction. The process of speech recognition, from input
utterance to text output, is also discussed. Finally, the gaps in the reviewed literature are
identified.
Open-domain systems handle questions about almost anything and can only rely on
general ontologies and world knowledge. On the other hand, these systems usually have much
more data available from which to extract the answer.
BASEBALL and LUNAR were two of the earliest question answering systems [9],[22]. BASEBALL
answered questions about Major League Baseball over the course of one year. LUNAR, in turn,
answered questions about the geological analysis of the rocks that the Apollo lunar missions
brought back. Both question answering systems were quite effective in their respective domains.
A QA system attempts to deal with a wide range of question types, including definition, list, fact,
how, why, where, hypothetical, semantically constrained, and cross-lingual questions [23]. This
research focuses on factoid questions and answers and attempts to extend work in that area.
Question processing establishes the question's focus, classifies the question type, determines
the expected answer type, and reformulates the question into a number of semantically
equivalent questions. Reformulating a question into other questions with similar meanings, often
known as question expansion, increases the recall of the information retrieval system.
Information retrieval (IR) retrieves the most relevant documents, which are then put through
passage filtering to extract passages that may contain answer strings. IR recall is critical: if no
correct answer is present in the retrieved documents, no further processing can find an answer
[9]. The precision and ranking of candidate passages or sentences can also affect question
answering performance in the IR phase.
Answer extraction is the QAS component that sets such systems apart from text retrieval systems
in the conventional sense. This module is responsible for identifying candidate answers,
extracting the precise answer, and then validating it. The technology used in answer extraction
increasingly influences and determines the outcomes of question answering systems.
2.3.1 Question Processing Content
The question processing module processes and analyses the question and creates a
representation of the information requested. Creating this representation requires the
question processing module to determine the question type, the expected answer type, and the
question focus [9], [24].
Question type identification is the process of classifying the question. The type to which a
question belongs determines how the data are processed to yield the expected answer for that type
of question. Because it offers important guidance about the nature of the required answer,
the question type classification component is a helpful, though not essential, component of a
QA system. Since pattern matching techniques are used, the question is first categorized
according to its type: what, why, who, how, when, or where.
In previous years, question classification was done using SVM and rule-based approaches.
The rule-based approach was reported to be too specific for users, which made it difficult to
achieve its purpose [25]. Rule-based question classification offers flexibility compared with SVM
question classification, but it needs hand-written rules, whereas a learned model does not need
hard-coded rules to handle new cases and can be maintained automatically [9], [26]. If
the desired keyword is not found, pattern matching in a rule-based system can no
longer predict the expected answer type.
Rule based Approach
Question classification is the process of assigning questions to different categories.
The list of potential classes is specified in advance and can be narrowed down to a few basic sets
by looking at the keywords that the researcher defines in the rules. The overall effectiveness of
a question answering system depends heavily on how accurately questions are classified. As a
result, most systems turn to a more in-depth analysis of the question, which establishes
additional constraints on the answer entity.
There are numerous techniques for implementing question classification. The simplest approach
uses a set of rules that map patterns of questions to question types. The patterns are expressed
as regular expressions over the surface form. The answer type is often determined by analysing
the interrogative (wh-) terms of the question. This method is simple to apply, so it is used for
question classification in this study.
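A minimal sketch of such rule-based classification is given below in Python. It is an illustration only, not the Java implementation of this study: the two rules cover the person and place classes named in the limitations, and the interrogative keywords (eenyu for "who", eessa for "where") and the example questions are assumptions chosen for the example.

```python
import re

# Illustrative rules only. Each rule maps a surface pattern to a question
# class; the interrogative keywords are assumptions, not the thesis rules.
RULES = [
    ("PERSON", re.compile(r"\beenyu\w*", re.IGNORECASE)),
    ("PLACE", re.compile(r"\beessa\w*", re.IGNORECASE)),
]


def classify_question(question):
    """Return the class of the first rule whose pattern matches."""
    for label, pattern in RULES:
        if pattern.search(question):
            return label
    # No keyword found, so the expected answer type cannot be predicted.
    return "UNKNOWN"


print(classify_question("Barreessaan kitaaba kanaa eenyu?"))
print(classify_question("Finfinneen eessatti argamti?"))
```

The UNKNOWN fallback mirrors the weakness of pattern matching: when no rule keyword appears in the question, no answer type is returned.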
A support vector machine (SVM) is a supervised machine learning technique that can also be used
to classify and predict question types. It is used largely for classification problems in machine
learning. SVM chooses the extreme points (vectors) that help in creating the
hyperplane [29]. These extreme cases are called support vectors, and hence the algorithm is termed
a support vector machine. Consider the diagram below, in which two different
categories are separated by a decision boundary, or hyperplane [30].
The module creates a list of keywords to be provided to the document processing module's
information retrieval component after determining the "focus" and "question type." Standard
methods like named-entity recognition, stop-word lists, and part-of-speech taggers, among others,
could be used to extract keywords [9].
Information Retrieval (IR)
The purpose of the information retrieval system is to return relevant and correct results in
response to a question that the user submits. QA systems differ from conventional IR systems,
which gauge document and query similarity using the cosine vector space model. This is primarily
because QA systems often retrieve a document only when it includes all of the keywords, which in
turn is possible because the question processing module selects and reformulates the keywords
thoroughly [8].
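The difference from similarity-based ranking can be illustrated with a conjunctive (boolean AND) filter that keeps only the documents containing every keyword. The Python sketch below uses whitespace tokenization and placeholder documents, both of which are simplifying assumptions; it does not reproduce the Lucene-based retrieval used in this study.

```python
def retrieve_all_keywords(documents, keywords):
    """Return the ids of documents that contain every keyword."""
    wanted = {keyword.lower() for keyword in keywords}
    hits = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        if wanted <= tokens:  # every keyword occurs in the document
            hits.append(doc_id)
    return hits


# Placeholder documents and keywords, for illustration only.
docs = {
    "d1": "alpha beta gamma",
    "d2": "alpha delta",
    "d3": "beta alpha",
}
print(retrieve_all_keywords(docs, ["alpha", "beta"]))
```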
Paragraph filtering can reduce both the number of candidate documents and the amount of
candidate text in each document. The idea behind paragraph filtering is that the most pertinent
documents should have the keywords of the search query concentrated in a few nearby
paragraphs rather than spread throughout the entire document. The aim of paragraph ordering is
to rank the paragraphs according to how likely they are to contain the correct answer [8],[9].
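Paragraph filtering and ordering can be sketched as a simple keyword-count heuristic. In the Python example below, the scoring rule (distinct keywords matched first, total keyword occurrences second) and the `min_hits` threshold are assumptions made for illustration; the cited systems may rank paragraphs differently.

```python
def rank_paragraphs(paragraphs, keywords, min_hits=1):
    """Drop paragraphs with too few keywords, then rank the rest."""
    wanted = {keyword.lower() for keyword in keywords}
    scored = []
    for paragraph in paragraphs:
        tokens = paragraph.lower().split()
        distinct = len(wanted & set(tokens))              # different keywords present
        total = sum(1 for token in tokens if token in wanted)  # all occurrences
        if distinct >= min_hits:                          # paragraph filtering
            scored.append((distinct, total, paragraph))
    # Paragraph ordering: more distinct keywords first, then more occurrences.
    # The sort is stable, so ties keep their original document order.
    scored.sort(key=lambda item: (-item[0], -item[1]))
    return [paragraph for _, _, paragraph in scored]


paragraphs = ["alpha beta alpha", "gamma delta", "beta only here", "alpha beta"]
print(rank_paragraphs(paragraphs, ["alpha", "beta"]))
```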
The identification of the answer depends heavily on the answer type that was established during
question processing. Because the answer type (e.g. person, organization, place, date) is typically
not stated explicitly in either the question or the answer, a parser is needed to identify named
entities.
In answer extraction, the parser makes it possible to recognize answer candidates in the
paragraphs. Once an answer candidate has been found, a set of heuristics is used to extract only
the pertinent word or phrase that answers the question. Extraction may be based on measures of
the distance between keywords, the number of keywords matched, and other comparable heuristic
metrics. QA systems typically fall back to returning the best-ranked paragraph
if no match is found [6, 8].
Answer validation is the part of the question answering system that chooses reliable answers
among the answer candidates obtained using particular techniques. Answer validation can be broken
down into two steps: the first gathers potential answers, and the second validates each of them.
Collecting answer candidates in the first step has been thoroughly researched. Its typical
technology is as follows. First, the answer type of the question, such as PLACE or PERSON, is
determined. Then, using queries created from the question, the documents that might contain
answer candidates are retrieved from the available document collection. Lastly, answer candidates
are gathered from the retrieved documents using named entities that fit the question's answer
type [9].
Speech recognition involves several components and terms that are needed for ASR to work
correctly, including utterance, speaker model, pronunciation, grammar, and vocabulary [31].
Utterance
An utterance is the vocalization of a word or words by a speaker. It may be a single word, several
words, a phrase, a sentence, or even several paragraphs. The speech engine receives utterances to
process. If the user remains silent, the engine issues what is known as a silence timeout, which
informs the application that no speech was detected during the expected time limit, so that the
application can take the proper action, such as asking the user for input again [31]. There are
different types of utterances [9]:
Isolated: an artificial pause is inserted before and after each spoken word.
Continuous: the user speaks normally and continuously.
Spontaneous: natural speech characterized by variable speaking rate, filled pauses, corrections,
and repetitions, as distinct from carefully read speech [9].
Speaker model
Speaker models in automatic speech recognition are either speaker dependent or speaker
independent. A speaker-dependent system recognizes the speech of the speakers whose speech was
used for training during development. A speaker-independent speech recognition system accepts
speech from any speaker and recognizes it; it does not depend on speaker characteristics such as
age and sex.
Pronunciation
To translate spoken input into text, the speech recognition engine employs a variety of data,
statistical models, and algorithms. The pronunciation of a word is one piece of data that the voice
recognition engine needs to process it. This data indicates how the speech recognition engine
believes a word should sound. In order to help the speech recognizer understand the continuous
Afaan Oromo speech, we must develop a pronunciation dictionary [31].
Grammar
A grammar defines the words and expressions that users can use to communicate with the
application. The speech recognition engine is given these definitions and uses them during
the recognition process. A grammar specifies what the engine can identify using a specific
syntax, or set of rules. It can be as straightforward as a list of words or phrases, or it can be
flexible enough to allow so much variation in what can be said that it comes close to natural
language [31].
Acoustic phonetic
The acoustic-phonetic approach is based on the theory of acoustic phonetics, which holds that
spoken language is composed of discrete, distinct phonetic units, and that these units are broadly
characterized by a set of characteristics that become apparent over time in the speech signal, or its
spectrum. It is assumed that the rules governing the variability are simple and can be easily
learned and applied in real-world situations, despite the fact that the acoustic properties of
phonetic units are highly variable, both across speakers and across neighbouring phonetic units
(the so-called co-articulation of sounds) [57].
This is the preferred speech synthesis method because it is "the product of many studies in
acoustic phonetics, coupled with principles of phonology". Assuming that the language under
consideration has an adequate description of allophonic rules and phonotactic constraints, the
researcher can move directly to feature extraction, although a few questions must be answered
first [32].
Pattern recognition
Pattern training and pattern comparison are the two key steps in the pattern-matching approach.
In the pattern-comparison stage, the unknown speech is compared directly with each pattern
learnt during the training stage, and the unknown speech is identified according to how well the
patterns match.
The essential feature of this approach is that it uses a well-formulated mathematical framework
and establishes consistent speech pattern representations, for reliable pattern comparison, from a
set of labeled training samples via a formal training algorithm. A speech pattern representation can
take the form of a speech template or a stochastic model (e.g., a hidden Markov model) and can
be applied to a sound smaller than a word, a word, or a phrase. The pattern-matching approach has
been the predominant method for speech recognition over the last six decades [33].
Stochastic models are a more suitable approach to speech recognition because they use
probabilistic models to deal with uncertain or incomplete information [34]. Many methods are
listed under this approach, such as HMM, SVM, and DTW. Among these, the hidden Markov model is
the most popular stochastic approach today, and it is applied in this study [9][39].
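To make the idea of pattern comparison concrete, the Python sketch below implements dynamic time warping (DTW), one of the methods named above, on one-dimensional sequences. It is shown only because it is compact; it is not the HMM-based recognizer built in this study. Real systems compare sequences of acoustic feature vectors, so the scalar values and template names here are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical stored templates and an unknown input of a different length.
templates = {"A": [1, 2, 3], "B": [3, 2, 1]}
unknown = [1, 2, 2, 3]
best = min(templates, key=lambda name: dtw_distance(templates[name], unknown))
print(best)
```

The unknown input is assigned to the template with the smallest warped distance, which is the pattern-comparison step described in the text.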
Artificial intelligence
The artificial intelligence approach combines the acoustic-phonetic approach and the pattern
recognition approach, making use of the principles and ideas of both.
parameters are temporal variabilities, while the output distribution model parameters are spectral
variabilities. An HMM combines a finite-state Markov model with a collection of output
distributions. These two types of variability are essential for speech recognition. Hidden Markov
modeling is more general and has a firmer mathematical foundation than the template-based
approach. Compared with the knowledge-based approach, HMM enables easy incorporation of knowledge
sources into an organized architecture [37]. HMMs can estimate their parameters automatically
from a large amount of data, and they are simple and computationally feasible. For these reasons,
this model is used to develop an Afaan Oromo speech recognizer prototype that converts a spoken
question to text in order to retrieve data from large document collections.
HMMs are becoming more and more popular because of their capacity to estimate parameters
automatically from large amounts of data, as well as their simplicity and computational
feasibility. In this study, an HMM was used to build the Afaan Oromo speech recognition
prototype, which converts spoken questions into text and searches for documents that contain
answers. A Markov chain is useful when we need to compute a probability for a sequence of
observable events. In many cases, however, the events we are most interested in are hidden: we do
not observe them directly. For instance, we do not normally observe part-of-speech tags in a
text. Instead, we only see words, and we must infer the tags from the word sequence. The tags are
called hidden because they are not observed. An HMM is defined by the following components:
$Q = q_1 q_2 \ldots q_N$
A set of $N$ states.
$A = a_{11} \ldots a_{ij} \ldots a_{NN}$
A transition probability matrix $A$, where each entry $a_{ij}$ represents the probability
of moving from state $i$ to state $j$, such that $\sum_{j=1}^{N} a_{ij} = 1 \ \forall i$.
$O = o_1 o_2 \ldots o_T$
A sequence of $T$ observations, each one drawn from the vocabulary
$V = v_1, v_2, \ldots, v_V$.
$B = b_i(o_t)$
A sequence of observation likelihoods, also known as emission probabilities, each
expressing the probability that observation $o_t$ was generated from state $i$.
$\pi = \pi_1, \pi_2, \ldots, \pi_N$
An initial probability distribution over states. $\pi_i$ is the probability that the Markov
chain will start in state $i$. Some states $j$ may have $\pi_j = 0$, meaning that they cannot
be initial states. Moreover, $\sum_{i=1}^{N} \pi_i = 1$.
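To show how these components are used together, the Python sketch below computes the probability of an observation sequence with the forward algorithm. The two-state model and all probability values are hypothetical toy numbers, and observations are encoded as integer indices into the vocabulary; the sketch is not the Sphinx training or decoding procedure used in this study.

```python
def forward_probability(pi, A, B, observations):
    """P(O | model) for a discrete HMM, via the forward algorithm.

    pi[i]   : probability that the chain starts in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability that state i emits vocabulary symbol k
    observations : non-empty list of symbol indices o_1 ... o_T
    """
    n_states = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    # Termination: sum over all states at the final time step
    return sum(alpha)


# Toy two-state model with a two-symbol vocabulary (hypothetical values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(forward_probability(pi, A, B, [0, 1]))
```

Each row of A and B and the vector pi sum to 1, matching the constraints in the definition above.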
Research on definition, list, and description question types for non-factoid questions in the
Afaan Oromo language was done by [16]. The study's objective is to suggest solutions to
significant issues in Afaan Oromo non-factoid QA, particularly for list, definition, and
description questions. The proposed QA system includes question analysis, document analysis,
document preparation, and answer extraction components. Questions are classified using
rule-based methods. The document analysis component retrieves relevant documents and applies
filtering patterns to the retrieved documents. The researcher used F-score, precision, and recall
to evaluate the performance of the system. In the evaluation, 98.3% of questions were classified
correctly. The F-score on the stemmed documents is 0.729, and on the other data set it is 0.764.
Moreover, the average F-score of the answer extraction component is 0.592. A dataset of 2,700
question-answer pairs was prepared to evaluate the system.
Afaan Oromo factoid question answering has also been attempted [48]. The goal of that work was
to find fact-based answers for users. The electronic Afaan Oromo documents in the data set were
collected from Oromia Radio and Television Agency, Fana's Afaan Oromo service, VOA online,
periodicals published in the language such as Barisa and Kallacha, and the Oromia culture and
tourism bureaus. The question analysis module identifies answer types, the IR module extracts
candidate passages from documents, and the answer extraction module extracts candidate answers.
The researcher also employed synonyms to broaden the search. Rule-based patterns were employed
to determine the answer types. According to the researcher, the answer type was identified
correctly using patterns in 92.2% of cases. Additionally, the researcher reported that the system
achieved 0.83 recall, 0.71 precision, and an F-measure of 0.78. According to the researcher, the
results were encouraging, and the use of synonyms and phrase-based indexing further enhanced the
system's performance.
Another study addressed an Amharic question answering system [17]. It proposed a technique to
determine the question types, the possible question focuses, and the expected answer types based
on language-specific data, and to build appropriate information retrieval queries. Document
retrieval was studied at three levels of granularity: sentence, paragraph, and file. The
named-entity and pattern-based answer pinpointing algorithms help locate possible answer
particles in a document. The rule-based question classification module classifies approximately
89% of the questions correctly. The document retrieval component shows greater coverage of
relevant document retrieval (97%), while sentence-based retrieval has the least (93%), which
contributes to the better recall of the system. The gazetteer-based answer selection method,
using a paragraph answer selection strategy, answers 72% of the questions correctly,
which is encouraging. The file-based answer selection technique exhibits better recall (91%),
which indicates that most relevant documents thought to contain the correct answer are returned.
The primary related study designed and built automatic question classification for speech-based
Amharic question answering [9]. The system combines speech recognition, question answering, and
speech synthesis. Speech synthesis is done using unit selection methods, while question
classification is done using SVM. The study used 22,600 news items from various online news
sources, including Ethiopian News Agency and Ethiopian Reporter, prepared for training and
testing purposes. For the speech corpus, 84 question sentences were read by 24 participants
(9 women and 15 men), giving a corpus of 2,016 spoken question sentences. The questions are
numeric and person related. The experimental findings for the speech recognition system showed
85.58% accuracy. The speech synthesizer pronounced 80.86% accurately, with scores of 3.17 and
3.45 for intelligibility and naturalness, respectively, based on MOS. In addition, the SVM
question classification achieved 82.92% F-measure, 73.91% precision, and 94.44% recall. Overall,
the speech-based Amharic question answering system achieves 72.75% [9].
In addition, [49] proposed a prototype speech-based Amharic question answering system for
open-domain factoid questions. It used the Sphinx tool for speech recognition, the Lucene tool
for question answering, and NetBeans to combine the two tasks. The experimental results show
that the continuous Amharic speech recognizer developed for the question corpus registered 4.5%
in development testing and 84.93% recognition performance on live speech input. Question
answering achieved 76% average precision in finding the right answers. After integrating speech
recognition and question answering, the speech-based question answering system registered 75%
average precision in retrieving the correct answer to a given question. In [49], speaker
independence was a challenge because few speakers were used to train the speech recognizer.
Additionally, manual question classification was used to determine the question types.
The authors of [50] noted that mobile devices are becoming the predominant means of information
access, even though it is difficult to type text on small keyboards and to browse web pages on
small screens. They studied Qme, a speech-based question answering system that retrieves
answers to questions rather than web pages. They highlighted the benefits of tightly integrating
the speech recognition and retrieval components of the system and presented bootstrapping
strategies to differentiate between dynamic and static requests.
The authors of [56] built an open-domain QA system with a voice interface. The first prototype
(SpeechQoogle) is constructed from three modules: ASR, question answering, and
speech synthesis. A total of 600,000 QA pairs were collected. The acoustic model and
language model are built specifically for the speech recognition module, which raises character
accuracy (ACC) to 87.17%. Finally, in open-set testing, the integrated prototype correctly
answered 56.25% of spoken questions.
| Author | Approach | Gaps |
|---|---|---|
| Bekele Mengesha.H...[9] | Used the Sphinx4 library for speech recognition and an SVM model for question classification and question answering. | Used an SVM model for question classification. SVM does not classify questions clearly because it is not accurate with pattern matching algorithms, so this study uses rule-based question classification, which is a good classifier for Latin letters. The SVM method needs more accurate training on questions. |
| Belisty M….[49] | Used Sphinx and SVM question classification and question answering. | Recognition is not accurate for speaker-independent input because the number of training speakers is small. Used manual question classification for identifying question types. Also used SVM question classification, which is not accurate; the method needs more accurate training on questions. |
The table above summarizes works carried out in Ethiopia, showing the tasks they addressed and
the models they used, in order to identify their gaps.
Research gap: the amount of electronically created Afaan Oromo data in Ethiopia is rising at
an increasing rate as studies, historical records, fiction, magazines, and many newspaper
publishers make their works available online. People also ask questions, since it is in our
fundamental nature to inquire and to know things, and those who ask need answers from the data or
documents quickly rather than after a long search. This increases the need for a
system that can accept information needs in question form, search the knowledge base for relevant
information, and provide a direct answer to the question.
CHAPTER THREE
3.1 Overview
Understanding the structure of the language is necessary and aids in the creation of the proposed
prototype, so the fundamental structure of Afaan Oromo is presented in this chapter. The chapter
discusses the nature of the language, including how widely it is spoken, its areas of use
(such as newspapers, offices of the Oromia regional state, research publications, and higher
educational institutions), how words are formed, its morphological makeup, and other
aspects of the language that are particularly important for this thesis.
Afaan Oromo is the official language of the Oromia Regional State in Ethiopia and is also spoken
outside Ethiopia, in Kenya and Somalia. It is also a language of learning and teaching in schools
and in teacher training colleges. Additionally, it is taught as a
major course at the BA, MA, and PhD levels in many Ethiopian universities. Afaan Oromo is also
taught in North American Minneapolis College [51]. There are also radio and
television programs in Ethiopia that provide information in Afaan Oromo, including Ethiopian
Radio, Radio Fana, Oromia Television, Ethiopian Television (ETV), and Fana Television.
Documents written in Afaan Oromo before 1991 were written in the Ge'ez script [51]. Since 1991,
the Latin-based Qubee alphabet has been used as Afaan Oromo's official script. The language has
about 26 consonants and 10 vowels.
In Afaan Oromo, there are five vowels: a, e, i, o, and u. These vowels are doubled when they are
pronounced long: 'aa', 'ee', 'ii', 'oo', 'uu'. The consonants of Afaan Oromo are largely the same
as the English consonants, but there are some unique letters, such as "ch", "dh", "sh", "ny",
"ph", "ts", "zh", and " ' " (hudhaa). "ch" and "sh" are pronounced as in English. In Afaan Oromo,
"dh" is formed similarly to the English "d", but with the tongue curled slightly back and the air
drawn in, so that a glottal stop is audible before the next vowel starts. Another Afaan Oromo
consonant, "ny", sounds similar to "gn" in English. These few unique letter combinations are
frequently used to create words. For instance, ch is used in nyaachuu "eating", sh in shan
"five", dh in dhaabachuu "stop", and ph in buuphaa "egg". In general, Afaan Oromo uses 36
letters, called "Qubee" (26 consonants and 10 vowels) [16].
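The letter inventory described above can be illustrated with a small sketch that splits a word into Qubee letters, keeping the digraphs and the doubled (long) vowels together. This is only an illustration under stated assumptions: the class name QubeeTokenizer and its method are hypothetical and not part of the thesis system, and the greedy matching would wrongly merge two separate consonants that merely happen to stand next to each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the thesis system.
final class QubeeTokenizer {
    // Digraphs listed in the text; each is treated as one letter.
    private static final Set<String> DIGRAPHS = Set.of("ch", "dh", "ny", "ph", "sh", "ts", "zh");
    private static final String VOWELS = "aeiou";

    /** Splits a word into Qubee letters, keeping digraphs and long vowels together. */
    static List<String> letters(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < w.length()) {
            if (i + 1 < w.length()) {
                String pair = w.substring(i, i + 2);
                boolean longVowel = pair.charAt(0) == pair.charAt(1)
                        && VOWELS.indexOf(pair.charAt(0)) >= 0;
                if (DIGRAPHS.contains(pair) || longVowel) {
                    out.add(pair);   // two characters, one letter
                    i += 2;
                    continue;
                }
            }
            out.add(w.substring(i, i + 1)); // single-character letter
            i++;
        }
        return out;
    }
}
```

For example, `QubeeTokenizer.letters("nyaachuu")` yields the four letters ny, aa, ch, uu.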
Grammar
The Afaan Oromo alphabet is crucial because people use it every day to write, read, and
communicate with each other. Even if you know how to write some words, without knowing the
letters you will not be able to pronounce them correctly. You will be better understood when
speaking Afaan Oromo if you pronounce each letter of a word correctly. Afaan Oromo uses a
Subject-Object-Verb (SOV) structure. Although the word order can be flexible because it is a
declined language (nouns vary depending on their function in the sentence), verbs always come
after their subjects and objects. In general, indirect objects come after direct objects. Afaan
Oromo also makes frequent use of prepositions and postpositions.
Afaan Oromo Vowels: (Dubbachiftuu)
The vowels in the Afaan Oromo language are denoted by the five letters a, e, i, o, and u. In
Afaan Oromo, each vowel is generally pronounced in the same way. When stressed, these vowels can
either be open, as in maaliif (why) and dhuugi (drink), or closed, as in muka and tahi. Every
word in Afaan Oromo is pronounced distinctly because the vowels are always pronounced sharply and
clearly, for example:
a: dhaagaa, bara, macaafa, jalqabaa, afaan
e: seensa, seena, keenya, beekuu
i: fidi, irraa, xinnoo, biyya, hoojii
o: gooftaa, tokko, yeroo, kiyyoo, boodde
u: oduu, umaa, xumura, etc.
The majority of Afaan Oromo consonants are similar to those in Italian; however, there are a few
exceptions and unique combinations.
Except for the combination consonants ny, dh, ph, and sh, all Afaan Oromo consonants are written
double when they are stressed.
In some Afaan Oromo words the stress is placed on the final syllable, for example gammoojjii,
ijoollummaa, and ilaallu. A few words, however, have their first syllable stressed.
The alphabet of Afaan Oromo doubles the letters for the five vowel sounds to represent the
typical Eastern Cushitic arrangement of five short and five long vowels. The length difference is
contrastive, for example bara 'year' and baaraa 'know'. Gemination is also significant in Afaan
Oromo: the length of a consonant can distinguish words from one another, for example bara 'year'
and barraa 'knowing' [31].
In Afaan Oromo, the tone-bearing unit is the mora rather than the vowel of the syllable. A long
vowel or diphthong consists of two morae and can bear two tones. The tone of each mora is
classified as high or low. There is only one high tone per word, and it must be on the last or
next-to-last mora.
Phonetically, there are three tones: high, low, and falling. This tone-based system can be
described as a pitch accent, similar to that found in Somali. Tone is related to stress: the high
tone has strong stress, the falling tone has less stress, and the low tone has none.
The rules for indicating tone in written Afaan Oromo are as follows:
acute accent: tone is high
grave accent: tone is low
circumflex: tone is falling
3.3.2.1 The glottal stop (‘) (Hudhaa)
In Afaan Oromo, hudhaa (') is the mark that represents the glottal stop. As in many other
languages, glottal sounds occur at certain points within the syllables of Oromo words. Hudhaa is
mostly used as a diacritical marker that alters the sound value of the letters to which it is
attached: it indicates that the vowel following it is pronounced separately from the preceding
vowel. In words with successive vowels of different types, such as ka'e, ba'ee, ta'ee, mo'a,
xaa'oo, de'uu, and du'e, hudhaa appears between the vowels. Therefore, hudhaa is required
whenever two consecutive vowels of different types occur in an Oromo word.
Double vowels are used to denote long vowels. Unless it is written as a digraph, a long
consonant is also expressed by doubling.
The glottalized and implosive stops are written as follows: ph for [pʼ], x for [tʼ], dh for [ɗ],
and q for [kʼ].
The affricates are written ch for [tʃ], j for [dʒ], and c for [tʃʼ].
The digraph ny stands in for the nasal [ɲ].
3.5 Morphology
Nominal: Case, number, and occasionally gender are marked on nouns.
Case: nominative, ablative, instrumental, locative, genitive, and absolutive. Nominative and
absolutive are the two primary cases, and they trigger agreement between the components of the
noun phrase. The unmarked absolutive serves as the citation form and can be used as a predicative
or a direct object.
Gender: feminine and masculine. Nouns are generally not marked for gender, except in certain
dialects or when they refer to persons.
Number: Afaan Oromo distinguishes singular and plural. The singular identifies a single thing,
like harre, whereas the plural identifies more than one, like harroota. The suffix -oota marks
the plural.
There are seven personal pronouns, which distinguish three persons, two numbers, and gender only
in the third person singular. In addition to the absolutive, they are declined in the nominative,
dative, instrumental, locative, and ablative cases.
Verbal: The lexical meaning of the verb is represented by the stem of an Afaan Oromo verb, while
the tense, aspect, and subject agreement are indicated by the suffix. For instance, in demne, "we
went," dem- is the stem ('go'), while -ne marks the past tense and the first person plural
subject of the verb.
Regular verb: The majority of Afaan Oromo verbs are "regular": they add the standard person and
number suffixes to the stem without any other alterations. Below is an example of the
present-future conjugation of the verb nyaachuu. The stems of these verbs do not end in a double
consonant, in ch, in y, or in w.
Isiin nyaattani
Isheen nyaatte, nyaati
3.6 Summary
The Oromo language, which is spoken in Ethiopia and its neighboring countries, is an Afro-Asiatic
language belonging to the Cushitic branch. Nouns, verbs, adjectives, and the other word classes
of Afaan Oromo are significant to the language's writing system. The morphology of Afaan Oromo
can be broadly divided into two types: derivational and inflectional. Derivation is the process
of creating new words from existing ones, whereas inflection produces new forms of a word while
maintaining the same meaning.
CHAPTER FOUR
4.1. Overview
This study's objective is to design and build a speech-based question answering and question
classification system for the Afaan Oromo language. This is achieved by combining speech
recognition, question answering, and rule-based algorithms. The proposed system is an effort to
close the societal gap left by conventional document retrieval methods, which do not accommodate
all users. Speech is one of the most often used and most appropriate methods in human-computer
communication settings.
Figure 4. 1. System architecture for Afaan Oromo speech based question and answering.
Figure 4.1 shows the structure of the speech-based question answering and classification system
for Afaan Oromo. In this study, we define the methods and algorithms needed to build the system
and its user interface.
The system accepts a spoken question from the user, recognizes it, and converts it into text.
The question-answering system then analyses the text question, uses the appropriate procedures to
classify the query, and retrieves the precise response. Finally, the answer is delivered in text
form. The overall architecture and the operation of the system are described in detail below.
Automatic Speech Recognition (ASR) is a technology that enables people to use their voices to
communicate with computer interfaces in a way that, in its most advanced forms, closely resembles
natural human conversation. ASR allows a computer to recognize the words a person says into a
microphone or over the phone and to convert those words into written text [31]. A complete ASR
system was built on the HMM-based CMU Sphinx-4 technology. The system supports continuous,
speaker-independent recognition and can manage large vocabularies. Our method for creating Afaan
Oromo models for the CMU Sphinx system consists of building and refining language and acoustic
models using Afaan Oromo voice data and creating an Afaan Oromo character-based lexicon.
[Figure: ASR components: speech capture, feature extraction, acoustic model, lexical model, output text]
Feature extraction converts the speech waveform into a parametric representation at a
significantly lower data rate for later processing and analysis. By extracting features from the
input data, it improves the precision of the learnt models, and by eliminating redundant data it
lowers the dimensionality of the data. The Automatic Speech Recognition (ASR) process begins with
feature extraction, a crucial phase in which pertinent information is taken from the speech. A
speech signal is first pre-processed (noise reduction, endpoint identification, pre-emphasis,
framing, and normalization), and then a feature extraction stage uses techniques such as
Mel-Frequency Cepstral Coefficients (MFCCs), Discrete Wavelet Transforms (DWTs), and Linear
Predictive Coding (LPC) to retain a set of predefined features from the processed speech [52]. In
this research, MFCCs are used. The primary goal of this feature extraction technique is to mimic
the response of the human ear. To compute the MFCCs, the speech signal is first divided into
overlapping frames of 25 or 30 milliseconds, with successive frames separated by 10-millisecond
intervals. Each frame is then multiplied by a Hamming window function, and the discrete Fourier
transform (DFT) is calculated on each windowed frame. High recognition accuracy, good
discrimination, and low correlation between coefficients are all characteristics of MFCCs.
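The framing and windowing steps described above can be sketched as follows. This is a minimal illustration, not the Sphinx front end: the class MfccFraming is hypothetical, the mel formula is the commonly used 2595 log10(1 + f/700) variant, and the 400-sample frame and 160-sample shift used in the note below assume 25 ms frames and a 10 ms shift at a 16 kHz sampling rate.

```java
// Hypothetical helper illustrating the first MFCC steps; not the Sphinx front end.
final class MfccFraming {
    /** Number of full frames in a signal, given frame length and shift in samples. */
    static int frameCount(int numSamples, int frameLen, int frameShift) {
        if (numSamples < frameLen) {
            return 0;
        }
        return 1 + (numSamples - frameLen) / frameShift;
    }

    /** Hamming window w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)); assumes n >= 2. */
    static double[] hamming(int n) {
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
        }
        return w;
    }

    /** Converts a frequency in Hz to the mel scale (one common formula). */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }
}
```

Under these assumptions, one second of audio (16,000 samples) gives `frameCount(16000, 400, 160)`, which is 98 full frames.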
[Figure: MFCC computation blocks: Fast Fourier Transform (FFT), Mel-frequency warping, Mel spectrum, cepstrum, Mel cepstrum]
The main part of a speech recognizer is the acoustic model, which is made up of statistical
representations of the sounds that make up words, built from speech audio recordings and their
transcriptions. The second element is the language model, which provides probabilities for word
sequences. The acoustic model is the central component of an ASR system and accounts for the bulk
of the computational work as well as of system performance. It establishes a mapping between
phonemes and their possible acoustic realizations, and it is trained to recognize spoken
phonemes. Four main types of data are used in the acoustic model training process: a set of
recorded speech files, a phonetic dictionary, a text file containing a parallel transcription of
the speech files, and a list of phones. In our research, training produced an acoustic model
folder called Other.ci_cont, which contains seven files.
Recorded Speech Data: The question recordings must be in either NIST or WAV format. The questions
chosen for training were first prepared as text and then recorded to create the question speech.
The following characteristics of the wave files were kept constant throughout the recording
period: the audio sampling rate is 16 kHz; the bit depth is 16 bits per sample (at 16 bits, each
sample can take one of 65,536 possible values); and the recording is mono (single channel).
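The recording settings above can be expressed with the standard javax.sound.sampled.AudioFormat class, as in the sketch below. The class RecordingFormat is a hypothetical helper, and the signed, little-endian PCM encoding is an assumption (it is typical for WAV files but is not stated in the text).

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper describing the recording settings; not part of the thesis code.
final class RecordingFormat {
    /** 16 kHz, 16 bits per sample, mono; signed little-endian PCM is assumed. */
    static AudioFormat questionFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    /** Number of distinct amplitude values for a given bit depth. */
    static long levels(int bitsPerSample) {
        return 1L << bitsPerSample;
    }

    /** Uncompressed size in bytes of a recording of the given duration. */
    static long bytesFor(AudioFormat format, double seconds) {
        return Math.round(format.getFrameRate() * format.getFrameSize() * seconds);
    }
}
```

With these settings each frame is 2 bytes, so one second of speech occupies 32,000 bytes, and `levels(16)` gives the 65,536 values mentioned above.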
Dictionary: The dictionary includes pronunciations for the words contained in the language model.
The pronunciations break words into sequences of the acoustic model's sub-word units. The
dictionary interface also offers word classification and permits a single word to belong to
several classes. We created a dictionary named nuguse.dic, and it looks like this:
BEEKAMAAN B AH AH K AE M EY AH N
BEEKAMEGAREEN B AH EH K AH M AH G AE R IY AH N
BEEKAMTII B AH AH K AE M T IY
BEEKAMTIIN B AH EH K AH M T IY N
BEEKAMTU B AH AH K AE M T UW
BEEKAMU B AH EH K AH M UW
BEEKAMUU B AH AH K AE M Y UW AH
File IDs definition file (*.fileids): The training and test file-ID files list the IDs of all
audio files, without extensions, as paths relative to the wav folder. Keep in mind that the
content of a *.fileids file should include only the names of the audio files, for example:
Speaker_4/sp_1
Speaker_4/sp_2
Speaker_4/sp_3
Transcription Data: maps each utterance (audio file) to its sentence. Each line begins with <s>
and ends with </s>, followed by the file ID in parentheses. These are not phone-to-audio
mappings; rather, they are mappings between the words in the left column of the dictionary file
and the audio files. The lines must follow the order specified in the training file-IDs file.
*.transcription: The other_train.transcription and other_test.transcription files are text files
listing the transcription for each audio file. Our transcription file looks like this:
<s> AMEERIKAATTI AMBAASADDARRI ITYOOPHIYAA EENYU </s> (file_1)
<s> AYYAANNI IID ALFAXIIR WAGGAA KUMA TOKKO DHIBBA AFUR FI AFURTAMII SADAAFFAAN EESSATTI GAGGEEFFAME </s> (file_3)
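The relationship between the file-IDs file and the transcription file can be checked with a short sketch like the one below. The class SphinxTrainFiles and its methods are hypothetical helpers written for illustration; they only assume the line format shown above, with the utterance ID taken as the last path component of the file ID.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper for preparing SphinxTrain input files; not part of SphinxTrain.
final class SphinxTrainFiles {
    /** Last path component of a file ID, e.g. "Speaker_4/sp_1" gives "sp_1". */
    static String utteranceId(String fileId) {
        return fileId.substring(fileId.lastIndexOf('/') + 1);
    }

    /** Formats one transcription line: <s> WORDS </s> (utteranceId). */
    static String transcriptionLine(String fileId, String sentence) {
        String words = sentence.trim().toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        return "<s> " + words + " </s> (" + utteranceId(fileId) + ")";
    }

    /** True if line i of the transcription ends with the ID of line i of the file IDs. */
    static boolean aligned(List<String> fileIds, List<String> transcription) {
        if (fileIds.size() != transcription.size()) {
            return false;
        }
        for (int i = 0; i < fileIds.size(); i++) {
            if (!transcription.get(i).endsWith("(" + utteranceId(fileIds.get(i)) + ")")) {
                return false;
            }
        }
        return true;
    }
}
```

Running the alignment check before training is a cheap way to catch a missing or reordered line, since both files must have the same number of lines in the same order.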
Phoneset file (*.phone): contains one phone per line. The number of phones should correspond to
the phones used in the lexicon, plus the special SIL phone for silence. We created a phone set
called other.phone; it contains phones like these:
HH
IH
IY
JH
Filler dictionary (*.filler): contains filler phones that the language model does not cover, that
is, non-linguistic sounds (such as laughter, "hmm," or breath). It may contain only silences:
<s> SIL
</s> SIL
<Sil> SIL
Finally, we set up the training script and start the training from the other folder.
4.3.4 Language Model
Another crucial prerequisite for any ASR system is the language model. To create a language
model, the word unigram counts are first calculated. These counts are then converted into a task
vocabulary with word frequencies, which is used to generate the bigrams and trigrams from the
training text. Finally, the n-grams are converted into a language model in the standard ARPA
format and in a binary format.
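The counting step described above can be sketched in a few lines. The class NGramCounter is a hypothetical illustration and not the language modelling toolkit itself; it computes raw counts and an unsmoothed maximum-likelihood bigram probability P(w2 | w1) = c(w1 w2) / c(w1), whereas real toolkits also apply discounting and back-off.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of n-gram counting; real toolkits also smooth the estimates.
final class NGramCounter {
    /** Counts n-grams of order n, padding each sentence with <s> and </s>. */
    static Map<String, Integer> count(Iterable<String> sentences, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            String[] w = ("<s> " + sentence.trim() + " </s>").split("\\s+");
            for (int i = 0; i + n <= w.length; i++) {
                String gram = String.join(" ", Arrays.copyOfRange(w, i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Unsmoothed maximum-likelihood estimate of P(w2 | w1). */
    static double bigramProb(Map<String, Integer> unigrams, Map<String, Integer> bigrams,
                             String w1, String w2) {
        int c1 = unigrams.getOrDefault(w1, 0);
        return c1 == 0 ? 0.0 : bigrams.getOrDefault(w1 + " " + w2, 0) / (double) c1;
    }
}
```

For a two-sentence corpus in which only one sentence starts with MUUMMEN, the estimate of P(MUUMMEN | <s>) is 0.5.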
Before building the language model, a few components must be installed. The following standard
Unix-compatible software is required; on Linux, or on Windows with Cygwin, these utilities are
typically preinstalled or easy to obtain. We installed packages such as tar, gzip, gcc, and
Python. Finally, the language model called weather.lm was created.
4.3.5 Lexical Model.
The lexicon is a collection of words whose pronunciations have been divided into phonemes, the
units of word pronunciation. It functions much like a pronunciation dictionary. The lexicon is
crucial to automatic speech recognition because it connects the acoustic-level representation to
the word sequence that the speech recognizer outputs. The lexicon has two functions: first, it
defines the words or lexical items that the system is capable of recognizing, and second, it
provides the means to build acoustic models for each entry.
4.3.6 Decoder
The term decoder comes from the coding-theory roots of this kind of stochastic model fitting. To
select the most likely word sequence, the decoder essentially examines all conceivable
alignments, pronunciations, and word sequences. There are different offline tools and libraries
for speech recognition, such as Vosk, CMU Sphinx, Snowboy Hotword Detection, and HTK. They work
offline, but their suitability depends on the language to be recognized. Among them, the Sphinx
tools support Afaan Oromo text without transliterating the language; for this reason, Sphinx-4 is
used for decoding, while SphinxTrain, SphinxBase, and Sphinx-3 are used for training and testing
the system. SphinxTrain is the module used for training on the voice corpus; it works with the
front end, the model and model loader, the dictionary, and the language models. SphinxTrain can
be downloaded for free from the CMU website and is used with the prerequisites mentioned above.
The configuration of the system is given below:
configuration.setAcousticModelPath ("resource:/accoustic/");
configuration.setDictionaryPath ("resource:/oromoLM/nuguses.dic");
configuration.setLanguageModelPath ("resource:/oromoLM/weathers.lm");
Figure 4. 6 Speech Recognition Output samples
Figure 4.6 shows the output MUUMMEN LIIGII EENYU produced from the speaker's input sound. The
system may insert, delete, or substitute words. In the image above, the word LIIGII was inserted
and the word MINIISTIIRI was deleted relative to the training question, so the accuracy of the
system is measured using the word error rate (WER).
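Word error rate is commonly computed as WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference. The sketch below computes it with a word-level edit distance. The class WordErrorRate is a hypothetical illustration, and the reference sentence used in the note is an assumption reconstructed from the description above. Note that a minimum edit distance counts one inserted word next to one deleted word as a single substitution.

```java
// Hypothetical illustration of word error rate; not the Sphinx evaluation tool.
final class WordErrorRate {
    /** Minimum number of word substitutions, deletions, and insertions from ref to hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i; // delete all remaining reference words
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j; // insert all remaining hypothesis words
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER = edit distance divided by the number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return editDistance(ref, hyp) / (double) ref.length;
    }
}
```

Assuming the reference was MUUMMEN MINIISTIIRI EENYU, the recognized output MUUMMEN LIIGII EENYU differs in one word out of three, so the WER is 1/3.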
The Java code handles how the user pronounces a question, how the system recognizes it based on
the model created during the training phase, and how it transforms the question into text. The
first step is creating the acoustic model, and the second is identifying the language model and
the dictionary. The configuration contains the dictionary, filler, language model, decoder,
acoustic model, front-end, Sphinx-4 properties, etc. The oromoLM configuration folder contains
the language model in binary and dump format, and it also contains the dictionary file called
nuguse.
The system uses a microphone to accept spoken questions from users. The spoken question is then
recognized by the system and transformed into a text question. After being recognized as a
complete text question, it is sent to the QA system via the QAS (speech) function. When there is
no noise or interruption, the system recognizes speech effectively. If the system cannot
recognize a word owing to an interruption, it automatically substitutes, inserts, or modifies
words to fit the unrecognized term.
4.4.1 Question Analysis
Question analysis has two goals. The first is to extract features that are likely to be used in
the answer extraction module. The second is to extract terms that are used to re-index the
selected documents, in order to retain only a subset of them and to provide additional evidence
during the final matching. The extracted features are the focus, the question category, and the
expected answer type. The major purpose of the question analysis component is to understand the
type of information that the question is seeking. Additionally, it is in charge of creating
appropriate queries for document retrieval. Question analysis begins when a user asks a question:
the component receives the user query and passes it on to its child components.
[Table: Afaan Oromo question particles; columns: S/NO, Question particles, Expected answer]
Previous studies in the literature use different methods for question classification. Based on
these, the researcher constructed a rule-based model for question classification, because the
rule-based approach is reported to be among the most robust and accurate of the classification
algorithms. This research uses automatic question classification: to classify questions using
the rule-based technique, a decision model is created.
[Table: answer type classes; columns: Coarse class, Fine class]
The rule-based algorithm does not need to be trained in order to classify queries with good
accuracy and precision. We have established both coarse-grained and fine-grained classes of
answer types. Based on the given question, the question classifier predicts the expected class.
The following are examples of Afaan Oromo factoid questions. For "Ministirri Muummee Itiyoophiyaa
eenyu?", which means "Who is the Prime Minister of Ethiopia?", the coarse class is "person" and
the fine class is "Ministirri Muummee". For "Magaalaan Adamaa Finfinnee irraa kiiloo meetira
meeqa fagaatti?", which means "How many kilometres is the city of Adama from Addis Ababa?", the
coarse class is "Lakkofsa (Quantity)" and the fine class is "fageenya", which means "distance".
For each natural language question posed
    Check for question particles "eessa", "eessatti", "eessarra"
    If question contains one of these question particles
        Classify question as place type
    Else check for question particles "yoom", "yoomi", "hamma", "yoomiraa", …
    If question contains one of these question particles
        Classify question as time type
    Else If question contains terms like "yoom", "yoomfaa", "guyyaa", "yoomif"
        Classify question as numeric type
    Else check for question particles "eenyu", "eenyuf", "eenyufa", "eenyuf eenyu", "eenyun"
    If question contains one of these question particles // question focus
        Classify question as person type
    Else Try the IR based technique
End for
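The decision procedure above could be sketched in Java roughly as follows. This is an illustration, not the thesis implementation: the class RuleBasedQuestionClassifier and the label strings are hypothetical, and the particle lists are copied as they appear in the pseudocode, including only the particles that are actually listed there. Because rules are tried in order, "yoom" in the numeric list is never reached.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the rule order in the pseudocode; labels are illustrative.
final class RuleBasedQuestionClassifier {
    // LinkedHashMap keeps insertion order, so rules are tried as in the pseudocode.
    private static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("PLACE", List.of("eessa", "eessatti", "eessarra"));
        RULES.put("TIME", List.of("yoom", "yoomi", "hamma", "yoomiraa"));
        RULES.put("NUMERIC", List.of("yoom", "yoomfaa", "guyyaa", "yoomif"));
        RULES.put("PERSON", List.of("eenyu", "eenyuf", "eenyufa", "eenyun"));
    }

    /** Returns the first matching class, or UNKNOWN when no particle is found. */
    static String classify(String question) {
        // Split on anything that is not a letter or hudhaa (').
        List<String> tokens =
                Arrays.asList(question.toLowerCase(Locale.ROOT).split("[^\\p{L}']+"));
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            for (String particle : rule.getValue()) {
                if (tokens.contains(particle)) {
                    return rule.getKey();
                }
            }
        }
        return "UNKNOWN"; // the caller would fall back to the IR-based technique
    }
}
```

Matching whole tokens rather than substrings keeps "eessa" from firing inside unrelated longer words, at the cost of needing every inflected particle form in the lists.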
The first step in creating a query is to eliminate stop words, together with question marks and
other punctuation. The stop word removal algorithm is identical to the one used in the document
pre-processing module. This subcomponent includes stop word removal because stop words have
already been eliminated from the index and therefore cannot be used to match relevant documents.
Once the stop words have been eliminated, the remaining search keywords undergo character
normalization, which converts the characters to the same format as that used in the document
pre-processing module. Character normalization must be applied at this step of question
processing if pertinent documents are to match the query.
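The query generation steps above can be sketched as follows. The class QueryBuilder is a hypothetical helper; the four stop words are an illustrative sample only, since the actual stop word list of the thesis is not shown, and the normalization shown here (lower-casing and unifying the apostrophe variants used for hudhaa) is an assumption about what character normalization involves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of query generation; the stop word list is an illustrative sample.
final class QueryBuilder {
    private static final Set<String> STOP_WORDS = Set.of("fi", "kan", "akka", "irraa");

    /** Lower-cases the text and maps apostrophe variants to the plain hudhaa ('). */
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replace('\u2019', '\'')
                .replace('\u2018', '\'')
                .replace('`', '\'');
    }

    /** Removes punctuation and stop words and returns the remaining keywords. */
    static List<String> keywords(String question) {
        List<String> out = new ArrayList<>();
        for (String token : normalize(question).split("[^\\p{L}']+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }
}
```

The same normalize method would have to be applied to the documents at indexing time, otherwise a keyword such as ta'ee could fail to match a document that uses a different apostrophe character.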
fact that QA systems often only require documents to be retrieved when they include all of the
keywords. This is due to the Question Processing module's thorough selection and reformulation
of the keywords. The goal of the document retrieval component is to find, in the Afaan Oromo
corpus, the documents with information relevant to a user's query. In practice, once documents
have been preprocessed, their nature and content are stable. However, it is crucial to prepare
parallel corpora before doing such document preparation activities: the system must be trained
and tested using the provided question sets and answer sets, aligned at the sentence level, as
the source of data. For document retrieval and processing we use Apache Lucene, a free and
open-source search engine software library.
4.4.5.3 Searching
The generated query is then sent to the document retrieval component, which looks for documents
that are likely to be related to the question and could contain potential answers. QA systems
frequently require that documents be retrieved only if they contain every keyword. Paragraphs are
then filtered and ordered using the number of question words they match, together with a distance
score, a missing-keyword score, and a same-word-sequence score.
4.4.5.4 Indexing
A Document serves as both the search and the indexing unit in Lucene. An index contains one or
more Documents. Indexing entails adding Documents to an IndexWriter, while searching entails
retrieving Documents from an index using an IndexSearcher. An index must already have been
created before it can be searched. A query is created (often using a QueryParser) and is then
passed to an IndexSearcher, which returns a list of hits. Using the Lucene query language, the
user can select which field(s) to search on, which fields to give greater weight to (boost), how
to combine Boolean queries (AND, OR, NOT), and other features. To index terms with Lucene, we add
documents containing the field(s) to an IndexWriter, which analyses the document(s) using the
analyzer and then creates, opens, or edits the necessary indexes and stores or updates them in a
directory. The IndexWriter is used to create or change indexes; it does not read indexes.
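To make the indexing and searching roles concrete without depending on the Lucene library, the sketch below builds a tiny in-memory inverted index with plain Java collections. It is a stand-in for illustration only: the class TinyIndex is hypothetical and is not Lucene's API, its add method plays the role of adding a Document to an index, and searchAll implements the Boolean AND behaviour in which a document is returned only if it contains every keyword.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index; a stand-in for illustration, not Lucene's API.
final class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Indexes a document and returns its id (the indexing role). */
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Returns ids of documents that contain every keyword (Boolean AND). */
    Set<Integer> searchAll(List<String> keywords) {
        Set<Integer> result = null;
        for (String keyword : keywords) {
            Set<Integer> hits = postings.getOrDefault(
                    keyword.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits); // intersect posting lists
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

A library such as Lucene adds analyzers, persistent storage, and relevance scoring on top of this basic idea of posting lists.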
4.4.5.5 Document Analysis
Document analysis uses the list of the most likely responses and the question classification to
determine the appropriate response. By analyzing documents, researchers can better understand and
organize primary sources, that is, original accounts from people who have firsthand experience of
a subject. Researchers obtain ideas and data for their studies from reputable sources in order to
support their assertions. This procedure enables researchers to assess the value and intent of
the sources they use and to determine whether the data in them will be useful for their research.
A better understanding of this procedure makes it easier to manage resources and to do research
more successfully.
4.4.5.6 Ranking
Information retrieval (IR), the scientific and engineering discipline behind search engines,
includes a procedure known as document ranking. It presents the documents that were found in
order of their estimated relevance to the query. The majority of conventional document ranking
techniques are based on calculating the similarity between documents and queries, and ranking is
one of the key issues in information retrieval. For example, for the question "pirezidaantiin
itoophiyaa eenyu?", which means "Who is the president of Ethiopia?", several documents may be
retrieved, but only one is needed for the answer, so ranking the documents is very important for
identifying it.
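One very simple similarity measure is the number of distinct query terms that occur in a document. The sketch below ranks documents by that count. It is an illustration under stated assumptions, not the ranking used by the thesis system or by Lucene: the class OverlapRanker is hypothetical, and the sample strings in the note are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical term-overlap ranker; not the scoring used by Lucene.
final class OverlapRanker {
    /** Score = number of distinct query terms that occur in the document. */
    static int score(Set<String> queryTerms, String document) {
        Set<String> docTerms = new HashSet<>(
                Arrays.asList(document.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")));
        int score = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                score++;
            }
        }
        return score;
    }

    /** Returns document indices by descending score; ties keep their input order. */
    static List<Integer> rank(Set<String> queryTerms, List<String> documents) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            order.add(i);
        }
        // List.sort is stable, so equal scores preserve the original order.
        order.sort(Comparator.comparingInt((Integer i) -> -score(queryTerms, documents.get(i))));
        return order;
    }
}
```

With the query terms pirezidaantiin and itoophiyaa, a document containing both terms is placed before a document containing only one of them, and a document containing neither comes last.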
based Lucene analyser for question classification, and NetBeans for integrating all modules are
some of the tools used in the research.
Sphinx tools: Sphinx is a continuous-speech recognition system based on hidden Markov acoustic
models (HMMs) and an n-gram statistical language model. The Sphinx toolkit is an established set
of tools for creating speech applications.
Sphinx4: Sphinx4 is a pure Java speech recognition library. It offers a quick and simple API that
uses CMUSphinx acoustic models to convert speech recordings into text. Both desktop and server
applications can make use of it. In addition to speech recognition, Sphinx4 helps with speaker
identification, model adaptation, alignment of existing transcriptions to audio for time
stamping, and more.
Sphinx 3: is a large-vocabulary ASR system that is slightly slower but more accurate. It serves
as an evaluation-focused server implementation of Sphinx. It uses HMMs with continuous output
probability density functions (PDFs) and supports a variety of operating modes. The more accurate
mode, known as the "flat decoder", descends from the original Sphinx-3 release.
Pocket Sphinx 5.0.0: Pocket Sphinx is an open-source, speaker-independent, large-vocabulary
continuous speech recognition engine developed by Carnegie Mellon University. It is the Sphinx
variant suited to embedded systems, such as those based on ARM processors, for which it has been
carefully optimized. Pocket Sphinx is actively developed and includes fixed-point arithmetic and
efficient GMM computation techniques. It is CMU's fastest speech recognition system and uses HMMs
with semi-continuous output PDFs. Although it is not as accurate as Sphinx-3 or Sphinx-4, it runs
in real time and is therefore appropriate for live applications.
Sphinx Train 5.0.0: is a group of Sphinx tools from CMU used for training acoustic models. It
trains continuous or semi-continuous (vector-quantized) models for Sphinx decoder versions 3 and
4, which Pocket Sphinx also uses. Under certain restrictions imposed by Sphinx-2, models in the
Sphinx-3 format can also be converted to the Sphinx-2 format. Sphinx Train supports MFCC and PLP
coefficients with delta or delta-delta features.
Lucene search engine: Apache Lucene is a fully functional, high-performance search engine library
written entirely in Java. It can be used for almost any application that needs full-text search,
structured search, faceting, nearest-neighbor search over high-dimensionality vectors, spell
checking, or query suggestions. Lucene is specifically a programming interface (API), not an
application. This means that the challenging tasks have already been implemented, leaving us to
write only the simple code needed to meet the demands of our application.
Wave Surfer: is an open-source tool for visualizing and manipulating sound. Typical uses are
speech/sound analysis and sound annotation/transcription. Wave Surfer can be extended with
plug-ins and embedded in other applications. It is customizable (users can design their own
configurations), supports localization, is extensible (new functionality can be added through a
plug-in architecture), and is embeddable (it can be used as a widget in other applications). It
is multi-platform and runs on Linux, Windows, and OS X.
Figure 4. 9 Question and answering user interface templates
The question and answer interface shown above is built with Java Swing. The Gaaffi field accepts
the question, and the Deebii text area shows the answer. On the right side, the text area called
Madda shows the passage from which the answer was extracted. Deebii filanoo 1, 2, 3, and 4 show
the other answers ranked from the documents.
CHAPTER FIVE
5.1 OVERVIEW
This chapter presents the experimental results, the data set, the evaluation metrics, and the
performance of Afaan Oromo speech recognition, of question classification, and of question
answering, and finally the evaluation of the overall QAS. The performance evaluation is done to
determine how well the system works and to make recommendations for the future.
Sphinx: is a voice recognition library written entirely in Java. It offers a quick and simple API that
uses CMUSphinx acoustic models to convert speech recordings into text. Both servers and desktop
apps can make advantage of it.
Lucene: is a fully functional, high-performance search engine library that was created entirely in
Java. Nearly all applications that call for structured search, full-text search, faceting,
nearest-neighbor search across high-dimensionality vectors, spell checking, or query
recommendations can benefit from this technology.
Java programming language: Java is one of the key computing platforms and programming languages
for numerous applications. Sun Microsystems introduced it in 1995. Java is used by many
applications, especially web applications, since it has a lot of promise in the fields of web,
games, databases, and many other areas. A variety of editors and development environments are
available for it, including TextPad, Eclipse, and NetBeans.
The computer used to create and test the system has the following specifications:
Windows 11 Pro operating system,
8 GB RAM, extendable to 16 GB,
500 GB SSD and
11th Gen Intel(R) Core(TM) i7-1185G7 processor @ 3.00 GHz.
Java 2 SDK.
Java runtime Environment (JRE)
NetBeans 8.0
Cygwin
Notepad++
Python
Perl
Visual studio
Ant
Figure 5. 1 speech accepted from user to question and answer.
The image above shows the output of speech recognition. When we run the Java code that accepts
the sound, the sample run shown in the image is displayed. The connection to the speech recognizer
depends on the microphone of the computer: the LiveSpeechRecognizer uses the microphone as the
speech source and accepts speech from the user. If the microphone is working, it accepts the speech
and converts it to text; if the microphone is not working, an error message is shown indicating that
the microphone does not work. The experiments and evaluation of speech recognition are based on
this setup.
5.4.1 Performance
The speech recognizer accepts speech input and recognizes sentences. The speech recognition
system uses a Sphinx decoder to convert spoken words into text sequences. It is made up of a
speech database, an Afaan Oromo dictionary, language models, lexical models, and acoustic
models.
The performance of speech recognition systems is typically measured in terms of accuracy and
speed. Speed is quantified using the real-time factor, while accuracy is usually rated with the
word error rate (WER). Single Word Error Rate (SWER) and Command Success Rate (CSR) are
additional accuracy metrics.
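To make the WER metric concrete, the following is a minimal sketch of how it can be computed from a reference transcript and a recognizer output. It assumes the common definition WER = (substitutions + deletions + insertions) / number of reference words, obtained from a word-level edit distance; the class and method names are hypothetical and are not part of the thesis code or of Sphinx.

```java
import java.util.Locale;

class WerSketch {
    /** Word-level Levenshtein distance: minimum number of substitutions, deletions and insertions. */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete every reference word
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert every hypothesis word
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int del = d[i - 1][j] + 1;
                int ins = d[i][j - 1] + 1;
                d[i][j] = Math.min(sub, Math.min(del, ins));
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER as a fraction of the number of reference words (assumes a non-empty reference). */
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] hyp = hypothesis.trim().toLowerCase(Locale.ROOT).split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // Illustrative pair: one word of the reference question is missing in the hypothesis.
        double rate = wer("kantibaan magaalaa finfinnee eenyu", "kantibaan magaalaa eenyu");
        System.out.printf("WER = %.1f%%, accuracy = %.1f%%%n", 100 * rate, 100 * (1 - rate));
    }
}
```

In this sketch accuracy is taken as 100% minus WER, which matches how the accuracy and WER figures in this chapter add up to 100%.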
Sample size in bits | 16
Signed | True
Character type | Little-Endian
5.4.3 Experiments
In this experiment we test the system using 20 questions, divided into 2 groups: one group spoken
by a speaker who took part in training and one by a speaker who did not. That is, each group
corresponds to one speaker. The speech data are recorded and tested. When a wave file is used,
the code of the testing application has to be modified so that it reads from an existing file. In
addition, the wave file must be processed so that it works with our system: each file must pass a
format check before it can be used. The supported formats of the speech file are listed in the
table above. In this case, we provide the recognizer with audio data directly from the
microphone.
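The format check described above can be sketched with the standard Java Sound API. This is only an illustration under stated assumptions: it tests the three properties listed in the table (16-bit samples, signed, little-endian), and the class name FormatCheckSketch is hypothetical rather than taken from the thesis code.

```java
import javax.sound.sampled.AudioFormat;

class FormatCheckSketch {
    /** Returns true when the format is 16-bit, signed PCM, little-endian. */
    static boolean isSupported(AudioFormat format) {
        return format.getSampleSizeInBits() == 16
                && AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && !format.isBigEndian();
    }

    public static void main(String[] args) {
        // Sample rate (16 kHz) and channel count (mono) are assumptions for this demo only.
        AudioFormat candidate = new AudioFormat(16000f, 16, 1, true, false);
        System.out.println("supported: " + isSupported(candidate));
    }
}
```

For a recorded wave file, the format object could be obtained with AudioSystem.getAudioFileFormat(file).getFormat() before passing it to the check.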
Number of questions/sentences | 20 | 20
Number of deleted words | 10 | 12
Number of substituted words | 12 | 12
Number of added words | 16 | 18
Based on the table above, we compute the accuracy and the word error rate (WER) of the speech
recognition system, which uses sphinx4-5. The results are shown in the table below.
Speaker number | Accuracy | Duration in sec | WER (word error rate)
Speaker 1 | 79.5% | 76 | 20.5%
The table above shows the performance of speech recognition for the 2 speakers: speaker 1, who
took part in training, and speaker 2, who did not. The system achieves 78.4% accuracy and a 21.6%
word error rate.
Figure 5. 2 Question classification analyzers
The image above shows the rule-based question classification. It depends on the rules identified
above; for example, the place question particle “eessatti” indicates a place-based question. The
challenge with other classifiers such as SVM is that a query can belong to more than one class, so
SVM does not classify such a question clearly. With the rule-based approach, when a user submits
a question, the system instantly categorizes it based on the expected term, such as (eessatti).
Using this analyzer, 74 questions were prepared for the Person and Place question types, and the
analyzer classified 98% and 96% of them correctly, respectively.
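The idea of rule-based classification by question particle can be sketched as follows. This is a simplified illustration, not the analyzer used in the system: the rule table contains only the two particles discussed in the text, matching on the stems "eenyu" and "eessa" by prefix is an assumption made here to cover inflected forms, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class RuleBasedClassifierSketch {
    // Question particle stem -> question type (assumed rule table for illustration).
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("eenyu", "Person"); // covers forms such as "eenyu" and "eenyutu"
        RULES.put("eessa", "Place");  // covers forms such as "eessa" and "eessatti"
    }

    /** Returns the question type, or null when no rule matches. */
    static String classify(String question) {
        String[] tokens = question.toLowerCase(Locale.ROOT).split("[^\\p{L}']+");
        for (String token : tokens) {
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (token.startsWith(rule.getKey())) {
                    return rule.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("Kantibaan Magaalaa Finfinnee eenyu?"));
        System.out.println(classify("Fiigicha Maaraatoonii 48ffaa har'a eessatti gaggeeffame"));
    }
}
```

Because every rule is an explicit particle, a question either matches one class or none, which avoids the ambiguity described for SVM; the cost is that questions without a known particle stay unclassified.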
These are the most widely used metrics for assessing the quality of IR and QA systems. Precision
is the ratio of the number of shared words to the total number of words in the prediction. Recall is
the ratio of the number of shared words to the total number of words in the ground truth. The
F-score is defined as the harmonic mean of precision and recall; it is a measure of the system's
accuracy that takes both precision and recall into account. For our system, precision and recall
can be summarized as follows. Precision is calculated as the number of correctly answered
questions over the total number of retrieved answers, where the retrieved answers include correct
answers, not exactly correct answers, wrong answers, and no answer. Recall is calculated as the
number of correctly answered questions over the number of expected answers, where the documents
are first analyzed for the presence of correct answers.
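The three measures defined above can be written down in a few lines. The sketch below assumes the count-based definitions from this section (correct answers over retrieved answers, correct answers over expected answers, and their harmonic mean); the class and method names are hypothetical, and the counts in the demo are placeholders, not experimental data.

```java
class QaMetricsSketch {
    /** Correctly answered questions divided by all retrieved answers. */
    static double precision(int correct, int retrieved) {
        return retrieved == 0 ? 0.0 : (double) correct / retrieved;
    }

    /** Correctly answered questions divided by all expected answers. */
    static double recall(int correct, int expected) {
        return expected == 0 ? 0.0 : (double) correct / expected;
    }

    /** Harmonic mean of precision and recall. */
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Placeholder counts for illustration only.
        double p = precision(8, 10);
        double r = recall(8, 16);
        System.out.printf("P = %.3f, R = %.3f, F = %.3f%n", p, r, fMeasure(p, r));
    }
}
```

As a consistency check, a precision of 0.891 and a recall of 0.916 give a harmonic mean of about 0.903.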
When a user submits a question, the system instantly categorises it based on the expected term,
such as (eenyu). In this category, the term "eenyu" indicates a person. The correct answer is
the one displayed in the second text field. The other four text fields display the alternative
answers, which depend on the similar words found in the documents. The answer and question are
displayed based on the question analyser, which contains the question classification and the
query generator.
88.8% 88.8% 88.8%
Question answering using rule-based question classification achieves 89.1% precision, 91.6%
recall and 90.3% F-measure. The challenge in question answering based on rule-based question
classification is that if the class of the question is wrong, the answer will also be wrong.
In the figure above, the question "Kantibaan Magalaa Finfinnee eenyu?" is classified correctly and
answered correctly: it is classified as Person, and the Deebii (answer), Aadde Adaanach Abeebee,
is also correct.
Figure 5. 4 Screen shoot of correct place type answer
The figure above shows the answer for the question "Atleet Sanbaree Tafariin fiigicha daandirraa
walakkaa Maaraatoonii eessattii injifatee". The system classifies the question as Place, and the
answer to this question is Niwuu Yorkitti, which exists in the documents.
The figure above asks (Who is the Ethiopian Coffee forward player). The document exists in the
corpus and the answer should be Abuubeker Nasiir, but the system returns no answer.
Figure 5. 6 Screen shoots of in correct answer types
In the figure above, the question is classified correctly, but the system does not retrieve the
correct answer from the documents in the corpus; the answer to this question should be FNBtti.
In this study, we combine Afaan Oromo speech recognition with an Afaan Oromo question-answering
system using automatic question classification, which enables the question-answering system to
accept spoken queries, retrieve precise answers, and return the responses to the user as text.
The main contributions of this study are the development of an Afaan Oromo continuous speech
recognizer and the use of rule-based question classification for question answering, together
with their integration into a question answering system. The experimental findings of the speech
recognition system showed an accuracy of 78.4%. Question classification on its own, without
question answering, classified 98% and 96% of the Person and Place question types correctly,
respectively. With question answering, rule-based question classification achieved 89.1%
precision, 91.6% recall and 90.3% F-measure. Compared with related work, the Amharic speech
recognizer for question answering reported a rate of 85.58%, and the speech recognition of
SpeechQoogle showed a recognition rate of 87.17% [9]. In addition, our question answering
experiments reached 89.1% precision, while previous research reported 73.91% using SVM
algorithms. Our study therefore achieves good accuracy, although some challenges prevented the
system from scoring the best results.
CHAPTER SIX
6.1 Conclusion
This work demonstrates the potential of automatic question classification for speech-based Afaan
Oromo question answering. The user asks the query by speech in the native Afaan Oromo language,
and it is converted to text using STT. The user's query is compared with the documents, the
necessary response is extracted by the question-answering system, and the response is output as
text.
First, the corpus was collected and the speech recognizer was trained and tested. Second, the
question classification and the question answering system were developed. Finally, we connected
both parts to build automatic question classification for speech-based Afaan Oromo question
answering. Speech recognition involves several components: the language model, feature
extraction, the dictionary, the acoustic model, and a decoder such as Sphinx. To develop these
models we used NetBeans 8.0, Perl, Python, Cygwin and Notepad++. For decoding we used
PocketSphinx 5.0 and SphinxTrain 5.0. We integrated these components in Java using NetBeans.
In the second stage, we created a factoid question answering system for information extraction
and retrieval. Question analysis, document retrieval, and answer extraction make up the three
primary parts of the question answering system. In response to the user's request, the question
analysis component determines the kind of fact the user is looking for, such as a person, time,
place, or amount. In this work, question classification is done by rule-based classification,
which classifies questions simply by defining rules.
The document retrieval component receives the generated queries from the preceding question
analysis component and searches for documents that contain information relevant to the user's
query. This search is conducted within the directory of indexed documents. The answer extraction
component receives ranked copies of any relevant documents that are found. Finally, the answer
extraction component is in charge of finding the information that is thought to answer the user's
query and returning this information to the user in a specific order.
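The retrieval and ranking step described above can be illustrated with a small in-memory sketch. It is a deliberately simplified stand-in for the index search: passages are scored by the number of distinct query terms they share with the question, which is an assumption made here for illustration and not the scoring used by Lucene. The class name, method names, and passages are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class RetrievalSketch {
    /** Lowercased set of word tokens in the text. */
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}']+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Number of distinct query terms that also occur in the passage. */
    static int overlap(Set<String> queryTerms, String passage) {
        Set<String> shared = terms(passage);
        shared.retainAll(queryTerms);
        return shared.size();
    }

    /** Passages sharing at least one term with the query, best match first. */
    static List<String> rank(String query, List<String> passages) {
        Set<String> queryTerms = terms(query);
        List<String> ranked = new ArrayList<>();
        for (String passage : passages) {
            if (overlap(queryTerms, passage) > 0) {
                ranked.add(passage);
            }
        }
        ranked.sort(Comparator.comparingInt((String p) -> overlap(queryTerms, p)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> passages = List.of(
                "Magaalaa Finfinnee",
                "Kantibaan Magaalaa Finfinnee Aadde Adaanach Abeebee",
                "Baankii Daashan");
        System.out.println(rank("Kantibaan Magaalaa Finfinnee eenyu", passages));
    }
}
```

The ranked list plays the role of the ranked documents handed to the answer extraction component; the best-scoring passage is the first candidate for the answer.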
We tested the system using 20 questions, divided into 2 groups: one group spoken by a speaker who
took part in training and one by a speaker who did not. That is, each group corresponds to one
speaker. The speech data were recorded and tested. When a wave file is used, the code of the
testing application has to be modified so that it reads from an existing file. The result of the
training shows a 19% word error rate and 81% accuracy. Question answering using rule-based
question classification achieves 89.1% precision, 91.6% recall and 90.3% F-measure. The challenge
in question answering based on rule-based question classification is that if the class of the
question is wrong, the answer will also be wrong. In general, the overall performance of Automatic
Question Classification and Speech Based Question for Afaan Oromo Question Answering reaches
71.45% accuracy.
6.2 Recommendation
The performance of automatic question classification for speech-based Afaan Oromo question
answering can be improved further, because it needs different natural language processing modules.
We specify additional future work that can be added to the system to increase its performance,
listed below:
TTS-based answering is highly recommended as future work for the Afaan Oromo language,
because it is useful for visually impaired people.
The question answering system supports only factoid questions; extending it to non-factoid
questions is recommended.
Adding an Afaan Oromo spelling checker to this system would help to avoid user input errors
when writing an Afaan Oromo question.
Adding an Afaan Oromo WordNet would also improve the performance of query expansion, because
it can be used to search documents from different angles.
The performance of the system depends on the corpus size, so enlarging the corpus is also
recommended.
Working with other question types such as definition, list, and biomedical questions is also
recommended.
The system is implemented as a desktop application using Java NetBeans; we recommend
developing it for Android, the web, and as an API.
The research targets monolingual Afaan Oromo speakers. However, there are more than 80
languages in Ethiopia, so future researchers can extend this work to them.
There are no perfect noise detection techniques applicable to the speech part of question
answering. For further investigation, we therefore recommend removing noise when the
question speech is stronger than the noise.
References
[1] Beekan Erana, “The Oromo Language (Afaan Oromo)”, Harvard University, Spring 2018,
Cambridge, Massachusetts.
[2] Dejene Hundessa, “Definition Question Answering System for Afan Oromo Language”, AAU,
October 2015.
[3] Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, “Introduction to
Information Retrieval”, Online edition, Cambridge UP, 2009.
[4] J. Liu, “An Intelligent FAQ Answering System Using a Combination of Statistic and
Semantic IR Techniques”, Department of Computer Science and Electronics, Malardalen
University, 2007.
[5] A. Chandra Obula Reddy, “A Survey on Types of Question Answering System”, Research
Scholar, Department of Computer Science & Engineering, JNTUA, Ananthapuramu -
515002, A.P., India.
[9] Bekele Mengesha H/Mesikel, “Automatic Question Classification for Speech Based
Amharic Question Answering”, Adama Science and Technology University, School of
Electrical Engineering and Computing, January 2017.
[11] Kibrom Haftu Amare, “Tigrigna Question Answering for Factoid Questions”, Addis Ababa
University, School of Computer Science, June 17, 2016.
[13] Google, “how do blind people use internet?” July, 2018.
https://pixelplex.io/blog/how-do-blind-people-use-the-internet/.
[14] Aberash Tesfaye, Afaan Oromo Question Answering System for Factoid Questions,
Unpublished MSc Thesis, Department of Computer Science, Addis Ababa University, July
2014.
[15] Chaltu Fita: Afaan Oromo List, Definition and Description Question Answering System,
MSc Thesis, Department of Computer Science, Addis Ababa University, 2016.
[16] Endale daba, Improving Afaan Oromo Question Answering System: Definition, List and
Description Question Types for Non-factoid Questions, St. Mary’s University, 2021.
[17] Seid Muhe, TETEYEQ: Amharic Question Answering System for Factoid Questions,
Unpublished MSc Thesis, Department of Computer Science, Addis Ababa University,
2009.
[18] Desalegn Abebaw Zeleke, LETEYEQ: A Web Based Amharic Question Answering
System for Factoid Questions Using Machine Learning Approach, Unpublished Master’s
Thesis, Computer Science Department, Addis Ababa University, 2013.
[20] Lin, J. (2002). The Web as a Resource for Question Answering: Perspectives and
Challenges. In Proceedings of the Third International Conference on Language Resources
and Evaluation (LREC 2002).
[21] Roser Morante, Martin Krallinger, Alfonso Valencia and Walter Daelemans. Machine
Reading of Biomedical Texts about Alzheimer's Disease. CLEF 2012 Evaluation Labs and
Workshop. September 17, 2012.
[25] Google “question classification in question answering system on cooking”
https://link.springer.com/chapter/10.1007/978-3-030-60887-3_10.
[28] S. R. Gunn and others, "Support vector machines for classification and regression," ISIS
technical report, vol. 14, 1998.
[30] Javatpoint, https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm.
[31] Kassahun Gelana Micho, A Continuous, Speaker Independent Speech Recognizer for Afaan
Oromo, Unpublished Master’s Thesis, Information Science, Addis Ababa University,
2010.
[33] Karpagavalli, S. and Chandra, E., 2016. A review on automatic speech recognition
architecture and approaches. International Journal of Signal Processing, Image Processing
and Pattern Recognition, 9(4), pp.393-404.
[35] Jake Vasilakes, Rui Zhang, in Machine Learning in Cardiovascular Medicine, 2021.
[37] Nidhi Desai, Prof. Kinnal Dhameliya and Prof. Vijayendra Desai, “Feature Extraction
and Classification Techniques for Speech Recognition: A Review”, International Journal
of Emerging Technology and Advanced Engineering.
[38] M. Takashi, "HMM-Based Speech Synthesis and Its Applications," Unpublished Master’s
thesis, Japan, Tokyo, November, 2002.
[39] T. Dutoit, A Short Introduction to Text to Speech synthesis, Boston,: Kluwer Academic
Publishers, December,1999.
[41] Trivedi, A., Pant, N., Shah, P., Sonik, S. and Agrawal, S., 2018. Speech to text and text to
speech recognition systems - A review. IOSR J. Comput. Eng, 20(2).
[42] Muhidin Kedir Wosho, Text to Speech Synthesizer for Afaan Oromoo using Statistical
Parametric Speech Synthesis, Unpublished MSc Thesis, Department of Computer Science,
Addis Ababa University, June 2020.
[43] Dartmouth College: Music and Computers. Archived 2011-06-08 at the Wayback Machine,
1993.
[44] B. Divya, G. Ankita and J. Khushneet, "Punjabi Speech Synthesis System Using HTK,"
International Journal of Information Sciences and Techniques (IJIST), vol. 2, no. 4,
p. 2, July 2012.
[45] Aimilios Chalamandaris, Sotiris Karabetsos, Pirros Tsiakoulis, and Spyros Raptis (2010),
"A Unit Selection Text-to-Speech Synthesis System Optimized for Use with Screen Readers",
IEEE Transactions on Consumer Electronics, Vol. 56, No. 3, 1890-1897.
[46] Eyob Bayou, Concatenative Speech Synthesis for Amharic Using Unit Selection Method,
Unpublished Master’s Thesis, Information Science, Addis Ababa University, June 2011.
[47] X. D. Huang, Y. Ariki, and M. A. Jack, Hidden Markov models for speech recognition,
Edinburgh University Press, 1990.
[48] K. Abdissa, "Factoid Question Answering for Afaan Oromo," Addis Ababa University,
Addis Ababa, Ethiopia, June, 2014.
[49] Belisty.Y, "(TASBFQA): Towards Amharic Speech Based Factoid Questions Answering,"
2014.
[51] Amanuel Raga Yadate, Linguistic Sexism in Gender Assignment Systems of Afan Oromo,
Amharic, and Gamo, Doctor of Philosophy in Linguistics, 20 September 2019, Bologna.
[52] Maria Labied, Abdisamad Belangour, “Automatic Speech Recognition Features Extraction
Techniques: A Multi-criteria Comparison”, Hassan II University, Ben M'sik Faculty of
Sciences Casablanca, Morocco, International Journal of Advanced Computer Science and
Applications, Vol. 12, No. 8, 2021.
[54] Text analysis for speech synthesis, The Journal of the Acoustical Society of America 94,
1841 (1993); doi: 10.1121/1.407734.
[55] Clarkson and Rosenfeld, "Statistical language modeling using the CMU-Cambridge toolkit,"
in Proceedings of the 5th European Conference on Speech Communication and
Technology, Rhodes, Greece, Sept. 1997.
[56] G. Hu, Dan, Q. Liu and R. Wang, "SpeechQoogle: An Open-Domain Question Answering
System with Speech Interface," Presented at International symposium on Chinese spoken
language processing, 2006.
[57] L. Rabiner and B.-H. Juang, "Fundamentals of Speech Recognition.," Prentice hall, 1993.
[58] M. Anusuya and S. K. Katti , "Speech recognition by machine, a review," arXiv preprint
arXiv:1001.2267, 2010.
[59] https://www.languagesgulper.com. The Language Gulper is a comprehensive language
site, accessed June 2023.
Appendix I
List of Afaan Oromo stop words
Appendix II
List of Afaan Oromo Abbreviations
Fkn. Fakeenya
Bil. Bilbiilaa
Ful. Fulbaana
Mil. Miliyoona
Onk. onkololessa
Mud. Muddee
Wax. Waxaabajjii
Appendix III
S.no | Question | Question type
6 | Abbootiin qabeenyaa biyya keessaa carraa gaarii fi haala mijataa uumametti fayyadamuun Paarkii Qonna Qindaa’aa Agroo Indaastirii Bulbulaa keessatti hirmaachuu akka qaban waamichi dhiyeesse eenyutu | Person
23 | Daayreektarri gabaa fi maamilaa Baankii Daashan eenyu jedhamu | Person
5 | Fiigicha Maaraatoonii 48ffaa har’a eessatti gaggeeffame | Place
20 | Dorgommiin Maaraatoonii Riilee Ityoophiyaa 17ffaan Magaalaa eessatti gaggeffame | Place
29 | Jilli Atleetiksii Ityoophiyaa marsaa 3ffaan halkan eda gara eessa galan | Place
34 | Jila Atleetiksii Ityoophiyaa Shaampiyoonaa Addunyaa umrii waggaa 20 gadii Kolombiyaa magaalaa eessati gaggeefame | Place
Annex
Source code for speech recognition
package AAFSQA;

import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Port;
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.result.WordResult;

public class Main {
    // Sphinx4 configuration object that holds the model paths
    private final Configuration configuration = new Configuration();
    private boolean resourcesThreadRunning;

    public Main() {
        // Configuration: acoustic model, pronunciation dictionary and language model
        configuration.setAcousticModelPath("resource:/accoustic/");
        configuration.setDictionaryPath("resource:/oromoLM/nuguses.dic");
        configuration.setLanguageModelPath("resource:/oromoLM/weathers.lm");
    }
}
Source code of QAS
package qa.all.afaanoromoo;
import java.io.File;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;
import java.util.Vector;
import javax.swing.JOptionPane;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
public static String Queryword = null;// the analyzed query word accesible
public AQAMain() {
this.questiontype = null;
initComponents();
throws Exception {
jTextArea2.setText("");
jTextArea3.setText("");
TAmarachmelis1.setText("");
TAmarachmelis2.setText("");
TAmarachmelis3.setText("");
TAmarachmelis4.setText("");
DocumentNormalization dn=new DocumentNormalization();
if (jTextArea1.getText().equals("")) {
} else {
questiontype = an.AnalyzedQuery(dn.NormalizeQuery(jTextArea1.getText()));
if (questiontype == null) {
jTextArea1.setText("");
} else {
String q = qgen.QueryExpand((jTextArea1.getText()));
q= "\""+q+"\"";
Queryword = q;
if (!indexDir.exists() || !indexDir.isDirectory()) {
search(indexDir, q);