
Project Proposal


Introduction

Word sense disambiguation is a long-standing problem in Computational Linguistics. Much of the recent work in lexical ambiguity resolution offers the prospect that a disambiguation system might receive unrestricted text as input and tag each word with its most likely sense with reasonable accuracy and efficiency. The most widespread approach is to use the context of the word to be disambiguated, together with information about each of its word senses, to solve this problem.
This paper presents a general automatic decision procedure for lexical ambiguity resolution based on a formula for the conceptual distance among concepts: Conceptual Density. The system needs to know how words are clustered into semantic classes and how semantic classes are hierarchically organized.
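The proposal does not reproduce the formula itself; in the original formulation of Agirre and Rigau (1996), from which this notion derives, the Conceptual Density of a concept c, when m senses of the context words fall within its subtree, is roughly

    CD(c, m) = ( sum_{i=0}^{m-1} nhyp^i ) / descendants_c

where nhyp is the mean number of hyponyms per node in the subtree rooted at c and descendants_c is the total number of concepts in that subtree. The word sense lying under the subtree with the highest density is selected.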
Literature Review
Work on machine-readable dictionaries (MRDs), beginning with Lesk (1986) and continued by many researchers, shows that external knowledge resources such as dictionaries, thesauri, and MRDs can serve as structured sources of lexical knowledge for WSD. These knowledge sources define explicit sense distinctions for assigning the correct sense to a word in context. Agirre and Martinez (2001b) further identify ten distinct kinds of useful information available in MRDs, including part of speech, semantic word associations, selectional preferences, and frequency of senses. Lesk (1986) proposed a method for predicting the correct word sense by counting word overlaps between the dictionary definitions of the words in the context of the ambiguous word. Mihalcea and Moldovan (1999) present a method that attempts to disambiguate all the nouns, verbs, adverbs, and adjectives in a given text by referring to the senses provided by WordNet. Magnini et al. (2002) explore the role of domain information in WSD using WordNet Domains (Magnini & Strapparava, 2000), finding that corpus-based WSD achieves higher precision than knowledge-based WSD. Ping Chen et al. (2007) discuss context knowledge acquisition and representation methods.
Word Sense Disambiguation (WSD) is an important companion to the Machine Readable Dictionary (MRD), which is extensively used in expert systems and intelligent systems. In all languages, words and phrases take on multiple meanings depending on the context of their usage. WSD draws out the correct (intended) meaning using a database called a Machine Readable Dictionary (MRD). Some rudimentary MRD designs exist for some European languages. This paper makes a preliminary attempt at the formulation and design of an MRD for the Punjabi language using a modified Lesk algorithm, which employs a simple method for selecting the appropriate word sense relative to the set of dictionary meanings of the word or phrase.
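To make the dictionary-overlap idea behind the Lesk algorithm concrete, the following is a minimal sketch of its simplified form: pick the sense whose gloss shares the most words with the context. The gloss dictionary, tokenization, and English stand-in glosses are hypothetical placeholders; an actual Punjabi MRD would supply the real sense glosses, and the proposal's modified variant would refine this scoring.

# Minimal sketch of simplified Lesk: choose the sense whose dictionary
# gloss shares the most words with the context of the ambiguous word.
# The gloss dictionary below is a hypothetical placeholder; a real MRD
# for Punjabi would supply the actual sense glosses.

def simplified_lesk(word, context_tokens, gloss_dict):
    """Return the sense id whose gloss overlaps most with the context."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in gloss_dict.get(word, {}).items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# Hypothetical example with English stand-in glosses:
glosses = {
    "bank": {
        "bank.1": "financial institution that accepts deposits and lends money",
        "bank.2": "sloping land beside a body of water such as a river",
    }
}
print(simplified_lesk("bank", "he sat on the river bank".split(), glosses))
# -> bank.2 (the gloss sharing the word "river" with the context)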

Word Sense Disambiguation (WSD) is the capability of computationally finding the right interpretation of a given word in a given context. Punjabi is among the ten most widely spoken languages and is morphologically rich, yet surprisingly little work has been done on computerization and on developing lexical resources for it. It is therefore motivating to develop a corpus of the Punjabi language that conveys the correct sense of an ambiguous word. The availability of sense-tagged corpora contributes greatly to WSD, and some of the most accurate WSD systems use supervised learning algorithms (such as Naïve Bayes, k-NN, and Decision Tree classifiers) to learn contextual rules or classification models automatically from sense-annotated examples. These algorithms have shown high accuracy in WSD, and we discuss these three supervised techniques, their algorithms, implementation, and results when applied to a Punjabi corpus; a brief sketch of how they could be trained follows.
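As an illustration, the sketch below trains the three named classifiers on toy sense-annotated contexts. scikit-learn, the bag-of-words features, and the English stand-in data are assumptions for illustration; the proposal names only the algorithms, not an implementation.

# Sketch: training Naive Bayes, k-NN and Decision Tree sense classifiers
# on toy sense-annotated contexts (bag-of-words features).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sense-tagged contexts for one ambiguous target word.
contexts = [
    "deposit money in the bank account",
    "the bank approved the loan",
    "fishing on the bank of the river",
    "grass grows on the river bank",
]
senses = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

vec = CountVectorizer().fit(contexts)
features = vec.transform(contexts)
test = vec.transform(["she walked along the river"])

for clf in (MultinomialNB(), KNeighborsClassifier(n_neighbors=1),
            DecisionTreeClassifier()):
    clf.fit(features, senses)
    print(type(clf).__name__, clf.predict(test)[0])  # expected: RIVER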
Approaches to WSD:
• Dictionary- and Knowledge-Based Methods
• Supervised Methods
• Semi-Supervised Methods
• Unsupervised Methods
In our system we choose the supervised approach.
Support Vector Machines
SVMs are a popular supervised learning model that can be used for classification or regression. The approach works well in high-dimensional spaces (many features in the feature vector) and can be used effectively with small data sets. Once trained on a data set, the model can classify new observations efficiently. It does this by constructing one or more hyperplanes that separate the data set into two classes.
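A minimal sketch of this idea, again assuming scikit-learn and toy English stand-in data (implementation choices not made in the proposal), with a linear SVM over bag-of-context-words features:

# Sketch: a linear SVM separating two senses of a target word.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

train_contexts = [
    "open a savings account at the bank",
    "the bank charged an overdraft fee",
    "canoes drifted past the muddy bank",
    "the river bank was covered in reeds",
]
train_senses = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

vec = TfidfVectorizer()
X = vec.fit_transform(train_contexts)  # high-dimensional, sparse features

clf = SVC(kernel="linear")             # hyperplane between the two classes
clf.fit(X, train_senses)

print(clf.predict(vec.transform(["interest rates at the bank rose"])))
# expected on this toy data: ['FINANCE']

The linear kernel keeps the separating hyperplane interpretable: the highest-weighted features correspond to the context words most indicative of each sense.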
Supervised Techniques: These use machine-learning techniques to induce a classifier from manually sense-annotated data sets. Usually, the classifier (often called a word expert) is concerned with a single word and performs a classification task in order to assign the appropriate sense to each instance of that word. The training set used to learn the classifier typically contains a set of examples in which a given target word is manually tagged with a sense from the sense inventory of a reference dictionary. Consider, by analogy, the learning process of a small child. The child does not yet know how to read or write; he or she is taught by parents at home and then by teachers in school. Children are trained to recognize the alphabet, numerals, and so on, and each of their actions is supervised by the teacher: the child learns on the basis of the output he or she is expected to produce. Similarly, a word sense disambiguation system is learned from a representative set of labeled instances drawn from the same distribution as the test set on which it will be used. Supervised WSD generally gives better results than other approaches. Supervised WSD methods include the following:
Decision Lists: A decision list is an ordered set of rules for categorizing test instances (in the case of WSD, for assigning the appropriate sense to a target word). It can be seen as a list of weighted if-then-else rules. A training set is used for inducing a set of features: each occurrence of the target word is represented as a feature vector, and these vectors are used to create the decision list, from which scores are computed. The sense receiving the maximum score for a vector is selected, as sketched below.
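A rough sketch of how such a weighted rule list could be induced and applied. The log-likelihood-ratio weighting (in the spirit of Yarowsky's decision lists), the smoothing constant, and the toy features are assumptions, since the proposal does not fix a scoring scheme:

# Sketch of a decision list for WSD: each (feature, sense) rule is weighted
# by a smoothed log-likelihood ratio; the strongest matching rule decides.
import math
from collections import Counter

def train_decision_list(examples, alpha=0.1):
    """examples: list of (set_of_context_features, sense)."""
    counts = Counter()          # (feature, sense) -> count
    feature_totals = Counter()  # feature -> count
    senses = sorted({s for _, s in examples})
    for feats, sense in examples:
        for f in feats:
            counts[(f, sense)] += 1
            feature_totals[f] += 1
    rules = []
    for f in feature_totals:
        for s in senses:
            p = counts[(f, s)] + alpha                      # evidence for s
            q = feature_totals[f] - counts[(f, s)] + alpha  # evidence against
            rules.append((math.log(p / q), f, s))           # weighted rule
    return sorted(rules, reverse=True)  # strongest evidence first

def classify(rules, feats):
    for weight, f, s in rules:          # first matching rule wins
        if f in feats:
            return s
    return None

# Hypothetical sense-tagged feature sets:
data = [({"money", "loan"}, "FINANCE"), ({"deposit", "money"}, "FINANCE"),
        ({"river", "water"}, "RIVER"), ({"river", "shore"}, "RIVER")]
rules = train_decision_list(data)
print(classify(rules, {"water", "boat"}))  # -> RIVER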
Methodology
We began from previous articles that we judged relevant to the topic. To determine the methods used in existing research, we conducted a systematic literature review. A systematic literature review is a technique for gaining an overview of previous studies in a systematic way: it summarizes the existing literature, identifies gaps in current research, and provides a framework for future work. In this study, the literature review is used to extract the methods employed in the surveyed papers. The methods used in previous research are categorized into four groups: supervised, knowledge-based, unsupervised, and semi-supervised.
To explore the methods reviewed, this paper analyzes the papers by grouping the data and maps the research into quadrants based on primary and secondary data.
