This document discusses word sense disambiguation (WSD), which aims to determine the intended meaning of words in context. It covers several approaches to WSD, including dictionary-based, supervised, semi-supervised, and unsupervised methods. The document focuses on supervised learning techniques, describing how Naive Bayes, k-NN and decision-tree classifiers can be applied to Punjabi-language corpora to automatically learn contextual rules for WSD. Support vector machines are also discussed as a popular supervised learning model for classification.
Introduction
Word sense disambiguation is a long-standing problem in Computational Linguistics. Much recent work in lexical ambiguity resolution offers the prospect that disambiguation systems might be able to receive unrestricted text as input and tag each word with its most likely sense with reasonable accuracy and efficiency. The most widely used approach is to combine the context of the word to be disambiguated with information about each of its senses. This paper presents a general automatic decision procedure for lexical ambiguity resolution based on a formula for the conceptual distance among concepts: Conceptual Density. The system needs to know how words are clustered into semantic classes, and how those semantic classes are hierarchically organized.

Literature Review
A paper on machine-readable dictionaries (MRDs) by Lesk in 1986, together with later work by many researchers, shows that external knowledge resources such as dictionaries, thesauri and MRDs have been used as structured sources of lexical knowledge for WSD. These knowledge sources define explicit sense distinctions for assigning the correct sense of a word in context. Agirre and Martinez (2001b) further identify ten kinds of useful information available in MRDs, including part of speech, semantic word associations, selectional preferences, and frequency of senses. It was again Lesk (1986) who proposed a method for predicting the correct word sense by counting word overlaps between the dictionary definitions of the words in the context of the ambiguous word. Mihalcea and Moldovan (1999) present a method that attempts to disambiguate all the nouns, verbs, adverbs, and adjectives in a given text by referring to the senses provided by WordNet. Magnini et al. (2002) explore the role of domain information in WSD using WordNet Domains (Magnini & Strapparava, 2000); they report higher precision for corpus-based WSD than for knowledge-based WSD. Ping Chen et al. (2007) discuss context-knowledge acquisition and representation methods.
Word Sense Disambiguation (WSD) is an important part of a Machine Readable Dictionary (MRD), which is extensively used in expert and intelligent systems. In every language, words or phrases can have multiple meanings depending on the context of their use. WSD selects the correct (intended) meaning using a database called a Machine Readable Dictionary (MRD). Some rudimentary MRD designs exist for a few European languages. In this paper a preliminary attempt is made towards the formulation and design of an MRD for the Punjabi language using a modified Lesk algorithm, which uses a simple method for relating the appropriate word sense to a set of dictionary meanings of the word or phrase.
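To illustrate the overlap idea behind the Lesk algorithm, the following is a minimal sketch, not the implementation used for the Punjabi MRD: the lesk_disambiguate helper and the toy English sense glosses are hypothetical placeholders, and a real system would draw glosses from the MRD and apply proper tokenization and stop-word handling.

```python
# Minimal sketch of simplified Lesk: choose the sense whose dictionary gloss
# shares the most words with the context of the ambiguous word.
# The glosses below are a hypothetical toy inventory, not the paper's MRD.

def lesk_disambiguate(context_words, sense_glosses):
    """Return the sense id whose gloss has the largest word overlap with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# Toy sense inventory for the English word "bank" (illustration only).
glosses = {
    "bank_finance": "an institution that accepts deposits and lends money",
    "bank_river": "the sloping land beside a body of water such as a river",
}

context = "he sat on the bank of the river and watched the water".split()
print(lesk_disambiguate(context, glosses))  # -> bank_river
```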
Word Sense Disambiguation (WSD) is the capability of finding the right interpretation of a given word in a given context through computation. Punjabi is among the ten most widely spoken languages and is morphologically rich, but surprisingly little work has been done on its computerization and on the development of lexical resources for it. It is therefore motivating to develop a corpus of the Punjabi language that conveys the correct sense of an ambiguous word. The availability of sense-tagged corpora contributes greatly to WSD, and some of the most accurate WSD systems use supervised learning algorithms (such as Naive Bayes, k-NN and decision-tree classifiers) to learn contextual rules or classification models automatically from sense-annotated examples. These algorithms have shown high accuracy in WSD, and we discuss these three supervised techniques, their algorithms, implementation and results when applied to a Punjabi corpus.

Approaches to WSD:
Dictionary and Knowledge-Based Methods
Supervised Methods
Semi-Supervised Methods
Unsupervised Methods
In our system we choose the supervised approach.

Support Vector Machines:
SVMs are a popular supervised learning model that can be used for classification or regression. The approach works well in high-dimensional spaces (many features in the feature vector) and can be applied effectively to small data sets. Once trained on a data set, the algorithm can classify new observations efficiently. It does this by constructing one or more hyperplanes that separate the data set into two classes.

Supervised Techniques:
Supervised WSD uses machine-learning techniques to induce a classifier from manually sense-annotated data sets. Usually the classifier (often called a word expert) is concerned with a single word and performs a classification task in order to assign the appropriate sense to each instance of that word. The training set used to learn the classifier typically contains a set of examples in which a given target word is manually tagged with a sense from the sense inventory of a reference dictionary. Consider the learning process of a small child. The child does not yet know how to read or write; he or she is taught first by the parents at home and then by teachers in school. Children are trained and moulded to recognize the alphabet, numerals, and so on, and their every action is supervised by the teacher. In effect, a child works on the basis of the output he or she is expected to produce. Similarly, a word sense disambiguation system is learned from a representative set of labelled instances drawn from the same distribution as the test set. This approach generally gives better results than the other approaches. The main supervised WSD methods include the following (a minimal sketch of such a classifier appears after the Decision Lists subsection).

Decision Lists:
A decision list is an ordered set of rules for categorizing test instances (in the case of WSD, for assigning the appropriate sense to a target word). It can be seen as a list of weighted if-then-else rules. A training set is used to induce a set of features. When a word is considered, its occurrences are first collected, and its representation as a feature vector is used to create the decision list, from which scores are calculated; the sense with the maximum score is selected.
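As a concrete illustration of the word-expert idea described above, here is a minimal sketch of one supervised classifier, assuming scikit-learn is available. The tiny sense-annotated examples are hypothetical English placeholders rather than the sense-tagged Punjabi corpus, and the Naive Bayes step could equally be swapped for an SVM (sklearn.svm.LinearSVC) or a k-NN classifier.

```python
# Minimal sketch of a supervised "word expert" for one ambiguous word.
# Assumes scikit-learn; the training examples are hypothetical placeholders,
# not the sense-annotated Punjabi corpus described in the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each training example is the context of the target word "bank",
# manually tagged with a sense label from the sense inventory.
contexts = [
    "deposited money in the bank account yesterday",
    "the bank approved the loan application",
    "fishing from the bank of the river",
    "the river overflowed its bank after the rain",
]
senses = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

# Bag-of-words context features + Naive Bayes; LinearSVC or
# KNeighborsClassifier could be dropped in here instead.
word_expert = make_pipeline(CountVectorizer(), MultinomialNB())
word_expert.fit(contexts, senses)

print(word_expert.predict(["she walked along the bank of the stream"]))
# -> ['RIVER']
```

In practice one such classifier is trained per ambiguous word, with features extracted from a fixed-size context window around each occurrence of the target word.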
METHODOLOGY
The methods considered here were identified from previous articles judged relevant to the topic. To determine which methods have been used in earlier research, a systematic literature review was carried out.
A systematic literature review is a technique for obtaining an overview of previous studies in a systematic way; it is conducted to summarize the existing literature, identify gaps in current research, and provide a framework for future work. In this study, the literature review is used to identify methods from the selected papers. The methods used in previous research are grouped into four categories: supervised, knowledge-based, unsupervised, and semi-supervised. The reviewed methods are then analysed by grouping the data, and the research is mapped into quadrants based on primary and secondary data.