
WORD SENSE DISAMBIGUATION
By Mitesh M. Khapra
Under the guidance of Prof. Pushpak Bhattacharyya
MOTIVATION
• One of the central challenges in NLP.
• Ubiquitous across all languages.

CFILT - IITB

• Needed in:
  • Machine Translation: for correct lexical choice.
  • Information Retrieval: for resolving ambiguity in queries.
  • Information Extraction: for accurate analysis of text.
• Computationally determining which sense of a word is activated by its use in a particular context.
  • E.g. I am going to withdraw money from the bank.
• A classification problem:
  • Senses → Classes
  • Context → Evidence
ROADMAP
• Knowledge Based Approaches
  • WSD using Selectional Preferences (or restrictions)
  • Overlap Based Approaches
• Machine Learning Based Approaches
  • Supervised Approaches
  • Semi-supervised Algorithms
  • Unsupervised Algorithms
• Hybrid Approaches
• Reducing Knowledge Acquisition Bottleneck
• WSD and MT
• Summary
• Future Work
KNOWLEDGE BASED v/s MACHINE LEARNING BASED v/s HYBRID APPROACHES
• Knowledge Based Approaches
  • Rely on knowledge resources like WordNet, Thesaurus etc.
  • May use grammar rules for disambiguation.
  • May use hand coded rules for disambiguation.

• Machine Learning Based Approaches
  • Rely on corpus evidence.
  • Train a model using a tagged or untagged corpus.
  • Probabilistic/Statistical models.

• Hybrid Approaches
  • Use corpus evidence as well as semantic relations from WordNet.
WSD USING SELECTIONAL PREFERENCES AND ARGUMENTS
• Sense 1: This airline serves dinner in the evening flight.
  • serve (Verb): agent, object – edible
• Sense 2: This airline serves the sector between Agra & Delhi.
  • serve (Verb): agent, object – sector

• Requires exhaustive enumeration of:
  • Argument-structure of verbs.
  • Selectional preferences of arguments.
  • Description of properties of words such that meeting the selectional preference criteria can be decided.
    E.g. This flight serves the "region" between Mumbai and Delhi.
    How do you decide if "region" is compatible with "sector"?
OVERLAP BASED APPROACHES
• Require a Machine Readable Dictionary (MRD).
• Find the overlap between the features of the different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).
• These features could be sense definitions, example sentences, hypernyms etc.
• The features could also be given weights.
• The sense which has the maximum overlap is selected as the contextually appropriate sense.
LESK’S ALGORITHM
• Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.
• Context Bag: contains the words in the definition of each sense of each context word.
• E.g. “On burning coal we get ash.”

Ash
• Sense 1: Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.
• Sense 2: The solid residue left when combustible material is thoroughly burned or oxidized.
• Sense 3: To convert into ash.

Coal
• Sense 1: A piece of glowing carbon or burnt wood.
• Sense 2: Charcoal.
• Sense 3: A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning.

In this case Sense 2 of ash would be the winner sense.
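The overlap computation above can be sketched in a few lines. This is a minimal illustration, not the full algorithm: the glosses and stopword list are toy stand-ins for a real MRD.

```python
# Lesk-style overlap: score each sense of the target word by the overlap
# between its gloss (sense bag) and the union of the glosses of the
# context words (context bag). Glosses below are abridged toy entries.

STOP = {"a", "an", "the", "of", "is", "or", "when", "that", "by", "for", "and", "as"}

def words(gloss):
    return {w.lower().strip(".,") for w in gloss.split()} - STOP

def lesk(target_glosses, context_glosses):
    """target_glosses: {sense: gloss}; context_glosses: {word: [glosses]}.
    Returns (best_sense, {sense: overlap})."""
    context_bag = set()
    for glosses in context_glosses.values():
        for g in glosses:
            context_bag |= words(g)
    scores = {sense: len(words(g) & context_bag)
              for sense, g in target_glosses.items()}
    return max(scores, key=scores.get), scores

ash = {
    "tree":    "trees of the olive family with pinnate leaves and gray branches",
    "residue": "the solid residue left when combustible material is thoroughly burned",
}
coal = {"coal": [
    "a piece of glowing carbon or burnt wood",
    "a black solid combustible substance used as a fuel for burning",
]}
best, scores = lesk(ash, coal)
print(best)  # residue -- its gloss shares 'solid' and 'combustible' with coal's glosses
```

Note that the overlap is purely lexical: "burned" in the residue gloss does not match "burnt" or "burning" in coal's glosses, which is exactly the sparse-match problem discussed later.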
WALKER’S ALGORITHM
• A Thesaurus Based approach.
• Step 1: For each sense of the target word find the thesaurus category to which that sense belongs.
• Step 2: Calculate the score for each sense using the context words. A context word adds 1 to the score of a sense if the thesaurus category of the word matches that of the sense.

• E.g. The money in this bank fetches an interest of 8% per annum
  • Target word: bank
  • Clue words from the context: money, interest, annum, fetch

  Context word | Sense 1: Finance | Sense 2: Location
  Money        | +1               | 0
  Interest     | +1               | 0
  Fetch        | 0                | 0
  Annum        | +1               | 0
  Total        | 3                | 0

  (A context word adds 1 to a sense when the topic of the word matches that of the sense.)
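The scoring in the table can be reproduced directly. The category table below is a hypothetical stand-in for a real thesaurus such as Roget's; the entries are invented for this example.

```python
# Walker's thesaurus scoring: each context word whose thesaurus category
# matches a sense's category adds 1 to that sense's score.

THESAURUS = {  # word -> set of thesaurus categories (toy entries)
    "money":    {"FINANCE"},
    "interest": {"FINANCE"},
    "annum":    {"FINANCE"},
    "fetch":    {"MOTION"},
}

def walker_score(sense_categories, context_words):
    """sense_categories: {sense: category}. Returns {sense: score}."""
    scores = {sense: 0 for sense in sense_categories}
    for word in context_words:
        for sense, category in sense_categories.items():
            if category in THESAURUS.get(word, set()):
                scores[sense] += 1
    return scores

bank_senses = {"finance": "FINANCE", "location": "LOCATION"}
scores = walker_score(bank_senses, ["money", "interest", "annum", "fetch"])
print(scores)                        # {'finance': 3, 'location': 0}
print(max(scores, key=scores.get))   # finance
```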
WSD USING CONCEPTUAL DENSITY
• Select a sense based on the relatedness of that word-sense to the context.
• Relatedness is measured in terms of conceptual distance (i.e. how close the concept represented by the word and the concept represented by its context words are).
• This approach uses a structured hierarchical semantic net (WordNet) for finding the conceptual distance.
• The smaller the conceptual distance, the higher the conceptual density (i.e. if all words in the context are strong indicators of a particular concept then that concept will have a higher density).
CONCEPTUAL DENSITY (EXAMPLE)
• The dots in the figure represent the senses of the word to be disambiguated or the senses of the words in context.
• The CD formula will yield the highest density for the sub-hierarchy containing more senses.
• The sense of W contained in the sub-hierarchy with the highest CD will be chosen.
CONCEPTUAL DENSITY (EXAMPLE)

[Figure: WordNet sub-hierarchies for the nouns in the sentence, rooted at administrative_unit (CD = 0.256) and body (CD = 0.062), covering concepts such as division, committee, department, government department, local department, jury, operation, police department and administration.]

The jury(2) praised the administration(3) and operation(8) of Atlanta Police Department(1).

• Step 1: Make a lattice of the nouns in the context, their senses and hypernyms.
• Step 2: Compute the conceptual density of the resultant concepts (sub-hierarchies).
• Step 3: The concept with the highest CD is selected.
• Step 4: Select the senses below the selected concept as the correct sense for the respective words.
WSD USING RANDOM WALK ALGORITHM

[Figure: a graph whose vertices are the candidate senses S1, S2, S3 of the words "Bell", "ring", "church" and "Sunday", connected by weighted edges; each vertex carries a score (e.g. 0.46, 0.97, 0.42, 0.49, 0.35, 0.63, 0.58, 0.92, 0.56, 0.67).]

• Step 1: Add a vertex for each possible sense of each word in the text.
• Step 2: Add weighted edges using definition based semantic similarity (Lesk’s method).
• Step 3: Apply a graph based ranking algorithm to find the score of each vertex (i.e. for each word sense).
• Step 4: Select the vertex (sense) which has the highest score.
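The ranking in Step 3 can be sketched with a weighted PageRank-style iteration. The graph below is a toy stand-in: the sense nodes, edge weights and the claim that gloss similarity produced them are all assumptions for illustration.

```python
# PageRank-style ranking over a sense graph: vertices are word senses,
# weighted edges come from gloss similarity, and the iteration spreads
# rank mass along edge weights. Graph and weights are invented.

def pagerank(edges, nodes, damping=0.85, iterations=50):
    """edges: {(u, v): weight} (undirected). Returns {node: score}."""
    adj = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            rank = 0.0
            for m, w in adj[n].items():
                total = sum(adj[m].values())   # m spreads its score by weight
                if total > 0:
                    rank += score[m] * w / total
            new[n] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score

nodes = ["bell#1", "bell#2", "ring#1", "church#1"]
edges = {("bell#1", "ring#1"): 0.9, ("bell#1", "church#1"): 0.5,
         ("bell#2", "ring#1"): 0.1}
scores = pagerank(edges, nodes)
# The densely connected sense bell#1 outranks the weakly connected bell#2.
print(scores["bell#1"] > scores["bell#2"])  # True
```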
KB APPROACHES – COMPARISONS

Algorithm | Accuracy
WSD using Selectional Restrictions | 44% on Brown Corpus
Lesk’s algorithm | 50-60% on short samples of “Pride and Prejudice” and some “news stories”
WSD using conceptual density | 54% on Brown corpus
WSD using Random Walk Algorithms | 54% on SEMCOR corpus, which has a baseline accuracy of 37%
Walker’s algorithm | 50% when tested on 10 highly polysemous English words
KB APPROACHES – CONCLUSIONS
• Drawbacks of WSD using Selectional Restrictions
  • Needs an exhaustive Knowledge Base.

• Drawbacks of Overlap based approaches
  • Dictionary definitions are generally very small.
  • Dictionary entries rarely take into account the distributional constraints of different word senses (e.g. selectional preferences, kinds of prepositions, etc. – cigarette and ash never co-occur in a dictionary).
  • Suffer from the problem of sparse match.
  • Proper nouns are not present in an MRD. Hence these approaches fail to capture the strong clues provided by proper nouns.
    E.g. “Sachin Tendulkar” will be a strong indicator of the category “sports”: Sachin Tendulkar plays cricket.
NAÏVE BAYES

ŝ = argmax_{s ∈ senses} Pr(s|Vw)

• ‘Vw’ is a feature vector consisting of:
  • POS of w
  • Semantic & Syntactic features of w
  • Collocation vector (set of words around it) – typically consists of the next word (+1), the next-to-next word (+2), -2, -1 and their POSs
  • Co-occurrence vector (number of times w occurs in a bag of words around it)

• Applying Bayes rule and the naive independence assumption:

ŝ = argmax_{s ∈ senses} Pr(s) · Π_{i=1..n} Pr(Vw_i|s)
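The decision rule above can be implemented compactly in log space. This is a minimal sketch with toy training examples and add-one smoothing (a choice made here for the example, not prescribed by the slide).

```python
# Naive Bayes WSD: pick argmax_s Pr(s) * prod_i Pr(f_i | s), computed
# with logs and add-one smoothing. Features and senses are toy data.
from collections import Counter
import math

class NaiveBayesWSD:
    def __init__(self):
        self.sense_counts = Counter()
        self.feature_counts = {}   # sense -> Counter of feature occurrences
        self.vocab = set()

    def train(self, examples):     # examples: list of (feature_set, sense)
        for features, sense in examples:
            self.sense_counts[sense] += 1
            fc = self.feature_counts.setdefault(sense, Counter())
            for f in features:
                fc[f] += 1
                self.vocab.add(f)

    def classify(self, features):
        total = sum(self.sense_counts.values())
        best, best_lp = None, float("-inf")
        for sense, n in self.sense_counts.items():
            lp = math.log(n / total)                 # log prior Pr(s)
            fc = self.feature_counts[sense]
            denom = sum(fc.values()) + len(self.vocab)
            for f in features:
                lp += math.log((fc[f] + 1) / denom)  # smoothed Pr(f|s)
            if lp > best_lp:
                best, best_lp = sense, lp
        return best

clf = NaiveBayesWSD()
clf.train([({"money", "deposit"},   "bank/finance"),
           ({"interest", "money"},  "bank/finance"),
           ({"river", "water"},     "bank/shore")])
print(clf.classify({"money", "interest"}))  # bank/finance
```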
DECISION LIST ALGORITHM
• Based on the ‘One sense per collocation’ property.
  • Nearby words provide strong and consistent clues as to the sense of a target word.
• Collect a large set of collocations for the ambiguous word.
• Calculate word-sense probability distributions for all such collocations.
• Calculate the log-likelihood ratio (assuming there are only two senses for the word; this can easily be extended to ‘k’ senses):

  Log( Pr(Sense-A|Collocation_i) / Pr(Sense-B|Collocation_i) )

• Higher log-likelihood = more predictive evidence.
• Collocations are ordered in a decision list, with the most predictive collocations ranked highest.
DECISION LIST ALGORITHM (CONTD.)

[Figure: training data and the resultant decision list.]

Classification of a test sentence is based on the highest ranking collocation found in the test sentence.
E.g. …plucking flowers affects plant growth…
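The build-and-apply cycle can be sketched as follows. The collocation counts are invented, and the smoothing constant is an assumption made to keep the log ratio finite on zero counts.

```python
# Decision list sketch: rank collocations by |log(Pr(A|coll)/Pr(B|coll))|
# (with smoothing), then classify with the highest-ranked collocation
# that appears in the test context.
import math

def build_decision_list(counts, alpha=0.1):
    """counts: {collocation: (count_sense_A, count_sense_B)}.
    Returns rules sorted by decreasing evidence strength."""
    rules = []
    for coll, (a, b) in counts.items():
        llr = math.log((a + alpha) / (b + alpha))
        rules.append((abs(llr), coll, "A" if llr > 0 else "B"))
    return sorted(rules, reverse=True)

def classify(rules, context, default="A"):
    for _, coll, sense in rules:
        if coll in context:     # first matching rule decides
            return sense
    return default

# "plant": sense A = living organism, sense B = factory (toy counts)
counts = {"flower": (30, 1), "growth": (12, 4), "manufacturing": (0, 25)}
rules = build_decision_list(counts)
print(classify(rules, {"plucking", "flower", "growth"}))  # A
```

For the example sentence, "manufacturing" is the strongest rule overall but is absent from the context, so the next rule, "flower", fires and yields the living-organism sense.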
EXEMPLAR BASED WSD (K-NN)
• An exemplar based classifier is constructed for each word to be disambiguated.
• Step 1: From each sense marked sentence containing the ambiguous word, a training example is constructed using:
  • POS of w as well as POS of neighboring words.
  • Local collocations
  • Co-occurrence vector
  • Morphological features
  • Subject-verb syntactic dependencies
• Step 2: Given a test sentence containing the ambiguous word, a test example is similarly constructed.
• Step 3: The test example is then compared to all training examples and the k-closest training examples are selected.
• Step 4: The sense which is most prevalent amongst these “k” examples is then selected as the correct sense.
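Steps 2-4 can be sketched with binary feature sets and Hamming distance. The feature names and training data are toy values; a real system would use the richer feature set listed above.

```python
# Exemplar-based (k-NN) WSD: each sense-tagged example is a feature set;
# a test example takes the majority sense among its k nearest training
# examples under Hamming distance.
from collections import Counter

def hamming(a, b, features):
    """Number of features present in exactly one of the two sets."""
    return sum((f in a) != (f in b) for f in features)

def knn_classify(train, test_features, k=3):
    """train: list of (feature_set, sense). Returns the majority sense."""
    features = set().union(*(f for f, _ in train), test_features)
    ranked = sorted(train, key=lambda ex: hamming(ex[0], test_features, features))
    return Counter(sense for _, sense in ranked[:k]).most_common(1)[0][0]

train = [
    ({"pos:NN", "coll:interest_rate", "cooc:money"},   "bank/finance"),
    ({"pos:NN", "coll:bank_account",  "cooc:deposit"}, "bank/finance"),
    ({"pos:NN", "coll:river_bank",    "cooc:water"},   "bank/shore"),
]
print(knn_classify(train, {"pos:NN", "cooc:money", "coll:interest_rate"}, k=1))
# bank/finance
```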
WSD USING SVMS
• SVM is a binary classifier which finds a hyperplane with the largest margin that separates training examples into 2 classes.
• As SVMs are binary classifiers, a separate classifier is built for each sense of the word.
• Training Phase: Using a tagged corpus, for every sense of the word an SVM is trained using the following features:
  • POS of w as well as POS of neighboring words.
  • Local collocations
  • Co-occurrence vector
  • Features based on syntactic relations (e.g. headword, POS of headword, voice of headword etc.)
• Testing Phase: Given a test sentence, a test example is constructed using the above features and fed as input to each binary classifier.
• The correct sense is selected based on the labels returned by the classifiers.
WSD USING PERCEPTRON TRAINED HMM
• WSD is treated as a sequence labeling task.
• The class space is reduced by using WordNet’s super senses instead of actual senses.
• A discriminative HMM is trained using the following features:
  • POS of w as well as POS of neighboring words.
  • Local collocations
  • Shape of the word and neighboring words
    E.g. for s = “Merrill Lynch & Co”, shape(s) = Xx*Xx*&Xx
• Lends itself well to NER, as labels like “person”, “location”, “time” etc. are included in the super sense tag set.
SUPERVISED APPROACHES – COMPARISONS

Approach | Average Precision | Average Recall | Corpus | Average Baseline Accuracy
Naïve Bayes | 64.13% | Not reported | Senseval 3 – All Words Task | 60.90%
Decision Lists | 96% | Not applicable | Tested on a set of 12 highly polysemous English words | 63.9%
Exemplar Based disambiguation (k-NN) | 68.6% | Not reported | WSJ6 containing 191 content words | 63.7%
SVM | 72.4% | 72.4% | Senseval 3 – Lexical sample task (used for disambiguation of 57 words) | 55.2%
Perceptron trained HMM | 67.60% | 73.74% | Senseval 3 – All Words Task | 60.90%
SUPERVISED APPROACHES – CONCLUSIONS
• General Comments
  • Use corpus evidence instead of relying on dictionary defined senses.
  • Can capture the important clues provided by proper nouns, because proper nouns do appear in a corpus.
• Naïve Bayes
  • Suffers from data sparseness.
  • Since the scores are a product of probabilities, some weak features might pull down the overall score for a sense.
  • A large number of parameters need to be trained.

• Decision Lists
  • A word-specific classifier. A separate classifier needs to be trained for each word.
  • Uses the single most predictive feature, which eliminates the drawback of Naïve Bayes.
SUPERVISED APPROACHES – CONCLUSIONS (CONTD.)
• Exemplar Based k-NN
  • A word-specific classifier.
  • Will not work for unknown words which do not appear in the corpus.
  • Uses a diverse set of features (including morphological and noun-subject-verb pairs).
• SVM
  • A word-sense specific classifier.
  • Gives the highest improvement over the baseline accuracy.
  • Uses a diverse set of features.

• HMM
  • Significant in light of the fact that a fine distinction between the various senses of a word is not needed in tasks like MT.
  • A broad coverage classifier, as the same knowledge sources can be used for all words belonging to a super sense.
  • Even though the polysemy was reduced significantly, there was no comparably significant improvement in the performance.
SEMI-SUPERVISED DECISION LIST ALGORITHM
• Based on Yarowsky’s supervised algorithm that uses Decision Lists.
• Step 1: Train the Decision List algorithm using a small amount of seed data.
• Step 2: Classify the entire sample set using the trained classifier.
• Step 3: Create new seed data by adding those members which are tagged as Sense-A or Sense-B with high probability.
• Step 4: Retrain the classifier using the increased seed data.
• Exploits the “One sense per discourse” property:
  • Identify words that are tagged with low confidence and label them with the sense which is dominant for that document.
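The grow-the-seed-set loop of Steps 1-4 can be sketched with a much simpler "classifier" than a decision list: here a sense's clue set is just the words absorbed from confidently labeled samples. The samples and seed words are invented, and the confidence test (a unique non-zero overlap) is an assumption standing in for a real probability threshold.

```python
# Bootstrapping sketch: start from one seed clue per sense, label samples
# whose overlap with one sense's clues is uniquely highest, then absorb
# the labeled samples' words as new clues and repeat.

def bootstrap(samples, seeds, rounds=3):
    """samples: list of word sets; seeds: {sense: seed_word}.
    Returns {sample_index: sense} for confidently labeled samples."""
    labeled = {}
    clues = {sense: {word} for sense, word in seeds.items()}
    for _ in range(rounds):
        # Step 2: classify every sample with the current clue sets.
        for i, words in enumerate(samples):
            hits = {s: len(words & c) for s, c in clues.items()}
            best = max(hits, key=hits.get)
            # Step 3: accept only confident (unique, non-zero) winners.
            if hits[best] > 0 and list(hits.values()).count(hits[best]) == 1:
                labeled[i] = best
        # Step 4: "retrain" -- absorb labeled samples' words as clues.
        for i, sense in labeled.items():
            clues[sense] |= samples[i]
    return labeled

samples = [{"plant", "life", "species"},        # sense A context
           {"plant", "species", "animal"},      # labeled only in round 2
           {"plant", "manufacturing", "unit"}]  # sense B context
labels = bootstrap(samples, {"A": "life", "B": "manufacturing"})
print({i: labels[i] for i in sorted(labels)})   # {0: 'A', 1: 'A', 2: 'B'}
```

Sample 1 shares no word with either seed in round 1, but after round 1 the clue set for sense A has absorbed "species" from sample 0, so sample 1 is labeled in round 2 — the seed set grows exactly as the slide describes.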
INITIALIZATION, PROGRESS AND CONVERGENCE

[Figure: samples partitioned into Life and Manufacturing regions with residual data in between; the seed set grows with each iteration, and the process stops when the residual set stabilizes.]
SEMI-SUPERVISED APPROACHES – COMPARISONS & CONCLUSIONS

Approach | Average Precision | Corpus | Average Baseline Accuracy
Supervised Decision Lists | 96.1% | Tested on a set of 12 highly polysemous English words | 63.9%
Semi-Supervised Decision Lists | 96.1% | Tested on a set of 12 highly polysemous English words | 63.9%

• Works at par with its supervised version even though it needs a significantly smaller amount of tagged data.
• Has all the advantages and disadvantages of its supervised version.
HYPERLEX
• KEY IDEA
  • Instead of using “dictionary defined senses”, extract the “senses from the corpus” itself.
  • These “corpus senses” or “uses” correspond to clusters of similar contexts for a word.

[Figure: co-occurrence clusters for a target word, with context words such as (river), (electricity), (water), (flow) forming one cluster and (victory), (world), (cup), (team) another.]
DETECTING ROOT HUBS
• Different uses of a target word form highly interconnected bundles (or high density components).
• In each high density component one of the nodes (the hub) has a higher degree than the others.
• Step 1: Construct the co-occurrence graph, G.
• Step 2: Arrange the nodes in G in decreasing order of in-degree.
• Step 3: Select the node from G which has the highest frequency. This node will be the hub of the first high density component.
• Step 4: Delete this hub and all its neighbors from G.
• Step 5: Repeat Steps 3 and 4 to detect the hubs of the other high density components.
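The select-and-delete loop can be sketched on a toy co-occurrence graph. Two simplifications are assumptions of this sketch: the graph is undirected (so plain degree stands in for in-degree/frequency), and the words and edges are invented rather than taken from a corpus.

```python
# Root-hub detection sketch: repeatedly take the highest-degree node in
# the co-occurrence graph as a hub, then delete it with its neighbors.

def detect_hubs(graph, min_degree=2):
    """graph: {node: set_of_neighbors} (undirected). Returns hubs in order."""
    g = {n: set(nbrs) for n, nbrs in graph.items()}
    hubs = []
    while g:
        hub = max(g, key=lambda n: len(g[n]))      # strongest remaining node
        if len(g[hub]) < min_degree:               # no dense component left
            break
        hubs.append(hub)
        removed = g[hub] | {hub}                   # hub plus its neighbors
        g = {n: nbrs - removed for n, nbrs in g.items() if n not in removed}
    return hubs

# Toy graph with two bundles of uses for a word like "barrage":
graph = {
    "water": {"river", "flow", "dam"}, "river": {"water", "flow"},
    "flow": {"water", "river"},        "dam": {"water"},
    "military": {"fire", "attack"},    "fire": {"military", "attack"},
    "attack": {"military", "fire"},
}
print(detect_hubs(graph))  # ['water', 'military']
```

Deleting each hub's neighborhood is what keeps the second hub from being a satellite of the first: once the water cluster is removed, the strongest remaining node necessarily belongs to a different use.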
DETECTING ROOT HUBS (CONTD.)

[Figure: the four high density components detected for “barrage” and their characteristic context words.]
DELINEATING COMPONENTS
• Attach each node to the root hub closest to it.
• The distance between two nodes is measured as the smallest sum of the weights of the edges on the paths linking them.
• Step 1: Add the target word to the graph G.
• Step 2: Compute a Minimum Spanning Tree (MST) over G taking the target word as the root.
DISAMBIGUATION
• Each node in the MST is assigned a score vector with as many dimensions as there are components.
  E.g. pluie (rain) belongs to the component EAU (water) and d(eau, pluie) = 0.82, so s_pluie = (0.55, 0, 0, 0).
• Step 1: For a given context, add the score vectors of all words in that context.
• Step 2: Select the component that receives the highest weight.
DISAMBIGUATION (EXAMPLE)
Le barrage recueille l’eau à la saison des pluies.
(The dam collects water during the rainy season.)

• EAU is the winner in this case.
• A reliability coefficient (ρ) can be calculated from the difference (δ) between the best score and the second best score:
  ρ = 1 – 1/(1 + δ)
YAROWSKY’S ALGORITHM (WSD USING ROGET’S THESAURUS CATEGORIES)
• Based on the following 3 observations:
  • Different conceptual classes of words (say ANIMALS and MACHINES) tend to appear in recognizably different contexts.
  • Different word senses belong to different conceptual classes (e.g. crane).
  • A context based discriminator for the conceptual classes can serve as a context based discriminator for the members of those classes.
• Identify salient words in the collective context of the thesaurus category and weigh them appropriately: Weight(word) = Salience(word).

ANIMAL/INSECT: species (2.3), family (1.7), bird (2.6), fish (2.4), egg (2.2), coat (2.5), female (2.0), eat (2.2), nest (2.5), wild
TOOLS/MACHINERY: tool (3.1), machine (2.7), engine (2.6), blade (3.8), cut (2.2), saw (2.5), lever (2.0), wheel (2.2), piston (2.5)
DISAMBIGUATION
• Predict the appropriate category for an ambiguous word using the weights of the words in its context: take the ARGMAX over Roget categories (RCat) of the summed context-word weights.

…lift water and to grind grain. Treadmills attached to cranes were used to lift heavy objects from Roman times, …

TOOLS/MACHINE | Weight
lift | 2.44
grain | 1.68
used | 1.32
heavy | 1.28
Treadmills | 1.16
attached | 0.58
grind | 0.29
Water | 0.11
TOTAL | 11.30

ANIMAL/INSECT | Weight
Water | 0.76
TOTAL | 0.76
LIN’S APPROACH
• Two different words are likely to have similar meanings if they occur in identical local contexts.
• E.g. The facility will employ 500 new employees.

Senses of facility: installation, proficiency, adeptness, readiness, toilet/bathroom

Subjects of “employ”:
Word | Freq | Log Likelihood
ORG | 64 | 50.4
Plant | 14 | 31.0
Company | 27 | 28.6
Industry | 9 | 14.6
Unit | 9 | 9.32
Aerospace | 2 | 5.81
Memory device | 1 | 5.79
Pilot | 2 | 5.37

In this case Sense 1 (installation) would be the winner sense.
SIMILARITY AND HYPERNYMY
• sim(A, B) = 2 · log P(common(A, B)) / (log P(A) + log P(B))   (Lin’s information-theoretic measure)

• If A is a “Hill” and B is a “Coast” then the commonality between A and B is that “A is a GeoForm and B is a GeoForm”.
• sim(Hill, Coast) = 2 · log P(GeoForm) / (log P(Hill) + log P(Coast))

• In general, similarity is directly proportional to the probability that the two words have the same super class (Hypernym).
• To maximize similarity, select the sense which has the same hypernym as most of the Selector words.
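The similarity computation can be sketched numerically. The concept probabilities below are illustrative values chosen for this example, not corpus-derived figures.

```python
# Lin-style similarity: sim(A, B) = 2*log P(C) / (log P(A) + log P(B)),
# where C is the common superclass (hypernym). Probabilities are toy values.
import math

P = {"Hill": 1.89e-5, "Coast": 2.16e-5, "GeoForm": 1.70e-4}

def lin_sim(a, b, common):
    return 2 * math.log(P[common]) / (math.log(P[a]) + math.log(P[b]))

print(round(lin_sim("Hill", "Coast", "GeoForm"), 2))  # 0.8
```

Because log P(C) is negative and grows in magnitude as C becomes rarer, a more specific shared hypernym yields a similarity closer to 1, while a very general one (high P(C)) drives the score toward 0.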
WSD USING PARALLEL CORPORA
• A word having multiple senses in one language will have distinct translations in another language, based on the context in which it is used.
• The translations can thus be considered as contextual indicators of the sense of the word.
• Sense Model
• Concept Model
UNSUPERVISED APPROACHES – COMPARISONS

Approach | Precision | Average Recall | Corpus | Baseline
Lin’s Algorithm | 68.5% (a result was considered correct if the similarity between the predicted sense and the actual sense was greater than 0.27) | Not reported | Trained using a WSJ corpus containing 25 million words; tested on 7 SemCor files containing 2832 polysemous nouns | 64.2%
Hyperlex | 97% (words tagged with confidence below a threshold were left untagged) | 82% | Tested on a set of 10 highly polysemous French words | 73%
WSD using Roget’s Thesaurus categories | 92% (average degree of polysemy was 3) | Not reported | Tested on a set of 12 highly polysemous English words | Not reported
WSD using parallel corpora | SM: 62.4%, CM: 67.2% | SM: 61.6%, CM: 65.1% | Trained using an English-Spanish parallel corpus; tested using Senseval 2 – All Words task (only nouns were considered) | Not reported
UNSUPERVISED APPROACHES – CONCLUSIONS
• General Comments
  • Combine the advantages of supervised and knowledge based approaches.
  • Like supervised approaches, they extract evidence from a corpus.
  • Like knowledge based approaches, they do not need a tagged corpus.
• Lin’s Algorithm
  • A general purpose broad coverage approach.
  • Can even work for words which do not appear in the corpus.

• Hyperlex
  • Use of small world properties was a first of its kind approach for automatically extracting corpus evidence.
  • A word-specific classifier.
  • The algorithm would fail to distinguish between the finer senses of a word (e.g. the medicinal and narcotic senses of “drug”).
UNSUPERVISED APPROACHES – CONCLUSIONS (CONTD.)
• Yarowsky’s Algorithm
  • A broad coverage classifier.
  • Can be used for words which do not appear in the corpus. But it was not tested on an “all word corpus”.
• WSD using Parallel Corpora
  • Can distinguish even between the finer senses of a word, because even finer senses of a word get translated as distinct words.
  • Needs a word aligned parallel corpus, which is difficult to get.
  • An exceptionally large number of parameters need to be trained.
AN ITERATIVE APPROACH TO WSD
• Uses semantic relations (synonymy and hypernymy) from WordNet.
• Extracts collocational and contextual information from WordNet (gloss) and a small amount of tagged data.
• Monosemic words in the context serve as a seed set of disambiguated words.
• In each iteration new words are disambiguated based on their semantic distance from already disambiguated words.
• It would be interesting to exploit other semantic relations available in WordNet.
SENSELEARNER
• Uses some tagged data to build a semantic language model for words seen in the training corpus.
• Uses WordNet to derive semantic generalizations for words which are not observed in the corpus.

Semantic Language Model
• For each POS tag, a training set is constructed from the corpus.
• Each training example is represented as a feature vector and a class label, which is word#sense.
• In the testing phase, for each test sentence, a similar feature vector is constructed.
• The trained classifier is used to predict the word and the sense.
• If the predicted word is the same as the observed word then the predicted sense is selected as the correct sense.
SENSELEARNER (CONTD.)
Semantic Generalizations
• Improves on Lin’s algorithm by using semantic dependencies from WordNet.
• E.g. if “drink water” is observed in the corpus then using the hypernymy tree we can derive the dependency “take-in liquid”.
• “take-in liquid” can then be used to disambiguate an instance of the word tea as in “take tea”, by using the hypernymy-hyponymy relations.
STRUCTURAL SEMANTIC INTERCONNECTIONS (SSI)
• An iterative approach.
• Uses the following relations:
  • hypernymy (car#1 is a kind of vehicle#1) denoted by (kind-of)
  • hyponymy (the inverse of hypernymy) denoted by (has-kind)
  • meronymy (room#1 has-part wall#1) denoted by (has-part)
  • holonymy (the inverse of meronymy) denoted by (part-of)
  • pertainymy (dental#1 pertains-to tooth#1) denoted by (pert)
  • attribute (dry#1 value-of wetness#1) denoted by (attr)
  • similarity (beautiful#1 similar-to pretty#1) denoted by (sim)
  • gloss denoted by (gloss)
  • context denoted by (context)
  • domain denoted by (dl)
• Monosemic words serve as the seed set for disambiguation.
STRUCTURAL SEMANTIC INTERCONNECTIONS (SSI) (CONTD.)

[Figure: a semantic relations graph for the two senses of the word bus (i.e. vehicle and connector).]
HYBRID APPROACHES – COMPARISONS & CONCLUSIONS

Approach | Precision | Average Recall | Corpus | Baseline
An Iterative Approach to WSD | 92.2% | 55% | Trained using 179 texts from SemCor; tested using 52 texts created from 6 SemCor files | Not reported
SenseLearner | 64.6% | 64.6% | Senseval-3 All Words Task | 60.9%
SSI | 68.5% | 68.4% | Senseval-3 Gloss Disambiguation Task | Not reported

General Comments
• Combine information obtained from multiple knowledge sources.
• Use a very small amount of tagged data.
OVERCOMING THE KNOWLEDGE BOTTLENECK

Using Search Engines
• Construct search queries using monosemic words and phrases from the gloss of a synset.
• Feed these queries to a search engine.
• From the retrieved documents extract the sentences which contain the search queries.

Using Equivalent Pseudo Words
• Use monosemic words belonging to each sense of an ambiguous word.
• Use the occurrences of these words in the corpus as training examples for the ambiguous word.
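The pseudo-word idea can be sketched as follows: occurrences of two monosemic stand-ins are merged into one artificial ambiguous token, and the original word serves as a free sense label. All words and sentences here are invented for illustration.

```python
# Equivalent-pseudo-word sketch: replace each occurrence of a monosemic
# stand-in with a merged pseudo-word; the replaced word becomes the
# sense label, yielding sense-tagged training data at no annotation cost.

def make_pseudo_training(corpus, stand_ins, pseudo="banana_door"):
    """corpus: list of token lists. Returns (masked_tokens, sense_label) pairs."""
    examples = []
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok in stand_ins:
                masked = sent[:i] + [pseudo] + sent[i + 1:]
                examples.append((masked, tok))  # original word = sense label
    return examples

corpus = [["she", "ate", "a", "banana"], ["he", "opened", "the", "door"]]
examples = make_pseudo_training(corpus, {"banana", "door"})
print(examples[0])  # (['she', 'ate', 'a', 'banana_door'], 'banana')
```

A WSD system trained to recover "banana" vs "door" for the pseudo-word faces a task analogous to distinguishing the senses of a genuinely ambiguous word, which is what makes this a cheap evaluation and training device.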
DOES WSD HELP MT?
• Contradictory results have been published, hence it is difficult to decide conclusively.
• Depends on the quality of the underlying MT model.
• The bias of the BLEU score towards phrasal coherency often gives misleading results.

E.g. (Chinese to English translation)
• Hiero (SMT model): Australian minister said that North Korea bad behavior will be more aid.
• Hiero (SMT model) + WSD: Australian minister said that North Korea bad behavior will be unable to obtain more aid.
• Here the second sentence is more appropriate. But since the phrase “unable to obtain” was not observed in the language model, the second sentence gets a lower BLEU score.
SUMMARY
• Dictionary defined senses do not provide enough surface cues.
• Complete dependence on dictionary defined senses is the primary reason for low accuracies in Knowledge Based approaches.
• Extracting “sense definitions” or “usage patterns” from the corpus greatly improves the accuracy.
• Word-specific classifiers are able to attain extremely good accuracies but suffer from the problem of non-reusability.
• Unsupervised algorithms are capable of performing at par with supervised algorithms.
• Relying on the single most predictive evidence increases the accuracy.
SUMMARY (CONTD.)
• Classifiers that exploit syntactic dependencies between words are able to perform large scale disambiguation (generic classifiers) and at the same time give reasonably good accuracies.
• Using a diverse set of features improves WSD accuracy.
• WSD results are better when the degree of polysemy is reduced.
• Hyperlex (unsupervised corpus based), Lin’s algorithm (unsupervised corpus based) and SSI (hybrid) look promising for resource-poor Indian languages.
FUTURE WORK
• Use unsupervised or hybrid approaches to develop a multilingual WSD engine (focusing on MT).
• Automatically generate sense tagged data.
• Explore the possibility of using an ensemble of WSD algorithms.
• Explore whether it is possible to evaluate the role of WSD in MT (the evaluation should be independent of the MT model being used).
REFERENCES
 Michael Lesk. 1986. “Automatic sense disambiguation using machine readable dictionaries:
how to tell a pine cone from an ice cream cone”, in Proceedings of the 5th annual international
conference on Systems documentation, Toronto, Ontario, Canada, 1986.
 Walker D. and Amsler R. 1986. "The Use of Machine Readable Dictionaries in Sublanguage Analysis", in Analyzing Language in Restricted Domains, Grishman and Kittredge (eds), LEA Press, pp. 69-83, 1986.
 Yarowsky, David. 1992. "Word sense disambiguation using statistical models of Roget's
categories trained on large corpora", in Proceedings of the 14th International Conference on
Computational Linguistics (COLING), Nantes, France, 454-460, 1992.
 Yarowsky, David. 1994. "Decision lists for lexical ambiguity resolution: Application to accent
restoration in Spanish and French", in Proceedings of the 32nd Annual Meeting of the
Association for Computational Linguistics (ACL), Las Cruces, U.S.A., 88-95, 1994.
 Yarowsky, David. 1995. "Unsupervised word sense disambiguation rivaling supervised
methods", in Proceedings of the 33rd Annual Meeting of the Association for Computational
Linguistics (ACL), Cambridge, MA, 189-196, 1995.
 Agirre, Eneko & German Rigau. 1996. "Word sense disambiguation using conceptual density",
in Proceedings of the 16th International Conference on Computational Linguistics (COLING),
Copenhagen, Denmark, 1996
REFERENCES (CONTD.)
 Ng, Hwee T. & Hian B. Lee. 1996. "Integrating multiple knowledge sources to disambiguate
word senses: An exemplar-based approach", Proceedings of the 34th Annual Meeting of the
Association for Computational Linguistics (ACL), Santa Cruz, U.S.A., 40-47.
 Ng, Hwee T. 1997. "Exemplar-based word sense disambiguation: Some recent improvements", Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP), Providence, U.S.A., 208-213.
 Lin, Dekang. 1997."Using syntactic dependency as local context to resolve word sense
ambiguity", in Proceedings of the 35th Annual Meeting of the Association for Computational
Linguistics (ACL), Madrid, 64-71,1997.
 Rada Mihalcea, Dan I. Moldovan, 1999. "An automatic method for generating sense tagged corpora", Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI), Orlando, Florida, United States, 1999.
 Philip Resnik, 1999. "Semantic Similarity in a Taxonomy: An Information-Based Measure and
its Application to Problems of Ambiguity in Natural Language", Journal of Artificial
Intelligence Research, 1999.
REFERENCES (CONTD.)
 E. Agirre, J. Atserias, L. Padró, G. Rigau, 2000. "Combining Supervised and Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation", Computers and the Humanities, Special Double Issue on SensEval, Eds. Martha Palmer and Adam Kilgarriff, 34:1-2, 2000.
 Rada Mihalcea and Dan Moldovan, 2000. "An Iterative Approach to Word Sense Disambiguation", in Proceedings of the Florida Artificial Intelligence Research Society Conference (FLAIRS 2000), pp. 219-223, Orlando, FL, May 2000.
 Agirre Eneko, Ansa Olatz, Hovy Eduard, Martinez David, 2001. "Enriching WordNet concepts
with topic signatures", Proceedings of the NAACL workshop on WordNet and Other lexical
Resources:Applications, Extensions and Customizations. Pittsburg, 2001.
 Mona Diab and Philip Resnik, 2002. "An Unsupervised Method for Word Sense Tagging Using
Parallel Corpora", In Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics, Philadelphia, Pennsylvania, July 2002.
 Véronis, Jean. 2004. "HyperLex: Lexical cartography for information retrieval", Computer
Speech & Language, 18(3):223-252, 2004.
 Rada Mihalcea and Ehsanul Faruque, 2004. "SenseLearner: Minimally Supervised Word Sense
Disambiguation for All Words in Open Text", in Proceedings of ACL/SIGLEX Senseval-3,
Barcelona, Spain, July 2004.
REFERENCES (CONTD.)
 Indrajit Bhattacharya, Lise Getoor, Yoshua Bengio, 2004. "Unsupervised sense disambiguation
using bilingual probabilistic models", In Proceedings of the 42nd Annual Meeting on
Association for Computational Linguistics, Spain, 2004.
 Lee, Yoong K., Hwee T. Ng & Tee K. Chia. 2004. "Supervised word sense disambiguation with
support vector machines and multiple knowledge sources", Proceedings of Senseval-3: Third
International Workshop on the Evaluation of Systems for the Semantic Analysis of Text,
Barcelona, Spain, 137-140.
 Mihalcea, Rada. 2005. "Large vocabulary unsupervised word sense disambiguation with graph-based algorithms for sequence data labeling", in Proceedings of the Joint Human Language Technology and Empirical Methods in Natural Language Processing Conference (HLT/EMNLP), Vancouver, Canada, 411-418, 2005.
 Marine Carpuat, Dekai Wu, 2005. "Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation", in Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP), Jeju Island, Korea, 2005.
 Marine Carpuat and Dekai Wu, 2005. "Word sense disambiguation vs. statistical machine
translation", In Proceedings of the 43rd Annual Meeting of the Association for Computational
Linguistics (ACL’05), pages 387-394, Ann Arbor, Michigan, June 2005.
 Roberto Navigli, Paolo Velardi, 2005. "Structural Semantic Interconnections: A Knowledge-Based Approach to Word Sense Disambiguation", IEEE Transactions on Pattern Analysis and Machine Intelligence, July 2005.
REFERENCES (CONTD.)
 M. Ciaramita, Y. Altun, 2006. "Broad-Coverage Sense Disambiguation and Information
Extraction with a Supersense Sequence Tagger", in Proceedings of the Conference on
Empirical Methods in Natural Language Processing (EMNLP 2006).
 Yee Seng Chan, Hwee Tou Ng, 2006. "Estimating class priors in domain adaptation for word sense disambiguation", in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL, Sydney, 2006.
 Roberto Navigli, 2006. "Meaningful clustering of senses helps boost word sense
disambiguation performance", in Proceedings of the 21st International Conference on
Computational Linguistics and the 44th annual meeting of the ACL, Sydney, 2006.
 Roberto Navigli, 2006. "Ensemble methods for unsupervised WSD", in Proceedings of the 21st
International Conference on Computational Linguistics and the 44th annual meeting of the ACL,
Sydney, 2006.
 Zhimao Lu, Haifeng Wang, Jianmin Yao, Ting Liu, Sheng Li, 2006. "An equivalent pseudoword
solution to Chinese word sense disambiguation", in Proceedings of the 21st International
Conference on Computational Linguistics and the 44th annual meeting of the ACL, Sydney,
2006.
REFERENCES (CONTD.)
 Yee Seng Chan, Hwee Tou Ng and David Chiang, 2007. "Word Sense Disambiguation Improves Statistical Machine Translation", in Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, 2007.
THANK YOU!
EXTRA SLIDES
LESK’S ALGORITHM
Two different words are likely to have similar meanings if they occur in identical local contexts.
E.g. The facility will employ 500 new employees.

Senses of facility:
 installation
 proficiency
 adeptness
 readiness
 toilet/bathroom

Subjects of "employ" (word, frequency, log likelihood):
ORG: 64, 50.4
Plant: 14, 31.0
Company: 27, 28.6
Industry: 9, 14.6
Unit: 9, 9.32
Aerospace: 2, 5.81
Memory device: 1, 5.79
Pilot: 2, 5.37

To maximize similarity, select the sense which has the same hypernym as most of the other words in the context.
 All occurrences of the target word are identified.
 A small training set of seed data is tagged with the word sense.
 Seed collocations should accurately distinguish the senses.
 Strategies for selecting seed words:
 Use words from dictionary definitions.
 Use a single defining collocate for each class.
E.g. "bird" and "machine" for the target "crane".
 Hand-label salient corpus collocates.
SELECTIONAL PREFERENCES
(INDIAN TRADITION)
 “Desire” of some words in the sentence (“aakaangksha”).
 I saw the boy with long hair.
 The verb “saw” and the noun “boy” desire an object here.
 “Appropriateness” of some other words in the sentence to fulfil that desire (“yogyataa”).
 I saw the boy with long hair.
 The PP “with long hair” can be appropriately connected only to “boy” and not to “saw”.
 In case the ambiguity is still present, “proximity” (“sannidhi”) can determine the meaning.
 E.g. I saw the boy with a telescope.
 The PP “with a telescope” can be attached to both “boy” and “saw”, so the ambiguity is still present. It is then attached to “boy” using the proximity check.
SELECTIONAL PREFERENCES
(RECENT LINGUISTIC THEORY)
 There are words which demand arguments, like verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns.
 Arguments must have the property to fulfil the demand, i.e. they must satisfy the selectional preferences.
 Example
 Give (verb)
 agent – animate
 obj – direct
 obj – indirect
 I gave him the book.
 I gave him the book (yesterday in the school) -> adjunct
 How does this help in WSD?
 One type of contextual information is the information about the type of arguments that a word takes.
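The idea can be sketched in a few lines of Python. This is an illustrative toy, not a real system: the sense inventory for "serve" and the noun-to-class lexicon below are invented for the example.

```python
# Toy sketch of selectional-preference WSD: pick the verb sense whose
# preferred object class matches the semantic class of the actual object.

# Hypothetical sense inventory: each sense of "serve" prefers a semantic
# class for its direct object.
SENSES = {
    "serve_food":  {"obj": "edible"},
    "serve_route": {"obj": "location"},
}

# Hypothetical lexicon mapping nouns to coarse semantic classes.
SEM_CLASS = {"dinner": "edible", "breakfast": "edible",
             "region": "location", "sector": "location"}

def disambiguate(verb_senses, obj_noun):
    """Return the senses whose object preference the noun satisfies."""
    obj_class = SEM_CLASS.get(obj_noun)
    return [s for s, prefs in verb_senses.items()
            if prefs["obj"] == obj_class]

print(disambiguate(SENSES, "dinner"))  # ['serve_food']
print(disambiguate(SENSES, "region"))  # ['serve_route']
```

Note how the hard part in practice is exactly what the critique slide points out: building SEM_CLASS and the per-sense preferences exhaustively.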
CRITIQUE
 Requires exhaustive enumeration, in machine-readable form, of:
 Argument-structure of verbs.
 Selectional preferences of arguments.
 Description of the properties of words, so that meeting the selectional preference criteria can be decided.
 E.g. This flight serves the “region” between Mumbai and Delhi.
 How do you decide whether “region” is compatible with “sector”?
 Accuracy
 44% on Brown corpus.
CRITIQUE
 Proper nouns in the context of an ambiguous word can act as strong disambiguators.
E.g. “Sachin Tendulkar” will be a strong indicator of the category “sports”:
Sachin Tendulkar plays cricket.
 Proper nouns are not present in the thesaurus. Hence this approach fails to capture the strong clues provided by proper nouns.
 Accuracy
 50% when tested on 10 highly polysemous English words.
CRITIQUE
 Suffers from sparse match: the possibility of word overlap is very small.
 Can be misled.
E.g. As a result of the forest fire all olive trees were reduced to ash.
 Here Sense 1 of ash would be incorrectly chosen as the contextually appropriate sense.
 Proper nouns in the context of an ambiguous word can act as strong disambiguators.
E.g. “Sachin Tendulkar” will be a strong indicator of the category “sports”:
Sachin Tendulkar plays cricket.
 Proper nouns are not present in the WordNet. Hence this approach fails to capture the strong clues provided by proper nouns.
 Accuracy
 50-60% on short samples of “Pride and Prejudice” and some “news stories”.
CRITIQUE
 The Good
 A non-syntactic approach.
 Simple implementation.
 Does not require a tagged corpus.
 The Bad
 Suffers from sparse match: the possibility of word overlap is very small.
 Can be misled.
 E.g. As a result of the forest fire all olive trees were reduced to ash. Here Sense 1 of ash would be incorrectly chosen as the contextually appropriate sense.
 Proper nouns in the context of an ambiguous word can act as strong disambiguators. E.g. “Sachin Tendulkar” will be a strong indicator of the category “sports”: Sachin Tendulkar plays cricket.
 Proper nouns are not present in the WordNet. Hence this approach fails to capture the strong clues provided by proper nouns.
 Accuracy
 50-60% on short samples of “Pride and Prejudice” and some “news stories”.
CRITIQUE
 Resolves lexical ambiguity of nouns by finding a combination of senses that maximizes the total Conceptual Density among senses.
 The Good
 Does not require a tagged corpus.
 The Bad
 Fails to capture the strong clues provided by proper nouns in the context.
 Accuracy
 54% on Brown corpus.
CRITIQUE
 The Good
 Simple implementation.
 Independence assumption avoids complex modeling of feature dependencies.
 The Bad
 May suffer from data scarcity.
 The test sentence might have some features for which P(feature | sense) is zero for all the senses (unable to handle unseen/unknown features).
 Some weak features might pull down the overall score, i.e. they might reduce the influence of the strong features/indicators.
 Accuracy
 64% using some trial-and-error smoothing on the SEMCOR corpus, where the baseline accuracy was 61.2%.
CRITIQUE
 The Good
 Only the single most predictive piece of evidence is used to classify the target word (contrast this with Naïve Bayes, where weaker features can reduce the overall score).
 Simple implementation.
 Easy understandability of the resulting decision list.
 Is able to capture the clues provided by proper nouns from the corpus.
 The Bad
 Needs a large tagged corpus.
 The classifier is word-specific: a new classifier needs to be trained for every word that you want to disambiguate.
 Accuracy
 Average accuracy of 96% when tested on a set of 12 highly polysemous words.
CONCEPTUAL DENSITY FORMULA
Wish list
 The conceptual distance between two words should be proportional to the length of the path between the two words in the hierarchical tree (WordNet).
 The conceptual distance between two words should be proportional to the depth of the concepts in the hierarchy.

[Figure: a WordNet sub-hierarchy rooted at “entity”, of depth d, with sub-trees of height h for “location” and “finance” containing bank-1, bank-2 and money; h is the height of the concept “location”.]

CD(c, m) = ( Σ i=0..m-1 nhyp^i ) / ( Σ j=0..h-1 nhyp^j )

where, c = concept
nhyp = mean number of hyponyms
h = height of the sub-hierarchy
m = no. of senses of the word and senses of context words contained in the sub-hierarchy
CD = Conceptual Density
RANDOM WALK ALGORITHM
 A popular algorithm used by search engines for ranking web pages (e.g. the PageRank algorithm used by Google).
 Finds the importance score of a vertex in a graph.
 Uses the idea of “voting” or “recommendation”.
 When one vertex links to another it is basically casting a vote for that vertex (e.g. a link from Yahoo to your home page).
 Large number of votes = high importance.
 The importance of the vertex casting the votes determines the importance of the vote itself (a link from Yahoo would be more important than a link from your friend).
 A vertex recommends other vertices, and the strength of the recommendation is recursively computed.
RANDOM WALK ALGORITHM - PAGERANK
 Given a graph G = (V, E)
 In(Vi) = predecessors of Vi
 Out(Vi) = successors of Vi

WP(Vi) = (1 - d) + d · Σ Vj ∈ In(Vi) [ wji / Σ Vk ∈ Out(Vj) wjk ] · WP(Vj)

 In a weighted graph, the walker randomly selects an outgoing edge, with higher probability of selecting edges with higher weight.
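The weighted random walk above can be sketched in Python. This is an illustrative toy on a hypothetical 3-node graph, not the exact implementation of any cited paper; the damping factor d = 0.85 is the value conventionally used for PageRank.

```python
# Toy weighted PageRank: each vertex's score is (1 - d) plus d times the
# weighted votes of its predecessors, iterated to convergence.

def weighted_pagerank(edges, nodes, d=0.85, iters=50):
    """edges: dict (src, dst) -> weight. Returns node -> importance score."""
    score = {n: 1.0 for n in nodes}
    out_weight = {n: 0.0 for n in nodes}   # total outgoing weight per node
    for (src, _), w in edges.items():
        out_weight[src] += w
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Sum of weighted votes from the predecessors of n.
            rank = sum(score[s] * w / out_weight[s]
                       for (s, t), w in edges.items() if t == n)
            new[n] = (1 - d) + d * rank
        score = new
    return score

# Hypothetical graph: 'c' receives the heaviest incoming votes.
edges = {("a", "c"): 2.0, ("b", "c"): 2.0, ("c", "a"): 1.0, ("a", "b"): 1.0}
scores = weighted_pagerank(edges, ["a", "b", "c"])
print(max(scores, key=scores.get))  # 'c'
```

In the WSD setting of Mihalcea (2005), the vertices are sense labels and the edge weights are definition-overlap similarities; the highest-scoring sense per word is chosen.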
CRITIQUE
 Relies on random walks on graphs encoding label dependencies.
 The Good
 Does not require any tagged data (a WordNet is sufficient).
 The weights on the edges capture the definition-based semantic similarities.
 Takes into account global data recursively drawn from the entire graph.
 The Bad
 Poor accuracy.
 Accuracy
 54% accuracy on the SEMCOR corpus, which has a baseline accuracy of 37%.
BAYES RULE AND INDEPENDENCE
ASSUMPTION
ŝ = argmax s ∈ senses Pr(s | Vw)
where Vw is the feature vector.

 Apply Bayes rule:
Pr(s | Vw) = Pr(s) · Pr(Vw | s) / Pr(Vw)

 Pr(Vw | s) can be decomposed by the chain rule and then approximated by the independence assumption:
Pr(Vw | s) = Pr(Vw1 | s) · Pr(Vw2 | s, Vw1) · ... · Pr(Vwn | s, Vw1, ..., Vwn-1)
≈ Π i=1..n Pr(Vwi | s)

ŝ = argmax s ∈ senses Pr(s) · Π i=1..n Pr(Vwi | s)
ESTIMATING PARAMETERS
 Parameters in the probabilistic WSD are:
 Pr(s)
 Pr(Vwi | s)
 Senses are marked with respect to a sense repository (WordNet).

Pr(s) = count(s, w) / count(w)
Pr(Vwi | s) = Pr(Vwi, s) / Pr(s) = c(Vwi, s, w) / c(s, w)
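A minimal sketch of these estimates in Python. The tiny sense-tagged data set for "bank" is invented for illustration, and the add-alpha smoothing is an assumption added to handle unseen features (the slides' formulas use raw counts).

```python
# Toy Naive Bayes WSD using the estimates above:
# Pr(s) = count(s, w) / count(w),  Pr(f | s) = count(f, s, w) / count(s, w).
from collections import Counter, defaultdict

# Hypothetical sense-tagged instances of "bank": (sense, context features).
DATA = [
    ("finance", ["money", "withdraw"]),
    ("finance", ["money", "loan"]),
    ("river",   ["water", "fish"]),
]

sense_count = Counter(s for s, _ in DATA)
feat_count = defaultdict(Counter)
for s, feats in DATA:
    feat_count[s].update(feats)

def classify(features, alpha=1.0):
    """Pick argmax_s Pr(s) * prod_i Pr(f_i | s), with add-alpha smoothing."""
    vocab = {f for c in feat_count.values() for f in c}
    best, best_p = None, -1.0
    for s, cs in sense_count.items():
        p = cs / len(DATA)                      # Pr(s)
        for f in features:                      # Pr(f | s), smoothed
            p *= (feat_count[s][f] + alpha) / (cs + alpha * len(vocab))
        if p > best_p:
            best, best_p = s, p
    return best

print(classify(["money"]))  # 'finance'
print(classify(["water"]))  # 'river'
```

The smoothing addresses the critique slide's "unseen feature" problem: without it, a single zero count would veto every sense.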
ITERATIVE BOOTSTRAPPING ALGORITHM –
STEP 1
 Identify all contexts in which the polysemous word occurs.
 For each possible sense, use seed collocations to identify a relatively small number of training examples representative of that sense.

[Figure: the residual (untagged) data, with small seed regions tagged Life and Manufacturing.]

 Seed collocations should accurately distinguish the senses.
 E.g. “life” and “manufacturing” for the target “plant”.
ITERATIVE BOOTSTRAPPING ALGORITHM –
STEP 2
 Train the Decision List algorithm on the seed data.
 Classify the entire sample set using the trained classifier.
 Create new seed data by adding those members which are tagged as Sense-A or Sense-B with high probability.
 Retrain the classifier using the new seed data.
 These additions will contribute new collocations that are reliably indicative of the two senses.
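The bootstrapping loop can be sketched as follows. This is a schematic toy: a single-collocate lookup stands in for the full decision list, and the contexts and seeds are invented.

```python
# Toy Yarowsky-style bootstrapping: classify by seed collocates, then
# absorb the collocates of confidently tagged contexts as new seeds.

# Hypothetical untagged contexts of "plant" (bags of co-occurring words).
CONTEXTS = [
    {"life", "species"}, {"manufacturing", "factory"},
    {"species", "grow"}, {"factory", "worker"},
]
seeds = {"life": "A", "manufacturing": "B"}  # seed collocations per sense

labels = {}
for _ in range(3):  # iterate: classify, then grow the seed set
    for i, ctx in enumerate(CONTEXTS):
        hits = {seeds[w] for w in ctx if w in seeds}
        if len(hits) == 1:          # confident: exactly one sense indicated
            labels[i] = hits.pop()
    # Collocates of confidently tagged contexts become new seeds.
    for i, s in labels.items():
        for w in CONTEXTS[i]:
            seeds.setdefault(w, s)

print([labels[i] for i in range(len(CONTEXTS))])  # ['A', 'B', 'A', 'B']
```

Note how contexts 3 and 4 are untaggable in the first pass and get labeled only after "species" and "factory" have been absorbed as seeds, which is exactly the point of the iteration.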
ONE SENSE PER DISCOURSE
 The accuracy of the algorithm can be improved by using the “one sense per discourse” property.
 After the algorithm has converged:
 Identify words that are tagged with low confidence and label them with the sense which is dominant for that document.
 After each iteration:
 If there is substantial disagreement concerning which is the dominant sense, all instances in the discourse are returned to the residual set rather than merely leaving their current tags unchanged. This helps improve the purity of the training data.
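The post-convergence relabeling step can be sketched as below. The document ids, sense tags, and confidence values are invented for illustration, as is the 0.7 confidence threshold.

```python
# Toy "one sense per discourse" post-pass: low-confidence instances
# inherit the sense that dominates their document.
from collections import Counter

# Hypothetical (doc_id, sense, confidence) tags from the classifier.
tags = [(1, "A", 0.95), (1, "A", 0.90), (1, "B", 0.55), (2, "B", 0.90)]

# Find the dominant sense per document.
counts = {}
for doc, sense, conf in tags:
    counts.setdefault(doc, Counter())[sense] += 1
dominant = {doc: c.most_common(1)[0][0] for doc, c in counts.items()}

# Relabel low-confidence instances with the document's dominant sense.
fixed = [(doc, dominant[doc] if conf < 0.7 else sense, conf)
         for doc, sense, conf in tags]
print([s for _, s, _ in fixed])  # ['A', 'A', 'A', 'B']
```

The shaky "B" tag in document 1 is overridden by the document-dominant sense "A", while the confident tags are left alone.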
CRITIQUE
 Harnesses powerful, empirically observed properties of language.
 The Good
 Does not require a large tagged corpus.
 Simple implementation.
 Simple semi-supervised algorithm which builds on an existing supervised algorithm.
 Easy understandability of the resulting decision list.
 Is able to capture the clues provided by proper nouns from the corpus.
 The Bad
 The classifier is word-specific: a new classifier needs to be trained for every word that you want to disambiguate.
 Accuracy
 Average accuracy of 96% when tested on a set of 12 highly polysemous words.
SMALL LEXICAL WORLDS
 Construct a graph for each word to be disambiguated.
 Nodes are the words which co-occur with the target word.
 An edge connects two nodes if the corresponding words co-occur with each other.

[Figure: a co-occurrence graph with nodes such as (river), (water), (flow), (electricity), (victory), (world), (cup), (team).]

 Such a graph has all the properties of small world graphs.
ADDING WEIGHTS TO THE EDGES
 Each edge is assigned a weight that decreases as the association frequency of the words increases:
wA,B = 1 - max[P(A | B), P(B | A)]
P(A | B) = fA,B / fB, P(B | A) = fB,A / fA
E.g. P(eau | ouvrage) = 183/479 = 0.38, P(ouvrage | eau) = 183/1057 = 0.17, w = 1 - 0.38 = 0.62

(co-occurrence counts) EAU (water) / ~EAU / Total
OUVRAGE (work): 183 / 296 / 479
~OUVRAGE: 874 / 5556 / 6430
Total: 1057 / 5852 / 6909

POTABLE (drinkable): 63 / 0 / 63
~POTABLE: 994 / 5852 / 6846
Total: 1057 / 5852 / 6909
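The weight computation can be checked directly against the eau/ouvrage numbers above:

```python
# HyperLex edge weight: w(A,B) = 1 - max[P(A|B), P(B|A)],
# with P(A|B) = f(A,B)/f(B) and P(B|A) = f(A,B)/f(A).

def edge_weight(f_ab, f_a, f_b):
    """Weight decreases as the two words co-occur more often."""
    p_a_given_b = f_ab / f_b
    p_b_given_a = f_ab / f_a
    return 1 - max(p_a_given_b, p_b_given_a)

# f(eau) = 1057, f(ouvrage) = 479, f(eau, ouvrage) = 183
w = edge_weight(f_ab=183, f_a=1057, f_b=479)
print(round(w, 2))  # 0.62
```

Frequently associated word pairs thus get short (low-weight) edges, so tightly knit clusters in the graph correspond to word uses.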
STEP 1 – COLLECTING CONTEXTS
 Collect contexts which are representative of the Roget category.
 Extract concordances of 100 surrounding words for each occurrence of each member of the category in the corpus.

Words in context of the category TOOLS:
equipment such as a hydraulic shovel capable of lifting 26 cubic….
………….Resembling a power shovel mounted on a floating hul…
.equipment, valves for nuclear generators, oil refinery turbines....
…………...flint-edged wooden sickles were used to gather wild….
....penetrating carbide-tipped drills forced manufacturers to…..
………... heightens the colors Drills live in the forests of equa…..
.traditional ABC method and drill were unchanged and dissa…..
…..center of rotation A tower crane is an assembly of fabricat…..
…marshy areas The crowned crane however occasionally…….

 The level of noise introduced due to polysemy is substantial, but can be tolerated as the spurious senses get distributed over the 1041 other categories, whereas the signal is concentrated in just one.
STEP 2 – IDENTIFY SALIENT WORDS
 Identify salient words in the collective context and weight them appropriately:
Weight(word) = Salience(word) = log( Pr(word | category) / Pr(word) )

ANIMAL/INSECT: species (2.3), family (1.7), bird (2.6), fish (2.4), egg (2.2), coat (2.5), female (2.0), eat (2.2), nest (2.5), wild
TOOLS/MACHINERY: tool (3.1), machine (2.7), engine (2.6), blade (3.8), cut (2.2), saw (2.5), lever (2.0), wheel (2.2), piston (2.5)

 This list of words includes a broad set of relations like:
 Hyponymy (e.g. bird, engine)
 Typical functions (e.g. eat, cut)
 Typical modifiers (e.g. wild, sharp)
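Assuming the log-ratio form of salience shown above, the weighting can be sketched as follows; the counts are invented for illustration.

```python
# Toy salience weighting: words frequent in a category's collective
# context but rare in the corpus overall get high weight.
import math

def salience(count_in_cat, cat_total, count_corpus, corpus_total):
    """log( Pr(word | category) / Pr(word) )  -- assumed log-ratio form."""
    p_word_given_cat = count_in_cat / cat_total
    p_word = count_corpus / corpus_total
    return math.log(p_word_given_cat / p_word)

# Hypothetical counts: "blade" is 20x denser in TOOLS contexts
# than in the corpus at large.
w = salience(count_in_cat=40, cat_total=10_000,
             count_corpus=200, corpus_total=1_000_000)
print(round(w, 1))  # 3.0
```

A word distributed evenly across categories gets a ratio near 1 and hence a weight near 0, which is how the category-neutral noise washes out.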
LOCAL CONTEXTS DATABASE
 Local context is defined in terms of the syntactic dependencies between the word and other words in the sentence.
 The local context is a triple (type, word, position) that corresponds to a dependency relationship in which W is the head or the modifier.
E.g. the boy chased a brown dog

Word / Local Contexts
Boy: (subject, chase, head)
Dog: (adjn, brown, modifier), (comp1, chase, head)

 The corpus is parsed to construct a Local Context Database. Each entry in the DB is a pair (lc, C(lc)):
lc: (subject, employ, head)
C(lc): ((ORG 64 50.4) (plant 14 31.0) ..... (pilot 2 5.3))
DISAMBIGUATION
 Step 1
Parse the input text and extract local contexts of the ambiguous word w.
 Step 2
Search the Local Context DB and find words that appeared in an identical local context as w. These are called selectors of w.
 Step 3
Select a sense s of w that maximizes the similarity between w and Selectors(w).
 Step 4
Assign this sense to all occurrences of w in the input text.
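Steps 2 and 3 can be sketched as below. The database entry and the sense-to-selector similarity scores are invented for illustration; a real system would compute the similarities from a taxonomy.

```python
# Toy selector-based disambiguation: words that occurred in the same
# local context as w vote for the sense of w most similar to them.

# Hypothetical Local Context Database: context triple -> words seen in it.
LC_DB = {("subject", "employ", "head"): ["org", "plant", "company", "pilot"]}

# Hypothetical similarity of each sense of "facility" to each selector.
SIM = {"installation": {"org": 0.8, "plant": 0.7, "company": 0.8, "pilot": 0.1},
       "proficiency":  {"org": 0.1, "plant": 0.1, "company": 0.2, "pilot": 0.3}}

def disambiguate(local_context):
    """Pick the sense maximizing total similarity to the selectors."""
    selectors = LC_DB.get(local_context, [])
    return max(SIM, key=lambda s: sum(SIM[s].get(x, 0) for x in selectors))

print(disambiguate(("subject", "employ", "head")))  # 'installation'
```

For "the facility will employ ...", the subjects previously seen with "employ" (org, plant, company, ...) collectively pull the choice toward the installation sense.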
CRITIQUE
 Makes use of the “small world” structure of co-occurrence graphs.
 The Good
 Does not require any tagged data.
 Automatically extracts a “use” list of words from a corpus.
 Does not rely on dictionary-defined word senses.
 The Bad
 The classifier is word-specific: a new classifier needs to be trained for every word that you want to disambiguate.
 Accuracy
 Average accuracy of 96% when tested on a set of 10 highly polysemous words.
CRITIQUE
 The Good
 Lexical network (thesaurus) + corpus based.
 Is able to capture the clues provided by proper nouns from the corpus.
 E.g. “Sachin Tendulkar” will have a strong salience value in the category “sports”.
 The classifier is not word-specific. Will work even for unseen/rare words.
 The Bad
 Performance is weaker for:
 Minor sense distinctions within a category.
 E.g. the two senses of drug in the medical domain.
 Idioms
 E.g. the word “hand” in “on the other hand” and “close at hand”.
 Accuracy
 Average accuracy of 92% when tested on a set of 12 highly polysemous words.
CRITIQUE
 The Good
 The same knowledge sources are used for all words.
 Can deal with words that are infrequent or do not even appear in the corpus.
 The classifier is not word-specific.
 The Bad
 Syntactic dependencies need to be identified from the corpus.
 This requires an efficient broad-coverage parser.
 Accuracy
 74% on Wall Street Journal Corpus.