Unit 2: Word Associations and Relation Discovery
Why Mine Word Associations?
Word Context
Word Co-occurrence
Mining Word Associations
• Paradigmatic
• Represent each word by its context
• Compute context similarity
• Words with high context similarity likely have a paradigmatic relation
• Syntagmatic
• Count how many times two words occur together in a context (e.g., a sentence or paragraph)
• Compare their co-occurrences with their individual occurrences
• Words with high co-occurrence but relatively low individual occurrences likely have a syntagmatic relation
• Paradigmatically related words tend to have a syntagmatic relation with the same word, which enables joint discovery of the two relations
• These ideas can be implemented in many different ways; one minimal sketch follows below
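Below is a minimal Python sketch of both ideas on a toy corpus. The sentences, the sentence-level context unit, the cosine similarity, and the lift-style ratio are illustrative assumptions, not the prescribed implementation.

```python
# Minimal sketch: paradigmatic relation via context similarity,
# syntagmatic relation via co-occurrence vs. individual occurrence.
from collections import Counter
from itertools import combinations
import math

sentences = [
    "the cat eats fish".split(),
    "the dog eats meat".split(),
    "a cat drinks milk".split(),
    "a dog drinks water".split(),
]

# Paradigmatic: represent each word by the bag of words surrounding it.
context = {}
for sent in sentences:
    for i, w in enumerate(sent):
        context.setdefault(w, Counter()).update(sent[:i] + sent[i + 1:])

def cosine(c1, c2):
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# "cat" and "dog" share contexts, so their similarity is high.
print(cosine(context["cat"], context["dog"]))

# Syntagmatic: words that co-occur in a sentence more often than their
# individual frequencies alone would predict.
word_count, pair_count = Counter(), Counter()
for sent in sentences:
    uniq = set(sent)
    word_count.update(uniq)
    pair_count.update(combinations(sorted(uniq), 2))

n = len(sentences)
scores = {
    pair: (c / n) / ((word_count[pair[0]] / n) * (word_count[pair[1]] / n))
    for pair, c in pair_count.items()
}
# Ratios above 1 suggest a pair co-occurs more often than chance.
print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])
```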
Word Context as “Pseudo Document”
Computing Similarity of Word Context
Syntagmatic Relation – Word Collocation
• Syntagmatic relation shows up as word co-occurrence, commonly called collocation
• If two words occur together in a context more often than by chance, they are in a syntagmatic relation (i.e., they are related words); the independence baseline is written out below
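One way to make "more often than chance" precise, assuming probabilities are estimated from relative frequencies in the corpus: under independence the joint probability factors into the product of the marginals, so collocation candidates are pairs whose observed joint probability clearly exceeds that product.

```latex
% Independence baseline: compare the observed joint probability of the pair
% with the product of the individual (marginal) word probabilities.
\[
\underbrace{p(w_1, w_2)}_{\text{observed co-occurrence}}
\;\gg\;
\underbrace{p(w_1)\,p(w_2)}_{\text{expected if independent}}
\]
```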
Word Probability
Binomial Distribution
Entropy as a Measure of Randomness
Entropy for Word Probability
Mutual Information (MI) as a Measure of Word Collocation
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \cdot \log \frac{p(x, y)}{p(x) \cdot p(y)}$$
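As a sketch of how this formula applies to collocation, assume X and Y are binary indicators for whether word 1 and word 2 appear in a text segment; the 2x2 contingency counts below are made up, and the base-2 logarithm is one common convention.

```python
# Sketch: mutual information of two word-presence indicators, computed
# from a (made-up) 2x2 contingency table over text segments.
import math

counts = {(1, 1): 14, (1, 0): 49, (0, 1): 6, (0, 0): 131}
N = sum(counts.values())

p_x = {x: sum(c for (xx, _), c in counts.items() if xx == x) / N for x in (0, 1)}
p_y = {y: sum(c for (_, yy), c in counts.items() if yy == y) / N for y in (0, 1)}

mi = 0.0
for (x, y), c in counts.items():
    p_xy = c / N
    if p_xy > 0:
        mi += p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))

print(f"I(X;Y) = {mi:.4f} bits")  # higher MI = stronger collocation evidence
```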
Mutual Information (MI) and Word Collocation
Probabilities in MI
Estimation of Word Probability
Point-wise Mutual Information
Vector Semantics: Positive Pointwise Mutual Information (PPMI)
Dan Jurafsky
Word-Word matrix
[Word-word co-occurrence count table built from sample contexts of ±7 words]
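A minimal sketch of how such a count matrix can be built, assuming a symmetric window of ±7 words as on the slide; the tokenized document and the dictionary-of-Counters representation are illustrative choices.

```python
# Sketch: word-word co-occurrence counts with a symmetric context window.
from collections import Counter, defaultdict

def cooccurrence_counts(tokenized_docs, window=7):
    counts = defaultdict(Counter)  # counts[target][context] = frequency
    for tokens in tokenized_docs:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts

docs = [["sugar", "a", "sliced", "lemon", "a", "tablespoonful", "of", "apricot"]]
m = cooccurrence_counts(docs)
print(m["apricot"].most_common(3))
```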
Word-word matrix
• We showed only a 4 x 6 excerpt, but the real matrix is 50,000 x 50,000
• So it's very sparse: most values are 0
• That's OK, since there are lots of efficient algorithms for sparse matrices (see the sparse-storage sketch below)
• The size of the window depends on your goals
• The shorter the window, the more syntactic the representation (1-3 words: very syntactic)
• The longer the window, the more semantic the representation (4-10 words: more semantic)
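Since almost every cell is zero, the counts are normally kept in a sparse format rather than a dense 50,000 x 50,000 array; below is a small sketch assuming SciPy is available, with a made-up four-word vocabulary.

```python
# Sketch: sparse storage of co-occurrence counts (only nonzero cells kept).
from scipy.sparse import coo_matrix

vocab = {"apricot": 0, "pineapple": 1, "digital": 2, "information": 3}
rows = [0, 1, 2, 3]   # target-word indices of nonzero cells (made up)
cols = [3, 3, 3, 2]   # context-word indices
vals = [1, 1, 6, 6]   # co-occurrence counts

X = coo_matrix((vals, (rows, cols)), shape=(len(vocab), len(vocab))).tocsr()
print(X.nnz, "nonzero entries out of", X.shape[0] * X.shape[1])
print(X[2, 3])        # count for ("digital", "information")
```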
• Pointwise mutual information: Do events x and y co-occur more than if they were independent?
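The standard definition (base-2 logarithm is the usual convention) and its positive variant, which is the PPMI weighting named in this section's title:

```latex
% Pointwise mutual information of target word w and context word c,
% and its positive (clipped-at-zero) variant used for PPMI weighting.
\[
\mathrm{PMI}(w, c) = \log_2 \frac{p(w, c)}{p(w)\,p(c)},
\qquad
\mathrm{PPMI}(w, c) = \max\bigl(\mathrm{PMI}(w, c),\, 0\bigr)
\]
```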
Other Word Collocation Measures
[Example collocation output: diabetes (63/63), +insulin (14/58)]