Discourse Structure Text Coherence and Cohesion Reference Resolution
Discourse Structure
Text Coherence and Cohesion
Reference Resolution
Synchronic Model of Language
Pragmatic
Discourse
Semantic
Syntactic
Lexical
Morphological
Phonetic
Discourse Linguistics
Discourse Segmentation
Documents are automatically separated into passages,
sometimes called fragments, which are different discourse
segments
Techniques to separate documents into passages include
Rule-based systems based on clue words and phrases
Probabilistic techniques to separate fragments and to identify
discourse segments (Oddy)
TextTiling algorithm uses cohesion to identify segments, assuming
that each segment exhibits lexical cohesion within the segment, but
is not cohesive across different segments
Lexical cohesion score: average similarity of words within a
segment
Identify boundaries by the difference of cohesion scores
NLTK has a text tiling algorithm available
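The cohesion-based idea above can be sketched in a few lines. This is a simplified illustration, not the full TextTiling algorithm and not NLTK's implementation (NLTK's version is `nltk.tokenize.TextTilingTokenizer`). The function names, the block size, and the choice to return a fixed number of boundaries are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Lowercased word counts for a block of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z']+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(sentences, block_size=2):
    """Cohesion score for each gap: similarity of the blocks on either side."""
    scores = []
    for gap in range(1, len(sentences)):
        left = bag_of_words(sentences[max(0, gap - block_size):gap])
        right = bag_of_words(sentences[gap:gap + block_size])
        scores.append(cosine(left, right))
    return scores


def depth_scores(scores):
    """Depth of each gap: how far its score drops below the peaks on both sides."""
    depths = []
    for i, score in enumerate(scores):
        left_peak = score
        for j in range(i - 1, -1, -1):  # climb left while scores keep rising
            if scores[j] < left_peak:
                break
            left_peak = scores[j]
        right_peak = score
        for j in range(i + 1, len(scores)):  # climb right while scores keep rising
            if scores[j] < right_peak:
                break
            right_peak = scores[j]
        depths.append((left_peak - score) + (right_peak - score))
    return depths


def segment(sentences, block_size=2, n_boundaries=1):
    """Indices of the sentences that start a new segment (deepest gaps first)."""
    depths = depth_scores(gap_scores(sentences, block_size))
    ranked = sorted(range(len(depths)), key=lambda i: depths[i], reverse=True)
    # Score index i is the gap before sentence i + 1.
    return sorted(i + 1 for i in ranked[:n_boundaries])


sentences = [
    "The cat chased the mouse.",
    "The mouse escaped the cat.",
    "The cat slept.",
    "Stocks fell sharply today.",
    "Investors sold stocks.",
    "Stocks may recover.",
]
print(segment(sentences))
```

On this toy input the deepest cohesion gap falls between the third and fourth sentences, where the vocabulary changes completely. A real system would also remove stop words and smooth the score sequence.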
Cohesion: Surface-Level Ties
A piece of text is intended and is perceived as more than a
simple sequencing of independent sentences.
Therefore, a text will exhibit unity / texture
on the surface level (cohesion)
at the meaning level (coherence)
Halliday & Hasan's Cohesion in English (1976)
Sets forth the linguistic devices that are available in the
English language for creating this unity / texture
Identifies the features in a text that contribute to an
intelligent comprehension of the text
Important for language generation: produces natural-sounding
texts
Cohesive Relations
Define dependencies between sentences in text.
"He said so."
"He" and "so" presuppose elements in the preceding
text for their understanding
This presupposition and the presence of information
elsewhere in text to resolve this presupposition provide
COHESION
- Part of the discourse-forming component of the linguistic
system
- Provides the means whereby structurally unrelated
elements are linked together
Six Types of Cohesive Ties
Grammatical
Reference
Substitution
Ellipsis
Conjunction
Lexical
Reiteration
Collocation
(In practice, there is overlap; some examples can show
more than one type of cohesion.)
1. Reference
- items in a language which, rather than being interpreted in
their own right, make reference to something else for their
interpretation.
Doctor Foster went to Gloucester in a shower of rain. He stepped in a
puddle right up to his middle and never went there again.
Types of Reference
endophora [textual]
  anaphora [preceding text]
  cataphora [following text]
exophora [situational: referring to things outside of the text;
not part of cohesion]
NLP task: coreference resolution
2. Substitution:
- a substituted item that serves the same structural function as the
item for which it is substituted.
Nominal: one, ones, same
Verbal: do
Clausal: so, not
- These biscuits are stale. Get some fresh ones.
- Person 1: I'll have two poached eggs on toast, please.
Person 2: I'll have the same.
- "The words did not come the same as they used to do." "I don't
know the meaning of half those long words, and what's
more, don't believe you do either," said Alice.
3. Ellipsis
- Very similar to substitution principles, embody same relation
between parts of a text
- Something is left unsaid but understood nonetheless; ellipsis
covers only a limited subset of such instances
Smith was the first person to leave. I was the second
__________.
Joan brought some carnations and Catherine ______ some
sweet peas.
Who is responsible for sales in the Northeast? I believe
Peter Martin is _______.
4. Conjunction
- Different kind of cohesive relation in that it doesn't require us
to understand some other part of the text to understand the
meaning
- Rather, a specification of the way the text that follows is
systematically connected to what has preceded
For the whole day he climbed up the steep mountainside,
almost without stopping.
And in all this time he met no one.
Yet he was hardly aware of being tired.
So by night the valley was far below him.
Then, as dusk fell, he sat down to rest.
Now, 2 types of Lexical Cohesion
- Lexical cohesion is concerned with cohesive effects
achieved by selection of vocabulary
5. Reiteration continuum
I attempted an ascent of the peak. ___X___ was easy.
- same lexical item: the ascent
- synonym: the climb
- superordinate term: the task
- general noun: the act
- pronoun: it
6. Collocations
- Lexical cohesion achieved through the association of
semantically related lexical items
- Accounts for any pair of lexical items that exist in some
lexico-semantic relationship, e.g.
- complementaries
boy / girl
stand-up / sit-down
- antonyms
wet / dry
crowded / deserted
- converses
order / obey
give / take
- part-whole
brake / car
lid / box
Coherence Relations: Semantic Meaning Ties
The set of possible relations between the meanings of
different utterances in the text
Hobbs (1979) suggests relations such as
Result: state in first sentence could cause the state in a second
sentence
Explanation: the state in the second sentence could cause the first
John hid Bill's car keys. He was drunk.
Parallel: The states asserted by two sentences are similar
The Scarecrow wanted some brains. The Tin Woodsman wanted a
heart.
Elaboration: Infer the same assertion from the two sentences.
Textual Entailment
NLP task of discovering result and elaboration relations between two
sentences.
Anaphora / Reference Resolution
One of the most important NLP tasks for cohesion at the
discourse level
A linguistic phenomenon of abbreviated subsequent
reference
A cohesive tie of the grammatical and lexical types
Includes reference, substitution and reiteration
2 levels of resolution:
within document (co-reference resolution)
e.g. Bin Ladin = he
his followers = they
terrorist attacks = they
the Federal Bureau of Investigation = FBI = F.B.I
across document (or named entity resolution)
e.g. maverick Saudi Arabian multimillionaire = Usama Bin
Ladin = Bin Ladin
Event resolution is also possible, but not widely used
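One way to picture within-document resolution is as grouping mentions into chains: each pairwise decision ("he" refers to "Bin Ladin") links two mentions, and linked mentions end up in the same chain. The sketch below uses a small union-find for the grouping. The function name and the use of plain strings as mentions are assumptions for illustration; a real system would track text spans, since the same string can occur many times.

```python
def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the chain representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    # Singletons are mentions that were never linked to anything.
    return [chain for chain in chains.values() if len(chain) > 1]


mentions = [
    "the Federal Bureau of Investigation",
    "FBI",
    "F.B.I",
    "Bin Ladin",
    "he",
    "the office",
]
links = [
    ("FBI", "the Federal Bureau of Investigation"),
    ("F.B.I", "FBI"),
    ("he", "Bin Ladin"),
]
for chain in build_chains(mentions, links):
    print(" = ".join(chain))
```

Links are transitive here: "F.B.I" was only linked to "FBI", yet it lands in the same chain as the full name.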
Examples from Contexts
1. The State Department renewed its appeal for Bin Laden on
Monday and warned of possible fresh attacks by his followers against U.S.
targets.
3. The F.B.I. will also have the final say over the hiring and firing of the
10 Hungarian agents who will work in the office, alongside five
American agents. The bureau has long had agents posted in American
embassies
Glossary of Terminology
Referring phrases
Reference Types
Definite noun phrases: the X
Definite reference is used to refer to an entity identifiable by the
reader because it is either
a) already mentioned previously (in discourse), or
b) contained in the reader's set of beliefs about the world (pragmatics), or
c) the object itself is unique. (Jurafsky & Martin, 2000)
E.g.
Mr. Torres and his companion claimed a hardshelled black vinyl
suitcase[1]. The police rushed the suitcase[1] (a) to the Trans-Uranium
Institute[2] (c) where experts cut it[1] open because they did not have the
combination to the locks.
The German authorities[3] (b) said a Colombian[4] who had lived for a long
time in the Ukraine[5] (c) flew in from Kiev. He had 300 grams of
plutonium 239[6] in his baggage. The suspected smuggler[4] (a) denied that
the materials[6] (a) were his.
Pronominalization
Pronouns refer to entities that were introduced fairly recently,
1-4-5-10(?) sentences back.
Nominative (he, she, it, they, etc.)
e.g. The German authorities said a Colombian[1] who had lived for a
long time in the Ukraine flew in from Kiev. He[1] had 300 grams of
plutonium 239 in his baggage.
Oblique (him, her, them, etc.)
e.g. Undercover investigators negotiated with three members of a
criminal group[2] and arrested them[2] after receiving the first
shipment.
Possessive (his, her, their, etc. + hers, theirs, etc.)
e.g. He[3] had 300 grams of plutonium 239 in his[3] baggage. The
suspected smuggler[3]* denied that the materials were his[3]. (*chain)
Reflexive (himself, themselves, etc.)
e.g. There appears to be a growing problem of disaffected loners[4]
who cut themselves[4] off from all groups.
Indefinite noun phrases: a X, or an X
Typically, an indefinite noun phrase introduces a new entity
into the discourse and would not be used as a referring
phrase to something else
The exception is in the case of cataphora:
A Soviet pop star was killed at a concert in Moscow last night. Igor
Talkov was shot through the heart as he walked on stage.
Note that cataphora can occur with pronouns as well:
When he visited the construction site last month, Mr. Jones talked
with the union leaders about their safety concerns.
Demonstratives: this and that
Demonstrative pronouns can either appear alone or as
determiners
this ingredient, that spice
These noun phrases with demonstrative determiners are ambiguous
They can be indefinite
I saw this beautiful car today.
Or they can be definite
I just bought a copy of Thoreau's Walden. I had bought one five
years ago. That one had been very tattered; this one was in much
better condition.
Names
Names can occur in many forms, sometimes called name
variants.
Victoria Chen, Chief Financial Officer of Megabucks Banking Corp.
since 2004, saw her pay jump 20% as the 37-year-old also became the
Denver-based financial-services company's president. Megabucks
expanded recently . . . MBC . . .
(Victoria Chen, Chief Financial Officer, her, the 37-year-old, the Denver-based
financial-services company's president)
(Megabucks Banking Corp. , the Denver-based financial-services company,
Megabucks, MBC )
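Some name variants can be recognized from the strings alone. The sketch below covers two such cases from the examples in these slides: acronyms built from the initials of the capitalized words (Megabucks Banking Corp. = MBC) and shortenings that keep the leading words (Megabucks). The function names and matching rules are assumptions for illustration and are far from a complete treatment.

```python
import re


def initials(name):
    """Uppercase initials of the capitalized words in a name."""
    words = re.findall(r"[A-Za-z]+", name)
    return "".join(word[0] for word in words if word[0].isupper())


def normalize_acronym(text):
    """Drop periods and spaces so that 'F.B.I.' and 'FBI' compare equal."""
    return re.sub(r"[.\s]", "", text)


def is_name_variant(full_name, mention):
    """True if the mention is an acronym or a leading-word shortening of the name."""
    full_words = re.findall(r"[A-Za-z]+", full_name)
    mention_words = re.findall(r"[A-Za-z]+", mention)
    if not mention_words:
        return False
    if normalize_acronym(mention) == initials(full_name):
        return True
    return full_words[:len(mention_words)] == mention_words


print(is_name_variant("Megabucks Banking Corp.", "MBC"))
print(is_name_variant("Megabucks Banking Corp.", "Megabucks"))
print(is_name_variant("the Federal Bureau of Investigation", "F.B.I."))
print(is_name_variant("Megabucks Banking Corp.", "Victoria Chen"))
```

Descriptive variants such as "the Denver-based financial-services company" cannot be matched by string rules like these; they need the definite-reference and feature machinery discussed in the rest of the section.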
Unusual Cases
Compound phrases
John and Mary got engaged. They make a cute couple.
John and Mary went home. She was tired.
Singular nouns with a plural meaning
The focus group met for several hours. They were very intent.
Part/whole relationships
John bought a new car. A door was dented.
Approach to coreference resolution
Naively identify all referring phrases for
resolution:
all Pronouns
all definite NPs
all Proper Nouns
Filter things that look referential but, in fact, are
not
e.g. geographic names, the United States
pleonastic it, e.g. it's 3:45 p.m., it was cold
non-referential it, they, there
e.g. it was essential, important, is understood,
they say,
there seems to be a mistake
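The filtering step is often approximated with surface patterns. The sketch below encodes only the examples listed above as regular expressions. The pattern list and function name are assumptions for illustration, and the word lists inside the patterns are deliberately tiny.

```python
import re

# Hypothetical patterns, one per example type on the slide.
NON_REFERENTIAL_PATTERNS = [
    r"\bit(?:'s| is| was)\s+\d{1,2}:\d{2}\b",                          # time: it's 3:45
    r"\bit(?:'s| is| was)\s+(?:cold|hot|warm|raining|snowing)\b",      # weather
    r"\bit(?:'s| is| was)\s+(?:essential|important|understood)\b",     # it was essential
    r"\bthey say\b",                                                   # generic they
    r"\bthere\s+(?:seems?|appears?)\s+to be\b",                        # existential there
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in NON_REFERENTIAL_PATTERNS]


def is_non_referential(sentence):
    """True if the sentence matches any non-referential it/they/there pattern."""
    return any(pattern.search(sentence) for pattern in COMPILED)


for example in [
    "It's 3:45 p.m.",
    "It was cold.",
    "There seems to be a mistake.",
    "John parked his car in the garage after driving it around for hours.",
]:
    print(is_non_referential(example), "-", example)
```

Surface patterns over-filter: in "I tasted the soup. It was cold." the pronoun is referential, so a pattern match is better treated as one feature than as a final decision.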
Identify Referent Candidates
All noun phrases (both indef. and def.) are considered potential
referent candidates.
A referring phrase can also be a referent for a subsequent referring
phrase.
Example: (omitted sentence with name of suspect)
He had 300 grams of plutonium 239 in his baggage. The
suspected smuggler denied that the materials were his.
(chain of 4 referring phrases)
All potential candidates are collected in a table that records feature
info on each candidate.
Problems:
chunking
e.g. the Chase Manhattan Bank of New York
nesting of NPs
Features
Define features between a referring phrase and each candidate
Number agreement: plural, singular or neutral
He, she, it, etc. are singular, while we, us, they, them, etc. are
plural and should match with singular or plural nouns, respectively
Exceptions: some plural or group nouns can be referred to by
either it or they
IBM announced a new product. They have been working on it.
Gender agreement:
Generally animate objects are referred to by either male pronouns
(he, his) or female pronouns (she, hers)
Inanimate objects take neutral (it) gender
Person agreement:
First and second person pronouns are I and you
Third-person pronouns must be used to refer to nouns
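The agreement features can be sketched as a compatibility check between a third-person pronoun and a candidate. The `Candidate` class, the pronoun table, and the "neutral" and "unknown" values are assumptions for illustration. First- and second-person pronouns are left out because they do not take noun antecedents in this scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    number: str  # "singular", "plural", or "neutral" (group nouns such as IBM)
    gender: str  # "male", "female", "neuter", or "unknown"


# Third-person pronouns only: (number, gender).
PRONOUNS = {
    "he": ("singular", "male"), "him": ("singular", "male"), "his": ("singular", "male"),
    "she": ("singular", "female"), "her": ("singular", "female"), "hers": ("singular", "female"),
    "it": ("singular", "neuter"), "its": ("singular", "neuter"),
    "they": ("plural", "unknown"), "them": ("plural", "unknown"), "their": ("plural", "unknown"),
}


def agrees(pronoun, candidate):
    """True if the pronoun and candidate are compatible in number and gender."""
    number, gender = PRONOUNS[pronoun.lower()]
    number_ok = candidate.number in (number, "neutral")
    gender_ok = "unknown" in (gender, candidate.gender) or gender == candidate.gender
    return number_ok and gender_ok


ibm = Candidate("IBM", number="neutral", gender="neuter")
john = Candidate("John", number="singular", gender="male")
leaders = Candidate("the union leaders", number="plural", gender="unknown")

print(agrees("They", ibm), agrees("it", ibm), agrees("he", ibm))
print(agrees("he", john), agrees("she", john), agrees("they", john))
print(agrees("their", leaders), agrees("it", leaders))
```

Treating agreement as a hard filter is a design choice made here for simplicity; the group-noun exception above is exactly why some systems treat it as a weighted feature instead.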
More Features
Binding constraints
Reflexive pronouns (himself, themselves) have constraints on which
nouns in the same sentence can be referred to:
John bought himself a new Ford. (John = himself)
John bought him a new Ford. (John cannot = him)
Recency
Entities situated closer to the referring phrase tend to be more salient
than those further away
And pronouns can't go more than a few sentences away
Grammatical role / Hobbs distance
Entities in subject position are more likely referents than those in
object position
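Recency and grammatical role can be combined into a single salience score used to rank the candidates that survive the agreement checks. The sketch below is one possible scoring scheme. The weights, the three-sentence window, and all names are hypothetical values chosen for the example, not figures from any published system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    sentence_index: int
    role: str  # "subject", "object", or "other"


ROLE_WEIGHT = {"subject": 2.0, "object": 1.0, "other": 0.5}  # hypothetical weights
MAX_SENTENCE_DISTANCE = 3  # hypothetical window: pronouns rarely reach further back


def salience(candidate, pronoun_sentence):
    """Higher for closer candidates and for subjects; 0 outside the window."""
    distance = pronoun_sentence - candidate.sentence_index
    if distance < 0 or distance > MAX_SENTENCE_DISTANCE:
        return 0.0
    recency = 1.0 / (1 + distance)
    return recency * ROLE_WEIGHT.get(candidate.role, 0.5)


def rank(candidates, pronoun_sentence):
    """Candidates with non-zero salience, most salient first."""
    scored = [(salience(c, pronoun_sentence), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]


candidates = [
    Candidate("the mechanic", sentence_index=0, role="subject"),
    Candidate("John", sentence_index=4, role="subject"),
    Candidate("his car", sentence_index=4, role="object"),
    Candidate("the garage", sentence_index=4, role="other"),
]
for c in rank(candidates, pronoun_sentence=5):
    print(c.text)
```

With the pronoun in sentence 5, "the mechanic" from sentence 0 falls outside the window and is dropped; the remaining candidates are ordered subject, object, other.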
Even more features
Repeated mention
Entities that have been the focus of the discourse are more likely to
be salient for a referring phrase
Parallelism
There are strong preferences introduced by parallel constructs
Long John Silver went with Jim. Billy Bones went with him.
(him = Jim)
Verb Semantics and selectional restrictions
Certain verbs take certain types of arguments and may prejudice the
resolution of pronouns
John parked his car in the garage after driving it around for hours.
Example: rules to assign gender info
Summary of Discourse Level Tasks
Most widely used task is coreference resolution
Important in many other text analysis tasks in order to understand
meaning of sentences
Dialogue structure is also part of discourse analysis and will
be considered separately (next time)
Document structure
Recognizing known structure, for example, abstracts
Separating documents according to known structure
Named entity resolution across documents
Using cohesive elements in language generation and
machine translation