
UNIT-4

WORD SENSES:

Word senses refer to the different meanings or interpretations that a
single word can have depending on its context. In linguistics and
computational semantics, understanding word senses is crucial for tasks
like machine translation, information retrieval, and natural language
understanding.

Example:

The word "bank" can have multiple senses:

1. A financial institution: "I deposited money in the bank."
2. The side of a river: "We sat on the bank of the river."

The correct sense depends on the context in which the word is used.
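As a minimal illustration, NLTK's implementation of the classic Lesk algorithm can be asked which WordNet sense of "bank" each context selects (this assumes the NLTK wordnet and punkt resources have been downloaded; Lesk is a simple baseline and does not always pick the intuitive sense):

```python
# Word-sense disambiguation sketch using NLTK's Lesk algorithm.
# Assumes: nltk.download("wordnet"), nltk.download("punkt")
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent1 = word_tokenize("I deposited money in the bank.")
sent2 = word_tokenize("We sat on the bank of the river.")

# lesk() returns the WordNet synset whose gloss overlaps most with the context.
print(lesk(sent1, "bank", "n"))  # ideally a financial-institution sense
print(lesk(sent2, "bank", "n"))  # ideally a sloping-land / riverbank sense
```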
PROPOSITION BANK:

The Proposition Bank (PropBank) is a corpus in which sentences are
annotated with predicate-argument structures layered on top of the
syntactic tree: each verb sense has a frameset, and its arguments are
labeled with numbered roles such as Arg0 (the agent-like argument) and
Arg1 (the patient-like argument).
Information extraction (IE)

The task of automatically extracting structured information from
unstructured and/or semi-structured machine-readable documents and
other electronically represented sources is known as information
extraction (IE).

When there is a significant number of customers, manually assessing
customer feedback can be tedious, error-prone, and time-consuming, and
there is a good chance of overlooking a dissatisfied consumer.
Fortunately, sentiment analysis can improve the speed and efficacy of
customer-support interactions. By running sentiment analysis on all
incoming tickets and prioritizing them, one can quickly identify the most
dissatisfied customers or the most important issues, and allocate tickets
to the appropriate individual or team to handle them. As a result,
customer satisfaction will improve dramatically.

General Pipeline of the Information Extraction Process

The following steps are often involved in extracting structured
information from unstructured texts:

1. Initial processing.
2. Proper names identification.
3. Parsing.
4. Extraction of events and relations.
5. Anaphora resolution.
6. Output result generation.


1. Initial processing

The first step is to break down a text into fragments such as zones,
phrases, segments, and tokens. This function can be performed by
tokenizers, text zoners, segmenters, and splitters, among other
components. In the initial processing stage, part-of-speech tagging and
phrasal unit identification (noun or verb phrases) are usually the next
tasks.
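A minimal sketch of this stage with Spacy (assuming the small English model has been installed with `python -m spacy download en_core_web_sm`):

```python
# Initial processing: sentence splitting and tokenization with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Corp. was founded in 2005. It is based in Berlin.")

# Sentence segmentation
for sent in doc.sents:
    print(sent.text)

# Tokenization
print([token.text for token in doc])
```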

2. Proper names identification

One of the most important stages in the information extraction chain is
the identification of various classes of proper names, such as names of
people or organizations, dates, monetary amounts, places, addresses, and
so on. They may be found in practically any sort of text and are widely
used in the extraction process. Regular expressions, which are a
collection of patterns, are used to recognize these names.
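For example, simple (illustrative, not production-grade) regular-expression patterns can pick out dates and monetary amounts:

```python
# Recognizing dates and monetary amounts with regular expressions.
import re

text = "The invoice for $1,250.00 was issued on 12/05/2023 in New York."

money = re.findall(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?", text)
dates = re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)

print(money)  # ['$1,250.00']
print(dates)  # ['12/05/2023']
```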

3. Parsing

The syntactic analysis of the sentences in the texts is done at this step.
After the fundamental entities have been recognized in the previous stage,
the sentences are processed to find the noun groups that surround some of
those entities, as well as verb groups. The noun and verb groups then
serve as the starting sections for the pattern-matching step.
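A minimal Spacy sketch of this stage, listing the noun groups and the verbs they attach to:

```python
# Parsing: noun groups (noun chunks) and verb groups with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The new CEO of Acme Corp announced a merger with a rival firm.")

# Noun groups and their grammatical function
for chunk in doc.noun_chunks:
    print(chunk.text, "->", chunk.root.dep_, "of", chunk.root.head.text)

# Verb groups (here simply the verb tokens)
print([t.text for t in doc if t.pos_ == "VERB"])
```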

4. Extraction of events and relations

This stage establishes relations between the extracted concepts. This is
accomplished by developing and applying extraction rules that describe
various patterns. The text is compared against these patterns, and when a
match is discovered, the matching text element is labeled and can be
retrieved later.
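A hedged sketch of such a pattern: extract (subject, verb, object) triples from the dependency parse, which is one simple way to turn matched text into a labeled relation:

```python
# Rule-based relation extraction: (subject, verb, object) triples.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google acquired YouTube in 2006. Microsoft hired a new engineer.")

for token in doc:
    if token.pos_ == "VERB":
        subjects = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
        objects = [w for w in token.rights if w.dep_ in ("dobj", "obj")]
        if subjects and objects:
            print((subjects[0].text, token.lemma_, objects[0].text))
# Expected output along the lines of:
# ('Google', 'acquire', 'YouTube')
# ('Microsoft', 'hire', 'engineer')
```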

5. Coreference or Anaphora resolution

Coreference resolution identifies all of the ways an entity is named
throughout the text. The step in which it is decided whether different
noun phrases refer to the same entity or not is called coreference or
anaphora resolution.
6. Output results generation

This stage entails converting the structures collected during the preceding
processes into output templates that follow the format defined by the user.
It might comprise a variety of normalization processes.

Spacy:

Spacy is a Python library for advanced natural language processing. It is
designed for production use and aids in the development of applications
that process and understand large amounts of text. It can be used to create
information extraction or natural language understanding systems, as well
as to preprocess text for deep learning.

Information Extraction Techniques Using Natural Language Processing

1. Regular Expression.
2. Part-of-speech tagging.
3. Named Entity Recognition.
4. Topic Modeling.
5. Rule-Based Matching.
1. Regular Expression:

A regular expression (abbreviated regex or regexp, and sometimes known
as a rational expression) is a string of characters that defines a search
pattern. A regular expression, in other words, is a pattern that
characterizes a collection of strings. Regular expressions are built
similarly to arithmetic expressions, by combining smaller expressions
with various operators.

Although regular expressions precisely identify fine-grained information,
the circumstances surrounding that information, which might help locate
it precisely, are overlooked when regular expressions are used alone. As
a result, regular expressions are commonly regarded as a fundamental
approach that must be applied appropriately in order to achieve high
extraction performance.
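One way to bring some of that surrounding context back into the pattern itself is to encode it with named groups (a toy sketch):

```python
# Using left context in a regular expression to qualify what was matched.
import re

text = "Contact support at help@example.com or sales at sales@example.com."

pattern = re.compile(r"(?P<role>support|sales) at (?P<email>[\w.+-]+@[\w-]+\.\w+)")
for m in pattern.finditer(text):
    print(m.group("role"), "->", m.group("email"))
# support -> help@example.com
# sales -> sales@example.com
```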
2. Part-of-speech tagging:

Part-of-speech tagging assigns a grammatical category (noun, verb,
adjective, and so on) to each token, usually with a trained statistical
component. A trained component contains binary data that is generated by
giving a system enough instances for it to make language-specific
predictions — for example, in English, a word after “the” is most often a
noun.
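A minimal Spacy sketch of part-of-speech tagging with such a trained component:

```python
# Part-of-speech tagging with spaCy's trained pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

for token in doc:
    # pos_ is the coarse universal tag, tag_ the fine-grained one
    print(token.text, token.pos_, token.tag_)
# Note that the words after "the" ("cat", "mat") are tagged NOUN.
```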
3. Named Entity Recognition:

Named-entity recognition (NER) is a subtask of information extraction
that seeks to locate and classify named entities mentioned in unstructured
text into pre-defined categories such as person names, organizations,
locations, and so on.

Any word or group of words that consistently refers to the same item is
considered an entity.

A two-step approach lies at the heart of each NER model:

1. Detect a named entity.
2. Categorize the entity.

NER is appropriate for any scenario where a high-level overview of a
significant amount of text is required. You can rapidly categorize texts
based on their relevancy or similarity with NER and comprehend the
subject or topic.
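A minimal Spacy sketch of NER; the entity labels (ORG, GPE, DATE, ...) come from the trained English model:

```python
# Named-entity recognition with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple Inc. opened a new office in Paris on Monday.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected: "Apple Inc." ORG, "Paris" GPE, "Monday" DATE
```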

4. Topic Modeling:

A topic model is a form of statistical model used in machine learning
and natural language processing to find abstract "topics" that appear in a
collection of documents.

Topic Modeling is an unsupervised learning method for clustering
documents and identifying topics based on their contents. It works in a
similar way to the K-Means and Expectation-Maximization algorithms.
Because we are clustering texts, we have to evaluate the individual words
in each document to uncover topics and assign values to each depending
on the distribution of these terms.

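A hedged toy sketch with scikit-learn's Latent Dirichlet Allocation (the corpus below is far too small for meaningful topics and is only meant to show the mechanics):

```python
# Topic modeling with Latent Dirichlet Allocation (LDA) in scikit-learn.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the stock market fell as investors sold shares",
    "the team won the match after a late goal",
    "shares rallied and the market closed higher",
    "the coach praised the players after the game",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the highest-weighted words for each discovered topic
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"Topic {i}:", [words[j] for j in topic.argsort()[-5:]])
```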

5. Rule-Based Matching:

Compared with using regular expressions on raw text, rule-based
matcher engines and components allow you to access not only the words
and phrases you want but also the tokens and their relationships within
the document.

This enables easy access to and examination of the surrounding tokens, as
well as the merging of spans into single tokens and the addition of entries
to named entities.

 Token-based matching

Rules can refer to token annotations in this case. You can also pass a
custom callback to the rule matcher to act on matches, and you may attach
patterns to entity IDs to provide basic entity linking and disambiguation.
To match large terminology lists, you may utilize the PhraseMatcher,
which takes Doc objects as match patterns.
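A minimal sketch of token-based matching with Spacy's Matcher (the pattern is illustrative):

```python
# Token-based matching: a lemma "buy" followed by an optional determiner
# and a noun.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

pattern = [{"LEMMA": "buy"}, {"POS": "DET", "OP": "?"}, {"POS": "NOUN"}]
matcher.add("BUY_EVENT", [pattern])

doc = nlp("She bought a house, and he buys groceries.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # "bought a house", "buys groceries"
```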
 Phrase Matching

If you need to match large terminology lists, you may use the
PhraseMatcher, which takes Doc objects rather than token patterns as its
match patterns and is much more efficient in the long run. Doc patterns
may contain one or more tokens.
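A minimal PhraseMatcher sketch (the terminology list is illustrative):

```python
# Phrase matching against a terminology list with spaCy's PhraseMatcher.
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive

terms = ["machine learning", "natural language processing", "deep learning"]
matcher.add("TECH_TERMS", [nlp.make_doc(term) for term in terms])

doc = nlp("Natural language processing builds on machine learning.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```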

Conclusion

These are only a few examples of natural language processing techniques.
The aforementioned NLP techniques can be used to extract meaningful
information from grammatical text. Information extraction is not a simple
NLP operation; to better comprehend the data's structure and what it has
to offer, we need to spend time with it. Spacy, for its part, is a library
developed for building applications that process enormous amounts of
text data.
TEMPLATE FILLING:

Template filling is an information extraction method in Natural Language
Processing (NLP) for extracting specific, structured information from
unstructured text data. It involves filling predefined slots in a template
with relevant data points found within a text. This technique is essential
for organizing and structuring information, especially when handling large
text datasets in areas like news summarization, medical reports, customer
service, and more.

The process begins with a pre-designed template that includes specific
fields or "slots" for information. For example, in a template for extracting
details from a job application, fields might include "Name,"
"Experience," "Skills," and "Education." NLP techniques like Named
Entity Recognition (NER) identify and extract data associated with these
fields. NLP models then use machine learning algorithms, or sometimes
rule-based systems, to find and fill in the relevant information.

Template filling faces challenges like handling ambiguous language,
variations in phrasing, and the need to accurately interpret context.
Recently, advances in NLP through transformer-based models have made
it considerably more robust to these problems.
Template Filling in Natural Language Processing (NLP)

Template filling is a widely used technique in Natural Language
Processing (NLP) to extract structured information from unstructured text.
By taking free-form text—like articles, reports, emails, or social media
posts—and identifying specific details to fit into a predefined format,
template filling enables automation in data extraction, summarization,
and organization. This approach is used in numerous domains, such as
news article summarization, medical data organization, legal document
analysis, and customer service responses.

How Template Filling Works

The process typically begins with a template designed to capture
particular fields, such as "Name," "Location," "Date," "Event," or
"Organization." NLP models and algorithms analyze the input text to
locate and extract relevant information, filling in these fields. For instance,
in an email about a meeting, a template might require the date, time, and
location of the meeting, as well as the list of attendees. The NLP system
would scan the text, identify and extract these details, and input them into
the appropriate fields within the template.
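A hedged sketch of that meeting example, using Spacy NER to fill the slots (the slot names and the mapping from entity labels to slots are illustrative assumptions):

```python
# Template filling: map NER output into predefined slots.
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("The project review meeting with Sarah Connor is scheduled "
        "for Friday at 3 pm in Berlin.")

template = {"date": None, "time": None, "location": None, "attendees": []}

for ent in nlp(text).ents:
    if ent.label_ == "DATE":
        template["date"] = ent.text
    elif ent.label_ == "TIME":
        template["time"] = ent.text
    elif ent.label_ == "GPE":        # geopolitical entity, used as location
        template["location"] = ent.text
    elif ent.label_ == "PERSON":
        template["attendees"].append(ent.text)

print(template)
# e.g. {'date': 'Friday', 'time': '3 pm', 'location': 'Berlin',
#       'attendees': ['Sarah Connor']}
```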

Techniques in Template Filling

Several methods can be used for template filling, ranging from rule-based
systems to more advanced machine learning and deep learning
approaches:

1. Rule-Based Systems: Early template-filling systems relied on
predefined rules or regular expressions to detect keywords and
extract information. These systems work well in predictable
environments, but they lack the flexibility to handle varied
language structures.
2. Named Entity Recognition (NER): NER is often used to identify
and categorize key information, such as names, locations, dates,
and organizations, in a text. For example, in a news article, NER
would help label “Paris” as a location or “Apple Inc.” as an
organization. This data can then be placed into a structured
template.
3. Dependency Parsing and Semantic Role Labeling: These
techniques analyze sentence structure and understand the
relationships between different words or phrases. They are useful
for identifying subject-object relationships or actions, which helps
ensure the extracted information is accurate and contextually
relevant.
4. Transformer-Based Models: With the advent of deep learning
models like BERT, GPT, and T5, template filling has become more
sophisticated. These models use large-scale language
understanding to handle context, ambiguity, and varying phrasing.
They allow NLP systems to handle more complex and varied
language structures, improving template-filling accuracy in
real-world applications (see the sketch after this list).
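One hedged way to sketch the transformer-based approach is to phrase each template field as a question for an extractive question-answering pipeline (Hugging Face transformers; the pipeline downloads a default QA model, and the slot names are illustrative):

```python
# Transformer-based slot filling via extractive question answering.
from transformers import pipeline

qa = pipeline("question-answering")

text = ("The annual shareholder meeting will be held on June 12 "
        "at the Hilton in Chicago, chaired by Maria Lopez.")

slots = {
    "date": "When is the meeting?",
    "location": "Where is the meeting held?",
    "chair": "Who chairs the meeting?",
}

template = {slot: qa(question=question, context=text)["answer"]
            for slot, question in slots.items()}
print(template)
```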

Applications of Template Filling

Template filling is versatile and widely applicable across different fields:

 Medical and Clinical Documentation: Extracting patient details,
diagnoses, and treatment plans from clinical notes for easier access
and analysis.
 Customer Service and Chatbots: Filling in fields for inquiries
and responses, allowing bots to deliver relevant answers quickly
and accurately.
 Legal Document Analysis: Extracting clauses, dates, parties, and
obligations from lengthy contracts to enable easier searching and
referencing.
 Financial News Summarization: Pulling out stock information,
company names, and market movements from financial news
reports for financial analysts and trading algorithms.

Challenges in Template Filling

Despite its advantages, template filling faces a few challenges:

 Language Ambiguity: Variability in phrasing and sentence
structure can complicate data extraction, especially in unstructured
or informal texts.
 Contextual Understanding: Accurately filling templates requires
understanding nuances and contextual information, which simple
rule-based systems may struggle with.
 Domain-Specific Requirements: Different fields may have unique
terminologies and structures that require tailored NLP models to
achieve reliable results.
Recent Advances

The rise of transformer-based models has significantly improved template
filling, as these models can understand context and nuance at a deeper
level. These advancements allow for more accurate information
extraction, even in complex and variable language environments.
Additionally, transfer learning—training models on large datasets and
then fine-tuning them on domain-specific data—has enhanced the
adaptability of template-filling applications in specialized fields.

In summary, template filling in NLP has evolved from rule-based
methods to sophisticated AI-driven approaches, enabling more efficient
and accurate structuring of unstructured data across various industries. As
NLP continues to advance, template filling is expected to become even
more precise, adaptable, and widely applied, helping businesses and
individuals manage information more effectively.
Selectional Restrictions

Consider the two interpretations of: "I want to eat someplace nearby."

a) Sensible: eat is intransitive and "someplace nearby" is a location adjunct.
b) Speaker is Godzilla: eat is transitive and "someplace nearby" is a direct object.

How do we know the speaker didn't mean b)? Because the THEME of
eating tends to be something edible.
Selectional restrictions are associated with senses

• The restaurant serves green-lipped mussels.
  THEME is some kind of food.
• Which airlines serve Denver?
  THEME is an appropriate location.
Selectional restrictions vary in specificity

• I often ask the musicians to imagine a tennis game.
• To diagonalize a matrix is to find its eigenvalues.
• Radon is an odorless gas that can't be detected by human senses.
Representing selectional restrictions using thematic roles

The event representation consists of a single variable that stands for the
event, a predicate denoting the kind of event, and variables and relations
for the event roles. Ignoring the lambda structures and using thematic
roles rather than deep event roles, the contribution of a verb like eat
might look like the following:

  ∃e, x, y  Eating(e) ∧ Agent(e, x) ∧ Theme(e, y)

With this representation, all we know about y, the filler of the THEME
role, is that it is associated with an Eating event through the Theme
relation. To stipulate the selectional restriction that y must be something
edible, we simply add a new term to that effect:

  ∃e, x, y  Eating(e) ∧ Agent(e, x) ∧ Theme(e, y) ∧ EdibleThing(y)

When a phrase like "ate a hamburger" is encountered, a semantic analyzer
can form the following kind of representation:

  ∃e, x, y  Eating(e) ∧ Eater(e, x) ∧ Theme(e, y) ∧ EdibleThing(y) ∧ Hamburger(y)

This representation is perfectly reasonable, since the membership of y in
the category Hamburger is consistent with its membership in the category
EdibleThing. But it assumes we have a large knowledge base of facts
about edible things and hamburgers and what not.

Figure 22.6 (evidence from WordNet that hamburgers are edible) shows
the hypernym chain:

  dish => {nutriment, nourishment, nutrition, ...} => {food, nutrient}
       => substance => matter => physical entity => entity
Let's use WordNet synsets to specify selectional restrictions

• The THEME of eat must be the WordNet synset {food, nutrient}:
  "any substance that can be metabolized by an animal to give energy and build tissue."
• Similarly:
  THEME of imagine: synset {entity}
  THEME of lift: synset {physical entity}
  THEME of diagonalize: synset {matrix}
• This allows "imagine a hamburger" and "lift a hamburger", and
  correctly rules out "diagonalize a hamburger".
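A hedged sketch of such a check with NLTK's WordNet interface (assumes nltk.download("wordnet"); food.n.01 is the {food, nutrient} synset with the gloss quoted above):

```python
# Checking a selectional restriction against WordNet hypernyms.
from nltk.corpus import wordnet as wn

FOOD = wn.synset("food.n.01")  # {food, nutrient}

def satisfies(word, required_synset):
    """True if some noun sense of `word` has `required_synset` as a hypernym."""
    for syn in wn.synsets(word, pos="n"):
        ancestors = {h for path in syn.hypernym_paths() for h in path}
        if required_synset in ancestors:
            return True
    return False

print(satisfies("hamburger", FOOD))  # True: "eat a hamburger" is fine
print(satisfies("matrix", FOOD))     # False: rules out "eat a matrix"
```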
Selectional Preferences

• In early implementations, selectional restrictions were strict
  constraints (Katz and Fodor 1963): eat [+FOOD].
• But it was quickly realized that selectional constraints are really
  preferences (Wilks 1975), as real text shows:
  "But it fell apart in 1931, perhaps because people realized you can't
  eat gold for lunch if you're hungry."
  "In his two championship trials, Mr. Kulkarni ate glass on an empty
  stomach, accompanied only by water and tea."
Selectional Association (Resnik 1993)

• Selectional preference strength: the amount of information that a
  predicate tells us about the semantic class of its arguments.
  eat tells us a lot about the semantic class of its direct objects;
  be doesn't tell us much.
• The selectional preference strength is the difference in information
  between two distributions:
  P(c): the distribution of expected semantic classes for any direct object
  P(c|v): the distribution of expected semantic classes for this verb
• The greater the difference, the more the verb is constraining its object.
Selectional preference strength

P(c|v) is the probability that the direct object of the specific verb v will
fall into semantic class c. The greater the difference between the
distributions P(c) and P(c|v), the more information the verb is giving us
about possible objects. The difference between these two distributions
can be quantified by relative entropy, or the Kullback-Leibler (KL)
divergence (Kullback and Leibler, 1951). The KL divergence D(P||Q)
expresses the difference between two probability distributions P and Q:

  D(P||Q) = Σ_x P(x) log ( P(x) / Q(x) )        (22.38)
The selectional preference strength S_R(v) uses the KL divergence to
express how much information, in bits, the verb v expresses about the
possible semantic class of its argument:

  S_R(v) = D( P(c|v) || P(c) )
         = Σ_c P(c|v) log ( P(c|v) / P(c) )        (22.39)

Selectional association of a verb with a class: Resnik then defines the
selectional association of a particular class c and verb v as the relative
contribution of that class to the general selectional preference of the verb:

  A_R(v, c) = (1 / S_R(v)) · P(c|v) log ( P(c|v) / P(c) )        (22.40)

Computing Selectional Association

The selectional association is thus a probabilistic measure of the strength
of association between a predicate and a semantic class of its argument.
Resnik estimates the probabilities for these associations by parsing a
corpus, counting all the times each predicate occurs with each argument
word, and assuming that each word is a partial observation of all the
WordNet concepts containing the word. The following table from Resnik
(1996) shows some sample high and low selectional associations for
verbs and some WordNet semantic classes of their direct objects:

  Verb    Direct Object Class   Assoc     Direct Object Class   Assoc
  read    WRITING                6.80     ACTIVITY              -0.20
  write   WRITING                7.26     COMMERCE               0.00
  see     ENTITY                 5.79     METHOD                -0.01
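A toy sketch of these two quantities, computed from made-up class distributions rather than real corpus counts:

```python
# Selectional preference strength S_R(v) and selectional association
# A_R(v, c), following equations (22.39) and (22.40).
import math

# P(c): prior over semantic classes of direct objects (toy numbers)
prior = {"FOOD": 0.25, "WRITING": 0.25, "ENTITY": 0.25, "ACTIVITY": 0.25}

# P(c|v): class distribution for the direct objects of "eat" (toy numbers)
posterior = {"FOOD": 0.85, "WRITING": 0.01, "ENTITY": 0.13, "ACTIVITY": 0.01}

# S_R(v) = D(P(c|v) || P(c)), in bits
S = sum(p * math.log2(p / prior[c]) for c, p in posterior.items())
print(f"S_R(eat) = {S:.3f} bits")

# A_R(v, c) = P(c|v) * log2(P(c|v) / P(c)) / S_R(v)
for c, p in posterior.items():
    print(f"A_R(eat, {c}) = {p * math.log2(p / prior[c]) / S:.3f}")
```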
Results from similar models (Ó Séaghdha and Korhonen 2012)

Most probable cuts learned by WN-CUT for the object argument of
selected verbs (their Table 2):

  eat      food#n#1, aliment#n#1, entity#n#1, solid#n#1, food#n#2
  drink    fluid#n#1, liquid#n#1, entity#n#1, alcohol#n#1, beverage#n#1
  appoint  individual#n#1, entity#n#1, chief#n#1, being#n#2, expert#n#1
  publish  abstract entity#n#1, piece of writing#n#1, communication#n#2, publication#n#1

(The accompanying evaluation, covering seen and unseen verb-object,
noun-noun, and adjective-noun combinations, is not reproduced here.)
Instead of using classes, a simpler model of selectional association

An alternative to using the selectional association between a verb and the
WordNet class of its arguments is to simply use the conditional probability
of an argument word given a predicate verb. This simple model of
selectional preferences directly models the strength of association of one
verb (predicate) with one noun (argument), as opposed to a whole semantic
class in WordNet.

The conditional probability model can be computed by parsing a very
large corpus (billions of words) and computing co-occurrence counts:
how often a given verb occurs with a given noun in a given relation,
count(n, v, r) (or its log). The conditional probability of an argument noun
given a verb for a particular relation, P(n|v, r), can then be used as a
selectional preference metric for that pair of words (Brockmann and
Lapata, 2003):

  P(n|v, r) = C(n, v, r) / C(v, r)   if C(n, v, r) > 0
            = 0                      otherwise

The inverse probability P(v|n, r) was found to have better performance in
some cases (Brockmann and Lapata, 2003):
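A toy sketch of this word-level model, with made-up counts standing in for a parsed corpus:

```python
# Word-level selectional preference: P(n | v, r) from co-occurrence counts.
from collections import Counter

# C(n, v, r): how often noun n is the direct object ("dobj") of verb v
counts = Counter({
    ("hamburger", "eat", "dobj"): 120,
    ("salad", "eat", "dobj"): 80,
    ("gold", "eat", "dobj"): 1,
})

def p_n_given_vr(n, v, r):
    c_nvr = counts[(n, v, r)]
    if c_nvr == 0:
        return 0.0
    c_vr = sum(c for (_, v2, r2), c in counts.items() if (v2, r2) == (v, r))
    return c_nvr / c_vr

print(p_n_given_vr("hamburger", "eat", "dobj"))  # 120/201 ≈ 0.597
print(p_n_given_vr("matrix", "eat", "dobj"))     # 0.0
```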
Evaluation from Bergsma, Lin, and Goebel

Plausible/implausible direct objects with the scores assigned by each
model; an asterisk marks cases where the implausible object scores at
least as high (the MI column is cut off in the source):

  Verb        Plaus./Implaus.      Resnik       Dagan et al.  Erk          MI
  see         friend/method        5.79/-0.01   0.20/1.40*    0.46/-0.07   1.11/
  read        article/fashion      6.80/-0.20   3.00/0.11     3.80/1.90    4.00/
  find        label/fever          1.10/0.22    1.50/2.20*    0.59/0.01    0.42/
  hear        story/issue          1.89/1.89*   0.66/1.50*    2.00/2.60*   2.99/
  write       letter/market        7.26/0.00    2.50/-0.43    3.60/-0.24   5.06/
  urge        daughter/contrast    1.14/1.86*   0.14/1.60*    1.10/3.60*   -0.95
  warn        driver/engine        4.73/3.61    1.20/0.05     2.30/0.62    2.87/
  judge       contest/climate      1.30/0.28    1.50/1.90*    1.70/1.70*   3.90/
  teach       language/distance    1.87/1.86    2.50/1.30     3.60/2.70    3.53/
  show        sample/travel        1.44/0.41    1.60/0.14     0.40/-0.82   0.53/
  expect      visit/mouth          0.59/5.93*   1.40/1.50*    1.40/0.37    1.05/
  answer      request/tragedy      4.49/3.88    2.70/1.50     3.10/-0.64   2.93/
  recognize   author/pocket        0.50/0.50*   0.03/0.37*    0.77/1.30*   0.48/
  repeat      comment/journal      1.23/1.23*   2.30/1.40     2.90/—       2.59/
  understand  concept/session      1.52/1.51    2.70/0.25     2.00/-0.28   3.96/
  remember    reply/smoke          1.31/0.20    2.10/1.20     0.54/2.60*   1.13/
Summary: Selectional Restrictions

• Two classes of models of the semantic type constraint that a
  predicate places on its argument:
  Represent the constraint between the predicate and a WordNet class.
  Represent the constraint between the predicate and a word.
• One fun recent use case: detecting metonymy (type coercion),
  as in Pustejovsky et al. (2010).
  Coherent with selectional restrictions:
    The spokesman denied the statement (PROPOSITION).
    The child threw the stone (PHYSICAL OBJECT).
  Coercion:
    The president denied the attack (EVENT → PROPOSITION).
    The White House (LOCATION → HUMAN) denied the statement.
