
UNIT-4

WORD SENSES:

Word senses refer to the different meanings or interpretations that a
single word can have depending on its context. In linguistics and
computational semantics, understanding word senses is crucial for tasks
like machine translation, information retrieval, and natural language
understanding.

Example:

The word "bank" can have multiple senses:

1. A financial institution: "I deposited money in the bank."
2. The side of a river: "We sat on the bank of the river."

The correct sense depends on the context in which the word is used.
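As a minimal illustration, NLTK's implementation of the classic Lesk algorithm can be asked which WordNet sense of "bank" each context selects (this assumes the NLTK wordnet and punkt resources have been downloaded; Lesk is a simple baseline and does not always pick the intuitive sense):

```python
# Word-sense disambiguation sketch using NLTK's Lesk algorithm.
# Assumes: nltk.download("wordnet"), nltk.download("punkt")
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

sent1 = word_tokenize("I deposited money in the bank.")
sent2 = word_tokenize("We sat on the bank of the river.")

# lesk() returns the WordNet synset whose gloss overlaps most with the context.
print(lesk(sent1, "bank", "n"))  # ideally a financial-institution sense
print(lesk(sent2, "bank", "n"))  # ideally a sloping-land / riverbank sense
```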
PROPOSITION BANK:

The Proposition Bank (PropBank) is a corpus in which sentences are
annotated with predicate-argument structures layered on top of the
syntactic tree: each verb sense has a frameset, and its arguments are
labeled with numbered roles such as Arg0 (the agent-like argument) and
Arg1 (the patient-like argument).
Information extraction (IE)

The task of automatically extracting structured information from
unstructured and/or semi-structured machine-readable documents and
other electronically represented sources is known as information
extraction (IE).

When there is a significant number of customers, manually assessing
customer feedback can be tedious, error-prone, and time-consuming, and
there is a good chance of overlooking a dissatisfied consumer.
Fortunately, sentiment analysis can improve the speed and efficacy of
customer-support interactions. By running sentiment analysis on all
incoming tickets and prioritizing them, one can quickly identify the most
dissatisfied customers or the most important issues, and allocate tickets
to the appropriate individual or team to handle them. As a result,
customer satisfaction will improve dramatically.

General Pipeline of the Information Extraction Process

The following steps are often involved in extracting structured
information from unstructured texts:

1. Initial processing.
2. Proper names identification.
3. Parsing.
4. Extraction of events and relations.
5. Anaphora resolution.
6. Output result generation.


1. Initial processing

The first step is to break down a text into fragments such as zones,
phrases, segments, and tokens. This function can be performed by
tokenizers, text zoners, segmenters, and splitters, among other
components. In the initial processing stage, part-of-speech tagging and
phrasal unit identification (noun or verb phrases) are usually the next
tasks.
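A minimal sketch of this stage with Spacy (assuming the small English model has been installed with `python -m spacy download en_core_web_sm`):

```python
# Initial processing: sentence splitting and tokenization with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Corp. was founded in 2005. It is based in Berlin.")

# Sentence segmentation
for sent in doc.sents:
    print(sent.text)

# Tokenization
print([token.text for token in doc])
```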

2. Proper names identification

One of the most important stages in the information extraction chain is
the identification of various classes of proper names, such as names of
people or organizations, dates, monetary amounts, places, addresses, and
so on. They may be found in practically any sort of text and are widely
used in the extraction process. Regular expressions, which are a
collection of patterns, are used to recognize these names.
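For example, simple (illustrative, not production-grade) regular-expression patterns can pick out dates and monetary amounts:

```python
# Recognizing dates and monetary amounts with regular expressions.
import re

text = "The invoice for $1,250.00 was issued on 12/05/2023 in New York."

money = re.findall(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?", text)
dates = re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)

print(money)  # ['$1,250.00']
print(dates)  # ['12/05/2023']
```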

3. Parsing

The syntactic analysis of the sentences in the texts is done at this step.
After the fundamental entities have been recognized in the previous stage,
the sentences are processed to find the noun groups that surround some of
those entities, as well as verb groups. The noun and verb groups then
serve as the starting sections for the pattern-matching step.
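A minimal Spacy sketch of this stage, listing the noun groups and the verbs they attach to:

```python
# Parsing: noun groups (noun chunks) and verb groups with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The new CEO of Acme Corp announced a merger with a rival firm.")

# Noun groups and their grammatical function
for chunk in doc.noun_chunks:
    print(chunk.text, "->", chunk.root.dep_, "of", chunk.root.head.text)

# Verb groups (here simply the verb tokens)
print([t.text for t in doc if t.pos_ == "VERB"])
```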

4. Extraction of events and relations

This stage establishes relations between the extracted concepts. This is
accomplished by developing and applying extraction rules that describe
various patterns. The text is compared against these patterns, and when a
match is discovered, the matching text element is labeled and can be
retrieved later.
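A hedged sketch of such a pattern: extract (subject, verb, object) triples from the dependency parse, which is one simple way to turn matched text into a labeled relation:

```python
# Rule-based relation extraction: (subject, verb, object) triples.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google acquired YouTube in 2006. Microsoft hired a new engineer.")

for token in doc:
    if token.pos_ == "VERB":
        subjects = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
        objects = [w for w in token.rights if w.dep_ in ("dobj", "obj")]
        if subjects and objects:
            print((subjects[0].text, token.lemma_, objects[0].text))
# Expected output along the lines of:
# ('Google', 'acquire', 'YouTube')
# ('Microsoft', 'hire', 'engineer')
```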

5. Coreference or Anaphora resolution

Coreference resolution identifies all of the ways an entity is named
throughout the text. The step in which it is decided whether different
noun phrases refer to the same entity or not is called coreference or
anaphora resolution.
6. Output results generation

This stage entails converting the structures collected during the preceding
processes into output templates that follow the format defined by the user.
It might comprise a variety of normalization processes.

Spacy:

Spacy is a Python library for advanced natural language processing. It is
designed for production use and aids in the development of applications
that process and understand large amounts of text. It can be used to create
information extraction or natural language understanding systems, as well
as to preprocess text for deep learning.

Information Extraction Techniques Using Natural Language Processing

1. Regular Expression.
2. Part-of-speech tagging.
3. Named Entity Recognition.
4. Topic Modeling.
5. Rule-Based Matching.
1. Regular Expression:

A regular expression (abbreviated regex or regexp, and sometimes known
as a rational expression) is a string of characters that defines a search
pattern. A regular expression, in other words, is a pattern that
characterizes a collection of strings. Regular expressions are built
similarly to arithmetic expressions, by combining smaller expressions
with various operators.

Although regular expressions precisely identify fine-grained information,
the circumstances surrounding that information, which might help locate
it precisely, are overlooked when regular expressions are used alone. As
a result, regular expressions are commonly regarded as a fundamental
approach that must be applied appropriately in order to achieve high
extraction performance.
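One way to bring some of that surrounding context back into the pattern itself is to encode it with named groups (a toy sketch):

```python
# Using left context in a regular expression to qualify what was matched.
import re

text = "Contact support at help@example.com or sales at sales@example.com."

pattern = re.compile(r"(?P<role>support|sales) at (?P<email>[\w.+-]+@[\w-]+\.\w+)")
for m in pattern.finditer(text):
    print(m.group("role"), "->", m.group("email"))
# support -> help@example.com
# sales -> sales@example.com
```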
2. Part-of-speech tagging:

Part-of-speech tagging assigns a grammatical category (noun, verb,
adjective, and so on) to each token, usually with a trained statistical
component. A trained component contains binary data that is generated by
giving a system enough instances for it to make language-specific
predictions — for example, in English, a word after “the” is most often a
noun.
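A minimal Spacy sketch of part-of-speech tagging with such a trained component:

```python
# Part-of-speech tagging with spaCy's trained pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat.")

for token in doc:
    # pos_ is the coarse universal tag, tag_ the fine-grained one
    print(token.text, token.pos_, token.tag_)
# Note that the words after "the" ("cat", "mat") are tagged NOUN.
```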
3. Named Entity Recognition:

Named-entity recognition (NER) is a subtask of information extraction
that seeks to locate and classify named entities mentioned in unstructured
text into pre-defined categories such as person names, organizations,
locations, and so on.

Any word or group of words that consistently refers to the same item is
considered an entity.

A two-step approach lies at the heart of each NER model:

1. Detect a named entity.
2. Categorize the entity.

NER is appropriate for any scenario where a high-level overview of a
significant amount of text is required. You can rapidly categorize texts
based on their relevancy or similarity with NER and comprehend the
subject or topic.
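A minimal Spacy sketch of NER; the entity labels (ORG, GPE, DATE, ...) come from the trained English model:

```python
# Named-entity recognition with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple Inc. opened a new office in Paris on Monday.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected: "Apple Inc." ORG, "Paris" GPE, "Monday" DATE
```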

4. Topic Modeling:

A topic model is a form of statistical model used in machine learning
and natural language processing to find abstract "topics" that appear in a
collection of documents.

Topic Modeling is an unsupervised learning method for clustering
documents and identifying topics based on their contents. It works in a
similar way to the K-Means and Expectation-Maximization algorithms.
Because we are clustering texts, we have to evaluate the individual words
in each document to uncover topics and assign values to each depending
on the distribution of these terms.

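A hedged toy sketch with scikit-learn's Latent Dirichlet Allocation (the corpus below is far too small for meaningful topics and is only meant to show the mechanics):

```python
# Topic modeling with Latent Dirichlet Allocation (LDA) in scikit-learn.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the stock market fell as investors sold shares",
    "the team won the match after a late goal",
    "shares rallied and the market closed higher",
    "the coach praised the players after the game",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the highest-weighted words for each discovered topic
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    print(f"Topic {i}:", [words[j] for j in topic.argsort()[-5:]])
```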

5. Rule-Based Matching:

Compared with using regular expressions on raw text, rule-based
matcher engines and components allow you to access not only the words
and phrases you want but also the tokens and their relationships within
the document.

This enables easy access to and examination of the surrounding tokens, as
well as the merging of spans into single tokens and the addition of entries
to named entities.

 Token-based matching

Rules can refer to token annotations in this case. You can also pass a
custom callback to the rule matcher to act on matches, and you may attach
patterns to entity IDs to provide basic entity linking and disambiguation.
To match large terminology lists, you may utilize the PhraseMatcher,
which takes Doc objects as match patterns.
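A minimal sketch of token-based matching with Spacy's Matcher (the pattern is illustrative):

```python
# Token-based matching: a lemma "buy" followed by an optional determiner
# and a noun.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

pattern = [{"LEMMA": "buy"}, {"POS": "DET", "OP": "?"}, {"POS": "NOUN"}]
matcher.add("BUY_EVENT", [pattern])

doc = nlp("She bought a house, and he buys groceries.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # "bought a house", "buys groceries"
```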
 Phrase Matching

If you need to match large terminology lists, you may use the
PhraseMatcher, which takes Doc objects rather than token patterns as its
match patterns and is much more efficient in the long run. Doc patterns
may contain one or more tokens.
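A minimal PhraseMatcher sketch (the terminology list is illustrative):

```python
# Phrase matching against a terminology list with spaCy's PhraseMatcher.
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive

terms = ["machine learning", "natural language processing", "deep learning"]
matcher.add("TECH_TERMS", [nlp.make_doc(term) for term in terms])

doc = nlp("Natural language processing builds on machine learning.")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```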

Conclusion

These are only a few examples of natural language processing techniques.
The aforementioned NLP techniques can be used to extract meaningful
information from grammatical text. Information extraction is not a simple
NLP operation; to better comprehend the data's structure and what it has
to offer, we need to spend time with it. Spacy, for its part, is a library
developed for building applications that process enormous amounts of
text data.
TEMPLATE FILLING:

Template filling is an information extraction method in Natural Language
Processing (NLP) for extracting specific, structured information from
unstructured text data. It involves filling predefined slots in a template
with relevant data points found within a text. This technique is essential
for organizing and structuring information, especially when handling large
text datasets in areas like news summarization, medical reports, customer
service, and more.

The process begins with a pre-designed template that includes specific
fields or "slots" for information. For example, in a template for extracting
details from a job application, fields might include "Name,"
"Experience," "Skills," and "Education." NLP techniques like Named
Entity Recognition (NER) identify and extract data associated with these
fields. NLP models then use machine learning algorithms, or sometimes
rule-based systems, to find and fill in the relevant information.

Template filling faces challenges like handling ambiguous language,
variations in phrasing, and the need to accurately interpret context.
Recently, advances in NLP through transformer-based models have made
it considerably more robust to these problems.
Template Filling in Natural Language Processing (NLP)

Template filling is a widely used technique in Natural Language
Processing (NLP) to extract structured information from unstructured text.
By taking free-form text—like articles, reports, emails, or social media
posts—and identifying specific details to fit into a predefined format,
template filling enables automation in data extraction, summarization,
and organization. This approach is used in numerous domains, such as
news article summarization, medical data organization, legal document
analysis, and customer service responses.

How Template Filling Works

The process typically begins with a template designed to capture
particular fields, such as "Name," "Location," "Date," "Event," or
"Organization." NLP models and algorithms analyze the input text to
locate and extract relevant information, filling in these fields. For instance,
in an email about a meeting, a template might require the date, time, and
location of the meeting, as well as the list of attendees. The NLP system
would scan the text, identify and extract these details, and input them into
the appropriate fields within the template.
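A hedged sketch of that meeting example, using Spacy NER to fill the slots (the slot names and the mapping from entity labels to slots are illustrative assumptions):

```python
# Template filling: map NER output into predefined slots.
import spacy

nlp = spacy.load("en_core_web_sm")
text = ("The project review meeting with Sarah Connor is scheduled "
        "for Friday at 3 pm in Berlin.")

template = {"date": None, "time": None, "location": None, "attendees": []}

for ent in nlp(text).ents:
    if ent.label_ == "DATE":
        template["date"] = ent.text
    elif ent.label_ == "TIME":
        template["time"] = ent.text
    elif ent.label_ == "GPE":        # geopolitical entity, used as location
        template["location"] = ent.text
    elif ent.label_ == "PERSON":
        template["attendees"].append(ent.text)

print(template)
# e.g. {'date': 'Friday', 'time': '3 pm', 'location': 'Berlin',
#       'attendees': ['Sarah Connor']}
```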

Techniques in Template Filling

Several methods can be used for template filling, ranging from rule-based
systems to more advanced machine learning and deep learning
approaches:

1. Rule-Based Systems: Early template-filling systems relied on
predefined rules or regular expressions to detect keywords and
extract information. These systems work well in predictable
environments, but they lack the flexibility to handle varied
language structures.
2. Named Entity Recognition (NER): NER is often used to identify
and categorize key information, such as names, locations, dates,
and organizations, in a text. For example, in a news article, NER
would help label “Paris” as a location or “Apple Inc.” as an
organization. This data can then be placed into a structured
template.
3. Dependency Parsing and Semantic Role Labeling: These
techniques analyze sentence structure and understand the
relationships between different words or phrases. They are useful
for identifying subject-object relationships or actions, which helps
ensure the extracted information is accurate and contextually
relevant.
4. Transformer-Based Models: With the advent of deep learning
models like BERT, GPT, and T5, template filling has become more
sophisticated. These models use large-scale language
understanding to handle context, ambiguity, and varying phrasing.
They allow NLP systems to handle more complex and varied
language structures, improving template-filling accuracy in
real-world applications (see the sketch after this list).
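One hedged way to sketch the transformer-based approach is to phrase each template field as a question for an extractive question-answering pipeline (Hugging Face transformers; the pipeline downloads a default QA model, and the slot names are illustrative):

```python
# Transformer-based slot filling via extractive question answering.
from transformers import pipeline

qa = pipeline("question-answering")

text = ("The annual shareholder meeting will be held on June 12 "
        "at the Hilton in Chicago, chaired by Maria Lopez.")

slots = {
    "date": "When is the meeting?",
    "location": "Where is the meeting held?",
    "chair": "Who chairs the meeting?",
}

template = {slot: qa(question=question, context=text)["answer"]
            for slot, question in slots.items()}
print(template)
```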

Applications of Template Filling

Template filling is versatile and widely applicable across different fields:

 Medical and Clinical Documentation: Extracting patient details,
diagnoses, and treatment plans from clinical notes for easier access
and analysis.
 Customer Service and Chatbots: Filling in fields for inquiries
and responses, allowing bots to deliver relevant answers quickly
and accurately.
 Legal Document Analysis: Extracting clauses, dates, parties, and
obligations from lengthy contracts to enable easier searching and
referencing.
 Financial News Summarization: Pulling out stock information,
company names, and market movements from financial news
reports for financial analysts and trading algorithms.

Challenges in Template Filling

Despite its advantages, template filling faces a few challenges:

 Language Ambiguity: Variability in phrasing and sentence
structure can complicate data extraction, especially in unstructured
or informal texts.
 Contextual Understanding: Accurately filling templates requires
understanding nuances and contextual information, which simple
rule-based systems may struggle with.
 Domain-Specific Requirements: Different fields may have unique
terminologies and structures that require tailored NLP models to
achieve reliable results.
Recent Advances

The rise of transformer-based models has significantly improved template
filling, as these models can understand context and nuance at a deeper
level. These advancements allow for more accurate information
extraction, even in complex and variable language environments.
Additionally, transfer learning—training models on large datasets and
then fine-tuning them on domain-specific data—has enhanced the
adaptability of template-filling applications in specialized fields.

In summary, template filling in NLP has evolved from rule-based
methods to sophisticated AI-driven approaches, enabling more efficient
and accurate structuring of unstructured data across various industries. As
NLP continues to advance, template filling is expected to become even
more precise, adaptable, and widely applied, helping businesses and
individuals manage information more effectively.
Selectional Restrictions

Consider the two interpretations of: "I want to eat someplace nearby."

a) Sensible: eat is intransitive and "someplace nearby" is a location adjunct.
b) Speaker is Godzilla: eat is transitive and "someplace nearby" is a direct object.

How do we know the speaker didn't mean b)? Because the THEME of
eating tends to be something edible.
Selectional restrictions are associated with senses

• The restaurant serves green-lipped mussels.
  THEME is some kind of food.
• Which airlines serve Denver?
  THEME is an appropriate location.
Selectional restrictions vary in specificity

• I often ask the musicians to imagine a tennis game.
• To diagonalize a matrix is to find its eigenvalues.
• Radon is an odorless gas that can't be detected by human senses.
Representing selectional restrictions using thematic roles

The event representation consists of a single variable that stands for the
event, a predicate denoting the kind of event, and variables and relations
for the event roles. Ignoring the lambda structures and using thematic
roles rather than deep event roles, the contribution of a verb like eat
might look like the following:

  ∃e, x, y  Eating(e) ∧ Agent(e, x) ∧ Theme(e, y)

With this representation, all we know about y, the filler of the THEME
role, is that it is associated with an Eating event through the Theme
relation. To stipulate the selectional restriction that y must be something
edible, we simply add a new term to that effect:

  ∃e, x, y  Eating(e) ∧ Agent(e, x) ∧ Theme(e, y) ∧ EdibleThing(y)

When a phrase like "ate a hamburger" is encountered, a semantic analyzer
can form the following kind of representation:

  ∃e, x, y  Eating(e) ∧ Eater(e, x) ∧ Theme(e, y) ∧ EdibleThing(y) ∧ Hamburger(y)

This representation is perfectly reasonable, since the membership of y in
the category Hamburger is consistent with its membership in the category
EdibleThing. But it assumes we have a large knowledge base of facts
about edible things and hamburgers and what not.

Figure 22.6 (evidence from WordNet that hamburgers are edible) shows
the hypernym chain:

  dish => {nutriment, nourishment, nutrition, ...} => {food, nutrient}
       => substance => matter => physical entity => entity
Let's use WordNet synsets to specify selectional restrictions

• The THEME of eat must be the WordNet synset {food, nutrient}:
  "any substance that can be metabolized by an animal to give energy and build tissue."
• Similarly:
  THEME of imagine: synset {entity}
  THEME of lift: synset {physical entity}
  THEME of diagonalize: synset {matrix}
• This allows "imagine a hamburger" and "lift a hamburger", and
  correctly rules out "diagonalize a hamburger".
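A hedged sketch of such a check with NLTK's WordNet interface (assumes nltk.download("wordnet"); food.n.01 is the {food, nutrient} synset with the gloss quoted above):

```python
# Checking a selectional restriction against WordNet hypernyms.
from nltk.corpus import wordnet as wn

FOOD = wn.synset("food.n.01")  # {food, nutrient}

def satisfies(word, required_synset):
    """True if some noun sense of `word` has `required_synset` as a hypernym."""
    for syn in wn.synsets(word, pos="n"):
        ancestors = {h for path in syn.hypernym_paths() for h in path}
        if required_synset in ancestors:
            return True
    return False

print(satisfies("hamburger", FOOD))  # True: "eat a hamburger" is fine
print(satisfies("matrix", FOOD))     # False: rules out "eat a matrix"
```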
Selectional Preferences

• In early implementations, selectional restrictions were strict
  constraints (Katz and Fodor 1963): eat [+FOOD].
• But it was quickly realized that selectional constraints are really
  preferences (Wilks 1975), as real text shows:
  "But it fell apart in 1931, perhaps because people realized you can't
  eat gold for lunch if you're hungry."
  "In his two championship trials, Mr. Kulkarni ate glass on an empty
  stomach, accompanied only by water and tea."
Selectional Association (Resnik 1993)

• Selectional preference strength: the amount of information that a
  predicate tells us about the semantic class of its arguments.
  eat tells us a lot about the semantic class of its direct objects;
  be doesn't tell us much.
• The selectional preference strength is the difference in information
  between two distributions:
  P(c): the distribution of expected semantic classes for any direct object
  P(c|v): the distribution of expected semantic classes for this verb
• The greater the difference, the more the verb is constraining its object.
Selectional preference strength

P(c|v) is the probability that the direct object of the specific verb v will
fall into semantic class c. The greater the difference between the
distributions P(c) and P(c|v), the more information the verb is giving us
about possible objects. The difference between these two distributions
can be quantified by relative entropy, or the Kullback-Leibler (KL)
divergence (Kullback and Leibler, 1951). The KL divergence D(P||Q)
expresses the difference between two probability distributions P and Q:

  D(P||Q) = Σ_x P(x) log ( P(x) / Q(x) )        (22.38)
The selectional preference strength S_R(v) uses the KL divergence to
express how much information, in bits, the verb v expresses about the
possible semantic class of its argument:

  S_R(v) = D( P(c|v) || P(c) )
         = Σ_c P(c|v) log ( P(c|v) / P(c) )        (22.39)

Selectional association of a verb with a class: Resnik then defines the
selectional association of a particular class c and verb v as the relative
contribution of that class to the general selectional preference of the verb:

  A_R(v, c) = (1 / S_R(v)) · P(c|v) log ( P(c|v) / P(c) )        (22.40)

Computing Selectional Association

The selectional association is thus a probabilistic measure of the strength
of association between a predicate and a semantic class of its argument.
Resnik estimates the probabilities for these associations by parsing a
corpus, counting all the times each predicate occurs with each argument
word, and assuming that each word is a partial observation of all the
WordNet concepts containing the word. The following table from Resnik
(1996) shows some sample high and low selectional associations for
verbs and some WordNet semantic classes of their direct objects:

  Verb    Direct Object Class   Assoc     Direct Object Class   Assoc
  read    WRITING                6.80     ACTIVITY              -0.20
  write   WRITING                7.26     COMMERCE               0.00
  see     ENTITY                 5.79     METHOD                -0.01
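A toy sketch of these two quantities, computed from made-up class distributions rather than real corpus counts:

```python
# Selectional preference strength S_R(v) and selectional association
# A_R(v, c), following equations (22.39) and (22.40).
import math

# P(c): prior over semantic classes of direct objects (toy numbers)
prior = {"FOOD": 0.25, "WRITING": 0.25, "ENTITY": 0.25, "ACTIVITY": 0.25}

# P(c|v): class distribution for the direct objects of "eat" (toy numbers)
posterior = {"FOOD": 0.85, "WRITING": 0.01, "ENTITY": 0.13, "ACTIVITY": 0.01}

# S_R(v) = D(P(c|v) || P(c)), in bits
S = sum(p * math.log2(p / prior[c]) for c, p in posterior.items())
print(f"S_R(eat) = {S:.3f} bits")

# A_R(v, c) = P(c|v) * log2(P(c|v) / P(c)) / S_R(v)
for c, p in posterior.items():
    print(f"A_R(eat, {c}) = {p * math.log2(p / prior[c]) / S:.3f}")
```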
Results from similar models (Ó Séaghdha and Korhonen 2012)

Most probable cuts learned by WN-CUT for the object argument of
selected verbs (their Table 2):

  eat      food#n#1, aliment#n#1, entity#n#1, solid#n#1, food#n#2
  drink    fluid#n#1, liquid#n#1, entity#n#1, alcohol#n#1, beverage#n#1
  appoint  individual#n#1, entity#n#1, chief#n#1, being#n#2, expert#n#1
  publish  abstract entity#n#1, piece of writing#n#1, communication#n#2, publication#n#1

(The accompanying evaluation, covering seen and unseen verb-object,
noun-noun, and adjective-noun combinations, is not reproduced here.)
Instead of using classes, a simpler model of selectional association

An alternative to using the selectional association between a verb and the
WordNet class of its arguments is to simply use the conditional probability
of an argument word given a predicate verb. This simple model of
selectional preferences directly models the strength of association of one
verb (predicate) with one noun (argument), as opposed to a whole semantic
class in WordNet.

The conditional probability model can be computed by parsing a very
large corpus (billions of words) and computing co-occurrence counts:
how often a given verb occurs with a given noun in a given relation,
count(n, v, r) (or its log). The conditional probability of an argument noun
given a verb for a particular relation, P(n|v, r), can then be used as a
selectional preference metric for that pair of words (Brockmann and
Lapata, 2003):

  P(n|v, r) = C(n, v, r) / C(v, r)   if C(n, v, r) > 0
            = 0                      otherwise

The inverse probability P(v|n, r) was found to have better performance in
some cases (Brockmann and Lapata, 2003):
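A toy sketch of this word-level model, with made-up counts standing in for a parsed corpus:

```python
# Word-level selectional preference: P(n | v, r) from co-occurrence counts.
from collections import Counter

# C(n, v, r): how often noun n is the direct object ("dobj") of verb v
counts = Counter({
    ("hamburger", "eat", "dobj"): 120,
    ("salad", "eat", "dobj"): 80,
    ("gold", "eat", "dobj"): 1,
})

def p_n_given_vr(n, v, r):
    c_nvr = counts[(n, v, r)]
    if c_nvr == 0:
        return 0.0
    c_vr = sum(c for (_, v2, r2), c in counts.items() if (v2, r2) == (v, r))
    return c_nvr / c_vr

print(p_n_given_vr("hamburger", "eat", "dobj"))  # 120/201 ≈ 0.597
print(p_n_given_vr("matrix", "eat", "dobj"))     # 0.0
```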
Evaluation from Bergsma, Lin, and Goebel

Plausible/implausible direct objects with the scores assigned by each
model; an asterisk marks cases where the implausible object scores at
least as high (the MI column is cut off in the source):

  Verb        Plaus./Implaus.      Resnik       Dagan et al.  Erk          MI
  see         friend/method        5.79/-0.01   0.20/1.40*    0.46/-0.07   1.11/
  read        article/fashion      6.80/-0.20   3.00/0.11     3.80/1.90    4.00/
  find        label/fever          1.10/0.22    1.50/2.20*    0.59/0.01    0.42/
  hear        story/issue          1.89/1.89*   0.66/1.50*    2.00/2.60*   2.99/
  write       letter/market        7.26/0.00    2.50/-0.43    3.60/-0.24   5.06/
  urge        daughter/contrast    1.14/1.86*   0.14/1.60*    1.10/3.60*   -0.95
  warn        driver/engine        4.73/3.61    1.20/0.05     2.30/0.62    2.87/
  judge       contest/climate      1.30/0.28    1.50/1.90*    1.70/1.70*   3.90/
  teach       language/distance    1.87/1.86    2.50/1.30     3.60/2.70    3.53/
  show        sample/travel        1.44/0.41    1.60/0.14     0.40/-0.82   0.53/
  expect      visit/mouth          0.59/5.93*   1.40/1.50*    1.40/0.37    1.05/
  answer      request/tragedy      4.49/3.88    2.70/1.50     3.10/-0.64   2.93/
  recognize   author/pocket        0.50/0.50*   0.03/0.37*    0.77/1.30*   0.48/
  repeat      comment/journal      1.23/1.23*   2.30/1.40     2.90/—       2.59/
  understand  concept/session      1.52/1.51    2.70/0.25     2.00/-0.28   3.96/
  remember    reply/smoke          1.31/0.20    2.10/1.20     0.54/2.60*   1.13/
Summary: Selectional Restrictions

• Two classes of models of the semantic type constraint that a
  predicate places on its argument:
  Represent the constraint between the predicate and a WordNet class.
  Represent the constraint between the predicate and a word.
• One fun recent use case: detecting metonymy (type coercion),
  as in Pustejovsky et al. (2010).
  Coherent with selectional restrictions:
    The spokesman denied the statement (PROPOSITION).
    The child threw the stone (PHYSICAL OBJECT).
  Coercion:
    The president denied the attack (EVENT → PROPOSITION).
    The White House (LOCATION → HUMAN) denied the statement.
