MCA
(TWO YEARS PATTERN)
SEMESTER - II (CBCS)
NATURAL LANGUAGE
PROCESSING
SUBJECT CODE: MCAE251
© UNIVERSITY OF MUMBAI
Published by : Director,
Institute of Distance and Open Learning,
University of Mumbai,
Vidyanagari, Mumbai - 400 098.
1
INTRODUCTION
1.0 OBJECTIVES
Natural Language Processing starts with some input and ends with effective and accurate output. The possible inputs to an NLP system are quite broad. Language can come in a variety of forms, such as paragraphs of text, commands typed directly to a computer system, and so on. The input language might be given to the system one sentence at a time, or it might be multiple sentences all at once. The input to a natural language processor can be typed text, message text, or speech. NLP systems have some kind of pre-processor. Data pre-processing involves preparing and cleaning text data so that machines are able to analyze it. Pre-processing puts data in workable form and highlights features in the text that an algorithm can work with. Pre-processing performs dictionary lookup, morphological analysis, lexical substitution, and part-of-speech assignment.
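As a rough illustration, the sketch below shows what such a pre-processing stage might look like using the NLTK library; the choice of NLTK and the sample text are assumptions made only for this example.

```python
# Minimal pre-processing sketch using NLTK (assumed choice of library).
# It cleans raw text and splits it into sentences and word tokens,
# i.e. it puts the data into a workable form as described above.
import nltk

nltk.download("punkt", quiet=True)

raw = "NLP systems accept typed input, message text or speech. Pre-processing cleans it!"

sentences = nltk.sent_tokenize(raw)                  # sentence segmentation
tokens = [nltk.word_tokenize(s) for s in sentences]  # word tokenization
normalized = [[w.lower() for w in sent if w.isalpha()] for sent in tokens]  # simple normalization

print(sentences)
print(normalized)
```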
A variety of outputs can be generated by the system. The output from a system that incorporates an NLP component might be an answer from a database, a command to change some data in a database, a spoken response, semantics, part of speech, the morphology of a word, the semantics of a word, or some other action on the part of the system. Remember that these are the outputs of the system as a whole, not the outputs of the NLP component of the system.
Figure 1.2.1 shows a generic NLP system and its input-output variety.
Figure 1.2.2 shows a typical view of what might be inside the NLP box of
Figure 1.2.1. Each of the boxes in Figure 1.2.2 represents one of the types
of processing that make up an NLP analysis.
1.3 LEVELS OF NLP
NLU vs. NLG
• NLU explains the meaning behind written text or speech in natural language; NLG generates natural language using machines.
• NLU draws facts from natural language using tools and technologies such as parsers, POS taggers, etc.; NLG uses the insights generated from parsers, POS tags, etc. to generate natural language.
The NLP can broadly be divided into various levels as shown in Figure
1.3.1.
Consider the sentence "Every man loves a woman." The meaning can be that for every man there is some woman, or it can be that there is one particular woman who is loved by every man.
• Attachment Ambiguity
A sentence has attachment ambiguity if a constituent fits more than
one position in a parse tree. Attachment ambiguity arises from
uncertainty of attaching a phrase or clause to a part of a sentence.
Consider the example:
The man saw the girl with the telescope.
It is ambiguous whether the man saw the girl through his telescope or saw a girl who was carrying a telescope. The meaning depends on whether the prepositional phrase 'with the telescope' attaches to the verb 'saw' or to the noun 'girl'.
Consider the example:
Buy books for children.
The prepositional phrase 'for children' can be either adverbial, attaching to the verb 'buy', or adjectival, attaching to the object noun 'books'.
3. Semantic Ambiguity: This type of ambiguity is typically related to the interpretation of the sentence. Even after the syntax and the meanings of the individual words have been resolved, there may be two ways of reading the sentence.
Consider the example: "Seema loves her mother and Sriya does too."
The interpretations can be that Sriya loves Seema's mother, or that Sriya loves her own mother.
Semantic ambiguities arise from the fact that, in general, a computer is not in a position to distinguish what is logical from what is not.
Consider the example: "The car hit the pole while it was moving."
The interpretations can be that the car was moving or that the pole was moving.
4. Discourse Ambiguity: Discourse-level processing needs shared world or shared knowledge, and interpretation is carried out using this context. Anaphoric ambiguity falls under the discourse level.
• Rule-Based Approaches
• Markov Model Approaches
• Maximum Entropy Approaches
• HMM-Based Taggers
3. Machine Learning Approaches
1) Lexical Analysis:
It is the first stage in NLP. It is also known as morphological
analysis. It consists of identifying and analyzing the structure of
words. Lexicon of a language means the collection of phrases and
words in a language. Lexical analysis divides the whole chunk of text into words, sentences, and paragraphs.
2) Syntactic Analysis:
Syntactic analysis consists of analysis of words in the sentence for
grammar and ordering words in a way that shows the relationship
among the words. For example, a sentence such as "The school goes to boy" is rejected by an English syntactic analyzer.
3) Semantic Analysis:
Semantic analysis assigns meanings to the structures created by the syntactic analyzer. This component maps linear sequences of words into structures and shows how the words are associated with each other. Semantics focuses only on the literal meaning of words, phrases, and sentences; it draws only the dictionary meaning, or the real meaning, from the given text. The structures assigned by the syntactic analyzer do not always have a meaning.
The text is checked for meaningfulness. This is done by mapping the syntactic structure onto objects in the task domain. E.g., "colorless green idea" would be rejected by the semantic analysis because "colorless green" does not make sense.
4) Discourse Integration:
The meaning of any sentence depends upon the meaning of the sentence just before it. Furthermore, it may also influence the meaning of the sentence that immediately follows it. For example, in "He wanted that", the word "that" depends upon the prior discourse context.
5) Pragmatic Analysis:
Pragmatic analysis is concerned with the overall communicative and social context and its effect on interpretation. It means abstracting or deriving the meaningful use of language in situations. In this analysis, what was said is re-interpreted to determine what was truly meant. It involves deriving those aspects of language which require real-world knowledge.
E.g., "Close the window?" should be interpreted as a request instead of an order.
• Pull apart the word “Bill’s” into proper noun “Bill” and the
possessive suffix “’s”.
• Recognize the sequence “.init” as a file extension that is functioning
as an adjective in the sentence.
This process will usually assign syntactic categories to all the words in the
sentence. Consider the word “prints”. This word is either a plural noun or
a third person singular verb (he prints).
Syntactic analysis:
This method examines the structure of a sentence and performs a detailed analysis of the sentence and the semantics of the statement. In order to perform this, the system is expected to have thorough knowledge of the grammar of the language. The basic unit of any language is the sentence, made up of a group of words having their own meanings and linked together to present an idea or thought. Apart from having meanings, words fall under categories called parts of speech. In English, there are eight different parts of speech: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections.
In English language, a sentence S is made up of a noun phrase (NP) and a
verb phrase (VP), i.e.
S=NP+VP
The given noun phrase (NP) normally can have an article or delimiter (D)
or an adjective (ADJ) and the noun (N), i.e.
NP=D+ADJ+N
Also, a noun phrase may contain a prepositional phrase (PP), which has a preposition (P), a delimiter (D) and a noun (N), i.e.
PP=P+D+N
The verb phrase (VP) has a verb (V) and the object of the verb. The object of the verb may be a noun (N) and its determiner (D), i.e.
VP=V+D+N
These are some of the rules of the English grammar that helps one to
construct a small parser for NLP.
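To make this concrete, a small illustration of how such rules can drive a parser is sketched below using NLTK's CFG and chart parser; the toy grammar and the example sentence are assumptions chosen only to mirror the rules above.

```python
# A toy parser built from rules similar to S = NP + VP above, using NLTK
# (assumed library choice). The grammar and the sentence are illustrative only.
import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> D ADJ N | D N
VP  -> V NP
D   -> 'the' | 'a'
ADJ -> 'small'
N   -> 'boy' | 'ball'
V   -> 'kicked'
""")

parser = nltk.ChartParser(grammar)
sentence = "the small boy kicked the ball".split()

for tree in parser.parse(sentence):
    print(tree)   # prints the parse tree built from the S -> NP VP rule
```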
Discourse Integration:
The meaning of an individual sentence may depend on the sentences that precede it, and may also influence the meanings of the sentences that follow it.
Specifically we do not know whom the pronoun “I” or the proper noun
“Bill” refers to. To pin down these references requires an appeal to a
model of the current discourse context, from which we can learn that the
current user is USER068 and that the only person named “Bill” about
whom we could be talking is USER073. Once the correct referent for Bill
is known, we can also determine exactly which file is being referred to.
Pragmatic Analysis
The structure representing what was said is reinterpreted to determine what was actually meant. The final step toward effective understanding is to decide what to do as a result. One possible thing to do is to record what was said as a fact and be done with it. For some sentences, whose intended effect is clearly declarative, that is precisely the correct thing to do. But for other sentences, including this one, the intended effect is different. We can discover this intended effect by applying a set of rules that characterize cooperative dialogues. The final step in pragmatic processing is to translate from the knowledge-based representation to a command to be executed by the system.
Results of each of the main processes combine to form a natural language
system.
The results of the understanding process are lpr /ali/stuff.init. All of the
processes are important in a complete natural language understanding
system. Not all programs are written with exactly these components.
Sometimes two or more of them are collapsed into one. Doing that usually results in a system that is easier to build for restricted subsets of English but harder to extend to wider coverage.
• Machine Translation
Machine translation is used to convert text or speech from one natural language to another. Machine translation is an integral part of Natural Language Processing, where translation is done from a source language to a target language while preserving the meaning of the sentence. Example: Google Translate.
• Information Retrieval
It refers to the human-computer interaction (HCI) that happens when
we use a machine to search a body of information for information
objects (content) that match our search query. A person's query is matched against a set of documents to find a subset of 'relevant' documents. Examples: Google, Yahoo, AltaVista, etc.
• Text Categorization
Text categorization (also known as text classification or topic
spotting) is the task of automatically sorting a set of documents into
categories (clusters).
Uses of Text Categorization
• Filtering of content
• Spam filtering
• Identification of document content
• Survey coding
• Information Extraction
Identify specific pieces of information in unstructured or semi-structured textual documents. Transform unstructured information in a corpus of documents or web pages into a structured database.
Applied to different types of text:
• Newspaper articles
• Web pages
• Scientific articles
• Newsgroup messages
• Classified ads
• Medical notes
• Grammar Correction-
In word processor software like MS-word, NLP techniques are widely
used for spelling correction & grammar check.
• Sentiment Analysis-
Sentiment Analysis is also known as opinion mining. It is mainly used on the web to analyse the behaviour, attitude, and emotional state of the sender. This application is implemented through a combination of NLP and statistics by assigning values to the text (neutral, positive, or negative) and identifying the mood of the context (sad, happy, angry, etc.).
• Question-Answering systems-
Question Answering focuses on constructing systems that
automatically answer the questions asked by humans in a natural
language. It presents only the requested information instead of
searching full documents like search engine. The basic idea behind
the QA system is that the users just have to ask the question and the
system will retrieve the most appropriate and correct answer for that
question.
E.g.
Q. "What is the birth place of Shree Krishna?"
A. Mathura
• Spam Detection
Spam detection is used to detect unwanted e-mails before they get to a user's inbox.
• Chatbot
A chatbot is one of the most important applications of NLP. It is used by many companies to provide chat services to their customers.
• Speech Recognition-
Speech recognition is used for converting spoken words into text. It is used in applications such as mobile phones, home automation, video retrieval, dictation in Microsoft Word, voice biometrics, voice user interfaces, and so on.
• Text summarization
This task aims to create short summaries of longer documents while
retaining the core content and preserving the overall meaning of the
text.
1.9 SUMMARY
1.11 TRUE OR FALSE
LIST OF REFERENCES
1. https://www.guru99.com/nlp-tutorial.html
2. https://www.datascienceprophet.com/different-levels-of-nlp/
3. https://www.slideshare.net/HareemNaz/natural-languageprocessing
4. “Natural Language Processing”, Staredu Solutions.
5. Dan Jurafsky and James Martin. “Speech and Language Processing:
An Introduction to Natural Language Processing, Computational
Linguistics and Speech Recognition”, Prentice Hall, Second
Edition, 2009.
6. http://www.deepsky.com/~merovech/voynich/voynich_manchu_reference_materials/PDFs/jurafsky_martin.pdf
7. Christopher D. Manning and Hinrich Schutze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999.
8. https://www.cs.vassar.edu/~cs366/docs/Manning_Schuetze_StatisticalNLP.pdf
9. https://slideplayer.com/slide/4211188/
2
WORD LEVEL ANALYSIS - I
Unit Structure
2.0 Objectives
2.0.1 Morphology Analysis
2.0.2 Survey of English Morphology
2.0.3 Inflectional Morphology
2.0.4 Derivational Morphology
2.0.5 Difference of inflectional and derivational morphology
2.0.6 Stemming and Lemmatization
2.1 Summary
2.2 Multiple Choice Question Answers
2.3 List of References
2.1 OBJECTIVES
Figure 2.1.2 shows the three morphemes of the word "unhappiness": un-, happy, and -ness.
There are three morphemes, each carrying a certain amount of
meaning. un means "not", while ness means "being in a state or
condition". Happy is a free morpheme because it can appear on its own (as
a "word" in its own right). Bound morphemes have to be attached to a free
morpheme, and so cannot be words in their own right. Thus you can't have
sentences in English such as "Jason feels very un ness today".
We can usefully divide morphemes into two broad classes: stems and affixes.
• The stem is the “main” morpheme of the word, supplying the main
meaning
• The affixes add “additional” meanings of various kinds.
Lemmatization
Lemmatization usually refers to doing things properly with the use of a
vocabulary and morphological analysis of words, normally aiming to
remove inflectional endings only and to return the base or dictionary form
of a word, which is known as the lemma.
Example: Lemmatization will map gone, going, and went -> go.
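As a rough sketch of the difference, the NLTK library (an assumed choice) provides both operations; note that the lemmatizer below is told the part of speech so that it can map "went" to "go".

```python
# Stemming vs. lemmatization sketch with NLTK (assumed library choice).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
words = ["studies", "going", "gone", "went"]

print([stemmer.stem(w) for w in words])                   # crude suffix stripping, e.g. studies -> studi
print([lemmatizer.lemmatize(w, pos="v") for w in words])  # dictionary form, e.g. went -> go
```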
2.1 SUMMARY
iii. _________ is the study of the internal structure of words, and the
rules by which words are formed
a) Syntax
b) Semantics
c) Pragmatics
d) Morphology
Ans : d
iv. Lemma for the word "studies" is_______________
a) studies
b) ies
c) studi
d) study
Ans : d
v. Stemming for the word "studies" is_______________
a) studies
b) ies
c) studi
d) study
Ans : c
TRUE OR FALSE
i. Derivational morphemes form new lexemes conveying new
meanings
True
ii. Lemmatization involves resolving words to their dictionary form.
True
iii. Inflectional morphemes change the grammatical category (part of
speech) of a word.
False
SAMPLE QUESTIONS
1. Define morpheme, stem and affixes.
2. Explain stemming and lemmatization with example
3. Explain with examples Inflectional morphology & Derivational
morphology
4. State differences between inflectional and derivational morphemes
2.3 LIST OF REFERENCES
3
WORD LEVEL ANALYSIS - II
Unit Structure
3.0 Objectives
3.1 Regular expression
3.2 Finite Automata
3.3 Finite State Morphological Parsing
3.4 Building a Finite State Lexicon
3.5 Finite State Transducers (FST)
3.6 Morphological Parsing with FST
3.7 Lexicon free FST Porter Stemmer
3.8 N –Grams
3.9 N-Gram Language Model
3.10 Summary
3.11 Multiple Choice Question Answers
3.12 True or False
3.13 Sample Questions
3.14 List of References
3.0 OBJECTIVES
3. If X and Y are regular expressions, then the following are also regular expressions:
• X . Y (concatenation of X and Y)
• X + Y (union of X and Y)
• X*, Y* (Kleene closure of X and Y)
4. If a string is derived from the above rules then that would also be a
regular expression.
How can Regular Expressions be used in NLP?
In NLP, we can use Regular expressions at many places such as,
1. To validate data fields. For example, URLs, dates, email address,
abbreviations, etc.
2. To filter a particular text from the whole corpus. For
example, disallowed websites, spam etc.
3. To identify particular strings in a text. For example, token
boundaries
4. To convert the output of one processing component into the format
required for a second component
Basic Regular Expression Patterns
Brackets ([ ]):
They are used to specify a disjunction of characters.
35
Natural Language Processing
For Examples,
/[A-Z]/ → matches an uppercase letter
/[a-z]/ → matches a lowercase letter
/[0-9]/ → matches a single digit
For Examples,
/[cC]hirag/ → Chirag or chirag
/[xyz]/ → ‘x’, ‘y’, or ‘z’
/[1234567890]/ → any digit
Here slashes represent the start and end of a particular expression.
Dash (-):
They are used to specify a range.
Caret (^):
It can be used for negation or just to mean ^.
For Examples,
/[^a-z]/ → not a lowercase letter
/[^Cc]/ → neither 'C' nor 'c'
/[^.]/ → not a period
/[c^]/ → either 'c' or '^'
/x^y/ → the pattern 'x^y'
Period (.):
Used to match any single character.
/beg.n/ → matches any single character between 'beg' and 'n', as in 'begin' or 'begun'.
Anchors :
These are special characters that help us to perform string operations
either at the beginning or at the end of text input. They are used to assert
something about the string or the matching process. The most common
anchors are the caret ^ and the dollar sign $.
i) Caret character ‘^’
It specifies the start of the string. For a string to match the pattern, the character that follows the '^' in the pattern should be the first character of the string.
For Examples,
^The: Matches a string that starts with ‘The’
ii) Dollar character ‘$’
It specifies the end of the string. For a string to match the pattern,
the character that precedes the ‘$’ in the pattern should be the last
character in the string.
For Examples,
end$: Matches a string that ends with ‘end’
^The end$: Exact string match (the string is exactly 'The end')
roar: Matches a string that has the text roar in it.
Quantifiers:
They allow us to mention and control over how many times a specific
character(s) pattern should occur in the given text.
The most common Quantifiers are: *, +, ? and { }
For Examples,
abc*: matches a string that has 'ab' followed by zero or more 'c'.
abc+: matches 'ab' followed by one or more 'c'
abc?: matches 'ab' followed by zero or one 'c'
abc{2}: matches 'ab' followed by 2 'c'
abc{2,}: matches 'ab' followed by 2 or more 'c'
abc{2,5}: matches 'ab' followed by 2 up to 5 'c'
a(bc)*: matches 'a' followed by zero or more copies of the sequence 'bc'
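The short sketch below shows how a few of these patterns behave in Python's re module; the sample strings are invented purely for illustration.

```python
# Illustration of the character-class, anchor and quantifier patterns above,
# using Python's re module. Sample strings are made up for this example.
import re

print(re.findall(r"[cC]hirag", "chirag met Chirag"))  # disjunction: ['chirag', 'Chirag']
print(re.findall(r"[0-9]+", "roll no 42, seat 7"))    # range + quantifier: ['42', '7']
print(re.findall(r"[^a-z]", "abcX!"))                 # negation: ['X', '!']
print(bool(re.search(r"^The", "The end")))            # caret anchor: True
print(bool(re.search(r"end$", "The end")))            # dollar anchor: True
print(re.findall(r"beg.n", "begin begun began"))      # period wildcard: ['begin', 'begun', 'began']
print(re.findall(r"abc{2,5}", "abccc"))               # {2,5} quantifier: ['abccc']
```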
The term automata is derived from the Greek word "αὐτόματα", meaning self-acting. An automaton may be defined as an abstract self-propelled computing device that follows a predetermined sequence of operations automatically. An automaton having a finite number of states is called a Finite Automaton (FA) or Finite State Automaton (FSA).
The finite-state automaton is not only the mathematical device used to
implement regular expressions, but also one of the most significant tools
of computational linguistics. Variations of automata such as finite- state
transducers, Hidden Markov Models, and N-gram grammars are important
components of the speech recognition and synthesis, spell-checking, and
information-extraction applications.
Mathematically, an automaton can be represented by a 5-tuple (Q, Σ, δ, q0, F), where −
• Q is a finite set of states,
• Σ is a finite set of input symbols (the alphabet),
• δ is the transition function, δ : Q × Σ → Q,
• q0 ∈ Q is the initial state,
• F ⊆ Q is the set of final (accepting) states.
Example transition table of a deterministic finite automaton (DFA):

State | on 0 | on 1
q0    | q0   | q1
q1    | q0   | q2
q2    | q0   | q0
q3    | q2   | q1

Example transition table of a non-deterministic finite automaton (NDFA), where a state may have several successors for the same input:

State | on 0 | on 1
a     | a, b | b
b     | c    | a, c
c     | b, c | c
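A transition table like the first one can be simulated directly in code. The sketch below (Python) walks an input string through the table; the start state q0 and accepting state q2 are assumptions chosen for illustration, since the original text does not specify them.

```python
# Simulating a finite automaton from a transition table (sketch).
# Start state q0 and accepting state q2 are illustrative assumptions.
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q0", ("q2", "1"): "q0",
    ("q3", "0"): "q2", ("q3", "1"): "q1",
}

def accepts(string, start="q0", finals=("q2",)):
    state = start
    for symbol in string:
        state = delta[(state, symbol)]   # deterministic: exactly one next state
    return state in finals

print(accepts("011"))   # q0 -0-> q0 -1-> q1 -1-> q2  -> True
print(accepts("010"))   # ends in q0 -> False
```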
To build a morphological parser, we need at least the following databases:
1. Lexicon: A list of stems and affixes, plus additional information about them, such as +N or +V.
2. Morphotactic rules: Rules about the ordering of morphemes in a word, e.g. -ed follows a verb (as in worked, studied), and un- precedes a verb (as in unlock, untie).
3. Orthographic (spelling) rules: Rules for combining morphemes, e.g. city + -s gives cities and not citys.
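To make the three databases concrete, here is a very small Python sketch; the word list and the rules are invented for illustration and cover only a handful of noun plurals.

```python
# Tiny morphological parser sketch: a lexicon, a morphotactic rule
# (noun + -s = plural) and simple orthographic rules. The word list is
# an illustrative assumption, not a real lexicon.
LEXICON = {"cat": "+N", "fox": "+N", "city": "+N", "walk": "+V"}

def parse(word):
    if word in LEXICON:                                        # bare stem
        return word + LEXICON[word] + " +Sg"
    if word.endswith("ies") and word[:-3] + "y" in LEXICON:    # orthographic rule: cities -> city
        stem = word[:-3] + "y"
        return stem + LEXICON[stem] + " +Pl"
    if word.endswith("es") and word[:-2] in LEXICON:           # e-insertion: foxes -> fox
        stem = word[:-2]
        return stem + LEXICON[stem] + " +Pl"
    if word.endswith("s") and word[:-1] in LEXICON:            # plain -s plural
        stem = word[:-1]
        return stem + LEXICON[stem] + " +Pl"
    return word + " -> cannot parse"

for w in ["cats", "foxes", "cities", "foxs"]:
    print(w, "->", parse(w))
# Note: like the simple FSA discussed below, this naive rule set also
# accepts the misspelled input "foxs".
```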
A similar model for English verbal inflection might look like Fig. 3.4.2
Figure 3.4.5 Expanded FSA for a few English nouns with their
inflection
An expanded FSA for a few English nouns with their inflection is shown in Fig. 3.4.5. Note that this automaton will incorrectly accept the input foxs.
Fig. 3.5.2, for example, shows the composition of [a:b]+ with [b:c]+ to
produce [a:c]+.
Figure 3.5.2 The composition of [a: b]+ with [b:c]+ to produce [a:c]+.
The projection of an FST is the FSA that is produced by extracting only
one side of the relation. We can refer to the projection to the left or upper
side of the relation as the upper or first projection and the projection to the
lower or right side of the relation as the lower or second projection
Now we see the task of morphological parsing. If we give the input cats,
we’d like to output cat +N +Pl, telling us that cat is a plural noun
In the finite-state morphology paradigm, we represent a word as a
correspondence between a lexical level (which represents a concatenation
of morphemes making up a word) and the surface level (which represents
the concatenation of letters which make up the actual spelling of the
word). Fig. 3.6.1 shows these two levels for (English) cats.
Lexicon fragment for English nominal inflection:

reg-noun | irreg-pl-noun          | irreg-sg-noun
fox      | g o:e o:e s e (geese)  | goose
cat      | sheep                  | sheep
dog      | m o:i u:ε s:c e (mice) | mouse
3.8 N-GRAMS
Consider three N-grams: "San Francisco", "The Three Musketeers", and "She stood up slowly". Now which of these three N-grams have you seen quite frequently?
Probably, “San Francisco” and “The Three Musketeers”. On the other
hand, you might not have seen “She stood up slowly” that frequently.
Basically, “She stood up slowly” is an example of an N-gram that does not
occur as often in sentences as Examples 1 and 2.
Now if we assign a probability to the occurrence of an N-gram or the
probability of a word occurring next in a sequence of words, it can be very
useful. Why?
First of all, it can help in deciding which N-grams can be chunked together
to form single entities (like “San Francisco” chunked together as one
word, “high school” being chunked as one word).
It can also help make next word predictions. Say you have the partial
sentence “Please hand over your”. Then it is more likely that the next
word is going to be “test” or “assignment” or “paper” than the next word
being “school”.
It can also help to make spelling error corrections. For instance, the
sentence “drink cofee” could be corrected to “drink coffee” if you knew
that the word “coffee” had a high probability of occurrence after the word
“drink” and also the overlap of letters between “cofee” and “coffee” is
high.
Predictive text input systems can guess what you are typing and give
choices on how to complete it.
Here, N-gram models estimate the probability of each word given the prior context.
• Unigram: P(phone)
• Bigram: P(phone | cell)
• Trigram: P(phone | your cell)
The Markov assumption is the presumption that the future behaviour of a
dynamical system only depends on its recent history. In particular, in a
kth-order Markov model, the next state only depends on the k most recent
states, therefore an N-gram model is a (N-1)-order Markov model.
N-Gram Model Formulas
Word sequences: W = w1, w2, ..., wn
Bigram approximation: P(w1, ..., wn) ≈ Πi=1..n P(wi | wi-1)
N-gram approximation: P(w1, ..., wn) ≈ Πi=1..n P(wi | wi-N+1, ..., wi-1)
Estimating Probabilities
N-gram conditional probabilities can be estimated from raw text based on the relative frequency of word sequences.
Bigram: P(wi | wi-1) = C(wi-1 wi) / C(wi-1)
N-gram: P(wi | wi-N+1, ..., wi-1) = C(wi-N+1 ... wi) / C(wi-N+1 ... wi-1)
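A minimal sketch of these relative-frequency estimates in Python follows; the three-sentence corpus is invented purely for illustration.

```python
# Estimating bigram probabilities by relative frequency (sketch).
# The toy corpus below is an illustrative assumption.
from collections import Counter

corpus = [
    "<s> i want to eat </s>",
    "<s> i want to sleep </s>",
    "<s> you want to eat </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_bigram(word, prev):
    # P(word | prev) = C(prev word) / C(prev)
    return bigrams[(prev, word)] / unigrams[prev]

print(p_bigram("want", "i"))   # C(i want) / C(i) = 2/2 = 1.0
print(p_bigram("eat", "to"))   # C(to eat) / C(to) = 2/3
```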
v. ________ is used to remove the suffixes from an English word and obtain its stem, which becomes very useful in the field of Information Retrieval (IR).
a) HMM Stemmer
b) Porter Stemmer
c) Markov Stemmer
d) Bert Stemmer
Ans : b
4
SYNTAX ANALYSIS - I
Unit Structure
4.0 Syntax analysis
4.1 Objective
4.2 Introduction
4.3 Define Syntax analysis
4.3.1 Part-Of-Speech Tagging
4.3.2 Tag set for English (Penn Treebank)
4.4 Rule based POS Tagging
4.4.1 Stochastic POS Tagging
4.5 Issues-Multiple tags & words
4.5.1 Introduction to CFG
4.6 Sequence labeling
4.6.1 Hidden Markov Model (HMM)
4.6.2 Maximum Entropy
4.1 Objective
This chapter gives you an idea of the goals of NLP.
Syntax analysis is the second phase of the compiler design process, in which the given input string is checked for conformance to the rules and structure of the formal grammar. It analyses the syntactical structure and checks whether the given input is in the correct syntax of the programming language or not.
Why do we need Parsing?
A parser also checks that the input string is well-formed, and if it is not, rejects it. Parsing techniques are divided into two groups:
• Top-Down Parsing,
• Bottom-Up Parsing
Top-Down Parsing:
In the top-down parsing construction of the parse tree starts at the root and
then proceeds towards the leaves.
Two types of Top-down parsing are:
1. Predictive Parsing:
A predictive parser can predict which production should be used to replace the specific input string. The predictive parser uses a look-ahead pointer, which points to the next input symbols. Backtracking is not an issue with this parsing technique. It is known as an LL(1) parser.
2. Recursive Descent Parsing:
This parsing technique recursively parses the input to make a parse tree. It consists of several small functions, one for each nonterminal in the grammar.
Bottom-Up Parsing:
In bottom-up parsing, the construction of the parse tree starts with the leaves and then proceeds towards the root. It is also called shift-reduce parsing. This type of parser is often created with the help of software tools.
Error Recovery Methods
Common Errors that occur in Parsing in System Software
• When the parser encounters an error, it helps you to take corrective steps. This allows the rest of the inputs and states to be parsed.
• For example, adding a missing semicolon comes under the statement-mode recovery method. However, the parser designer needs to be careful while making these changes, as one wrong correction may lead to an infinite loop.
Panic-Mode recovery
• When the parser encounters an error, this mode ignores the rest of the statement and does not process input from the erroneous point up to a delimiter, such as a semicolon. This is a simple error recovery method.
• In this type of recovery method, the parser rejects input symbols one by one until a single designated group of synchronizing tokens is found. The synchronizing tokens are generally delimiters, such as a semicolon or a closing brace.
Phrase-Level Recovery:
Rules of Formal Grammar
• The non-terminal symbol should appear to the left of at least one production.
• The goal symbol should never appear to the right of the ::= of any production.
• A rule is recursive if its LHS appears in its RHS.
Notational Conventions
An optional element may be indicated by enclosing the element in square brackets [ ]. An arbitrary sequence of instances of an element can be indicated by enclosing the element in braces followed by an asterisk symbol, { … }*. A choice between alternatives within a single rule may be indicated with an alternation symbol, and parentheses ( ) may be used for grouping when needed.
Two types of notational convention symbols are terminals and non-terminals.
1. Terminals:
Grammar Derivation
Grammar derivation is a sequence of grammar rule which transforms the
start symbol into the string. A derivation proves that the string belongs to
the grammar’s language.
Left-most Derivation
When the sentential form of input is scanned and replaced in left to right
sequence, it is known as left-most derivation. The sentential form which is
derived by the left-most derivation is called the left-sentential form.
Right-most Derivation
When the sentential form of input is scanned and replaced using production rules from right to left, it is known as right-most derivation. The sentential form which is derived from the right-most derivation is known as the right-sentential form.
Syntax vs. Lexical Analyser
Syntax Analyser
The syntax analyser mainly deals with recursive constructs of the
language.
The syntax analyser works on tokens in a source program to recognize
meaningful structures in the programming language.
It receives inputs, in the form of tokens, from lexical analysers.
Lexical Analyser
The lexical analyser eases the task of the syntax analyser.
The lexical analyser recognizes the token in a source program.
It is responsible for the validity of a token supplied to the syntax analyser.
Disadvantages of using Syntax Analysers
II. Why NLP is difficult
Human language is special for several reasons. It is specifically
constructed to convey the speaker/writer's meaning. It is a complex
system, although little children can learn it pretty quickly.
Another remarkable thing about human language is that it is all about
symbols. According to Chris Manning, a machine learning professor at
Stanford, it is a discrete, symbolic, categorical signaling system. This
means we can convey the same meaning in different ways (i.e., speech,
gesture, signs, etc.) The encoding by the human brain is a continuous
pattern of activation by which the symbols are transmitted via continuous
signals of sound and vision.
Understanding human language is considered a difficult task due to its
complexity. For example, there is an infinite number of different ways to
arrange words in a sentence. Also, words can have several meanings and
contextual information is necessary to correctly interpret sentences. Every
language is more or less unique and ambiguous. Just take a look at the
following newspaper headline "The Pope’s baby steps on gays." This
sentence clearly has two very different interpretations, which is a pretty
good example of the challenges in NLP.
Note that a perfect understanding of language by a computer would result
in an AI that can process the whole information that is available on the
internet, which in turn would probably result in artificial general
intelligence.
NLP is sometimes compared to 'computational linguistics', although NLP
is considered more applied. Nowadays, alternative terms are often
preferred, such as 'Language Technology' or 'Language Engineering'. The
term 'language' is often used in contrast to 'speech' (e.g. Speech and
Language Technology). However, I will simply refer to NLP and use the
term in a broader sense.
How Natural Language Processing is used
NLP is used in a variety of software and various use cases have been
identified as being solvable by deploying NLP models. Some of these
examples are:
5
SYNTAX ANALYSIS - II
Unit Structure
5.0 Part-Of-Speech Tagging
5.1 What are the parts of Speech?
5.2 Tag set for English (Penn Treebank)
5.3 Rule-based POS Tagging
5.4 Issues-Multiple tags & words
5.5 Sequence labeling
POS tagging: words often have more than one POS:
- The back door (adjective)
- On my back (noun)
- Winning back the voters (particle)
- Promised to back the bill (verb)
The POS tagging task: determine the POS tag for all tokens in a sentence. Due to ambiguity (and unknown words), we cannot rely on a dictionary to look up the correct POS tags.
Creating a POS Tagger: To deal with ambiguity and coverage, POS taggers rely on learned models.
For a new language (or domain):
Step 0: Define a POS tag set
Step 1: Annotate a corpus with these tags
For a well-researched language (and domain):
Step 1: Obtain a POS-tagged corpus
For any language:
Step 2: Choose a POS tagging model (e.g., an HMM)
Step 3: Train your model on your training corpus
Step 4: Evaluate your model on your test corpus
Define a tag set: We need to define an inventory of labels for the word classes (i.e., the tag set).
- Most taggers are based on models that need to be trained on annotated (tagged) corpora.
- Evaluation also requires annotated corpora.
- Since annotation by humans is expensive and time-consuming, tag sets used in a few existing corpora are becoming the de facto standard.
- Tag sets must capture semantically or syntactically important distinctions that can be easily made by trained human annotators.
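As an illustration of tagging with the de facto standard Penn Treebank tag set, the sketch below uses NLTK's pre-trained tagger; the example sentence is made up for this illustration.

```python
# Tagging a sentence with NLTK's pre-trained tagger, which outputs
# Penn Treebank tags (sketch; the example sentence is illustrative).
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The back door of the house was locked.")
print(nltk.pos_tag(tokens))
# prints (word, tag) pairs using Penn Treebank tags, e.g. ('door', 'NN')
```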
What is Treebank NLP?
5.3 Rule-based POS Tagging
• Context-pattern rules
• Or, as Regular expression compiled into finite-state automata,
intersected with lexically ambiguous sentence representation.
We can also understand Rule-based POS tagging by its two-stage
architecture −
Word Frequency Approach
• Start with the solution − The TBL usually starts with some
solution to the problem and works in cycles.
• Most beneficial transformation chosen − In each cycle, TBL will
choose the most beneficial transformation.
• Apply to the problem − The transformation chosen in the last step
will be applied to the problem.
The algorithm will stop when the transformation selected in step 2 either does not add more value or there are no more transformations to be selected. Such kind of learning is best suited to classification tasks.
Advantages of Transformation-based Learning (TBL)
The advantages of TBL are as follows −
• We learn a small set of simple rules, and these rules are enough for tagging.
• Development as well as debugging is very easy in TBL because the
learned rules are easy to understand.
• Complexity in tagging is reduced because in TBL there is interlacing of machine-learned and human-generated rules.
• Transformation-based tagger is much faster than Markov-model
tagger.
Disadvantages of Transformation-based Learning (TBL)
The disadvantages of TBL are as follows −
Example
These probabilities don’t depend on the position in the sentence (i), but are
defined over word and tag types.
With subscripts i,j,k, to index word/tag types, they become P(ti | tj), P(ti | tj,
tk), P(wi | tj)
We assumed that there are two states in the HMM and each of the states corresponds to the selection of a different biased coin. The following matrix gives the state transition probabilities −
A = | a11  a12 |
    | a21  a22 |
Here,
• aij = probability of transition from one state to another from i to j.
• a11 + a12 = 1 and a21 + a22 =1
• P1 = probability of heads of the first coin i.e. the bias of the first
coin.
• P2 = probability of heads of the second coin i.e. the bias of the
second coin.
We can also create an HMM model assuming that there are 3 coins or
more.
This way, we can characterize HMM by the following elements −
• N, the number of states in the model (in the above example N =2,
only two states).
• M, the number of distinct observations that can appear with each state (in the above example M = 2, i.e., H or T).
• A, the state transition probability distribution − the matrix A in the
above example.
• P, the probability distribution of the observable symbols in each
state (in our example P1 and P2).
• I, the initial state distribution.
Use of HMM for POS Tagging
The POS tagging process is the process of finding the sequence of tags
which is most likely to have generated a given word sequence. We can
model this POS process by using a Hidden Markov Model (HMM),
where tags are the hidden states that produced the observable
output, i.e., the words.
Mathematically, in POS tagging, we are always interested in finding a tag
sequence (C) which maximizes −
P (C|W)
Where,
C = C1, C2, C3, ..., CT
W = W1, W2, W3, ..., WT
On the other side of the coin, the fact is that we need a lot of statistical data to reasonably estimate such kinds of sequences. However, to simplify the problem, we can apply some mathematical transformations along with some assumptions.
The use of an HMM to do POS tagging is a special case of Bayesian inference. Hence, we will start by restating the problem using Bayes' rule, which says that the above-mentioned conditional probability is equal to −
(PROB (C1,..., CT) * PROB (W1,..., WT | C1,..., CT)) / PROB (W1,..., WT)
We can eliminate the denominator in all these cases because we are
interested in finding the sequence C which maximizes the above value.
This will not affect our answer. Now, our problem reduces to finding the
sequence C that maximizes −
PROB (C1,..., CT) * PROB (W1,..., WT | C1,..., CT) (1)
Even after reducing the problem in the above expression, it would require
large amount of data. We can make reasonable independence assumptions
about the two probabilities in the above expression to overcome the
problem.
First Assumption
The probability of a tag depends on the previous one (bigram model) or
previous two (trigram model) or previous n tags (n-gram model) which,
mathematically, can be explained as follows −
PROB (C1,..., CT) = Πi=1..T PROB (Ci|Ci-n+1…Ci-1) (n-gram model)
PROB (C1,..., CT) = Πi=1..T PROB (Ci|Ci-1) (bigram model)
The beginning of a sentence can be accounted for by assuming an initial
probability for each tag.
PROB (C1|C0) = PROB initial (C1)
Second Assumption
The second probability in equation (1) above can be approximated by
assuming that a word appears in a category independent of the words in
the preceding or succeeding categories which can be explained
mathematically as follows −
PROB (W1,..., WT | C1,..., CT) = Πi=1..T PROB (Wi|Ci)
Now, on the basis of the above two assumptions, our goal reduces to
finding a sequence C which maximizes
Πi=1...T PROB(Ci|Ci-1) * PROB(Wi|Ci)
Now the question that arises here is whether converting the problem to the above form has really helped us. The answer is yes, it has. If we have a large tagged corpus, then the two probabilities in the above formula can be calculated as −
PROB (Ci=VERB|Ci-1=NOUN) = (# of instances where a Verb follows a Noun) / (# of instances where a Noun appears) (2)
PROB (Wi|Ci) = (# of instances where Wi appears in Ci) / (# of instances where Ci appears) (3)
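A small sketch of how these two count-based estimates in equations (2) and (3) could be computed from a tagged corpus follows; the tiny hand-tagged corpus is an invented illustration.

```python
# Estimating PROB(Ci | Ci-1) and PROB(Wi | Ci) from a tagged corpus (sketch).
# The three hand-tagged sentences are an illustrative assumption.
from collections import Counter

tagged_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
]

tag_count, transitions, emissions = Counter(), Counter(), Counter()
for sentence in tagged_corpus:
    tags = [t for _, t in sentence]
    tag_count.update(tags)
    transitions.update(zip(tags, tags[1:]))   # (previous tag, current tag)
    emissions.update(sentence)                # (word, tag)

def p_transition(tag, prev):
    # PROB(Ci = tag | Ci-1 = prev) = C(prev, tag) / C(prev)
    return transitions[(prev, tag)] / tag_count[prev]

def p_emission(word, tag):
    # PROB(Wi = word | Ci = tag) = C(word tagged as tag) / C(tag)
    return emissions[(word, tag)] / tag_count[tag]

print(p_transition("VERB", "NOUN"))   # 3/3 = 1.0
print(p_emission("cat", "NOUN"))      # 2/3
```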
3.5.2 Maximum Entropy
The maximum entropy principle (MaxEnt) states that the most appropriate distribution to model a given set of data is the one with the highest entropy among all those that satisfy the constraints of our prior knowledge. Usually, these constraints are given as equations regarding moments of the desired distribution.
Reference pdf :
• https://web.stanford.edu/group/cslipublications/cslipublications/pdf/
1575863480.pdf
• https://www.researchgate.net/publication/270568943_Syntax_Work
book
• https://linguistics.ucla.edu/people/stabler/isat.pdf
6
SEMANTIC ANALYSIS - I
Unit Structure
6.0 Objective
6.1 Introduction
6.1.2 Definition of Semantic Analysis
6.2 Lexical Semantics
6.2.1 Attachment for fragment of English-sentences,
6.2.2 Noun phrases
6.3 Verb phrases
6.3.1 Prepositional phrases
6.4 Relations among lexemes & their senses
6.4.2 -Homonymy, Polysemy, Synonymy
6.5 Homonymy, Robust, Word Senses
6.5.1 Disambiguation (WSD)
6.5.2 Dictionary based approach
6.0 OBJECTIVE
What is semantic analysis used for?
Semantic Analysis
Michael L. Scott, in Programming Language Pragmatics (Third Edition),
2009
One-Pass Compilers
A compiler that interleaves semantic analysis and code generation
with parsing is said to be a one-pass compiler.4 It is unclear whether
interleaving semantic analysis with parsing makes a compiler simpler or
more complex; it's mainly a matter of taste. If intermediate code
generation is interleaved with parsing, one need not build a syntax tree at
all (unless of course the syntax tree is the intermediate code). Moreover, it
is often possible to write the intermediate code to an output file on the fly,
rather than accumulating it in the attributes of the root of the parse tree.
The resulting space savings were important for previous generations of
computers, which had very small main memories. On the other hand,
semantic analysis is easier to perform during a separate traversal of a
syntax tree, because that tree reflects the program's semantic
structure better than the parse tree does, especially with a top-down parser,
and because one has the option of traversing the tree in an order other than
that chosen by the parser.
Decorating a Syntax Tree
In our discussion so far we have used attribute grammars solely to
decorate parse trees. As we mentioned in the chapter introduction, attribute grammars can also be used to decorate syntax trees. If our compiler uses action routines simply to build a syntax tree, then the bulk of semantic analysis and intermediate code generation will use the syntax tree as its base.
What is semantic analysis in natural language processing?
Semantic analysis describes the process of understanding natural
language–the way that humans communicate–based on meaning and
context. ... It analyzes context in the surrounding text and it analyzes the
text structure to accurately disambiguate the proper meaning of words that
have more than one definition.
What is semantic analysis language?
In linguistics, semantic analysis is the process of relating syntactic
structures, from the levels of phrases, clauses, sentences and paragraphs
to the level of the writing as a whole, to their language-independent
meanings. ... Semantic analysis can begin with the relationship between
individual words.
How important is semantics?
However, in truth, semantics is a very important subject. Semantics is the
study of the meaning of words. Many words have very similar meanings
and it is important to be able to distinguish subtle differences between
them. ... ' It is also important for children to know the opposite meanings
of words (antonyms).
What is meant by semantic features?
Semantic features are theoretical units of meaning-holding components
which are used for representing word meaning. These features play a
vital role in determining the kind of lexical relation which exists between
words in a language.
What are examples of semantics?
Semantics is the study of meaning in language. It can be applied to entire texts or to single words. For example, "destination" and "last stop" technically mean the same thing, but students of semantics analyze their subtle shades of meaning.
We have learnt how a parser constructs parse trees in the syntax analysis
phase. The plain parse-tree constructed in that phase is generally of no use
for a compiler, as it does not carry any information of how to evaluate the
tree. The productions of context-free grammar, which makes the rules of
the language, do not accommodate how to interpret them.
For example
E→E+T
The above CFG production has no semantic rule associated with it, and it cannot help in making any sense of the production.
Semantics
Semantics of a language provide meaning to its constructs, like tokens and
syntax structure. Semantics help interpret symbols, their types, and their
relations with each other. Semantic analysis judges whether the syntax
structure constructed in the source program derives any meaning or not.
For example:
int a = “value”;
• Scope resolution
• Type checking
• Array-bound checking
Semantic Errors
We have mentioned some of the semantics errors that the semantic
analyzer is expected to recognize:
• Type mismatch
• Undeclared variable
• Reserved identifier misuse.
• Multiple declaration of variable in a scope.
• Accessing an out of scope variable.
• Actual and formal parameter mismatch.
Attribute Grammar
Attribute grammar is a special form of context-free grammar where some
additional information (attributes) are appended to one or more of its non-
terminals in order to provide context-sensitive information. Each attribute
has well-defined domain of values, such as integer, float, character, string,
and expressions.
Attribute grammar is a medium to provide semantics to the context-free
grammar and it can help specify the syntax and semantics of a
programming language. Attribute grammar (when viewed as a parse-tree)
can pass values or information among the nodes of a tree.
Example: E → E + T { E.value = E.value + T.value }
The right part of the CFG contains the semantic rules that specify how the
grammar should be interpreted. Here, the values of non-terminals E and T
are added together and the result is copied to the non-terminal E.
Semantic attributes may be assigned to their values from their domain at
the time of parsing and evaluated at the time of assignment or conditions.
Based on the way the attributes get their values, they can be broadly
divided into two categories : synthesized attributes and inherited attributes.
Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the following production:
S → ABC
If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of A, B, and C are synthesized to S.
Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from the parent and/or siblings. In the same production, A can get values from S, B and C; B can take values from S, A, and C; likewise, C can get values from S, A, and B.
Expansion : When a non-terminal is expanded to terminals as per a
grammatical rule
Reduction: When a terminal is reduced to its corresponding non-terminal
according to grammar rules. Syntax trees are parsed top-down and left to
right. Whenever reduction occurs, we apply its corresponding semantic
rules (actions).
Semantic analysis uses Syntax Directed Translations to perform the above
tasks.
Semantic analyzer receives AST (Abstract Syntax Tree) from its previous
stage (syntax analysis).
Semantic analyzer attaches attribute information with AST, which are
called Attributed AST.
Attributes are a two-tuple value: <attribute name, attribute value>.
For example:
int value = 5;
<type, “integer”>
<presentvalue, “5”>
S can take values from A, B, and C (synthesized). A can take values from
S only. B can take values from S and A. C can get values from S, A, and
B. No non-terminal can get values from the sibling to its right.
Attributes in L-attributed SDTs are evaluated by depth-first and left-to-
right parsing manner.
6.1 INTRODUCTION
The purpose of semantic analysis is to draw exact meaning, or you can say
dictionary meaning from the text. The work of semantic analyzer is to
check the text for meaningfulness.
We already know that lexical analysis also deals with the meaning of the
words, then how is semantic analysis different from lexical analysis?
Lexical analysis is based on smaller tokens, but on the other side, semantic analysis focuses on larger chunks. That is why semantic analysis can be divided into the following two parts −
Studying meaning of individual word
It is the first part of the semantic analysis in which the study of the
meaning of individual words is performed. This part is called lexical
semantics.
Studying the combination of individual words
In the second part, the individual words will be combined to provide
meaning in sentences.
The most important task of semantic analysis is to get the proper meaning
of the sentence. For example, analyze the sentence “Ram is great.” In this
sentence, the speaker is talking either about Lord Ram or about a person
whose name is Ram. That is why the job of the semantic analyzer, to get the proper meaning of the sentence, is important.
Meaning Representation
Semantic analysis creates a representation of the meaning of a sentence.
But before getting into the concept and approaches related to meaning
representation, we need to understand the building blocks of semantic
system.
Building Blocks of Semantic System
In word representation or representation of the meaning of the words, the
following building blocks play an important role −
Space Management for Attributes
Any attribute evaluation method requires space to hold the attributes of the
grammar symbols. If we are building an explicit parse tree, then the
obvious approach is to store attributes in the nodes of the tree themselves.
If we are not building a parse tree, then we need to find a way to keep
track of the attributes for the symbols we have seen (or predicted) but not
yet finished parsing. The details differ in bottom-up and top-down parsers.
For a bottom-up parser with an S-attributed grammar, the obvious
approach is to maintain an attribute stack that directly mirrors the parse
stack: next to every state number on the parse stack is an attribute record
for the symbol we shifted when we entered that state. Entries in the
attribute stack are pushed and popped automatically by the parser driver;
space management is not an issue for the writer of action routines.
Complications arise if we try to achieve the effect of inherited attributes,
but these can be accommodated within the basic attribute stack
framework. For a top-down parser with an L-attributed grammar, we have
two principal options. The first option is automatic, but more complex
than for bottom-up grammars. It still uses an attribute stack, but one that
does not mirror the parse stack. The second option has lower space
overhead, and saves time by “shortcutting” copy rules, but requires action
routines to allocate and deallocate space for attributes explicitly. In both
families of parsers, it is common for some of the contextual information
for action routines to be kept in global variables. The symbol table in
particular is usually global. We can be sure that the table will always
represent the current referencing environment, because we control the
order in which action routines (including those that modify the
environment at the beginnings and ends of scopes) are executed. In a pure
attribute grammar, we should need to pass symbol table information into
and out of productions through inherited and synthesized attributes.
7
SEMANTIC ANALYSIS - II
Unit structure
7.0 Attachment for fragment of English-sentences
7.1 Noun phrases
7.2 Verb phrases
7.3 Prepositional phrases
7.4 Relations among lexemes & their senses
7.4.1 Homonymy, Polysemy, Synonymy
7.5 Robust, Word Senses
7.5.1 Disambiguation (WSD)
7.5.2 Dictionary based approach
'When we got in the car' is a sentence fragment and a dependent clause. It
clearly belongs to the independent clause that follows it and should be
rewritten like this:
When we got in the car, we rolled down the windows.
Or like this:
We rolled down the windows when we got in the car.
Subordinators
The sentence fragment 'When we got in the car' also has the subordinator
'when'. Some other examples of subordinators are: 'after', 'although',
'before', 'if', 'since', 'until', 'when', 'where', 'while', and 'why'. Clauses with
subordinators can be called either dependent clauses or subordinating
clauses, but when those clauses appear at the beginning of a sentence, they
should be followed by a comma.
Fragment Phrases
Phrases are groups of words that are missing a subject or verb, or both.
Phrases can also masquerade as sentences, like dependent clauses can.
Here are some examples.
Here's an example missing subject and verb:
From morning until night.
This fragment can be made a complete sentence by changing it to:
I worked from morning until night.
Adding 'I' as the subject and 'worked' as the verb corrects this fragment
and makes it an independent clause and a complete thought.
Here's an example of a missing subject:
Start after the weekend.
This fragment can be made a complete sentence by changing it to:
Classes start after the weekend.
Adding the subject 'classes' corrects this fragment and makes it an
independent clause and a complete thought.
Finally, here's an example of a missing verb:
Some girls in the class.
This fragment can be changed to:
Some girls in the class study together.
Adding the verb 'study' corrects this fragment and makes it an independent
clause and a complete thought.
Fragment Temptations
Certain words and expressions make it easy to fall into the sentence
fragment habit. Some of these words include 'also', 'for example', 'and',
'but', 'for instance', 'mainly', 'or', and 'that'. Here's how they appear in a
sentence:
Harris claims that men and women have different ideas about dating. For
example, that men should pay for dinner on a date.
Noun phrases are groups of words that function like nouns. Typically, they
act as subjects, objects or prepositional objects in a sentence. While that
might seem tricky to grasp, the best way to understand these useful
phrases is to see them in action. Get a clear idea of a noun phrase and how it is used in a sentence through examples.
What is a noun phrase example?
A noun phrase is either a pronoun or any group of words that can be
replaced by a pronoun. For example, 'they', 'cars', and 'the cars' are
noun phrases, but 'car' is just a noun, as you can see in these sentences (in
which the noun phrases are all in bold) Q: Do you like cars? A: Yes, I like
them.
Noun phrases are simply nouns with modifiers. Just as nouns can act as
subjects, objects and prepositional objects, so can noun phrases. Similarly,
noun phrases can also work in a sentence as adjectives, participles,
infinitives, and prepositional or absolute phrases. Noun phrases are
important for adding more detail to a noun. Examples of simple noun
phrases include:
Verb phrase (VP): These phrases are lexical units that have a verb
acting as the head word. Usually, there are two forms of verb phrases.
One form has the verb components as well as other entities such as nouns,
adjectives, or adverbs as parts of the object.
The phrase would include the verbal (participle, gerund or infinitive) and
any modifiers, complements or objects. Examples of verb phrases versus
verbal phrases include: The man was texting on his phone. (verb phrase
was texting functions as the action) Texting on his phone, the man
swerved into a ditch.
What is the main verb in a verb phrase?
The main verb is also called the lexical verb or the principal verb. This
term refers to the important verb in the sentence, the one that typically
shows the action or state of being of the subject. Main verbs can stand
alone, or they can be used with a helping verb, also called an auxiliary
verb.
What is a verb phrase in English language?
In English a verb phrase combines with a noun or noun phrase acting
as subject to form a simple sentence. a phrase consisting of a main verb
and any auxiliaries but not including modifiers, objects, or complements.
What are the types of verb phrases?
Verb phrases generally are divided among two types: finite, of which the
head of the phrase is a finite verb; and nonfinite, where the head is a
nonfinite verb, such as an infinitive, participle or gerund.
How do you find a verb phrase?
A verb phrase consists of a verb plus another word that further
illustrates the verb tense, action, and tone. The other word or words tied
to a verb in a verb phrase are its dependents, which can be adverbs,
prepositional phrases, helping verbs, or other modifiers
Verb Phrase Examples
● She was walking quickly to the mall.
● He should wait before going swimming.
● Those girls are trying very hard.
● Ted might eat the cake.
● You must go right now.
● You can't eat that!
● My mother is fixing us some dinner.
● Words were spoken.
● Prepositional phrases can act as adverbs or adjectives. When they
are used as adjectives, they modify nouns and pronouns in the same
way single-word adjectives do.
● When prepositional phrases are used as adverbs, they act in the same way that single-word adverbs and adverb clauses do, modifying adjectives, verbs, and other adverbs.
Examples of Prepositional Phrases
The following sentences contain examples of prepositional phrases; the
prepositional phrase in each sentence is italicized for easy identification.
The cupcake with sprinkles is yours.
The cupcake with colorful sprinkles is yours.
We climbed up the hill.
We climbed up the very steep hill.
The rabbits hopped through the garden.
The rabbits hopped through the perfectly manicured garden.
What is the meaning of Homonymy?
Homonymy is the relationship between words that are homonyms—
words that have different meanings but are pronounced the same or
spelled the same or both. It can also refer to the state of being
homonyms.
Homonyms (here, homophones) with meanings:
• Cache (a hidden store) – Cash (money)
• Scents (smells) – Sense (perception or judgement)
• Chile (the country) – Chili (the pepper)
• Choir (a group of singers) – Quire (a quantity of paper)
• Site (a location) – Sight (vision)
• Facts (pieces of information) – Fax (a document sent by facsimile)
• Finnish (from Finland) – Finish (to complete)
What is polysemy, and what are some examples?
When a symbol, word, or phrase means many different things, that is
called polysemy. The verb "get" is a good example of polysemy: it can
mean "procure," "become," or "understand."
What are the polysemy words?
A polysemous word is a word that has different meanings that derive
from a common origin; a homograph is a word that has different
meanings with unrelated origins. Polysemous words and homographs
constitute a known problem for language learners.
How many types of polysemy are there?
Types of polysemy
Linear polysemy accounts for a specialization-generalization relation
between senses and, in turn, is divided into four types:
autohyponymy, automeronymy, autosuperordination and autoholonymy.
Synonymy is a relation between individual senses of words, so that a
single word typically has different sets of synonyms for each of its
senses. For example, coat has different synonyms for its senses 'outer
garment' (e.g., jacket) and 'covering layer'(e.g., layer)
What is the concept of synonymy?
Synonymy is the state of being synonymous (having the same or almost
the same meaning as another word or phrase in the same language), or the
fact that words or phrases are synonymous. Children learn to accept two
labels for the same object (synonymy).
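These lexical relations can also be explored programmatically. The sketch below, assuming NLTK's WordNet corpus is installed, lists the senses (synsets) of the polysemous word "coat" together with the synonyms recorded for each sense.

```python
from nltk.corpus import wordnet as wn
# nltk.download("wordnet")   # one-time download

# Each synset is one sense of the word; a polysemous word has several,
# and each sense has its own set of synonym lemmas.
for synset in wn.synsets("coat"):
    synonyms = sorted({lemma.name() for lemma in synset.lemmas()})
    print(synset.name(), "-", synset.definition())
    print("   synonyms:", synonyms)
```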
What are some examples of synonyms?
Common synonym pairs include happy/glad, big/large, begin/start,
smart/intelligent, and sad/unhappy.
One such method involves bootstrapping from the initial labeled data, which
is referred to as the seed data. Semi-supervised methods thus use both
labeled and unlabeled data.
Unsupervised Methods:
Unsupervised methods pose the greatest challenge to researchers and NLP
professionals. A key assumption of these models is that similar senses occur
in similar contexts. They are not dependent on manual labeling efforts, and
hence can overcome the knowledge acquisition bottleneck.
Lesk Algorithm
Lesk Algorithm is a classical Word Sense Disambiguation algorithm
introduced by Michael E. Lesk in 1986.
The Lesk algorithm is based on the idea that words in a given region of
text will have related meanings. In the simplified Lesk algorithm, the
correct sense of each word in a context is found by choosing the sense whose
dictionary definition has the greatest overlap with the words of that context.
NLTK provides a ready-made implementation of the simplified Lesk algorithm.
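A minimal sketch using NLTK's built-in implementation (the example sentence and target word are illustrative assumptions):

```python
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk
# nltk.download("punkt"); nltk.download("wordnet")   # one-time downloads

context = word_tokenize("I went to the bank to deposit my money")
# lesk() picks the WordNet synset whose gloss overlaps the context the most
sense = lesk(context, "bank")
print(sense)
print(sense.definition() if sense else "no sense found")
```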
Why is WSD relevant?
Word Sense Disambiguation is closely related to Parts of speech tagging
and is an important part of the whole Natural Language Processing
process.
WSD, if implemented properly, can lead to breakthroughs in NLP. A
problem that often arises is the very notion of a word sense: a word sense
is not a numeric quantity that can be measured, nor a true-or-false value
that can be denoted as 1 or 0.
The whole idea of word sense is controversial. The meaning of a word is
highly contextual and depends on its usage. It is not something that can be
easily measured as a discrete quantity.
Lexicography deals with generalizing the corpus and explaining the full
and extended meaning of a word. But, sometimes these meanings might
not apply to the algorithms or data.
In my experience, working with text data can be tricky, so implementing
WSD is often difficult and error-prone.
But, WSD has immense applications and uses.
If a computer algorithm could read a text and identify the different senses
in which its words are used, it would mean vast improvements in the field of
text analytics.
Comparing and evaluating various WSD methods can be difficult. But,
with time more research is going on regarding WSD and it can be
improved.
7.5.2 Dictionary based approach
Dictionary Based Algorithm: A simple approach to segmenting text is to scan
characters one at a time from left to right and look them up in a dictionary.
If the series of characters is found in the dictionary, then we have a matched
word and we segment that sequence as a word.
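Below is a minimal sketch of such a dictionary-based segmenter in Python. It uses a longest-match (maximum matching) variant of the left-to-right lookup described above; the toy dictionary and input string are assumptions for illustration.

```python
def dictionary_segment(text, dictionary, max_word_len=10):
    """Greedy longest-match segmentation: at each position, take the longest
    dictionary word that starts there; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        match = text[i]                      # default: one character
        for j in range(min(len(text), i + max_word_len), i + 1, -1):
            if text[i:j] in dictionary:      # dictionary lookup
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words

vocab = {"the", "table", "down", "there"}
print(dictionary_segment("thetabledownthere", vocab))
# -> ['the', 'table', 'down', 'there']
```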
Functional Modelling and Mathematical Models: A Semantic Analysis
We consider how the model M is connected to the system S.
Feynman's weight on a spring is a convenient place to start. In all three
examples below, S is a weight on a spring, either a real one or one that we
propose to construct. But the connection takes a different form in each
example.
(A) The model is a computer mockup in virtual reality. It looks like the
real thing, and maybe sounds like the real thing if the programmer was
having fun. In this case the connection with the system
is resemblance or similarity; we say that the model is pictorial.
In functional modelling the modeler will sometimes turn an early stage of
the specification into a toy working system, called a prototype. The
prototype is a pictorial model of the final system. It shows how the final
system will operate, by working more or less like the final system but
maybe with some features missing.
(B) The model is a graph, for example a graph of the weight's displacement over time.
This graph describes the system less directly. Left to right in the graph
represents time, up and down represents the vertical distance of the center
of mass of the weight from its resting position. In both dimensions a
distance in the graph is proportional to a distance in space or time. A
model that can be read in this way, by taking some dimensions in the
model as corresponding to some dimensions in the system, is called
an analogue model.
In hydraulic and aeronautical engineering one often meets scale models.
These are analogue models where the dimensions of the final system are
accurately scaled up or down (usually down) so that the model is a more
convenient size than the final system. But if all the dimensions are scaled
down in a ratio r, then the areas are scaled down in ratio r2 and the
volumes (and hence the weights) in ratio r3. So given the laws of physics,
how should we scale the time if we want the behavior of the model to
predict the behavior of the system?
Dimensional analysis answers this question.
A model can be both pictorial and analogue. For example, the architect's
model is both. But the graph in (B) is clearly not a pictorial model; it doesn't
look anything like a weight on a spring.
(C) Feynman himself models his system with an equation, (6). His
equation is a piece of text which makes a statement about the system. We
call it a textual model.
Some fields have developed specialist notations for their subject matter.
Generally, these notations are textual, in the sense that they build up
expressions from a finite alphabet, though there may be pictorial reasons
why one symbol was chosen rather than another. Often, they are meant to
be written and read rather than spoken. Musical scores are an obvious
example. There is no inherent problem about translating the score of a
Beethoven piano sonata symbol by symbol into English (‘The time
signature is common time, the first bar begins with a minim at g in the
right hand … ‘) in such a way that the original score could be recovered
from the English-though I can't think why anybody would want to do it.
The analogue model doesn't translate into English in any similar way.
Another example of a textual notation is the Unified Modelling Language
(UML), which is often used in early stages of software modelling; it's less
specialist than musical scores but still very limited in what it can express.
The diagram notation of Barwise and Etchemendy’s logic software Hyperproof
[Barwise and Etchemendy, 1994] looks pictorial but can be read as
textual. The upper part of the screen carries pictures of a chessboard with
various objects arranged on it, while the lower part carries conventional
logical formulas. Built-in rules (Observe and Apply) carry information
from pictures to formulas and from formulas to pictures. Although we can
say in formulas anything we can say in pictures, and to some extent vice
versa, the two kinds of representation encourage different habits of
reasoning.
Keith Stenning [Stenning, 2002] reports some valuable experiments on
how students with different cognitive styles use Hyperproof. His book surveys
different forms of representation from a cognitive point of view.
Textual models are particularly important for us because in principle
Tarski's semantic analysis applies to them. Tarski himself said
Whoever wishes … to pursue the semantics of colloquial language with
the help of exact methods will be driven first to undertake the thankless
task of a reform of this language…. It may, however, be doubted whether
the language of everyday life, after being ‘rationalized’ in this way, would
still preserve its naturalness and whether it would not rather take on the
characteristic features of the formalized languages.
Tarski may have intended these remarks to discourage people from
extending his semantic theory beyond the case of formalized languages.
But today his theory is applied very generally, and the ‘rationalization’
that he refers to is taken as part of the job of a semanticist. For example
the diagrams of Barwise and Etchemendy (above) are studied in this spirit.
Very many models are text from the outset, or can be read as text. One
important case that we need to consider is computer models. For example
models for wind turbines are usually presented as computer programs
together with some accompanying theory to justify the programs. For
semantic analysis we need to be more precise about exactly what feature
of a computer model is the actual model. Let me give my own answer;
other analysts may see things differently.
The information about the proposed wind turbine is obtained by running the
program. So we should count the model as being the output of the
program. The output may include text printed on the screen or saved in a
file; in this respect the model is textual. The output may also consist of
pictures on the screen, or graphs; in this respect the model is pictorial, and
possibly also analogue. Dynamic real-time simulations are certainly
analogue; they may include sound as well as graphics.
Often the same program can be run with different parameters. (For
example, in wind turbine modelling one uses programs that simulate wind
conditions and are seeded with a random number.)
(13)   f(A, γ, t, ω_γ) = A e^(−γt/2) cos(ω_γ t)
This function is an abstract object. For definiteness some people give it a
set-theoretic form by identifying it with a set of ordered 5-tuples of real
numbers. Although the function clearly bears some close relationship to
equation (13), it is a wholly different kind of object. We can't put it on a
page or a screen, or make it out of wood or plaster of paris. In short, it is
not ‘accessible’ to us in any direct way.
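To make the point about computer models concrete, here is a minimal sketch of a program whose printed output, in the sense discussed above, plays the role of the textual model. Equation (13) supplies the function; the parameter values are assumed purely for illustration.

```python
import math

def f(A, gamma, t, omega):
    """Damped oscillation from equation (13): A * exp(-gamma*t/2) * cos(omega*t)."""
    return A * math.exp(-gamma * t / 2) * math.cos(omega * t)

# Running the "model program" with chosen parameters; the printed table of
# numbers (or a plot of them) is the output that serves as the model.
A, gamma, omega = 1.0, 0.5, 2.0   # assumed illustrative parameter values
for step in range(6):
    t = step * 0.5
    print(f"t = {t:.1f}  displacement = {f(A, gamma, t, omega):+.4f}")
```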
https://www.researchgate.net/publication/289220287_Semantic_Analysis
Semantics: A Coursebook, Second Edition. James R. Hurford (Professor of
General Linguistics, University of Edinburgh), Brendan Heasley (Consultant,
Postgraduate Training, Sharjah Women’s College, United Arab Emirates), and
Michael B. Smith (Associate Professor of Linguistics, Oakland University).
8
TEXT SUMMARIZATION, TEXT
CLASSIFICATION
Unit Structure
8.0 Objectives
8.1 Introduction
8.2 Overview
8.3 Text summarization- LEXRANK
8.4 Optimization based approaches for summarization
8.5 Summarization evaluation
8.6 Text classification
8.7 Let us Sum Up
8.8 List of References
8.9 Bibliography
8.10 Unit End Exercises
8.0 OBJECTIVES
8.1 INTRODUCTION
The text summary is a technique for condensing vast passages of text into
manageable chunks. The goal is to develop a logical and fluent summary
that only includes the document's major ideas. In machine learning and
natural language processing (NLP), automatic text summarization is a
prevalent challenge.
The technique has proven to be crucial in quickly and accurately
summarizing large texts, which would be costly and time-consuming if
done manually.
Before producing the requisite summary texts, machine learning models
are frequently trained to comprehend documents and condense the useful
information.
The purpose of text summarization: data is to this century what oil was to
the previous one. Propelled by modern technological breakthroughs,
massive volumes of data are now collected and transmitted throughout our
world.
With so much data moving in the digital world, machine learning
algorithms that can automatically condense lengthy texts and offer
accurate summaries that carry the intended contents fluently are needed.
Furthermore, using text summarization reduces reading time, speeds up
the research process, and expands the quantity of information that may fit
in a given space.
8.2 OVERVIEW
• Extractive Summarization
• Abstractive Summarization
Here S_i denotes the sentence at vertex i of the sentence similarity graph, and
W_ij denotes the weight on the edge between vertices i and j.
Computation of cosine similarity
The bag-of-words model is used to describe sentences as N-dimensional
vectors for defining similarity, where N is the number of all possible words
in the language. For each word that appears in a sentence, the value of the
corresponding dimension in the sentence's vector representation is the number
of occurrences of the word in the sentence times the idf of the word.
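Written out, the idf-weighted cosine similarity between two sentences x and y (the idf-modified cosine commonly used for LexRank; the notation here is assumed) is:

```latex
\mathrm{idf\text{-}modified\text{-}cosine}(x,y)=
\frac{\sum_{w \in x,\,y} \mathrm{tf}_{w,x}\,\mathrm{tf}_{w,y}\,(\mathrm{idf}_w)^2}
{\sqrt{\sum_{x_i \in x} \left(\mathrm{tf}_{x_i,x}\,\mathrm{idf}_{x_i}\right)^2}\,
 \sqrt{\sum_{y_i \in y} \left(\mathrm{tf}_{y_i,y}\,\mathrm{idf}_{y_i}\right)^2}}
```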
Matrix B is obtained from the adjacency matrix of the similarity graph by
dividing each element by the associated row sum. The left eigenvector of
matrix B with eigenvalue 1 is denoted by p^T.
ALGORITHM
The algorithm to calculate LexRank scores is as follows:
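As an illustrative sketch (not the original listing), the stationary distribution p satisfying p^T B = p^T can be computed by power iteration on the row-normalized similarity matrix. The damping factor and the toy similarity values below are assumptions.

```python
import numpy as np

def lexrank_scores(similarity, damping=0.85, tol=1e-6, max_iter=100):
    """Row-normalize the similarity matrix to get B, then approximate the
    stationary distribution p (p^T B = p^T) by power iteration, with a
    PageRank-style damping factor."""
    n = similarity.shape[0]
    B = similarity / similarity.sum(axis=1, keepdims=True)   # row-stochastic B
    p = np.ones(n) / n
    for _ in range(max_iter):
        p_next = (1 - damping) / n + damping * (B.T @ p)
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next

# Toy 3-sentence similarity matrix (values assumed for illustration)
sim = np.array([[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.5],
                [0.1, 0.5, 1.0]])
print(lexrank_scores(sim))   # higher score = more central sentence
```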
8.4 OPTIMIZATION BASED APPROACHES FOR SUMMARIZATION
With the growth of the World Wide Web and electronic government
services (e.g., electronic document management systems) over the last
decade, automatic text summarization has received a lot of attention. Users
can access a vast amount of information thanks to the continued growth of
the WWW and online text collections from e-government services.
Information overload either wastes a large amount of time spent viewing all
of the information or causes useful information to be overlooked. As a
result, new technologies that can successfully process documents are in high
demand. Automatic text summarization is a critical technology for
overcoming this barrier; the technology is improving, and it may offer a
solution to the information overload problem.
The practice of automatically constructing a compressed version of a
given text that delivers helpful information to readers is known as
document summarising. Summarization has been defined as a reductive
transformation of a given text, usually stated as a three-step process:
selection of salient bits of text, aggregation and abstraction of information
for various selected chunks, and lastly display of the final summary text.
This method can be applied to a variety of tasks, including information
retrieval, intelligence gathering, data extraction, text mining, and indexing.
Extractive and abstractive approaches to document summarization can be
broadly distinguished. In general, an abstract is a summary of
concepts/ideas taken from a source that is then "reinterpreted" and given
differently, whereas an extract is a summary of units of text taken from a
source and presented verbatim. In reality, the vast majority of studies have
focused on summary extraction, which takes bits from the source such as
keywords, sentences, or even paragraphs to create a summary. Abstractive
approaches, in contrast to extractive approaches, are more difficult to
adopt since they need more work.
To read source texts and generate new texts, you'll need a lot of domain
knowledge. "Reading and understanding the text to recognize its content,
which is subsequently compiled in a succinct text" is how abstraction is
defined.
The summary might be a single document or a multi-document, depending
on the number of papers to be summarised. Only one document can be
condensed into a shorter form using single-document summarising. Multi-
document summarising reduces a large number of documents into a single
summary. It gives users a domain overview of a topic, indicating what is
similar and different across multiple papers, as well as relationships
between pieces of information in different documents, and lets them zoom
in for more information on specific elements of interest. Multi-document
summary seeks to extract relevant information about a subject or topic
from various texts written about that subject or topic. It accomplishes
knowledge synthesis and knowledge discovery by combining and
integrating information from several publications, and it can also be used
for knowledge acquisition.
There are two types of extraction-based summarization methods:
supervised and unsupervised. Supervised approaches are based on
algorithms that use a large number of human-made summaries, and are
thus best suited to texts that are relevant to the summarizer model. As a
result, they don't always give a sufficient summary for papers that aren't
exactly like the model. In addition, when users modify the aim of
summarising or the properties of documents, the training data must be
rebuilt or the model must be retrained. To train the summarizer,
unsupervised methods do not require training data such as human-made
summaries. To score and rank sentences using unsupervised approaches,
early researchers frequently choose various statistic and linguistic criteria.
On the internet, you can now obtain a wealth of information on people's
perspectives on practically any topic. In complicated, high-stakes
decision-making processes, the capacity to interpret such data is crucial. A
person interested in purchasing a laptop can browse consumer reviews
from others who have purchased and utilized the device. Customer
feedback on a freshly introduced product at the corporate level might help
discover flaws and features that need to be improved.
To express people's opinions to users, effective summarising techniques
are required. The creation of a content selection strategy that defines what
vital information should be given is a difficult problem when adopting this
approach in a specific domain. In general, content selection is a crucial
task at the heart of both summarization and NLG, and it's an area where
cross-fertilization could be fruitful.
Existing NLG systems typically handle content selection by creating a
heuristic based on a number of relevant parameters and optimising it.
ILEX (Intelligent Labelling Explorer) is a database-based system for
creating labels for groups of objects, such as museum artefacts (O'Donnell
et al., 2001). Its content selection approach comprises assigning
knowledge elements a heuristic relevance score and returning the things
with the highest scores.
Evaluative arguments are generated in GEA (Generator of Evaluative
Arguments) to describe an entity as good or negative. An entity is divided
into a hierarchy of features, with a relevance score generated individually
for each feature depending on the user's preferences and the product's
value of that feature. The most relevant characteristics for the current user
are chosen during content selection.
Over the last few decades, text categorization problems have been
extensively researched and handled in a variety of real-world applications.
Many researchers are currently interested in building applications that use
text categorization algorithms, especially in light of recent achievements
in Natural Language Processing (NLP) and text mining. Feature
extraction, dimension reductions, classifier selection, and assessments are
the four processes that most text classification and document
categorization systems go through. The organization and technical
implementations of text classification systems are discussed in this study
in terms of this four-step pipeline.
The raw text data collection serves as the pipeline's initial input. In general,
a text data set contains a collection of documents D = {X_1, X_2, ..., X_N},
where each X_i is a data point (i.e., a document or text segment) made up of
s sentences, each sentence containing w_s words of l_w letters. Each data
point is assigned a class value from a set of k discrete value indices.
Then, for training purposes, we should build a structured feature set; this is
the feature extraction step. The dimensionality reduction step is an optional
part of the pipeline that can precede the classifier (for example, if we use
Term Frequency-Inverse Document Frequency (TF-IDF) for feature
extraction and the training set has 200k unique words, computational time
will be very expensive, so we can reduce this cost by projecting the feature
space into a lower-dimensional space). Choosing the correct classification
algorithm is the most important stage in document categorization.
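A minimal sketch of this pipeline with scikit-learn, using an assumed toy spam/ham corpus: TF-IDF supplies feature extraction, truncated SVD stands in for the optional dimensionality reduction step, and logistic regression is one possible classifier choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus and labels (assumed data)
docs = ["cheap pills buy now", "meeting agenda attached",
        "win a free prize today", "project status update"]
labels = ["spam", "ham", "spam", "ham"]

# Feature extraction -> optional dimensionality reduction -> classifier
model = make_pipeline(TfidfVectorizer(),
                      TruncatedSVD(n_components=2),   # optional reduction step
                      LogisticRegression())
model.fit(docs, labels)
print(model.predict(["free prize pills"]))
```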
The evaluation step, which is separated into two parts (predicting on the
test set and evaluating the model), is the other half of the pipeline. In
general, there are four main levels of scope that can be applied to the text
classification system:
1. Document-level: The method collects the necessary categories of a
full document at the document level.
2. Paragraph level: The algorithm obtains the appropriate categories of
a single paragraph at the paragraph level (a portion of a document).
3. Sentence level: Gets the relevant categories of a single sentence (a
section of a paragraph) at the sentence level.
4. Sub-sentence level: The algorithm obtains the relevant categories of
sub-expressions within a sentence (a piece of a sentence) at the sub-
sentence level.
Non-parametric approaches, such as k-nearest neighbors, have been
investigated and employed for classification tasks. Another prominent
technique is the Support Vector Machine (SVM), a discriminative classifier
used for document categorization. This method can be applied in many
settings, including bioinformatics, image, video, and human-activity
classification, safety and security, and other data-mining disciplines, and
many researchers use it as a benchmark against which to assess their
findings. Deep learning approaches have recently outperformed prior
machine learning algorithms on problems including image classification,
natural language processing, and face recognition, among others. The
ability of these deep learning algorithms to represent complicated and
non-linear relationships inside data is critical to their success.
(IV) Evaluation: The text categorization pipeline's final step is
evaluation. The use and development of text categorization methods
require an understanding of how a model works. There are numerous
methods for assessing supervised techniques. The simplest evaluation
method is accuracy calculation, but it does not work well for
unbalanced data sets. The F score, Matthews Correlation Coefficient
(MCC), receiver operating characteristic (ROC) curve, and area under
the ROC curve (AUC) are all ways of evaluating text categorization
systems.
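A minimal sketch of these evaluation measures with scikit-learn, using assumed gold labels, predictions, and scores:

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # gold labels (assumed)
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                    # predicted labels
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]    # predicted probabilities

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("MCC     :", matthews_corrcoef(y_true, y_pred))
print("ROC AUC :", roc_auc_score(y_true, y_score))    # uses scores, not labels
```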
In this part, state-of-the-art techniques for feature extraction,
dimensionality reduction, classification, and evaluation are compared based
on a variety of criteria, including model architecture, the novelty of the
work, the feature extraction technique, the corpus (the data set(s) used), the
validation measure, and the limitations of each work. Choosing a feature
extraction approach is necessary for determining the optimum system for a
certain application. Because some feature extraction strategies are
inefficient for a specific application, this decision is entirely dependent on
the application's purpose and data source; a method that is effective for one
use may not be for another. For example, a model trained on Wikipedia
does not perform as well as TF-IDF when used for short text messages such
as short message service (SMS) messages. Furthermore, due to the tiny
amount of data, such a model cannot be trained as well as other strategies.
The categorization technique is the next step in this pipeline, where we
briefly discuss the limitations and drawbacks of each technique.
Text classification is a serious difficulty for scholars in many domains and
fields. Text categorization methods are extensively used in information
retrieval systems and search engine applications. Text classification could
be utilized for information filtering (e.g., email and text message spam
filtering) as an extension of existing applications. Following that, we
discuss document categorization's acceptance in public health and human
behavior. Document organization and knowledge management are two
further areas where text classification has aided. Finally, we'll talk about
recommender systems, which are common in marketing and advertising.
TEXT PREPROCESSING
For text classification applications, feature extraction and pre-processing
are critical tasks. This section introduces strategies for cleaning text data
sets, reducing implicit noise and allowing for more accurate analysis. We
also go over two popular strategies for extracting text features: weighted
words and word embedding.
CLEANING AND PRE-PROCESSING OF TEXT
Many redundant words, such as stopwords, misspellings, and slang, can be
found in most text and document data sets. Noise and superfluous
characteristics can degrade system performance in many algorithms,
particularly statistical and probabilistic learning algorithms. In this section,
we'll go through several text cleaning and pre-processing approaches and
methodologies.
TOKENIZATION
Tokenization is a pre-processing technique that divides a stream of text
into tokens, which are words, phrases, symbols, or other meaningful
pieces. The investigation of the words in a sentence is the primary purpose
of this step. Both text classification and text mining necessitate the use of
a parser to handle the tokenization of documents, such as the following
sentence:
He opted to sleep for another four hours after four hours of sleep.
The tokens in this case are as follows:
"He", "opted", "to", "sleep", "for", "another", "four", "hours", "after",
"four", "hours", "of", "sleep".
STOP WORDS
Many terms in text and document classification are not significant enough
to be employed in classification algorithms, such as "a", "about", "above",
"across", "after", "afterwards", "again", and so on. To deal with these
words, the most typical method is to eliminate them from texts and
documents.
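A minimal sketch of stop-word removal, assuming NLTK's English stop-word list:

```python
import nltk
from nltk.corpus import stopwords
# nltk.download("stopwords"); nltk.download("punkt")   # one-time downloads

tokens = nltk.word_tokenize(
    "He opted to sleep for another four hours after four hours of sleep.")
stops = set(stopwords.words("english"))
# keep only alphabetic tokens that are not in the stop-word list
content_words = [t for t in tokens if t.isalpha() and t.lower() not in stops]
print(content_words)
```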
CAPITALIZATION
Text and document data points contain sentences with a variety of
capitalization. When a document contains many sentences, inconsistent
capitalization can be a real problem for the classification of huge
documents. The most usual way of dealing with uneven capitalization is to
change all of the letters to lower case. This approach maps all text and
document words into the same feature space, yet it complicates the
understanding of some words significantly (for instance, "US" (United
States of America) becomes "us" (pronoun)). Converters for slang and
abbreviations are available to help account for these occurrences.
ABBREVIATIONS AND SLANG
Other types of text anomalies handled in the pre-processing step include
slang and abbreviations. An abbreviation is a shortened form of a word or
phrase that usually consists of the first letters of its terms; for example,
SVM stands for Support Vector Machine.
Slang is a subset of spoken or written language with non-literal meanings,
such as "lost the plot," which essentially means "gone insane". Converting
these terms into formal language is a typical approach to dealing with them.
NOISE ABATEMENT
Many unneeded characters, such as punctuation and special characters,
can be found in most text and document data sets. Although critical
punctuation and special characters are necessary for human
comprehension of documents, they can harm categorization systems.
CORRECTIONS TO SPELLING
Spelling correction is a pre-processing step that can be skipped. Typos
(short for typographical errors) are widespread in texts and documents,
particularly in text data sets from social media (e.g., Twitter). This
challenge has been addressed by many algorithms, strategies, and methods
in NLP. For researchers, there are a variety of strategies and methods
accessible, including hashing-based and context-sensitive spelling
correction algorithms, as well as spelling correction utilizing the Trie and
Damerau–Levenshtein distance bigrams.
STEMMING
In NLP, a single word might exist in multiple forms (for example, singular
and plural noun forms), all of which have the same semantic meaning.
Stemming is one way for combining different variants of a word into the
same feature space. Text stemming uses linguistic processes like affixation
(adding affixes) to modify words to produce variant word forms. The stem
of the word "studying," for example, is "study."
LEMMATIZATION
Lemmatization is a natural language processing technique that replaces a
word's suffix with a different one, or removes the suffix, to reveal the base
word form (the lemma).
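A minimal lemmatization sketch with NLTK's WordNet lemmatizer (the part-of-speech hint is an optional argument):

```python
from nltk.stem import WordNetLemmatizer
# nltk.download("wordnet")   # one-time download

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("studies"))            # treated as a noun -> 'study'
print(lemmatizer.lemmatize("studying", pos="v"))  # treated as a verb -> 'study'
```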
8.7 LET US SUM UP
The text summary is a technique for condensing vast passages of text into
manageable chunks. In machine learning and natural language processing,
text summarization is a prevalent challenge. The main issues are topic
identification, interpretation, summary generation, and evaluation of the
created summary. Text summarization is the process of constructing a
concise, cohesive, and fluent summary of a lengthier text document. This
involves finding key sentences or phrases from the original text and
piecing together sections to create a shortened version.
It is a prevalent challenge in machine learning and natural language
processing (NLP). The bag of words model is used to describe N-
dimensional vectors to define similarity. Matrix B is the adjacency matrix
formed by dividing each element by the associated row sum in the
similarity graph. The degree of each node is called its degree centrality in
this case.
8.8 REFERENCES
• https://www.sciencedirect.com/topics/computer-science/text-summarization
• https://towardsdatascience.com/a-quick-introduction-to-text-summarization-inmachine-learning-3d27ccf18a9f
• https://iq.opengenus.org/lexrank-text-summarization/
• Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes, and Donald Brown, “Text Classification Algorithms: A Survey”.
• Inderjeet Mani, “Summarization Evaluation: An Overview”, The MITRE Corporation, Reston, VA, USA.
• Rasim M. Alguliev, Ramiz M. Aliguliyev, and Chingiz A. Mehdiyev, “An Optimization Approach to Automatic Generic Document Summarization”, Institute of Information Technology of Azerbaijan National Academy of Sciences, Baku, Azerbaijan.
• Jackie Chi Kit Cheung, Giuseppe Carenini, and Raymond T. N, “Optimization-based Content Selection for Opinion Summarization”, Department of Computer Science, University of Toronto, Canada.
8.9 BIBLIOGRAPHY
• https://www.sciencedirect.com/topics/computer-science/text-summarization
• https://towardsdatascience.com/a-quick-introduction-to-text-summarization-inmachine-learning-3d27ccf18a9f
• https://iq.opengenus.org/lexrank-text-summarization/
• Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes, and Donald Brown, “Text Classification Algorithms: A Survey”.
• Inderjeet Mani, “Summarization Evaluation: An Overview”, The MITRE Corporation, Reston, VA, USA.
• Rasim M. Alguliev, Ramiz M. Aliguliyev, and Chingiz A. Mehdiyev, “An Optimization Approach to Automatic Generic Document Summarization”, Institute of Information Technology of Azerbaijan National Academy of Sciences, Baku, Azerbaijan.
• Jackie Chi Kit Cheung, Giuseppe Carenini, and Raymond T. N, “Optimization-based Content Selection for Opinion Summarization”, Department of Computer Science, University of Toronto, Canada.
9
SENTIMENT ANALYSIS AND
OPINION MINING - I
Unit Structure
9.0 Objective
9.1 Introduction
9.1.1 Definition of Sentiment Analysis
9.2 Sentiment analysis types
9.2.1 Sentiment Analysis - Affective lexicons
9.2.2 Learning affective lexicons
9.3.1 Computing with affective lexicons
9.3.2 Aspect based sentiment analysis
9.0 OBJECTIVE
9.1 INTRODUCTION
Sentiment analysis, also called opinion mining, is the field of study that
analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes,
and emotions towards entities such as products, services, organizations,
individuals, issues, events, topics, and their attributes. It represents a large
problem space. There are also many names and slightly different tasks,
e.g., sentiment analysis, opinion mining, opinion extraction, sentiment
mining, subjectivity analysis, affect analysis, emotion analysis, review
mining, etc. However, they are now all under the umbrella of sentiment
analysis or opinion mining. In industry, the term sentiment analysis is more
commonly used, while in academia both sentiment analysis and opinion
mining are frequently employed. They basically represent the same field of
study; here we use the terms sentiment analysis and opinion mining
interchangeably.
To simplify the presentation, throughout this book we will use the term
opinion to denote opinion, sentiment, evaluation, appraisal, attitude, and
emotion. However, these concepts are not equivalent. We will distinguish
them when needed. The meaning of opinion itself is still very broad.
Sentiment analysis and opinion mining mainly focuses on opinions which
express or imply positive or negative sentiments. Although linguistics and
natural language processing (NLP) have a long history, little research had
been done about people’s opinions and sentiments before the year 2000.
Since then, the field has become a very active research area. There are
several reasons for this. First, it has a wide range of applications, almost
in every domain.
The industry surrounding sentiment analysis has also flourished due to the
proliferation of commercial applications. This provides a strong
motivation for research. Second, it offers many challenging research
problems, which had never been studied before. Third, for the first time in
human history, we now have a huge volume of opinionated data in the
social media on the Web. Without this
data, a lot of research would not have been possible. Not surprisingly, the
inception and the rapid growth of sentiment analysis coincide with those
of the social media. In fact, sentiment analysis is now right at the center of
the social media research. Hence, research in sentiment analysis not only
has an important impact on NLP, but may also have a profound impact on
management sciences, political science, economics, and social sciences as
they are all affected by people’s opinions. Although the sentiment analysis
research mainly started from early 2000, there were some earlier work on
interpretation of metaphors, sentiment adjectives, subjectivity, view
points, and affects (Hatzivassiloglou and McKeown, 1997; Hearst, 1992;
Wiebe, 1990; Wiebe, 1994; Wiebe, Bruce and O'Hara, 1999).
9.1.1 Definition of Sentiment Analysis
Sentiment analysis is contextual mining of text which identifies and
extracts subjective information in source material, and helping a
business to understand the social sentiment of their brand, product or
service while monitoring online conversations.
Sentiment analysis is the most common text classification tool: it analyses
an incoming message and tells whether the underlying sentiment is positive,
negative or neutral.
In linguistics, semantic analysis is the process of relating syntactic
structures, from the levels of phrases, clauses, sentences and paragraphs to
the level of the writing as a whole, to their language-
independent meanings. It also involves removing features specific to
particular linguistic and cultural contexts, to the extent that such a project
is possible. The elements of idiom and figurative speech, being cultural,
are often also converted into relatively invariant meanings in semantic
analysis. Semantics, although related to pragmatics, is distinct in that the
former deals with word or sentence choice in any given context, while
pragmatics considers the unique or particular meaning derived from
context or tone. To reiterate in different terms, semantics is about
universally coded meaning, and pragmatics, the meaning encoded in
words that is then interpreted by an audience.[1]
Semantic analysis can begin with the relationship between individual
words. This requires an understanding of lexical hierarchy,
including hyponymy and hypernymy, meronomy, polysemy, synonyms,
antonyms, and homonyms.[2] It also relates to concepts like connotation
(semiotics) and collocation, which is the particular combination of words
that can be or frequently are surrounding a single word. This can include
idioms, metaphor, and simile, like, "white as a ghost."
Some important remarks about this definition are in order:
1. In this definition, we purposely use subscripts to emphasize that the
five pieces of information in the quintuple must correspond to one
another. That is, the opinion must be given by the opinion holder about
the aspect of the entity at the given time. Any mismatch is an error.
2. The five components are essential. Missing any of them is
problematic in general. For example, if we do not have the time
component, we will not be able to analyze opinions on an entity
according to time, which is often very important in practice because
an opinion from two years ago and an opinion from yesterday are not
the same. Missing the opinion holder is also problematic. For example, in the
sentence “the mayor is loved by the people in the city, but he has
been criticized by the state government,” the two opinion holders,
“people in the city” and “state government,” are clearly important
for applications.
3. The definition covers most but not all possible facets of the semantic
meaning of an opinion, which can be arbitrarily complex. For
example, it does not cover the situation in “The view finder and the
lens are too close,” which expresses an opinion on the distance of
two parts. It also does not cover the context of the opinion, e.g.,
“This car is too small for a tall person,” which does not say the car is
too small for everyone. “Tall person” is the context here. Note also
that in the original definition of entity, it is a hierarchy of parts, sub-parts,
and so on. Every part can have its set of attributes. Due to the
simplification, the quintuple representation can result in information
loss. For example, “ink” is a part/component of a printer. In a printer
review, one wrote “The ink of this printer is expensive.” This does
not say that the printer is expensive (which indicates the aspect
price). If one does not care about any attribute of the ink, this
sentence just gives a negative opinion to the ink, which is an aspect
of the printer entity. However, if one also wants to study opinions
about different aspects of the ink, e.g., price and quality, the ink
needs to be treated as a separate entity. Then, the quintuple
representation still applies, but the part-of relationship needs to be
saved. Of course, conceptually we can also expand the
representation of opinion target using a nested relation. Despite the
limitations, the definition does cover the essential information of an
opinion which is sufficient for most applications.
4. This definition provides a framework to transform unstructured text
to structured data. The quintuple above is basically a database
schema, based on which the extracted opinions can be put into a
database table. Then a rich set of qualitative, quantitative, and trend
analyses of opinions can be performed using the whole suite of
database management systems (DBMS) and OLAP tools.
5. The opinion defined here is just one type of opinion, called regular
opinion. Another type is comparative opinion (Jindal and Liu,
2006b; Liu, 2006 and 2011), which needs a different definition.
Section 2.3 will discuss different types of opinions. Chapter 8
defines and analyzes comparative opinions. For the rest of this
section, we only focus on regular opinions. For simplicity, we just
called them opinions.
One of the things that defines us as human is our ability to showcase a wide
range of emotions. We can be sad, excited, worried, and angry within
seconds. We can learn and analyze situations like no other animal can.
These are just some of the things that make us unique and special.
This is evident during shopping, too. Almost all purchases are emotion-
driven. It could be out of fear, jealousy, or happiness. With emotions
playing a critical role in customer behavior, it has become essential for
brands to analyze the sentiments of their consumers.
Here’s where the concept of sentiment analysis comes into play. In this
post, we’ll discuss the idea and different types of sentiment analysis.
Sentiment Analysis Research
As discussed above, pervasive real-life applications are only part of the
reason why sentiment analysis is a popular research problem. It is also
highly challenging as a NLP research topic, and covers many novel
subproblems as we will see later. Additionally, there was little research
before the year 2000 in either NLP or in linguistics. Part of the reason is
that before then there was little opinion text available in digital forms.
Since the year 2000, the field has grown rapidly to become one of the
most active research areas in NLP. It is also widely researched in data
mining, Web mining, and information retrieval. In fact, it has spread from
computer science to management sciences (Archak, Ghose and Ipeirotis,
2007; Chen and Xie, 2008; Das and Chen, 2007; Dellarocas, Zhang and
Awad, 2007; Ghose, Ipeirotis and Sundararajan, 2007; Hu, Pavlou and
Zhang, 2006; Park, Lee and Han, 2007).
Different Levels of Analysis
Introduction to the main research problems based on the level of
granularities of the existing research. In general, sentiment analysis has
been investigated mainly at three levels:
Document level: The task at this level is to classify whether a whole
opinion document expresses a positive or negative sentiment. For
example, given a product review, the system determines whether the
review expresses an overall positive or negative opinion about the product.
This task is commonly known as document-level sentiment classification.
This level of analysis assumes
that each document expresses opinions on a single entity (e.g., a single
product). Thus, it is not applicable to documents which evaluate or
compare multiple entities.
Sentence level: The task at this level goes to the sentences and determines
whether each sentence expressed a positive, negative, or neutral opinion.
Neutral usually means no opinion. This level of analysis is closely related
to subjectivity classification (Wiebe, Bruce and O'Hara, 1999), which
distinguishes sentences (called objective sentences) that express factual
information from sentences (called subjective sentences) that express
subjective views and opinions. However, we should note that subjectivity
is not equivalent to sentiment, as many objective sentences can imply
opinions, e.g., “We bought the car last month and the windshield wiper
has fallen off.” Researchers have also analyzed clauses (Wilson, Wiebe
and Hwa, 2004), but the clause level is still not enough, e.g., “Apple is
doing very well in this lousy economy.” Entity and Aspect level: Both the
document level and the sentence level analyses do not discover what
exactly people liked and did not like.
Aspect level performs finer-grained analysis. Aspect level was earlier
called feature level (feature-based opinion mining and summarization) .
Instead of looking at language constructs (documents, paragraphs,
sentences, clauses or phrases), aspect level directly looks at the opinion
itself. It is based on the idea that an opinion consists of a sentiment
(positive or negative) and a target (of opinion). An opinion without its
target being identified is of limited use. Realizing the importance of
opinion targets also helps us understand the sentiment analysis problem
better. For example, although the sentence “although the service is not that
great, I still love this restaurant” clearly has a positive tone, we cannot say
that this sentence is entirely positive. In fact, the sentence is positive about
the restaurant (emphasized), but negative about its service (not
emphasized). In many applications, opinion targets are described by
entities and/or their different aspects.
Thus, the goal of this level of analysis is to discover sentiments on entities
and/or their aspects. For example, the sentence “The iPhone’s call quality
is good, but its battery life is short” evaluates two aspects, call quality and
battery life, of iPhone (entity). The sentiment on iPhone’s call quality is
positive, but the sentiment on its battery life is negative. The call quality
and battery life of iPhone are the opinion targets. Based on this level of
analysis, a structured summary of opinions about entities and their aspects
can be produced, which turns unstructured text to structured data and can
be used for all kinds of qualitative and quantitative analyses. Both the
document-level and sentence-level classifications are already highly
challenging. The aspect level is even more difficult. It consists of several
sub-problems, which we will
discuss.
To make things even more interesting and challenging, there are two types
of opinions, i.e., regular opinions and comparative opinions .
A regular opinion expresses a sentiment only on a particular entity or an
aspect of the entity, e.g., “Coke tastes very good,” which expresses a
positive sentiment on the aspect taste of Coke.
A comparative opinion compares multiple entities based on some of their
shared aspects, e.g., “Coke tastes better than Pepsi,” which compares Coke
and Pepsi based on their tastes (an aspect) and expresses a preference for
Coke.
What Is Sentiment Analysis?
First things first, what is sentiment analysis? Sentiment analysis is a type
of market analysis that includes the use of text analysis, biometrics,
natural language processing (NLP), and computational linguistics to
recognize the state of the said information.
In simple terms, it’s the process of determining whether a piece of content
– email, social media post, or article – is negative, positive, or neutral.
Sentiment analysis enables you to ascertain public opinion and understand
consumer experiences.
But why should you even bother about sentiment analysis? For starters,
it’s extremely helpful in social media monitoring. It helps you gauge
public opinion on certain topics on an enormous scale.
Besides, it can play a pivotal role in market research and customer service.
With sentiment analysis, you can see what people think about your
products or your competitors’ products. This helps you understand
customer attitudes and preferences, enabling you to make informed
decisions.
Types of Sentiment Analysis
People have a wide range of emotions – sad or happy, interested or
uninterested, and positive or negative. Different sentiment analysis models
are available to capture this variety of emotions.
Let’s look at the most important types of sentiment analysis.
1. Fine-Grained
This sentiment analysis model helps you derive polarity precision. You
can conduct a sentiment analysis across the following polarity categories:
very positive, positive, neutral, negative, or very negative. Fine-grained
sentiment analysis is helpful for the study of reviews and ratings.
For a rating scale from 1 to 5, you can consider 1 as very negative and 5
as very positive. For a scale from 1 to 10, you can consider 1-2 as very
negative and 9-10 as very positive.
2. Aspect-Based
While fine-grained analysis helps you determine the overall polarity of
your customer reviews, aspect-based analysis delves deeper. It helps you
determine the particular aspects people are talking about.
Let’s say; you’re a mobile phone manufacturer, and you get a customer
review stating, “the camera struggles in artificial lighting conditions.”
With aspect-based analysis, you can determine that the reviewer has
commented on something “negative” about the “camera.”
3. Emotion Detection
As the name suggests, emotion detection helps you detect emotions. This
can include anger, sadness, happiness, frustration, fear, worry, panic, etc.
Emotion detection systems typically use lexicons – a collection of words
that convey certain emotions. Some advanced classifiers also utilize robust
machine learning (ML) algorithms.
It’s recommended to use ML over lexicons because people express
emotions in a myriad of ways. Take this line, for example: “This product
is about to kill me.” This line may express feelings of fear and panic.
A similar line – this product is killing it for me – has an entirely different
and positive meaning. But the word “kill” might be associated with fear or
panic in the lexicon. This may lead to inaccurate emotion detection.
4. Intent Analysis
Accurately determining consumer intent can save companies time, money,
and effort. So many times, businesses end up chasing consumers that don’t
plan to buy anytime soon. Accurate intent analysis can resolve this hurdle.
The intent analysis helps you identify the intent of the consumer – whether
the customer intends to purchase or is just browsing around.
If the customer is willing to purchase, you can track them and target them
with advertisements. If a consumer isn’t ready to buy, you can save your
time and resources by not advertising to them.
9.2.1 Sentiment Analysis - Affective lexicons
Definition:
● It is a Natural Language Processing task; Sentiment Analysis refers
to finding patterns in data and inferring the emotion of the given
piece of information which could be classified into one of these
categories:
• Negative
• Neutral
• Positive
Utility:
• Rule based
Rule based sentiment analysis refers to the study conducted by the
language experts. The outcome of this study is a set of rules (also known
as lexicon or sentiment lexicon) according to which the words classified
are either positive or negative along with their corresponding intensity
measure.
Generally, the following steps are needed to be performed while applying
the rule-based approach:
• Stop words removal: remove those words which do not carry any
significant meaning and should not be used for the analysis activity.
Examples of stop words are: a, an, the, they, while, etc.
• Punctuation removal (in some cases)
• Running the preprocessed text against the sentiment lexicon which
should provide the number/measurement corresponding to the
inferred emotion
• Example:
• Consider this problem instance: “Sam is a great guy.”
• Remove stop words and punctuation
• Running the lexicon on the preprocessed data returns a positive
sentiment score/measurement because of the presence of the positive
word “great” in the input data.
• This is a very simple example. Real-world data is much more
complex and nuanced; e.g., the input text may contain sarcasm
(where seemingly positive words carry negative meaning or
vice versa), shorthand, abbreviations, different spellings (e.g. flavor
vs flavour), misspelled words, punctuation (especially question
marks), slang and, of course, emojis.
• One way to tackle such complex data is to make use of sophisticated
lexicons that take into consideration the intensity of a word (e.g., if a
word is positive, then how positive it is; basically there is a difference
between good, great, and amazing, and that is represented by the
intensity assigned to the word), the subjectivity or objectivity of the
word, and also the context. Several such lexicons are available; the
following are a couple of popular ones:
• VADER (Valence Aware Dictionary and Sentiment
Reasoner): Widely used for analyzing sentiment in social
media text because it has been specifically attuned to
sentiments expressed in social media. It now comes out of the
box in the Natural Language Toolkit (NLTK). VADER is
sensitive to both polarity and intensity. Here is how to read its
lexicon ratings:
• -4: extremely negative
• 4: extremely positive
• 0: Neutral or N/A
For example, VADER can be run through NLTK's built-in interface.
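A minimal sketch using NLTK's VADER analyzer (the example sentences are assumed; note that the -4 to 4 scale above applies to individual lexicon entries, while the reported 'compound' score is normalized to the range -1 to 1):

```python
from nltk.sentiment import SentimentIntensityAnalyzer
# nltk.download("vader_lexicon")   # one-time download

sia = SentimentIntensityAnalyzer()
for text in ["VADER is smart, handsome, and funny!",
             "The movie was okay.",
             "This product is terrible :("]:
    # polarity_scores returns neg/neu/pos proportions and a 'compound' score
    print(text, "->", sia.polarity_scores(text))
```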
TextBlob: A very useful NLP library that comes prepackaged with its own
sentiment analysis functionality. It is also based on NLTK. The sentiment
property of the library returns both polarity and subjectivity.
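A minimal TextBlob sketch (the example sentence is assumed):

```python
from textblob import TextBlob

blob = TextBlob("The food was great, but the service was painfully slow.")
print(blob.sentiment)               # Sentiment(polarity=..., subjectivity=...)
print(blob.sentiment.polarity)      # polarity in [-1.0, 1.0]
print(blob.sentiment.subjectivity)  # subjectivity in [0.0, 1.0]
```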
Sarcastic sentences with or without sentiment words are hard to deal with,
e.g., “What a great car! It stopped working in two days.” Sarcasm is not
so common in consumer reviews about products and services, but is very
common in political discussions, which makes political opinions hard to
deal with. Many sentences without sentiment words can also imply
opinions. Many of these sentences are actually objective sentences that are
used to express some factual information. Again, there are many types of
such sentences. Here we just give two examples. The sentence “This
washer uses a lot of water” implies a negative sentiment about the washer
since it uses a lot of resource (water). The sentence “After sleeping on the
mattress for two days, a valley has formed in the middle” expresses a
negative opinion about the mattress. This sentence is objective as it states
a fact. All these sentences have no sentiment words. These issues all
present major challenges. In fact, these are just some of the difficult
problems.
Natural Language Processing Issues
Finally, we must not forget that sentiment analysis is an NLP problem. It
touches every aspect of NLP, e.g.,
coreference resolution, negation handling, and word sense disambiguation,
which add more difficulties since these are not solved problems in NLP.
However, it is also useful to realize that sentiment analysis is a highly
restricted NLP problem because the system does not need to fully
understand the semantics of each sentence or document but only needs to
understand some aspects of it, i.e., positive or negative sentiments and
their target entities or topics. In this sense, sentiment analysis offers a
great platform for NLP researchers to make tangible progress on all
fronts of NLP, with the potential of making a huge practical impact.
Researchers now also have a much better understanding of the whole
spectrum of the problem, its structure, and core issues. Numerous new
(formal) models and methods have been proposed. The research has not
only deepened but also broadened significantly. Earlier research in the
field mainly focused on classifying the sentiment or subjectivity expressed
in documents or sentences, which is insufficient for most real-life
applications. Practical applications often demand more in-depth and fine-
grained analysis. Due to the maturity of the field, the book is also written
in a structured form in the sense that the problem is now better defined and
different research directions are unified around the definition.
Opinion Spam Detection
A key feature of social media is that it enables anyone from anywhere in
the world to freely express his/her views and opinions without disclosing
his/her true identity and without the fear of undesirable consequences.
These opinions are thus highly valuable. However, this anonymity also
comes with a price. It allows people with hidden agendas or malicious
intentions to easily game the system to give people the impression that
they are independent members of the public and post fake opinions to
promote or to discredit target products, services, organizations, or
individuals without disclosing their true intentions, or the person or
organization that they are secretly working for. Such individuals are called
opinion spammers and their activities are called opinion spamming.
Opinion spamming has become a major issue. Apart from individuals who
give fake opinions in reviews and forum discussions, there are also
commercial companies that are in the business of writing fake reviews and
bogus blogs for their clients. Several high-profile cases of fake reviews
have been reported in the news. It is important to detect such spamming
activities to ensure that the opinions on the Web are a trusted source of
valuable information. Unlike extraction of positive and negative opinions,
opinion spam detection is not just an NLP problem, as it involves the
analysis of people’s posting behaviors. It is thus also a data mining
problem. Chapter 10 will discuss the current state-of-the-art detection
techniques.
10
SENTIMENT ANALYSIS AND
OPINION MINING - II
Unit Structure
10.0 Computing with affective lexicons
10.1 Aspect based sentiment analysis
Major sub-fields of AI now include: Machine Learning, Neural Networks,
Evolutionary Computation, Vision, Robotics, Expert Systems, Speech
Processing, Natural Language Processing, and Planning.
Affective computing is becoming more and more important, as it extends the
possibilities of computing technologies by incorporating emotions. In fact,
the detection of users' emotions has become one of the most important
aspects of Affective Computing.
Social media are platforms created using Web 2.0 technologies. Social
computing is the branch of computer science that focuses on analyzing the
relationship between human behavior and computer systems.
How do you identify primary emotions?
Thomas says that primary emotions are simply our initial reactions to
external events or stimuli. Secondary emotions are the reactions we then
have to our reactions.
Although domain adaptation (or transfer learning) has been studied by
researchers, the technology is still far from mature, and the current
methods are mainly used for document-level sentiment classification, as
documents are long and contain more features for classification than
individual sentences or clauses. Thus, supervised learning has difficulty
scaling up to a large number of application domains. The lexicon-based
approach can avoid some of these issues, and has been shown to perform
quite well in a large number of domains. Such methods are typically
unsupervised. They use a sentiment lexicon (which contains a list of
sentiment words, phrases, and idioms), composite expressions, rules of
opinions, and (possibly) the sentence parse tree to determine the sentiment
orientation on each aspect in a sentence. They also consider sentiment
shifters, but-clauses and many other constructs which may affect sentiments.
Of course, the lexicon-based approach also has its own shortcomings,
which we will discuss later. This method has also been extended to handle
comparative sentences. Below, we introduce one simple lexicon-based
method to give a flavor of this approach. The method has four steps. Here,
we assume that entities and aspects are known.
1. Mark sentiment words and phrases: For each sentence that contains one
or more aspects, this step marks all sentiment words and phrases in the
sentence.
Each positive word is assigned the sentiment score of +1 and each
negative word is assigned the sentiment score of −1. For example, we have
the sentence, “The voice quality of this phone is not good, but the battery
life is long.” After this step, the sentence becomes “The voice quality of
this phone is not good [+1], but the battery life is long” because “good” is
a positive sentiment word (the aspects in the sentence are italicized). Note
that “long” here is not a sentiment word as it does not indicate a positive
or negative sentiment by itself in general, but we can infer its sentiment in
this context shortly. In fact, “long” can be regarded as a context-dependent
sentiment word, which we will discuss in Chapter 10. In the next section,
we will see some other expressions that can give or imply positive or
negative sentiments.
2. Apply sentiment shifters: Sentiment shifters (also called valence
shifters in (Polanyi and Zaenen, 2004)) are words and phrases that
can change sentiment orientations. There are several types of such
shifters. Negation words like not, never, none, nobody, nowhere,
neither, and cannot are the most common type. This step turns our
sentence into “The voice quality of this phone is not good[-1], but
the battery life is long” due to the negation word “not.” We will
discuss several other types of sentiment shifters in the next section.
Note that not every appearance of a sentiment shifter changes the
sentiment orientation, e.g., “not only … but also.” Such cases need
to be handled with care; that is, such special uses and patterns need to
be identified beforehand.
3. Handle but-clauses: Words or phrases that indicate a contrary meaning
need special handling because they often change sentiment orientations
too. The most commonly used contrary word in English is “but”. A sentence
containing a contrary word or phrase is handled by applying the
following rule: the sentiment orientations before the contrary word
(e.g., but) and after the contrary word are opposite to each other if
the opinion on one side cannot be determined. The if-condition in
the rule is used because contrary words and phrases do not always
indicate an opinion change, e.g., “Car-x is great, but Car-y is better.”
After this step, the above sentence is turned into “The voice quality
of this phone is not good[-1], but the battery life is long[+1]” due to
“but” ([+1] is added at the end of the but-clause). Notice here, we
can infer that “long” is positive for “battery life”. Apart from but,
phrases such as “with the exception of,” “except that,” and “except
for” also have the meaning of contrary and are handled in the same
way. As in the case of negation, not every but means contrary, e.g.,
“not only … but also.” Such non-but phrases containing “but” also
need to be identified beforehand.
4. Aggregate opinions: This step applies an opinion aggregation
function to the resulting sentiment scores to determine the final
orientation of the sentiment on each aspect in the sentence. Let the
sentence be s, which contains a set of aspects {a1, …, am} and a set
of sentiment words or phrases {sw1, …, swn} with their sentiment
scores obtained from steps 1–3. The sentiment orientation for each
aspect ai in s is determined by the following aggregation function:

score(ai, s) = Σ_{swj ∈ s} swj.so / dist(swj, ai)     (5)

where swj is a sentiment word/phrase in s, dist(swj, ai) is the distance
between aspect ai and sentiment word swj in s, and swj.so is the sentiment
score of swj. The multiplicative inverse is used to give lower weights to
sentiment words that are far away from aspect ai. If the final score is
positive, then the opinion on aspect ai in s is positive. If the final
score is negative, then the sentiment on the aspect is negative. It is
neutral otherwise. This simple algorithm performs quite well in
many cases. It is able to handle the sentence “Apple is doing very
well in this bad economy” with no problem. Note that there are
many other opinion aggregation methods. For example, (Hu and Liu,
2004) simply summed up the sentiment scores of all sentiment
words in a sentence or sentence segment. Kim and Hovy (2004)
used multiplication of sentiment scores of words. Similar methods
were also employed by other researchers (Wan, 2008; Zhu et al.,
2009). To make this method even more effective, we can determine
the scope of each individual sentiment word instead of using words
distance as above. In this case, parsing is needed to find the dependency as in the
supervised method discussed above. We can also automatically
discover the sentiment orientation of context dependent words such
as “long” above. In fact, the above simple approach can be enhanced
in many directions. For example, Blair-Goldensohn et al. (2008)
integrated the lexicon-based method with supervised learning.
Kessler and Nicolov (2009) experimented with four different
strategies of determining the sentiment on each aspect/target
(including a ranking method). They also showed several interesting
statistics on why it is so hard to link sentiment words to their targets
based on a large amount of manually annotated data. Along with
aspect sentiment classification research, researchers also studied the
aspect sentiment rating prediction problem which has mostly been
done together with aspect extraction in the context of topic
modeling. As indicated above, apart from sentiment words and phrases,
there are many other types of expressions that can convey or imply
sentiments. Most of them are also harder to handle; such expressions are
referred to as the basic rules of opinions.
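To make the four steps above concrete, the following is a minimal sketch in Python. The tiny lexicon, the negation list, the two-token negation window, the but-rule heuristic and the aspect head-word indices are simplifying assumptions for illustration, not the exact procedure from the original method:

LEXICON = {"good": +1, "great": +1, "amazing": +1, "bad": -1, "poor": -1}
NEGATIONS = {"not", "never", "none", "nobody", "nowhere", "neither", "cannot"}

def score_aspects(tokens, aspect_positions):
    """tokens: lower-cased words; aspect_positions: aspect name -> head-word index.
    Returns each aspect's aggregated sentiment score (positive / negative / 0)."""
    # Step 1: mark sentiment words and phrases with +1 / -1.
    scores = {i: LEXICON[t] for i, t in enumerate(tokens) if t in LEXICON}

    # Step 2: apply sentiment shifters -- flip the score if a negation word occurs
    # in the two preceding tokens, except for the "not only ... but also" pattern.
    for i in list(scores):
        window = " ".join(tokens[max(0, i - 2):i])
        if "not only" not in window and any(w in NEGATIONS for w in window.split()):
            scores[i] = -scores[i]

    # Step 3: handle but-clauses -- if the clause on one side of "but" has no marked
    # sentiment word, give its last token the opposite orientation of the other side.
    if "but" in tokens:
        cut = tokens.index("but")
        left = [s for i, s in scores.items() if i < cut]
        right = [s for i, s in scores.items() if i > cut]
        if not right and left:
            scores[len(tokens) - 1] = -1 if sum(left) > 0 else +1
        elif not left and right:
            scores[cut - 1] = -1 if sum(right) > 0 else +1

    # Step 4: distance-weighted aggregation,
    # score(ai, s) = sum over sentiment words swj of swj.so / dist(swj, ai).
    return {a: sum(s / abs(i - p) for i, s in scores.items() if i != p)
            for a, p in aspect_positions.items()}

tokens = "the voice quality of this phone is not good but the battery life is long".split()
aspects = {"voice quality": 2, "battery life": 12}   # indices of "quality" and "life"
print(score_aspects(tokens, aspects))
# -> roughly {'voice quality': -0.083, 'battery life': 0.25}: "good" is negated and
#    the but-rule infers that "long" carries a positive opinion about battery life.

On the example sentence from the text, the sketch reproduces the intended outcome: a negative score for voice quality and a positive score for battery life.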
Aspect-based sentiment analysis goes one step further than sentiment
analysis by automatically assigning sentiments to specific features or
topics. It involves breaking down text data into smaller fragments,
allowing you to obtain more granular and accurate insights from your data.
Aspect-Based Opinion Mining (ABOM) involves extracting aspects or
features of an entity and figuring out opinions about those aspects. It is
a method of text classification that has evolved from sentiment analysis
and named entity recognition (NER). ABOM is thus a combination of
aspect extraction and opinion mining.
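As a hand-crafted illustration (the review text and the aspect labels below are invented for this example, not produced by any particular ABOM system), the kind of output ABOM aims for looks like:

# Illustrative only: the target output of aspect-based opinion mining for one review.
review = "The food was delicious but the service was painfully slow."
aspect_opinions = {
    "food": "positive",      # inferred from "delicious"
    "service": "negative",   # inferred from "painfully slow"
}
print(aspect_opinions)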
Which type of analytics is used in sentiment analysis?
Sentiment analysis is used to determine whether a given text contains
negative, positive, or neutral emotions. It's a form of text analytics that
uses natural language processing (NLP) and machine learning.
Sentiment analysis is also known as “opinion mining” or “emotion
artificial intelligence.”
Issues and Challenges of Aspect-based Sentiment Analysis: A
Comprehensive Survey (Ambreen Nazir, Yuan Rao, Lianwei Wu, Ling Sun)
Abstract: The domain of Aspect-based Sentiment Analysis, in which
aspects are extracted, their sentiments are analyzed and sentiments are
evolved over time, is getting much attention with increasing feedback of
public and customers on social media. The immense advancements in the
field urged researchers to devise new techniques and approaches, each
sermonizing a different research analysis/question, that cope with
upcoming issues and complex scenarios of Aspect-based Sentiment
Analysis. Therefore, this survey emphasized on the issues and challenges
that are related to extraction of different aspects and their relevant
sentiments, relational mapping between aspects, interactions,
dependencies and contextual-semantic relationships between different data
objects for improved sentiment accuracy, and prediction of sentiment
evolution dynamicity. A rigorous overview of the recent progress is
summarized based on whether they contributed towards highlighting and
mitigating the issue of Aspect Extraction, Aspect Sentiment Analysis or
Sentiment Evolution.
The reported performance for each scrutinized study of Aspect Extraction
and Aspect Sentiment Analysis is also given, showing the quantitative
evaluation of the proposed approach. Future research directions are
proposed and discussed, by critically analysing the presented recent
solutions, that will be helpful for researchers and beneficial for improving
sentiment classification at aspect-level.
Index Terms: Aspect, Computational linguistics, Deep learning,
Sentiment analysis, Sentiment evolution, Social media
INTRODUCTION
The field of Natural Language Processing (NLP) dealing with Sentiment
Analysis (SA), also named opinion mining, is an active research area that
aims to automatically discover the sentiments and emotions expressed
within text.
The object of SA is usually a product or service that is of keen interest
among people, such that they care to express a sentiment towards it.
Traditionally, SA has been considered as opinion polarity: whether someone
has expressed a positive, negative or neutral sentiment about an event.
Over the last decade, researchers have been putting effort into capturing,
quantifying and measuring dynamic public sentiment through different
methods, tools and techniques, making SA one of the most rapidly growing
research areas.
SA applications have been widely spread to nearly every domain, like
social events, consumer products and services, healthcare, political
elections and financial services.
Influential groups and business organizations, like Google, Microsoft,
SAP and SAS, have designed their own in-house capabilities that support
them in decision making and assist them in developing better business
applications to track and predict evolving market trends. In the literature,
SA has generally been categorized at three levels:
• Document-level
• Sentence-level
• Aspect-level
These classify whether a whole document, a sentence (subjective or
objective) or an aspect expresses a sentiment, i.e., positive, negative or
neutral. Aspect-based Sentiment Analysis (AbSA) helps to understand the
problem of SA better than the other levels, because it directly focuses on
sentiments rather than language structure. Here, an aspect is related to an
entity, and
the basic concept of an aspect is not just limited to judgement but also
extends towards thoughts, points of view, ways of thinking, perspectives,
an underlying theme or social influence towards an occurrence.
Hence, AbSA provides a great opportunity to analyse public sentiment
over time across the different content present on media.
AbSA can be categorized by three main processing phases, i.e., Aspect
Extraction (AE), Aspect Sentiment Analysis (ASA) and Sentiment
Evolution (SE). The first phase deals with the extraction of aspects, which
can be:
• explicit aspects,
• implicit aspects,
• aspect terms,
• entities and Opinion Target Expressions (OTE).
The second phase classifies sentiment polarity for a predefined aspect,
target or entity. This phase also formulates interactions, dependencies and
contextual-semantic relationships between different data objects, e.g.,
aspect, entity, target, multi-word target and sentiment word, for achieving
improved sentiment classification accuracy. The expressed sentiment can be
classified into ternary or fine-grained sentiment values. The third phase
concerns the dynamics of people's sentiment towards aspects (events) over
a period of time. Social characteristics and self-experience are considered
the leading causes of SE.
Focus of this Survey
The field of AbSA is not a straight road; it has undergone many diverse
changes and opened many new directions to ponder over. Researchers have
been working hard to resolve multi-faceted challenges involving many
issues. They have come up with thorough
solutions of many complicated challenges through different machine
learning techniques, mostly deep-learning techniques, that represent their
critical idea in the field. They have provided pictorial representations and
numerical modeling for handling complex scenarios.
Organization of this Survey
The survey starts with a brief introduction and the paramount significance
of AbSA. The rest of the survey is organized as follows: Section II provides
the definitions of sentiment with respect to aspect, and lists the major
issues and challenges related to AE, ASA and SE. Sections III and IV discuss
the major issues of AE and ASA, and concisely describe the recent solutions
for these issues. Section V discusses the dynamics of SE. Section VI
highlights the future research directions. Section VII concludes this survey.
KEY SOLUTIONS
This section presents AbSA definitions, and outlines the major issues and
sub-issues of AE, ASA and SE. This section also illustrates the three main
processing steps of AbSA, i.e., AE, ASA and SE, through a framework for the
ease and understandability of the reader.
Definitions (AbSA)
In AbSA, sentiment is a subjective consciousness of human beings towards an
aspect (objective existence). People get affected by the ups and downs of
life, at any time and place, which results in a change of their sentiment
towards a specific aspect. This change depicts human behavior flexibility,
decision autonomy and creative thinking. Pang and Lee defined SA as: “A
sentiment is basically an opinion that a person expresses towards an
aspect, entity, person, event, feature, object, or a
certain target.”
Prior surveys related to SA and AbSA (title, critical idea, and the
challenge each addresses):
• Opinion Mining and SA (Pang and Lee): presented applications that
determine the sentiment of a document, and organized approaches related to
opinion-oriented classification problems. Challenge: How to perform SA at
document-level using machine-learning approaches?
• A Survey on Sentiment Detection of Reviews: discussed approaches related
to subjectivity classification, word classification and opinion discovery
in the customer-review domain. Challenge: How to perform SA at
document-level using machine-learning approaches?
• A Survey on the Role of Negation in SA: presented computational methods
for handling negation in SA. Challenge: How to cope with the scope and
challenges of negation modeling?
• Survey on Mining Subjective Data on the Web: differentiated between four
approaches that classify word-sentiment value, i.e., machine-learning,
semantic, statistical and dictionary-based approaches. Challenge: How to
perform subjective SA at document-level?
• SA and Opinion Mining: covered the field of SA at document, sentence and
aspect level; discussed various issues related to AE, sentiment
classification, sentiment lexicons, NLP and opinion-spam detection;
surveyed the practical solutions to date along with future directions.
Challenge: How to cope with review ranking, redundancy issues, viewpoint
quality, genuine aspects, spammer detection, etc.?
• A Survey on Opinion Mining and SA: Tasks, Approaches and Applications:
organized subtasks of machine learning, NLP and SA techniques, such as
subjectivity classification, sentiment classification, lexicon relation,
opinion-word extraction, and various applications of SA; discussed open
issues and future directions in SA. Challenge: How to focus on
sentence-level and document-level SA and their subtasks?
• Like It or Not: A Survey of Twitter SA Methods: discussed the
deep-learning algorithms related to Twitter SA; elaborated tasks specific
to emotion detection, change of sentiment over time, sarcasm detection, and
sentiment classification. Challenge: How to tackle the challenges, tasks
and feature-selection methods limited to Twitter SA?
• Survey on Aspect-Level SA: performed approach-based categorization of
different solutions related to AE, aspect classification and a combination
of both; proposed future research directions for
semantically-rich-concept-centric AbSA. Challenge: How to cope with the
challenges of comparative opinions, conditional sentences, negation
modifiers and presentation?
• Deep Learning for SA: A Survey: presented applications and deep-learning
approaches for SA-related tasks such as sentiment intersubjectivity,
lexicon expansion and stance detection. Challenge: How to achieve advances
in SA using deep-learning approaches?
• The State-of-the-Art in Twitter SA: A Review and Benchmark Evaluation:
focused on challenges and key trends related to classification errors,
Twitter monitoring and event detection to perform Twitter SA effectively.
Challenge: How to reveal the root causes of commonly occurring
classification errors?
• A Survey of SA in Social Media: categorized the latest technologies and
techniques in SA; introduced the latest tools for research approaches
related to subjectivity classification in the customer-review domain.
Challenge: How to focus on different types of data and advanced tools, to
overcome the limitations of social media SA?
• A Survey on Sentiment and Emotion Analysis for Computational Literary
Studies: presented approaches of SA and emotion analysis; discussed the
computational methods for sentiment and emotion classification. Challenge:
How to classify and interpret emotional text through sentiment and emotion
analysis in the digital human community?
• Issues and Challenges of Aspect-based Sentiment Analysis: A Comprehensive
Survey (the survey summarized here): discusses the issues and challenges of
AE, ASA and SE; presents the progress of AbSA by concisely describing the
recent solutions; highlights factors responsible for SE dynamicity;
proposes future research directions by critically analyzing the present
solutions. Challenges: How to improve the mechanism of AE? What measures
should be taken to achieve good classification accuracy at aspect-level?
How to predict SE dynamicity?
Reference pdf:
• https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.pdf
• https://www.pdfdrive.com/sentiment-analysis-and-opinion-mining-e1590362.html
• https://www.researchgate.net/publication/265163299_Sentiment_Analysis_and_Opinion_Mining_A_Survey
• https://link.springer.com/referenceworkentry/10.1007/978-1-4899-7687-1_907
• https://www.morganclaypool.com/doi/abs/10.2200/S00416ED1V01Y201204HLT016