
Natural Language Processing (COSC 6405)

Lecture 07: Applications of NLP

Department of Computer Science,


Addis Ababa University

Yaregal Assabie

2018/19—Sem I
• Information Retrieval
  ♦ The Retrieval Process
  ♦ Classic IR Models
  ♦ IR Performance Evaluation
  ♦ NLP in IR
• Information Extraction
• Machine Translation
• Question-Answering and Dialogue Systems
• Text Summarization

The Retrieval Process

• Information Retrieval (IR) provides a list of potentially relevant documents in response to a user's query.

[Figure: the process of retrieving information (Baeza-Yates & Ribeiro-Neto). The user's need is expressed through the User Interface; Text Operations produce the logical view of both documents and query; Query Operations build the system query, which Searching runs against the inverted-file Index built by the Indexing module (DB Manager Module) over the Text Database; retrieved documents are Ranked and returned to the user, whose feedback can refine the query.]

Department of Computer Science, Addis Ababa University Lecture 07: Applications of NLP 2/59

Classic IR Models

• Classic IR models consider that each document is described by a set of representative keywords called index terms.

• Index terms have the following characteristics:

  ♦ (Document) words whose semantics help in remembering the documents' main themes.
  ♦ Used to index and summarize the documents.
  ♦ Mainly nouns, because nouns have meaning by themselves.

• Depending on how index terms are treated, there are three classic IR models: the Boolean, Vector, and Probabilistic models.


Classic IR Models: Boolean Model

• Based on set theoretic (set theory and Boolean algebra) concepts.

• Documents and queries are represented as a set of index terms.

• Queries are specified as Boolean expressions which have precise semantics.

• Similarity of documents to queries is measured with exact matching.

• Retrieval strategy is based on binary decisions (relevant or non-relevant).

• Considers index terms to be either present or absent (binary weights).

• Index terms are linked by three connectives (AND, OR, NOT).

• Simple but not efficient.


Classic IR Models: Vector Model

• Based on algebraic concepts.

• Documents and queries are represented as vectors in a t-dimensional vector space.

• Recognizes that the use of binary weights is too limiting.

• Assigns non-binary weights to index terms in queries and documents.

♦ These term weights are used to compute the degree of similarity between each document stored in the system and the user query.
♦ Retrieved documents can be sorted in decreasing order of similarity to get a ranked list of documents.

• Advantages:

♦ Its term-weighting scheme improves retrieval performance.


♦ Its partial-matching strategy allows retrieval of documents that approximate the query conditions.
♦ Retrieved documents are ranked by their degree of similarity to the query.
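The similarity computation behind this ranking can be sketched in a few lines. The sketch below uses raw term frequencies as weights (a real system would use tf-idf weighting) and cosine similarity between query and document vectors; the toy documents are invented for illustration:

```python
import math
from collections import Counter

def tf_vector(text):
    """Sparse term-frequency vector (raw counts) for a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(q, d):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * d[t] for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

docs = ["information retrieval ranks documents",
        "machine translation of documents",
        "retrieval of relevant documents"]
query = tf_vector("relevant documents retrieval")

# Sort documents by decreasing similarity to the query (the ranked list).
ranked = sorted(docs, key=lambda d: cosine(query, tf_vector(d)), reverse=True)
```

The document sharing the most query terms comes first; documents with only one overlapping term fall to the bottom of the ranking.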


Classic IR Models: Probabilistic Model

• Based on probabilistic concepts.

• Captures the IR problem with the assumption that, for a given user query, there is a set of documents which contains exactly the relevant documents and no others.
  ♦ This set of documents is the ideal answer set.
  ♦ Given the description of this ideal answer set, there would be no problem in retrieving its documents.
• The querying process can then be thought of as a process of specifying the properties of an ideal answer set.
  ♦ Initially guess the properties (we can start by using index terms).
  ♦ User feedback is then taken to improve the probability that the user will find document d relevant to query q.
  ♦ Measure of document similarity to the query:

    sim(d, q) = P(d relevant to q) / P(d non-relevant to q)

• The main advantage is that documents are ranked in decreasing order of their probability of being relevant.


IR Performance Evaluation

• The performance of IR systems can be evaluated by using two commonly used metrics:
precision and recall.

• Recall is the fraction of the relevant documents which have been retrieved:

  Recall = |relevant ∩ retrieved| / |relevant|

• Precision is the fraction of the retrieved documents which are relevant:

  Precision = |relevant ∩ retrieved| / |retrieved|
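Both metrics can be computed directly from sets of document identifiers; a minimal sketch with invented document ids:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall from sets of document ids."""
    hits = retrieved & relevant                  # relevant ∩ retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Toy collection: 4 relevant documents, 5 retrieved, 2 of them relevant.
p, r = precision_recall({1, 2, 3, 4, 5}, {2, 4, 6, 8})   # p = 2/5, r = 2/4
```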


NLP in IR

• NLP is widely used to improve the performance of IR systems.

• The most commonly used applications are:

  ♦ Lexical analysis: with the objective of treating digits, hyphens, punctuation marks, and the case of letters.
  ♦ Stop word removal: with the objective of filtering out words with very low discrimination values for retrieval purposes.
  ♦ Stemming: with the objective of removing affixes.
  ♦ Automatic indexing: with the objective of determining representative words or groups of words (usually noun groups).
  ♦ Document clustering: with the objective of building the relationships between documents.
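The first three operations can be sketched as a toy preprocessing pipeline. The stop word list and the suffix-stripping rules below are illustrative placeholders (a real system would use a full stop word list and a proper stemmer such as Porter's):

```python
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}   # toy list

def preprocess(text):
    """Lexical analysis, stop word removal, and naive stemming."""
    # Lexical analysis: lowercase, strip punctuation, drop bare digits.
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    tokens = [t for t in tokens if t and not t.isdigit()]
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Naive suffix stripping (placeholder for a real stemmer such as Porter's).
    stemmed = []
    for t in tokens:
        for suffix in ("ing", "tion", "es", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

preprocess("The indexing of 250 documents is automatic.")
# yields the index terms: index, document, automatic
```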


NLP in IR

♦ Construction of lexical relationships: with the objective of building term-categorization structures such as a thesaurus, which allows the expansion of the original query with related terms.

  - Improves recall and precision.
  - Synonymy: the absence of synonymy relationships yields poor recall, since documents using synonymous terms are missed.
  - Polysemy/homonymy: search terms that have multiple meanings yield poor precision, leading to the retrieval of non-relevant documents.

• Information Extraction
  ♦ Components of Information Extraction
  ♦ Named Entity Recognition
  ♦ Relation Detection and Classification
  ♦ Temporal and Event Processing
  ♦ Template Filling

Components of Information Extraction

• Information Extraction (IE) focuses on the recognition, tagging, and extraction of certain key elements of information (e.g. persons, companies, locations, organizations, etc.) from large collections of text into a structured representation.

• Example:

Text: Firm XYZ is a full service advertising agency specializing in direct and interactive marketing. Located in Bole, Addis Ababa, Firm XYZ is looking for an Assistant Account Manager to help manage and coordinate interactive marketing initiatives. Experience in online marketing and/or the advertising field is a plus. Depending on the experiences of the applicants, the company pays an attractive salary of Birr 3,000 - Birr 5,000 per month.

Extracted Information:

INDUSTRY: Advertising
POSITION: Assistant Account Manager
LOCATION: Bole, Addis Ababa
COMPANY: Firm XYZ
SALARY: Birr 3,000 - Birr 5,000 per month


Components of Information Extraction

• IE is applied to a narrowly restricted domain.

• It has the following subtasks:

  ♦ Named Entity Recognition: recognition of entity names.
  ♦ Relation Detection and Classification: identification of relations between entities.
  ♦ Coreference and Anaphora Resolution: resolving links to previously named entities.
  ♦ Temporal and Event Processing: recognizing temporal expressions and analyzing events.
  ♦ Template Filling: filling the extracted information into template slots.


Named Entity Recognition

• Named Entity Recognition is the process of recognizing entity names such as:

  ♦ People: Abebe, Kebede, አበበ, ከበደ, etc.
  ♦ Organizations: Ministry of Education, ABC Company, ትምህርት ሚኒስቴር, ሀለመ ኩባንያ, etc.
  ♦ Places: Addis Ababa, Megenagna, አዲስ አበባ, መገናኛ, etc.
  ♦ Time expressions: Tuesday, February 14, ማክሰኞ, የካቲት 6, etc.
  ♦ Quantities: three quintals of teff, 3000 Birr, ሶስት ኩንታል ጤፍ, 3ሺ ብር, etc.


Relation Detection and Classification

• Relation Detection and Classification involves identification of relations between entities.

• Examples:

♦ PERSON works for ORGANIZATION

♦ ORGANIZATION located in PLACE

♦ PERSON lives in PLACE

♦ SALARY amounts to QUANTITY

♦ PERSON is paid SALARY

♦ PERSON works for ORGANIZATION since DATE


Temporal and Event Processing

• Temporal and Event Processing recognizes and normalizes temporal expressions and
analyzes events.

• It has three major components:

  ♦ Temporal Expression Recognition

    Examples:
    He was born on October 2, 1938.
    He was born in the middle of the Second Italo-Ethiopian War.
    He was born two years after the Second Italo-Ethiopian War broke out.
    The pump circulates the water every 2 hours.

  ♦ Temporal Normalization

  ♦ Event Detection and Analysis
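Temporal expression recognition is often pattern-based. A minimal sketch with a few illustrative regular expressions follows; real systems (e.g. TimeML-style taggers) use far richer grammars covering relative and event-anchored expressions:

```python
import re

# Illustrative patterns only, covering the example sentences above.
MONTH = ("January|February|March|April|May|June|July|August|"
         "September|October|November|December")
PATTERNS = [
    re.compile(r"(?:%s) \d{1,2}, \d{4}" % MONTH),                 # October 2, 1938
    re.compile(r"\bevery \d+ (?:hour|day|week|month|year)s?\b"),  # every 2 hours
    re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b"),
]

def find_timex(sentence):
    """Return every substring that matches one of the temporal patterns."""
    return [m.group(0) for pat in PATTERNS for m in pat.finditer(sentence)]

find_timex("He was born on October 2, 1938.")   # finds the date expression
```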


Template Filling

• Template Filling is the final task of an information extraction system, where the extracted information is filled into the slots of a structured template.

• Example:
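Template filling can be sketched as a small slot-filling program over the job-advertisement text from the earlier IE example. The slot patterns below are hypothetical illustrations written for that one text, not general extraction rules:

```python
import re

# Hypothetical slot patterns, written only for the job-ad text of this example.
SLOT_PATTERNS = {
    "POSITION": re.compile(r"looking for an? ([A-Z][\w ]+?) to help"),
    "LOCATION": re.compile(r"Located in ([\w ,]+?), Firm"),
    "SALARY": re.compile(r"salary of (Birr [\d,]+ - Birr [\d,]+ per month)"),
    "COMPANY": re.compile(r"^(\S+ \S+) is"),
}

def fill_template(text):
    """Fill each slot with its first match in the text, or None if absent."""
    return {slot: (m.group(1) if (m := pat.search(text)) else None)
            for slot, pat in SLOT_PATTERNS.items()}

ad = ("Firm XYZ is a full service advertising agency. Located in Bole, "
      "Addis Ababa, Firm XYZ is looking for an Assistant Account Manager "
      "to help manage interactive marketing initiatives. The company pays "
      "an attractive salary of Birr 3,000 - Birr 5,000 per month.")
template = fill_template(ad)
```

Running it fills the POSITION, LOCATION, COMPANY, and SALARY slots with the same values shown in the extracted-information example earlier.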

• Machine Translation
  ♦ Applications and Approaches
  ♦ Rule-Based Machine Translation
  ♦ Statistical Machine Translation
  ♦ Example-Based Machine Translation
  ♦ Hybrid Approaches to Machine Translation

Applications and Approaches

• Machine Translation (MT) refers to a translation of texts from one natural language to
another by means of a computerized system.

• MT is one of the earliest studied applications of NLP.

• Most important applications of MT are:

♦ Web-based translation services

♦ Spoken language translation services

• Although MT for Western languages is currently under intensive research and development, it is far from being a solved problem.

• Commonly used approaches to MT are:

  ♦ Rule-based translation (involving direct, transfer, and interlingual translation)
  ♦ Statistical translation
  ♦ Example-based translation
  ♦ Hybrid approaches


Rule-Based Machine Translation: Direct Translation

• Direct (dictionary-based) translation uses a large bilingual dictionary and translates the
source language text word-by-word.

• The process of direct translation involves morphological analysis, lexical transfer, local
reordering, and morphological generation.

[Figure: direct translation of "Abebe broke the window" into አበበ መስኮቱን ሰበረው. Morphological analysis yields Abebe{3Per+Masc+Sing}, PAST(break), the, window; lexical transfer maps these to አበበ{3Per+Masc+Sing}, PAST(ስብር), ኡ/ው, መስኮት; local reordering produces አበበ መስኮት{Object} ኡ/ው PAST(ስብር); morphological generation composes መስኮት[ኡ][ን] → መስኮቱን and ሰበር[ኧ][ው] → ሰበረው, giving the target text.]


Rule-Based Machine Translation: Direct Translation

Pros and Cons


• Pros:

♦ Fast

♦ Simple

♦ Inexpensive

♦ No translation rules hidden in lexicon

• Cons:

♦ Unreliable

♦ Not powerful

♦ Rule proliferation

♦ Requires too much context

♦ Major restructuring after lexical substitution


Rule-Based Machine Translation: Transfer-Based Translation

• Transfer-based translation uses an intermediate representation that captures the structure of the original text in order to generate the correct translation.

• The process of transfer-based translation involves analysis, structural transfer, and generation.

• The structural transfer can be made at two levels:

  ♦ Superficial (syntactic) transfer
    - Transfers syntactic structures between the source and target languages.
    - Suitable for translation between closely related languages.
  ♦ Deep (semantic) transfer
    - Transfers semantic structures between the source and target languages.
    - Used for translation between distantly related languages.


Rule-Based Machine Translation: Transfer-Based Translation

Superficial (Syntactic) Transfer

[Figure: superficial (syntactic) transfer. The source text "Abebe broke the window" is parsed into an English syntactic structure (S → NP VP, with the NP "Abebe" and the VP "broke the window", where "the window" is Det N); syntactic transfer maps this tree onto the target word order ("Abebe the window broke"); lexical transfer at the word-structure level then yields the target text አበበ መስኮቱን ሰበረው.]


Rule-Based Machine Translation: Transfer-Based Translation

Deep (Semantic) Transfer

[Figure: deep (semantic) transfer. Analysis takes the source text "Abebe broke the window" up through word and syntactic structure to the semantic structure broke(Abebe, window); semantic transfer maps it to the corresponding target-language semantic structure; generation then descends through syntactic and word structure to produce the target text አበበ መስኮቱን ሰበረው.]


Rule-Based Machine Translation: Transfer-Based Translation

Pros and Cons of Transfer-Based Translation

• Pros:

  ♦ Offers the ability to deal with more complex source-language phenomena than the direct approach.
  ♦ Higher-quality translations can be achieved than with direct translation.
  ♦ Relatively fast compared to interlingual translation.

• Cons:

  ♦ O(N²) sets of transfer rules are needed for multilingual machine translation among N languages.
  ♦ Proliferation of language-specific rules in the lexicon and syntax.


Rule-Based Machine Translation: Interlingual Translation

• Interlingual translation uses a language-independent, 'universal', abstract representation of the original text in order to generate the target text.

• The process of interlingual translation involves only analysis and generation.

  EVENT: breaking
  TENSE: past
  AGENT: Abebe
  PATIENT: window
  DEFINITENESS: definite

[Figure: interlingual translation. Analysis maps the source text "Abebe broke the window" to the interlingua representation above; generation produces the target text አበበ መስኮቱን ሰበረው.]

• Interlingual translation is suitable for multilingual machine translation; its main drawback is that defining an interlingua is difficult, and maybe even impossible, for wider domains.


Rule-Based Machine Translation: The Vauquois Triangle

[Figure: the Vauquois triangle. The analysis side climbs from the source text through morphological analysis (word structure), syntactic analysis (syntactic structure), semantic analysis (semantic structure), and conceptual analysis up to the interlingua; transfer happens at the chosen level (direct transfer between word structures, syntactic transfer, or semantic transfer, with no transfer needed at the interlingua apex); the generation side descends symmetrically through conceptual, semantic, syntactic, and morphological generation to the target text.]

Statistical Machine Translation

• Statistical Machine Translation (SMT) finds the most probable target sentence given a
source text sentence.

• Parameters of probabilistic models are derived from the analysis of bilingual text
corpora.
Bilingual corpus of English and Amharic (example):

Abebe ate besso.       | አበበ በሶ በላ።
Abebe bought besso.    | አበበ በሶ ገዛ።
Abebe threw the stone. | አበበ ድንጋዩን ወረወረው።
Abebe went to school.  | አበበ ወደ ትምህርት ቤት ሄደ።
Kebede bought a car.   | ከበደ መኪና ገዛ።
Kebede bought the car. | ከበደ መኪናውን ገዛው።
Kebede bought the car. | ከበደ መኪናዋን ገዛት።
Almaz made tea.        | አልማዝ ሻይ አፈላች።
Almaz made tella.      | አልማዝ ጠላ ጠነሰሰች።

Statistical Machine Translation

General Architecture of SMTs

[Figure: general architecture of SMT. The decoder combines the language model and the translation model to map the source text into the target text.]


Statistical Machine Translation: Language Model

• The language model tries to ensure that words come in the right order.
  ♦ It captures some notion of grammaticality.
• Given an English string e, the language model assigns a probability p(e).
  ♦ Good English string ⇒ high p(e); bad English string ⇒ low p(e).
  ♦ Calculated with:
    - a statistical grammar such as a probabilistic context-free grammar; or
    - an n-gram language model.

Probabilistic context-free grammar [example]

Grammar:
  S → NP VP [0.8]
  S → Aux NP VP [0.1]
  S → VP [0.1]
  NP → Pronoun [0.2]
  NP → Proper-Noun [0.2]
  NP → Det Nominal [0.6]
  ...

Lexicon:
  Det → the [0.6] | a [0.2] | that [0.1] | this [0.1]
  Noun → book [0.1] | flight [0.5] | meal [0.2] | money [0.2]
  Verb → book [0.5] | include [0.2] | prefer [0.3]

(The probabilities of the alternative expansions of each nonterminal sum to 1.0.)


Statistical Machine Translation: Language Model

• N-gram models can be computed from a monolingual corpus, e.g. the Amharic side of the bilingual corpus above:

  አበበ በሶ በላ።
  አበበ በሶ ገዛ።
  አበበ ድንጋዩን ወረወረው።
  አበበ ወደ ትምህርት ቤት ሄደ።
  ከበደ መኪና ገዛ።
  ከበደ መኪናውን ገዛው።
  ከበደ መኪናዋን ገዛት።
  አልማዝ ሻይ አፈላች።
  አልማዝ ጠላ ጠነሰሰች።

• Unigram probabilities:

  p(w1) = count(w1) / total words observed
  p(አበበ) = 4/29 ≈ 0.138

• Bigram probabilities:

  p(w2|w1) = count(w1 w2) / count(w1)
  p(በሶ|አበበ) = count(አበበ በሶ) / count(አበበ) = 2/4 = 0.500

• Trigram probabilities:

  p(w3|w1 w2) = count(w1 w2 w3) / count(w1 w2)
  p(በላ|አበበ በሶ) = count(አበበ በሶ በላ) / count(አበበ በሶ) = 1/2 = 0.500
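The unigram and bigram estimates above can be reproduced mechanically; a minimal sketch over the nine-sentence corpus:

```python
from collections import Counter

# The nine-sentence monolingual corpus from the slide (sentence marker ። stripped).
corpus = [
    "አበበ በሶ በላ", "አበበ በሶ ገዛ", "አበበ ድንጋዩን ወረወረው",
    "አበበ ወደ ትምህርት ቤት ሄደ", "ከበደ መኪና ገዛ", "ከበደ መኪናውን ገዛው",
    "ከበደ መኪናዋን ገዛት", "አልማዝ ሻይ አፈላች", "አልማዝ ጠላ ጠነሰሰች",
]
unigrams, bigrams, total = Counter(), Counter(), 0
for sent in corpus:
    words = sent.split()
    total += len(words)
    unigrams.update(words)                    # count(w1)
    bigrams.update(zip(words, words[1:]))     # count(w1 w2)

p_unigram = unigrams["አበበ"] / total                     # 4/29 ≈ 0.138
p_bigram = bigrams[("አበበ", "በሶ")] / unigrams["አበበ"]    # 2/4 = 0.500
```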


Statistical Machine Translation: Language Model

• Similarly, higher-order n-gram models can be computed.

• Problems:
  ♦ How can we deal with n-gram models if sentences are too long?
    - What is p(አበበ ወደ ትምህርት ቤት ሄደ እና ድንጋዩን ወረወረው)?
  ♦ How can we deal with n-grams that are not seen in the corpus?
    - What is p(ወረወረው | አበበ ወደ ትምህርት ቤት ሄደ እና ድንጋዩን)?

• Solutions:
  ♦ Smoothing: avoid zero probabilities.
    - For example, assign p(አበበ ወደ ትምህርት ቤት ሄደ እና ድንጋዩን ወረወረው) = 0.00001.
  ♦ Approximate higher-order n-gram probabilities using lower-order models such as bigrams and trigrams:
    - p(አበበ ወደ ትምህርት ቤት ሄደ እና ድንጋዩን ወረወረው) ≈ p(አበበ|<s>) * p(ወደ|አበበ) * p(ትምህርት|ወደ) * p(ቤት|ትምህርት) * p(ሄደ|ቤት) * p(እና|ሄደ) * p(ድንጋዩን|እና) * p(ወረወረው|ድንጋዩን) * p(</s>|ወረወረው)


Statistical Machine Translation: Translation Model

• The job of the translation model is to assign a probability that a given source-language sentence generates a target-language sentence.

• We can model the translation from a source-language sentence S to a target-language sentence T̂ as:

  best translation T̂ = argmax_T faithfulness(T, S) * fluency(T)

• Suppose that we want to build a foreign-to-English machine translation system.

• Thus, in a probabilistic model, the best English sentence ê is the one whose probability p(e|f) is the highest:

  ê = argmax_e p(e|f)

  ♦ Bayes' rule:

    p(e|f) = p(f|e) * p(e) / p(f)
    argmax_e p(e|f) = argmax_e p(f|e) * p(e) / p(f)

  ♦ Noisy channel equation (p(f) is constant for a given f):

    ê = argmax_e p(f|e) * p(e)

    where p(f|e) is the translation model and p(e) is the language model.

Statistical Machine Translation: Translation Model

• We are looking for the e that maximizes p(f|e) * p(e).

  ♦ We need to tell a generative "story" about how English strings become foreign strings, i.e. we want to compute p(f|e).
  ♦ When we see an actual foreign string f, we want to reason backwards: what English string e is
    - likely to be uttered; and
    - likely to subsequently translate to f?

• p(f|e) will be a module in the overall foreign-to-English machine translation system.

• How do we assign values to p(f|e)? A direct estimate would be:

  p(f|e) = count(f, e) / count(e)

  ♦ This is impossible, because sentences are novel, so we would never have enough data to find values for all sentences.
  ♦ For example:
    p(አበበ ወደ ትምህርት ቤት ሄደ እና ድንጋዩን ወረወረው | Abebe went to school and threw the stone) = ?


Statistical Machine Translation: Translation Model

• Decompose the sentences into smaller chunks, as in language modeling:

  p(f|e) = Σ_a p(a, f|e)

  ♦ The variable a represents alignments between the individual chunks in the sentence pair, where the chunks can be words or phrases.
  ♦ In word-based translation, the fundamental unit of translation is a word.
  ♦ Phrase-based translation translates whole sequences of words (called blocks or phrases), where the lengths may differ.
    - Blocks are not linguistic phrases but phrases found using statistical methods from corpora.
    - This is the most commonly used form of translation.

• The alignment probability p(a, f|e) is defined as follows:

  p(a, f|e) = Π_{j=1..m} t(f_j|e_i), where t(f_j|e_i) is the translation probability.

• The translation probability t(f_j|e_i) is calculated by counting as follows:

  t(f_j|e_i) = count(f_j, e_i) / count(e_i)
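Given a fixed alignment and a table of translation probabilities, p(a, f|e) is just a product; the probability values below are invented for illustration, not estimated from data:

```python
import math

# One alignment a for the pair e = "Abebe ate besso", f = "አበበ በሶ በላ".
# The translation probabilities t(f_j | e_i) are illustrative placeholders.
t = {("አበበ", "Abebe"): 0.9, ("በሶ", "besso"): 0.8, ("በላ", "ate"): 0.7}
alignment = [("አበበ", "Abebe"), ("በሶ", "besso"), ("በላ", "ate")]

# p(a, f | e) = Π_j t(f_j | e_i)
p = math.prod(t[pair] for pair in alignment)
```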


Statistical Machine Translation: Translation Model

• Word translation table [example]

[Figure: an empty word-alignment table for the sentence pair "Abebe ate besso" / "አበበ በሶ በላ ።", with one row per English word (Abebe, ate, besso) and one column per Amharic token.]

• Unfortunately, it is difficult to get word-aligned data to compute word translation probabilities, so we can't do this directly.
  ♦ Use the Expectation-Maximization (EM) algorithm.

• The EM algorithm estimates model parameters by:
  ♦ initializing the probabilities (e.g. uniformly); and
  ♦ iteratively finding maximum-likelihood estimates of the parameters.
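The EM estimation can be sketched with an IBM Model 1 style loop on two sentence pairs from the corpus above; initialization is uniform, and each iteration re-estimates t(f|e) from expected alignment counts (a real system would run GIZA++ on a large corpus):

```python
from collections import defaultdict

# Two sentence pairs from the slide's bilingual corpus.
pairs = [
    ("Abebe ate besso".split(), "አበበ በሶ በላ".split()),
    ("Abebe bought besso".split(), "አበበ በሶ ገዛ".split()),
]
e_vocab = {e for es, _ in pairs for e in es}
f_vocab = {f for _, fs in pairs for f in fs}

# Initialize t(f|e) uniformly.
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(50):
    count = defaultdict(float)      # expected counts of (f, e) links
    total = defaultdict(float)
    for es, fs in pairs:            # E-step: distribute each f over candidate e's
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / norm
                count[(f, e)] += c
                total[e] += c
    for f, e in t:                  # M-step: t(f|e) = count(f, e) / count(e)
        t[(f, e)] = count[(f, e)] / total[e] if total[e] else 0.0

# "በላ" only ever co-occurs with "ate", so EM drives t(በላ|ate) toward 1.
```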


Statistical Machine Translation: Decoding

[Figure: decoding in SMT. Given a source text f, the decoder searches for ê = argmax_e p(f|e) * p(e), combining the translation model p(f|e) with the language model p(e) to produce the target text e.]

Statistical Machine Translation: Decoding

• A decoder searches for the best sequence of transformations that translates a source
sentence.
♦ Look up all translations of every source word or phrase, using a word or phrase translation table.
♦ Recombine the target-language phrases so as to maximize the product of the translation model probability and the language model probability.
♦ This search over all possible combinations can get very large, so we need ways of limiting the search space.
• Decoding is, therefore, a searching problem that can be reformulated as a classic
Artificial Intelligence problem, i.e. searching for the shortest path in an implicit graph.


Statistical Machine Translation

Pros and Cons


• Pros:
♦ Has a way of dealing with lexical ambiguity
♦ Can deal with idioms that occur in the training data
♦ Can be built for any language pair that has enough training data (language
independent)
♦ No need of language experts (requires minimal human effort)

• Cons:
♦ Does not explicitly deal with syntax


Statistical Machine Translation

Choosing SMT
• Economic reasons:
♦ Low cost
♦ Rapid prototyping
• Practical reasons:
♦ Many language pairs don't have NLP resources, but do have parallel corpora
• Quality reasons:
♦ Uses chunks of human-translated text as its building blocks
♦ Produces state-of-the-art results (when very large data sets are available)


Statistical Machine Translation

Materials Needed to Build SMT


• Parallel corpus
♦ For example, Negarit Gazette (ነጋሪት ጋዜጣ) is a useful resource for English-to-
Amharic machine translation or vice versa.

• Word alignment software


♦ For example, Giza++ automatically generates word alignments from a parallel
corpus.

• Language modeling toolkit


♦ For example, SRILM toolkit estimates n-gram probabilities.

• Decoder
♦ For example, Pharaoh (phrase-based decoder that builds phrase tables from
Giza++ word alignments and produces best translation for new input using the
phrase table plus SRILM language model)
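As a sketch of what a language modeling toolkit computes, the following estimates bigram probabilities by maximum likelihood from a tiny corpus. Real toolkits such as SRILM add smoothing and back-off, which this deliberately omits:

```python
from collections import Counter

def bigram_lm(sentences):
    """Estimate bigram probabilities P(w2|w1) by maximum likelihood:
    count(w1, w2) / count(w1), with sentence-boundary markers added."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])          # contexts (everything but </s>)
        bigrams.update(zip(tokens, tokens[1:]))
    return {bg: c / unigrams[bg[0]] for bg, c in bigrams.items()}

lm = bigram_lm(["the price is cheap", "the price is high"])
print(lm[("the", "price")])  # → 1.0  ("price" always follows "the" here)
print(lm[("is", "cheap")])   # → 0.5  ("is" is followed by "cheap" or "high")
```

An unsmoothed model like this assigns zero probability to unseen bigrams, which is why discounting methods matter in practice.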


Example-Based Machine Translation

• Fundamental idea:
♦ People do not translate by doing deep linguistic analysis of a sentence.
♦ They translate by decomposing a sentence into fragments, translating each of
them, and then composing the results properly.
• Uses the principle of analogy in translation
• Example:
Given the following translations:

የመጽሃፉ ዋጋ ከ500 ብር በላይ ነው → The price of the book is more than 500 Birr

የቤቱ ዋጋ ርካሽ ነው → The price of the house is cheap

Based on the above examples, the following translation can be made:

የቤቱ ዋጋ ከ500 ብር በላይ ነው → The price of the house is more than 500 Birr
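A minimal sketch of this analogy step, using only the slide's two example pairs plus a hypothetical fragment dictionary that maps the differing source words to their translations:

```python
# Translation memory taken directly from the slide's example.
examples = {
    "የመጽሃፉ ዋጋ ከ500 ብር በላይ ነው": "The price of the book is more than 500 Birr",
    "የቤቱ ዋጋ ርካሽ ነው": "The price of the house is cheap",
}
# Hypothetical sub-sentential alignments (the hard part of real EBMT).
fragments = {"የመጽሃፉ": "the book", "የቤቱ": "the house"}

def translate_by_analogy(sentence):
    """Find a stored example differing from the input in exactly one word,
    then substitute the corresponding target fragment."""
    in_words = sentence.split()
    for src, tgt in examples.items():
        src_words = src.split()
        if len(src_words) != len(in_words):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(src_words, in_words)) if a != b]
        if len(diffs) == 1 and in_words[diffs[0]] in fragments:
            old = fragments[src_words[diffs[0]]]
            new = fragments[in_words[diffs[0]]]
            return tgt.replace(old, new)
    return None

print(translate_by_analogy("የቤቱ ዋጋ ከ500 ብር በላይ ነው"))
# → The price of the house is more than 500 Birr
```

Real EBMT systems must of course match fragments of varying length and handle word-order differences; the one-word-diff heuristic here only illustrates the principle of analogy.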


Example-Based Machine Translation

Challenges
• Locating similar sentences
• Aligning sub-sentential fragments
• Combining multiple fragments of example translations into a single sentence
• Determining when it is appropriate to substitute one fragment for another
• Selecting the best translation out of many candidates

Pros and Cons


• Pros:
♦ Uses fragments of human translations which can result in higher quality

• Cons:
♦ May have limited coverage, depending on the size of the example database and
the flexibility of the matching heuristics


Hybrid Approaches to Machine Translation

• Machine translation systems discussed so far have their own pros and cons.
• Hybrid systems exploit the synergy of rule-based, statistical and example-based
machine translation:
♦ rules can be post-processed by statistics and/or examples; or
♦ statistics can be guided by rules and/or examples.
• Example: translating "If you prefer another hotel please let me know"
♦ The input is split into Segment1 and Segment2.
♦ Each segment is translated by the example-based, statistical and rule-based
engines, producing alternative translations with confidence values.
♦ A selection module then picks the best engine per segment (in this example,
Segment1 from the rule-based engine and Segment2 from the example-based
engine).

NL Interfaces for Human-Machine Interaction

• Natural Languages (NLs) are increasingly becoming important interface styles in
Human-Computer Interaction (HCI).
• The growing popularity of natural language interfaces is due to the rising human need
to interact/communicate with computer systems to:
♦ get answers to real-world questions; or
♦ make conversation in a coherent way about various topics.
• Two of the most important applications of NLP that deal with such issues are Question
Answering and Dialogue Systems.
• Question Answering (QA) System:
♦ A system that provides an answer, or answer-containing text, for a given
question formulated in natural language.
• Dialogue System (DS):
♦ A system that converses with human beings in a coherent way.
♦ An extension of QA system, i.e., a two-way QA system.


NL Interfaces for Human-Machine Interaction

[Figure: HCI in Question Answering (one question, one answer) vs. HCI in Dialogue
Systems (a continuing series of questions and answers)]


Question Answering Systems: Questions and Answers

• QA Systems deal with a wide range of question types such as fact, list, “wh”-questions,
definition, hypothetical, and semantically-constrained questions.
• Search engines do not speak natural language.
♦ Human beings need to speak the language of search engines.
♦ QA Systems attempt to let human beings ask their questions in the normal way
using natural languages.
• QA Systems are important NLP applications especially for inexperienced users.
♦ QA Systems are closer to human beings than search engines are.
♦ QA Systems are viewed as natural language search engines.
♦ QA Systems are considered as next step to current search engines.
• Question answering can be approached from one of two existing NLP research areas:
♦ Information Retrieval: QA can be viewed as short passage retrieval.
♦ Information Extraction: QA can be viewed as open-domain information
extraction.
• The performance of QA Systems is heavily dependent on a good search corpus.


Question Answering Systems: Questions and Answers

• Answers are searched in collections from:
♦ Database (e.g. documents in an organization):
- relies on small local document collections
- limited types of questions are allowed (e.g. factoid questions)
- keywords are used to represent the question
- only shallow analysis is made to compile answers
- closed-domain question answering
♦ Corpus data (e.g. web):
- relies on world knowledge
- entertains any type of question
- deep linguistic analysis is required (e.g. named entity recognition, relation
detection, co-reference resolution, word sense disambiguation, etc.)
- open-domain question answering


Question Answering Systems: General Architecture

Major Components of QA Systems


• Question Analysis: The natural language question input by the user needs to be
analyzed into whatever form or forms are needed by subsequent parts of the system.
♦ The user could be asked to clarify his or her question before proceeding.
• Candidate Document Selection: A subset of documents from the total document
collection (typically several orders of magnitude smaller) is selected, comprising those
documents deemed most likely to contain an answer to the question.
♦ This collection may need to be processed before querying, in order to transform
it into a form which is appropriate for real-time question answering.
• Candidate Document Analysis: If the preprocessing stage has only superficially
analyzed the documents in the document collection, then additional detailed analysis of
the candidates selected at the preceding stage may be carried out.
• Answer Extraction: Using the appropriate representation of the question and of each
candidate document, candidate answers are extracted from the documents and ranked
in terms of probable correctness.
• Response Generation: A response is returned to the user.
♦ This may be affected by the clarification request, and may in turn lead to the
response being updated.
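The components above can be sketched as a skeleton pipeline. Every function below is a hypothetical stub that merely shows the data flow; keyword overlap stands in for the real analysis, selection, and extraction techniques:

```python
# Skeleton of the QA pipeline described above (illustrative stubs only).
def analyze_question(question):
    """Question Analysis: a crude expected-answer-type guess plus keywords."""
    qtype = "PERSON" if question.lower().startswith("who") else "OTHER"
    return {"text": question, "type": qtype,
            "keywords": set(question.lower().split())}

def select_candidates(query, collection):
    """Candidate Document Selection: keep documents sharing any keyword."""
    return [d for d in collection
            if set(d.lower().split()) & query["keywords"]]

def extract_answers(query, docs):
    """Answer Extraction: rank candidates by keyword overlap as a proxy
    for probable correctness."""
    return sorted(docs,
                  key=lambda d: len(set(d.lower().split()) & query["keywords"]),
                  reverse=True)

def answer(question, collection):
    """Response Generation: return the single top-ranked candidate."""
    q = analyze_question(question)
    return extract_answers(q, select_candidates(q, collection))[:1]

docs = ["Addis Ababa is the capital of Ethiopia", "The Nile flows north"]
print(answer("What is the capital of Ethiopia", docs))
# → ['Addis Ababa is the capital of Ethiopia']
```

A real system would replace each stub with the deep analysis the slides describe (named entity recognition, passage retrieval, answer-type checking), but the stage boundaries stay the same.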


Question Answering Systems: General Architecture

[Figure: General architecture of a QA system. The user's Question goes to Question
Analysis, which may return a Clarification Request. The resulting Question
Representation drives Candidate Document Selection over the Document Collection; the
selected Candidate Documents undergo Candidate Document Analysis; Answer Extraction
then produces Ranked Answers from the Analyzed Documents, and Response Generation
returns the Response to the user.]


Dialogue Systems: Modality of Conversation

• The modality of Dialogue Systems can be text-based, spoken dialogue, graphical user
interface, or multimodal.
• Text-Based:
♦ The conversation is made by making use of natural language texts.
♦ For example, ELIZA.
• Spoken Dialogue:
♦ The conversation is made by making use of voice.
♦ For example, HMIHY (how may I help you) developed at AT&T for call routing.
• Graphical User Interface:
♦ The conversation is made by making use of images.
♦ For example, Dialogue Boxes in Windows applications.
• Multimodal:
♦ The conversation is made by any combination of the above three modalities.


Dialogue Systems: Dialogue Interaction Modes

• Dialogue Systems differ in the degree to which the human or the computer takes the
initiative.

• Computer-Initiative
♦ Computer maintains tight control
♦ Human is highly restricted
♦ E.g., Dialogue Boxes

• Human-Initiative
♦ Human maintains tight control
♦ Computer is highly restricted
♦ E.g., ELIZA

• Mixed-Initiative
♦ Human and computer have flexibility to specify constraints
♦ Mainly research prototypes


Dialogue Systems: Application Areas

• Currently, Dialogue Systems are used in specific domains such as:


♦ Customer service: Responding to customers' general questions about products
and services, e.g., answering questions about applying for a bank loan.
♦ Help desk: Responding to internal employee questions, e.g., responding to
human resource questions.
♦ Website navigation: Guiding customers to relevant portions of complex
websites, e.g., helping people determine where information or services reside
on a company's website.
♦ Guided selling: Providing answers and guidance in the sales process,
particularly for complex products being sold to novice customers.
♦ Technical support: Responding to technical problems, such as diagnosing a
problem with a device.


Dialogue Systems: General Architecture

General Architecture of Spoken Dialogue System

[Figure: General architecture of a Spoken Dialogue System. The human interacts through
Input and Output Interfaces via an I/O Server. On the computer-system side, input flows
through Speech Recognition and Natural Language Understanding to the Dialogue Manager
(backed by a Knowledge Base), and output flows through Natural Language Generation and
Text-to-Speech Synthesis. Speech Recognition and Text-to-Speech Synthesis do not exist
in Text-Based Dialogue Systems.]


Dialogue Systems: General Architecture

• In the process of Natural Language Understanding, there are many ways to represent
the meaning of sentences.
♦ For dialogue systems, the most common is “frame and slot semantics”
representation.

Example: "Show me morning flights from Addis Ababa to London on Tuesday"

SHOW:
  FLIGHTS:
    ORIGIN:
      CITY: Addis Ababa
      DATE: Tuesday
      TIME: morning
    DESTINATION:
      CITY: London
      DATE:
      TIME:


Dialogue Systems: General Architecture

• “Frame and slot semantics” can be generated by a semantic grammar.

Semantic grammar for the aforementioned request:

LIST → show me | I want | can I see
DEPARTTIME → (after | before | around) HOUR | morning | evening
HOUR → one | ... | twelve (am | pm)
FLIGHTS → (a) flight | flights
ORIGIN → from CITY
DESTINATION → to CITY
CITY → Addis Ababa | Dire Dawa | Bahir Dar | London | Paris | New York
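A minimal slot filler in the spirit of this grammar can be sketched with regular expressions. The patterns below are illustrative stand-ins for the grammar rules, not a real semantic parser:

```python
import re

# Alternation mirroring the CITY rule of the grammar above.
CITIES = r"(Addis Ababa|Dire Dawa|Bahir Dar|London|Paris|New York)"

def fill_frame(utterance):
    """Fill ORIGIN, DESTINATION and TIME slots from a flight request,
    using 'from CITY', 'to CITY' and a departure-time keyword."""
    frame = {"ORIGIN": None, "DESTINATION": None, "TIME": None}
    m = re.search(r"from " + CITIES, utterance)
    if m:
        frame["ORIGIN"] = m.group(1)
    m = re.search(r"to " + CITIES, utterance)
    if m:
        frame["DESTINATION"] = m.group(1)
    m = re.search(r"\b(morning|evening)\b", utterance)
    if m:
        frame["TIME"] = m.group(1)
    return frame

print(fill_frame("Show me morning flights from Addis Ababa to London"))
# → {'ORIGIN': 'Addis Ababa', 'DESTINATION': 'London', 'TIME': 'morning'}
```

This is why semantic grammars are popular in dialogue systems: the rules map almost directly onto slot-extraction patterns, with no full syntactic parse required.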


Genres and Types of Text Summarization

• Automatic Text Summarization refers to the generation of a shortened version of a text


that still contains the most important points of the original text.
• The content of the summary depends on the purpose and/or type of summarization.
• Types of summaries can be the following:
♦ Indicative vs. informative
ƒ Used for quick categorization vs. content processing.
♦ Extract vs. abstract
ƒ Lists fragments of text vs. re-phrases content coherently.
♦ Generic vs. query-oriented
ƒ Provides author’s view vs. reflects user’s interest.
♦ Background vs. just-the-news
ƒ Assumes reader’s prior knowledge is poor vs. up-to-date.
♦ Single-document vs. multi-document source
ƒ Based on one text vs. fuses together many texts.


Computational Approach: Procedures

• The following general procedure is used to build a Text Summarization system:

♦ Given a corpus of documents and their summaries:
♦ Label each sentence in the document as summary-worthy or not:
- compute document keywords;
- score document sentences with respect to these keywords.
♦ Learn which sentences are likely to be included in a summary.
♦ Given an unseen (test) document, classify its sentences as summary-worthy or not.
♦ Cohesion and coherence check: spot anaphoric references and modify the text
accordingly.
♦ Balance and coverage: modify the summary to have an appropriate text structure.
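The keyword-scoring step of this procedure can be sketched as a tiny extractive summarizer. The scoring function (raw keyword frequency) and the example document are illustrative only; real systems add position features, stop-word removal, and the learning step:

```python
from collections import Counter
import re

def summarize(text, n=1):
    """Score each sentence by the total document frequency of the words it
    contains, and extract the top-n sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sent):
        return sum(freq[w] for w in re.findall(r"\w+", sent.lower()))

    return sorted(sentences, key=score, reverse=True)[:n]

doc = ("NLP systems process text. Text summarization shortens text. "
       "Weather was nice.")
print(summarize(doc))  # → ['Text summarization shortens text']
```

The sentence mentioning the document's most frequent keyword ("text") twice wins, which is the intuition behind keyword-based sentence scoring.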

• There are two computational approaches to Text Summarization:


♦ top-down; and
♦ bottom-up


Computational Approach: Top-Down

• Top-down approach is considered as query-driven summarization.


♦ User needs only certain types of information.
♦ System needs particular criteria of interest, used to focus search.
• Criteria of interest can be modeled using:
♦ Templates with slots having semantic characteristics; or
♦ A set of important terms.
• Top-down approach can be implemented using Information Extraction (IE).
♦ IE task: Given a template and a text, find all the information relevant to each
slot of the template and fill it in.
♦ IE-for-summarization task: Given a query, select the best template, fill it in,
and generate the contents.
• Problems:
♦ IE works only for very particular templates; can it scale up?
♦ What about information that doesn’t fit into any template?
• Pros: higher quality, and also supports abstracting.
• Cons: low speed, and still needs to scale up to robust open-domain summarization.


Computational Approach: Bottom-Up

• Bottom-up approach is considered as text-driven summarization.


♦ User needs any information that is important.
♦ System uses strategies (importance metrics) over representation of whole text.
• Generic importance metrics can be modeled using:
♦ Degree of connectedness in semantic graphs; or
♦ Frequency of occurrence of tokens.
• Bottom-up approach can be implemented using Information Retrieval (IR).
♦ IR task: Given a query, find the relevant document(s) from a large set of
documents.
♦ IR-for-summarization task: Given a query, find the relevant passage(s) from a
set of passages (i.e., from one or more documents).
• Problems:
♦ IR techniques work on large volumes of data; can they scale down?
♦ IR works on words; do abstracts require abstract representations?
• Pros: robust, and good for query-oriented summaries.
• Cons: lower quality, and inability to manipulate information at abstract levels.


Computational Approach: Evaluation

• Given the original text T and summary S, two measures are commonly used to
evaluate Text Summarization systems:

♦ Compression Ratio (CR):

    CR = Length(S) / Length(T)

♦ Retention Ratio (RR):

    RR = Information(S) / Information(T)
• Measuring length:
♦ Number of letters
♦ Number of words
♦ Number of sentences
• Measuring information:
♦ Shannon Game: quantify information content.
♦ Question Game: test reader’s understanding.
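A sketch of computing CR under the three length measures listed above. The sentence counter is a naive stand-in that just counts full stops, and the example texts are invented for illustration:

```python
def compression_ratio(summary, text, unit="words"):
    """CR = Length(S) / Length(T), with length measured in letters,
    words, or sentences as listed above."""
    measure = {
        "letters": len,
        "words": lambda s: len(s.split()),
        "sentences": lambda s: max(1, s.count(".")),  # naive sentence count
    }[unit]
    return measure(summary) / measure(text)

text = "The price of the house is cheap. The location is also good."
summary = "The house is cheap."
print(round(compression_ratio(summary, text), 2))  # → 0.33
```

Measuring the Retention Ratio is harder: Information(S) has no simple formula, which is why the Shannon Game and Question Game are used to estimate it with human subjects.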
