NLP Mid-1
SAQs
1. Define NLP.
NLP stands for Natural Language Processing, a field at the intersection of
Computer Science, human language (linguistics), and Artificial Intelligence.
It is a branch of Artificial Intelligence that helps computers understand,
interpret, and manipulate human language.
2. Define Parsing.
Parsing in Natural Language Processing (NLP) refers to the process of
analyzing the grammatical structure of a sentence to understand its
components and how they relate to each other syntactically.
This process involves breaking down a sentence into its constituent parts,
such as nouns, verbs, adjectives, and phrases, and determining the
relationships between them, such as subject-verb-object relationships.
3. What is lexeme?
Lexeme refers to the basic unit of vocabulary, typically corresponding to a
single word as it appears in a dictionary or lexicon.
However, it can also represent a base or root form of a word, from which
various inflected forms (such as different tenses, plurals, etc.) can be derived.
E.g.: The word "run" can be a lexeme representing both the base form of the
verb ("run") and its inflected forms ("running," "ran"). Similarly, "cat" is a
lexeme representing both the singular noun form ("cat") and its plural form
("cats").
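A minimal Python sketch of the lexeme idea: each word form is mapped to its lexeme with a plain dictionary lookup, so "ran" and "running" both resolve to "run". The `LEXICON` table and the helper names are hypothetical and cover only the examples above; a real system would rely on a full lexicon or a lemmatizer.

```python
# Hypothetical mini-lexicon: each inflected word form maps to its lexeme.
LEXICON = {
    "run": "run", "runs": "run", "running": "run", "ran": "run",
    "cat": "cat", "cats": "cat",
}

def lexeme_of(word):
    """Return the lexeme for a word form, or None if the form is not listed."""
    return LEXICON.get(word.lower())

def forms_of(lexeme):
    """Return every listed word form that belongs to the given lexeme."""
    return sorted(form for form, lex in LEXICON.items() if lex == lexeme)

print(lexeme_of("Ran"))   # run
print(forms_of("cat"))    # ['cat', 'cats']
```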
4. What is Morphology?
Morphology refers to the study of the internal structure of words and the
rules governing their formation.
It deals with how words are constructed from morphemes, which are the
smallest units of meaning in a language.
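To illustrate how words are built from morphemes, the sketch below splits a word into prefix, stem and suffix when every piece appears in small, hypothetical morpheme lists. This is plain affix stripping, assumed here for illustration; it ignores spelling changes (e.g. "happy" → "happiness") that a real morphological analyzer must handle.

```python
# Hypothetical morpheme inventories, chosen only for illustration.
PREFIXES = ["un"]
STEMS = {"cat", "play", "kind"}
SUFFIXES = ["ness", "ing", "ed", "s"]

def segment(word):
    """Split a word into [prefix, stem, suffix] morphemes when every piece
    is in the inventories above; otherwise return the word unsplit."""
    for prefix in [""] + PREFIXES:          # "" means "no prefix"
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:      # "" means "no suffix"
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem in STEMS:
                return [m for m in (prefix, stem, suffix) if m]
    return [word]  # unanalysable: treat the whole word as one morpheme

print(segment("cats"))        # ['cat', 's']
print(segment("unkindness"))  # ['un', 'kind', 'ness']
```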
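5. What is Treebank?
A treebank is a text corpus in which each sentence is annotated with a tree
representing its syntactic structure.
These trees, often called parse trees or syntactic trees, depict the
grammatical structure of the sentences according to a predefined
formalism, such as constituency (phrase-structure) grammar or dependency grammar.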
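Treebanks are commonly distributed as bracketed text, in the style of the Penn Treebank. The sketch below parses one such bracketed sentence into nested (label, children) tuples and reads the words back off the terminal nodes. The sentence and its simplified labels (S, NP, VP, Det, N, V) are a hypothetical example, not taken from any particular treebank.

```python
import re

def parse_bracketed(text):
    """Parse a bracketed tree such as "(S (NP ...) (VP ...))" into nested
    (label, children) tuples; terminal nodes (words) are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]          # non-terminal label, e.g. NP
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # nested phrase
            else:
                children.append(tokens[pos])    # word (terminal node)
                pos += 1
        pos += 1                          # consume the closing ")"
        return (label, children)

    return parse_node()

def leaves(tree):
    """Collect the terminal nodes (words) from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]

tree = parse_bracketed(
    "(S (NP (Det The) (N cat)) (VP (V chased) (NP (Det the) (N mouse))))")
print(tree[0])        # S
print(leaves(tree))   # ['The', 'cat', 'chased', 'the', 'mouse']
```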
1. All production rules must be of the forms:
A → BC (or) A → a
where A, B, and C are non-terminals and a is a terminal.
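These two rule shapes are the Chomsky Normal Form conditions, and they can be checked mechanically. The Python sketch below represents a grammar as hypothetical (left-hand side, right-hand-side tuple) pairs and reports whether every rule fits one of the two forms; the toy grammar is an assumption for illustration, and the check ignores the special case S → ε that some definitions allow.

```python
def is_cnf(rules, nonterminals):
    """True if every rule is A -> B C (two non-terminals) or A -> a (one terminal)."""
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        if len(rhs) == 2 and all(sym in nonterminals for sym in rhs):
            continue  # A -> B C
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue  # A -> a
        return False  # anything else (unit rules, long rules, mixed rules)
    return True

NT = {"S", "NP", "VP", "Det", "N", "V"}
grammar = [  # hypothetical toy grammar
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),
    ("N", ("cat",)),
    ("V", ("chased",)),
]
print(is_cnf(grammar, NT))                     # True
print(is_cnf(grammar + [("VP", ("V",))], NT))  # False: unit rule VP -> V
```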
4. Functional Morphology: Models morphological processes using functional
programming concepts.
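Functional morphology treats an inflectional paradigm as a function from a stem to a table of its forms. The Python sketch below imitates that idea with ordinary functions; the paradigm names, tags, and simplified English spelling rules are assumptions for illustration, not the API of any existing toolkit. Analysis is done by running the paradigms forward and matching the generated forms.

```python
def regular_noun(stem):
    """Noun paradigm: stem -> {tag: form} (simplified English plural rule)."""
    plural = stem + "es" if stem.endswith(("s", "x", "ch", "sh")) else stem + "s"
    return {"SG": stem, "PL": plural}

def regular_verb(stem):
    """Verb paradigm: stem -> {tag: form} (drops a final 'e' before -ed/-ing)."""
    base = stem[:-1] if stem.endswith("e") else stem
    return {"INF": stem, "3SG": stem + "s", "PAST": base + "ed", "PROG": base + "ing"}

# Hypothetical lexicon: each stem is paired with its paradigm function.
LEXICON = {"cat": regular_noun, "box": regular_noun,
           "walk": regular_verb, "bake": regular_verb}

def analyse(word):
    """Analysis by generation: every (stem, tag) whose generated form equals word."""
    return [(stem, tag)
            for stem, paradigm in LEXICON.items()
            for tag, form in paradigm(stem).items()
            if form == word]

print(regular_noun("box"))  # {'SG': 'box', 'PL': 'boxes'}
print(analyse("baked"))     # [('bake', 'PAST')]
```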
LAQs:
9. Explain the structure of documents.
Effective feature design and selection are vital to prevent overfitting and
noise problems.
Similarly, morphologically rich languages may necessitate word structure
analysis to extract additional features.
These algorithms are then applied to tokens to further analyze and extract
meaningful linguistic features.
Generative Models:
Sequence Classification:
Training Phase:
Calculates the probability of observing a sequence X given a class label y: P(X ∣ y).
Classification Decision:
Bayesian Inference:
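The generative approach outlined above can be sketched with naive Bayes, one of the simplest generative sequence classifiers: training estimates P(y) and P(x_i ∣ y) from counts, and the decision picks the label that maximizes P(y) · P(X ∣ y), which by Bayes' rule is the same label that maximizes P(y ∣ X). The token-independence assumption, the add-one smoothing, and the tiny training set are illustrative choices made here, not taken from these notes; HMMs are another common generative choice.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (token_list, label). Count labels and tokens per label."""
    label_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab

def log_joint(tokens, label, model):
    """log P(y) + sum_i log P(x_i | y), with add-one (Laplace) smoothing and
    tokens assumed independent given the label."""
    label_counts, token_counts, vocab = model
    score = math.log(label_counts[label] / sum(label_counts.values()))
    denom = sum(token_counts[label].values()) + len(vocab)
    for tok in tokens:
        score += math.log((token_counts[label][tok] + 1) / denom)
    return score

def classify(tokens, model):
    """Bayes decision rule: argmax_y P(y) * P(X | y)."""
    return max(model[0], key=lambda label: log_joint(tokens, label, model))

data = [  # hypothetical training sequences
    ("great fun film".split(), "pos"),
    ("loved this great story".split(), "pos"),
    ("boring dull film".split(), "neg"),
    ("dull plot and boring".split(), "neg"),
]
model = train(data)
print(classify("great story".split(), model))  # pos
print(classify("dull film".split(), model))    # neg
```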
information extraction.
Each node in the parse tree represents either a word (terminal node) or
a phrase (non-terminal node).
2. Treebank Annotation:
1. Tokenization:
3. Annotation Process:
In this tree:
S: Sentence
Det: Determiner
N: Noun
13. What are the Issues and Challenges of Morphology?
rules, but other lexically dependent irregularities often cannot be generalized.
1. Dictionary Lookup:
Application: FSM is widely used in natural language processing (NLP)
tasks, such as tokenization, stemming, and morphological analysis. It
offers a computationally efficient framework for capturing complex
morphological phenomena in a formal and expressive manner.
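A finite-state analyzer reads a word one character at a time and emits an analysis when it stops in an accepting state. The Python sketch below compiles a hypothetical list of noun stems and two suffixes into such a machine (a character trie with analyses attached to final states). Real finite-state morphology uses transducers composed with spelling rules; this simplified acceptor only shows the state-and-transition idea.

```python
def build_fsa(stems, suffixes):
    """Compile stems and suffixes into a deterministic finite-state machine.
    trans maps (state, char) -> next state; finals maps an accepting state
    to the analysis emitted when a word ends there."""
    trans, finals = {}, {}
    next_state = 1  # state 0 is the start state

    def add_path(start, text):
        nonlocal next_state
        state = start
        for ch in text:
            if (state, ch) not in trans:
                trans[(state, ch)] = next_state
                next_state += 1
            state = trans[(state, ch)]
        return state

    for stem in stems:
        stem_end = add_path(0, stem)
        for suffix, tag in suffixes.items():
            finals[add_path(stem_end, suffix)] = f"{stem}+{tag}"
    return trans, finals

def analyse(word, fsa):
    """Run the machine over the word; return the analysis or None if rejected."""
    trans, finals = fsa
    state = 0
    for ch in word:
        if (state, ch) not in trans:
            return None  # no transition: the word is rejected
        state = trans[(state, ch)]
    return finals.get(state)  # None if the run ends in a non-accepting state

# Hypothetical stems and suffix tags.
fsa = build_fsa(["cat", "dog", "bird"], {"": "N+SG", "s": "N+PL"})
print(analyse("cats", fsa))  # cat+N+PL
print(analyse("dog", fsa))   # dog+N+SG
print(analyse("ca", fsa))    # None
```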
3. Unification-based Morphology:
4. Functional Morphology:
15. Explain how the Morphological typology divides languages into groups.
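Example: Let's create a dependency graph for the sentence "The cat chased
the mouse" and then find its minimum spanning tree.
Dependency Graph:
          chased
         /      \
      cat        mouse
      /             \
   The               the
In this graph, "chased" is the main verb (the root), "cat" is its subject, and
"mouse" is its direct object; each determiner ("The", "the") depends on the
noun it modifies.
Spanning tree (selected edges):
"chased" --> "cat"
"chased" --> "mouse"
"cat" --> "The"
"mouse" --> "the"
In this spanning tree, we have selected the edges that maintain the syntactic
structure of the sentence. The tree connects all five words without forming
any cycles. Every spanning tree over n words has exactly n - 1 edges (here 4),
so the "minimum" in minimum spanning tree refers to the total edge weight, not
the number of edges. MST-based dependency parsers score every candidate
head-to-dependent edge and keep the best-scoring tree (usually stated as a
maximum spanning tree, found with the Chu-Liu/Edmonds algorithm).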
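The search for the best spanning tree can be sketched in Python for a short sentence. This version scores candidate head → dependent edges with hypothetical weights and, instead of the Chu-Liu/Edmonds algorithm used by real MST parsers, simply tries every head assignment (feasible only for a handful of words) and keeps the highest-scoring one that forms a tree under an artificial ROOT node. Maximizing the scores is equivalent to finding a minimum spanning tree over the negated scores.

```python
from itertools import product

def is_tree(heads):
    """heads[i] is the head of word i+1 (words are numbered from 1; 0 is ROOT).
    Valid if exactly one word hangs from ROOT and every word reaches ROOT
    by following heads without revisiting a word (no cycle)."""
    if list(heads).count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False  # cycle found
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(n_words, score):
    """Exhaustive search; edges missing from `score` are treated as impossible."""
    best, best_total = None, float("-inf")
    for heads in product(range(n_words + 1), repeat=n_words):
        if any(h == i + 1 for i, h in enumerate(heads)):
            continue  # a word cannot be its own head
        if not is_tree(heads):
            continue
        total = sum(score.get((h, i + 1), float("-inf")) for i, h in enumerate(heads))
        if total > best_total:
            best, best_total = heads, total
    return best, best_total

words = ["The", "cat", "chased", "the", "mouse"]  # numbered 1..5
scores = {  # hypothetical (head, dependent) -> score
    (0, 3): 10,                                  # ROOT -> chased
    (3, 2): 9, (3, 5): 9,                        # chased -> cat, chased -> mouse
    (2, 1): 8, (5, 4): 8,                        # cat -> The, mouse -> the
    (3, 1): 3, (3, 4): 3, (0, 2): 4, (2, 5): 2,  # weaker distractor edges
}
heads, total = best_tree(len(words), scores)
for dep, head in enumerate(heads, start=1):
    print("ROOT" if head == 0 else words[head - 1], "-->", words[dep - 1])
print(total)  # 44
```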
Step-1: Number the positions between the words in order, starting with 0 before
the first word. Here we have 5 words, so the positions run from 0 to 5.
Step-2: Create a 5*5 matrix and number its rows and columns.
Note that the columns are numbered from 1, while the rows are numbered from 0.
For example, place "DA" in cell (0,1), which covers the span from position 0 to
position 1, i.e. the first word.
Continue filling the chart in the same way: first the cell for each word, then the cells for longer spans.
Final correct calculated values for:
VP = 0.000012
S = 0.00000002304 (or) 2.304 * 10^(-8)
https://www.youtube.com/watch?v=SFQ-owZaU_s
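The chart-filling procedure above appears to be the probabilistic CKY algorithm. The Python sketch below runs it on a hypothetical toy grammar in Chomsky Normal Form; the lecture's own grammar and rule probabilities are not reproduced here, so its numbers differ from VP = 0.000012 and S = 2.304 * 10^(-8). Cell (i, j) covers the words between positions i and j, so single words sit in cells (i, i+1), matching the row-from-0, column-from-1 numbering, and each cell keeps the highest probability found for every non-terminal.

```python
from collections import defaultdict

# Hypothetical toy PCFG in CNF (not the grammar from the lecture example).
BINARY = {  # (B, C) -> list of (A, P(A -> B C))
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
LEXICAL = {  # word -> list of (A, P(A -> word))
    "the": [("Det", 1.0)],
    "cat": [("N", 0.5)],
    "mouse": [("N", 0.5)],
    "chased": [("V", 1.0)],
}

def pcky(words):
    """table[(i, j)][A] = best probability that A spans positions i..j."""
    n = len(words)
    table = defaultdict(dict)
    for i, w in enumerate(words):            # cells (i, i+1): one word each
        for a, p in LEXICAL.get(w, []):
            table[(i, i + 1)][a] = p
    for span in range(2, n + 1):             # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):        # split point between i and j
                for b, pb in table[(i, k)].items():
                    for c, pc in table[(k, j)].items():
                        for a, pr in BINARY.get((b, c), []):
                            p = pr * pb * pc
                            if p > table[(i, j)].get(a, 0.0):
                                table[(i, j)][a] = p
    return table

chart = pcky("the cat chased the mouse".split())
print(chart[(2, 5)]["VP"])  # 0.5
print(chart[(0, 5)]["S"])   # 0.25
```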