Topic 6_Natural Language Processing (NLP)

Uploaded by

nyokaisheanopa

NATURAL LANGUAGE PROCESSING (NLP)
TOPIC 6

SCS 4101 Artificial Intelligence Wednesday, December 25, 2024


6.1 Introduction to Natural Language Processing (NLP)

• Natural Language Processing (NLP) is the AI method of communicating
with an intelligent system using a natural language such as English.
• Natural language processing is required when you want an intelligent
system, such as a robot, to act on your instructions, or when you want
to hear a decision from a dialogue-based clinical expert system.

• NLP is the technology that machines use to understand, analyse,
manipulate, and interpret human languages.
• The input and output of an NLP system can be:
 Speech
 Written Text



6.2 Components of NLP
NLP has two components:
1. Natural Language Understanding (NLU): helps the machine to
understand and analyse human language by extracting metadata from
content, such as concepts, entities, keywords, emotion, relations,
and semantic roles.
NLU involves the following tasks −
 Mapping the given natural-language input into useful
representations.
 Analysing different aspects of the language.
2. Natural Language Generation (NLG): acts as a translator that
converts computerized data into a natural language representation.
NLG involves three stages −
 Text planning − Retrieving the relevant content from the knowledge
base.
 Sentence planning − Choosing the required words, forming meaningful
phrases, and setting the tone of the sentence.
 Text realization − Mapping the sentence plan into sentence structure.
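
The three NLG stages can be sketched as a small template-based pipeline. This is only an illustration: the knowledge base, its keys, the 38.0 threshold, and the function names are all hypothetical, and real NLG systems are far more elaborate.

```python
# Hypothetical knowledge base: facts the system is able to talk about.
KNOWLEDGE_BASE = {
    "patient": "Kiran",
    "temperature_c": 39.2,
    "diagnosis": "influenza",
    "ward": "B",
}

def plan_text(knowledge_base, topic_keys):
    """Text planning: retrieve only the content relevant to the message."""
    return {key: knowledge_base[key] for key in topic_keys}

def plan_sentence(content):
    """Sentence planning: choose words and phrases for the selected content."""
    # The 38.0 threshold is an assumption made for this example only.
    severity = "a high" if content["temperature_c"] >= 38.0 else "a normal"
    return {
        "subject": content["patient"],
        "verb": "has",
        "object": f"{severity} temperature of {content['temperature_c']} degrees Celsius",
    }

def realize(sentence_plan):
    """Text realization: map the sentence plan onto a sentence structure."""
    return f"{sentence_plan['subject']} {sentence_plan['verb']} {sentence_plan['object']}."

content = plan_text(KNOWLEDGE_BASE, ["patient", "temperature_c"])
print(realize(plan_sentence(content)))
# Kiran has a high temperature of 39.2 degrees Celsius.
```

Keeping the stages as separate functions mirrors the slide: the content chosen, the wording, and the final sentence structure can each be changed independently.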



6.3 Applications of NLP
The following are some common applications of NLP:

1. Question Answering: focuses on building systems that automatically answer
questions asked by humans in a natural language.

2. Spam detection: used to detect unwanted e-mails before they reach a user's inbox.

3. Sentiment Analysis: also known as opinion mining. It is used on the web to analyse
the attitude, behaviour, and emotional state of the sender.

4. Machine translation: used to translate text or speech from one natural language to
another.

5. Speech recognition: used for converting spoken words into text. It is used in
applications, such as mobile

6. Chatbot: one of the important applications of NLP. Many companies use chatbots to
provide customer chat services.
6.4 NLP techniques and methods
To analyse and understand human language, NLP employs a variety of
techniques and methods. The following are some fundamental
techniques used in NLP:
 Tokenization. This is the process of breaking text into words,
phrases, symbols, or other meaningful elements, known as tokens.
 Parsing. Parsing involves analysing the grammatical structure of a
sentence to extract meaning.
 Lemmatization. This technique reduces words to their base or
dictionary form (the lemma), so that different forms of the same word
can be grouped together.
 Named Entity Recognition (NER). NER is used to identify entities
such as persons, organizations, locations, and other named items in
the text.
 Sentiment analysis. This method is used to gain an understanding
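
Tokenization, the first technique in the list above, can be sketched with a regular expression. This is a minimal illustration using only the Python standard library; the pattern and the function name are assumptions for this example, and toolkits such as NLTK ship more careful tokenizers.

```python
import re

# A token is either a word (optionally with internal apostrophes or hyphens)
# or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Open the door, please!"))
# ['Open', 'the', 'door', ',', 'please', '!']
```

Punctuation marks are kept as separate tokens here so that later steps can decide whether to drop them.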
6.5 Phases of NLP
NLP has the following five steps/phases:

1. Lexical Analysis − It involves identifying and analysing the structure of words.
The lexicon of a language is the collection of words and phrases in that language.
Lexical analysis divides the whole chunk of text into paragraphs, sentences, and
words.

2. Syntactic Analysis (Parsing) − It involves analysing the words in a sentence for
grammar and arranging them in a manner that shows the relationships among the
words. A sentence such as “The school goes to boy” is rejected by an English syntactic
analyser.

3. Semantic Analysis − It draws the exact meaning, or dictionary meaning, from the
text. The text is checked for meaningfulness by mapping syntactic structures to
objects in the task domain. The semantic analyser rejects phrases such as
“hot ice-cream”.
4. Discourse Integration − The meaning of any sentence depends upon the meaning of
the sentence just before it. It also influences the meaning of the sentence that
immediately follows.

5. Pragmatic Analysis − During this phase, what was said is re-interpreted as what was
actually meant. It involves deriving those aspects of language that require real-world
knowledge.

For example: "Open the door" is interpreted as a request instead of an order.



6.6 Difficulties in NLP
NLP is difficult because ambiguity and uncertainty exist in language.
1. Lexical Ambiguity - Lexical ambiguity exists when a single word has
two or more possible meanings.

Example:
Manya is looking for a match.
In the above example, the word match could mean that Manya is looking
for a partner or that Manya is looking for a game (a cricket match or
other match).

2. Syntactic Ambiguity - Syntactic ambiguity exists when the structure
of a sentence allows two or more possible meanings.

Example:
I saw the girl with the binoculars.
In the above example, did I have the binoculars? Or did the girl have
the binoculars?

3. Referential Ambiguity - Referential ambiguity exists when it is
unclear what a pronoun refers to.

Example:
Kiran went to Sunita. She said, "I am hungry."
In the above example, you do not know who is hungry: Kiran or Sunita.



6.7 NLP APIs
Natural Language Processing APIs allow developers to integrate
human-to-machine communication into their applications and to complete
useful tasks such as speech recognition, chatbots, spelling correction,
and sentiment analysis.
Some examples of NLP APIs:
 Chatbot API - allows you to create intelligent chatbots for any
service.
 Translation API by SYSTRAN - used to translate text from the
source language to the target language.
 Speech to text API - used to convert speech to text.
6.8 NLP Libraries
• Scikit-learn: Provides a wide range of algorithms for building
machine learning models in Python.
• Natural Language Toolkit (NLTK): A broad toolkit that covers most
common NLP techniques.
• Pattern: A web mining module for NLP and machine learning.
• TextBlob: Provides an easy interface for basic NLP tasks such as
sentiment analysis, noun phrase extraction, and part-of-speech (POS)
tagging.
• Quepy: Transforms natural language questions into queries in a
database query language.
• spaCy: An open-source NLP library used for data extraction, data
analysis, sentiment analysis, and text summarization.
• Gensim: A topic modelling library that works with large datasets and
processes data streams.



6.9 Working of Natural Language Processing (NLP)
An illustration of activities in NLP is shown in the figure below:

1. Text Input and Data Collection

 Data Collection: Gathering text data from various sources such
as websites, books, social media, or proprietary databases.
 Data Storage: Storing the collected text data in a structured
format, such as a database or a collection of documents.

2. Text Preprocessing
Includes:
 Tokenization: Splitting text into smaller units like words or
sentences.
 Lowercasing: Converting all text to lowercase to ensure uniformity.
 Stopword Removal: Removing common words that do not
contribute significant meaning, such as “and,” “the,” “is.”
 Punctuation Removal: Removing punctuation marks.
 Stemming and Lemmatization: Reducing words to their base or
root forms. Stemming cuts off suffixes, while lemmatization considers
the context and converts words to their meaningful base form.
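
The preprocessing steps above can be sketched in a few lines of standard-library Python. The stopword list, the suffix list, and the function names are assumptions made for illustration (real stopword lists are much longer, and real stemmers such as the Porter stemmer use many more rules). Punctuation is removed before splitting here, which is a small reordering of the steps as listed.

```python
import string

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"and", "the", "is", "a", "an", "of", "to", "in", "are"}

# Suffixes stripped by the naive stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Cut off one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, remove punctuation, tokenize, drop stopwords, then stem."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    tokens = [token for token in tokens if token not in STOPWORDS]
    return [naive_stem(token) for token in tokens]

print(preprocess("The cats are chasing the mice, and the dog is barking."))
# ['cat', 'chas', 'mice', 'dog', 'bark']
```

The output shows why stemming and lemmatization differ: the stemmer produces "chas", which is not a word, and leaves "mice" unchanged, whereas a lemmatizer would return "chase" and "mouse".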

3. Text Representation
 Bag of Words (BoW): Representing text as a collection of words,
ignoring grammar and word order but keeping track of word
frequency.
 Term Frequency-Inverse Document Frequency (TF-IDF): A
statistic that reflects the importance of a word in a document relative
to a collection of documents.
 Word Embeddings: Using dense vector representations of words
where semantically similar words are closer together in the vector
space (e.g., Word2Vec, GloVe).
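
Bag of Words and TF-IDF can be sketched as follows. This is a minimal version that uses the plain, unsmoothed formulas tf = (count of word in document) / (document length) and idf = log(N / number of documents containing the word); libraries such as scikit-learn apply smoothed variants, so their numbers will differ. The function names are illustrative.

```python
import math
from collections import Counter

def bag_of_words(tokens):
    """Map each word to its count; word order is discarded."""
    return Counter(tokens)

def tf_idf(documents):
    """Return one {word: tf-idf score} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        counts = bag_of_words(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
print(bag_of_words(docs[0]))
print(tf_idf(docs)[0])
```

In this example "the" appears in both documents, so its TF-IDF score is 0, while "cat" appears in only one document and receives a positive score.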


4. Feature Extraction
Extracting meaningful features from the text data that can be used for
various NLP tasks.
 N-grams: Capturing sequences of N words to preserve some context
and word order.
 Syntactic Features: Using parts of speech tags, syntactic
dependencies, and parse trees.
 Semantic Features: Leveraging word embeddings and other
representations to capture word meaning and context.
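
N-gram extraction is simple enough to sketch directly. The function name is illustrative; it returns every contiguous run of n tokens.

```python
def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I saw the girl".split()
print(ngrams(tokens, 2))
# [('I', 'saw'), ('saw', 'the'), ('the', 'girl')]
```

With n = 2 (bigrams) the pair ('saw', 'the') is kept together, which preserves a little of the word order that a plain Bag of Words discards.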


5. Model Selection and Training


Selecting and training a machine learning or deep learning model to
perform specific NLP tasks.
 Supervised Learning: Using labeled data to train models like
Support Vector Machines (SVM), Random Forests, or deep learning
models like Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs).
 Unsupervised Learning: Applying techniques like clustering or
topic modeling (e.g., Latent Dirichlet Allocation) on unlabeled data.
 Pre-trained Models: Utilizing pre-trained language models such as
BERT, GPT, or other transformer-based models that have been trained on
large corpora.
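
Supervised learning on labeled text can be illustrated with a very small classifier. The sketch below uses multinomial Naive Bayes with add-one smoothing, which is not one of the models named above; it is chosen only because it fits in a few lines of standard-library Python. The training sentences and labels are invented for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (tokens, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary

def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_examples)  # log prior of the label
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented labeled data for illustration only.
training_data = [
    ("win free money now".split(), "spam"),
    ("free prize claim now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch on monday".split(), "ham"),
]
model = train_naive_bayes(training_data)
print(predict(model, "claim free money".split()))
# spam
```

In practice the same train-then-predict pattern applies to the models listed above; a library such as scikit-learn would replace the hand-written counting.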

6. Model Deployment and Inference


Deploying the trained model and using it to make predictions or extract
insights from new text data.
• Text Classification: Categorizing text into predefined classes (e.g.,
spam detection, sentiment analysis).
• Named Entity Recognition (NER): Identifying and classifying entities
in the text.
• Machine Translation: Translating text from one language to another.
• Question Answering: Providing answers to questions based on the
context provided by text data.

7. Evaluation and Optimization


Evaluating the performance of the NLP model using metrics such as
accuracy, precision, recall, and F1-score.
• Hyperparameter Tuning: Adjusting settings that are chosen before
training (for example, the learning rate) to improve performance.
• Error Analysis: Analysing errors to understand model weaknesses
and improve robustness.
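
The metrics named above can be computed directly from the true and predicted labels. This is a minimal sketch for a single positive class; the label values and function names are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(accuracy(y_true, y_pred))             # 0.6
print(precision_recall_f1(y_true, y_pred))  # precision, recall and F1 are all 2/3 here
```

Precision asks how many of the messages flagged as spam really were spam; recall asks how many of the real spam messages were caught. F1 is the harmonic mean of the two.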


8. Iteration and Improvement


Continuously improving the algorithm by incorporating new data,
refining preprocessing techniques, experimenting with different
models, and optimizing features.
