Important 2 Marks
Text wrangling involves further manipulation of the text data to prepare it for analysis or machine
learning tasks.
Feature engineering for text representation involves converting raw text data into a format suitable
for machine learning models. This process transforms text into numerical features that can be used
to train and evaluate algorithms.
TF-IDF adjusts raw word counts by weighing a word's importance in a document against its frequency across the entire corpus. It emphasizes distinctive words and de-emphasizes common ones.
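In its basic form, tf-idf(t, d) = tf(t, d) × log(N / df(t)), where tf(t, d) is the count of term t in document d, N is the total number of documents, and df(t) is the number of documents containing t. A minimal sketch using scikit-learn's TfidfVectorizer (the two-document corpus is illustrative; scikit-learn applies a smoothed variant of the formula):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]  # toy corpus
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)       # sparse matrix: documents x vocabulary
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # terms shared by every document get lower idf weights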
Lemmatization is the process of reducing a word to its base or dictionary form (known as the
lemma), which is different from stemming. Unlike stemming, which simply cuts off word endings,
lemmatization considers the context and returns valid words.
For example: "running" is lemmatized to "run", and "better" (as an adjective) is lemmatized to "good"; a stemmer would leave "better" unchanged.
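A minimal sketch with NLTK's WordNetLemmatizer (assumes NLTK is installed; the WordNet data is downloaded on first use):

import nltk
nltk.download("wordnet", quiet=True)  # one-time download of the WordNet data
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # -> run  (verb)
print(lemmatizer.lemmatize("better", pos="a"))   # -> good (adjective)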
Bag of Words (BoW) is a simple and commonly used model in natural language processing (NLP)
for representing text data. The main idea behind BoW is to treat text (like a document, sentence,
or paragraph) as a collection of individual words, disregarding the order and context in which they
appear.
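A minimal sketch with scikit-learn's CountVectorizer (the documents are illustrative):

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat", "the cat sat on the mat"]  # toy documents
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())  # vocabulary: ['cat' 'mat' 'on' 'sat' 'the']
print(X.toarray())                         # per-document word counts; word order is discarded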
2. What are embeddings?
Embeddings are dense, low-dimensional vector representations of words, phrases, or documents,
learned from large text corpora. They are designed to capture the semantic relationships between
words, where words with similar meanings are represented by similar vectors.
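Similarity between embedding vectors is usually measured with cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real embeddings are learned and typically have 100 to 1024 dimensions):

import numpy as np

# Hypothetical toy embeddings, invented for illustration only.
king  = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.1])
apple = np.array([0.1, 0.2, 0.9])

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(king, queen))  # high: similar meanings, similar vectors
print(cosine(king, apple))  # lower: unrelated meanings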
3. Define BERT?
BERT (Bidirectional Encoder Representations from Transformers): BERT generates
contextual embeddings, meaning that the representation of a word changes depending on the
words around it. It is based on the Transformer architecture and has achieved state-of-the-art
results in many NLP tasks.
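A minimal sketch of obtaining contextual embeddings with the Hugging Face transformers library (assumes transformers and torch are installed; the pretrained weights are downloaded on first run):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# "bank" receives a different vector in each sentence because BERT is contextual.
inputs = tokenizer(["I sat by the river bank.", "I deposited cash at the bank."],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)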
4. What is Word2Vec?
Word2Vec is a popular technique in Natural Language Processing (NLP) for representing words
as vectors. Word2Vec models are used to capture the semantic relationships between words by
representing them in a continuous vector space, where words with similar meanings are placed
closer together.
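A minimal training sketch with the gensim library (the two-sentence corpus is illustrative; real models are trained on large corpora):

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "log"]]  # toy tokenized corpus
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv["cat"])                    # the 50-dimensional vector for "cat"
print(model.wv.similarity("cat", "dog"))  # cosine similarity between the two word vectors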
5. What is GloVe?
GloVe, or Global Vectors for Word Representation, is an unsupervised learning algorithm
developed by researchers at Stanford for obtaining vector representations of words. The idea
behind GloVe is to leverage the global statistical information of a corpus to produce dense word
embeddings.
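A minimal sketch of loading pretrained GloVe vectors through gensim's downloader (assumes gensim is installed and internet access is available; the vectors are downloaded on first call):

import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # 100-dimensional pretrained GloVe vectors
print(glove.most_similar("king", topn=3))    # nearest neighbours in the embedding space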
6. What are the applications of word embeddings?
• Text Classification
• Named Entity Recognition (NER)
• Sentiment Analysis
• Machine Translation
7. What is FastText?
FastText is an extension of the Word2Vec model developed by Facebook's AI Research (FAIR)
lab. It addresses some of the limitations of Word2Vec, particularly in handling out-of-vocabulary
(OOV) words and capturing subword information.
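A minimal sketch with gensim's FastText (toy corpus; min_n and max_n control the character n-gram sizes):

from gensim.models import FastText

sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]  # toy tokenized corpus
model = FastText(sentences, vector_size=50, window=2, min_count=1, min_n=3, max_n=5)

# Unlike Word2Vec, FastText can build a vector for a word it never saw in training
# by summing the vectors of the word's character n-grams.
print(model.wv["catlike"])  # an out-of-vocabulary word still gets a vector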
UNIT III
QUESTION ANSWERING AND DIALOGUE SYSTEMS
1. What is Information Retrieval?
Information Retrieval (IR) is the process of obtaining relevant information from a large
repository, often in response to a user query. The primary goal of IR is to help users find
the information they need quickly and efficiently. This is commonly applied to document
collections, such as web pages, research articles, or databases, where a system retrieves
documents based on their relevance to the user's search terms.
IR-based QA systems first find relevant information from a database or set of documents and then
extract the most relevant part to form an answer.
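A minimal retrieval sketch using TF-IDF and cosine similarity (the documents and query are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = ["Paris is the capital of France.",
             "The Nile is a river in Africa.",
             "Python is a programming language."]  # toy document collection
query = "What is the capital of France?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

scores = cosine_similarity(query_vector, doc_vectors)[0]
print(documents[scores.argmax()])  # the document most relevant to the query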
Language models for question answering (QA) have advanced significantly in recent years,
particularly with the advent of transformer-based architectures like OpenAI’s GPT, Google’s
BERT, and other similar models. These models use vast amounts of text data to learn linguistic
patterns, enabling them to generate or retrieve accurate answers to questions without relying on
explicit knowledge bases.
Classic QA models, developed before the deep learning and transformer revolutions, relied more
on structured approaches, traditional machine learning, and rule-based systems. These models
typically focused on understanding and retrieving answers from specific types of data, such as
documents, structured databases, or even human-curated knowledge bases.
Rule-based QA systems were among the earliest attempts at automating question answering. These
systems followed manually defined rules or templates to process the question and retrieve the
answer. They were effective for specific, narrow domains but lacked flexibility.
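A minimal illustration of the template idea (both the pattern and the mini knowledge table are hypothetical):

import re

capitals = {"france": "Paris", "japan": "Tokyo"}  # hypothetical mini knowledge base

def answer(question):
    match = re.match(r"what is the capital of (\w+)\??", question.lower())
    if match:
        return capitals.get(match.group(1), "I don't know.")
    return "I can't handle that question."  # no rule matched: the rigidity of rule-based QA

print(answer("What is the capital of France?"))  # -> Paris
print(answer("Who wrote Hamlet?"))               # -> I can't handle that question.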
Pipeline-based QA systems break down the QA process into a sequence of independent steps, each
responsible for a specific task such as parsing, entity extraction, relation identification, and answer
generation.
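A schematic sketch of the data flow (every stage here is a hypothetical stub, shown only to make the sequence of steps concrete):

def parse(question):
    return question.rstrip("?").split()  # stand-in for syntactic parsing

def extract_entities(tokens):
    skip = {"what", "who", "where", "which"}
    return [t for t in tokens if t[0].isupper() and t.lower() not in skip]  # stand-in for NER

def identify_relation(tokens):
    return "capital-of" if "capital" in tokens else "unknown"  # stand-in for relation extraction

def generate_answer(entities, relation):
    return f"look up '{relation}' for {entities}"  # stand-in for answer generation

tokens = parse("What is the capital of France?")
print(generate_answer(extract_entities(tokens), identify_relation(tokens)))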
Statistical machine learning models improved upon rule-based and IR-based systems by learning
patterns from data, often using features derived from questions and text. Classic machine learning
algorithms such as support vector machines (SVMs) and decision trees were applied to QA tasks.
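A sketch of the classic approach: an SVM that classifies questions by expected answer type (the tiny labelled set is illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

questions = ["Who wrote Hamlet?", "Where is the Eiffel Tower?",
             "Who painted the Mona Lisa?", "Where is Mount Fuji?"]
answer_types = ["PERSON", "LOCATION", "PERSON", "LOCATION"]  # toy labels

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(questions, answer_types)
print(clf.predict(["Who discovered penicillin?"]))  # expected: ['PERSON']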
Chatbots are designed for interactive communication, often implemented in customer service or
personal assistants like Siri and Alexa. While not purely QA systems, they often include QA
components.
Rule-based chatbots: Simple systems that match user input against scripted patterns and return predefined responses.
ML-based chatbots: More advanced, with natural language understanding (NLU) and dialogue management systems.
1. Customer Support: Chatbots are widely used in customer support for handling FAQs, troubleshooting issues, and guiding users through product features or services. They reduce the workload for human agents and provide instant assistance.
2. Virtual Assistants: AI chatbots like Siri, Google Assistant, and Alexa act as personal assistants, helping users perform tasks like setting reminders, controlling smart home devices, and answering questions.
3. E-commerce: In e-commerce, chatbots help customers find products, process orders, and provide information about discounts or promotions. They can guide users through the shopping process or offer product recommendations.
4. Healthcare: Healthcare chatbots assist patients with appointment scheduling, providing medical information, reminding users to take medications, and even performing symptom checking.
Designing dialogue systems, especially conversational agents like chatbots and virtual assistants,
requires a careful balance of linguistic understanding, interaction flow, and backend integration.
Dialogue systems are typically composed of multiple components that allow them to engage users
in natural, coherent, and goal-oriented conversations.
Combining rule-based components with machine learning models creates hybrid systems. For example, NLU and dialogue management may be rule-based for task-oriented conversations, while response generation is handled by an AI model for more natural interaction.
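A schematic sketch of the hybrid idea (the rule table and the fallback model are hypothetical placeholders):

RULES = {"book_table": "Sure! For how many people and at what time?"}

def generative_model(text):
    return "Tell me more about that."  # placeholder for a neural response model

def detect_intent(text):
    text = text.lower()
    return "book_table" if "book" in text and "table" in text else None  # rule-based NLU

def respond(text):
    intent = detect_intent(text)
    if intent in RULES:
        return RULES[intent]           # templated reply for the task-oriented path
    return generative_model(text)      # ML fallback for open-ended conversation

print(respond("I'd like to book a table for tonight."))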
UNIT IV
TEXT-TO-SPEECH SYNTHESIS
1. What is Text Normalization?
Text normalization is the process of transforming text into a standard format to facilitate easier
processing and analysis, especially in natural language processing (NLP) tasks. It involves several
steps that help to reduce the variability in text data.
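A minimal normalization sketch in plain Python (the steps shown are case folding, punctuation removal, and whitespace collapsing):

import re
import string

def normalize(text):
    text = text.lower()                                               # case folding
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    return re.sub(r"\s+", " ", text).strip()                          # collapse whitespace

print(normalize("  Hello,   WORLD!! "))  # -> "hello world"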
Applications of phonemes include:
• Text-to-Speech (TTS): Converts written text to speech by first converting letters into
phonemes.
• Automatic Speech Recognition (ASR): Uses phoneme models for recognizing speech
and mapping spoken words to text.
• Language Learning Tools: Helps learners by generating phonetic transcriptions of words.
4. Define Prosody?
Prosody refers to the rhythm, intonation, and stress patterns in speech that convey meaning,
emotion, and structure. It's an essential aspect of natural language and spoken communication,
affecting how messages are perceived beyond the basic phonetic sounds.
Signal processing is the analysis, manipulation, and interpretation of signals to extract useful
information, enhance their quality, or convert them into a desired format. Signals can be anything
that conveys information, such as sound, images, sensor readings, or data streams, and they can
be represented in various forms like analog (continuous) or digital (discrete).
1. Analog Signals: Continuous signals, like sound waves or light, that vary over time and
take any value in a given range.
o Example: Human speech captured by a microphone.
2. Digital Signals: Discrete-time signals, often derived from the sampling of analog signals,
represented as sequences of numbers (binary).
o Example: A digitally recorded audio file.
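A minimal sketch of turning a continuous tone into a digital signal by sampling (NumPy; the tone and rate are illustrative):

import numpy as np

fs = 8000                              # sampling rate in Hz (telephone quality)
t = np.arange(0, 0.01, 1 / fs)         # 10 ms of sample times
samples = np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone, sampled into discrete values
print(samples[:5])                     # the first few digital sample values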
Signal processing is fundamental to communication systems, as well as to audio, speech, and image processing.
Parametric TTS systems generate speech by modeling the speech production process. Instead of
concatenating pre-recorded speech, parametric approaches synthesize speech by using statistical
models to control parameters like pitch, duration, and formants (vocal tract resonances) to generate
audio waveforms from scratch.
Deep learning-based text-to-speech (TTS) systems, particularly those like WaveNet, represent a
major leap in generating natural and high-quality synthetic speech. These systems address many
limitations of traditional methods like concatenative and parametric TTS by using neural
networks to learn the complex patterns of human speech directly from data.
UNIT V
AUTOMATIC SPEECH RECOGNITION
1. What is Acoustic Modelling?
Acoustic modeling is a crucial component of speech recognition systems, where it deals with the
representation of the relationship between linguistic units of speech (such as phonemes or words)
and the corresponding audio signal. It focuses on how to statistically model the way phonetic units
are produced in various contexts, including differences in speakers, accents, and environmental
noise.
2. What are Phonemes?
Phonemes are the smallest units of sound in a language, and acoustic models attempt to recognize
these by mapping the audio signal to the corresponding phonetic sounds. For example, the words
"cat" and "bat" differ by just one phoneme: /k/ and /b/.
GMMs are used to model the distribution of the acoustic features associated with each HMM state.
A GMM is a weighted sum of several Gaussian distributions and helps capture the variability in
speech signals for a particular phoneme.
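The mixture density is p(x) = Σ_k w_k · N(x | μ_k, Σ_k), where the weights w_k sum to 1. A minimal sketch with scikit-learn (random vectors stand in for real acoustic features such as MFCCs):

import numpy as np
from sklearn.mixture import GaussianMixture

features = np.random.randn(500, 13)  # placeholder frames; real ones would be MFCC vectors

gmm = GaussianMixture(n_components=4, covariance_type="diag").fit(features)
print(gmm.score_samples(features[:3]))  # log-likelihood of each frame under the mixture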
A Hidden Markov Model (HMM) is a statistical model that is widely used in speech recognition,
natural language processing, and various other time-series applications. It is particularly well-
suited for modeling sequences where observations are generated by underlying hidden states,
which evolve over time.
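A minimal sketch with the third-party hmmlearn package (pip install hmmlearn; random observations stand in for acoustic feature frames):

import numpy as np
from hmmlearn import hmm

X = np.random.randn(100, 2)  # toy observation sequence

model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
model.fit(X)                      # learn transition and emission parameters
hidden_states = model.predict(X)  # most likely hidden-state sequence (Viterbi decoding)
print(hidden_states[:10])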
Key aspects of training acoustic models include:
• Discriminative training
• Better generalization
• Labeling
• Supervised learning
• Evaluation
Speech recognition systems often need to adapt to new speakers, environments, or languages.
Techniques like Maximum Likelihood Linear Regression (MLLR) or speaker adaptation
training can be used to fine-tune acoustic models for specific speakers or conditions.
Step 1: Pre-Emphasis. A first-order filter boosts the high-frequency components of the speech signal, which carry important phonetic detail.
Step 2: Framing. The signal is split into short overlapping frames (typically 20-40 ms), since speech is approximately stationary over such intervals.
Step 3: Windowing. Each frame is multiplied by a window function (commonly a Hamming window) to reduce spectral leakage at the frame edges.
Step 4: Fourier Transform. The frequency spectrum of each windowed frame is computed, usually with the FFT.
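A minimal NumPy sketch of these four steps (the parameter values are typical defaults, not prescribed by the text):

import numpy as np

def front_end(signal, fs=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    # Step 1: pre-emphasis boosts the high-frequency components.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Step 2: framing splits the signal into short overlapping frames.
    frame_len, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Step 3: a Hamming window reduces spectral leakage at the frame edges.
    frames = frames * np.hamming(frame_len)
    # Step 4: the Fourier transform gives each frame's magnitude spectrum.
    return np.abs(np.fft.rfft(frames, axis=1))

spectrum = front_end(np.random.randn(16000))  # one second of placeholder audio
print(spectrum.shape)  # (number_of_frames, frequency_bins)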