0% found this document useful (0 votes)

37 views

Machine Learning

Machine learning has its roots in artificial intelligence and aims to develop computational models that can learn from experience like the human brain. Machine learning is a type of artificial intelligence that enables computers to learn without being explicitly programmed by recognizing patterns in data. It works by analyzing data to identify patterns and make predictions. The relationship between machine learning and data analysis emphasizes the importance of extracting valuable insights from data in a holistic way.

Uploaded by

qa1475417

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

37 views

Machine Learning

Uploaded by

qa1475417

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 41

Machine Learning and NLP

Rahul Bhimakari
Machine Learning
Machine learning has its roots in the field of artificial intelligence (AI), and its development was
inspired by the human brain's ability to learn from experience. Researchers aimed to develop
computational models that could imitate this learning process.

Machine learning is a type of artificial intelligence (AI) that enables computers to learn and
make decisions without being explicitly programmed. In basic terms, it's like teaching a
computer to recognize patterns and make predictions based on data.
Machine Learning
Machine learning has its roots in the field of artificial intelligence (AI), and its development was
inspired by the human brain's ability to learn from experience. Researchers aimed to develop
computational models that could imitate this learning process.

Machine learning is a type of artificial intelligence (AI) that enables computers to learn and
make decisions without being explicitly programmed. In basic terms, it's like teaching a
computer to recognize patterns and make predictions based on data.
Relationship with AI
Dependence on data
The fundamental idea behind machine learning being data-driven is that the learning process
involves the analysis of data to identify patterns, make predictions, or perform specific tasks
without being explicitly programmed

•Training on Data
•Learning Patterns
•Generalization to New Data
•Iterative Improvement
Data-Driven Decision-Making

Data analysis is often a precursor to machine learning, helping to understand the structure and
characteristics of the data. Machine learning, in turn, uses these insights to build models that
can make predictions, classify data, or automate decision-making processes. The relationship
between the two emphasizes the importance of a holistic approach when working with data to
extract valuable information and derive actionable insights.
How is machine learning different from traditional
programming ?
Machine learning and traditional programming represent two different approaches to solving
problems with computers

Explicit Instructions vs Learning from Data

Rule-Based vs Data-Driven

Problem Complexity

Flexibility and Adaptability

Programming Paradigm
Types of algorithms
Supervised Learning Algorithms
Description: These algorithms learn from labeled training data, where the input features
are paired with corresponding target labels. The goal is to learn a mapping from inputs to
outputs

.
Unsupervised Learning Algorithms
Description: Unsupervised learning algorithms operate on unlabeled data to find hidden
patterns or structures. They are used for tasks such as clustering and dimensionality
reduction.
Reinforcement Based Learning Algorithms
Description: Reinforcement learning involves an agent learning to make decisions by interacting
with an environment. The agent receives feedback in the form of rewards or penalties, guiding
its learning process.
Semi-Supervised Learning Algorithms
Description: These algorithms leverage a combination of labeled and unlabeled data for
training. They are particularly useful when acquiring labeled data is expensive or time-
consuming.
Regression
Linear Regression is a statistical supervised learning technique to predict
the quantitative variable by forming a linear relationship with one or more
independent features.

→ If an independent variable does a good job in predicting the dependent

→ Which independent variable plays a significant role in predicting the
variable.

dependent variable.
Assumptions
*The Independent variables should be linearly related to the dependent variables.

*Every feature in the data is Normally Distributed.

*There should be little or no multi-collinearity in the data.

*The mean of the residual is zero.

*Residuals obtained should be normally distributed.

*Variance of the residual throughout the data should be same. This is known as
homoscedasticity.

*There should be little or no Auto-Correlation is the data.

Types of Linear Regression
Simple Linear Regression:
Simple Linear Regression helps to find the linear relationship between two continuous
variables, one independent and one dependent feature.
Formula can be represented as y=mx+b
Multiple Linear Regression:

Multiple linear Regression is the most common form of linear regression analysis. As a
predictive analysis, the multiple linear regression is used to explain the relationship between
one continuous dependent variable and two or more independent variables.
The independent variables can be continuous or categorical (dummy coded as appropriate).
We often use Multiple Linear Regression to do any kind of predictive analysis as the data we
get has more than 1 independent features to it.
Formula can be represented as Y=mX1+mX2+mX3…+b

Example : Predict a person's salary based on factors such as years of experience,

level of education, and age.
How does a linear Regression Work?
The whole idea of the linear Regression is to find the best fit line, which has very low error(cost
function).
This line is also called Least Square Regression Line(LSRL).
The line of best fit is described with the help of the formula y=mx+b.
where, m is the slope and b is the intercept.

Properties of the Regression line:

1. The line minimizes the sum of squared difference between the observed values(actual y-
value) and the predicted value(ŷ value)
2. The line passes through the mean of independent and dependent features.
Cost Function of Linear Regression

Cost Function is a function that measures the performance of a Machine Learning

model for given data.
Cost Function is basically the calculation of the error between predicted values and
expected values and presents it in the form of a single real number.
Linear Regression with Gradient Descent
Gradient descent is an optimization algorithm used to find the values of parameters
(coefficients) of a function that minimizes a cost function (cost).

The main aim of gradient descent is to find the best parameters of a model which gives the
highest accuracy on training as well as testing datasets. In gradient descent, The gradient is a
vector that points in the direction of the steepest increase of the function at a specific point.
Moving in the opposite direction of the gradient allows the algorithm to gradually descend
towards lower values of the function, and eventually reaching to the minimum of the function.

At the optimum, the gradient should be close to zero. Therefore, if the gradient is very small, it
may indicate that the algorithm is close to the optimal solution.

Adaptive learning rate schedules can be employed to dynamically adjust the learning rate
during training. Some techniques, like learning rate annealing or learning rate decay, can help
fine-tune the learning process and enhance convergence.
Classification
In machine learning, classification refers to a predictive modeling problem where a
class label is predicted for a given example of input data. Examples of classification
problems include, classify if email is spam or not. Given a handwritten character,
classify it as one of the known characters.

Classification is a process of categorizing a given set of data into classes, It can be

performed on both structured or unstructured data. The process starts with predicting
the class of given data points. The classes are often referred to as target, label or
categories.
Logistic Regression
Logistic regression is a supervised machine learning algorithm mainly used for binary
classification where we use a logistic function, also known as a sigmoid function that takes input
as independent variables and produces a probability value between 0 and 1.

For example, we have two classes Class 0 and Class 1 if the value of the logistic function for an
input is greater than 0.5 (threshold value) then it belongs to Class 1 it belongs to Class 0. It’s
referred to as regression because it is the extension of linear regression but is mainly used for
classification problems. The difference between linear regression and logistic regression is that
linear regression output is the continuous value that can be anything while logistic regression
predicts the probability that an instance belongs to a given class or not.
Sigmoid Function
Logistic Regression relies on the logistic function to convert the output into a probability score.
This score represents the probability that an observation belongs to a particular class. The S-
shaped curve assists in thresholding and categorising data into binary outcomes.

The sigmoid function is also known as the

squashing function, as it takes the input from
the previously hidden layer and squeezes it
between 0 and 1. So a value fed to the sigmoid
function will always return a value between 0
and 1, no matter how big or small the value is
fed.
Evaluation Metrics
Confusion Metrics
In the field of machine learning a confusion matrix, also known as error matrix, is a specific
table layout that allows visualization of the performance of an algorithm
Accuracy
Overall correctness of the model
Accuracy=Total Number of Predictions/True Positives+True Negatives

Precision
Ability to avoid false positives
Precision=True Positives / False Positives+True Positives

Recall
Ability to capture all positive instances
Recall=True Positives / False Negatives +True Positives

F1-Score
F1 Score=2*Precision*Recall/Precision + Recall
Natural Language Processing
Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the
interaction between computers and humans in natural language. It involves the use of
computational techniques to process and analyze natural language data, such as text and
speech, with the goal of understanding the meaning behind the language.

NLP techniques are widely used in a variety of applications such as search engines,
machine translation, sentiment analysis, text summarization, question answering,
and many more.

The goal of NLP is to develop algorithms and models that enable computers to understand,
interpret, generate, and manipulate human languages.
Fundamental Concepts in NLP
Tokenization
Tokenization is the process of breaking down a text into smaller units called tokens. Tokens are
the basic building blocks, which can be words, phrases, or even characters, depending on the
level of granularity required.

Tokenization is a crucial initial step in NLP, providing a structured way to analyze and
understand textual data. It helps convert unstructured text into a format that can be easily
processed by algorithms.

For the sentence "The quick brown fox jumps over the lazy dog," tokenization would result in
individual tokens such as "The," "quick," "brown," "fox," "jumps," "over," "the," "lazy," "dog."
Stemming
Stemming is a process in which words are reduced to their root or base form by removing
suffixes. The goal is to bring related words to a common base form, simplifying analysis and
improving information retrieval.

Stemming helps in reducing the dimensionality of the feature space and treating variations of
words as a single entity. This can be beneficial in tasks like document clustering and search
engines.Also in the tasks like classification, tenses of words are rendered irrelevant once
stemming is applied.

The stem of words like "running," "runner," and "ran" is "run."

Lemmatization

Lemmatization is a more advanced form of reducing words to their base form, known
as a lemma. Unlike stemming, lemmatization considers the meaning of words and
aims to transform them into their dictionary or canonical form.

Lemmatization is particularly useful in applications where semantic meaning is crucial, such as

question-answering systems, chatbots, and sentiment analysis.

The lemma for words like "running," "runner," and "ran" is "run." However, lemmatization would
also consider the lemma of "better" as "good."
Padding
For tasks like text classification or sentiment analysis, it's common to represent
sentences as sequences of word embeddings. However, these sentences may have
different lengths.

Padding is applied by adding zeros or a special token to the end of shorter sequences, making
all sequences equal in length. This ensures that batches of input data fed into a neural network
have consistent dimensions.

Example: If you have sentences "I love NLP" and "It's fascinating," and you're representing
them as sequences of word embeddings, you might pad the first sentence with zeros to match
the length of the second sentence.
Pruning
In applications like text classification or language modeling, the learned embeddings or weights
may contain information that is not crucial for the model's performance.

Pruning involves identifying and removing less important connections, neurons, or even entire
layers from the neural network, resulting in a more compact and efficient model.

In a sentiment analysis model, during training, certain words may end up having very low
weights, indicating they contribute less to the overall sentiment prediction. Pruning involves
removing or reducing the influence of these less important words, streamlining the model.
Stopwords
Stopwords are common words that are often removed from text during the pre-processing
phase in natural language processing (NLP). These words are generally the most frequently
occurring words in a language and are often considered to be of little value in terms of
conveying specific meaning. Including them in certain NLP tasks may introduce noise and
adversely affect the performance of models.

Examples of stopwords in English include "the," "and," "is," "in," etc.

In a practical NLP pipeline, you might first tokenize a document, then apply
stemming or lemmatization to normalize the words, creating a more consistent and
manageable dataset for analysis or model training.
Word Embeddings

Machine learning models take vectors (arrays of numbers) as input. When working with text, the
first thing you must do is come up with a strategy to convert strings to numbers (or to
"vectorize" the text) before feeding it to the model. In this section, you will look at three
strategies for doing so.

Three strategies to do this:

One hot encoding

TF-IDF

Word2Vec
One-Hot Encoding
Each word is represented as a sparse vector with a 1 in the position corresponding
to the word's index in the vocabulary and 0s elsewhere.

This approach is inefficient. A one-hot encoded vector is sparse (meaning, most indices
are zero). Imagine you have 10,000 words in the vocabulary. To one-hot encode each
word, you would create a vector where 99.99% of the elements are zero.
TF-IDF
TF-IDF vectorization involves calculating the TF-IDF score for every word in your
corpus relative to that document and then putting that information into a vector
(see image below using example documents “A” and “B”). Thus each document in
your corpus would have its own vector, and the vector would have a TF-IDF score
for every single word in the entire collection of documents. Once you have these
vectors you can apply them to various use cases such as seeing if two documents
are similar by comparing their TF-IDF vector using cosine similarity.

In TF-IDF, the term frequency (TF) component measures how often a term occurs in a
document, while the inverse document frequency (IDF) component penalizes terms that are
common across many documents. The resulting TF-IDF score reflects the importance of a term
in a specific document within a larger corpus.
Word2Vec
Word2Vec is a popular technique in
natural language processing (NLP)
that is used to represent words as
continuous vector spaces. Developed
by researchers at Google, Word2Vec
captures semantic relationships
between words by representing them
as dense vectors in a high-dimensional
space. The key idea behind Word2Vec
is that words with similar meanings or
contexts are mapped to similar vectors
in this space.
Architectures:
Continuous Bag of Words (CBOW):
CBOW predicts the target word (center word) based on its surrounding context (context
words). The model is trained to predict the target word given a window of context words.

The architecture uses a neural network with a hidden layer to learn the word
embeddings.
Skip Grams:
Skip-gram, on the other hand, predicts the context words based on a given target word.
The model is trained to predict the context words given a target word.
Like CBOW, it uses a neural network with a hidden layer.
In both architectures, the training objective is to adjust the model's parameters (word vectors) so that the
predicted words match the actual context words as closely as possible. The word vectors obtained after
training are dense and capture semantic relationships, making them suitable for various NLP tasks.

Here's a simplified overview of how Word2Vec works:

Data Preparation:
Preprocess a large corpus of text data, tokenizing and cleaning it

Context-Target Pairs:
Create context-target pairs, where the context consists of nearby words and the target is
the word to be predicted.
Neural Network Training:
Train a neural network (CBOW or Skip-gram) on the context-target pairs to learn word
embeddings.
Word Embeddings:
Extract the learned word embeddings from the neural network.

Pyq 2
No ratings yet
Pyq 2
8 pages
Module 1 & 2
No ratings yet
Module 1 & 2
21 pages
UNIT1
No ratings yet
UNIT1
38 pages
ML Notes
No ratings yet
ML Notes
10 pages
Deep Learning
No ratings yet
Deep Learning
9 pages
Artificial Intelligence
No ratings yet
Artificial Intelligence
13 pages
Machine Learning Notes ?
No ratings yet
Machine Learning Notes ?
64 pages
Machine Learning
No ratings yet
Machine Learning
8 pages
Supervised Learning Notes 1-4
No ratings yet
Supervised Learning Notes 1-4
42 pages
Machine Learning and Regression
No ratings yet
Machine Learning and Regression
8 pages
Unit 3 Machine Learning
No ratings yet
Unit 3 Machine Learning
12 pages
Types of Machine Learning
No ratings yet
Types of Machine Learning
63 pages
Zzplagiarism
No ratings yet
Zzplagiarism
23 pages
ML DL NLP Definitions
No ratings yet
ML DL NLP Definitions
22 pages
Machine Learning
No ratings yet
Machine Learning
33 pages
Unit 2 - NOTES1 - ML
No ratings yet
Unit 2 - NOTES1 - ML
35 pages
DS - UNIT - III - QB & Ans
No ratings yet
DS - UNIT - III - QB & Ans
25 pages
ML
No ratings yet
ML
5 pages
Codes
No ratings yet
Codes
3 pages
Class 8_Linear Regression
No ratings yet
Class 8_Linear Regression
56 pages
Unit I
No ratings yet
Unit I
14 pages
Machine Learning QB
No ratings yet
Machine Learning QB
32 pages
AI
No ratings yet
AI
52 pages
11 Most Common Machine Learning Algorithms Explained in A Nutshell by Soner Yıldırım Towards Data Science
No ratings yet
11 Most Common Machine Learning Algorithms Explained in A Nutshell by Soner Yıldırım Towards Data Science
16 pages
Introduction To AI and ML
No ratings yet
Introduction To AI and ML
22 pages
Representation Learning
No ratings yet
Representation Learning
6 pages
ML final
No ratings yet
ML final
92 pages
Information Retrieval Important questions
No ratings yet
Information Retrieval Important questions
20 pages
ML & DL Notes
No ratings yet
ML & DL Notes
30 pages
Evolutional Study On KNN and K-Means Algorithms (SP)
No ratings yet
Evolutional Study On KNN and K-Means Algorithms (SP)
9 pages
ML notes
No ratings yet
ML notes
10 pages
unit V
No ratings yet
unit V
67 pages
CH 4
No ratings yet
CH 4
106 pages
What Are The Types of Machine Learning?
100% (1)
What Are The Types of Machine Learning?
24 pages
Ch-2 Supervised Machine Learning
No ratings yet
Ch-2 Supervised Machine Learning
48 pages
Technical_Report
No ratings yet
Technical_Report
5 pages
Unit II Deep Learning
No ratings yet
Unit II Deep Learning
11 pages
Machine Learning
No ratings yet
Machine Learning
19 pages
Workshop M3A - W6 - Linear Regression
No ratings yet
Workshop M3A - W6 - Linear Regression
16 pages
Ca10bd6d De86 4bae 9427 c60d433d2076 Supervised Learning
No ratings yet
Ca10bd6d De86 4bae 9427 c60d433d2076 Supervised Learning
17 pages
Machine Learning
No ratings yet
Machine Learning
22 pages
Supervised ML Algorithms
No ratings yet
Supervised ML Algorithms
9 pages
ML Doc1
No ratings yet
ML Doc1
14 pages
ML Unit 1
No ratings yet
ML Unit 1
27 pages
FML AAT (Techtalk)
No ratings yet
FML AAT (Techtalk)
25 pages
Wa0000.
No ratings yet
Wa0000.
26 pages
Machine Learning Notes
No ratings yet
Machine Learning Notes
17 pages
Plagiarism
No ratings yet
Plagiarism
24 pages
Supervised and Unsupervised Machine Learning Algorithms
No ratings yet
Supervised and Unsupervised Machine Learning Algorithms
3 pages
Linear Regression in Machine learning - GeeksforGeeks
No ratings yet
Linear Regression in Machine learning - GeeksforGeeks
25 pages
3.popular Machine Learning Algorithm
No ratings yet
3.popular Machine Learning Algorithm
11 pages
DL DL2 DL3 Merged
No ratings yet
DL DL2 DL3 Merged
11 pages
MACHINE LEARNING 1-5 (Ai &DS)
100% (1)
MACHINE LEARNING 1-5 (Ai &DS)
60 pages
Part 2
No ratings yet
Part 2
10 pages
What Is Data?
No ratings yet
What Is Data?
8 pages
1 (1)
No ratings yet
1 (1)
7 pages
Machine Learning
No ratings yet
Machine Learning
115 pages
Aiml - 04 - 28
No ratings yet
Aiml - 04 - 28
4 pages
Statistical Classification: Fundamentals and Applications
From Everand
Statistical Classification: Fundamentals and Applications
Fouad Sabry
No ratings yet
Alternating Decision Tree: Fundamentals and Applications
From Everand
Alternating Decision Tree: Fundamentals and Applications
Fouad Sabry
No ratings yet
PredictingTitanicSurvivorsusing by Applying Exploratory Data Anyltics and ML
No ratings yet
PredictingTitanicSurvivorsusing by Applying Exploratory Data Anyltics and ML
7 pages
Data Science Interview Questions and Answer
100% (1)
Data Science Interview Questions and Answer
41 pages
Ihic-2022 PPT Paper - Id 100
No ratings yet
Ihic-2022 PPT Paper - Id 100
11 pages
PDF Data Smart: Using Data Science To Transform Information Into Insight 2nd Edition Jordan Goldmeier Download
100% (4)
PDF Data Smart: Using Data Science To Transform Information Into Insight 2nd Edition Jordan Goldmeier Download
62 pages
Predictive+Modelling+-+Logistic+Regression+-+Student+Version-New2.3.ipynb - Colaboratory
No ratings yet
Predictive+Modelling+-+Logistic+Regression+-+Student+Version-New2.3.ipynb - Colaboratory
12 pages
Python MP Report PDF
No ratings yet
Python MP Report PDF
61 pages
Using Multivariate Statistics 7th Edition Barbara G. Tabachnick download pdf
100% (1)
Using Multivariate Statistics 7th Edition Barbara G. Tabachnick download pdf
47 pages
JRST-Inquiry-based Science Instruction - What Is It and Does It Matter - Results From A Research Synthesis Years 1984 To 2002
No ratings yet
JRST-Inquiry-based Science Instruction - What Is It and Does It Matter - Results From A Research Synthesis Years 1984 To 2002
24 pages
Relationship Between General Practice Capitation Funding and The Quality of Primary Care in England: A Cross-Sectional, 3 - Year Study
No ratings yet
Relationship Between General Practice Capitation Funding and The Quality of Primary Care in England: A Cross-Sectional, 3 - Year Study
11 pages
DS Fresher Resume
No ratings yet
DS Fresher Resume
3 pages
Final report of mini project
No ratings yet
Final report of mini project
52 pages
Early Onset of Drug and Polysubstance Use As Predictors of Injection Drug Use Among
No ratings yet
Early Onset of Drug and Polysubstance Use As Predictors of Injection Drug Use Among
6 pages
Introduction To R
No ratings yet
Introduction To R
36 pages
Data Science Kelly 2018
No ratings yet
Data Science Kelly 2018
6 pages
Lec 05
No ratings yet
Lec 05
53 pages
The R Book, 3rd Edition Elinor Jones - Quickly access the ebook and start reading today
100% (3)
The R Book, 3rd Edition Elinor Jones - Quickly access the ebook and start reading today
62 pages
Infectious Disease Modelling: Lamiaa A. Amar, Ashraf A. Taha, Marwa Y. Mohamed
No ratings yet
Infectious Disease Modelling: Lamiaa A. Amar, Ashraf A. Taha, Marwa Y. Mohamed
13 pages
Do Gender, Educational Level, Religiosity, and Work Experience Affect The Ethical Decision-Making of U.S. Accountants?
No ratings yet
Do Gender, Educational Level, Religiosity, and Work Experience Affect The Ethical Decision-Making of U.S. Accountants?
17 pages
3 - Logistic Regression (v3)
No ratings yet
3 - Logistic Regression (v3)
31 pages
Probabilistic Models in The Study of Language
No ratings yet
Probabilistic Models in The Study of Language
274 pages
Delisi & Walters, 2011
No ratings yet
Delisi & Walters, 2011
15 pages
Artificial Intelligence and Image Processing Techniques For Blood Group Prediction
100% (1)
Artificial Intelligence and Image Processing Techniques For Blood Group Prediction
7 pages
18CSC305J - UNIT-4.pptx - 18CSC305J - UNIT-4
No ratings yet
18CSC305J - UNIT-4.pptx - 18CSC305J - UNIT-4
77 pages
Unit - 2
No ratings yet
Unit - 2
3 pages
Chi-Square Test For Feature Selection in Machine Learning
No ratings yet
Chi-Square Test For Feature Selection in Machine Learning
15 pages
Faccio Et Al .2006. Political Connections and Corporate Bailouts
No ratings yet
Faccio Et Al .2006. Political Connections and Corporate Bailouts
44 pages
DATA MINING Chapter 1 and 2 Lect Slide
No ratings yet
DATA MINING Chapter 1 and 2 Lect Slide
47 pages
Beyond - Cost - Model (Price Elasticity)
No ratings yet
Beyond - Cost - Model (Price Elasticity)
29 pages
Machine Learning Based Framework
No ratings yet
Machine Learning Based Framework
45 pages
The Acute: Chronic Workload Ratio Predicts Injury: High Chronic Workload May Decrease Injury Risk in Elite Rugby..
No ratings yet
The Acute: Chronic Workload Ratio Predicts Injury: High Chronic Workload May Decrease Injury Risk in Elite Rugby..
9 pages