Naive Bayes Algorithm
Bayes’ Theorem
Bayes' Theorem finds the probability of an event occurring given the probability of another event that has already occurred. Bayes' theorem is stated mathematically as the following equation:

P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}

With regard to our dataset, we can apply Bayes' theorem in the following way:

P(y|X) = \frac{P(X|y)\,P(y)}{P(X)}

where y is the class variable and X is a dependent feature vector (of size n):

X = (x_1, x_2, x_3, \ldots, x_n)
Applying the naive assumption that the features are mutually independent, both the likelihood and the evidence factorize into per-feature terms. Hence, we reach the result:

P(y|x_1, \ldots, x_n) = \frac{P(x_1|y)\,P(x_2|y)\cdots P(x_n|y)\,P(y)}{P(x_1)\,P(x_2)\cdots P(x_n)}

which can be expressed as:

P(y|x_1, \ldots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i|y)}{P(x_1)\,P(x_2)\cdots P(x_n)}
Now, as the denominator remains constant for a given input, we can remove that term:

P(y|x_1, \ldots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i|y)

To build a classifier, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with the maximum probability. This can be expressed mathematically as:

y = \underset{y}{\mathrm{argmax}}\; P(y)\prod_{i=1}^{n} P(x_i|y)
For example, consider a "play golf" dataset with the features Outlook, Temperature, Humidity and Wind, and suppose we want to classify a new instance:

today = (Sunny, Hot, Normal, False)

The probability of playing golf is given by:

P(Yes|today) = \frac{P(Sunny\,Outlook|Yes)\,P(Hot\,Temperature|Yes)\,P(Normal\,Humidity|Yes)\,P(No\,Wind|Yes)\,P(Yes)}{P(today)}

and the probability of not playing golf is given by:

P(No|today) = \frac{P(Sunny\,Outlook|No)\,P(Hot\,Temperature|No)\,P(Normal\,Humidity|No)\,P(No\,Wind|No)\,P(No)}{P(today)}
Since P(today) is common to both probabilities, we can ignore P(today) and find the proportional probabilities as:

P(Yes|today) \propto \frac{3}{9}\cdot\frac{2}{9}\cdot\frac{6}{9}\cdot\frac{6}{9}\cdot\frac{9}{14} \approx 0.02116

and

P(No|today) \propto \frac{3}{5}\cdot\frac{2}{5}\cdot\frac{1}{5}\cdot\frac{2}{5}\cdot\frac{5}{14} \approx 0.0068
Now, since

P(Yes|today) + P(No|today) = 1

these numbers can be converted into probabilities by making their sum equal to 1 (normalization):

P(Yes|today) = \frac{0.02116}{0.02116 + 0.0068} \approx 0.76

and

P(No|today) = \frac{0.0068}{0.02116 + 0.0068} \approx 0.24

Since P(Yes|today) > P(No|today), the prediction is that golf would be played, i.e. "Yes".
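The same arithmetic can be checked with a few lines of Python, reusing the proportional values computed above:

# Proportional (unnormalized) posteriors from the play-golf example above
p_yes = (3/9) * (2/9) * (6/9) * (6/9) * (9/14)   # ~0.02116
p_no = (3/5) * (2/5) * (1/5) * (2/5) * (5/14)    # ~0.0068

# Normalize so the two posteriors sum to 1
total = p_yes + p_no
print("P(Yes | today) ~", round(p_yes / total, 2))  # ~0.76
print("P(No  | today) ~", round(p_no / total, 2))   # ~0.24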
Gaussian Naive Bayes
The approach above works for discrete (categorical) features. When a feature is continuous, Gaussian Naive Bayes assumes that the values associated with each class follow a normal (Gaussian) distribution, so the conditional probability is given by:

P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\,\exp\!\left(-\frac{(x_i-\mu_y)^2}{2\sigma_y^2}\right)
Python
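A minimal sketch of such a classifier, assuming scikit-learn's GaussianNB and the iris dataset with a 60/40 train/test split; it yields an accuracy close to the figure shown below:

# Gaussian Naive Bayes on the iris dataset (dataset and split are assumptions)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

X, y = load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# Train the classifier and predict on the held-out test set
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

print("Gaussian Naive Bayes model accuracy(in %):",
      metrics.accuracy_score(y_test, y_pred) * 100)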
Output:
Gaussian Naive Bayes model accuracy(in %): 95.0
Multinomial Naive Bayes
Feature vectors represent the frequencies with which certain
events have been generated by a multinomial distribution. This is
the event model typically used for document classification.
Bernoulli Naive Bayes
In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing the inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term-occurrence features (i.e., whether a word occurs in a document or not) are used rather than term frequencies (i.e., how often a word occurs in the document).
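As an illustration, here is a minimal sketch of both event models on a tiny, made-up set of documents, using scikit-learn's CountVectorizer, MultinomialNB and BernoulliNB (the texts and labels are hypothetical):

# Toy document classification with multinomial and Bernoulli Naive Bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

docs = ["free prize money now", "meeting schedule for monday",
        "win money free offer", "project status meeting notes"]   # hypothetical texts
labels = [1, 0, 1, 0]                                             # 1 = spam, 0 = not spam

# Multinomial model: features are word counts (term frequencies)
count_vec = CountVectorizer().fit(docs)
mnb = MultinomialNB().fit(count_vec.transform(docs), labels)

# Bernoulli model: features are binary word occurrences
binary_vec = CountVectorizer(binary=True).fit(docs)
bnb = BernoulliNB().fit(binary_vec.transform(docs), labels)

test = ["free money offer"]
print(mnb.predict(count_vec.transform(test)))   # expected: [1]
print(bnb.predict(binary_vec.transform(test)))  # expected: [1]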
Advantages of Naive Bayes Classifier
Easy to implement and computationally efficient.
Effective in cases with a large number of features.
Performs well even with limited training data.
It performs well in the presence of categorical features.
For numerical features, the data is assumed to come from normal distributions.
Disadvantages of Naive Bayes Classifier
Assumes that features are independent, which may not
always hold in real-world data.
Can be influenced by irrelevant attributes.
May assign zero probability to unseen events, leading to
poor generalization.
Applications of Naive Bayes Classifier
Spam Email Filtering: Classifies emails as spam or non-spam based on features.
Text Classification: Used in sentiment analysis,
document categorization, and topic classification.
Medical Diagnosis: Helps in predicting the likelihood of
a disease based on symptoms.
Credit Scoring: Evaluates creditworthiness of individuals
for loan approval.
Weather Prediction: Classifies weather conditions
based on various factors.
As we reach the end of this article, here are some important points to ponder:
In spite of their apparently over-simplified assumptions,
naive Bayes classifiers have worked quite well in many
real-world situations, famously document classification
and spam filtering. They require a small amount of
training data to estimate the necessary parameters.
Naive Bayes learners and classifiers can be extremely fast
compared to more sophisticated methods. The decoupling
of the class conditional feature distributions means that
each distribution can be independently estimated as a
one dimensional distribution. This in turn helps to
alleviate problems stemming from the curse of
dimensionality.
Conclusion
In conclusion, Naive Bayes classifiers, despite their simplified
assumptions, prove effective in various applications, showcasing
notable performance in document classification and spam
filtering. Their efficiency, speed, and ability to work with limited
data make them valuable in real-world scenarios, compensating
for their naive independence assumption.
Frequently Asked Questions on Naive Bayes
Classifiers
What is Naive Bayes real example?
Naive Bayes is a simple probabilistic classifier based on Bayes’
theorem. It assumes that the features of a given data point are
independent of each other, which is often not the case in reality.
However, despite this simplifying assumption, Naive Bayes has
been shown to be surprisingly effective in a wide range of
applications.
Why is it called Naive Bayes?
Naive Bayes is called “naive” because it assumes that the
features of a data point are independent of each other. This
assumption is often not true in reality, but it does make the
algorithm much simpler to compute.
What is an example of a Bayes classifier?
A Bayes classifier is a type of classifier that uses Bayes’ theorem
to compute the probability of a given class for a given data point.
Naive Bayes is one of the most common types of Bayes
classifiers.
What is better than Naive Bayes?
There are several classifiers that are better than Naive Bayes in
some situations. For example, logistic regression is often more
accurate than Naive Bayes, especially when the features of a data
point are correlated with each other.
Can Naive Bayes probability be greater than 1?
No, the probability of an event cannot be greater than 1. The
probability of an event is a number between 0 and 1, where 0
indicates that the event is impossible and 1 indicates that the
event is certain.
In this article, we will discuss the mathematical intuition behind Naive Bayes classifiers, and we will also see how to implement them in Python.
This model is easy to build and is mostly used for large datasets. It is a probabilistic machine learning model that is used for classification problems. The core of the classifier depends on Bayes' theorem with an assumption of independence among the predictors: it is called "naive" because it treats the features as independent even when they may not be. In a real-world scenario, there is hardly any situation where the features are truly independent of one another.
Naive Bayes does seem to be a simple yet powerful algorithm. But why is it so popular? Since it is a probabilistic approach, predictions can be made very quickly, so it copes well with large datasets.
Before we dive deeper into this topic, we need to understand what conditional probability is and what Bayes' theorem says.
The Naive Bayes algorithm is a popular and simple classification algorithm. It is a simple but powerful method in machine learning for guessing the categories of things: imagine sorting emails into spam or inbox. Naive Bayes looks at each word (like a clue) and predicts how likely an email is to be spam based on past emails. It assumes these words aren't connected (not always true!), but it's fast and works well, making it a popular choice for many tasks.
Conditional Probability
Suppose I ask you to pick a card from the deck and find the probability of getting a king, given that the card drawn is a club. Observe carefully that here I have mentioned a condition: the card is a club. Now, while calculating the probability, my denominator will not be 52; instead, it will be 13, because there are only 13 clubs in a deck. Since we have only one king in clubs, the probability of getting a king given that the card is a club is 1/13.
Consider a random experiment of tossing 2 coins. The sample space here will be:

S = {HH, HT, TH, TT}

If a person is asked to find the probability of getting at least one tail, his answer would be 3/4 = 0.75.
Now suppose this same experiment is performed by another person, but now we give him the condition that both the coins should have heads. This means that if event A, "both the coins should have heads", has happened, then the elementary outcomes {HT, TH, TT} could not have happened. Hence, in this situation, the probability of getting a tail is 0.
From the above examples, we observe that the probability may change if some additional information is given to us. This is exactly the case while building any machine learning model: we need to find the output given some features.
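Formally, the conditional probability of A given B is the probability of both events divided by the probability of B; applying it to the card example above:

P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(King|Clubs) = \frac{P(King \cap Clubs)}{P(Clubs)} = \frac{1/52}{13/52} = \frac{1}{13}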
Bayes' Rule
Now we are prepared to state one of the most useful results in conditional probability: Bayes' rule. Bayes' theorem, named after Thomas Bayes and published in 1763, provides a means for calculating the probability of an event given some information:

P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}

Basically, we are trying to find the probability of event A, given that event B is true. Here P(A) is called the prior probability, which means it is the probability of the event before the evidence is seen, and P(A|B) is the posterior probability, i.e., the probability of the event after the evidence is seen.
Bayes’ rule provides us with the formula for the probability of Y given some
feature X. In real-world problems, we hardly find any case where there is only
one feature.
When the features are independent, we can extend Bayes' rule to what is called Naive Bayes, which assumes that the features are independent, meaning that changing the value of one feature does not influence the values of the other features. Naive Bayes can be used for various tasks like face recognition, weather prediction, medical diagnosis, text classification, and more.
When there are multiple X variables, we simplify the rule by assuming that the X's are independent, so:

P(y|x_1, \ldots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i|y)

Since the denominator is the same for every class, it is your choice whether you want to remove it or not; removing the denominator will help you save on computation. There are a whole lot of formulas mentioned here, but worry not, we will try to understand them with an example.
· All the variables are independent. That is, if the animal is a Dog, that doesn't mean its Size will also be Medium.
· All the predictors have an equal effect on the outcome. That is, the animal being a Dog does not have more importance in deciding whether we can pet it or not.
We should try to apply the Naive Bayes classifier formula to a dataset of this kind (animals described by their Type, Size and Color). We also need the class probabilities P(y), which are calculated from the frequency tables of the training data. Now, if we send in our test data, suppose test = (Cow, Medium, Black), we compute P(Yes|test) and P(No|test) in the same way as in the golf example, and we know that P(Yes|test) + P(No|test) = 1. We see here that P(Yes|test) > P(No|test), so the prediction that we can pet this animal is "Yes".
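Here is a minimal sketch in Python of the same by-hand computation, using a small, entirely hypothetical pet dataset with features Type, Size and Color (the rows and counts are made up for illustration):

from collections import Counter

# Hypothetical training data: (Type, Size, Color) -> Pet? (Yes/No)
rows = [
    ("Dog", "Medium", "Black", "Yes"), ("Dog", "Small", "White", "Yes"),
    ("Cow", "Big", "Brown", "No"), ("Cow", "Medium", "Black", "Yes"),
    ("Cat", "Small", "Black", "Yes"), ("Cow", "Big", "White", "No"),
]
classes = ["Yes", "No"]
class_counts = Counter(r[-1] for r in rows)

def conditional(feature_index, value, label):
    # P(feature = value | class = label) estimated from raw counts
    in_class = [r for r in rows if r[-1] == label]
    matches = sum(1 for r in in_class if r[feature_index] == value)
    return matches / len(in_class)

test = ("Cow", "Medium", "Black")
scores = {}
for label in classes:
    score = class_counts[label] / len(rows)    # P(y)
    for i, value in enumerate(test):           # product of P(x_i | y)
        score *= conditional(i, value, label)
    scores[label] = score

# Normalize so the posteriors sum to 1, then pick the larger one
total = sum(scores.values())
posteriors = {k: (v / total if total else 0.0) for k, v in scores.items()}
print(posteriors, "->", max(posteriors, key=posteriors.get))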
So far we have seen how to compute the probabilities when the predictors take discrete values. But what if they are continuous? For this, we need to make some more assumptions regarding the distribution of each feature. The different Naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of P(x_i|y).
Gaussian Naive Bayes is used when we assume that all the continuous variables follow a normal (Gaussian) distribution. The conditional probability changes here since we have continuous values now. The probability density function (PDF) of a normal distribution is given by:

P(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_y^2}}\,\exp\!\left(-\frac{(x_i-\mu_y)^2}{2\sigma_y^2}\right)

We can use this formula to compute the likelihoods if our data is continuous.
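As a quick illustration, the PDF above can be evaluated directly from a feature value and the per-class mean and variance (the numbers below are made up):

import math

def gaussian_likelihood(x, mean, var):
    # P(x_i | y) for a normal distribution with the class's mean and variance
    return (1.0 / math.sqrt(2 * math.pi * var)) * math.exp(-((x - mean) ** 2) / (2 * var))

# Hypothetical numbers: feature value 66, class mean 73, class variance 6.2**2
print(gaussian_likelihood(66, 73, 6.2 ** 2))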
Endnotes
Naive Bayes algorithms are mostly used in face recognition, weather prediction, spam filtering, medical diagnosis, and text classification. You have already taken your first step towards mastering this algorithm; all that is left now is practice.
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.
Once the suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects. Some popular unsupervised learning algorithms are listed below:
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
Linear Regression:
Linear regression models the relationship between a dependent variable y and an independent variable x with a straight line:

y = a_0 + a_1 x + \varepsilon

where a_0 is the intercept, a_1 is the linear regression coefficient (the slope of the line), and ε is the random error.
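A minimal sketch of fitting such a line with scikit-learn's LinearRegression (the data points are made up):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data roughly following y = 2 + 3x plus noise
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

model = LinearRegression().fit(x, y)
print("a0 (intercept):", model.intercept_)
print("a1 (coefficient):", model.coef_[0])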
Logistic Regression:
o Logistic regression is one of the most popular machine learning algorithms that come under supervised learning techniques.
o It can be used for Classification as well as for Regression problems, but
mainly used for Classification problems.
o Logistic regression is used to predict the categorical dependent variable
with the help of independent variables.
o The output of a logistic regression model can only be between 0 and 1.
o Logistic regression can be used where the probability of one of two classes is required, such as whether it will rain today or not (0 or 1, true or false, etc.).
o Logistic regression is based on the concept of Maximum Likelihood
estimation. According to this estimation, the observed data should be
most probable.
o In logistic regression, we pass the weighted sum of inputs through an activation function that can map values to between 0 and 1. Such an activation function is known as the sigmoid function, and the curve obtained is called the sigmoid curve or S-curve; a minimal sketch of this function is given below.
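A minimal sketch of the sigmoid function described above:

import math

def sigmoid(z):
    # Map the weighted sum of inputs z to a value between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

# The output approaches 0 for large negative z and 1 for large positive z
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 4))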
K-Means Clustering
K-means is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those in the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster, and continue until the assignments stop changing. (A minimal code sketch of these steps is given below.)
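As referenced above, here is a minimal NumPy sketch of these steps; it is an illustration under simplifying assumptions (toy data, Euclidean distance, a fixed iteration cap), not the scikit-learn implementation used later:

import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step-2: pick K random data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step-3: assign each point to its closest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step-4: recompute each centroid as the mean of its cluster
        new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        # Step-5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy 2-D data (two obvious blobs), K = 2
data = np.array([[1, 2], [1.5, 1.8], [1, 0.6], [8, 8], [9, 11], [8, 9.5]])
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)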
Suppose we have two variables M1 and M2. The x-y axis scatter plot of
these two variables is given below:
o Let's take the number of clusters k, i.e., K = 2, to identify the dataset and to put the points into different clusters. It means here we will try to group these data points into two different clusters.
o We need to choose some random K points or centroids to form the clusters. These points can be either points from the dataset or any other points. So, here we are selecting two points as the K points, which are not part of our dataset. Consider the below image:
o Now we will assign each data point of the scatter plot to its closest K-point
or centroid. We will compute it by applying some mathematics that we
have studied to calculate the distance between two points. So, we will
draw a median between both the centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are near the K1 or blue centroid, and the points to the right of the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
o Next, we will reassign each datapoint to the new centroid. For this, we will
repeat the same process of finding a median line. The median will be like
below image:
From the above image, we can see that one yellow point is on the left side of the line and two blue points are to the right of the line. So, these three points will be assigned to new centroids.
o We can see in the above image that there are no dissimilar data points on either side of the line, which means our model is formed. Consider the below image:
As our model is ready, so we can now remove the assumed centroids, and
the two final clusters will be as shown in the below image:
How to choose the value of "K number of clusters" in
K-means Clustering?
The performance of the K-means clustering algorithm depends upon the highly efficient clusters that it forms. But choosing the optimal number of
clusters is a big task. There are some different ways to find the optimal
number of clusters, but here we are discussing the most appropriate
method to find the number of clusters or value of K. The method is given
below:
Elbow Method
The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines
the total variations within a cluster. The formula to calculate the value of WCSS (for 3 clusters) is given below:

WCSS = \sum_{P_i \in Cluster_1} distance(P_i, C_1)^2 + \sum_{P_i \in Cluster_2} distance(P_i, C_2)^2 + \sum_{P_i \in Cluster_3} distance(P_i, C_3)^2

Here, \sum_{P_i \in Cluster_1} distance(P_i, C_1)^2 is the sum of the squares of the distances between each data point and its centroid within cluster 1, and the same holds for the other two terms.
To measure the distance between data points and centroid, we can use
any method such as Euclidean distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
o It executes K-means clustering on the given dataset for different K values (e.g., ranging from 1 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve between the calculated WCSS values and the number of clusters K.
o The sharp point of bend, where the plot looks like an arm, is taken as the best value of K.
Since the graph shows the sharp bend, which looks like an elbow, hence it
is known as the elbow method. The graph for the elbow method looks like
the below image:
Note: We can choose the number of clusters equal to the given data points. If we choose
the number of clusters equal to the data points, then the value of WCSS becomes zero,
and that will be the endpoint of the plot.
o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters
Step-1: Data Pre-processing
Importing Libraries
As we did in previous topics, firstly, we will import the libraries for our
model, which is part of data pre-processing. The code is given below:
1. # importing libraries
2. import numpy as nm
3. import matplotlib.pyplot as mtp
4. import pandas as pd
In the above code, numpy is imported for performing mathematical calculations, matplotlib for plotting the graph, and pandas for managing the dataset.
By executing the dataset-import line (a sketch is given below), we will get our dataset in the Spyder IDE.
Here we don't need any dependent variable for the data pre-processing step, as it is a clustering problem and we have no idea about what to determine. So we will just add a line of code for the matrix of features, also shown in the sketch below.
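A minimal sketch of this step; the file name Mall_Customers_data.csv and the column indices [3, 4] (annual income and spending score) are assumptions based on how the data is described later in this section:

# Importing the dataset (file name assumed)
dataset = pd.read_csv('Mall_Customers_data.csv')

# Matrix of features: the annual income and spending score columns (indices assumed)
x = dataset.iloc[:, [3, 4]].values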
Step-2: Finding the optimal number of clusters using the elbow method
As we know, the elbow method uses the WCSS concept to draw the plot by plotting the WCSS values on the Y-axis and the number of clusters on the X-axis. So we are going to calculate the WCSS for different k values ranging from 1 to 10. Below is the code for it:
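A minimal sketch consistent with the surrounding description (KMeans from sklearn.cluster, k from 1 to 10; the init and random_state values are assumptions):

# Finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans

wcss_list = []  # WCSS value for each number of clusters

for i in range(1, 11):  # range(1, 11) so that k = 10 is included
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model

mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()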
As we can see in the above code, we have used the KMeans class of the sklearn.cluster library to form the clusters.
After that, we have initialized the for loop to iterate over values of k ranging from 1 to 10; since range() in Python excludes the upper bound, it is written as 11 so that the value 10 is included.
The rest part of the code is similar as we did in earlier topics, as we have
fitted the model on a matrix of features and then plotted the graph
between the number of clusters and WCSS.
Output: After executing the above code, we will get the below output:
From the above plot, we can see the elbow point is at 5. So the number
of clusters here will be 5.
Step-3: Training the K-means algorithm on the training dataset
As we have got the number of clusters, so we can now train the model on
the dataset.
To train the model, we will use the same two lines of code as we have
used in the above section, but here instead of using i, we will use 5, as we
know there are 5 clusters that need to be formed. The code is given
below:
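A minimal sketch consistent with the description above (the init and random_state values are assumptions carried over from the elbow-method sketch):

# Training the K-means model with the chosen number of clusters
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)  # cluster label (0-4) for each customer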
The first line is the same as above for creating the object of KMeans class.
By executing the above lines of code, we will get the y_predict variable.
We can check it under the variable explorer option in the Spyder IDE.
We can now compare the values of y_predict with our original dataset.
Consider the below image:
From the above image, we can now relate that CustomerID 1 belongs to cluster 3 (as the index starts from 0, a predicted label of 2 corresponds to cluster 3), CustomerID 2 belongs to cluster 4, and so on.
Step-4: Visualizing the Clusters
The last step is to visualize the clusters. As we have 5 clusters for our
model, so we will visualize each cluster one by one.
To visualize the clusters, we will use a scatter plot drawn with the mtp.scatter() function of matplotlib; a sketch is given below.
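A minimal sketch consistent with the description that follows; the specific colors and label strings are illustrative choices:

# Visualizing the five clusters and their centroids
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s=100, c='red', label='Cluster 3')
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s=100, c='cyan', label='Cluster 4')
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s=100, c='magenta', label='Cluster 5')
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='yellow', label='Centroids')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income')
mtp.ylabel('Spending Score')
mtp.legend()
mtp.show()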
In the above lines of code, we have written a scatter call for each cluster, ranging from 1 to 5. The first argument of mtp.scatter, i.e., x[y_predict == 0, 0], selects from the matrix of features the x values of the points whose predicted label is 0, and the second argument selects the corresponding y values; the same pattern is repeated for the other clusters.
Output:
The output image clearly shows the five different clusters with different colors. The clusters are formed between two parameters of the dataset: the annual income of the customer and the spending score. We can change the colors and labels as per requirement or choice. We can also observe some points from the above patterns, which are given below:
o Cluster1 shows the customers with average salary and average spending, so we can categorize these customers as standard.
o Cluster2 shows the customer has a high income but low spending, so we
can categorize them as careful.
o Cluster3 shows the low income and also low spending so they can be
categorized as sensible.
o Cluster4 shows the customers with low income with very high spending so
they can be categorized as careless.
o Cluster5 shows the customers with high income and high spending so they
can be categorized as target, and these customers can be the most
profitable customers for the mall owner.