
3.5 Session 14 - Naïve Bayes Classifier



16CS318 DATA ANALYTICS

MODULE III

Naïve Bayes – Bayes' Theorem – Naïve Bayes Classifier

Prepared by
Ms. P. Anantha Prabha / Ms. S. Soundarya
AP/CSE


Course Outcome

C603.3 Analyze data by utilizing clustering and classification algorithms. [AN]


Session 10 Exploratory Data Analysis

Session 11 Review of Descriptive Statistics

Session 12 Overview of Classification: Overview of a Decision Tree

Session 13 Evaluating a Decision Tree, Decision Trees in R

Session 14 Naïve Bayes – Bayes' Theorem – Naïve Bayes Classifier

Session 15 Overview of Clustering – K-means Clustering using R


Learning Objective

• To provide an insight into the Naïve Bayes classifier


Outline
• Bayesian Classifier
– Principle of Bayesian classifier
– Bayes’ theorem of probability

• Naïve Bayesian Classifier


Bayesian Classifier


Bayesian Classifier
• Principle
– If it walks like a duck, quacks like a duck, then it is probably a duck


Bayesian Classifier
• A statistical classifier
– Performs probabilistic prediction, i.e., predicts class membership probabilities

• Foundation
– Based on Bayes’ Theorem.

• Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.

• Called a "Naïve" classifier because of these assumptions.


– Empirically proven to be useful.
– Scales very well.


Air-Traffic Data
Days      Season   Fog      Rain     Class
Weekday   Spring   None     None     On Time
Weekday   Winter   None     Slight   On Time
Weekday   Winter   None     None     On Time
Holiday   Winter   High     Slight   Late
Saturday  Summer   Normal   None     On Time
Weekday   Autumn   Normal   None     Very Late
Holiday   Summer   High     Slight   On Time
Sunday    Summer   Normal   None     On Time
Weekday   Winter   High     Heavy    Very Late
Weekday   Summer   None     Slight   On Time

Contd. on next slide…


Example: Bayesian Classification

• Example 8.2: Air-Traffic Data

– Let us consider a set of observations recorded in a database regarding the arrival of airplanes on routes from any airport to New Delhi under certain conditions.


Air-Traffic Data
Contd. from previous slide…

Days      Season   Fog      Rain     Class
Saturday  Spring   High     Heavy    Cancelled
Weekday   Summer   High     Slight   On Time
Weekday   Winter   Normal   None     Late
Weekday   Summer   High     None     On Time
Weekday   Winter   Normal   Heavy    Very Late
Saturday  Autumn   High     Slight   On Time
Weekday   Autumn   None     Heavy    On Time
Holiday   Spring   Normal   Slight   On Time
Weekday   Spring   Normal   None     On Time
Weekday   Spring   Normal   Heavy    On Time


Air-Traffic Data
• In this database, there are four attributes
  A = [Day, Season, Fog, Rain]
  with 20 tuples.
• The categories of classes are:
  C = [On Time, Late, Very Late, Cancelled]

• Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance, for example:

  Weekday  Winter  High  None  ???

• The classification technique should eventually map this tuple to an accurate class.


Bayesian Classifier
• In many applications, the relationship between the attribute set and the class variable is non-deterministic.
  – In other words, a test instance cannot be assigned to a class label with certainty.
  – In such a situation, the classification can be achieved probabilistically.

• The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.

• More precisely, the Bayesian classifier uses Bayes' Theorem of Probability for classification.

• Before discussing the Bayesian classifier, we take a quick look at the theory of probability and then Bayes' Theorem.


Bayes’ Theorem of Probability


Simple Probability

Definition 8.2: Simple Probability

If there are n elementary events associated with a random experiment and m of them are favourable to an event A, then the probability of happening or occurrence of A is

  P(A) = m / n


Simple Probability
• Suppose A and B are any two events, and P(A), P(B) denote the probabilities that the events A and B will occur, respectively.

• Mutually Exclusive Events:
  – Two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
    Example: Tossing a coin (two events)
             Rolling a ludo die (six events)

Can you give an example where two events are not mutually exclusive?
Hint: Tossing two identical coins, weather (sunny, foggy, warm)


Simple Probability
• Independent Events: Two events are independent if the occurrence of one does not alter the occurrence of the other.

  Example: Tossing a coin and rolling a ludo die together.
           (How many events are there?)

Can you give an example where an event is dependent on one or more other event(s)?
Hint: Receiving a message (A) through a communication channel (B) over a computer (C); rain and dating.


Joint Probability

Definition 8.3: Joint Probability

If P(A) and P(B) are the probabilities of two events, then

  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive, then P(A ∩ B) = 0.

If A and B are independent events, then P(A ∩ B) = P(A) · P(B).

Thus, for mutually exclusive events:

  P(A ∪ B) = P(A) + P(B)


Conditional Probability
Definition 8.2: Conditional Probability

If events are dependent, then their probability is expressed by conditional probability. The probability that A occurs given that B has occurred is denoted by P(A|B).

Suppose A and B are two events associated with a random experiment. The probability of A under the condition that B has already occurred, with P(B) ≠ 0, is given by

  P(A|B) = (Number of events in B which are favourable to A) / (Number of events in B)

         = (Number of events favourable to A ∩ B) / (Number of events favourable to B)

         = P(A ∩ B) / P(B)

Conditional Probability
Corollary 8.1: Conditional Probability

  P(A ∩ B) = P(A) · P(B|A), if P(A) ≠ 0
  or P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0

For three events A, B and C:

  P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)

For n events A1, A2, …, An, if all events are mutually independent of each other:

  P(A1 ∩ A2 ∩ … ∩ An) = P(A1) · P(A2) ⋯ P(An)

Note:
  P(A|B) = 0 if the events are mutually exclusive
  P(A|B) = P(A) if A and B are independent
  P(A|B) · P(B) = P(B|A) · P(A) otherwise, since P(A ∩ B) = P(B ∩ A)

Conditional Probability
• Generalization of Conditional Probability:

  P(A|B) = P(A ∩ B) / P(B) = P(B ∩ A) / P(B)

         = P(B|A) · P(A) / P(B)      [∵ P(A ∩ B) = P(B|A) · P(A) = P(A|B) · P(B)]

By the law of total probability, P(B) = P((B ∩ A) ∪ (B ∩ Ā)), where Ā denotes the complement of event A. Thus,

  P(A|B) = P(B|A) · P(A) / P((B ∩ A) ∪ (B ∩ Ā))

         = P(B|A) · P(A) / [P(B|A) · P(A) + P(B|Ā) · P(Ā)]

Conditional Probability

In general, for mutually exclusive and exhaustive events A, B and C,

  P(A|D) = P(A) · P(D|A) / [P(A) · P(D|A) + P(B) · P(D|B) + P(C) · P(D|C)]


Total Probability
Definition 8.3: Total Probability

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with E1 or E2 or … or En, then

  P(A) = P(E1) · P(A|E1) + P(E2) · P(A|E2) + ⋯ + P(En) · P(A|En)


Total Probability: An Example


Example 8.3:
A bag contains 4 red and 3 black balls. A second bag contains 2 red and 4 black balls. One bag is selected at random. From the selected bag, one ball is drawn. What is the probability that the ball drawn is red?

This problem can be answered using the concept of Total Probability:

  E1 = selecting bag I
  E2 = selecting bag II
  A  = drawing a red ball

Thus, P(A) = P(E1) · P(A|E1) + P(E2) · P(A|E2)

where P(A|E1) = probability of drawing a red ball when the first bag has been chosen, and P(A|E2) = probability of drawing a red ball when the second bag has been chosen.

Here P(E1) = P(E2) = 1/2, P(A|E1) = 4/7 and P(A|E2) = 2/6 = 1/3, so

  P(A) = (1/2)(4/7) + (1/2)(1/3) = 19/42 ≈ 0.45


Reverse Probability
Example 8.3:
A bag (Bag I) contains 4 red and 3 black balls. A second bag (Bag II) contains 2 red and 4 black balls. One ball is chosen at random; it is found to be red. What is the probability that the ball was chosen from Bag I?

Here,
  E1 = selecting bag I
  E2 = selecting bag II
  A  = drawing a red ball

We are to determine P(E1|A). Such a problem can be solved using Bayes' theorem of probability.


Bayes’ Theorem

Theorem 8.4: Bayes' Theorem

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with E1 or E2 or … or En, then

  P(Ei|A) = P(Ei) · P(A|Ei) / Σⱼ₌₁ⁿ P(Ej) · P(A|Ej)
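Applying the theorem to the reverse-probability question of Example 8.3 above (a completion of that worked example, using P(E1) = P(E2) = 1/2, P(A|E1) = 4/7 and P(A|E2) = 1/3):

  P(E1|A) = (1/2 · 4/7) / (1/2 · 4/7 + 1/2 · 1/3) = (2/7) / (19/42) = 12/19 ≈ 0.63

So the red ball most likely came from Bag I.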


Prior and Posterior Probabilities

– P(A) and P(B) are called prior probabilities
– P(A|B), P(B|A) are called posterior probabilities

Example 8.6: Prior versus Posterior Probabilities

• The table below shows that the event Y has two outcomes, namely A and B, which depend on another event X with outcomes x1, x2 and x3.

  X    Y
  x1   A
  x2   A
  x3   B
  x3   A
  x2   B
  x1   A
  x1   B
  x3   B
  x2   B
  x2   A

• Case 1: Suppose we do not have any information about the event X. Then, from the given sample space, we can calculate P(Y = A) = 5/10 = 0.5.

• Case 2: Now suppose we want to calculate P(X = x2 | Y = A) = 2/5 = 0.4.

The latter is the conditional or posterior probability, whereas the former is the prior probability.

Naïve Bayesian Classifier

• Suppose Y is a class variable and X = {X1, X2, …, Xn} is a set of attributes; each training record pairs an instance of X with a value of Y.

  INPUT (X)               CLASS (Y)
  …                       …
  (x1, x2, …, xn)         yi
  …                       …

• The classification problem can then be expressed as the class-conditional probability

  P(Y = yi | X1 = x1 AND X2 = x2 AND … AND Xn = xn)


Naïve Bayesian Classifier

• The Naïve Bayesian classifier calculates this posterior probability using Bayes' theorem, as follows.
• From Bayes' theorem on conditional probability, we have

  P(Y|X) = P(X|Y) · P(Y) / P(X)

         = P(X|Y) · P(Y) / [P(X|Y = y1) · P(Y = y1) + ⋯ + P(X|Y = yk) · P(Y = yk)]

where

  P(X) = Σᵢ₌₁ᵏ P(X|Y = yᵢ) · P(Y = yᵢ)

Note:
  P(X) is called the evidence (it is also the total probability), and it is a constant for a given instance.
  The probability P(Y|X) (also called the class-conditional probability) is therefore proportional to P(X|Y) · P(Y).
  Thus, P(Y|X) can be taken as a measure of Y given X:

  P(Y|X) ∝ P(X|Y) · P(Y)

Naïve Bayesian Classifier

• Suppose, for a given instance of X, say x = ((X1 = x1) and … and (Xn = xn)):

• Consider any two class-conditional probabilities, namely P(Y = yi | X = x) and P(Y = yj | X = x).

• If P(Y = yi | X = x) > P(Y = yj | X = x), then we say that yi is stronger than yj for the instance X = x.

• The strongest yi is the classification for the instance X = x.


Naïve Bayesian Classifier

• Example: With reference to the Air-Traffic dataset mentioned earlier, let us tabulate all the conditional and prior probabilities as shown below.

                        Class
  Attribute      On Time       Late        Very Late    Cancelled
  Day
    Weekday      9/14 = 0.64   1/2 = 0.5   3/3 = 1      0/1 = 0
    Saturday     2/14 = 0.14   1/2 = 0.5   0/3 = 0      1/1 = 1
    Sunday       1/14 = 0.07   0/2 = 0     0/3 = 0      0/1 = 0
    Holiday      2/14 = 0.14   0/2 = 0     0/3 = 0      0/1 = 0
  Season
    Spring       4/14 = 0.29   0/2 = 0     0/3 = 0      1/1 = 1
    Summer       6/14 = 0.43   0/2 = 0     0/3 = 0      0/1 = 0
    Autumn       2/14 = 0.14   0/2 = 0     1/3 = 0.33   0/1 = 0
    Winter       2/14 = 0.14   2/2 = 1     2/3 = 0.67   0/1 = 0

Naïve Bayesian Classifier

                        Class
  Attribute      On Time        Late        Very Late    Cancelled
  Fog
    None         5/14 = 0.36    0/2 = 0     0/3 = 0      0/1 = 0
    High         4/14 = 0.29    1/2 = 0.5   1/3 = 0.33   1/1 = 1
    Normal       5/14 = 0.36    1/2 = 0.5   2/3 = 0.67   0/1 = 0
  Rain
    None         5/14 = 0.36    1/2 = 0.5   1/3 = 0.33   0/1 = 0
    Slight       8/14 = 0.57    0/2 = 0     0/3 = 0      0/1 = 0
    Heavy        1/14 = 0.07    1/2 = 0.5   2/3 = 0.67   1/1 = 1

  Prior Probability  14/20 = 0.70   2/20 = 0.10   3/20 = 0.15   1/20 = 0.05


Naïve Bayesian Classifier

Instance:

  Weekday  Winter  High  Heavy  ???

Case 1: Class = On Time   : 0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013

Case 2: Class = Late      : 0.10 × 0.50 × 1.0 × 0.50 × 0.50 = 0.0125

Case 3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222

Case 4: Class = Cancelled : 0.05 × 0.0 × 0.0 × 1.0 × 1.0 = 0.0000

Case 3 is the strongest; hence the classification is Very Late.
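The case-by-case arithmetic above can be reproduced in R, the tool used elsewhere in this module. The following is a minimal sketch (not part of the original slides) that reads the prior and conditional probabilities off the tables and scores each class; the object names are illustrative.

```r
# Score each class for the instance (Weekday, Winter, High, Heavy)
# using the prior and conditional probabilities tabulated above.
classes   <- c("On Time", "Late", "Very Late", "Cancelled")

prior     <- c(14/20, 2/20, 3/20, 1/20)   # P(Ci)
p_weekday <- c(9/14,  1/2,  3/3,  0/1)    # P(Day = Weekday | Ci)
p_winter  <- c(2/14,  2/2,  2/3,  0/1)    # P(Season = Winter | Ci)
p_high    <- c(4/14,  1/2,  1/3,  1/1)    # P(Fog = High | Ci)
p_heavy   <- c(1/14,  1/2,  2/3,  1/1)    # P(Rain = Heavy | Ci)

scores <- prior * p_weekday * p_winter * p_high * p_heavy
names(scores) <- classes
round(scores, 4)          # 0.0013  0.0125  0.0222  0.0000
names(which.max(scores))  # "Very Late"
```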


Naïve Bayesian Classifier

Algorithm: Naïve Bayesian Classification

Input: Given a set of k mutually exclusive and exhaustive classes C = {c1, c2, …, ck}, which have prior probabilities P(C1), P(C2), …, P(Ck).

There is an n-attribute set A = {A1, A2, …, An}, which for a given instance has values A1 = a1, A2 = a2, …, An = an.

Step: For each ci ∈ C, calculate the class-conditional probabilities, i = 1, 2, …, k:

  pi = P(Ci) × Πⱼ₌₁ⁿ P(Aj = aj | Ci)

  px = max{p1, p2, …, pk}

Output: Cx is the classification.

Note: Σ pi ≠ 1, because the pi are not probabilities but values proportional to the posterior probabilities.
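For course work in R, this whole algorithm is also available ready-made. A hedged sketch, assuming the e1071 package is installed and the 20 air-traffic tuples sit in a data frame named flights (both names are assumptions, not from the slides):

```r
# Sketch: Naïve Bayes with e1071, assuming a data frame `flights`
# with factor columns Days, Season, Fog, Rain and the class column Class.
library(e1071)

model <- naiveBayes(Class ~ Days + Season + Fog + Rain, data = flights)

new_case <- data.frame(Days = "Weekday", Season = "Winter",
                       Fog = "High", Rain = "Heavy")

predict(model, new_case)                # most probable class label
predict(model, new_case, type = "raw")  # normalized posterior probabilities
```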

Naïve Bayesian Classifier


Pros and Cons
• The Naïve Bayes approach is very popular and often works well.

• However, it has a number of potential problems:

  – It relies on all attributes being categorical.

  – If training data are scarce, the probability estimates are poor.


Naïve Bayesian Classifier

Approaches to overcome the limitations of Naïve Bayesian classification

• Estimating the conditional probabilities for continuous attributes:
  – In real-life situations, all attributes are not necessarily categorical; in fact, there is often a mix of both categorical and continuous attributes.
  – In the following, we discuss schemes to deal with continuous attributes in the Bayesian classifier.

  1. We can discretize each continuous attribute and then replace the continuous values with their corresponding discrete intervals.

  2. We can assume a certain form of probability distribution for the continuous variable and estimate the parameters of the distribution using the training data. A Gaussian distribution is usually chosen to represent the conditional probabilities for continuous attributes. The general form of a Gaussian distribution is

    P(x ; μ, σ²) = (1 / (√(2π) · σ)) · e^(−(x − μ)² / (2σ²))

where μ and σ² denote the mean and variance, respectively.


Naïve Bayesian Classifier

For each class Ci, the conditional probability for a numeric attribute Aj can be calculated from the Gaussian (normal) distribution as follows:

  P(Aj = aj | Ci) = (1 / (√(2π) · σij)) · e^(−(aj − μij)² / (2σij²))

Here, the parameter μij can be calculated as the sample mean of the values of attribute Aj over the training records that belong to class Ci.

Similarly, σij² can be estimated as the variance of such training records.
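A sketch of this estimation in base R (illustrative, not from the slides): for each class, the sample mean and standard deviation of the attribute are computed from that class's training records, and dnorm() then supplies the Gaussian density.

```r
# Per-class Gaussian likelihood P(Aj = value | Ci) for a numeric attribute.
# x: numeric attribute values; y: class labels of the training records.
gaussian_likelihood <- function(x, y, value) {
  sapply(split(x, y), function(xc) {
    dnorm(value, mean = mean(xc), sd = sd(xc))  # density under N(mu_ij, sigma_ij^2)
  })
}

# Usage (hypothetical data frame d with numeric d$age and labels d$class):
# gaussian_likelihood(d$age, d$class, value = 30)
```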


Naïve Bayesian Classifier


M-estimate of Conditional Probability

• The M-estimate deals with a potential problem of the Naïve Bayesian classifier when the training data set is too small.
  – If the conditional probability for one of the attribute values is zero, then the overall class-conditional probability for the class vanishes.
  – In other words, if the training data do not cover many of the attribute values, then we may not be able to classify some of the test records.

• This problem can be addressed by using the M-estimate approach.


M-estimate Approach
• The M-estimate approach can be stated as follows:

  P(Aj = aj | Ci) = (nci + m·p) / (n + m)

where
  n   = total number of training instances from class Ci,
  nci = number of training examples from class Ci that take the value Aj = aj,
  m   = a parameter known as the equivalent sample size, and
  p   = a user-specified prior estimate.

Note: If n = 0, that is, if no training examples of the class are available, then P(aj|Ci) = p; this serves as a default value in the absence of samples.
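The formula transcribes directly into a small R helper (a sketch; argument names mirror the symbols above):

```r
# M-estimate of P(Aj = aj | Ci).
# n_ci: class-Ci examples with Aj = aj;  n: total class-Ci examples;
# m: equivalent sample size;             p: user-specified prior estimate.
m_estimate <- function(n_ci, n, m, p) {
  (n_ci + m * p) / (n + m)
}

m_estimate(0, 14, m = 3, p = 1/3)  # a zero count no longer yields probability 0
```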


A Practice Example

Example 8.4

Class:
  C1: buys_computer = "yes"
  C2: buys_computer = "no"

Data instance:
  X = (age <= 30, income = medium, student = yes, credit_rating = fair)

  age     income   student   credit_rating   buys_computer
  <=30    high     no        fair            no
  <=30    high     no        excellent       no
  31…40   high     no        fair            yes
  >40     medium   no        fair            yes
  >40     low      yes       fair            yes
  >40     low      yes       excellent       no
  31…40   low      yes       excellent       yes
  <=30    medium   no        fair            no
  <=30    low      yes       fair            yes
  >40     medium   yes       fair            yes
  <=30    medium   yes       excellent       yes
  31…40   medium   no        excellent       yes
  31…40   high     yes       fair            yes
  >40     medium   no        excellent       no

A Practice Example

• P(Ci): P(buys_computer = "yes") = 9/14 = 0.643
         P(buys_computer = "no")  = 5/14 = 0.357

• Compute P(X|Ci) for each class:

  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no")  = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no")  = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no")  = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no")  = 2/5 = 0.4

• X = (age <= 30, income = medium, student = yes, credit_rating = fair)

  P(X|Ci): P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
           P(X | buys_computer = "no")  = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

  P(X|Ci) · P(Ci): P(X | buys_computer = "yes") · P(buys_computer = "yes") = 0.028
                   P(X | buys_computer = "no")  · P(buys_computer = "no")  = 0.007

Therefore, X belongs to class "buys_computer = yes".
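The arithmetic of Example 8.4 can be double-checked with a few lines of R (a sketch restating the fractions computed above):

```r
# Unnormalized scores P(X|Ci) * P(Ci) for X = (<=30, medium, yes, fair)
yes <- (9/14) * (2/9) * (4/9) * (6/9) * (6/9)  # ~0.028
no  <- (5/14) * (3/5) * (2/5) * (1/5) * (2/5)  # ~0.007
round(c(yes = yes, no = no), 3)

# yes > no, so the instance is classified as buys_computer = "yes"
```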



POINTS TO PONDER
• Naïve Bayes is based on the independence assumption
  – Training is very easy and fast; it just requires considering each attribute in each class separately
  – Testing is straightforward; it amounts to looking up tables or calculating conditional probabilities with normal distributions
• A popular generative model
  – Performance competitive with most state-of-the-art classifiers, even when the independence assumption is violated
  – Many successful applications, e.g., spam mail filtering
  – A good candidate as a base learner in ensemble learning
  – Apart from classification, naïve Bayes can do more…


KEY TERMS
• Bayesian Classifier
• Naïve Bayesian Classifier
• M-Estimate


Questions
1. Naïve Bayes is a machine learning implementation of __________ theorem.

2. In the Naïve Bayes approach, training is very slow, requiring each attribute in each class to be considered separately. (True/False)

3. The ______ estimate deals with the potential problem of the Naïve Bayesian classifier when the training data size is too small.
  a) N Approach
  b) M Approach
  c) Naïve
  d) Bayes

Answers
1. Bayes
2. False
3. M Approach


LEARNING OUTCOME

• Analyze data by classification algorithms.


Glimpses of Next Session


o K-Means Clustering
o Clustering using R
