
StatQuest!!! (Multinomial) Naive Bayes Study Guide!!!
© 2020 Joshua Starmer All Rights Reserved
The Problem

Normal messages mixed with Spam… and we want to filter out the spam messages.

The Solution - A Naive Bayes Classifier


If we get this message:

Dear Friend

…we multiply the Prior probability the message is Normal by the probabilities of seeing the words Dear and Friend, given that it's a Normal message:

p( N ) x p( Dear | N ) x p( Friend | N )

…and compare that to the Prior probability the message is Spam, multiplied by the probabilities of seeing the words Dear and Friend, given that it's Spam:

p( S ) x p( Dear | S ) x p( Friend | S )

Whichever classification has the highest probability is the final classification.
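As a minimal sketch of that comparison in Python (not from the guide; the probability values are the ones worked out in the steps below):

# Score for "Dear Friend" under each class, using the values from Steps 2-4.
p_normal = 0.67 * 0.47 * 0.29   # p( N ) x p( Dear | N ) x p( Friend | N )
p_spam   = 0.33 * 0.29 * 0.14   # p( S ) x p( Dear | S ) x p( Friend | S )

# Whichever classification has the highest probability wins.
print("Normal" if p_normal > p_spam else "Spam")   # prints: Normal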


Step 1) Make histograms for all words

[Figure: histograms of word counts for Normal and Spam messages over the words Dear, Friend, Lunch, and Money. Normal: Dear = 8, Friend = 5, Lunch = 3, Money = 1. Spam: Dear = 2, Friend = 1, Lunch = 0, Money = 4.]

NOTE: The word Lunch did not appear in the Spam. This will cause problems that we will see and fix later.
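In Python, these histograms are just word counts; a minimal sketch using collections.Counter, with the counts read off the figure above:

from collections import Counter

# Word counts per class, taken from the histograms.
normal_counts = Counter({"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1})
spam_counts   = Counter({"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4})

print(sum(normal_counts.values()))   # 17 words total in Normal messages
print(sum(spam_counts.values()))     # 7 words total in Spam messages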

Step 2a) Calculate conditional probabilities for Normal, N

For example, the probability that the word Dear occurs given that it is in a Normal message is the number of times Dear occurred in Normal messages, 8, divided by 17, the total number of words in Normal messages:

p( Dear | N ) = 8 / 17 = 0.47

p( Dear | N ) = 0.47
p( Friend | N ) = 0.29
p( Lunch | N ) = 0.18
p( Money | N ) = 0.06

Step 2b) Calculate conditional probabilities for Spam, S

For example, the probability that the word Dear occurs given that it is in Spam is the number of times Dear occurred in Spam, 2, divided by 7, the total number of words in Spam messages:

p( Dear | S ) = 2 / 7 = 0.29

p( Dear | S ) = 0.29
p( Friend | S ) = 0.14
p( Lunch | S ) = 0.00
p( Money | S ) = 0.57
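A small helper (hypothetical, not from the guide) turns those counts into the conditional probabilities above:

from collections import Counter

def conditional_probs(counts):
    # Divide each word's count by the total number of words in that class.
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

normal_counts = Counter({"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1})
spam_counts = Counter({"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4})

p_word_N = conditional_probs(normal_counts)
p_word_S = conditional_probs(spam_counts)
print(round(p_word_N["Dear"], 2))   # 0.47
print(round(p_word_S["Money"], 2))  # 0.57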



Step 3a) Calculate prior probability for Normal, p( N )

NOTE: The Prior Probabilities can be set to any probabilities we want, but they are commonly estimated from the training data, like so:

p( N ) = # of Normal Messages / Total # of Messages = 8 / (8 + 4) = 0.67

Step 3b) Calculate prior probability for Spam, p( S )

p( S ) = # of Spam Messages / Total # of Messages = 4 / (8 + 4) = 0.33
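In code, with the 8 Normal and 4 Spam training messages used in this guide:

# Priors estimated from the message counts in the training data.
n_normal, n_spam = 8, 4
p_N = n_normal / (n_normal + n_spam)   # 0.67
p_S = n_spam / (n_normal + n_spam)     # 0.33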

NOTE: The reason Naive Bayes is naive is that it does not take word order or phrasing into account. In other words, Naive Bayes would give the exact same probability to the phrase I like pizza as it would to the phrase Pizza like I… even though people frequently say I like pizza and almost never say Pizza like I.

Because keeping track of every phrase and word ordering would be impossible, Naive Bayes doesn't even try. Naive Bayes is…naive! That said, Naive Bayes works well in practice, so keeping track of word order must not be super important.
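A quick sketch shows why order can't matter: the score is a product over a message's words, and multiplication doesn't care about order. The helper and word probabilities here are hypothetical stand-ins:

from math import isclose, prod

def score(words, prior, p_word):
    # A product over the words; reordering the words can't change it.
    return prior * prod(p_word[w] for w in words)

p_word = {"I": 0.4, "like": 0.35, "pizza": 0.25}   # made-up values
a = score(["I", "like", "pizza"], 0.67, p_word)
b = score(["pizza", "like", "I"], 0.67, p_word)
print(isclose(a, b))   # True: both phrases get the same score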



4a) Calculate probability of seeing the words Dear Friend, given the message is Normal

The Prior probability the message is Normal, multiplied by the probabilities of seeing the words Dear and Friend, given that it's Normal:

p( N ) x p( Dear | N ) x p( Friend | N )

p( N ) = 0.67, p( Dear | N ) = 0.47, p( Friend | N ) = 0.29

0.67 x 0.47 x 0.29 = 0.09

NOTE: This probability makes the naive assumption that Dear and Friend are not correlated. In other words, this is not a realistic model (high bias), but it works in practice (low variance).

4b) Calculate probability of seeing the words Dear Friend, given the message is Spam

The Prior probability the message is Spam, multiplied by the probabilities of seeing the words Dear and Friend, given that it's Spam:

p( S ) x p( Dear | S ) x p( Friend | S )

p( S ) = 0.33, p( Dear | S ) = 0.29, p( Friend | S ) = 0.14

0.33 x 0.29 x 0.14 = 0.01

NOTE: In practice, these probabilities can get very small, so we calculate the log() of the probabilities to avoid underflow errors on the computer.

5) Classification

Because Dear Friend has a higher probability of being Normal (0.09) than Spam (0.01), we classify it as Normal.

BAM!!!
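Putting Steps 4 and 5 together with the log() trick from the note above, a sketch (the helper name log_score is an assumption, not from the guide; note that log(0) is undefined, which is another face of the missing-word problem handled next):

from math import log

def log_score(words, prior, p_word):
    # Summing logs gives the same ranking as multiplying probabilities,
    # without underflowing to 0.0 on long messages.
    return log(prior) + sum(log(p_word[w]) for w in words)

p_word_N = {"Dear": 0.47, "Friend": 0.29, "Lunch": 0.18, "Money": 0.06}
p_word_S = {"Dear": 0.29, "Friend": 0.14, "Lunch": 0.00, "Money": 0.57}

message = ["Dear", "Friend"]
n = log_score(message, 0.67, p_word_N)
s = log_score(message, 0.33, p_word_S)
print("Normal" if n > s else "Spam")   # Normal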
Dealing With Missing Data

[Figure: the same Normal and Spam histograms; the Lunch bar is missing from Spam.]

Remember, the word Lunch did not occur in any of the Spam… and that means the probability of seeing Lunch in Spam = 0:

p( Dear | S ) = 0.29
p( Friend | S ) = 0.14
p( Lunch | S ) = 0.00
p( Money | S ) = 0.57

This means that any message with the word Lunch in it will be classified as Normal, because the probability of being Spam = 0.

For example, the probability that this message is Spam:

Lunch Money Money Money Money

p( S ) x p( Lunch | S ) x p( Money | S )^4

p( S ) = 0.33, p( Lunch | S ) = 0.00, p( Money | S ) = 0.57

0.33 x 0.00 x 0.57^4 = 0


To solve this problem, a pseudocount is added to each word. Usually that means adding 1 count to each word, but you can add any number by changing α (alpha).

[Figure: the same histograms with a pseudocount, drawn as a black box, added to each word's count in both Normal and Spam.]



Using Pseudocounts…

p( Dear | N ) = (8 + 1) / (17 + 4) = 0.43

p( Dear | N ) = 0.43
p( Friend | N ) = 0.29
p( Lunch | N ) = 0.19
p( Money | N ) = 0.10

p( N ) x p( Lunch | N ) x p( Money | N )^4

p( N ) = 0.67, p( Lunch | N ) = 0.19, p( Money | N ) = 0.10

0.67 x 0.19 x 0.10^4 = 0.00001

p( Dear | S ) = (2 + 1) / (7 + 4) = 0.27

p( Dear | S ) = 0.27
p( Friend | S ) = 0.18
p( Lunch | S ) = 0.09
p( Money | S ) = 0.45

p( S ) x p( Lunch | S ) x p( Money | S )^4

p( S ) = 0.33, p( Lunch | S ) = 0.09, p( Money | S ) = 0.45

0.33 x 0.09 x 0.45^4 = 0.00122
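In code, the pseudocount fix is one line: add α to every count before normalizing. A sketch (the helper name smoothed_probs is an assumption), with α = 1 as above:

from collections import Counter

def smoothed_probs(counts, alpha=1):
    # Add a pseudocount of alpha to every word before dividing.
    total = sum(counts.values()) + alpha * len(counts)
    return {w: (n + alpha) / total for w, n in counts.items()}

spam_counts = Counter({"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4})
p_word_S = smoothed_probs(spam_counts)
print(round(p_word_S["Lunch"], 2))   # 0.09 (no longer zero)
print(round(p_word_S["Money"], 2))   # 0.45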

Because Lunch Money Money Money Money has a higher probability of being Spam (0.00122) than Normal (0.00001), we classify it as Spam.

Lunch Money Money Money Money


SPAM!!!
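For real data you would normally reach for a library implementation; scikit-learn's MultinomialNB is exactly this model, and its alpha parameter is the same pseudocount. A minimal, hypothetical usage sketch on the guide's four-word vocabulary (assumes scikit-learn is installed):

from sklearn.naive_bayes import MultinomialNB

# Rows are messages; columns are counts of Dear, Friend, Lunch, Money.
X = [[1, 1, 0, 0],   # "Dear Friend"
     [0, 0, 1, 4]]   # "Lunch Money Money Money Money"
y = ["Normal", "Spam"]

model = MultinomialNB(alpha=1)   # alpha is the pseudocount
model.fit(X, y)
print(model.predict([[0, 0, 1, 4]]))   # ['Spam']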
