
StatQuest!!! (Multinomial) Naive Bayes Study Guide!!!
© 2020 Joshua Starmer All Rights Reserved
The Problem

Normal messages mixed with Spam… and we want to filter out the spam messages.

The Solution - A Naive Bayes Classifier


If we get this message:

Dear Friend

…we multiply the Prior probability the message is Normal by the probabilities of seeing the words Dear and Friend, given that it's a Normal message:

p( N ) x p( Dear | N ) x p( Friend | N )

…and compare that to the Prior probability the message is Spam, multiplied by the probabilities of seeing the words Dear and Friend, given that it's Spam:

p( S ) x p( Dear | S ) x p( Friend | S )

Whichever classification has the highest probability is the final classification.
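As a minimal sketch of that comparison in Python (not from the guide; the probability values are the ones worked out in the steps below):

# Score for "Dear Friend" under each class, using the values from Steps 2-4.
p_normal = 0.67 * 0.47 * 0.29   # p( N ) x p( Dear | N ) x p( Friend | N )
p_spam   = 0.33 * 0.29 * 0.14   # p( S ) x p( Dear | S ) x p( Friend | S )

# Whichever classification has the highest probability wins.
print("Normal" if p_normal > p_spam else "Spam")   # prints: Normal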


Step 1) Make histograms for all words

[Figure: histograms of word counts for Normal and Spam messages over the words Dear, Friend, Lunch, and Money. Normal: Dear = 8, Friend = 5, Lunch = 3, Money = 1. Spam: Dear = 2, Friend = 1, Lunch = 0, Money = 4.]

NOTE: The word Lunch did not appear in the Spam. This will cause problems that we will see and fix later.
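In Python, these histograms are just word counts; a minimal sketch using collections.Counter, with the counts read off the figure above:

from collections import Counter

# Word counts per class, taken from the histograms.
normal_counts = Counter({"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1})
spam_counts   = Counter({"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4})

print(sum(normal_counts.values()))   # 17 words total in Normal messages
print(sum(spam_counts.values()))     # 7 words total in Spam messages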

Step 2a) Calculate conditional probabilities for Normal, N

For example, the probability that the word Dear occurs given that it is in a Normal message is the number of times Dear occurred in Normal messages, 8, divided by 17, the total number of words in Normal messages:

p( Dear | N ) = 8 / 17 = 0.47

p( Dear | N ) = 0.47
p( Friend | N ) = 0.29
p( Lunch | N ) = 0.18
p( Money | N ) = 0.06

Step 2b) Calculate conditional probabilities for Spam, S

For example, the probability that the word Dear occurs given that it is in Spam is the number of times Dear occurred in Spam, 2, divided by 7, the total number of words in Spam messages:

p( Dear | S ) = 2 / 7 = 0.29

p( Dear | S ) = 0.29
p( Friend | S ) = 0.14
p( Lunch | S ) = 0.00
p( Money | S ) = 0.57
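A small helper (hypothetical, not from the guide) turns those counts into the conditional probabilities above:

from collections import Counter

def conditional_probs(counts):
    # Divide each word's count by the total number of words in that class.
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

normal_counts = Counter({"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1})
spam_counts = Counter({"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4})

p_word_N = conditional_probs(normal_counts)
p_word_S = conditional_probs(spam_counts)
print(round(p_word_N["Dear"], 2))   # 0.47
print(round(p_word_S["Money"], 2))  # 0.57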



Step 3a) Calculate prior probability for Normal, p( N )

NOTE: The Prior Probabilities can be set to any probabilities we want, but they are commonly estimated from the training data, like so:

p( N ) = # of Normal Messages / Total # of Messages = 8 / (8 + 4) = 0.67

Step 3b) Calculate prior probability for Spam, p( S )

p( S ) = # of Spam Messages / Total # of Messages = 4 / (8 + 4) = 0.33
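In code, with the 8 Normal and 4 Spam training messages used in this guide:

# Priors estimated from the message counts in the training data.
n_normal, n_spam = 8, 4
p_N = n_normal / (n_normal + n_spam)   # 0.67
p_S = n_spam / (n_normal + n_spam)     # 0.33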

NOTE: The reason Naive Bayes is naive is that it does not take word order or phrasing into account. In other words, Naive Bayes would give the exact same probability to the phrase I like pizza as it would to the phrase Pizza like I… even though people frequently say I like pizza and almost never say Pizza like I.

Because keeping track of every phrase and word ordering would be impossible, Naive Bayes doesn't even try. Naive Bayes is…naive! That said, Naive Bayes works well in practice, so keeping track of word order must not be super important.
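A quick sketch shows why order can't matter: the score is a product over a message's words, and multiplication doesn't care about order. The helper and word probabilities here are hypothetical stand-ins:

from math import isclose, prod

def score(words, prior, p_word):
    # A product over the words; reordering the words can't change it.
    return prior * prod(p_word[w] for w in words)

p_word = {"I": 0.4, "like": 0.35, "pizza": 0.25}   # made-up values
a = score(["I", "like", "pizza"], 0.67, p_word)
b = score(["pizza", "like", "I"], 0.67, p_word)
print(isclose(a, b))   # True: both phrases get the same score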



4a) Calculate probability of seeing the words Dear Friend, given the message is Normal

The Prior probability the message is Normal, multiplied by the probabilities of seeing the words Dear and Friend, given that it's Normal:

p( N ) x p( Dear | N ) x p( Friend | N )

p( N ) = 0.67, p( Dear | N ) = 0.47, p( Friend | N ) = 0.29

0.67 x 0.47 x 0.29 = 0.09

NOTE: This probability makes the naive assumption that Dear and Friend are not correlated. In other words, this is not a realistic model (high bias), but it works in practice (low variance).

4b) Calculate probability of seeing the words Dear Friend, given the message is Spam

The Prior probability the message is Spam, multiplied by the probabilities of seeing the words Dear and Friend, given that it's Spam:

p( S ) x p( Dear | S ) x p( Friend | S )

p( S ) = 0.33, p( Dear | S ) = 0.29, p( Friend | S ) = 0.14

0.33 x 0.29 x 0.14 = 0.01

NOTE: In practice, these probabilities can get very small, so we calculate the log() of the probabilities to avoid underflow errors on the computer.

5) Classification

Because Dear Friend has a higher probability of being Normal (0.09) than Spam (0.01), we classify it as Normal.

BAM!!!
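Putting Steps 4 and 5 together with the log() trick from the note above, a sketch (the helper name log_score is an assumption, not from the guide; note that log(0) is undefined, which is another face of the missing-word problem handled next):

from math import log

def log_score(words, prior, p_word):
    # Summing logs gives the same ranking as multiplying probabilities,
    # without underflowing to 0.0 on long messages.
    return log(prior) + sum(log(p_word[w]) for w in words)

p_word_N = {"Dear": 0.47, "Friend": 0.29, "Lunch": 0.18, "Money": 0.06}
p_word_S = {"Dear": 0.29, "Friend": 0.14, "Lunch": 0.00, "Money": 0.57}

message = ["Dear", "Friend"]
n = log_score(message, 0.67, p_word_N)
s = log_score(message, 0.33, p_word_S)
print("Normal" if n > s else "Spam")   # Normal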
Dealing With Missing Data

[Figure: the same Normal and Spam histograms; the Lunch bar is missing from Spam.]

Remember, the word Lunch did not occur in any of the Spam… and that means the probability of seeing Lunch in Spam = 0:

p( Dear | S ) = 0.29
p( Friend | S ) = 0.14
p( Lunch | S ) = 0.00
p( Money | S ) = 0.57

This means that any message with the word Lunch in it will be classified as Normal, because the probability of being Spam = 0.

For example, the probability that this message is Spam:

Lunch Money Money Money Money

p( S ) x p( Lunch | S ) x p( Money | S )^4

p( S ) = 0.33, p( Lunch | S ) = 0.00, p( Money | S ) = 0.57

0.33 x 0.00 x 0.57^4 = 0


To solve this problem, a pseudocount is added to each word. Usually that means adding 1 count to each word, but you can add any number by changing α (alpha).

[Figure: the same histograms with a pseudocount, drawn as a black box, added to each word's count in both Normal and Spam.]



Using Pseudocounts…

p( Dear | N ) = (8 + 1) / (17 + 4) = 0.43

p( Dear | N ) = 0.43
p( Friend | N ) = 0.29
p( Lunch | N ) = 0.19
p( Money | N ) = 0.10

p( N ) x p( Lunch | N ) x p( Money | N )^4

p( N ) = 0.67, p( Lunch | N ) = 0.19, p( Money | N ) = 0.10

0.67 x 0.19 x 0.10^4 = 0.00001

p( Dear | S ) = (2 + 1) / (7 + 4) = 0.27

p( Dear | S ) = 0.27
p( Friend | S ) = 0.18
p( Lunch | S ) = 0.09
p( Money | S ) = 0.45

p( S ) x p( Lunch | S ) x p( Money | S )^4

p( S ) = 0.33, p( Lunch | S ) = 0.09, p( Money | S ) = 0.45

0.33 x 0.09 x 0.45^4 = 0.00122
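In code, the pseudocount fix is one line: add α to every count before normalizing. A sketch (the helper name smoothed_probs is an assumption), with α = 1 as above:

from collections import Counter

def smoothed_probs(counts, alpha=1):
    # Add a pseudocount of alpha to every word before dividing.
    total = sum(counts.values()) + alpha * len(counts)
    return {w: (n + alpha) / total for w, n in counts.items()}

spam_counts = Counter({"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4})
p_word_S = smoothed_probs(spam_counts)
print(round(p_word_S["Lunch"], 2))   # 0.09 (no longer zero)
print(round(p_word_S["Money"], 2))   # 0.45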

Because Lunch Money Money Money Money has a higher probability of being Spam (0.00122) than Normal (0.00001), we classify it as Spam.

Lunch Money Money Money Money


SPAM!!!
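For real data you would normally reach for a library implementation; scikit-learn's MultinomialNB is exactly this model, and its alpha parameter is the same pseudocount. A minimal, hypothetical usage sketch on the guide's four-word vocabulary (assumes scikit-learn is installed):

from sklearn.naive_bayes import MultinomialNB

# Rows are messages; columns are counts of Dear, Friend, Lunch, Money.
X = [[1, 1, 0, 0],   # "Dear Friend"
     [0, 0, 1, 4]]   # "Lunch Money Money Money Money"
y = ["Normal", "Spam"]

model = MultinomialNB(alpha=1)   # alpha is the pseudocount
model.fit(X, y)
print(model.predict([[0, 0, 1, 4]]))   # ['Spam']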
