StatQuest Multinomial Naive Bayes Study Guide
(Multinomial)
Naive Bayes
Study Guide!!!
© 2020 Joshua Starmer All Rights Reserved
The Problem
Dear Friend
p( N ) x p( Dear | N ) x p( Friend | N )
p( S ) x p( Dear | S ) x p( Friend | S )
Whichever classification has the highest probability is the final classification.
NOTES:
For example, the probability that the word Dear occurs given that it is in a Normal message is the number of times Dear occurred in Normal messages, 8, divided by 17, the total number of words in Normal messages:

p( Dear | N ) = 8 / 17 = 0.47

Doing the same for every word gives:

p( Dear | N ) = 0.47
p( Friend | N ) = 0.29
p( Lunch | N ) = 0.18
p( Money | N ) = 0.06

[Histogram of word counts in Normal messages: Dear, Friend, Lunch, Money]
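These word probabilities can be checked with a short Python sketch. Dear = 8 is given above; the other Normal counts (Friend = 5, Lunch = 3, Money = 1) are inferred from the probabilities shown, since each is count / 17.

```python
# Word counts in Normal messages (Dear given; others inferred).
normal_counts = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}

total = sum(normal_counts.values())  # 17 words in Normal messages

# p(word | N) = count / total number of words in Normal messages
for word, count in normal_counts.items():
    print(f"p( {word} | N ) = {count} / {total} = {count / total:.2f}")
```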
For example, the probability that the word Dear occurs given that it is in Spam is the number of times Dear occurred in Spam, 2, divided by 7, the total number of words in Spam messages:

p( Dear | S ) = 2 / 7 = 0.29

The prior probabilities come from the message counts:

p( N ) = # of Normal Messages / Total # of Messages = 8 / (8 + 4) = 0.67

p( S ) = # of Spam Messages / Total # of Messages = 4 / (8 + 4) = 0.33
p( S ) x p( Dear | S ) x p( Friend | S )

p( S ) = 0.33, p( Dear | S ) = 0.29, p( Friend | S ) = 0.14

0.33 x 0.29 x 0.14 = 0.01

Since the corresponding score for Normal, 0.67 x 0.47 x 0.29 = 0.09, is larger, Dear Friend is classified as a Normal message.

NOTE: In practice, these probabilities can get very small, so we calculate the log() of the probabilities to avoid underflow errors on the computer.
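The log() note can be illustrated with a tiny sketch: multiplying the three Spam probabilities directly and summing their logs give the same answer, but the log form stays numerically safe when a message contains many words.

```python
import math

# Multiplying many small probabilities risks floating-point underflow,
# so in practice we sum log probabilities instead.
probs = [0.33, 0.29, 0.14]  # p(S), p(Dear|S), p(Friend|S)

product = 1.0
for p in probs:
    product *= p

log_score = sum(math.log(p) for p in probs)

# exp() of the summed logs recovers the same product.
print(round(product, 4))               # 0.0134
print(round(math.exp(log_score), 4))   # 0.0134
```

For classification we never need to undo the log: since log() is monotonic, comparing summed log scores picks the same winner as comparing the raw products.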
BAM!!!
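The whole worked example fits in a few lines of Python. This is a sketch using the guide's numbers; the Spam word counts (Dear = 2, Friend = 1, Lunch = 0, Money = 4) are inferred from the probabilities shown, since each is count / 7.

```python
# Multinomial Naive Bayes by hand, using the guide's example counts.
normal_counts = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}
spam_counts = {"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4}
n_normal, n_spam = 8, 4  # number of Normal and Spam messages

def score(message, counts, prior):
    """prior * product of p(word | class) for each word in the message."""
    total_words = sum(counts.values())
    p = prior
    for word in message:
        p *= counts[word] / total_words
    return p

message = ["Dear", "Friend"]
p_normal = score(message, normal_counts, n_normal / (n_normal + n_spam))
p_spam = score(message, spam_counts, n_spam / (n_normal + n_spam))

print(f"Normal score: {p_normal:.2f}")  # 0.67 x 0.47 x 0.29 = 0.09
print(f"Spam score:   {p_spam:.2f}")    # 0.33 x 0.29 x 0.14 = 0.01
print("Classification:", "Normal" if p_normal > p_spam else "Spam")
```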
Dealing With Missing Data
[Histograms of word counts in Spam and Normal messages: Dear, Friend, Lunch, Money]
For the message Lunch Money Money Money Money, the Spam score is:

p( S ) x p( Lunch | S ) x p( Money | S )^4

p( S ) = 0.33, p( Lunch | S ) = 0.00, p( Money | S ) = 0.57

Because Lunch never occurs in a Spam message, p( Lunch | S ) = 0, and the whole product is 0 no matter how many times Money appears.
To deal with this, add 1 to each word count. For example, for the Normal messages:

p( Dear | N ) = ( 8 + 1 ) / ( 17 + 4 ) = 0.43

The 4 in the denominator is one extra count for each of the 4 words in the vocabulary. The adjusted probabilities for Normal messages are:

p( Dear | N ) = 0.43
p( Friend | N ) = 0.29
p( Lunch | N ) = 0.19
p( Money | N ) = 0.10
Now both scores for Lunch Money Money Money Money can be computed:

p( N ) x p( Lunch | N ) x p( Money | N )^4

p( N ) = 0.67, p( Lunch | N ) = 0.19, p( Money | N ) = 0.10

0.67 x 0.19 x 0.10^4 = 0.00001

p( S ) x p( Lunch | S ) x p( Money | S )^4

p( S ) = 0.33, p( Lunch | S ) = 0.09, p( Money | S ) = 0.45

0.33 x 0.09 x 0.45^4 = 0.001

Since 0.001 > 0.00001, the message is classified as Spam.

DOUBLE BAM!!!
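The add-1 fix (often called Laplace smoothing, or a pseudocount) can be sketched as a small extension of the earlier score function. As before, the Spam word counts are inferred from the probabilities shown in the guide.

```python
# Classify "Lunch Money Money Money Money" with a pseudocount of 1
# added to every word count (Laplace smoothing).
normal_counts = {"Dear": 8, "Friend": 5, "Lunch": 3, "Money": 1}
spam_counts = {"Dear": 2, "Friend": 1, "Lunch": 0, "Money": 4}

def smoothed_p(word, counts, alpha=1):
    # Add alpha to this word's count; the denominator grows by alpha
    # for every word in the vocabulary (here, 4 words).
    total = sum(counts.values()) + alpha * len(counts)
    return (counts[word] + alpha) / total

def score(message, counts, prior):
    p = prior
    for word in message:
        p *= smoothed_p(word, counts)
    return p

message = ["Lunch", "Money", "Money", "Money", "Money"]
p_normal = score(message, normal_counts, 8 / 12)
p_spam = score(message, spam_counts, 4 / 12)

print(f"Normal score: {p_normal:.6f}")
print(f"Spam score:   {p_spam:.6f}")
print("Classification:", "Spam" if p_spam > p_normal else "Normal")
```

With the pseudocounts, p( Lunch | S ) is small but no longer zero, so the Spam score survives the missing word and wins.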