Introduction To Probability Theory: Rong Jin
Outline
Basic concepts in probability theory
Bayes’ rule
Random variables and distributions
Definition of Probability
Experiment: toss a coin twice
Sample space: possible outcomes of an experiment
S = {HH, HT, TH, TT}
Event: a subset of possible outcomes
A={HH}, B={HT, TH}
Probability of an event: a number Pr(A) assigned to the event
Axiom 1: Pr(A) ≥ 0
Axiom 2: Pr(S) = 1
Axiom 3: For every sequence of disjoint events,
Pr(∪i Ai) = Σi Pr(Ai)
Example: Pr(A) = n(A)/N: frequentist statistics
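The frequentist reading Pr(A) = n(A)/N can be sketched by simulating the two-coin experiment many times (illustrative Python, not part of the original slides):

```python
import random

random.seed(0)

# Simulate "toss a coin twice" N times and estimate Pr(A) for A = {HH}
# and Pr(B) for B = {HT, TH} as relative frequencies n(A)/N.
N = 100_000
n_A = n_B = 0
for _ in range(N):
    outcome = random.choice("HT") + random.choice("HT")
    if outcome == "HH":
        n_A += 1
    if outcome in ("HT", "TH"):
        n_B += 1

print(n_A / N)  # close to Pr(A) = 1/4
print(n_B / N)  # close to Pr(B) = 1/2
```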
Joint Probability
For events A and B, joint probability Pr(AB)
stands for the probability that both events
happen.
Example: A={HH}, B={HT, TH}, what is the joint
probability Pr(AB)?
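One way to answer the question is by direct enumeration over the sample space (a small sketch; since A and B share no outcomes, the joint probability is zero):

```python
from fractions import Fraction

# Sample space for two coin tosses, each outcome equally likely.
S = ["HH", "HT", "TH", "TT"]
A = {"HH"}
B = {"HT", "TH"}

# Pr(AB) is the probability that both events happen, i.e. the
# probability of the intersection of A and B.
pr_AB = Fraction(len(A & B), len(S))
print(pr_AB)  # A and B are disjoint, so Pr(AB) = 0
```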
Independence
Two events A and B are independent in case
Pr(AB) = Pr(A)Pr(B)
A set of events {Ai} is independent in case
Pr(∩i Ai) = Πi Pr(Ai)
Female Patient
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ≈ 20%
Pr(C|B) ≈ 5%
Simpson’s Paradox: View II
Drug I is better than Drug II

Female Patient:
A = {Using Drug I}, B = {Using Drug II}, C = {Drug succeeds}
Pr(C|A) ≈ 20%, Pr(C|B) ≈ 5%

Male Patient:
A = {Using Drug I}, B = {Using Drug II}, C = {Drug succeeds}
Pr(C|A) ≈ 100%, Pr(C|B) ≈ 50%
Conditional Independence
Event A and B are conditionally independent given
C in case
Pr(AB|C)=Pr(A|C)Pr(B|C)
A set of events {Ai} is conditionally independent
given C in case
Pr(∩i Ai | C) = Πi Pr(Ai | C)
Conditional Independence (cont’d)
Example: There are three events: A, B, C
Pr(A) = Pr(B) = Pr(C) = 1/5
Pr(A,C) = Pr(B,C) = 1/25, Pr(A,B) = 1/10
Pr(A,B,C) = 1/125
Are A and B independent?
Pr(A,B) = 1/10, but Pr(A)Pr(B) = 1/25, so A and B are not independent.
Are A and B conditionally independent given C?
Pr(AB|C) = Pr(A,B,C)/Pr(C) = (1/125)/(1/5) = 1/25
Pr(A|C)Pr(B|C) = (1/5)(1/5) = 1/25
So A and B are not independent, yet they are conditionally
independent given C.
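The arithmetic above can be checked exactly with rational numbers:

```python
from fractions import Fraction as F

pA = pB = pC = F(1, 5)
pAC = pBC = F(1, 25)
pAB = F(1, 10)
pABC = F(1, 125)

# Unconditional independence: is Pr(A,B) = Pr(A)Pr(B)?
print(pAB == pA * pB)  # False: 1/10 != 1/25

# Conditional independence given C: is Pr(AB|C) = Pr(A|C)Pr(B|C)?
pAB_given_C = pABC / pC         # 1/25
pA_given_C = pAC / pC           # 1/5
pB_given_C = pBC / pC           # 1/5
print(pAB_given_C == pA_given_C * pB_given_C)  # True
```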
Outline
Important concepts in probability theory
Bayes’ rule
Random variables and distributions
Bayes’ Rule
Given two events A and B, and suppose that Pr(A) > 0. Then
Pr(B|A) = Pr(A|B) Pr(B) / Pr(A)
Pr(R) = 0.8
R: It is a rainy day
W: The grass is wet

Pr(W|R)    R     ¬R
W          0.7   0.4
¬W         0.3   0.6

Pr(R|W) = ?
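Plugging the table into Bayes’ rule, with Pr(W) obtained from the law of total probability, gives the answer (a quick numerical check):

```python
# Pr(R|W) = Pr(W|R) Pr(R) / Pr(W), where
# Pr(W) = Pr(W|R) Pr(R) + Pr(W|not R) Pr(not R).
p_R = 0.8
p_W_given_R = 0.7      # first column of the table
p_W_given_notR = 0.4   # second column of the table

p_W = p_W_given_R * p_R + p_W_given_notR * (1 - p_R)   # 0.64
p_R_given_W = p_W_given_R * p_R / p_W                  # 0.875
print(p_R_given_W)
```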
Bayes’ Rule

R: It rains
W: The grass is wet

Pr(W|R)    R     ¬R
W          0.7   0.4
¬W         0.3   0.6

Information (R → W): Pr(W|R)
Inference (W → R): Pr(R|W)
Bayes’ Rule

R: It rains
W: The grass is wet

Pr(W|R)    R     ¬R
W          0.7   0.4
¬W         0.3   0.6

Information: Pr(E|H), from hypothesis H to evidence E
Inference: the posterior Pr(H|E), from likelihood and prior:
Pr(H|E) = Pr(E|H) Pr(H) / Pr(E)
Bayes’ Rule: More Complicated
Suppose that B1, B2, …, Bk form a partition of S:
Bi ∩ Bj = ∅ for i ≠ j; ∪i Bi = S
Then
Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / Pr(A)
         = Pr(A|Bi) Pr(Bi) / Σ_{j=1}^k Pr(A Bj)
         = Pr(A|Bi) Pr(Bi) / Σ_{j=1}^k Pr(Bj) Pr(A|Bj)
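A small numerical sketch of this partition form (the three priors and likelihoods below are made-up values for illustration):

```python
# Partition B1, B2, B3 of S with priors Pr(Bj) and likelihoods Pr(A|Bj).
priors = [0.5, 0.3, 0.2]          # Pr(B1), Pr(B2), Pr(B3); sums to 1
likelihoods = [0.1, 0.4, 0.9]     # Pr(A|B1), Pr(A|B2), Pr(A|B3)

# Law of total probability: Pr(A) = sum_j Pr(Bj) Pr(A|Bj)
p_A = sum(p * l for p, l in zip(priors, likelihoods))   # 0.35

# Bayes' rule: Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / Pr(A)
posteriors = [p * l / p_A for p, l in zip(priors, likelihoods)]
print(posteriors)  # a distribution over B1, B2, B3: sums to 1
```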
A More Complicated Example
R: It rains
W: The grass is wet
U: People bring umbrellas
Graph: W ← R → U (U and W are conditionally independent given R)

Pr(UW|R) = Pr(U|R) Pr(W|R)
Pr(UW|¬R) = Pr(U|¬R) Pr(W|¬R)
Pr(R) = 0.8

Pr(W|R)    R     ¬R        Pr(U|R)    R     ¬R
W          0.7   0.4       U          0.9   0.2

Pr(U|W) = ?
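Using the conditional independence of U and W given R, the question can be answered by summing over R (a numerical sketch of that computation):

```python
# U and W are conditionally independent given R, so
# Pr(U, W) = sum_r Pr(U|r) Pr(W|r) Pr(r), and Pr(U|W) = Pr(U, W) / Pr(W).
p_R = 0.8
p_W = {True: 0.7, False: 0.4}   # Pr(W | R), Pr(W | not R)
p_U = {True: 0.9, False: 0.2}   # Pr(U | R), Pr(U | not R)

p_r = {True: p_R, False: 1 - p_R}
p_UW = sum(p_U[r] * p_W[r] * p_r[r] for r in (True, False))   # 0.52
p_W_marg = sum(p_W[r] * p_r[r] for r in (True, False))        # 0.64
print(p_UW / p_W_marg)  # Pr(U|W) = 0.8125
```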
Outline
Important concepts in probability theory
Bayes’ rule
Random variable and probability distribution
Random Variable and Distribution
A random variable X is a numerical outcome of a
random experiment
The distribution of a random variable is the collection
of possible outcomes along with their probabilities:
Discrete case: Pr(X = x) = p(x)
Continuous case: Pr(a ≤ X ≤ b) = ∫_a^b p(x) dx
Random Variable: Example
Let S be the set of all sequences of three rolls of a
die. Let X be the sum of the number of dots on the
three rolls.
What are the possible values for X?
Pr(X = 5) = ?, Pr(X = 10) = ?
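Both questions can be answered by brute-force enumeration over the 6³ equally likely outcomes (an illustrative sketch):

```python
from itertools import product
from fractions import Fraction

# All ordered outcomes of three rolls of a fair die.
outcomes = list(product(range(1, 7), repeat=3))

def pr_sum(k):
    # Pr(X = k) = (# outcomes with that sum) / 6^3
    favorable = sum(1 for o in outcomes if sum(o) == k)
    return Fraction(favorable, len(outcomes))

print(pr_sum(5))   # 6/216 = 1/36
print(pr_sum(10))  # 27/216 = 1/8
```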
Expectation
A random variable X ~ Pr(X = x). Then its expectation is
E[X] = Σx x Pr(X = x)
Example: a Poisson random variable with parameter λ:
Pr(X = x) = p(x) = e^(−λ) λ^x / x!   if x = 0, 1, 2, …
                 = 0                  otherwise
E[X] = λ, Var(X) = λ
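The claim E[X] = Var(X) = λ can be checked numerically by truncating the infinite sum once the tail is negligible (sketch with λ = 3):

```python
import math

# Poisson pmf p(x) = exp(-lam) * lam^x / x!, truncated at x = 99,
# where the remaining tail mass is astronomically small for lam = 3.
lam = 3.0
p = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(100)]

mean = sum(x * px for x, px in enumerate(p))
var = sum((x - mean) ** 2 * px for x, px in enumerate(p))
print(mean, var)  # both close to lam = 3.0
```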
Plots of Poisson Distribution
Normal (Gaussian) Distribution
X ~ N(μ, σ²)
p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
Pr(a ≤ X ≤ b) = ∫_a^b p(x) dx = ∫_a^b (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) dx
E[X] = μ, Var(X) = σ²
If X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²) are independent, what is X = X1 + X2?
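The question can be probed by simulation. The parameters below (μ1 = 1, σ1 = 2, μ2 = 2, σ2 = 1) are assumed values for illustration; the empirical moments of the sum suggest that the means and variances add:

```python
import random

random.seed(0)

# Draw independent samples of X1 ~ N(1, 2^2) and X2 ~ N(2, 1^2)
# and inspect the empirical mean and variance of X = X1 + X2.
mu1, s1 = 1.0, 2.0
mu2, s2 = 2.0, 1.0
n = 200_000

xs = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(mean, var)  # close to mu1 + mu2 = 3 and s1^2 + s2^2 = 5
```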