
Introduction to Probability Theory

Rong Jin

Outline

- Basic concepts in probability theory
- Bayes rule
- Random variables and distributions

Definition of Probability

- Experiment: toss a coin twice
- Sample space: the set of possible outcomes of an experiment
  S = {HH, HT, TH, TT}
- Event: a subset of possible outcomes
  A = {HH}, B = {HT, TH}
- Probability of an event: a number assigned to an event, Pr(A)
  - Axiom 1: $\Pr(A) \ge 0$
  - Axiom 2: $\Pr(S) = 1$
  - Axiom 3: for every sequence of disjoint events,
    $\Pr(\cup_i A_i) = \sum_i \Pr(A_i)$
- Example: Pr(A) = n(A)/N (frequentist statistics)
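As a quick illustration of the frequentist view Pr(A) ≈ n(A)/N, here is a minimal Python sketch (mine, not from the slides) that repeats the two-toss experiment many times and estimates Pr(B) for B = {HT, TH}:

```python
import random

def toss_twice():
    # One experiment: two fair coin tosses, e.g. "HT"
    return "".join(random.choice("HT") for _ in range(2))

N = 100_000
B = {"HT", "TH"}                      # event: exactly one head and one tail
n_B = sum(toss_twice() in B for _ in range(N))
print(f"Pr(B) ~ {n_B / N:.3f}  (exact value: 0.5)")
```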

Joint Probability

- For events A and B, the joint probability Pr(AB) stands for the
  probability that both events happen.
- Example: A = {HH}, B = {HT, TH}. What is the joint probability Pr(AB)?
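Since events here are just subsets of the sample space, Pr(AB) can be computed by enumeration; a small sketch of mine (for this A and B the intersection is empty, so Pr(AB) = 0):

```python
from fractions import Fraction

S = {"HH", "HT", "TH", "TT"}          # sample space, outcomes equally likely
A = {"HH"}
B = {"HT", "TH"}

def pr(event):
    # Probability of an event under the uniform distribution on S
    return Fraction(len(event & S), len(S))

print(pr(A & B))                      # Pr(AB) = 0: A and B share no outcomes
```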

Independence

- Two events A and B are independent in case
  $\Pr(AB) = \Pr(A)\Pr(B)$
- A set of events {A_i} is independent in case
  $\Pr(\cap_i A_i) = \prod_i \Pr(A_i)$


Example: drug test

A = {the patient is a woman}
B = {the drug fails}

            Women   Men
  Success     200   1800
  Failure    1800    200

Will event A be independent of event B?
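One way to answer is to compare Pr(AB) with Pr(A)Pr(B) computed from the counts; a minimal check (my sketch):

```python
from fractions import Fraction

counts = {("success", "women"): 200, ("success", "men"): 1800,
          ("failure", "women"): 1800, ("failure", "men"): 200}
N = sum(counts.values())                                # 4000 patients

pr_A  = Fraction(counts[("success", "women")] + counts[("failure", "women")], N)
pr_B  = Fraction(counts[("failure", "women")] + counts[("failure", "men")], N)
pr_AB = Fraction(counts[("failure", "women")], N)       # woman AND drug fails

print(pr_AB, pr_A * pr_B)   # 9/20 vs 1/4 -> not independent
```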

Independence

Consider the experiment of tossing a coin twice.

- Example I: A = {HT, HH}, B = {HT}.
  Is event A independent of event B?
- Example II: A = {HT}, B = {TH}.
  Is event A independent of event B?

Disjoint Independence

If A is independent of B, and B is independent of C, will A be
independent of C?

Conditioning

If A and B are events with Pr(A) > 0, the conditional probability of B
given A is

  $\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)}$

Example: drug test

A = {the patient is a woman}
B = {the drug fails}

            Women   Men
  Success     200   1800
  Failure    1800    200

Pr(B|A) = ?
Pr(A|B) = ?


If A is independent of B, what is the relationship between Pr(A|B) and
Pr(A)?
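Computing both conditionals directly from the table above (my sketch, reusing the same counts):

```python
from fractions import Fraction

women_fail, women_ok = 1800, 200
men_fail,   men_ok   = 200, 1800

pr_B_given_A = Fraction(women_fail, women_fail + women_ok)  # Pr(B|A) = 9/10
pr_A_given_B = Fraction(women_fail, women_fail + men_fail)  # Pr(A|B) = 9/10
print(pr_B_given_A, pr_A_given_B)
```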

Which Drug is Better?

Simpson's Paradox: View I

Drug II is better than Drug I.

A = {using Drug I}
B = {using Drug II}
C = {the drug succeeds}

            Drug I   Drug II
  Success      219      1010
  Failure     1801      1190

Pr(C|A) ~ 10%
Pr(C|B) ~ 50%

Simpson's Paradox: View II

Drug I is better than Drug II.

Female patients:
  A = {using Drug I}, B = {using Drug II}, C = {the drug succeeds}
  Pr(C|A) ~ 20%
  Pr(C|B) ~ 5%

Male patients:
  A = {using Drug I}, B = {using Drug II}, C = {the drug succeeds}
  Pr(C|A) ~ 100%
  Pr(C|B) ~ 50%
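The reversal is easy to reproduce numerically. The slides do not give the per-sex patient counts, so the split below is hypothetical, chosen only so that Drug I wins within each sex but loses in aggregate:

```python
# Hypothetical per-sex counts (successes, patients) -- NOT the slide's data.
drug_I  = {"women": (200, 1000), "men": (95, 100)}
drug_II = {"women": (10, 100),   "men": (800, 1000)}

def rate(successes, patients):
    return successes / patients

for sex in ("women", "men"):
    print(sex, rate(*drug_I[sex]), ">", rate(*drug_II[sex]))  # Drug I wins per group

def aggregate(d):
    # Pool successes and patients across both groups
    return sum(s for s, _ in d.values()) / sum(n for _, n in d.values())

print("overall:", aggregate(drug_I), "<", aggregate(drug_II))  # Drug II wins overall
```

The flip happens because Drug I is mostly given to the group with the lower base success rate.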

Conditional Independence

- Events A and B are conditionally independent given C in case
  $\Pr(AB \mid C) = \Pr(A \mid C)\Pr(B \mid C)$
- A set of events {A_i} is conditionally independent given C in case
  $\Pr(\cap_i A_i \mid C) = \prod_i \Pr(A_i \mid C)$

Conditional Independence (cont'd)

Example: there are three events A, B, C with
- Pr(A) = Pr(B) = Pr(C) = 1/5
- Pr(A,C) = Pr(B,C) = 1/25, Pr(A,B) = 1/10
- Pr(A,B,C) = 1/125

Are A and B independent?
Are A and B conditionally independent given C?

A and B being independent does not imply that A and B are conditionally
independent, and vice versa.
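Checking both questions with exact arithmetic (my sketch, using the numbers above):

```python
from fractions import Fraction as F

pA = pB = pC = F(1, 5)
pAC = pBC = F(1, 25)
pAB = F(1, 10)
pABC = F(1, 125)

print(pAB == pA * pB)               # False: 1/10 != 1/25, not independent
pA_C, pB_C, pAB_C = pAC / pC, pBC / pC, pABC / pC
print(pAB_C == pA_C * pB_C)         # True: 1/25 == 1/25, cond. independent given C
```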

Outline

- Important concepts in probability theory
- Bayes rule
- Random variables and distributions

Bayes Rule

Given two events A and B, suppose that Pr(A) > 0. Then

  $\Pr(B \mid A) = \frac{\Pr(AB)}{\Pr(A)} = \frac{\Pr(A \mid B)\Pr(B)}{\Pr(A)}$

Example:
  R: it is a rainy day
  W: the grass is wet
  Pr(R) = 0.8

  Pr(W|R)      R     ¬R
     W        0.7   0.4
    ¬W        0.3   0.6

Pr(R|W) = ?
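Plugging the slide's numbers into Bayes rule (a quick worked check of mine):

```python
pr_R = 0.8
pr_W_given_R, pr_W_given_notR = 0.7, 0.4

# Total probability: Pr(W) = Pr(W|R)Pr(R) + Pr(W|not R)Pr(not R) = 0.64
pr_W = pr_W_given_R * pr_R + pr_W_given_notR * (1 - pr_R)
pr_R_given_W = pr_W_given_R * pr_R / pr_W     # Bayes rule
print(pr_R_given_W)                           # 0.875
```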

Bayes Rule

  R: it rains
  W: the grass is wet

  Pr(W|R)      R     ¬R
     W        0.7   0.4
    ¬W        0.3   0.6

From the information Pr(W|R) we infer Pr(R|W).

More generally, for a hypothesis H and evidence E:
  Information: Pr(E|H)
  Inference:   Pr(H|E)

  $\Pr(H \mid E) = \frac{\Pr(E \mid H)\,\Pr(H)}{\Pr(E)}$

  Pr(H|E): posterior    Pr(E|H): likelihood
  Pr(H):   prior        Pr(E):   evidence

Bayes Rule: More Complicated

Suppose that B_1, B_2, ..., B_k form a partition of S:

  $B_i \cap B_j = \emptyset$ for $i \neq j$;  $\cup_i B_i = S$

Suppose that Pr(B_i) > 0 and Pr(A) > 0. Then

  $\Pr(B_i \mid A) = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\Pr(A)}
                   = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^{k} \Pr(A B_j)}
                   = \frac{\Pr(A \mid B_i)\Pr(B_i)}{\sum_{j=1}^{k} \Pr(B_j)\Pr(A \mid B_j)}$
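The denominator is the law of total probability over the partition; a small numeric sketch (the three-block partition and its numbers are made up for illustration):

```python
# Hypothetical partition B_1..B_3 with priors Pr(B_j) and likelihoods Pr(A|B_j).
priors      = [0.5, 0.3, 0.2]          # Pr(B_j); sums to 1
likelihoods = [0.9, 0.5, 0.1]          # Pr(A | B_j)

pr_A = sum(p * l for p, l in zip(priors, likelihoods))        # total probability
posteriors = [p * l / pr_A for p, l in zip(priors, likelihoods)]
print(pr_A, posteriors)                # posteriors sum to 1
```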


A More Complicated Example

  R: it rains
  W: the grass is wet
  U: people bring umbrellas

  R → W, R → U, with Pr(R) = 0.8

W and U are conditionally independent given R:
  Pr(UW|R)  = Pr(U|R) Pr(W|R)
  Pr(UW|¬R) = Pr(U|¬R) Pr(W|¬R)

  Pr(W|R)      R     ¬R        Pr(U|R)      R     ¬R
     W        0.7   0.4           U         0.9   0.2
    ¬W        0.3   0.6          ¬U         0.1   0.8

Pr(U|W) = ?
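Using conditional independence given R, Pr(U|W) follows from summing over R; my worked check of the slide's question:

```python
pr_R = 0.8
pr_W = {True: 0.7, False: 0.4}     # Pr(W | R), Pr(W | not R)
pr_U = {True: 0.9, False: 0.2}     # Pr(U | R), Pr(U | not R)

pr_r = {True: pr_R, False: 1 - pr_R}
# Pr(UW) = sum_r Pr(U|r) Pr(W|r) Pr(r), by conditional independence given R
pr_UW = sum(pr_U[r] * pr_W[r] * pr_r[r] for r in (True, False))   # 0.52
pr_W_total = sum(pr_W[r] * pr_r[r] for r in (True, False))        # 0.64
print(pr_UW / pr_W_total)                                         # Pr(U|W) = 0.8125
```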


Outline

- Important concepts in probability theory
- Bayes rule
- Random variables and probability distributions

Random Variable and Distribution

- A random variable X is a numerical outcome of a random experiment.
- The distribution of a random variable is the collection of possible
  outcomes along with their probabilities:
  - Discrete case:   $\Pr(X = x) = p(x)$
  - Continuous case: $\Pr(a \le X \le b) = \int_a^b p(x)\,dx$

Random Variable: Example

Let S be the set of all sequences of three rolls of a die. Let X be the
sum of the numbers of dots on the three rolls.

- What are the possible values of X?
- Pr(X = 5) = ?  Pr(X = 10) = ?
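Enumerating all 6³ equally likely sequences answers both questions; a short sketch of mine:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=3))      # all 216 sequences

def pr_sum(s):
    return Fraction(sum(1 for r in rolls if sum(r) == s), len(rolls))

print(sorted({sum(r) for r in rolls}))            # possible values: 3..18
print(pr_sum(5), pr_sum(10))                      # 6/216 = 1/36, 27/216 = 1/8
```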

Expectation

Consider a random variable X ~ Pr(X = x). Then its expectation is

  $E[X] = \sum_x x \Pr(X = x)$

In an empirical sample x_1, x_2, ..., x_N:

  $E[X] = \frac{1}{N} \sum_{i=1}^{N} x_i$

Continuous case:

  $E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx$

Expectation of a sum of random variables:

  $E[X_1 + X_2] = E[X_1] + E[X_2]$

Expectation: Example

- Let S be the set of all sequences of three rolls of a die. Let X be
  the sum of the numbers of dots on the three rolls. What is E(X)?
- Let S be the set of all sequences of three rolls of a die. Let X be
  the product of the numbers of dots on the three rolls. What is E(X)?
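Both expectations are quick to verify by enumeration (my sketch): linearity gives E[sum] = 3 × 3.5 directly, and independence of the rolls gives E[product] = 3.5³:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=3))

def prod(r):
    x = 1
    for v in r:
        x *= v
    return x

e_sum  = Fraction(sum(sum(r)  for r in rolls), len(rolls))   # 21/2  = 10.5
e_prod = Fraction(sum(prod(r) for r in rolls), len(rolls))   # 343/8 = 42.875
print(e_sum, e_prod)
```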

Variance

The variance of a random variable X is the expectation of $(X - E[X])^2$:

  $\mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E\big[X^2 - 2XE[X] + E[X]^2\big] = E[X^2] - E[X]^2$
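A one-die sanity check of the identity Var(X) = E[X²] − E[X]² (my sketch):

```python
from fractions import Fraction

faces = range(1, 7)
e_x  = Fraction(sum(faces), 6)                 # E[X]  = 7/2
e_x2 = Fraction(sum(v * v for v in faces), 6)  # E[X^2] = 91/6
print(e_x2 - e_x ** 2)                         # Var(X) = 35/12
```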

Bernoulli Distribution

- The outcome of an experiment can either be success (i.e., 1) or
  failure (i.e., 0).
- Pr(X = 1) = p, Pr(X = 0) = 1 − p, or equivalently

  $p(x) = p^x (1-p)^{1-x}$

- E[X] = p, Var(X) = p(1−p)

Binomial Distribution

- n draws of a Bernoulli distribution:
  X_i ~ Bernoulli(p),  $X = \sum_{i=1}^{n} X_i$,  X ~ Bin(p, n)
- The random variable X stands for the number of times that the
  experiments are successful.

  $\Pr(X = x) = p(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x} & x = 0, 1, \ldots, n \\ 0 & \text{otherwise} \end{cases}$

- E[X] = np, Var(X) = np(1−p)
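A direct implementation of this pmf, checked against the stated mean and variance (my sketch):

```python
from math import comb

def binom_pmf(x, n, p):
    # Pr(X = x) for X ~ Bin(p, n)
    return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

n, p = 10, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(pmf))
var  = sum(x * x * q for x, q in enumerate(pmf)) - mean**2
print(round(mean, 6), round(var, 6))   # np = 3.0, np(1-p) = 2.1
```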

Plots of Binomial Distribution

Poisson Distribution

Coming from the Binomial distribution:
- fix the expectation λ = np
- let the number of trials n → ∞

Then the Binomial distribution becomes a Poisson distribution:

  $\Pr(X = x) = p(x) = \begin{cases} \frac{\lambda^x}{x!} e^{-\lambda} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$

E[X] = λ, Var(X) = λ
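The limiting claim can be seen numerically: with λ = np held fixed, the Binomial pmf approaches the Poisson pmf as n grows (my sketch):

```python
from math import comb, exp, factorial

lam, x = 3.0, 2
poisson = lam**x / factorial(x) * exp(-lam)

for n in (10, 100, 10_000):
    p = lam / n                                  # keep lambda = n * p fixed
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    print(n, round(binom, 6), "->", round(poisson, 6))
```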

Plots of Poisson Distribution

Normal (Gaussian) Distribution

X ~ N(μ, σ):

  $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

  $\Pr(a \le X \le b) = \int_a^b p(x)\,dx = \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\,dx$

E[X] = μ, Var(X) = σ²

If X_1 ~ N(μ_1, σ_1) and X_2 ~ N(μ_2, σ_2), what is the distribution of
X = X_1 + X_2?
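For independent X_1 and X_2, the sum is again normal with mean μ_1 + μ_2 and variance σ_1² + σ_2²; a quick Monte Carlo check of mine (assuming independence):

```python
import random

mu1, s1, mu2, s2 = 1.0, 2.0, -3.0, 1.5
N = 200_000
xs = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(N)]

mean = sum(xs) / N
var  = sum((x - mean) ** 2 for x in xs) / N
print(round(mean, 2), round(var, 2))   # ~ -2.0 and ~ 6.25 = 2.0**2 + 1.5**2
```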
