Bayes 2 V
Greg Distiller
Greg.Distiller@uct.ac.za
Binomial Sampling
So far the conditional probabilities of observing the data, f(xk|θr), have been
based on a subjective evaluation of the reliability of the observations as
indicators of the state.
In many cases it is possible to use statistical theory to get more objective
values.
For example, the unknown state θ may be the proportion of “successes” in
the population.
Combining binomial sampling with decision theory is particularly relevant in
quality control where sampling information must be used efficiently and the
cost of sampling needs to be weighed against the value gained through
improved quality.
Binomial Sampling

Sell in:   θ = 0.05   θ = 0.15   θ = 0.25   Expected gain
HQ             4000       1500      -3500            1750
MQ             2000       2000      -1000            1400
LQ             1000       1000          0             800
π(θ)            0.5        0.3        0.2

Expected gains are computed under the prior as E(X) = Σ x p(x).
The decision that maximises the expected payoff is to sell to the HQ market.
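As a quick check, the expected gains can be computed directly; a minimal sketch in Python (the variable names are illustrative, not part of the notes):

```python
# Payoff table and prior from the slide above
payoffs = {
    "HQ": [4000, 1500, -3500],
    "MQ": [2000, 2000, -1000],
    "LQ": [1000, 1000, 0],
}
prior = [0.5, 0.3, 0.2]  # pi(theta) for theta = 0.05, 0.15, 0.25

# Expected gain of each decision: E(X) = sum of x * p(x)
expected_gain = {d: sum(g * p for g, p in zip(gains, prior))
                 for d, gains in payoffs.items()}
print(expected_gain)                               # HQ: 1750, MQ: 1400, LQ: 800
print(max(expected_gain, key=expected_gain.get))   # HQ maximises expected payoff
```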
Binomial Sampling
Suppose that before making a decision, a random sample of three items can
be taken from the batch and inspected.
If X is the number of defective items found then
X ∼ Bin(3, θ)
Hence for x = 0, 1, 2, 3:
f(x|θ) = Pr[X = x|θ] = C(3, x) θ^x (1 − θ)^(3−x)

where C(3, x) denotes the binomial coefficient.
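A small sketch tabulating f(x|θ) for each candidate state, using scipy (assumed available; not part of the notes):

```python
from scipy.stats import binom

# f(x | theta) = C(3, x) theta^x (1 - theta)^(3 - x), tabulated for each state
for theta in [0.05, 0.15, 0.25]:
    probs = [binom.pmf(x, 3, theta) for x in range(4)]  # x = 0, 1, 2, 3
    print(theta, [round(p, 4) for p in probs])
```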
Binomial sampling
If all values of θ are possible, we need a pdf.
If all are equally likely then θ ∼ U(0, 1):

π(θ) = 1,  0 ≤ θ ≤ 1

The posterior becomes:

π(θ|x) = C(n, x) θ^x (1 − θ)^(n−x) / ∫₀¹ C(n, x) θ^x (1 − θ)^(n−x) dθ

which simplifies to a Beta(x + 1, n − x + 1) density.
[Figure: the posterior density π(θ|x) plotted against θ]
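A sketch confirming that the uniform-prior posterior matches the Beta(x + 1, n − x + 1) density (scipy assumed; n = 3 and x = 1 are illustrative values):

```python
from scipy.integrate import quad
from scipy.stats import beta, binom

n, x = 3, 1  # e.g. one defective item in a sample of three

# Unnormalised posterior: binomial likelihood times the uniform prior pi(theta) = 1
unnorm = lambda t: binom.pmf(x, n, t) * 1.0
m, _ = quad(unnorm, 0, 1)                # normalising constant (denominator integral)

t = 0.4
print(unnorm(t) / m)                     # posterior density at theta = 0.4: 1.728
print(beta.pdf(t, x + 1, n - x + 1))     # same value from the Beta(2, 3) form
```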
Binomial Sampling
The beta distribution can also be used to model the prior probabilities for θ: with a Beta(a, b) prior and a binomial likelihood, the posterior is again a beta distribution, Beta(a + x, b + n − x).
Joint probability or probability density for the full data set, i.e. the likelihood
function:
L(θ; x1, x2, . . . , xn) = ∏ᵢ₌₁ⁿ f(xi|θ)

π(θ|x1, x2, . . . , xn) = L(θ; x1, x2, . . . , xn) π(θ) / m(x1, x2, . . . , xn)

where m(x1, x2, . . . , xn) = ∫ L(θ; x1, x2, . . . , xn) π(θ) dθ, integrating over the whole range of θ.
Simplified form: π(θ|x1, x2, . . . , xn) ∝ L(θ; x1, x2, . . . , xn) π(θ)
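The proportional form lends itself to a simple grid approximation of the posterior; a sketch under assumed data (x = 1 success in n = 3 trials, uniform prior; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import binom

# Grid approximation of pi(theta | x), proportional to L(theta; x) * pi(theta)
theta = np.linspace(0.001, 0.999, 999)
dt = theta[1] - theta[0]
prior = np.ones_like(theta)          # uniform prior pi(theta) = 1
like = binom.pmf(1, 3, theta)        # likelihood for x = 1, n = 3
post = like * prior
post /= post.sum() * dt              # normalise so the density integrates to 1
print((theta * post).sum() * dt)     # posterior mean, approx 2/5 for Beta(2, 3)
```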
Now what?
1 Point estimation
2 Interval estimation
3 Hypothesis testing
Point Estimation
Specify a “loss function” associated with the error in estimation, e.g. the
expected squared error E[(θ − θ̂)²].
Find the Bayes estimate by minimising E[(θ − θ̂)²], also called the posterior risk.
⇒ Expand as E[(θ − θ̂)²] = E[θ²] − 2θ̂E[θ] + θ̂², and differentiate with
respect to θ̂ to give −2E[θ] + 2θ̂ = 0, or θ̂ = E[θ].
Often called the “Bayes Estimator” → θ̂Bayes = ∫ θ π(θ|x) dθ
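A sketch of the Bayes estimator as the posterior mean, using the Beta(18, 8) posterior from the polio example that follows (scipy assumed):

```python
from scipy.integrate import quad
from scipy.stats import beta

# theta_hat_Bayes = integral of theta * pi(theta | x) dtheta
theta_hat, _ = quad(lambda t: t * beta.pdf(t, 18, 8), 0, 1)
print(theta_hat)         # 0.6923..., i.e. 18 / (18 + 8)
print(beta.mean(18, 8))  # closed-form posterior mean agrees
```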
Interval Estimation
Direct interpretation: θ belongs to the interval [θL, θU] with 100(1 − α)%
probability → the credibility interval.
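For a known posterior, an equal-tailed credibility interval comes straight from the quantile function; a sketch for the Beta(18, 8) posterior used below (note this is the equal-tailed interval, not the HPD interval):

```python
from scipy.stats import beta

# Equal-tailed 95% credibility interval: theta lies in [theta_L, theta_U]
# with 95% posterior probability.
theta_L, theta_U = beta.ppf([0.025, 0.975], 18, 8)
print(theta_L, theta_U)   # roughly [0.50, 0.85]
```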
1 Analytical solutions: use the well-known analytic solutions for the mean,
variance, etc. of the various posterior distributions.
For the polio example: π(θ|y, n) = Beta(18, 8)

E[θ] = 18/(18 + 8) = 0.69

and

var[θ] = (18)(8) / [(18 + 8)²(18 + 8 + 1)] ≈ 0.01
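A quick check of these analytic values (scipy assumed):

```python
from scipy.stats import beta

# Mean and variance of the Beta(18, 8) posterior
mean, var = beta.stats(18, 8, moments="mv")
print(mean)  # 0.6923...
print(var)   # 0.0079..., i.e. 0.01 to two decimal places
```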
2 Numerical solutions:
▶ Draw a large number of random samples from the posterior distribution π(θ|x).
▶ Calculate sample statistics from that set of random samples, e.g. mean,
median, mode, variance.
▶ Obtain Highest Posterior Density (HPD) regions (also known as Bayesian
confidence or credibility intervals) for θ.
For the polio example: π(θ|y, n) = Beta(18, 8)
Draw 1000 samples from π(θ|y, n):
Mean = 0.698
Var = 0.00777
95% HPD: [0.513, 0.853]
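A sketch of this Monte Carlo approach, including a simple shortest-interval HPD estimate from the sorted draws (the seed is illustrative; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
draws = np.sort(beta.rvs(18, 8, size=1000, random_state=rng))

print(draws.mean(), draws.var())  # close to the analytic 0.69 and 0.008

# 95% HPD interval: the shortest interval containing 95% of the draws
k = int(0.95 * len(draws))
widths = draws[k:] - draws[:-k]
i = np.argmin(widths)
print(draws[i], draws[i + k])     # roughly [0.51, 0.85]
```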
Hypothesis Testing

The posterior odds in favour of H0 are

π(θ0|x1, x2, . . . , xn) / π(θ1|x1, x2, . . . , xn) = [L(θ0; x1, x2, . . . , xn) / L(θ1; x1, x2, . . . , xn)] · [π0 / (1 − π0)]

i.e. the product of the “likelihood ratio” (the “Bayes Factor”) and the a priori odds.

For costs CI of a type-I error (rejecting H0 when it is true) and CII of a type-II
error (failing to reject H0 when it is false), Bayes’ optimal decision rejects H0
if

CI π(θ0|x1, x2, . . . , xn) ≤ CII π(θ1|x1, x2, . . . , xn),

i.e. if

L(θ1; x1, x2, . . . , xn) / L(θ0; x1, x2, . . . , xn) ≥ [π0 / (1 − π0)] · [CI / CII].

The posterior probability of H0 is

π(θ0|x1, . . . , xn) / [π(θ0|x1, . . . , xn) + π(θ1|x1, . . . , xn)].
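A sketch of this decision rule for a simple-vs-simple test; all numbers here (hypothesised rates, prior, costs, data) are illustrative assumptions, not from the notes:

```python
from scipy.stats import binom

# H0: theta = 0.05 vs H1: theta = 0.25, after observing x = 2 defectives in n = 3
x, n = 2, 3
theta0, theta1 = 0.05, 0.25
pi0 = 0.5             # prior probability of H0
C_I, C_II = 1.0, 1.0  # costs of type-I and type-II errors

L0 = binom.pmf(x, n, theta0)   # likelihood under H0
L1 = binom.pmf(x, n, theta1)   # likelihood under H1

post_odds = (L0 / L1) * pi0 / (1 - pi0)   # Bayes factor times prior odds
print(post_odds)                          # about 0.051: evidence against H0
print(L1 / L0 >= (pi0 / (1 - pi0)) * (C_I / C_II))  # True => reject H0

# Posterior probability of H0
print(post_odds / (1 + post_odds))        # about 0.048
```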
Another Example