
Bayesian Inference (cont.)

Greg Distiller
Greg.Distiller@uct.ac.za

University of Cape Town


Binomial Sampling

So far the conditional probabilities of observing the data, f(xk|θr), have been
based on a subjective evaluation of the reliability of the observations as
indicators of the state.
In many cases it is possible to use statistical theory to get more objective
values.
For example, the unknown state θ may be the proportion of “successes” in
the population.
Combining binomial sampling with decision theory is particularly relevant in
quality control where sampling information must be used efficiently and the
cost of sampling needs to be weighed against the value gained through
improved quality.

Quality Control Example

A factory produces and sells batches of some product.


One measure of the quality of a batch is the proportion (θ) of defective items
in the batch.
Suppose that there are three different markets for the product: high quality
(HQ), medium quality (MQ), low quality (LQ).
To which market should the batch be sold?
The selling price (R per batch), minimum quality level, and penalty for not
meeting the minimum requirement are given as:
Market   Selling Price (R)   Required Quality (as %)   Penalty per %
HQ       4000                10                        500
MQ       2000                15                        300
LQ       1000                20                        200

Quality Control Example

θ is an unknown proportion which can be interpreted as the probability of a
defective item from a batch (p in the binomial distribution).
π(0.05), π(0.15) and π(0.25) denote the prior probabilities that the true value
of θ is 0.05, 0.15 or 0.25.
Each is a subjective probability: π(0.05), for example, indicates how likely it
is that the true value is θ = 0.05. Suppose:
▶ π(θ = 0.05) = 0.5
▶ π(θ = 0.15) = 0.3
▶ π(θ = 0.25) = 0.2

Quality Control Example - payoff matrix

Sell in:   θ = 0.05   θ = 0.15   θ = 0.25   Expected gain
HQ         4000       1500       -3500      1750
MQ         2000       2000       -1000      1400
LQ         1000       1000       0          800
π(θ)       0.5        0.3        0.2

Expected gain is computed as E(X) = Σ x p(x).
The decision that maximises the expected payoff is to sell to the HQ market.
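These expected gains are just prior-weighted row sums of the payoff matrix. A minimal sketch in plain Python (the variable names are ours, not from the notes):

# Payoff (in R) for each market under each state theta = 0.05, 0.15, 0.25
payoffs = {"HQ": [4000, 1500, -3500],
           "MQ": [2000, 2000, -1000],
           "LQ": [1000, 1000, 0]}
prior = [0.5, 0.3, 0.2]  # pi(theta)

for market, row in payoffs.items():
    gain = sum(x * p for x, p in zip(row, prior))  # E(X) = sum of x * p(x)
    print(market, gain)  # HQ 1750.0, MQ 1400.0, LQ 800.0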

Quality Control Example

Suppose that before making a decision, a random sample of three items can
be taken from the batch and inspected.
If X is the number of defective items found then

X ∼ Bin(3, θ)

Hence for x = 0, 1, 2, 3:

f(x|θ) = Pr[X = x|θ] = C(3, x) θ^x (1 − θ)^(3−x)

where C(3, x) is the binomial coefficient.

These conditional probabilities are used to get the posterior probabilities.
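The update is one line of Bayes' rule per state. A sketch, assuming scipy is available (the function name posterior is ours):

from scipy.stats import binom

thetas = [0.05, 0.15, 0.25]
prior = [0.5, 0.3, 0.2]

def posterior(x, n=3):
    # f(x|theta) * pi(theta) for each theta, then normalise by Pr[X = x]
    joint = [binom.pmf(x, n, th) * p for th, p in zip(thetas, prior)]
    m_x = sum(joint)
    return [j / m_x for j in joint]

print(posterior(0))  # no defectives: posterior weight shifts towards theta = 0.05
print(posterior(3))  # all defective: posterior weight shifts towards theta = 0.25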



Quality Control Example: EVSI

Quality control example (pg 12)
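The notes work this through on pg 12. As a hedged sketch of the idea: EVSI is the expected payoff from deciding with the posterior, averaged over all possible samples, minus the best expected payoff using the prior alone. Reusing the payoff matrix and priors from above (scipy assumed, names ours):

from scipy.stats import binom

thetas = [0.05, 0.15, 0.25]
prior = [0.5, 0.3, 0.2]
payoffs = {"HQ": [4000, 1500, -3500], "MQ": [2000, 2000, -1000], "LQ": [1000, 1000, 0]}

# Best expected gain deciding on the prior alone (no sampling)
prior_best = max(sum(x * p for x, p in zip(row, prior)) for row in payoffs.values())

expected_with_sample = 0.0
for x in range(4):  # possible outcomes of inspecting n = 3 items
    joint = [binom.pmf(x, 3, th) * p for th, p in zip(thetas, prior)]
    m_x = sum(joint)                 # marginal Pr[X = x]
    post = [j / m_x for j in joint]  # posterior pi(theta | x)
    best = max(sum(r * q for r, q in zip(row, post)) for row in payoffs.values())
    expected_with_sample += m_x * best

print(expected_with_sample - prior_best)  # EVSI for a sample of 3 items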



Binomial sampling

In a binomial sampling scenario:


 
f(x|θ) = Pr[X = x|θ] = C(n, x) θ^x (1 − θ)^(n−x)

What can we say about θ after observing a particular value for x?


Could consider a large number of values for θ, say θ = (0.01, 0.02, . . . , 0.99)
and associate a probability with each value:
Pr[θr|x] = π(θr|x) = C(n, x) θr^x (1 − θr)^(n−x) π(θr) / Σk [C(n, x) θk^x (1 − θk)^(n−x) π(θk)]
Could set this up in a spreadsheet (or a short script, as sketched below) and
calculate, say, π(θr|x = 5). Can we do this analytically?
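A sketch of the grid calculation, assuming for illustration a sample with n = 20 and x = 5 (these values are ours, not from the notes):

import numpy as np
from scipy.stats import binom

thetas = np.arange(0.01, 1.00, 0.01)           # grid 0.01, 0.02, ..., 0.99
prior = np.full_like(thetas, 1 / len(thetas))  # equal prior weight on each value

n, x = 20, 5
like = binom.pmf(x, n, thetas)              # f(x | theta_r) for every grid value
post = like * prior / np.sum(like * prior)  # pi(theta_r | x)

print(thetas[np.argmax(post)])              # grid value with the most posterior mass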
Binomial sampling
If all values of θ are possible, we need a pdf.
If all equally likely then θ ∼ U(0, 1):

π(θ) = 1,  0 ≤ θ ≤ 1

Posterior becomes:

π(θ|x) = C(n, x) θ^x (1 − θ)^(n−x) / ∫₀¹ C(n, x) θ^x (1 − θ)^(n−x) dθ

Binomial term cancels and the denominator is a constant so:

π(θ|x) = k θ^x (1 − θ)^(n−x)

Which is the form of a beta distribution:

[Γ(a + b) / (Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1)

where a = x + 1, b = n − x + 1, and k = Γ(n + 2) / [Γ(x + 1)Γ(n − x + 1)]
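A quick numerical check of this result, for illustrative values n = 10 and x = 3 (ours): the constant k should make kθ^x(1 − θ)^(n−x) match the Beta(x + 1, n − x + 1) density and integrate to 1.

from math import gamma
from scipy.integrate import quad
from scipy.stats import beta

n, x = 10, 3
k = gamma(n + 2) / (gamma(x + 1) * gamma(n - x + 1))

theta = 0.37  # any point in (0, 1)
print(k * theta**x * (1 - theta)**(n - x))  # k * theta^x * (1 - theta)^(n - x)
print(beta.pdf(theta, x + 1, n - x + 1))    # identical: the Beta(x+1, n-x+1) pdf

# and the posterior integrates to 1
print(quad(lambda t: k * t**x * (1 - t)**(n - x), 0, 1)[0])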
Binomial sampling Example: Illustration

A researcher examined the level of consensus, denoted θ, among n = 24 women
about whether or not polio (as well as other diseases) was thought to be
contagious. In this case, 17 women said polio was contagious.
Let Xi = 1 if respondent i thought polio was contagious and Xi = 0 otherwise.
Let Y = Σ Xi ∼ Bin(24, θ) and let θ ∼ U(0, 1).
A uniform prior with a binomial likelihood gives:
π(θ|y, n) = Beta(y + 1, n − y + 1).
Substituting n = 24 and y = 17 into the posterior distribution:
π(θ|y, n) = Beta(18, 8)

Posterior distr. for Binomial sampling

[Figure: posterior density of θ ∼ Beta(18, 8) plotted over θ ∈ (0, 1); the density peaks near θ ≈ 0.71.]
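The plot can be reproduced directly; a sketch assuming scipy and matplotlib are available:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

theta = np.linspace(0, 1, 500)
plt.plot(theta, beta.pdf(theta, 18, 8))  # posterior Beta(18, 8)
plt.xlabel("theta")
plt.ylabel("posterior")
plt.show()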

Binomial sampling

Can also use the beta distribution to model the prior for θ:

π(θ) = k θ^(a−1) (1 − θ)^(b−1)

Combining this prior with the binomial likelihood gives (k again denoting a
normalising constant):

π(θ|x) = k θ^(a+x−1) (1 − θ)^(b+n−x−1)


which is a beta distribution with parameters a′ = a + x and b′ = b + n − x.
Can select appropriate values of a and b for the prior distribution. For example, if:
▶ θ is expected to be around 0.2, and
▶ we are fairly sure (analogous to a 95% CI) that 0.1 ≤ θ ≤ 0.4,
▶ then a = 5.48 and b = 21.96 (see the sketch below).
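Values such as these can be found numerically. One possible criterion (ours; the notes may use a slightly different one, so expect ballpark rather than exact agreement) is to match the prior 2.5% and 97.5% quantiles to 0.1 and 0.4:

from scipy.optimize import fsolve
from scipy.stats import beta

def conditions(params):
    a, b = params
    return [beta.ppf(0.025, a, b) - 0.1,   # 2.5% quantile at 0.1
            beta.ppf(0.975, a, b) - 0.4]   # 97.5% quantile at 0.4

a, b = fsolve(conditions, x0=[5, 20])      # start near a plausible solution
print(a, b, a / (a + b))                   # mean should land near 0.2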
General principles

General Bayesian Principles

Joint probability or probability density for the full data set, i.e. the likelihood
function:
L(θ; x1, x2, . . . , xn) = ∏ᵢ₌₁ⁿ f(xi|θ)

Prior distribution π(θ) for θ.


Posterior probability or probability density:

π(θ|x1, x2, . . . , xn) = L(θ; x1, x2, . . . , xn) π(θ) / m(x1, x2, . . . , xn)

where m(x1, x2, . . . , xn) = ∫ L(θ; x1, x2, . . . , xn) π(θ) dθ, the integral taken
over the whole range of θ.
Simplified form:

π(θ|x1 , x2 , . . . , xn ) ∝ L(θ; x1 , x2 , . . . , xn )π(θ)

In many cases the distributional form that emerges is recognisable.
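Even when it is not recognisable, the proportional form makes generic numerical Bayes straightforward: evaluate likelihood × prior on a grid and normalise. A sketch (the normal likelihood and the data here are purely illustrative, not from the notes):

import numpy as np
from scipy.stats import norm

def grid_posterior(theta_grid, prior, loglik):
    # posterior proportional to likelihood * prior, on the log scale for stability
    logpost = loglik(theta_grid) + np.log(prior)
    logpost -= logpost.max()               # guard against underflow
    post = np.exp(logpost)
    return post / post.sum()

# Illustration: data assumed N(theta, 1), flat prior over the grid
data = np.array([1.2, 0.7, 1.9, 1.1])
grid = np.linspace(-2, 4, 601)
prior = np.full_like(grid, 1 / len(grid))
loglik = lambda th: norm.logpdf(data[:, None], loc=th, scale=1).sum(axis=0)

post = grid_posterior(grid, prior, loglik)
print(grid[post.argmax()])                 # posterior mode, near the sample mean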


Now what?

Analyzing the posterior distr. of θ

1 Point estimation
2 Interval estimation
3 Hypothesis testing
Point Estimation

Specify a “loss function” associated with the error in estimation, e.g. the
expected squared error E[(θ − θ̂)²], the expectation taken over the posterior.
Find the Bayes estimate by minimising E[(θ − θ̂)²], also called the posterior risk.
⇒ Expand as E[(θ − θ̂)²] = E[θ²] − 2θ̂E[θ] + θ̂², and differentiate with
respect to θ̂ to give −2E[θ] + 2θ̂ = 0, or θ̂ = E[θ].
Often called the “Bayes’ Estimator” → θ̂Bayes = ∫ θ π(θ|x) dθ
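For the polio posterior Beta(18, 8), the Bayes estimator can be checked both by integration and analytically; a sketch:

from scipy.integrate import quad
from scipy.stats import beta

a, b = 18, 8  # polio posterior Beta(18, 8)

est, _ = quad(lambda t: t * beta.pdf(t, a, b), 0, 1)  # integral of theta * pi(theta|x)
print(est, a / (a + b))                               # both give 18/26 = 0.692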
Interval Estimation

Find two values θL and θU such that:

Pr[θL ≤ θ ≤ θU] = ∫ (from θL to θU) π(θ|x1, x2, . . . , xn) dθ = 1 − α

for any specified probability level α.

Direct interpretation that θ belongs to the interval [θL, θU] with 100(1 − α)%
probability → the credibility interval.
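An equal-tailed interval, which cuts α/2 from each tail of the posterior, is the simplest choice of (θL, θU); the HPD variant appears later. A sketch for the polio posterior:

from scipy.stats import beta

a, b = 18, 8   # polio posterior
alpha = 0.05

lower = beta.ppf(alpha / 2, a, b)       # cut alpha/2 from the lower tail
upper = beta.ppf(1 - alpha / 2, a, b)   # and alpha/2 from the upper tail
print(lower, upper)  # theta lies in [lower, upper] with 95% posterior probability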
Point & interval estimation

1 Analytical solutions: use the well-known analytic expressions for the mean,
variance, etc. of the various posterior distributions.
For the polio example: π(θ|y, n) = Beta(18, 8)

E[θ] = 18 / (18 + 8) = 0.69

and

var[θ] = (18)(8) / [(18 + 8)²(18 + 8 + 1)] ≈ 0.0079
Point & interval estimation

2 Numerical solutions:
▶ Draw a large number of random samples from the posterior distribution π(θ|x).
▶ Calculate sample statistics from that set of random samples, e.g. the mean,
median, mode and variance.
▶ Obtain Highest Posterior Density (HPD) regions (also known as Bayesian
confidence or credibility intervals) for θ.
For the polio example: π(θ|y, n) = Beta(18, 8)
Draw 1000 samples from π(θ|y, n) (sketched below):
Mean = 0.698
var = 0.00777
95% HPD: [0.513, 0.853]
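A sketch of the simulation, using the shortest-interval approximation to the HPD from sorted samples (exact numbers will vary with the random seed):

import numpy as np

rng = np.random.default_rng(1)
draws = np.sort(rng.beta(18, 8, size=1000))  # 1000 draws from the posterior

print(draws.mean(), draws.var())             # compare with 0.698 and 0.00777

# 95% HPD from samples: the shortest interval containing 95% of the draws
k = int(0.95 * len(draws))
widths = draws[k:] - draws[:len(draws) - k]
i = widths.argmin()
print(draws[i], draws[i + k])                # should land near [0.513, 0.853]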

Hypothesis Testing

Two simple hypotheses: H0 : θ = θ0 versus H1 : θ = θ1


In effect, only two values are a priori possible: θ0 and θ1
The prior distribution for θ is thus a discrete two-point probability mass
function.
Let π0 = Pr[θ = θ0 ].
Posterior probabilities:

π(θ0 |x1 , x2 , . . . , xn ) ∝ L(θ0 ; x1 , x2 , . . . , xn )π0


π(θ1 |x1 , x2 , . . . , xn ) ∝ L(θ1 ; x1 , x2 , . . . , xn )(1 − π0 ).
Hypothesis Testing

Odds on H0 being true:

[L(θ0; x1, x2, . . . , xn) / L(θ1; x1, x2, . . . , xn)] · [π0 / (1 − π0)]

i.e. the product of the “likelihood ratio” (the “Bayes Factor”) and the a priori odds.

For costs CI of a type-I error (rejecting H0 when it is true) and CII of a type-II
error (failing to reject H0 when it is false), Bayes’ optimal decision rejects H0
if CI π(θ0|x1, x2, . . . , xn) ≤ CII π(θ1|x1, x2, . . . , xn), i.e. if

L(θ1; x1, x2, . . . , xn) / L(θ0; x1, x2, . . . , xn) ≥ [π0 / (1 − π0)] · [CI / CII]

→ a “likelihood ratio” test.

Posterior probability that H0 is true:

π(θ0|x1, . . . , xn) / [π(θ0|x1, . . . , xn) + π(θ1|x1, . . . , xn)]
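A sketch wrapping this calculation into a function for binomial data (the illustrative numbers are ours, not from the notes):

from scipy.stats import binom

def posterior_prob_H0(x, n, theta0, theta1, pi0):
    # posterior odds on H0 = Bayes factor * prior odds
    bf = binom.pmf(x, n, theta0) / binom.pmf(x, n, theta1)
    odds = bf * pi0 / (1 - pi0)
    return odds / (1 + odds)  # Pr[theta = theta0 | data]

# Illustration: fair coin (theta0 = 0.5) vs biased (theta1 = 0.7),
# even prior odds, and 32 heads observed in 50 tosses
print(posterior_prob_H0(x=32, n=50, theta0=0.5, theta1=0.7, pi0=0.5))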

Binomial Sampling Example

A botanist wants to estimate what proportion θ of an area is occupied by a
particular species. Assume he uses the following prior for θ:

π(θ) = cθ^(α−1) (1 − θ)^(β−1) for 0 < θ < 1

with parameters α = 3 and β = 15. He sends a student to sample 15 sites from
this area and determine whether or not the species occurs in each site. The
student finds the species in six of these sites.
1 What is the posterior distribution for θ?
2 Give the Bayes estimator for θ.
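By the conjugacy result from earlier, a Beta(3, 15) prior with 6 successes in 15 trials gives a Beta(9, 24) posterior, whose mean is the Bayes estimator. A sketch checking the arithmetic (scipy is imported only to emphasise the distribution involved):

from scipy.stats import beta

a0, b0 = 3, 15  # prior Beta(3, 15)
n, x = 15, 6    # species found at 6 of 15 sites

a1, b1 = a0 + x, b0 + n - x    # posterior: Beta(9, 24)
print(a1, b1, a1 / (a1 + b1))  # Bayes estimator = posterior mean = 9/33 ≈ 0.273
print(beta.mean(a1, b1))       # same value, from the distribution object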

Another Example

An analyst at a marketing research company wants to test the assertion that a
particular company has a 30% market share, as she believes the market share has
fallen over recent times to 22%. Assume the probability that the assertion is true
has been estimated to be 0.65. Fifty people are randomly selected and asked if
they use this company’s products, and 14 respond positively.
1 What are the prior odds that the analyst is correct?
2 What is the posterior probability that the analyst is correct?
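Using the machinery from the hypothesis-testing slides: H0 is the 30% assertion with π0 = 0.65, so the analyst’s value θ = 0.22 plays the role of θ1. A sketch (scipy assumed):

from scipy.stats import binom

pi0 = 0.65              # prior probability of the 30% assertion (H0)
print((1 - pi0) / pi0)  # 1. prior odds that the analyst is correct: 0.35/0.65

# 2. posterior probability that the analyst (theta = 0.22) is correct
l0 = binom.pmf(14, 50, 0.30)  # likelihood under the assertion
l1 = binom.pmf(14, 50, 0.22)  # likelihood under the analyst's value
print(l1 * (1 - pi0) / (l0 * pi0 + l1 * (1 - pi0)))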
