Bayes 2 V
Greg Distiller
Greg.Distiller@uct.ac.za
Binomial Sampling
So far the conditional probabilities of observing the data, f(xk|θr), have been
based on a subjective evaluation of the reliability of the observations as
indicators of the state.
In many cases it is possible to use statistical theory to get more objective
values.
For example, the unknown state θ may be the proportion of “successes” in
the population.
Combining binomial sampling with decision theory is particularly relevant in
quality control where sampling information must be used efficiently and the
cost of sampling needs to be weighed against the value gained through
improved quality.
Binomial Sampling

Sell in:   θ = 0.05   θ = 0.15   θ = 0.25   Expected gain
HQ             4000       1500      -3500            1750
MQ             2000       2000      -1000            1400
LQ             1000       1000          0             800
π(θ)            0.5        0.3        0.2

Expected gains are computed under the prior as E(X) = Σ x p(x).
The decision that maximises the expected payoff is to sell to the HQ market.
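As a quick check, the expected gains can be computed directly; a minimal sketch in Python (the variable names are illustrative, not part of the notes):

```python
# Payoff table and prior from the slide above
payoffs = {
    "HQ": [4000, 1500, -3500],
    "MQ": [2000, 2000, -1000],
    "LQ": [1000, 1000, 0],
}
prior = [0.5, 0.3, 0.2]  # pi(theta) for theta = 0.05, 0.15, 0.25

# Expected gain of each decision: E(X) = sum of x * p(x)
expected_gain = {d: sum(g * p for g, p in zip(gains, prior))
                 for d, gains in payoffs.items()}
print(expected_gain)                               # HQ: 1750, MQ: 1400, LQ: 800
print(max(expected_gain, key=expected_gain.get))   # HQ maximises expected payoff
```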
Binomial Sampling
Suppose that before making a decision, a random sample of three items can
be taken from the batch and inspected.
If X is the number of defective items found then
X ∼ Bin(3, θ)
Hence for x = 0, 1, 2, 3:
f(x|θ) = Pr[X = x|θ] = C(3, x) θ^x (1 − θ)^(3−x)

where C(3, x) denotes the binomial coefficient.
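A small sketch tabulating f(x|θ) for each candidate state, using scipy (assumed available; not part of the notes):

```python
from scipy.stats import binom

# f(x | theta) = C(3, x) theta^x (1 - theta)^(3 - x), tabulated for each state
for theta in [0.05, 0.15, 0.25]:
    probs = [binom.pmf(x, 3, theta) for x in range(4)]  # x = 0, 1, 2, 3
    print(theta, [round(p, 4) for p in probs])
```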
Binomial sampling
If all values of θ are possible, we need a pdf.
If all are equally likely then θ ∼ U(0, 1):

π(θ) = 1,  0 ≤ θ ≤ 1

The posterior becomes:

π(θ|x) = C(n, x) θ^x (1 − θ)^(n−x) / ∫₀¹ C(n, x) θ^x (1 − θ)^(n−x) dθ

which simplifies to a Beta(x + 1, n − x + 1) density.
[Figure: the posterior density π(θ|x) plotted against θ]
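A sketch confirming that the uniform-prior posterior matches the Beta(x + 1, n − x + 1) density (scipy assumed; n = 3 and x = 1 are illustrative values):

```python
from scipy.integrate import quad
from scipy.stats import beta, binom

n, x = 3, 1  # e.g. one defective item in a sample of three

# Unnormalised posterior: binomial likelihood times the uniform prior pi(theta) = 1
unnorm = lambda t: binom.pmf(x, n, t) * 1.0
m, _ = quad(unnorm, 0, 1)                # normalising constant (denominator integral)

t = 0.4
print(unnorm(t) / m)                     # posterior density at theta = 0.4: 1.728
print(beta.pdf(t, x + 1, n - x + 1))     # same value from the Beta(2, 3) form
```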
Binomial Sampling
The beta distribution can also be used to model the prior probabilities for θ: with a Beta(a, b) prior and a binomial likelihood, the posterior is again a beta distribution, Beta(a + x, b + n − x).
Joint probability or probability density for the full data set, i.e. the likelihood
function:
L(θ; x1, x2, . . . , xn) = ∏ᵢ₌₁ⁿ f(xi|θ)

π(θ|x1, x2, . . . , xn) = L(θ; x1, x2, . . . , xn) π(θ) / m(x1, x2, . . . , xn)

where m(x1, x2, . . . , xn) = ∫ L(θ; x1, x2, . . . , xn) π(θ) dθ, integrating over the whole range of θ.
Simplified form: π(θ|x1, x2, . . . , xn) ∝ L(θ; x1, x2, . . . , xn) π(θ)
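The proportional form lends itself to a simple grid approximation of the posterior; a sketch under assumed data (x = 1 success in n = 3 trials, uniform prior; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import binom

# Grid approximation of pi(theta | x), proportional to L(theta; x) * pi(theta)
theta = np.linspace(0.001, 0.999, 999)
dt = theta[1] - theta[0]
prior = np.ones_like(theta)          # uniform prior pi(theta) = 1
like = binom.pmf(1, 3, theta)        # likelihood for x = 1, n = 3
post = like * prior
post /= post.sum() * dt              # normalise so the density integrates to 1
print((theta * post).sum() * dt)     # posterior mean, approx 2/5 for Beta(2, 3)
```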
Now what?
1 Point estimation
2 Interval estimation
3 Hypothesis testing
Point Estimation
Specify a “loss function” associated with the error in estimation, e.g. the
expected squared error E[(θ − θ̂)²].
Find the Bayes estimate by minimising E[(θ − θ̂)²], also called the posterior risk.
⇒ Expand as E[(θ − θ̂)²] = E[θ²] − 2θ̂E[θ] + θ̂², and differentiate with
respect to θ̂ to give −2E[θ] + 2θ̂ = 0, or θ̂ = E[θ].
Often called the “Bayes Estimator” → θ̂Bayes = ∫ θ π(θ|x) dθ
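A sketch of the Bayes estimator as the posterior mean, using the Beta(18, 8) posterior from the polio example that follows (scipy assumed):

```python
from scipy.integrate import quad
from scipy.stats import beta

# theta_hat_Bayes = integral of theta * pi(theta | x) dtheta
theta_hat, _ = quad(lambda t: t * beta.pdf(t, 18, 8), 0, 1)
print(theta_hat)         # 0.6923..., i.e. 18 / (18 + 8)
print(beta.mean(18, 8))  # closed-form posterior mean agrees
```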
Interval Estimation
Direct interpretation: θ belongs to the interval [θL, θU] with 100(1 − α)%
probability → the credibility interval.
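For a known posterior, an equal-tailed credibility interval comes straight from the quantile function; a sketch for the Beta(18, 8) posterior used below (note this is the equal-tailed interval, not the HPD interval):

```python
from scipy.stats import beta

# Equal-tailed 95% credibility interval: theta lies in [theta_L, theta_U]
# with 95% posterior probability.
theta_L, theta_U = beta.ppf([0.025, 0.975], 18, 8)
print(theta_L, theta_U)   # roughly [0.50, 0.85]
```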
1 Analytical solutions: use the well-known analytic solutions for the mean,
variance, etc. of the various posterior distributions.
For the polio example: π(θ|y, n) = Beta(18, 8)

E[θ] = 18/(18 + 8) = 0.69

and

var[θ] = (18)(8) / [(18 + 8)²(18 + 8 + 1)] ≈ 0.01
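A quick check of these analytic values (scipy assumed):

```python
from scipy.stats import beta

# Mean and variance of the Beta(18, 8) posterior
mean, var = beta.stats(18, 8, moments="mv")
print(mean)  # 0.6923...
print(var)   # 0.0079..., i.e. 0.01 to two decimal places
```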
2 Numerical solutions:
▶ Draw a large number of random samples from the posterior distribution π(θ|x).
▶ Calculate sample statistics from that set of random samples, e.g. mean,
median, mode, variance.
▶ Obtain Highest Posterior Density (HPD) regions (also known as Bayesian
confidence or credibility intervals) for θ.
For the polio example: π(θ|y, n) = Beta(18, 8)
Draw 1000 samples from π(θ|y, n):
Mean = 0.698
Var = 0.00777
95% HPD: [0.513, 0.853]
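A sketch of this Monte Carlo approach, including a simple shortest-interval HPD estimate from the sorted draws (the seed is illustrative; numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
draws = np.sort(beta.rvs(18, 8, size=1000, random_state=rng))

print(draws.mean(), draws.var())  # close to the analytic 0.69 and 0.008

# 95% HPD interval: the shortest interval containing 95% of the draws
k = int(0.95 * len(draws))
widths = draws[k:] - draws[:-k]
i = np.argmin(widths)
print(draws[i], draws[i + k])     # roughly [0.51, 0.85]
```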
Hypothesis Testing

The posterior odds in favour of H0 are

π(θ0|x1, x2, . . . , xn) / π(θ1|x1, x2, . . . , xn) = [L(θ0; x1, x2, . . . , xn) / L(θ1; x1, x2, . . . , xn)] · [π0 / (1 − π0)]

i.e. the product of the “likelihood ratio” (the “Bayes Factor”) and the a priori odds.

For costs CI of a type-I error (rejecting H0 when it is true) and CII of a type-II
error (failing to reject H0 when it is false), Bayes’ optimal decision rejects H0
if

CI π(θ0|x1, x2, . . . , xn) ≤ CII π(θ1|x1, x2, . . . , xn),

i.e. if

L(θ1; x1, x2, . . . , xn) / L(θ0; x1, x2, . . . , xn) ≥ [π0 / (1 − π0)] · [CI / CII].

The posterior probability of H0 is

π(θ0|x1, . . . , xn) / [π(θ0|x1, . . . , xn) + π(θ1|x1, . . . , xn)].
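A sketch of this decision rule for a simple-vs-simple test; all numbers here (hypothesised rates, prior, costs, data) are illustrative assumptions, not from the notes:

```python
from scipy.stats import binom

# H0: theta = 0.05 vs H1: theta = 0.25, after observing x = 2 defectives in n = 3
x, n = 2, 3
theta0, theta1 = 0.05, 0.25
pi0 = 0.5             # prior probability of H0
C_I, C_II = 1.0, 1.0  # costs of type-I and type-II errors

L0 = binom.pmf(x, n, theta0)   # likelihood under H0
L1 = binom.pmf(x, n, theta1)   # likelihood under H1

post_odds = (L0 / L1) * pi0 / (1 - pi0)   # Bayes factor times prior odds
print(post_odds)                          # about 0.051: evidence against H0
print(L1 / L0 >= (pi0 / (1 - pi0)) * (C_I / C_II))  # True => reject H0

# Posterior probability of H0
print(post_odds / (1 + post_odds))        # about 0.048
```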
Another Example