
Introduction to Bayesian Analysis and WinBUGS

Lecture 1. Introduction to Bayesian Monte Carlo methods in WinBUGS

Summary

1. Probability as a means of representing uncertainty
2. Bayesian direct probability statements about parameters
3. Probability distributions
4. Monte Carlo simulation
5. Implementation in WinBUGS (and DoodleBUGS) - Demo
6. Directed graphs for representing probability models
7. Examples

How did it all start?

In 1763, Reverend Thomas Bayes of Tunbridge Wells wrote: [Figure: extract from Bayes' 1763 essay]

In modern language, given r ∼ Binomial(θ, n), what is Pr(θ1 < θ < θ2 | r, n)?

Basic idea: Direct expression of uncertainty about unknown parameters

e.g. "There is an 89% probability that the absolute increase in major bleeds is less than 10 percent with low-dose PLT transfusions" (Tinmouth et al, Transfusion, 2004)

[Figure: probability distribution over the % absolute increase in major bleeds, from −50 to 30]
Why a direct probability distribution?

1. Tells us what we want: what are plausible values for the parameter of interest?
2. No P-values: just calculate relevant tail areas
3. No (difficult to interpret) confidence intervals: just report, say, the central area that contains 95% of the distribution
4. Easy to make predictions (see later)
5. Fits naturally into decision analysis / cost-effectiveness analysis / project prioritisation
6. There is a procedure for adapting the distribution in the light of additional evidence: i.e. Bayes theorem allows us to learn from experience

Inference on proportions

What is a reasonable form for a prior distribution for a proportion?

θ ∼ Beta[a, b] represents a beta distribution with properties:

p(θ|a, b) = [Γ(a + b) / (Γ(a)Γ(b))] θ^(a−1) (1 − θ)^(b−1);  θ ∈ (0, 1)
E(θ|a, b) = a / (a + b)
V(θ|a, b) = ab / [(a + b)²(a + b + 1)]

WinBUGS notation: theta ~ dbeta(a,b)

Beta distribution

[Figure: density plots of Beta(0.5,0.5), Beta(1,1), Beta(5,1), Beta(5,5), Beta(5,20) and Beta(50,200); x-axis: success rate]

Gamma distribution

[Figure: density plots of Gamma(0.1,0.1), Gamma(1,1), Gamma(3,3), Gamma(3,0.5), Gamma(30,5) and Gamma(10,0.5)]
The Gamma distribution

Flexible distribution for positive quantities. If Y ∼ Gamma[a, b]

p(y|a, b) = [b^a / Γ(a)] y^(a−1) e^(−by);  y ∈ (0, ∞)
E(Y|a, b) = a / b
V(Y|a, b) = a / b².

• Gamma[1, b] distribution is exponential with mean 1/b
• Gamma[v/2, 1/2] is a Chi-squared χ²_v distribution on v degrees of freedom
• Y ∼ Gamma[0.001, 0.001] means that p(y) ∝ 1/y, or that log Y ≈ Uniform
• Used as conjugate prior distribution for inverse variances (precisions)
• Used as sampling distribution for skewed positive valued quantities (alternative to log normal likelihood) — MLE of mean is sample mean
• WinBUGS notation: y ~ dgamma(a,b)

Example: Drug

• Consider a drug to be given for relief of chronic pain
• Experience with similar compounds has suggested that annual response rates between 0.2 and 0.6 could be feasible
• Interpret this as a distribution with mean = 0.4, standard deviation 0.1
• A Beta[9.2, 13.8] distribution has these properties
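As a quick check of that last claim, the Beta parameters can be solved back from the stated mean and standard deviation; a minimal sketch in Python, assuming SciPy is available:

from scipy.stats import beta

m, s = 0.4, 0.1                          # target mean and sd
n0 = m * (1 - m) / s**2 - 1              # 'implicit sample size' a + b, since var = m(1-m)/(a+b+1)
a, b = m * n0, (1 - m) * n0
print(a, b)                              # 9.2 13.8
print(beta.mean(a, b), beta.std(a, b))   # 0.4 0.1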

Making predictions

Before observing a quantity Y, can provide its predictive distribution by integrating out the unknown parameter

p(Y) = ∫ p(Y|θ) p(θ) dθ.

Predictions are useful in e.g. cost-effectiveness models, design of studies, checking whether observed data is compatible with expectations, and so on.

[Figure: Beta[9.2, 13.8] prior distribution supporting response rates between 0.2 and 0.6; x-axis: probability of response]
If

θ ∼ Beta[a, b]
Yn ∼ Binomial(θ, n),

the exact predictive distribution for Yn is known as the Beta-Binomial. It has the complex form

p(yn) = [Γ(a + b) / (Γ(a)Γ(b))] (n choose yn) [Γ(a + yn) Γ(b + n − yn) / Γ(a + b + n)]

E(Yn) = n a / (a + b)

If a = b = 1 (Uniform distribution), p(yn) is uniform over 0, 1, ..., n.

But in WinBUGS we can just write

theta ~ dbeta(a,b)
Y ~ dbin(theta,n)

and the integration is automatically carried out and does not require algebraic cleverness.

[Figure: (a) prior density over probability of response; (b) predictive density over number of successes]

(a) is the Beta prior distribution
(b) is the predictive Beta-Binomial distribution of the number of successes Y in the next 20 trials

From the Beta-Binomial distribution, can calculate P(Yn ≥ 15) = 0.015.
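That tail area can be checked numerically; a sketch in Python, assuming a recent SciPy (which provides the Beta-Binomial as scipy.stats.betabinom):

from scipy.stats import betabinom

a, b, n = 9.2, 13.8, 20
pred = betabinom(n, a, b)    # predictive distribution for Y
print(pred.sf(14))           # P(Y >= 15), approx 0.015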

Example: a Monte Carlo approach to estimating tail-areas of distributions

Suppose we want to know the probability of getting 8 or more heads when we toss a fair coin 10 times.

An algebraic approach:

Pr(≥ 8 heads) = Σ_{z=8}^{10} p(z | π = 1/2, n = 10)
             = (10 choose 8)(1/2)^8(1/2)^2 + (10 choose 9)(1/2)^9(1/2)^1 + (10 choose 10)(1/2)^10(1/2)^0
             = 0.0547.

A physical approach would be to repeatedly throw a set of 10 coins and count the proportion of throws that there were 8 or more heads.

A simulation approach uses a computer to toss the coins!

[Figure: histograms of the number of heads after 100 simulations, after 10,000 simulations, and for the true distribution]

Proportion with 8 or more 'heads' in 10 tosses:
(a) After 100 'throws' (0.02); (b) after 10,000 throws (0.0577); (c) the true Binomial distribution (0.0547)
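The same simulation is easy to reproduce outside WinBUGS; a minimal sketch in Python with NumPy (exact estimates will vary with the random seed):

import numpy as np

rng = np.random.default_rng(1)
for n_sims in (100, 10_000, 1_000_000):
    heads = rng.binomial(n=10, p=0.5, size=n_sims)   # each draw = no. of heads in 10 tosses
    print(n_sims, (heads >= 8).mean())               # approaches 0.0547 as n_sims grows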
General Monte Carlo analysis - 'forward sampling'

Used extensively in risk modelling - can think of as 'adding uncertainty' to a spreadsheet

• Suppose have logical function f containing uncertain parameters
• Can express our uncertainty as a prior distribution
• Simulate many values from this prior distribution
• Calculate f at the simulated values ('iterations')
• Obtain an empirical predictive distribution for f
• Sometimes termed probabilistic sensitivity analysis
• Can do in Excel add-ons such as @RISK or Crystal Ball.

The BUGS program

Bayesian inference Using Gibbs Sampling

• Language for specifying complex Bayesian models
• Constructs object-oriented internal representation of the model
• Simulation from full conditionals using Gibbs sampling
• Current version (WinBUGS 1.4) runs in Windows
• 'Classic' BUGS available for UNIX but this is an old version

WinBUGS is freely available from http://www.mrc-bsu.cam.ac.uk/bugs

• Scripts enable WinBUGS 1.4 to run in batch mode or be called from other software
• Interfaces developed for R, Excel, Splus, SAS, Matlab
• OpenBUGS site http://www.rni.helsinki.fi/openbugs provides an open source version

Running WinBUGS for Monte Carlo analysis (no data)

1. Open Specification tool from Model menu.
2. Program responses are shown on bottom-left of screen.
3. Highlight model by double-click. Click on Check model.
4. Click on Compile.
5. Click on Gen Inits.
6. Open Update from Model menu, and Samples from Inference menu.
7. Type nodes to be monitored into Sample Monitor, and click set after each.
8. Type * into Sample Monitor, and click trace to see sampled values.
9. Click on Update to generate samples.
10. Type * into Sample Monitor, and click stats etc to see results on all monitored nodes.

Using WinBUGS for Monte Carlo

The model for the 'coin' example is

Y ∼ Binomial(0.5, 10)

and we want to know P(Y ≥ 8).

This model is represented in the BUGS language as

model{
  Y ~ dbin(0.5,10)
  P8 <- step(Y-7.5)
}

P8 is a step function which takes the value 1 if Y − 7.5 ≥ 0, i.e. if Y is 8 or more, and 0 if Y is 7 or less.

Running this simulation for 100, 10,000 and 1,000,000 iterations, and then taking the empirical mean of P8, gives the estimated probabilities that Y will be 8 or more shown on the previous slide.
Some aspects of the BUGS language

• <- represents logical dependence, e.g. m <- a + b*x
• ~ represents stochastic dependence, e.g. r ~ dunif(a,b)
• Can use arrays and loops

for (i in 1:n){
  r[i] ~ dbin(p[i],n[i])
  p[i] ~ dunif(0,1)
}

• Some functions can appear on the left-hand-side of an expression, e.g.

logit(p[i]) <- a + b*x[i]
log(m[i]) <- c + d*y[i]

• mean(p[]) to take mean of whole array, mean(p[m:n]) to take mean of elements m to n. Also for sum(p[]).
• dnorm(0,1)I(0,) means the prior will be restricted to the range (0, ∞).

Functions in the BUGS language

• p <- step(x-.7) = 1 if x ≥ 0.7, 0 otherwise. Hence monitoring p and recording its mean will give the probability that x ≥ 0.7.
• p <- equals(x,.7) = 1 if x = 0.7, 0 otherwise.
• tau <- 1/pow(s,2) sets τ = 1/s².
• s <- 1/sqrt(tau) sets s = 1/√τ.
• p[i,k] <- inprod(pi[], Lambda[i,,k]) sets p_ik = Σ_j π_j Λ_ijk. inprod2 may be faster.
• See 'Model Specification/Logical nodes' in manual for full syntax.

Some common Distributions

Expression   Distribution   Usage
dbin         binomial       r ~ dbin(p,n)
dnorm        normal         x ~ dnorm(mu,tau)
dpois        Poisson        r ~ dpois(lambda)
dunif        uniform        x ~ dunif(a,b)
dgamma       gamma          x ~ dgamma(a,b)

NB. The normal is parameterised in terms of its mean and precision = 1/variance = 1/sd².

See 'Model Specification/The BUGS language: stochastic nodes/Distributions' in manual for full syntax.

Functions cannot be used as arguments in distributions (you need to create new nodes).

Drug example: Monte Carlo predictions

Our prior distribution for the proportion of responders in one year, θ, was Beta[9.2, 13.8].

Consider the situation before giving 20 patients the treatment. What is the chance of getting 15 or more responders?

θ ∼ Beta[9.2, 13.8]    prior distribution
y ∼ Binomial[θ, 20]    sampling distribution
Pcrit = P(y ≥ 15)      probability of exceeding critical threshold

# In BUGS syntax:
model{
  theta ~ dbeta(9.2,13.8)   # prior distribution
  y ~ dbin(theta,20)        # sampling distribution
  P.crit <- step(y-14.5)    # =1 if y >= 15, 0 otherwise
}
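The same forward sampling can be sketched in Python with NumPy — an illustrative stand-in for running the BUGS model, not part of WinBUGS itself:

import numpy as np

rng = np.random.default_rng(42)
N = 100_000
theta = rng.beta(9.2, 13.8, size=N)   # draws from the prior
y = rng.binomial(20, theta)           # predictive draws of the number of responders
print((y >= 15).mean())               # estimate of P.crit, approx 0.015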
WinBUGS output and exact answers

node     mean    sd       MC error   2.5%    median  97.5%   start  sample
theta    0.4008  0.09999  9.415E-4   0.2174  0.3981  0.6044  1      10000
y        8.058   2.917    0.03035    3.0     8.0     14.0    1      10000
P.crit   0.0151  0.122    0.001275   0.0     0.0     0.0     1      10000

Note that the mean of the 0-1 indicator P.crit provides the estimated tail-area probability.

Exact answers from closed-form analysis:

• θ: mean 0.4 and standard deviation 0.1
• y: mean 8 and standard deviation 2.93
• Probability of at least 15: 0.015

These are independent samples, and so MC error = SD/√(no. of iterations).

Independent samples also mean no auto-correlation and no concern with convergence.

Can achieve arbitrary accuracy by running the simulation for longer.

Graphical representation of models

[Figure: Doodle (DAG) for the drug Monte Carlo example]

• Doodle represents each quantity as a node in a directed acyclic graph (DAG)
• Constants are placed in rectangles, random quantities in ovals
• Stochastic dependence is represented by a single arrow, and logical function by a double arrow
• WinBUGS allows models to be specified graphically and run directly from the graphical interface
• Can write code from Doodles
• Good for explanation, but can be tricky to set up

Script for running Drug Monte Carlo example

Run from Model/Script menu

display('log')                      # set up log file
check('c:/bugscourse/drug-MC')      # check syntax of model
# data('c:/bugscourse/drug-data')   # load data file if there is one
compile(1)                          # generate code for 1 simulation
# inits(1,'c:/bugscourse/drug-in1') # load initial values if necessary
gen.inits()                         # generate initial values for all unknown quantities
                                    # not given initial values
set(theta)                          # monitor the true response rate
set(y)                              # monitor the predicted number of successes
set(P.crit)                         # monitor whether a critical number of successes occurs
trace(*)                            # watch some simulated values (although slows down simulation)
update(10000)                       # perform 10000 simulations
history(theta)                      # trace plot of samples for theta
stats(*)                            # calculate summary statistics for all monitored quantities
density(theta)                      # plot distribution of theta
density(y)                          # plot distribution of y
Example: Power — uncertainty in a power calculation

• a randomised trial planned with n patients in each of two arms
• response with standard deviation σ = 1
• aimed to have Type 1 error 5% and 80% power
• to detect a true difference of θ = 0.5 in mean response between the groups

Necessary sample size per group is

n = (2σ²/θ²)(0.84 + 1.96)² = 63

Alternatively, for fixed n, the power is

Power = Φ( √(nθ²/(2σ²)) − 1.96 ).

Suppose we wish to express uncertainty concerning both θ and σ, e.g.

θ ∼ N[0.5, 0.1²],  σ ∼ N[1, 0.3²].

1. Simulate values of θ and σ from their prior distributions
2. Substitute them in the formulae
3. Obtain a predictive distribution over n or Power

prec.sigma <- 1/(0.3*0.3)   # transform sd to precision=1/sd2
prec.theta <- 1/(0.1*0.1)
sigma ~ dnorm(1, prec.sigma)I(0,)
theta ~ dnorm(.5, prec.theta)I(0,)
n <- 2 * pow( (.84 + 1.96) * sigma / theta , 2 )
power <- phi( sqrt(63/2) * theta / sigma - 1.96 )
prob70 <- step(power - .7)
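A Python sketch of the same forward sampling (assuming NumPy/SciPy; negative draws of θ and σ are vanishingly rare here, so folding with abs() stands in for the I(0,) truncation):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000
sigma = np.abs(rng.normal(1.0, 0.3, N))   # prior on sd, folded at zero
theta = np.abs(rng.normal(0.5, 0.1, N))   # prior on true difference
n = 2 * ((0.84 + 1.96) * sigma / theta) ** 2
power = norm.cdf(np.sqrt(63 / 2) * theta / sigma - 1.96)
print(np.median(n), np.median(power))     # approx 62.5 and 0.80
print((power < 0.7).mean())               # approx 0.37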

            Median   95% interval
n           62.5     9.3 to 247.2
Power (%)   80       29 to 100

For n = 63, the median power is 80%, and a trial of 63 patients per group could be seriously underpowered. There is a 37% chance that the power is less than 70%.

Lecture 2. Introduction to conjugate Bayesian inference
What are Bayesian methods?

• Bayesian methods have been widely applied in many areas:
  – medicine / epidemiology
  – genetics
  – ecology
  – environmental sciences
  – social and political sciences
  – finance
  – archaeology
  – .....
• Motivations for adopting Bayesian approach vary:
  – natural and coherent way of thinking about science and learning
  – pragmatic choice that is suitable for the problem in hand

Spiegelhalter et al (2004) define a Bayesian approach as

'the explicit use of external evidence in the design, monitoring, analysis, interpretation and reporting of a [scientific investigation]'

They argue that a Bayesian approach is:

• more flexible in adapting to each unique situation
• more efficient in using all available evidence
• more useful in providing relevant quantitative summaries

than traditional methods

Example

A clinical trial is carried out to collect evidence about an unknown 'treatment effect'

Conventional analysis

• p-value for H0: treatment effect is zero
• Point estimate and CI as summaries of size of treatment effect

Aim is to learn what this trial tells us about the treatment effect

Bayesian analysis

• Asks: 'how should this trial change our opinion about the treatment effect?'

The Bayesian analyst needs to explicitly state

• a reasonable opinion concerning the plausibility of different values of the treatment effect excluding the evidence from the trial (the prior distribution)
• the support for different values of the treatment effect based solely on data from the trial (the likelihood),

and to combine these two sources to produce

• a final opinion about the treatment effect (the posterior distribution)

The final combination is done using Bayes theorem, which essentially weights the likelihood from the trial with the relative plausibilities defined by the prior distribution

One can view the Bayesian approach as a formalisation of the process of learning from experience
Posterior distribution forms basis for all inference — can be summarised to provide

• point and interval estimates of treatment effect
• point and interval estimates of any function of the parameters
• probability that treatment effect exceeds a clinically relevant value
• prediction of treatment effect in a new patient
• prior information for future trials
• inputs for decision making
• ....

Bayes theorem and its link with Bayesian inference

Bayes' theorem (provable from probability axioms)

Let A and B be events; then

p(A|B) = p(B|A) p(A) / p(B).

If Ai is a set of mutually exclusive and exhaustive events (i.e. p(∪i Ai) = Σi p(Ai) = 1), then

p(Ai|B) = p(B|Ai) p(Ai) / Σj p(B|Aj) p(Aj).

Example: use of Bayes theorem in diagnostic testing

• A new HIV test is claimed to have "95% sensitivity and 98% specificity"
• In a population with an HIV prevalence of 1/1000, what is the chance that a patient testing positive actually has HIV?

Let A be the event that the patient is truly HIV positive, and Ā the event that they are truly HIV negative.
Let B be the event that they test positive.

We want p(A|B).

"95% sensitivity" means that p(B|A) = 0.95.
"98% specificity" means that p(B|Ā) = 0.02.

Now Bayes theorem says

p(A|B) = p(B|A) p(A) / [p(B|A) p(A) + p(B|Ā) p(Ā)].

Hence p(A|B) = (0.95 × 0.001) / (0.95 × 0.001 + 0.02 × 0.999) = 0.045.

Thus over 95% of those testing positive will, in fact, not have HIV.

• Our intuition is poor when processing probabilistic evidence
• The vital issue is how this test result should change our belief that the patient is HIV positive
• The disease prevalence can be thought of as a 'prior' probability (p = 0.001)
• Observing a positive result causes us to modify this probability to p = 0.045. This is our 'posterior' probability that the patient is HIV positive.
• Bayes theorem applied to observables (as in diagnostic testing) is uncontroversial and established
• More controversial is the use of Bayes theorem in general statistical analyses, where parameters are the unknown quantities, and their prior distribution needs to be specified — this is Bayesian inference
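The arithmetic is easy to reproduce; a one-line check in Python:

prior = 0.001                 # prevalence: p(A)
sens = 0.95                   # p(B | A)
false_pos = 1 - 0.98          # p(B | not A) = 1 - specificity
p_pos = sens * prior + false_pos * (1 - prior)   # p(B)
print(sens * prior / p_pos)   # posterior p(A | B), approx 0.045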
Bayesian inference

Makes fundamental distinction between

• Observable quantities x, i.e. the data
• Unknown quantities θ
  θ can be statistical parameters, missing data, mismeasured data ...

→ parameters are treated as random variables
→ in the Bayesian framework, we make probability statements about model parameters

(In the frequentist framework, parameters are fixed non-random quantities and the probability statements concern the data.)

As with any statistical analysis, we start by positing a model which specifies p(x | θ)

This is the likelihood, which relates all variables into a 'full probability model'

From a Bayesian point of view

• θ is unknown so should have a probability distribution reflecting our uncertainty about it before seeing the data
  → need to specify a prior distribution p(θ)
• x is known so we should condition on it
  → use Bayes theorem to obtain conditional probability distribution for unobserved quantities of interest given the data:

p(θ | x) = p(θ) p(x | θ) / ∫ p(θ) p(x | θ) dθ  ∝  p(θ) p(x | θ)

This is the posterior distribution

The prior distribution p(θ) expresses our uncertainty about θ before seeing the data.

The posterior distribution p(θ | x) expresses our uncertainty about θ after seeing the data.

Inference on proportions using a continuous prior

Suppose we now observe r positive responses out of n patients.

Assuming patients are independent, with common unknown response rate θ, leads to a binomial likelihood

p(r | n, θ) = (n choose r) θ^r (1 − θ)^(n−r)

θ needs to be given a continuous prior distribution.

Suppose that, before taking account of the evidence from our trial, we believe all values for θ are equally likely (is this plausible?) ⇒ θ ∼ Unif(0, 1), i.e. p(θ) = 1/(1 − 0) = 1

Posterior is then

p(θ | r, n) ∝ θ^r (1 − θ)^(n−r) × 1

This has the form of the kernel of a Beta(r+1, n−r+1) distribution (see Lecture 1), where

θ ∼ Beta(a, b) ≡ [Γ(a + b) / (Γ(a)Γ(b))] θ^(a−1) (1 − θ)^(b−1)

To represent external evidence that some response rates are more plausible than others, it is mathematically convenient to use a Beta(a, b) prior distribution for θ

p(θ) ∝ θ^(a−1) (1 − θ)^(b−1)

Combining this with the binomial likelihood gives a posterior distribution

p(θ | r, n) ∝ p(r | θ, n) p(θ)
           ∝ θ^r (1 − θ)^(n−r) θ^(a−1) (1 − θ)^(b−1)
           = θ^(r+a−1) (1 − θ)^(n−r+b−1)
           ∝ Beta(r + a, n − r + b)
Comments

• When the prior and posterior come from the same family of distributions the prior is said to be conjugate to the likelihood
  – Occurs when prior and likelihood have the same 'kernel'
• Recall from Lecture 1 that a Beta(a, b) distribution has

  mean = a/(a + b),  variance = ab/[(a + b)²(a + b + 1)]

  Hence posterior mean is E(θ|r, n) = (r + a)/(n + a + b)
• a and b are equivalent to observing a priori a − 1 successes in a + b − 2 trials
  → can be elicited
• With fixed a and b, as r and n increase, E(θ|r, n) → r/n (the MLE), and the variance tends to zero
  – This is a general phenomenon: as n increases, the posterior distribution gets more concentrated and the likelihood dominates the prior
• A Beta(1, 1) is equivalent to Uniform(0, 1)

Example: Drug

• Recall example from Lecture 1, where we consider early investigation of a new drug
• Experience with similar compounds has suggested that response rates between 0.2 and 0.6 could be feasible
• We interpreted this as a distribution with mean = 0.4, standard deviation 0.1, and showed that a Beta(9.2, 13.8) distribution has these properties
• Suppose we now treat n = 20 volunteers with the compound and observe y = 15 positive responses
2-14 2-15

[Figure: three panels over probability of response —
Beta(9.2, 13.8) prior distribution supporting response rates between 0.2 and 0.6;
likelihood arising from a Binomial observation of 15 successes out of 20 cases;
posterior, with the Beta parameters updated to (a+15, b+20−15) = (24.2, 18.8) and mean 24.2/(24.2+18.8) = 0.56]

(a) Beta posterior distribution after having observed 15 successes in 20 trials
(b) predictive Beta-Binomial distribution of the number of successes ỹ40 in the next 40 trials, with mean 22.5 and standard deviation 4.3

Suppose we would consider continuing a development program if the drug managed to achieve at least a further 25 successes out of these 40 future trials

From the Beta-Binomial distribution, can calculate P(ỹ40 ≥ 25) = 0.329
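These numbers follow directly from the conjugate update and can be checked in Python (assuming a recent SciPy):

from scipy.stats import beta, betabinom

a, b = 9.2 + 15, 13.8 + 5          # posterior Beta(24.2, 18.8)
print(beta.mean(a, b))             # approx 0.563
pred = betabinom(40, a, b)         # predictive distribution for the next 40 trials
print(pred.mean(), pred.std())     # approx 22.5 and 4.3
print(pred.sf(24))                 # P(y40 >= 25), approx 0.329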
Drug (continued): learning about parameters from data using Markov chain Monte Carlo (MCMC) methods in WinBUGS

• Using MCMC (e.g. in WinBUGS), no need to explicitly specify posterior
• Can just specify the prior and likelihood separately
• WinBUGS contains algorithms to evaluate the posterior given (almost) arbitrary specification of prior and likelihood
  – posterior doesn't need to be closed form
  – but can (usually) recognise conjugacy when it exists

The drug model can be written

θ ∼ Beta[a, b]               prior distribution
y ∼ Binomial[θ, m]           sampling distribution
ypred ∼ Binomial[θ, n]       predictive distribution
Pcrit = P(ypred ≥ ncrit)     probability of exceeding critical threshold

# In BUGS syntax:
# Model description
model {
  theta ~ dbeta(a,b)                 # prior distribution
  y ~ dbin(theta,m)                  # sampling distribution
  y.pred ~ dbin(theta,n)             # predictive distribution
  P.crit <- step(y.pred-ncrit+0.5)   # =1 if y.pred >= ncrit, 0 otherwise
}

Graphical representation of models

[Figure: Doodle (DAG) for the drug model, now including the data node y]

Note that adding data to a model is simply extending the graph.

Data files

Data can be written after the model description, or held in a separate .txt or .odc file

list( a = 9.2,     # parameters of prior distribution
      b = 13.8,
      y = 15,      # number of successes
      m = 20,      # number of trials
      n = 40,      # future number of trials
      ncrit = 25)  # critical value of future successes

Alternatively, in this simple example, we could have put all data and constants into the model description:

model{
  theta ~ dbeta(9.2,13.8)       # prior distribution
  y ~ dbin(theta,20)            # sampling distribution
  y.pred ~ dbin(theta,40)       # predictive distribution
  P.crit <- step(y.pred-24.5)   # =1 if y.pred >= 25, 0 otherwise
  y <- 15
}
The WinBUGS data formats

WinBUGS accepts data files in:

1. Rectangular format (easy to cut and paste from spreadsheets)

n[] r[]
47 0
148 18
...
360 24
END

2. S-Plus format:

list(N=12, n = c(47,148,119,810,211,196,148,215,207,97,256,360),
     r = c(0,18,8,46,8,13,9,31,14,8,29,24))

Generally need a 'list' to give size of datasets etc.

Initial values

• WinBUGS can automatically generate initial values for the MCMC analysis using gen inits
• Fine if have informative prior information
• If have fairly 'vague' priors, better to provide reasonable values in an initial-values list

Initial values list can be after model description or in a separate file

list(theta=0.1)

Running WinBUGS for MCMC analysis (single chain)

1. Open Specification tool from Model menu.
2. Program responses are shown on bottom-left of screen.
3. Highlight model by double-click. Click on Check model.
4. Highlight start of data. Click on Load data.
5. Click on Compile.
6. Highlight start of initial values. Click on Load inits.
7. Click on Gen Inits if more initial values needed.
8. Open Update from Model menu.
9. Click on Update to burn in.
10. Open Samples from Inference menu.
11. Type nodes to be monitored into Sample Monitor, and click set after each.
12. Perform more updates.
13. Type * into Sample Monitor, and click stats etc to see results on all monitored nodes.

WinBUGS output

[Figure: screenshot of a WinBUGS session for the drug model]
WinBUGS output and exact answers

node     mean    sd       MC error   2.5%    median  97.5%   start  sample
theta    0.5633  0.07458  4.292E-4   0.4139  0.5647  0.7051  1001   30000
y.pred   22.52   4.278    0.02356    14.0    23.0    31.0    1001   30000
P.crit   0.3273  0.4692   0.002631   0.0     0.0     1.0     1001   30000

Exact answers from conjugate analysis

• θ: mean 0.563 and standard deviation 0.075
• ypred: mean 22.51 and standard deviation 4.31
• Probability of at least 25: 0.329

MCMC results are within Monte Carlo error of the true values

Bayesian inference using the Normal distribution

Known variance, unknown mean

Suppose we have a sample of Normal data xi ∼ N(θ, σ²) (i = 1, ..., n).

For now assume σ² is known and θ has a Normal prior θ ∼ N(µ, σ²/n0)

Same standard deviation σ is used in the likelihood and the prior. Prior variance is based on an 'implicit' sample size n0.

Then straightforward to show that the posterior distribution is

θ|x ∼ N( (n0µ + nx̄)/(n0 + n), σ²/(n0 + n) )

• As n0 tends to 0, the prior variance becomes larger and the distribution becomes 'flatter', and in the limit the prior distribution becomes essentially uniform over (−∞, ∞)
• Posterior mean (n0µ + nx̄)/(n0 + n) is a weighted average of the prior mean µ and parameter estimate x̄, weighted by their precisions (relative 'sample sizes'), and so is always a compromise between the two
• Posterior variance is based on an implicit sample size equivalent to the sum of the prior 'sample size' n0 and the sample size of the data n
• As n → ∞, p(θ|x) → N(x̄, σ²/n), which does not depend on the prior
• Compare with the frequentist setting: the MLE is θ̂ = x̄ with SE(θ̂) = σ/√n, and sampling distribution p(θ̂ | θ) = p(x̄|θ) = N(θ, σ²/n)

Example: THM concentrations

• Regional water companies in the UK are required to take routine measurements of trihalomethane (THM) concentrations in tap water samples for regulatory purposes
• Samples are tested throughout the year in each water supply zone
• Suppose we want to estimate the average THM concentration in a particular water zone, z
• Two independent measurements, xz1 and xz2, are taken and their mean, x̄z, is 130 µg/l
• Suppose we know that the assay measurement error has a standard deviation σ[e] = 5 µg/l
• What should we estimate the mean THM concentration to be in this water zone?

Let the mean THM conc. be denoted θz.

A standard analysis would use the sample mean x̄z = 130 µg/l as an estimate of θz, with standard error σ[e]/√n = 5/√2 = 3.5 µg/l

A 95% confidence interval is x̄z ± 1.96 × σ[e]/√n, i.e. 123.1 to 136.9 µg/l.
Suppose historical data on THM levels in other zones supplied from the same source showed that the mean THM concentration was 120 µg/l with standard deviation 10 µg/l

• suggests Normal(120, 10²) prior for θz
• if we express the prior standard deviation as σ[e]/√n0, we can solve to find n0 = (σ[e]/10)² = 0.25
• so our prior can be written as θz ∼ Normal(120, σ²[e]/0.25)

Posterior for θz is then

p(θz|x) = Normal( (0.25 × 120 + 2 × 130)/(0.25 + 2), 5²/(0.25 + 2) )
        = Normal(128.9, 3.33²)

giving 95% interval for θz of 122.4 to 135.4 µg/l

[Figure: prior, likelihood and posterior densities for the mean THM concentration, µg/l (theta), x-axis 80 to 180]
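The posterior above is just the weighted average from the previous slide; a quick check in Python:

import math

mu0, n0 = 120.0, 0.25     # prior mean and implicit prior sample size
xbar, n = 130.0, 2        # observed mean and number of measurements
sigma = 5.0               # known assay sd

post_mean = (n0 * mu0 + n * xbar) / (n0 + n)
post_sd = sigma / math.sqrt(n0 + n)
print(post_mean, post_sd)                                      # 128.9, 3.33
print(post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)  # approx 122.4 to 135.4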

Prediction

Denoting the posterior mean and variance as µn = (n0µ + nx̄)/(n0 + n) and σn² = σ²/(n0 + n), the predictive distribution for a new observation x̃ is

p(x̃|x) = ∫ p(x̃|x, θ) p(θ|x) dθ

which generally simplifies to

p(x̃|x) = ∫ p(x̃|θ) p(θ|x) dθ

which can be shown to give

x̃|x ∼ N(µn, σn² + σ²)

So the predictive distribution is centred around the posterior mean, with variance equal to the sum of the posterior variance and the sample variance of x̃

Example: THM concentration (continued)

• Suppose the water company will be fined if THM levels in the water supply exceed 145 µg/l
• Predictive distribution for THM concentration in a future sample taken from the water zone is N(128.9, 3.33² + 5²) = N(128.9, 36.1)
• Probability that the THM concentration in a future sample exceeds 145 µg/l is 1 − Φ[(145 − 128.9)/√36.1] = 0.004

[Figure: posterior and predictive densities for THM concentration, µg/l]
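Continuing the Python check, the predictive exceedance probability (using SciPy for the Normal cdf):

import math
from scipy.stats import norm

pred_mean = 128.9
pred_sd = math.sqrt(3.33**2 + 5**2)           # posterior variance + sampling variance
print(1 - norm.cdf(145, pred_mean, pred_sd))  # approx 0.004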
Bayesian inference using count data

Suppose we have an independent sample of counts x1, ..., xn which can be assumed to follow a Poisson distribution with unknown mean µ:

p(x|µ) = ∏_i µ^(xi) e^(−µ) / xi!

The kernel of the Poisson likelihood (as a function of µ) has the same form as that of a Gamma(a, b) prior for µ:

p(µ) = [b^a / Γ(a)] µ^(a−1) e^(−bµ)

Note: a Gamma(a, b) density has mean a/b and variance a/b²

This implies the following posterior

p(µ | x) ∝ p(µ) p(x | µ)
         = [b^a / Γ(a)] µ^(a−1) e^(−bµ) ∏_{i=1}^n e^(−µ) µ^(xi) / xi!
         ∝ µ^(a+nx̄−1) e^(−(b+n)µ)
         = Gamma(a + nx̄, b + n).

The posterior is another (different) Gamma distribution. The Gamma distribution is said to be the conjugate prior.

E(µ | x) = (a + nx̄)/(b + n) = x̄ [n/(n + b)] + (a/b) [1 − n/(n + b)]

So the posterior mean is a compromise between the prior mean a/b and the MLE x̄

Example: London bombings during WWII

• Data below are the number of flying bomb hits on London during World War II in a 36 km² area of South London
• Area was partitioned into 0.25 km² grid squares and the number of bombs falling in each grid was counted

Hits, x               0    1    2   3   4   7
Number of areas, n    229  211  93  35  7   1

Total hits, Σ_i ni xi = 537
Total number of areas, Σ_i ni = 576

• If the hits are random, a Poisson distribution with constant hit rate θ should fit the data
• Can think of n = 576 observations from a Poisson distribution, with x̄ = 537/576 = 0.93

The 'invariant' Jeffreys prior (see later) for the mean θ of a Poisson distribution is p(θ) ∝ 1/√θ, which is equivalent to an (improper) Gamma(0.5, 0) distribution. Therefore

p(θ|y) = Gamma(a + nx̄, b + n) = Gamma(537.5, 576)

E(θ|y) = 537.5/576 = 0.933;  Var(θ|y) = 537.5/576² = 0.0016

Note that these are almost exactly the same as the MLE and the square of the SE(MLE)
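The conjugate update for these data, written out in Python:

hits = [0, 1, 2, 3, 4, 7]
areas = [229, 211, 93, 35, 7, 1]
total_hits = sum(h * m for h, m in zip(hits, areas))   # 537
n = sum(areas)                                         # 576

a, b = 0.5, 0.0                           # improper Jeffreys Gamma(0.5, 0) prior
a_post, b_post = a + total_hits, b + n    # posterior Gamma(537.5, 576)
print(a_post / b_post)                    # posterior mean, approx 0.933
print(a_post / b_post**2)                 # posterior variance, approx 0.0016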
Summary

For all these examples, we see that

• the posterior mean is a compromise between the prior mean and the MLE
• the posterior s.d. is less than each of the prior s.d. and the s.e.(MLE)

'A Bayesian is one who, vaguely expecting a horse and catching a glimpse of a donkey, strongly concludes he has seen a mule' (Senn, 1997)

As n → ∞,

• the posterior mean → the MLE
• the posterior s.d. → the s.e.(MLE)
• the posterior does not depend on the prior.

These observations are generally true, when the MLE exists and is unique

Choosing prior distributions

When the posterior is in the same family as the prior then we have what is known as conjugacy. This has the advantage that prior parameters can usually be interpreted as a prior sample. Examples include:

Likelihood   Parameter       Prior    Posterior
Normal       mean            Normal   Normal
Normal       precision       Gamma    Gamma
Binomial     success prob.   Beta     Beta
Poisson      rate or mean    Gamma    Gamma

• Conjugate prior distributions are mathematically convenient, but do not exist for all likelihoods, and can be restrictive
• Computations for non-conjugate priors are harder, but possible using MCMC (see next lecture)

Calling WinBUGS from other software

• Scripts enable WinBUGS 1.4 to be called from other software
• Interfaces developed for R, Splus, SAS, Matlab
• See www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml
• Andrew Gelman's bugs function for R is most developed - reads in data, writes script, monitors output etc. Now packaged as R2WinBUGS.
• OpenBUGS site http://mathstat.helsinki.fi/openbugs/ provides an open source version, including the BRugs package which works from within R

Further reading

Berry (1996) (Introductory text on Bayesian methods, with medical slant)

Lee (2004) (Good intro to Bayesian inference; more mathematical than Berry; 3rd edition contains WinBUGS examples)

Bernardo and Smith (1994) (Advanced text on Bayesian theory)
Lecture 3. Introduction to MCMC

Why is computation important?

• Bayesian inference centres around the posterior distribution

p(θ|x) ∝ p(x|θ) × p(θ)

where θ is typically a large vector of parameters θ = {θ1, θ2, ..., θk}

• p(x|θ) and p(θ) will often be available in closed form, but p(θ|x) is usually not analytically tractable, and we want to
  – obtain the marginal posterior p(θi|x) = ∫ p(θ|x) dθ(−i), where θ(−i) denotes the vector of θ's excluding θi
  – calculate properties of p(θi|x), such as the mean (= ∫ θi p(θi|x) dθi), tail areas (= ∫ from T to ∞ of p(θi|x) dθi) etc.

→ numerical integration becomes vital

Monte Carlo integration

We have already seen that Monte Carlo methods can be used to simulate values from prior distributions and from closed form posterior distributions

If we had algorithms for sampling from arbitrary (typically high-dimensional) posterior distributions, we could use Monte Carlo methods for Bayesian estimation:

• Suppose we can draw samples from the joint posterior distribution for θ, i.e.

(θ1(1), ..., θk(1)), (θ1(2), ..., θk(2)), ..., (θ1(N), ..., θk(N)) ∼ p(θ|x)

• Then
  – θ1(1), ..., θ1(N) are a sample from the marginal posterior p(θ1|x)
  – E(g(θ1)) = ∫ g(θ1) p(θ1|x) dθ1 ≈ (1/N) Σ_{i=1}^N g(θ1(i))

→ this is Monte Carlo integration
→ theorems exist which prove convergence in the limit as N → ∞ even if the sample is dependent (crucial to the success of MCMC)

How do we sample from the posterior?

• We want samples from joint posterior distribution p(θ|x)
• Independent sampling from p(θ|x) may be difficult
• BUT dependent sampling from a Markov chain with p(θ|x) as its stationary (equilibrium) distribution is easier
• A sequence of random variables θ(0), θ(1), θ(2), ... forms a Markov chain if θ(i+1) ∼ p(θ|θ(i)), i.e. conditional on the value of θ(i), θ(i+1) is independent of θ(i−1), ..., θ(0)
• Several standard 'recipes' available for designing Markov chains with required stationary distribution p(θ|x)
  – Metropolis et al. (1953); generalised by Hastings (1970)
  – Gibbs Sampling (see Geman and Geman (1984), Gelfand and Smith (1990), Casella and George (1992)) is a special case of the Metropolis-Hastings algorithm which generates a Markov chain by sampling from full conditional distributions
  – See Gilks, Richardson and Spiegelhalter (1996) for a full introduction and many worked examples.
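Monte Carlo integration in miniature, for a case where independent posterior samples are available (the Beta posterior from the drug example); a Python sketch:

import numpy as np

rng = np.random.default_rng(7)
theta = rng.beta(24.2, 18.8, size=50_000)   # samples from p(theta | x)
print(theta.mean())                         # estimates E(theta | x), approx 0.563
print((theta > 0.6).mean())                 # estimates the tail area Pr(theta > 0.6 | x)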
Gibbs sampling

Let our vector of unknowns θ consist of k sub-components θ = (θ1, θ2, ..., θk)

1) Choose starting values θ1(0), θ2(0), ..., θk(0)

2) Sample θ1(1) from p(θ1 | θ2(0), θ3(0), ..., θk(0), x)
   Sample θ2(1) from p(θ2 | θ1(1), θ3(0), ..., θk(0), x)
   .....
   Sample θk(1) from p(θk | θ1(1), θ2(1), ..., θk−1(1), x)

3) Repeat step 2 many 1000s of times
   – eventually obtain sample from p(θ|x)

The conditional distributions are called 'full conditionals' as they condition on all other parameters

Gibbs sampling ctd.

Example with k = 2

[Figure: contours of p(θ) in the (θ1, θ2) plane, with the zig-zag path of successive Gibbs updates θ(0) → θ(1) → θ(2)]

• Sample θ1(1) from p(θ1 | θ2(0), x)
• Sample θ2(1) from p(θ2 | θ1(1), x)
• Sample θ1(2) from p(θ1 | θ2(1), x)
• ......

θ(n) forms a Markov chain with (eventually) a stationary distribution p(θ|x).
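A toy Gibbs sampler matching the k = 2 picture, sketched in Python: for illustration only, take the target p(θ) to be a standard bivariate Normal with correlation ρ, so that both full conditionals are univariate Normals.

import numpy as np

rng = np.random.default_rng(0)
rho, n_iter = 0.8, 10_000
t1, t2 = -3.0, 3.0                     # starting values theta(0)
cond_sd = np.sqrt(1 - rho**2)          # sd of each full conditional
samples = np.empty((n_iter, 2))
for i in range(n_iter):
    t1 = rng.normal(rho * t2, cond_sd)   # draw theta1 | theta2
    t2 = rng.normal(rho * t1, cond_sd)   # draw theta2 | theta1
    samples[i] = t1, t2

print(samples[1000:].mean(axis=0))     # approx (0, 0) after discarding burn-in
print(np.corrcoef(samples[1000:].T))   # off-diagonal approx rho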

Using MCMC methods

There are two main issues to consider

• Convergence (how quickly does the distribution of θ(t) approach p(θ|x)?)
• Efficiency (how well are functionals of p(θ|x) estimated from {θ(t)}?)

Checking convergence

This is the user's responsibility!

• Note: convergence is to the target distribution (the required posterior), not to a single value.
• Once convergence is reached, samples should look like a random scatter about a stable mean value
Convergence diagnosis

• How do we know we have reached convergence?
• i.e. How do we know the number of 'burn-in' iterations?
• Many 'convergence diagnostics' exist, but none foolproof
• CODA and BOA software contain a large number of diagnostics

Gelman-Rubin-Brooks diagnostic

• A number of runs
• Widely differing starting points
• Convergence assessed by quantifying whether sequences are much further apart than expected based on their internal variability
• Diagnostic uses components of variance of the multiple sequences

Example: A dose-response model

Consider the following response rates for different doses of a drug

dose xi   No. subjects ni   No. responses ri
1.69      59                6
1.72      60                13
1.75      62                18
1.78      56                28
1.81      63                52
1.83      59                53
1.86      62                61
1.88      60                60

Fit a logistic curve with 'centred' covariate (xi − x̄):

ri ∼ Bin(pi, ni)
logit pi = α + β(xi − x̄)
α ∼ N(0, 10000)
β ∼ N(0, 10000)

Checking convergence with multiple runs

• Set up multiple initial value lists, e.g.

list(alpha=-100, beta=100)
list(alpha=100, beta=-100)

• Before clicking compile, set num of chains to 2
• Load both sets of initial values
• Monitor from the start of sampling
• Assess how much burn-in needed using the bgr statistic

Using the bgr statistic

• Green: width of 80% intervals of pooled chains: should be stable
• Blue: average width of 80% intervals for chains: should be stable
• Red: ratio of pooled/within: should be near 1
• Double-click on plot, then cntl + right click gives statistics

Output for 'centred' analysis

[Figure: history, bgr diagnostic, trace and autocorrelation plots for beta, chains 1:2]

node    mean    sd     MC error  2.5%    median  97.5%  start  sample
alpha   0.7489  0.139  0.00138   0.4816  0.7468  1.026  1001   14000
beta    34.6    2.929  0.02639   29.11   34.53   40.51  1001   14000
Problems with convergence

Fit a logistic curve with 'un-centred' covariate x:

ri ∼ Bin(pi, ni)
logit pi = α + βxi
α ∼ N(0, 10000)
β ∼ N(0, 10000)

History plots for 'un-centred' analysis

[Figure: history plots for alpha and beta, chains 1:2, over 20000 iterations, showing very slow mixing]

bgr output for 'un-centred' analysis

[Figure: bgr plot for beta, chains 1:2, over 60000 iterations]

Drop first 40,000 iterations as burn-in

node   mean   sd     MC error  2.5%  median  97.5%  start  sample
beta   33.97  2.955  0.1734    28.7  33.89   40.3   40001  40000

Output for 'un-centred' analysis

[Figure: posterior density and autocorrelation plots for beta, chains 1:2; bivariate posteriors of alpha against beta for the centred and un-centred parameterisations]
How many iterations after convergence?

• After convergence, further iterations are needed to obtain samples for posterior inference.
• More iterations = more accurate posterior estimates.
• Efficiency of the sample mean of θ as an estimate of the theoretical posterior expectation E(θ) is usually assessed by calculating the Monte Carlo standard error (MC error)
• MC error = standard error of the posterior sample mean as an estimate of the theoretical expectation for a given parameter
• MC error depends on
  – true variance of posterior distribution
  – posterior sample size (number of MCMC iterations)
  – autocorrelation in MCMC sample
• Rule of thumb: want MC error < 1−5% of posterior SD

Inference using posterior samples from MCMC runs

A powerful feature of the Bayesian approach is that all inference is based on the joint posterior distribution

⇒ can address a wide range of substantive questions by appropriate summaries of the posterior

• Typically report either mean or median of the posterior samples for each parameter of interest as a point estimate
• 2.5% and 97.5% percentiles of the posterior samples for each parameter give a 95% posterior credible interval (interval within which the parameter lies with probability 0.95)

node   mean   sd     MC error  2.5%   median  97.5%  start  sample
beta   34.60  2.929  0.0239    29.11  34.53   40.51  1001   14000

So the point estimate of beta would be 34.60, with 95% credible interval (29.11, 40.51)
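For independent samples the MC error is just SD/√N; autocorrelation in an MCMC sample inflates it above this. A Python sketch of the independent-sample case:

import numpy as np

rng = np.random.default_rng(3)
theta = rng.beta(24.2, 18.8, size=14_000)      # stand-in for posterior samples
mc_error = theta.std() / np.sqrt(theta.size)
print(theta.std(), mc_error)                   # MC error here is well under 5% of posterior SD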

Probability statements about parameters

• Classical inference cannot provide probability statements about parameters (e.g. a p-value is not Pr(H0 true), but the probability of observing data as or more extreme than we obtained, given that H0 is true)
• In Bayesian inference, it is simple to calculate e.g. Pr(θ > 1):
  = area under the posterior distribution curve to the right of 1
  = proportion of values in the posterior sample of θ which are > 1

[Figure: posterior distribution of theta, with shaded area = Prob(theta > 1)]

• In WinBUGS use the step function:
  p.theta <- step(theta - 1)
• For discrete parameters, may also be interested in Pr(δ = δ0):
  p.delta <- equals(delta, delta0)
• Posterior means of p.theta and p.delta give the required probabilities

Complex functions of parameters

• Classical inference about a function of the parameters g(θ) requires construction of a specific estimator of g(θ). Obtaining an appropriate error can be difficult.
• Easy using MCMC: just calculate the required function g(θ) as a logical node at each iteration and summarise the posterior samples of g(θ)

In the dose-response example, suppose we want to estimate the ED95: that is, the dose that will provide 95% of maximum efficacy.

logit 0.95 = α + β(ED95 − x̄)
ED95 = (logit 0.95 − α)/β + x̄

Simply add into model

ED95 <- (logit(0.95) - alpha)/beta + mean(x[])

node   mean   sd        MC error  2.5%   median  97.5%  start  sample
ED95   1.857  0.007716  8.514E-5  1.843  1.857   1.874  1001   10000
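The same post-processing can be done on stored samples outside WinBUGS. A Python sketch, where placeholder Normal draws (matching the posterior means and sds reported above) stand in for real MCMC output — this ignores the posterior correlation between alpha and beta, so it only roughly reproduces the table:

import numpy as np
from scipy.special import logit

rng = np.random.default_rng(5)
alpha = rng.normal(0.7489, 0.139, 14_000)   # placeholder posterior samples
beta = rng.normal(34.6, 2.929, 14_000)
x = np.array([1.69, 1.72, 1.75, 1.78, 1.81, 1.83, 1.86, 1.88])

ed95 = (logit(0.95) - alpha) / beta + x.mean()           # g(theta) at each iteration
print(ed95.mean(), np.quantile(ed95, [0.025, 0.975]))    # approx 1.85 (1.84, 1.87)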
How to rank if you must

• Recent trend in UK towards ranking 'institutional' performance e.g. schools, hospitals
• Might also want to rank treatments, answer 'which is the best' etc
• Rank of a point estimate is a highly unreliable summary statistic
⇒ Would like a measure of uncertainty about the rank
• Bayesian methods provide posterior interval estimates for ranks
• WinBUGS contains 'built-in' options for ranks:
  – Rank option of Inference menu monitors the rank of the elements of a specified vector
  – rank(x[], i) returns the rank of the ith element of x
  – equals(rank(x[],i),1) = 1 if the ith element is ranked lowest, 0 otherwise. Mean is the probability that the ith element is 'best' (if counting adverse events)
  – ranked(x[], i) returns the value of the ith-ranked element of x

Example of ranking: 'Blocker' trials

• 22 trials of beta-blockers used in WinBUGS manual to illustrate random-effects meta-analysis.
• Consider just the treatment arms: which trial has the lowest mortality rate?
• Assume independent 'Jeffreys' Beta[0.5, 0.5] prior for each response rate.

for( i in 1 : Num) {
  rt[i] ~ dbin(pt[i],nt[i])
  pt[i] ~ dbeta(0.5,0.5)               # Jeffreys prior
  rnk[i] <- rank(pt[], i)              # rank of i'th trial
  prob.lowest[i] <- equals(rnk[i],1)   # prob that i'th trial lowest
  N[i] <- i                            # used for indexing plot
}
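Rank summaries are equally easy to compute from stored samples; a Python sketch with a made-up three-trial example (pt stands for an iterations × trials array of posterior draws of the mortality rates):

import numpy as np

rng = np.random.default_rng(9)
pt = rng.beta([2, 5, 8], [50, 50, 50], size=(10_000, 3))   # toy posterior draws
rnk = pt.argsort(axis=1).argsort(axis=1) + 1     # rank of each trial at each iteration
print((rnk == 1).mean(axis=0))                   # P(trial i has the lowest rate)
print(np.quantile(rnk, [0.025, 0.975], axis=0))  # interval estimates for the ranks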

Mortality rates and ranks

[Figure: box plot of the mortality rates pt; caterpillar plot of the ranks rnk; and plot of the probability that each trial has the lowest mortality, by trial]

Ranking methods may be useful when

• comparing alternative treatments
• comparing subsets
• comparing response-rates, cost-effectiveness or any summary measure
Further reading

Gelfand and Smith (1990) (key reference to use of Gibbs sampling for Bayesian calculations)

Casella and George (1992) (explanation of Gibbs sampling)

Brooks (1998) (tutorial paper on MCMC)

Spiegelhalter et al (1996) (comprehensive coverage of practical aspects of MCMC)

Lecture 4. Bayesian linear regression models

Bayesian regression models

Standard (and non standard) regression models can be easily formulated within a Bayesian framework.

• Specify probability distribution (likelihood) for the data
• Specify form of relationship between response and explanatory variables
• Specify prior distributions for regression coefficients and any other unknown (nuisance) parameters

Some advantages of a Bayesian formulation in regression modelling include:

• Easy to include parameter restrictions and other relevant prior knowledge
• Easily extended to non-linear regression
• Easily 'robustified'
• Easy to make inference about functions of regression parameters and/or predictions
• Easily extended to handle missing data and covariate measurement error
Linear regression

Consider a simple linear regression with univariate Normal outcome yi and a vector of covariates x1i, ..., xpi, i = 1, ..., n

yi = β0 + Σ_{k=1}^p βk xki + εi,  εi ∼ Normal(0, σ²)

An equivalent Bayesian formulation would typically specify

yi ∼ Normal(µi, σ²)
µi = β0 + Σ_{k=1}^p βk xki
(β0, β1, ..., βp, σ²) ∼ prior distributions

A typical choice of 'vague' prior distribution (see later for more details) that will give numerical results similar to OLS or MLE is:

βk ∼ N(0, 100000)  k = 0, ..., p
1/σ² ∼ Gamma(0.001, 0.001)

Example: New York Crime data

• 23 Precincts in New York City
• Response = THEFT: seasonally adjusted changes in larcenies (thefts) from a 27-week base period in 1966 to a 58-week experimental period in 1966-1967
• Predictors = MAN: % change in police manpower; DIST: district indicator (1 Downtown, 2 Mid-town, 3 Up-town)
• Model specification:

THEFTi ∼ Normal(µi, σ²)  i = 1, ..., 23
µi = α + β × MANi + <effect of DIST>
1/σ² ∼ Gamma(0.001, 0.001)
α ∼ N(0, 100000)
β ∼ N(0, 100000)
prior on coefficients for DIST effect

Specifying categorical covariates using the BUGS language

DISTi is a 3-level categorical explanatory variable

Two alternative ways of specifying the model in the BUGS language

1. Create the usual 'design matrix' in the data file:

MAN[] THEFT[] DIST2[] DIST3[]
-15.76 3.19 0 0    # district 1
0.98 -3.45 0 0
3.71 0.04 0 0
.......
-9.56 3.68 0 0
-2.06 8.63 1 0     # district 2
-0.76 10.82 1 0
-6.30 -0.50 1 0
.......
-2.82 -2.02 1 0
-16.19 0.94 0 1    # district 3
-11.00 4.42 0 1
......
-10.77 1.58 0 1
END

BUGS model code is then

for (i in 1:N) {
  THEFT[i] ~ dnorm(mu[i], tau)
  mu[i] <- alpha + beta*MAN[i] + delta2*DIST2[i] + delta3*DIST3[i]
}
alpha ~ dnorm(0, 0.00001)
beta ~ dnorm(0, 0.00001)
delta2 ~ dnorm(0, 0.00001)
delta3 ~ dnorm(0, 0.00001)
tau ~ dgamma(0.001, 0.001)
sigma2 <- 1/tau

Note: BUGS parameterises the normal in terms of mean and precision (1/variance)!!

Initial values file would be something like

list(alpha = 1, beta = -2, delta2 = -2, delta3 = 4, tau = 2)
2. Alternatively, input the explanatory variable as a single vector coded by its level:

MAN[] THEFT[] DIST[]
-15.76 3.19 1
0.98 -3.45 1
3.71 0.04 1
.....
-9.56 3.68 1
-2.06 8.63 2
-0.76 10.82 2
-6.30 -0.50 2
.....
-2.82 -2.02 2
-16.19 0.94 3
-11.00 4.42 3
.....
-10.77 1.58 3
END

Then use the 'double indexing' feature of the BUGS language

for (i in 1:23) {
  THEFT[i] ~ dnorm(mu[i], tau)
  mu[i] <- alpha + beta*MAN[i] + delta[DIST[i]]
}
alpha ~ dnorm(0, 0.00001)
beta ~ dnorm(0, 0.00001)
delta[1] <- 0              # set coefficient for reference category to zero
delta[2] ~ dnorm(0, 0.00001)
delta[3] ~ dnorm(0, 0.00001)
tau ~ dgamma(0.001, 0.001)
sigma2 <- 1/tau

In the initial values file, need to specify initial values for delta[2] and delta[3] but not delta[1]. Use the following syntax:

list(alpha = 1, beta = -2, delta = c(NA, -2, 4), tau = 2)

Raw data

[Figure: change in theft rate plotted against % change in manpower, and against District]

Change in theft rate per 1% increase in police manpower
[Figure: posterior density of beta, chains 1:2, sample 10000]
Posterior mean -0.24; 95% interval (-0.47, -0.01)

Change in theft rate in Midtown relative to Downtown
[Figure: posterior density of delta[2], chains 1:2, sample 10000]
Posterior mean 0.6; 95% interval (-5.1, 6.6)

Change in theft rate in Uptown relative to Downtown
[Figure: posterior density of delta[3], chains 1:2, sample 10000]
Posterior mean -4.0; 95% interval (-10.0, 2.1)
• 95% intervals for the DIST effects both include zero
→ drop DIST from model (see later for Bayesian model comparison criteria)

Change in theft rate per 1% increase in police manpower
[Figure: posterior density of beta, chains 1:2, sample 10000; posterior mean -0.18, 95% interval (-0.39, 0.04)]

[Figure: observed data, fitted values mu[i] and 95% intervals for mu[i], plotted against % change in police manpower]

• Influential point corresponds to 20th Precinct
• During 2nd period, manpower assigned to this Precinct was experimentally increased by about 40%
• No experimental increases in any other Precinct

→ Robustify model assuming t-distributed errors

for (i in 1:23) {
  THEFT[i] ~ dt(mu[i], tau, 4)   # robust likelihood (t on 4 df)
  mu[i] <- alpha + beta*MAN[i]
}
alpha ~ dnorm(0, 0.00001)
beta ~ dnorm(0, 0.00001)
tau ~ dgamma(0.001, 0.001)
sigma2 <- 1/tau

dummy <- DIST[1]   # ensures all variables in data file appear in model code

Model with Normal errors
[Figure: observed data with fitted line; posterior density of beta, chains 1:2, sample 10000; posterior mean -0.18, 95% interval (-0.39, 0.04)]

Model with Student t errors
[Figure: observed data with fitted line; posterior density of beta, chains 1:2, sample 10000; posterior mean -0.13, 95% interval (-0.36, 0.18)]

• Precinct 20 still quite influential
• Add an additional covariate corresponding to a binary indicator for Precinct 20
  – Equivalent to fitting a separate (saturated) model to Precinct 20

for(i in 1:23) {
  THEFT[i] ~ dt(mu[i], tau, 4)                     # robust likelihood (t on 4 df)
  mu[i] <- alpha + beta*MAN[i] + delta*PREC20[i]   # separate term for precinct 20
}
alpha ~ dnorm(0, 0.000001)
beta ~ dnorm(0, 0.000001)
delta ~ dnorm(0, 0.000001)
tau ~ dgamma(0.001, 0.001)
sigma2 <- 1/tau     # residual error variance
dummy <- DIST[1]    # ensures all variables in data file appear in model code

# Create indicator variable for precinct 20
# (alternatively, could add this variable to data file)
for(i in 1:13) { PREC20[i] <- 0 }
PREC20[14] <- 1
for(i in 15:23) { PREC20[i] <- 0 }
Introduction to Bayesian Analysis and WinBUGS Introduction to Bayesian Analysis and WinBUGS

model fit: mu

[Figure: posterior means and 95% intervals for mu[i], with observed data, plotted against % change in manpower]

[Figure: posterior density of beta: posterior mean 0.18, 95% interval (-0.16, 0.49)]

Specifying prior distributions

Why did we choose a N(0, 100000) prior for each regression coefficient and a Gamma(0.001, 0.001) prior for the inverse of the error variance?

Choice of prior is, in principle, subjective

• it might be elicited from experts (see Spiegelhalter et al (2004), sections 5.2, 5.3)

• it might be more convincing to base it on historical data, e.g. a previous study
  → assumed relevance is still a subjective judgement (see Spiegelhalter et al (2004), section 5.4)

• there has been a long and complex search for various ‘non-informative’, ‘reference’ or ‘objective’ priors (Kass and Wasserman, 1996)


‘Non-informative’ priors

• Better to refer to as ‘vague’, ‘diffuse’ or ‘minimally informative’ priors

• Prior is vague with respect to the likelihood
  – prior mass is diffusely spread over range of parameter values that are plausible, i.e. supported by the data (likelihood)

• Inference is based on the likelihood p(x | θ)

Uniform priors (Bayes 1763; Laplace, 1776)

Set p(θ) ∝ 1

• This is improper (∫ p(θ) dθ ≠ 1)

• The posterior will still usually be proper

• It is not really objective, since a flat prior p(θ) ∝ 1 on θ does not correspond to a flat prior on φ = g(θ), but to p(φ) ∝ |dθ/dφ|, where |dθ/dφ| is the Jacobian
  – Note: Jacobian ensures area under curve (probability) in a specified interval (θ1, θ2) is preserved under the transformation → same area in interval (φ1 = g(θ1), φ2 = g(θ2))
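As a quick added illustration of this lack of invariance (example not in the original slides): take p(θ) ∝ 1 on θ > 0 and φ = g(θ) = log θ. Then θ = exp(φ) and |dθ/dφ| = exp(φ), so p(φ) ∝ exp(φ): the prior that was ‘flat’ for θ increasingly favours large values of φ.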


Proper approximations to Uniform(−∞, ∞) prior:

• p(θ) = Uniform(a, b) where a and b specify an appropriately wide range, e.g. Uniform(−1000, 1000)

• p(θ) = N(0, V) where V is an appropriately large value for the variance, e.g. N(0, 100000)

• Recall that WinBUGS parameterises Normal in terms of mean and precision, so vague normal prior will be, e.g. theta ~ dnorm(0, 0.00001)

Jeffreys’ invariance priors

Consider 1-to-1 transformation of θ: φ = g(θ), e.g. φ = 1 + θ³

Transformation of variables: p(θ) is equivalent to p(φ) = p(θ = g⁻¹(φ)) |dθ/dφ|

Jeffreys proposed defining a non-informative prior for θ as p(θ) ∝ I(θ)^(1/2), where I(θ) is the Fisher information for θ:

     I(θ) = −E_{X|θ}[ ∂² log p(X|θ)/∂θ² ] = E_{X|θ}[ ( ∂ log p(X|θ)/∂θ )² ]

• Fisher information measures curvature of log likelihood

• High curvature occurs wherever small changes in parameter values are associated with large changes in the likelihood
  – Jeffreys’ prior gives more weight to these parameter values
  – data provide strong information about parameter values in this region
  – ensures data dominate prior everywhere

• Jeffreys’ prior is invariant to reparameterisation because

     I(φ)^(1/2) = I(θ)^(1/2) |dθ/dφ|
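A standard worked example (added here; not derived on these slides): for r ∼ Binomial(θ, n),

     log p(r|θ) = r log θ + (n − r) log(1 − θ) + C

so I(θ) = E[r]/θ² + (n − E[r])/(1 − θ)² = n/[θ(1 − θ)], giving Jeffreys’ prior p(θ) ∝ θ^(−1/2)(1 − θ)^(−1/2), i.e. a Beta(0.5, 0.5) distribution: in BUGS language, theta ~ dbeta(0.5, 0.5).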


Examples of Jeffreys’ priors

• Normal case: unknown mean m, known variance v
  Sample x1, . . . , xn from N(m, v)

     log p(x|m) = −Σᵢ (xᵢ − m)²/(2v) + C  ⇒  I(m) = n/v

  So Jeffreys’ prior for m is ∝ 1, i.e. the Uniform distribution

• Normal case: known mean m, unknown variance v, with s = Σᵢ (xᵢ − m)²

     log p(x|v) = −(n/2) log v − s/(2v)  ⇒  I(v) = n/(2v²)

  So Jeffreys’ prior for v is ∝ v⁻¹
  This improper distribution is approximated by a Gamma(ε, ε) distribution with ε → 0
  Note: p(v) ∝ v⁻¹ is equivalent to a uniform prior on log v

Some recommendations

Distinguish

• primary parameters of interest in which one may want minimal influence of priors

• secondary structure used for smoothing etc. in which informative priors may be more acceptable

Prior best placed on interpretable parameters

Great caution needed in complex models that an apparently innocuous uniform prior is not introducing substantial information

‘There is no such thing as a ‘noninformative’ prior. Even improper priors give information: all possible values are equally likely’ (Fisher, 1996)


Location parameters (e.g. means, regression coefficients)

• Uniform prior on a wide range, or a Normal prior with a large variance can be used, e.g.

     θ ∼ Unif(−100, 100)        theta ~ dunif(-100, 100)
     θ ∼ Normal(0, 100000)      theta ~ dnorm(0, 0.00001)

  Prior will be locally uniform over the region supported by the likelihood

  – ! remember that WinBUGS parameterises the Normal in terms of mean and precision, so a vague Normal prior will have a small precision

  – ! ‘wide’ range and ‘small’ precision depend on the scale of measurement of θ

Scale parameters

• Sample variance σ²: standard ‘reference’ (Jeffreys’) prior

     p(σ²) ∝ 1/σ² ∝ Gamma(0, 0)
     p(log(σ)) ∝ Uniform(−∞, ∞)

• Note that Jeffreys’ prior on the inverse variance (precision), τ = σ⁻², is also

     p(τ) ∝ 1/τ ∝ Gamma(0, 0)

  which may be approximated by a ‘just proper’ prior

     τ ∼ Gamma(ε, ε)

  This is also the conjugate prior and so is widely used as a ‘vague’ proper prior for the precision of a Normal likelihood

  In BUGS language: tau ~ dgamma(0.001, 0.001)
  or alternatively: tau <- 1/exp(logsigma2); logsigma2 ~ dunif(-100, 100)

Sensitivity analysis plays a crucial role in assessing the impact of particular prior distributions, whether elicited, derived from evidence, or reference, on the conclusions of an analysis.
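One concrete way to carry out such a check is to re-fit the same model under each candidate vague prior for the precision and compare the posteriors; a minimal sketch for a Normal likelihood (the data names y[] and n are illustrative, not from the slides):

model {
   for (i in 1:n) { y[i] ~ dnorm(mu, tau) }
   mu ~ dnorm(0, 0.00001)

   # Version 1: 'just proper' conjugate Gamma prior on the precision
   tau ~ dgamma(0.001, 0.001)

   # Version 2: uniform prior on log(sigma2), transformed to the precision scale
   # (comment out Version 1 and use these two lines instead)
   # logsigma2 ~ dunif(-100, 100)
   # tau <- 1/exp(logsigma2)

   sigma2 <- 1/tau      # monitor this node under both priors
}

If the posteriors for sigma2 (and for any parameters of interest) are in close agreement across the two versions, conclusions are robust to this aspect of the prior specification.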


Informative priors

• An informative prior expresses specific, definite information about a variable

• Example: a prior distribution for the temperature at noon tomorrow
  – A reasonable approach is to make the prior a normal distribution with mean equal to today’s noontime temperature, with variance equal to the day-to-day variance of atmospheric temperature

• Posterior from one problem (today’s temperature) becomes the prior for another problem (tomorrow’s temperature)

• Priors elicited from experts can be used to take account of domain-specific knowledge, judgement, experience

• Priors can also be used to impose constraints on variables (e.g. based on physical or assumed properties) and bound variables to plausible ranges

Example: Trade union density
(Western and Jackman, 1994)

• Example of regression analysis in comparative research

• What explains cross-national variation in union density?

• Union density is defined as the percentage of the work force who belongs to a trade union

• Competing theories:
  – Wallerstein: union density depends on the size of the civilian labour force (LabF)
  – Stephens: union density depends on industrial concentration (IndC)
  – Note: These two predictors correlate at -0.92.


• Data: n = 20 countries with a continuous history of democracy since World War II

• Variables: Union density (Uden), (log) labour force size (LabF), industrial concentration (IndC), left wing government (LeftG), measured in late 1970s

• Fit linear regression model to compare theories

     Udenᵢ ∼ N(µᵢ, σ²)
     µᵢ = b0 + b1(LeftGᵢ − mean(LeftG)) + b2(LabFᵢ − mean(LabF)) + b3(IndCᵢ − mean(IndC))

  Vague priors:

     1/σ² ∼ Gamma(0.001, 0.001)
     b0 ∼ N(0, 100000)
     b1 ∼ N(0, 100000)
     b2 ∼ N(0, 100000)
     b3 ∼ N(0, 100000)

Trace plots, posterior estimates and MC error for regression coefficients

Without centering covariates
[Figure: trace plot of b3, chains 1:2, iterations 1001-6000]

        mean    sd     MC error
  b0    61.7    62.8   5.19
  b1    0.27    0.08   .002
  b2   -4.14    4.18   0.34
  b3    12.1    20.6   1.67

With centered covariates
[Figure: trace plot of b3, chains 1:2, iterations 1001-6000]

        mean    sd     MC error
  b0    54.0    2.48   0.02
  b1    0.27    0.08   .001
  b2   -6.33    3.96   0.13
  b3    0.98    20.2   0.67
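The slides do not show the BUGS code for this model; a minimal sketch of the centred version, assuming the data file supplies vectors Uden[], LeftG[], LabF[] and IndC[] for the 20 countries:

model {
   for (i in 1:20) {
      Uden[i] ~ dnorm(mu[i], tau)
      mu[i] <- b0 + b1*(LeftG[i] - mean(LeftG[]))
                  + b2*(LabF[i] - mean(LabF[]))
                  + b3*(IndC[i] - mean(IndC[]))
   }
   # vague N(0, 100000) priors: recall precision = 1/variance
   b0 ~ dnorm(0, 0.00001)
   b1 ~ dnorm(0, 0.00001)
   b2 ~ dnorm(0, 0.00001)
   b3 ~ dnorm(0, 0.00001)
   tau ~ dgamma(0.001, 0.001)
   sigma2 <- 1/tau
}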


Posterior distribution of regression coefficients

[Figure: box plots of the posterior distributions of b[1] (LeftG), b[2] (LabF) and b[3] (IndC)]

Motivation for Bayesian approach with informative priors

• Because of small sample size and multicollinear variables, not able to adjudicate between theories

• Data tend to favour Wallerstein (union density depends on labour force size), but neither coefficient estimated very precisely

• Other historical data are available that could provide further relevant information

• Incorporation of prior information provides additional structure to the data, which helps to uniquely identify the two coefficients


Wallerstein informative prior

• Believes in negative labour force effect

• Comparison of Sweden and Norway in 1950:
  – doubling of labour force corresponds to 3.5-4% drop in union density
  – on log scale, labour force effect size ≈ −3.5/log(2) ≈ −5

• Confidence in direction of effect represented by prior SD giving 95% interval that excludes 0

     b2 ∼ N(−5, 2.5²)

• Vague prior assumed for IndC effect, b3 ∼ N(0, 100000)

Stephens informative prior

• Believes in positive industrial concentration effect

• Decline in industrial concentration in UK in 1980s:
  – drop of 0.3 in industrial concentration corresponds to about 3% drop in union density
  – industrial concentration effect size ≈ 3/0.3 = 10

• Confidence in direction of effect represented by prior SD giving 95% interval that excludes 0

     b3 ∼ N(10, 5²)

• Vague prior assumed for LabF effect, b2 ∼ N(0, 100000)
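In BUGS these informative priors must be supplied on the precision scale (precision = 1/SD²); a sketch of the two settings (the rest of the model is unchanged):

# Wallerstein prior: b2 ~ N(-5, 2.5^2), so precision = 1/6.25 = 0.16
b2 ~ dnorm(-5, 0.16)
b3 ~ dnorm(0, 0.00001)       # vague prior on IndC effect

# Stephens prior: b3 ~ N(10, 5^2), so precision = 1/25 = 0.04
# b2 ~ dnorm(0, 0.00001)     # vague prior on LabF effect
# b3 ~ dnorm(10, 0.04)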


Both Wallerstein and Stephens priors

• Both believe left-wing governments assist union growth

• Assuming 1 year of left-wing government increases union density by about 1% translates to effect size of 0.3

• Confidence in direction of effect represented by prior SD giving 95% interval that excludes 0

     b1 ∼ N(0.3, 0.15²)

• Vague prior b0 ∼ N(0, 100000) assumed for intercept

Posterior distribution of regression coefficients under different priors

[Figure: box plots of the posterior of b2, the effect of labour force size (Wallerstein hypothesis), and of b3, the effect of industrial concentration (Stephens hypothesis), under three prior settings:
  b2 (LabF): Vague / Info / Vague
  b3 (IndC): Vague / Vague / Info]


Comments

• Effects of LabF and IndC estimated more precisely

• Both sets of prior beliefs support inference that labour-force size decreases union density

• Only Stephens prior supports conclusion that industrial concentration increases union density

• Choice of prior is subjective – if no consensus, can we be satisfied that data have been interpreted fairly?
  – Sensitivity to priors (e.g. repeat analysis using priors with increasing variance) — see Practical exercises
  – Sensitivity to data (e.g. residuals, influence diagnostics) — see later lecture

Multivariate responses

• In many applications, it is common to collect data on a number of different outcomes measured on the same units, e.g.
  – sample survey, where respondents asked several different questions
  – experiment with several different outcomes measured on each unit

• May wish to fit regression model to each response
  – May have different covariates in each regression
  – But errors may be correlated
  – Might also wish to impose cross-equation parameter restrictions

  → Seemingly Unrelated Regressions (SUR) (Zellner, 1962)

Bayesian approach to SUR models → model vector of responses for each unit as multivariate normal (could also have robust version using multivariate t); a minimal sketch is given below

Possible to extend Bayesian SUR models to binary, categorical, count responses using multivariate latent variable approach.
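A minimal BUGS sketch of a two-equation SUR model; the names y, x1, x2 and the sample size n are illustrative rather than taken from the slides, and the election example on the following slides gives a fully worked version:

model {
   for (i in 1:n) {
      y[i,1:2] ~ dmnorm(mu[i,], prec[,])   # bivariate normal errors, correlated across equations
      mu[i,1] <- a1 + b1*x1[i]             # equation 1: its own covariate
      mu[i,2] <- a2 + b2*x2[i]             # equation 2: a different covariate
   }
   a1 ~ dnorm(0, 0.00001)
   b1 ~ dnorm(0, 0.00001)
   a2 ~ dnorm(0, 0.00001)
   b2 ~ dnorm(0, 0.00001)
   prec[1:2,1:2] ~ dwish(R[,], 2)          # Wishart prior on the 2 x 2 precision matrix
   R[1,1] <- 0.01; R[1,2] <- 0; R[2,1] <- 0; R[2,2] <- 0.01
}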


Example: Analysis of compositional data

• Compositional data are vectors of proportions pi = (pi1, ..., piJ) representing relative contributions of each of J categories to the whole, e.g.
  – Proportion of income spent on different categories of expenditure
  – Proportion of electorate voting for different political parties
  – Relative abundance of different species in a habitat
  – Chemical composition of a rock or soil sample
  – Proportion of deaths from different causes in a population

• Regression models for compositional data must satisfy two constraints

     0 ≤ pij ≤ 1
     Σⱼ pij = 1

• Two main modelling strategies, treating vector of proportions as the data (sufficient statistics)
  – Model pi using a Dirichlet likelihood (multivariate generalisation of a beta distribution)
    ∗ assumes the ratios of “compositions” (i.e. proportions) are independent
  – Multivariate logistic normal model (Aitchison, 1986) — apply additive log ratio (alr) transformation, yij = log(pij/piJ), and model yi as multivariate normal
    ∗ allows dependence between ratios of proportions
    ∗ this can be thought of as a type of SUR model

• To allow for sampling variability in the observed counts (including zero counts), model counts (rather than proportions) as multinomial (see later)


Multivariate logistic normal model

Define

     yij = log(pij/piJ)

the log ratios of proportions in each category relative to a reference category J.

Note that piJ = 1 − Σ_{k≠J} pik, so

     pij = exp(yij) / (1 + Σ_{k≠J} exp(yik))

Since the yij are unconstrained, can model vector yi = {yij, j ≠ J} as multivariate normal

Example: British General Election 1992

• Data originally analysed by Katz and King (1999), and formulated as BUGS example by Simon Jackman

• Data consist of vote proportions for Conservative (j=1), Labour (j=2) and Lib-Dem (j=3) parties from 1992 General Election for each of 521 constituencies

• Additive log ratio transformation applied to proportions, taking Lib-Dem vote as reference category

• Covariates include lagged values of the log ratios from previous election, and indicators of the incumbency status of each party’s candidate
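As a quick numerical illustration of the alr transformation (invented numbers, loosely mimicking a constituency vote share): for p = (0.45, 0.46, 0.09) with Lib-Dem as reference, y1 = log(0.45/0.09) ≈ 1.61 and y2 = log(0.46/0.09) ≈ 1.63; applying the inverse transformation, exp(1.61)/(1 + exp(1.61) + exp(1.63)) ≈ 0.45 recovers the first proportion.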


BUGS model code

for(i in 1:521){
   y[i,1:2] ~ dmnorm(mu[i,], prec[ , ])
   for(j in 1:2){
      mu[i,j] <- beta[j,1]*x[i,1] + beta[j,2]*x[i,2] + beta[j,3]*x[i,3] +
                 beta[j,4]*x[i,4] + beta[j,5]*x[i,5] + beta[j,6]*x[i,6]
   }
}

## priors for elements of precision matrix
prec[1:2,1:2] ~ dwish(R[,], k)
R[1,1] <- .01; R[1,2] <- 0; R[2,1] <- 0; R[2,2] <- .01; k <- 2

## convert precision to covariance matrix
sigma[1:2,1:2] <- inverse(prec[ , ])
rho <- sigma[1,2]/sqrt(sigma[1,1]*sigma[2,2])   # correlation

## priors for regression coefficients
## (loop index m avoids a clash with the node k defined above)
for(j in 1:2){
   for(m in 1:6) {
      beta[j,m] ~ dnorm(0, 0.000001)
   }
}

Priors on precision matrix of multivariate normal

The multivariate generalisation of the Gamma (or χ²) distribution is the Wishart distribution, which arises in classical statistics as the distribution of the sum-of-squares-and-products matrix in multivariate normal sampling.

The Wishart distribution Wp(k, R) for a symmetric positive definite p × p matrix Ω has joint density function proportional to

     |R|^(k/2) |Ω|^((k−p−1)/2) exp(−(1/2) tr(RΩ))

in terms of two parameters: a real scalar k > p − 1 and a symmetric positive definite matrix R. The expectation of this distribution is

     E[Ω] = kR⁻¹

When the dimension p is 1, i.e. reverting back to the univariate case, it is easy to show that the Wishart distribution becomes the more familiar:

     W1(k, R) ≡ Gamma(k/2, R/2) ≡ (χ²_k)/R

If we use the Wishart distribution as a prior distribution for a precision matrix Ω in sampling from Np(µ, Ω⁻¹), we find, generalising the univariate case above, that we get the same form for the posterior for Ω – another Wishart distribution.

In view of the result above for the expectation of the Wishart distribution, we usually set (1/k)R to be a prior guess at the unknown true variance matrix. A common choice is to take k = p.
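Tying this back to the code above (an added observation): there p = 2, k = 2 and R = diag(0.01, 0.01), so the implied prior guess at the variance matrix of the log ratios is (1/k)R = diag(0.005, 0.005); k = p = 2 is also the smallest integer satisfying k > p − 1, which keeps the prior minimally informative.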


Note

• In BUGS language, you must specify the dimension of vectors or arrays on the left hand side of multivariate distributions

• In above example, each row (observation) of the 521 × 2 matrix y is a vector of length 2, hence y[i,1:2] ~ dmnorm.....

• Likewise, prec is a 2 × 2 matrix, hence prec[1:2, 1:2] ~ dwish.....

• You cannot specify the dimension to be a parameter (even if the value of the parameter is specified elsewhere in the code or data file), e.g.

     J <- 2
     prec[1:J, 1:J] ~ dwish(R[,], k)

  will give an error at compilation

• You do not need to specify the dimension of vectors or arrays on the right hand side of distribution statements, e.g. dimension of R[,] is not specified above (the dimension is implicit from dimension of left hand side)

Interpretation of model parameters

• Interpretation of parameter estimates on a multivariate log-odds scale is difficult

• Using the inverse alr transformation, easy to recover estimates of expected proportions or predicted counts in different categories

• Effect of covariates can be examined by calculating difference or ratio of expected proportions for different values of the covariate
  – Using MCMC, easy to obtain uncertainty intervals for such contrasts

• Example: effect of incumbency on expected proportion of votes for each party
  – For party j, calculate expected alr-transformed proportion for two values of incumbency: (1) party j’s candidate is incumbent; (2) open seat (no candidate is incumbent)
  – Hold values of all other covariates constant, e.g. at their means
  – Use inverse alr transformation to obtain expected proportions under each incumbency value, and take difference


BUGS code for calculating incumbency effects

for(j in 1:2){
   # value of mu with Conservative incumbent and average values of other variables
   mu.con[j] <- beta[j,1]*mean(x[,1]) + beta[j,2]*mean(x[,2]) +
                beta[j,3]*mean(x[,3]) + beta[j,4]*1

   # value of mu with Labour incumbent and average values of other variables
   mu.lab[j] <- beta[j,1]*mean(x[,1]) + beta[j,2]*mean(x[,2]) +
                beta[j,3]*mean(x[,3]) + beta[j,5]*1

   # value of mu with open seat and average values of other variables
   mu.open[j] <- beta[j,1]*mean(x[,1]) + beta[j,2]*mean(x[,2]) + beta[j,3]*mean(x[,3])

   # expected proportions
   exp.mu.con[j] <- exp(mu.con[j]);   p.con[j] <- exp.mu.con[j]/(1 + sum(exp.mu.con[]))
   exp.mu.lab[j] <- exp(mu.lab[j]);   p.lab[j] <- exp.mu.lab[j]/(1 + sum(exp.mu.lab[]))
   exp.mu.open[j] <- exp(mu.open[j]); p.open[j] <- exp.mu.open[j]/(1 + sum(exp.mu.open[]))
}

# difference in expected proportions due to incumbency
incumbency.con <- p.con[1] - p.open[1]
incumbency.lab <- p.lab[2] - p.open[2]

Results

                                        Posterior mean    95% CI
Expected vote (Cons), area 1            44.8%             (44.2%, 45.4%)
Expected vote (Lab), area 1             46.1%             (45.3%, 46.8%)
Expected vote (LibDem), area 1          9.1%              (8.7%, 9.5%)
incumbency advantage (Cons)             −0.06%            (−0.75%, 0.70%)
incumbency advantage (Lab)              −0.30%            (−1.6%, 1.0%)
incumbency advantage (LibDem)           8.6%              (5.1%, 12.3%)
ρ (correlation between log ratios       0.87              (0.85, 0.89)
  for Cons:LibDem and Lab:LibDem)

Note: results for incumbency advantage agree with those in Tomz et al (2002) but not with Katz and King, who analysed data from 10 consecutive elections and used empirical Bayes shrinkage priors on the β coefficients across years


Multivariate t likelihood

• Multivariate logistic normal for compositional data relies on assumption that the log ratios are approximately multivariate normal

• Katz and King (1999) argue that this assumption is not appropriate for British election data
  – majority of constituencies tend to be more clustered, and a minority more widely dispersed, than the multivariate normal implies

• K&K propose replacing multivariate normal by a heavier-tailed multivariate Student t distribution

• Multivariate t has 3 parameters: p-dimensional mean vector, p × p inverse scale (precision) matrix and a scalar degrees of freedom parameter

• A Wishart prior can be used for the inverse scale matrix

• Degrees of freedom parameter can either be fixed, or assigned a prior distribution

• Note: as degrees of freedom → ∞, t → Normal

BUGS code for multivariate t likelihood

• Only need to change 2 lines of code

  1. Likelihood:
     ## y[i,1:2] ~ dmnorm(mu[i,1:2], prec[ , ])
     y[i,1:2] ~ dmt(mu[i,1:2], prec[ , ], nu)

  2. Specify either fixed value for degrees of freedom, nu, or a suitable prior:
     ## nu <- 4
     nu ~ dunif(2, 250)


Results

                                  Normal                     Student t
Expected vote (Cons), area 1      44.8% (44.2%, 45.4%)       44.7% (44.2%, 45.2%)
Expected vote (Lab), area 1       46.1% (45.3%, 46.8%)       46.4% (45.7%, 47.0%)
Expected vote (LibDem), area 1    9.1% (8.7%, 9.5%)          8.8% (8.6%, 9.3%)
incumbency advantage (Cons)       −0.06% (−0.75%, 0.70%)     0.00% (−0.65%, 0.60%)
incumbency advantage (Lab)        −0.30% (−1.6%, 1.0%)       −0.50% (−1.7%, 0.70%)
incumbency advantage (LibDem)     8.6% (5.1%, 12.3%)         2.3% (−2.0%, 8.3%)
ρ                                 0.87 (0.85, 0.89)          0.87 (0.85, 0.90)
ν                                 –                          4.5 (3.4, 5.9)
DIC                               −480                       −1220
pD                                14.5                       11.8
