0% found this document useful (0 votes)

88 views

An Introduction To Bayesian Statistics and MCMC Methods

- The document provides an introduction to Bayesian statistics and Markov chain Monte Carlo (MCMC) methods. - It discusses the key differences between Bayesian and frequentist statistics, including the use of prior distributions to represent uncertainty about unknown parameters in Bayesian models. - The document gives examples of how Bayes' theorem is used to update prior distributions into posterior distributions based on observed data, including diagnostic testing and modeling means of normal distributions.

Uploaded by

Issaka MAIGA

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

88 views

An Introduction To Bayesian Statistics and MCMC Methods

Uploaded by

Issaka MAIGA

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 69

An

Introduction to MCMC
methods and Bayesian Statistics
What will we cover in this first session?

• What is Bayesian Statistics? (as opposed to classical or frequentist
statistics)
• What is MCMC estimation?
• MCMC algorithms and Gibbs Sampling
• MCMC diagnostics
• MCMC Model comparisons
WHAT IS BAYESIAN STATISTICS?
Why do we need to know about Bayesian statistics?

• The rest of this workshop is primarily about MCMC methods which
are a family of estimation methods used for fitting realistically complex
models.
• MCMC methods are generally used on Bayesian models which have
subtle differences to more standard models.
• As most statistical courses are still taught using classical or
frequentist methods we need to describe the differences before going
on to consider MCMC methods.
Bayes Theorem

Bayesian statistics named after Rev. Thomas Bayes (1702‐1761)
Bayes Theorem for probability events A and B
p ( B | A) p ( A)
p( A | B) 
p( B)
Or for a set of mutually exclusive and exhaustive events (i.e.
p ( Ai )   p ( Ai )  1 ) :
i i

p ( B | Ai ) p ( Ai )
p ( Ai | B) 
 j p( B | A j ) P( A j )
Example – coin tossing

Let A be the event of 2 Heads in three tosses of a fair coin. B be
the event of 1st coin is a Head.
Three coins have 8 equally probable patterns
{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
A = {HHT,HTH,THH} →p(A)=3/8
B = {HHH,HTH,HTH,HTT} →p(B)=1/2
A|B = {HHT,HTH}|{HHH,HTH,HTH,HTT} →p(A|B)=1/2
B|A = {HHT,HTH}|{HHT,HTH,THH} →p(B|A)=2/3
P(A|B) = P(B|A)P(A)/P(B) = (2/3*3/8)/(1/2) = 1/2
Example 2 – Diagnostic testing

A new HIV test is claimed to have “95% sensitivity and 98%
specificity”
In a population with an HIV prevalence of 1/1000, what is the
chance that a patient testing positive actually has HIV?
Let A be the event patient is truly positive, A’ be the event that
they are truly negative
Let B be the event that they test positive
Diagnostic Testing continued:

Thus over 95% of those testing positive will, in fact, not have HIV.
Being Bayesian!

So the vital issue in this example is how should this test result
change our prior belief that the patient is HIV positive?
The disease prevalence (p=0.001) can be thought of as a ‘prior’
probability.
Observing a positive result causes us to modify this probability to
p=0.045 which is our ‘posterior’ probability that the patient is
HIV positive.
This use of Bayes theorem applied to observables is
uncontroversial however its use in general statistical analyses
where parameters are unknown quantities is more
controversial.
Bayesian Inference

In Bayesian inference there is a fundamental distinction between
• Observable quantities x, i.e. the data
• Unknown quantities θ
θ can be statistical parameters, missing data, latent variables…
• Parameters are treated as random variables
In the Bayesian framework we make probability statements
about model parameters
In the frequentist framework, parameters are fixed non‐random
quantities and the probability statements concern the data.
Prior distributions

As with all statistical analyses we start by positing a model which
specifies p(x| θ)
This is the likelihood which relates all variables into a ‘full
probability model’
However from a Bayesian point of view :
•  is unknown so should have a probability distribution
reflecting our uncertainty about it before seeing the data
• Therefore we specify a prior distribution p(θ)
Note this is like the prevalence in the example
Posterior Distributions

Also x is known so should be conditioned on and here we use Bayes theorem
to obtain the conditional distribution for unobserved quantities given the
data which is known as the posterior distribution.

p ( ) p ( x |  )
p ( | x)   p ( ) p ( x |  )
 p( ) p( x |  )d
The prior distribution expresses our uncertainty about  before seeing the
data.
The posterior distribution expresses our uncertainty about  after seeing the
data.
Examples of Bayesian Inference
using the Normal distribution

Known variance, unknown mean
It is easier to consider first a model with 1 unknown parameter.
Suppose we have a sample of Normal data:
xi ~ N (  ,  2 ), i  1,..., n.

Let us assume we know the variance, 2 and we assume a prior
distribution for the mean,  based on our prior beliefs:
 ~ N (  0 ,  02 )

Now we wish to construct the posterior distribution p(|x).
Posterior for Normal distribution mean
So we have
2  12
p(  )  (2 ) exp( 12 (    0 ) 2 /  02 )
0
2  12
p( xi |  )  (2 ) exp( 12 ( xi   ) 2 /  2 )
and hence
p(  | x)  p(  ) p( x |  )
2  12
 (2 ) exp( 12 (    0 ) 2 /  02 ) 
0
N

 (
i 1
2 ) 2  12
exp(  1
2 ( xi   ) 2
/  2
)

 exp( 12  2 (1 /  02  n /  2 )   (  0 /  02   xi / 2 )  cons)
i
Posterior for Normal distribution mean (continued)

For a Normal distribution with response y with mean  and

variance  we have
 12
f ( y )  (2 ) exp{ 12 ( y   ) 2 / }
 exp{ y   y /   cons}
1
2
2 1

We can equate this to our posterior as follows:

 exp(  12  2 (1 /  02  n /  2 )   (  0 /  02   x i / 2 )  cons )
i

   (1 /  02  n /  2 ) 1 and    (  0 /  02   x i / 2 )
i
Precisions and means
In Bayesian statistics the precision = 1/variance is often more
important than the variance.
For the Normal model we have

1 /   (1 /  02  n /  2 ) and    (  0 /  02  x /( 2 / n))

In other words the posterior precision = sum of prior

precision and data precision, and the posterior mean
is a (precision weighted) average of the prior mean
and data mean.
Large sample properties

As n  
Posterior precision
1 /   (1 /  02  n /  2 )  n /  2
So posterior variance   2
/n
Posterior mean    (  0 /  02  x /( 2 / n))  x
And so posterior distribution
p (  | x )  N ( x ,  2 / n)
p ( x |  )  N (  ,  2 / n)
Compared to
in the frequentist setting
Girls Heights Example

10 girls aged 18 had both their heights and weights measured.
Their heights (in cm) were as follows:
169.6,166.8,157.1,181.1,158.4,165.6,166.7,156.5,168.1,165.3
We will assume the population variance is known to be 50.
Two individuals gave the following prior distributions for the
mean height:
Individual 1 p1 (  ) ~ N (165,2 2 )
Individual 2 p (  ) ~ N (170,32 )
2
Constructing posterior 1

To construct the posterior we use the formulae we have just
calculated

From the prior, 0  165 ,  0 4
2

From the data, x  165.52,  2  50, n  10

The posterior is therefore
p (  | x) ~ N (1 , 1 )
where 1  (  )  2.222,
1
4
10 1
50

1  1 ( 165
4  50 )  165.23.
1655.2
Prior and posterior comparison
Constructing posterior 2

• Again to construct the posterior we use the earlier formulae
we have just calculated
• From the prior,  0  170,  02  9
• From the data, x  165.52,  2  50, n  10
• The posterior is therefore
p (  | x) ~ N ( 2 , 2 )
where 2  (  )  3.214,
1
9
10 1
50

 2  2 ( 170
9  50 )  167.12.
1655.2
Prior 2 comparison
Note this prior is not as close to the data as prior 1 and hence
posterior is somewhere between prior and likelihood.
Other conjugate examples

When the posterior is in the same family as the prior we have
conjugacy. Examples include:

Likelihood Parameter Prior Posterior

Normal Mean Normal Normal

Normal Precision Gamma Gamma

Binomial Probability Beta Beta

Poisson Mean Gamma Gamma

In all cases
The posterior mean is a compromise between the prior mean
and the MLE
The posterior s.d. is less than both the prior s.d. and the s.e.
(MLE)
‘A Bayesian is one who, vaguely expecting a horse and catching a
glimpse of a donkey, strongly concludes he has seen a mule’
(Senn)
As n  
The posterior mean  the MLE
The posterior s.d.  the s.e. (MLE)
The posterior does not depend on the prior.
Non‐informative priors

We often do not have any prior information, although true
Bayesian’s would argue we always have some prior
information!
We would hope to have good agreement between the
frequentist approach and the Bayesian approach with a non‐
informative prior.
Diffuse or flat priors are often better terms to use as no prior is
strictly non‐informative!
For our example of an unknown mean, candidate priors are a
Uniform distribution over a large range or a Normal
distribution with a huge variance.
Point and Interval Estimation

In Bayesian inference the outcome of interest for a parameter is
its full posterior distribution however we may be interested in
summaries of this distribution.
A simple point estimate would be the mean of the posterior.
(although the median and mode are alternatives.)
Interval estimates are also easy to obtain from the posterior
distribution and are given several names, for example credible
intervals, Bayesian confidence intervals and Highest density
regions (HDR). All of these refer to the same quantity.
Credible Intervals

If we consider the heights example with our first prior then our
posterior is
P(μ|x)~ N(165.23,2.222),
and a 95% credible interval for μ is
165.23±1.96×sqrt(2.222) =
(162.31,168.15).
Similarly prior 2 results in a 95% credible interval for μ is
(163.61,170.63).
Note that credible intervals can be interpreted in the more
natural way that there is a probability of 0.95 that the interval
contains μ rather than the frequentist conclusion that 95% of
such intervals contain μ.
MCMC METHODS
How does one fit models in a Bayesian framework?

In the first section we illustrated a use of conjugate priors to evaluate a
posterior distribution for a model with one unknown parameter.
Let us now consider a simple linear regression:
weighti   0  1heighti  ei
ei ~ N (0,  2 )
With conjugate priors: So our goal now is to make
inferences on the joint
 0 ~ N (0, m0 ), 1 ~ N (0, m1 ), posterior distribution:

 2 ~  1 ( ,  ) p (  0 , 1 ,  2 | y )
where m0  m1  10 6 ,   10 3
MCMC Methods

Goal: To sample from joint posterior distribution:
p (  0 , 1 ,  2 | y )
Problem: For complex models this involves multidimensional
integration
Solution: It may be possible to sample from conditional posterior
distributions,
p (  0 | y, 1 ,  2 ), p( 1 | y,  0 ,  2 ), p( 2 | y,  0 , 1 )

It can be shown that after convergence such a sampling
approach generates dependent samples from the joint
posterior distribution.
Gibbs Sampling

When we can sample directly from the conditional posterior
distributions then such an algorithm is known as Gibbs
Sampling.
This proceeds as follows for the linear regression example:
Firstly give all unknown parameters starting values,
 0 (0), 1 (0),  2 (0).
Next loop through the following steps:
Gibbs Sampling ctd.
Sample from
p (  0 | y, 1 (0),  (0)) to generate  0 (1) and then from
2

p ( 1 | y,  0 (1),  2 (0)) to generate 1 (1) and then from

p ( 2 | y,  0 (1), 1 (1)) to generate  2 (1).

These steps are then repeated with the generated

values from this loop replacing the starting values.
The chain of values produced by this procedure is
known as a Markov chain, and it is hoped that this
chain converges to its equilibrium distribution which
is the joint posterior distribution.
Calculating the conditional distributions

In order for the algorithm to work we need to sample from the
conditional posterior distributions.
If these distributions have standard forms then it is easy to draw
random samples from them.
Mathematically we write down the full posterior and assume all
parameters are constants apart from the parameter of
interest.
We then try to match the resulting formulae to a standard
distribution.
The next 4 slides will probably be skipped but are in the talk for
reference purposes
Note the new STAT‐JR software gives these derivations!
Matching distributional forms

If a parameter θ follows a Normal(μ,σ2) distribution then we can
write
p ( )  exp(a  b  const )
2

1 
where a   2 and b  2
2 
Similarly if θ follows a Gamma(α,β) distribution then we can
write
p ( )   a exp(b )
where a    1 and b   
Step 1: β0
p (  0 | y, 1 ,  2 )  p (  0 ) p( y |  0 , 1 ,  2 )
1  02  1 1 
 exp( )  exp( 2 ( yi   0  xi 1 ) 2 )
m0 2m0 i   2 2 
 1 N 1 
 exp ( 12 (  2 ))  02  2
m0  
i i i 1 0
( y  x  )  const 
 
Matching powers gives
1
1 1 N 1 N
  2   20    2 
 2 0
m0   m0  
1
1 N 1
2 
  0   20 b    2  ( yi  xi 1 )
 m0    i
1 2 
as m0  ,  0 ~ N   ( yi  xi 1 ), 
N i N 
Step 2: β1
p ( 1 | y,  0 ,  2 )  p ( 1 ) p ( y |  0 , 1 ,  2 )
1 12  1 1 
 exp( )  exp( 2 ( yi   0  xi 1 ) 2 )
m1 2m1 i   2 2 
 1  x 2
1 
 exp ( 12 (  i 2 ))  02  2 i ( yi  0 ) xi 1  const 
i

 m1   
Matching powers gives
1
1 i x  1 i xi2 
2
1
     
i
2 
2
1
 2 1
m1 2  1
m  
1
 1  x 2
 1
 1   1 b   
2


i i
2   (x ( y   0 ))
 
2 i i
 1
m i

 i xi yi   0 i xi  2 
as m1  , 1 ~ N  , 

 i xi 2
i xi 
2
Step 3: 1/σ2

p (1 /  2 | y,  0 , 1 )  p (1 /  2 ) p ( y |  0 , 1 ,  2 )
 1
 1      1 1 2 
 2  exp  2   exp( 2 ( yi   0  xi 1 ) )
     i   2 2 
N
 1
 1   1 2
( 2  ( yi  0 xi 1 ) 
2
 2  exp    1
 
   
2
i 
Matching terms gives
p (1 /  2 | y,  0 , 1 ) ~ (a, b)
where a    N2 , b    12 i ei2
Algorithm Summary

Repeat the following three steps
1. Generate β0 from its Normal conditional distribution.
2. Generate β1 from its Normal conditional distribution.
3. Generate 1/σ2 from its Gamma conditional distribution
Convergence and burn‐in
Two questions that immediately spring to mind are:
1. We start from arbitrary starting values so when can we
safely say that our samples are from the correct
distribution?
2. After this point how long should we run the chain for and
store values?
MCMC DIAGNOSTICS
Checking Convergence
This is the researchers responsibility!
Convergence is to a target distribution (the required posterior),
not to a single value as in ML methods.
Once convergence has been reached, samples should look like a
random scatter about a stable mean value.
beta[1]

9.0
8.5
8.0
7.5
7.0

1 250 500 750 1000

iteration

Convergence occurs here at around 100

iterations.
How many iterations after convergence?

After convergence, further iterations are needed to
obtain samples for posterior inference.
More iterations = more accurate posterior estimates.
MCMC chains are dependent samples and so the
dependence or autocorrelation in the chain will
influence how many iterations we need.
Accuracy of the posterior estimates can be assessed by
the Monte Carlo standard error (MCSE) for each
parameter.
MCMC diagnostics in MLwiN ‐ Example

We will describe each pane separately – Note Stat-JR has similar six way plots!
Trace plot

This graph plots the generated values of the parameter

against the iteration number.

A crude test of mixing is the ‘blue finger’ test.

This chain doesn’t mix that well but could be worse!

Kernel Density plot

This plot is like a smoothed histogram.

Instead of counting the estimates into bins of particular widths like a
histogram, the effect of each iteration is spread around the estimate via a
Kernel function e.g. a normal distribution.

This means that at each point we get the sum of the Kernel function parts for
each iteration.

The Kernel density plot has a smoothness parameter that can be modified.
Time series diagnostics

Here we have the Auto correlation function (ACF) and partial autocorrelation
function (PACF) plots.
The ACF measures how correlated the values in the chain are with their close
neighbours. The lag is the distance between the two chains to be compared.
An independent chain will have approximately zero autocorrelation at each lag.
A Markov chain should have a power relationship in the lags i.e. if ACF(1) = 
then ACF(2) = 2 etc. This is known as an AR(1) process.
The PACF measures discrepancies from such a process and so should normally
have values 0 after lag 1.
Accuracy Diagnostics

MLwiN has 2 accuracy diagnostics:
• Raftery‐Lewis works on quantiles of distribution (Not given in
Stat‐JR).
• Brooks‐Draper works on quoting the mean to n significant
figures. It’s formulae uses the estimate, it’s s.d. and the ACF
and it can often give very small or very large values! Available
from SummaryStats in Stat‐JR.
Summary Statistics

Three estimates of location are given:
• Mean – from the chain.
• Mode – from the kernel plot.
• Median (50% quantile) – by sorting the chain and finding the middle value.
The SD is calculated from the chain and the other quantiles are used to give
(possibly) non‐symmetric interval estimates.
The MCSE and ESS will be discussed next.
Monte Carlo Standard Error

The Monte Carlo Standard Error (MCSE) is an indication of how much error is
in the estimate due to the fact that MCMC is used.
As the number of iterations increases the MCSE0.
For an independent sampler it equals the SD/n.
However it is adjusted due to the autocorrelation in the chain.
The graph above gives estimates for the MCSE for longer runs.
Effective Sample Size

This quantity gives an estimate of the equivalent number of
independent iterations that the chain represents.
This is related to the ACF and the MCSE.

Its formula is:
n /  where   1  2  ( k ).
k 1

For this parameter our 5,000 actual iterations are equivalent to
only 344 independent iterations!
Inference using posterior samples from MCMC runs

A powerful feature of MCMC and the Bayesian approach is that
all inference is based on the joint posterior distribution.
We can therefore address a wide range of substantive questions
by appropriate summaries of the posterior.
Typically report either the mean or median of the posterior
samples for each parameter of interest as a point estimate
2.5% and 97.5% percentiles of the posterior sample for each
parameter give a 95% posterior credible interval (interval
within which the parameter lies with probability 0.95)
Derived Quantities

Once we have a sample from the posterior we can answer lots of
questions simply by investigating this sample.
Examples:
What is the probability that θ>0?
What is the probability that θ1> θ2?
What is a 95% interval for θ1/(θ1+ θ2)?
MODEL COMPARISON
Model Comparison in MCMC

In frequentist statistics there are many options including:
• Likelihood ratio (deviance) tests
• Wald Tests
• Information Criterion – e.g. AIC/BIC
Here we look at a criterion that can be used with MCMC and
which for a linear regression model is equivalent to the AIC –
the Deviance information criterion (DIC).
DIC

A natural way to compare models is to use a criterion based on a trade‐off
between the fit of the data to the model and the corresponding
complexity of the model.
DIC does this in a Bayesian way.
DIC = ‘goodness of fit’ + ‘complexity’.
Fit is measured by deviance D( )  2 log L(data |  )

Complexity is measured by an estimate of the ‘effective number of
parameters’ defined as
pD  E | y [ D]  D( E | y [ ])
 D  D ( )
i.e. Posterior mean deviance minus the deviance evaluated at the posterior
mean of the parameters.
DIC (continued)

The DIC is then defined analagously to AIC as

DIC  D( )  2 pD
 D  pD
Models with smaller DIC are better supported by the data.
• DIC is available in Stat‐JR in the ModelResults object.
• DIC can be monitored in other packages such as MLwiN under
the Model/MCMC menu and WinBUGS from (Inference/DIC
menu).
Deviance Information Criterion
• Diagnostic for model comparison
• Goodness of fit criterion that is penalized for model complexity
• Generalization of the Akaike Information Criterion (AIC; where df is known)
• Used for comparing non‐nested models (eg same number but different variables)
• Valuable in MLwiN for testing improved goodness of fit of non‐linear model (eg Logit)
because Likelihood (and hence Deviance is incorrect)
• Estimated by MCMC sampling; on output get

Bayesian Deviance Information Criterion (DIC)
Dbar D(thetabar) pD DIC
9763.54 9760.51 3.02 9766.56

Dbar: the average deviance from the complete set of iterations
D(thetaBar): the deviance at the expected value of the unknown parameters
pD: the Estimated degrees of freedom consumed in the fit, ie Dbar‐
D(thetaBar)
DIC: Fit + Complexity; Dbar + pD
NB lower values = better parsimonious model
• Somewhat controversial!
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of
the Royal Statistical Society, Series B 64: 583‐640.
Some guidance on DIC

• any decrease in DIC suggests a better model

• But stochastic nature of MCMC; so, with small difference in DIC you
should confirm if this is a real difference by checking the results with
different seeds and/or starting values.
• More experience with AIC, and common rules of thumb………
Example: House price dataset
Sequence of Models (in practical)
• Model 0 Null one‐level model; two parameters a fixed mean and
a variance
• Model 1b: Null model but districts as 50 fixed effects (49 dummies
& constant); separate estimation of each intercept
• Model 1: Null two‐level random intercepts model: differential
intercepts coming from a distribution
• Model 2b: with size‐5 in model, but districts as 50 fixed effects;
separate estimation of each intercept
• Model 2: Same basic model but with districts as random effects;
differential intercepts coming from a distribution
• Model 3b: Differential district slopes and intercepts as 100 fixed
effects; separate estimation of each intercept and slope
for each district.
• Model 3: Random slopes model; quadratic variance function at
level 2; differential intercepts and slopes coming from a
joint multivariate distribution
DIC for Sequence of Models
Model Nominal DF Estimated DF DIC
0: single level 2 2.00 10728.31
1b: 50 district effects 51 50.95 10504.88
1: 2 level Random 51 44.43 10498.74
intercepts
2b 1b+ Size 52 51.93 9874.45
2 : 1 + Size 52 45.27 9868.08
3b: 2b + 49 slopes 101 101.16 9843.51
3: + Random Slopes 101 65.24 9807.47

Model 1b: constant + 49 differential intercepts + level 1 variance = 51

effective parameters.
Model 1: districts as random effects; only 44 parameters as these effects are
estimated as coming from an overall distribution. The nominal 51 parameters
are shrunk due to sharing a prior distribution and therefore do not contribute
whole parameters to the parameter count
Model 3: random intercepts & slopes: Lowest DIC; most parsimonious
parameterization: most strongly favoured by this approach
SUMMARY & COMPARISON WITH
FREQUENTIST APPROACH
Markov chain Monte Carlo (MCMC)

• MCMC methods are Bayesian estimation techniques
which can be used to estimate multilevel models
• MCMC works by drawing a random sample of values
for each parameter from its probability distribution
• The mean and standard deviation of each random
sample gives the point estimate and standard error
for that parameter
Estimating a model using MCMC estimation
• We start by specifying the model and our prior knowledge for
each parameter (nearly always no knowledge!)
• Next we specify initial values for the model parameters
(nearly always the IGLS estimates)
• We then run the MCMC algorithm and obtain the parameter
chains
• We discard the initial burn‐in iterations when the chains are
settling down (converging to their posterior distributions)
• Summary statistics for the remaining monitoring iterations
are then calculated:
– Point estimates and standard errors are given by the
means and standard deviations of the chains
IGLS vs. MCMC (1)

IGLS MCMC
Fast Slow
Uses MQL/PQL approximations to fit Produces unbiased estimates
discrete response models, which can
sometimes produce biased estimates
Cannot incorporate prior information Can incorporate prior information

• Note that in practice we often do not incorporate prior
information
• We want to protect our inferences from being influenced by
our prior beliefs
– True Bayesians have a very different take
IGLS vs. MCMC (2)

IGLS MCMC
Confidence intervals based on Normality not assumed
normality are unreasonable for
variance parameters
Hard to calculate confidence intervals Easy to calculate confidence intervals
for functions of parameters for arbitrarily complex functions of
parameters
Difficult to extend to new models Easy to extend
Model convergence is judged for you You have to judge model convergence
for yourself
IGLS vs. MCMC (3)
IGLS algorithm converges
deterministically to a point
Convergence is therefore
judge for you

MCMC algorithm converges

stochastically to the
equilibrium probability
distribution

You have to judge

convergence for yourself
Priors

• Our prior knowledge for each parameter is
summarised by a probability distribution referred to
as the prior distribution
– Typically, we specify that we have no prior
knowledge as we like the ‘data to speak for it self’
– We therefore specify vague, diffuse or
uninformative priors
1 ~ N (0,10000)  U (, )
MCMC samplers

• At the tth iteration we want to sample from the posterior
distribution of each parameter in turn
– If we can write down an analytical expression for the
posterior distribution then we can use Gibbs sampling
• Computationally efficient algorithm
• Continuous response models
– If we can’t write down an analytical expression for the
posterior then we use Metropolis‐Hastings sampling
• Discrete response models (see later)
Deviance information criterion (DIC) for model
comparison

• DIC can be viewed as an AIC or BIC statistic for

MCMC
• DIC balances goodness of fit and model complexity
(i.e. deviance and number of parameters)
• Want to maximise fit and minimise complexity
– Lower deviance and fewer parameters
• So “better” models have smaller DIC
• Note that the DIC does not have universal approval!
When to consider using
MCMC Estimation

• Discrete response data (categorical, counts). PQL often gives
quick and accurate estimates but a good idea to check against
MCMC, especially if you have small clusters – see later
• If you want to obtain accurate confidence intervals for level 2
variances
• Some complex models only estimated using MCMC
(e.g. multilevel factor analysis)
• Some models can be fitted more easily using MCMC
(e.g. cross‐classified and multiple membership models)

Effective Organisational Communication
100% (2)
Effective Organisational Communication
383 pages
Moment Generating Functions
No ratings yet
Moment Generating Functions
7 pages
Course Notes STATS 325 Stochastic Processes: Department of Statistics University of Auckland
No ratings yet
Course Notes STATS 325 Stochastic Processes: Department of Statistics University of Auckland
195 pages
Lecture 4
No ratings yet
Lecture 4
161 pages
Statistics Lecture Note Asymptotic Tools
No ratings yet
Statistics Lecture Note Asymptotic Tools
216 pages
215 Final Exam Formula Sheet
No ratings yet
215 Final Exam Formula Sheet
2 pages
Analysis of Anisoles GCMS
No ratings yet
Analysis of Anisoles GCMS
3 pages
SMBI Ch1 - Introduction To Bayesian Statistics
No ratings yet
SMBI Ch1 - Introduction To Bayesian Statistics
92 pages
Bayesian Statistics: A User's Perspective
No ratings yet
Bayesian Statistics: A User's Perspective
24 pages
Lectures on the Coupling Method
From Everand
Lectures on the Coupling Method
Torgny Lindvall
No ratings yet
Bayesian Lecture Notes
No ratings yet
Bayesian Lecture Notes
28 pages
Likelihood Ratio Tests: Instructor: Songfeng Zheng
No ratings yet
Likelihood Ratio Tests: Instructor: Songfeng Zheng
9 pages
Basis Representation Theorem
No ratings yet
Basis Representation Theorem
25 pages
App.A - Detection and Estimation in Additive Gaussian Noise PDF
No ratings yet
App.A - Detection and Estimation in Additive Gaussian Noise PDF
55 pages
STAT 650 - Foundations of Data Science Syllabus
No ratings yet
STAT 650 - Foundations of Data Science Syllabus
13 pages
Bayes' Theorem: Probability Theory Statistics
No ratings yet
Bayes' Theorem: Probability Theory Statistics
9 pages
Malaria Disease Prediction and Grading System: A Performance Model of Multinomial Naïve Bayes (MNB) Machine Learning in Nigerian Hospitals
No ratings yet
Malaria Disease Prediction and Grading System: A Performance Model of Multinomial Naïve Bayes (MNB) Machine Learning in Nigerian Hospitals
14 pages
Uniformly Most Powerful (UMP) Test
No ratings yet
Uniformly Most Powerful (UMP) Test
2 pages
Stochastic Process
No ratings yet
Stochastic Process
43 pages
3 Basics of Probability
No ratings yet
3 Basics of Probability
84 pages
1MATH - MW - Unit 4.1 (Introductory Topics in Statistics)
100% (1)
1MATH - MW - Unit 4.1 (Introductory Topics in Statistics)
30 pages
SPSS2 Workshop Handout 20200917
No ratings yet
SPSS2 Workshop Handout 20200917
17 pages
Theory of Estimation Ama 4306
No ratings yet
Theory of Estimation Ama 4306
4 pages
Sampling Distributions Coursera
No ratings yet
Sampling Distributions Coursera
8 pages
R Notes For Data Analysis and Statistical Inference
No ratings yet
R Notes For Data Analysis and Statistical Inference
10 pages
Regression Modeling Strategies: Frank E. Harrell, JR
50% (2)
Regression Modeling Strategies: Frank E. Harrell, JR
11 pages
Probability - Statistics and Random Processes by Veerarajan
46% (24)
Probability - Statistics and Random Processes by Veerarajan
14 pages
New Multivariate Time-Series Estimators in Stata 11
100% (1)
New Multivariate Time-Series Estimators in Stata 11
34 pages
151 Practice Final 1
100% (1)
151 Practice Final 1
11 pages
Parameter Searching For Epidemiology Models Using Bayesian Optimization
No ratings yet
Parameter Searching For Epidemiology Models Using Bayesian Optimization
14 pages
The Normal Distribution and Areas Under The Normal Curve
No ratings yet
The Normal Distribution and Areas Under The Normal Curve
48 pages
Mathematical Induction Exercise PDF
No ratings yet
Mathematical Induction Exercise PDF
4 pages
Ed - MathMajor12 Elementary Statistics and PRobability
No ratings yet
Ed - MathMajor12 Elementary Statistics and PRobability
3 pages
Markov Chains - Lectures - CMC - 2024
No ratings yet
Markov Chains - Lectures - CMC - 2024
168 pages
Ch7. Hypothesis Testing
100% (1)
Ch7. Hypothesis Testing
86 pages
Hypothesis Testing - 2 Populations
100% (1)
Hypothesis Testing - 2 Populations
26 pages
Difference Between Logit and Probit Models
100% (1)
Difference Between Logit and Probit Models
7 pages
Random Variables
No ratings yet
Random Variables
57 pages
CT4 Q&A Bank Part 1 Questions
No ratings yet
CT4 Q&A Bank Part 1 Questions
12 pages
Bayes' Theorem and Its Applications
0% (1)
Bayes' Theorem and Its Applications
11 pages
Chapter10 Sampling Two Stage Sampling
No ratings yet
Chapter10 Sampling Two Stage Sampling
21 pages
Geometric Distribution Report
No ratings yet
Geometric Distribution Report
5 pages
Tests of Significance and Measures of Association
No ratings yet
Tests of Significance and Measures of Association
21 pages
Introduction To Probability 1
No ratings yet
Introduction To Probability 1
71 pages
Estimation and Hypothesis
100% (1)
Estimation and Hypothesis
32 pages
Notes For Multivariate Statistics With R
No ratings yet
Notes For Multivariate Statistics With R
189 pages
Exponential Distribution
No ratings yet
Exponential Distribution
19 pages
SPS 2320 Theory of Estimation Year 3 Semester II
No ratings yet
SPS 2320 Theory of Estimation Year 3 Semester II
2 pages
Beta Distribution
No ratings yet
Beta Distribution
9 pages
EC2303 Final Formula Sheet PDF
No ratings yet
EC2303 Final Formula Sheet PDF
8 pages
Bio Statistics
No ratings yet
Bio Statistics
174 pages
Bayesian Statistics Primer PDF
No ratings yet
Bayesian Statistics Primer PDF
23 pages
Markov Chain
No ratings yet
Markov Chain
16 pages
Introduction To Statistics: Prepared By: Engr. Gilbey'S Jhon - Ladion Instructor
No ratings yet
Introduction To Statistics: Prepared By: Engr. Gilbey'S Jhon - Ladion Instructor
25 pages
Quartiles, Deciles, Percentiles
100% (1)
Quartiles, Deciles, Percentiles
5 pages
Central Limit Theorem
No ratings yet
Central Limit Theorem
3 pages
Statistics 131 Worksheet 10: Let X, · · ·, X ∼ U (0, θ), θ > 0. Find unbiased estimators of θ
No ratings yet
Statistics 131 Worksheet 10: Let X, · · ·, X ∼ U (0, θ), θ > 0. Find unbiased estimators of θ
2 pages
Lab06 Confidence Intervals
No ratings yet
Lab06 Confidence Intervals
4 pages
Chapter 4: Probability Distributions: 4.1 Random Variables
100% (1)
Chapter 4: Probability Distributions: 4.1 Random Variables
53 pages
Survival Analysis
From Everand
Survival Analysis
Rupert G. Miller, Jr.
No ratings yet
An Introduction to Metric Spaces and Fixed Point Theory
From Everand
An Introduction to Metric Spaces and Fixed Point Theory
Mohamed A. Khamsi
No ratings yet
Stochastic Differential Equations: An Introduction with Applications in Population Dynamics Modeling
From Everand
Stochastic Differential Equations: An Introduction with Applications in Population Dynamics Modeling
Michael J. Panik
No ratings yet
Untitled
No ratings yet
Untitled
5 pages
Rainfall Prediction System: (Peer-Reviewed, Open Access, Fully Refereed International Journal)
No ratings yet
Rainfall Prediction System: (Peer-Reviewed, Open Access, Fully Refereed International Journal)
7 pages
Top 10 Open Source Data Mining Tools: A Brief Look at Mining Tasks
No ratings yet
Top 10 Open Source Data Mining Tools: A Brief Look at Mining Tasks
2 pages
DISS Q1 M2 Social Science As Scientific Study of Society
100% (1)
DISS Q1 M2 Social Science As Scientific Study of Society
23 pages
Self-Control and Forgiveness: A Meta-Analytic Review
No ratings yet
Self-Control and Forgiveness: A Meta-Analytic Review
8 pages
Behavioral Statistics in Action
No ratings yet
Behavioral Statistics in Action
9 pages
SAFMEDS Card List (To Be Shared With All Students) Term or Phrase Full Definition
No ratings yet
SAFMEDS Card List (To Be Shared With All Students) Term or Phrase Full Definition
2 pages
Lecture 2 & 3: Simple Linear Regression: Gumilang Aryo Sahadewo
No ratings yet
Lecture 2 & 3: Simple Linear Regression: Gumilang Aryo Sahadewo
55 pages
Adoption and Outcomes of ISO 14001 - A Systematic Review VER 22.01 20
No ratings yet
Adoption and Outcomes of ISO 14001 - A Systematic Review VER 22.01 20
23 pages
Final Exam January 2019 Ines Barkia PDF
No ratings yet
Final Exam January 2019 Ines Barkia PDF
10 pages
PDF Design and Analysis of Experiments Classical and Regression Approaches With SAS 1st Edition Leonard C. Onyiah Download
100% (6)
PDF Design and Analysis of Experiments Classical and Regression Approaches With SAS 1st Edition Leonard C. Onyiah Download
84 pages
The Scientific Process - The Steps
No ratings yet
The Scientific Process - The Steps
2 pages
Larvicidal Effect of Ampalaya (Momordica Charantia) - 3
No ratings yet
Larvicidal Effect of Ampalaya (Momordica Charantia) - 3
1 page
Cotm 2
No ratings yet
Cotm 2
77 pages
Ch1-Introduction WG1AR5 SOD Ch01 All Final
No ratings yet
Ch1-Introduction WG1AR5 SOD Ch01 All Final
55 pages
7.the Marketing Research Process
No ratings yet
7.the Marketing Research Process
3 pages
Science Key Objectives
No ratings yet
Science Key Objectives
8 pages
Gdwq4 With Add1 Annex4
No ratings yet
Gdwq4 With Add1 Annex4
9 pages
Unit 5-The Research Design
No ratings yet
Unit 5-The Research Design
41 pages
Set 3
No ratings yet
Set 3
3 pages
Chapter 10-Statistical Inference For Two Samples
No ratings yet
Chapter 10-Statistical Inference For Two Samples
30 pages
To test = σ = σ = · · · = σ = At least one pair of σ different
No ratings yet
To test = σ = σ = · · · = σ = At least one pair of σ different
12 pages
Statistics and Probability
No ratings yet
Statistics and Probability
5 pages
LAB 6 - Experimental Design - 231016 - 232108
No ratings yet
LAB 6 - Experimental Design - 231016 - 232108
2 pages
Introduction To Statistics
100% (1)
Introduction To Statistics
14 pages
Interview and Observation 1
No ratings yet
Interview and Observation 1
29 pages
MKT Potential and Sales Forecasting
No ratings yet
MKT Potential and Sales Forecasting
15 pages
Physical Violence
No ratings yet
Physical Violence
4 pages