6 Monte Carlo Simulation: Exact Solution
To begin with, the term 'simulation model' is slightly obscure or even misleading. Simulation itself is not a
model; it is a technique for computing approximations from a particular model. In
our case, the model is some probability density, which can be multidimensional, and the purpose
here is posterior simulation. Simulation is naturally computer intensive: someone has to
program an algorithm for doing it, and this can be more laborious than the specification of the model
to be simulated. Perhaps for that reason, attention is (too) easily diverted from the discussion of the
model to the discussion of the algorithm, as in 'simulation model' or 'computer model'. But these
are different things. There can be many different algorithms for simulating the same model.
The model is intrinsically related to the scientific question, whereas simulation is a tool for computing
something interesting from the model.
In Bayesian inference, simulation is an essential tool because it allows us to use an infinite variety of
models instead of only the simple ones that can be solved analytically using conjugate priors.
[Figure 1: Model and its computation (panels: exact solution, simulation, asymptotic solution).]
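For a circle of radius L/2 inscribed in a square of side L, the ratio of the areas is
$$q = \frac{\pi (L/2)^2}{L^2} = \frac{\pi}{4}.$$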
Imagine then that we play darts and our target is the square, so that the darts fall uniformly at
random all over the square. The probability of hitting within the circle is then $q = \pi/4$. After $n$ darts,
we count how many times we hit the circle. The proportion of hits, $q_n$, is an approximation of the
exact number $\pi/4$. Therefore, by the law of large numbers,
$$\lim_{n \to \infty} q_n = q,$$
and we can approximate $\pi \approx 4 q_n$. This shows the essential principle of Monte Carlo simulation.
The more darts we simulate, the more accurate our approximation is. There are other ways to
compute approximations of $\pi$, and they can be much more efficient. Monte Carlo simulation is a tool
that is usually used when no other tool can help, or when we are too lazy to think of alternatives and
computers are conveniently available...
In this example, the model was a 2-dimensional uniform density of variables $(X, Y)$ over the square
$[-L/2, L/2] \times [-L/2, L/2]$, and the probability we approximated was
$$P(X^2 + Y^2 < r^2), \quad r = L/2.$$
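In R, the simulation could be done as follows. The code uses the unit square $[0, 1] \times [0, 1]$ and the
quarter circle of radius 1, for which the hit probability is also $\pi/4$:
n <- 1000000
X <- runif(n)                  # uniform on [0, 1]
Y <- runif(n)
q <- sum(X^2 + Y^2 < 1) / n    # proportion of hits inside the quarter circle
4 * q                          # approximation of pi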
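The claim that more darts give a more accurate approximation can be checked numerically. The sketch below repeats the dart experiment for increasing $n$. It compares the actual error with the theoretical standard error $4\sqrt{q(1-q)/n}$, which follows from $n q_n \sim \text{binomial}(n, \pi/4)$. The function name `estimate_pi` and the grid of $n$ values are illustrative choices.

```r
# Monte Carlo estimate of pi from n uniform "darts" in the unit square
estimate_pi <- function(n) {
  X <- runif(n)
  Y <- runif(n)
  4 * mean(X^2 + Y^2 < 1)   # 4 * (proportion of hits in the quarter circle)
}

set.seed(1)                  # for reproducibility
ns <- 10^(2:6)               # numbers of darts: 100, ..., 1e6
est <- sapply(ns, estimate_pi)

# Theoretical standard error of 4 * q_n, since n * q_n ~ binomial(n, pi/4)
se <- 4 * sqrt((pi / 4) * (1 - pi / 4) / ns)

data.frame(n = ns, estimate = est, abs_error = abs(est - pi), std_error = se)
```

The standard error shrinks like $1/\sqrt{n}$. To gain one more correct decimal, we need about 100 times more darts.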
An early example of approximating $\pi$ by simulation is Buffon's needle experiment (Georges Louis
Leclerc, Comte de Buffon, 1707-1788). In the experiment, we first draw parallel lines at equal intervals
on a flat surface. Then, a needle is 'randomly' dropped on the surface and we count how often the
needle crosses a line.
The 'Monte Carlo method' and its systematic use started with atomic bomb research, dating back to 1944.
The theory was developed by, e.g., Fermi, Metropolis and Ulam.
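Buffon's experiment can itself be simulated. The sketch relies on the standard result, which is not stated in the text above: for a needle of length $l \le d$ dropped on lines spaced $d$ apart, the crossing probability is $2l/(\pi d)$. The values of `d` and `l` are illustrative. Drawing the angle uniformly already uses R's constant `pi`, so this is only a demonstration of the principle and not a genuine way to compute $\pi$.

```r
set.seed(1)
n <- 1e6
d <- 1     # distance between the parallel lines (illustrative value)
l <- 0.8   # needle length, assumed l <= d (illustrative value)

# Position and orientation of each dropped needle
y     <- runif(n, 0, d / 2)      # distance from needle centre to nearest line
theta <- runif(n, 0, pi / 2)     # acute angle between needle and the lines

crosses <- y <= (l / 2) * sin(theta)   # TRUE if the needle crosses a line
p_hat   <- mean(crosses)               # estimates 2 * l / (pi * d)
pi_hat  <- 2 * l / (p_hat * d)         # solve the crossing probability for pi
pi_hat
```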
$$P(X \in A) \approx \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_A(x_i).$$
In general, the mean of any quantity can be approximated by the average of the simulated values,
and the whole distribution can be approximated by the histogram of the simulated values. (This is
the duality between a probability distribution and the corresponding empirical distribution.)
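As a small illustration of approximating means, probabilities and quantiles by sample averages and sample quantiles, the sketch below uses $X \sim \text{Exponential}(1)$. This distribution is an illustrative choice, picked because its exact answers are known: $E(X) = 1$, $P(X > 2) = e^{-2}$ and median $\log 2$. The probability is estimated as the average of the indicator $\mathbf{1}_{\{x_i > 2\}}$, as in the formula above.

```r
set.seed(1)
x <- rexp(100000, rate = 1)   # illustrative choice: X ~ Exponential(1)

mean(x)            # approximates E(X) = 1
mean(x > 2)        # average of indicators, approximates P(X > 2) = exp(-2)
quantile(x, 0.5)   # sample median, approximates log(2)
```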
$$P(X \mid N, r) = \binom{N}{X} r^{X} (1 - r)^{N - X}$$
There are many readily available tools in different statistical packages for drawing random
samples from standard distributions. For example, in R:
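N <- 100   # number of trials (example value)
r <- 0.2   # success probability (example value)
X <- rbinom(1000, N, r)
# breaks at half-integers give one bar per integer value of X
hist(X, breaks = (min(X):(max(X) + 1)) - 0.5, freq = FALSE,
     xlab = 'X', ylab = 'Probability',
     main = 'Empirical distribution')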
will generate 1000 random values of X from binomial(N, r) and plot the empirical distribution. The
more samples, the more accurately the empirical distribution will represent the true distribution.
[Figure: Empirical distribution (histogram of the simulated values; x-axis X, y-axis Probability).]
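X <- numeric(1000)
# each X[i] counts how many of 100 U(0, 1) draws fall below 0.2: a binomial(100, 0.2) draw
for (i in 1:1000) {X[i] <- sum(runif(100) < 0.2)}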
This fundamental uniform distribution is available in most software (although some generators may not
be of good quality). In computers, 'random values' are not truly random but pseudo-random, i.e.
produced by a deterministic algorithm. Once we have a generator of U(0, 1) variables, we can, in
principle, generate all other distributions, more or less efficiently and more or less accurately.
And once we can simulate a random variable X from its distribution, it is very easy to obtain
empirical distributions of any transformation of X. Using Monte Carlo simulation, we don't need to
derive the analytical form of the probability density of the transformed variable g(X). For example:
x <- runif(10000)         # X ~ U(0, 1)
hist(x^2, breaks = 100)   # empirical distribution of g(X) = X^2
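In this case, the simulated distribution can be checked against the exact one. For $X \sim U(0, 1)$ we have $P(X^2 \le t) = P(X \le \sqrt{t}) = \sqrt{t}$ and $E(X^2) = 1/3$. The sketch below compares empirical and exact probabilities at a few illustrative thresholds, without plotting.

```r
set.seed(1)
x <- runif(100000)
y <- x^2                                   # transformed variable g(X) = X^2

thresholds <- c(0.1, 0.25, 0.5, 0.9)       # illustrative evaluation points
empirical <- sapply(thresholds, function(s) mean(y <= s))   # Monte Carlo P(Y <= t)
exact     <- sqrt(thresholds)              # analytical P(X^2 <= t) = sqrt(t)
rbind(t = thresholds, empirical = empirical, exact = exact)
mean(y)                                    # approximates E(X^2) = 1/3
```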
If $X_1$ and $X_2$ are independent U(0, 1) variables, then
$$Y_1 = \sqrt{-2 \log(X_1)}\, \cos(2\pi X_2), \qquad Y_2 = \sqrt{-2 \log(X_1)}\, \sin(2\pi X_2)$$
can be used as independent random draws from the N(0, 1) distribution (the Box-Muller transformation).
The idea of transformations is also used in data analysis when the data distribution appears
'non-standard'. After a suitable transformation, the data can be made approximately normal, in which
case a normal distribution might be chosen as the conditional distribution of the data.
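The Box-Muller transformation can be coded directly from the two formulas. The sketch below returns a two-column matrix holding $Y_1$ and $Y_2$. It checks the means, standard deviations and the correlation between the columns. R's `runif` does not return the endpoints 0 or 1, so `log(x1)` is always finite. The name `box_muller` is illustrative.

```r
# Box-Muller: two independent U(0,1) vectors -> two independent N(0,1) vectors
box_muller <- function(n) {
  x1 <- runif(n)
  x2 <- runif(n)
  radius <- sqrt(-2 * log(x1))
  cbind(y1 = radius * cos(2 * pi * x2),
        y2 = radius * sin(2 * pi * x2))
}

set.seed(1)
z <- box_muller(50000)
colMeans(z)           # both close to 0
apply(z, 2, sd)       # both close to 1
cor(z[, 1], z[, 2])   # close to 0
```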
6.5.1
The probability model for the number of virus particles $X$ in a sample volume of $S$ litres can be assumed
to be Poisson($S\lambda$), where $\lambda$ is the mean concentration per litre. If the sample size is one litre, then the
conditional probability of having no particles is $P(X = 0 \mid S = 1, \lambda) = \exp(-\lambda)$. Therefore, the
probability of a positive test result is
$$p = P(X \ge 1 \mid S = 1, \lambda) = 1 - \exp(-\lambda),$$
which also specifies $\lambda$ as a function of $p$: $\lambda = -\log(1 - p)$. When 50 samples are taken, the probability
of 7 positives is binomial:
$$P(Y = 7 \mid N = 50, p) = \binom{50}{7} p^{7} (1 - p)^{43}.$$
6.5.2
Probability of infection
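The probability of infection is the probability of having at least 8 particles in a consumption volume
of $m$ litres:
$$P(\text{infection} \mid \lambda, m) = P(X \ge 8 \mid \lambda, m),$$
where $X \sim \text{Poisson}(\lambda m)$. The probability of infection can be expressed more easily as
$$\varphi(\lambda) \equiv 1 - P(X \le 7 \mid \lambda, m) = 1 - \sum_{i=0}^{7} \frac{(\lambda m)^i \exp(-\lambda m)}{i!}.$$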
Since $\lambda$ is uncertain, this expression is also uncertain, and we obtain the final probability as
$$P(\text{infection} \mid Y = 7, N = 50, m) = \int_0^\infty \varphi(\lambda)\, \pi(\lambda \mid Y = 7, N = 50)\, d\lambda.$$
In other words, we take the weighted average of $\varphi(\lambda)$, weighting by the posterior density of $\lambda$. Using
Monte Carlo, this integral can be computed approximately by simulating $p$ from its beta posterior density,
then calculating $\lambda$ for each sampled $p$, then calculating $\varphi(\lambda)$ for each sampled $\lambda$, and finally taking the
average of the simulated sample of $\varphi(\lambda)$.
Alternatively, if $\varphi(\lambda)$ is interpreted as the proportion of 'servings' (of size $m$) with an infective dose in a
large population of all servings produced from this vat, then we might also report the distribution of
the proportion $\varphi(\lambda)$ with a Bayesian credible interval (CI).
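The Monte Carlo recipe above translates into a few vectorized lines. The text does not state the prior for $p$. The sketch assumes a uniform Beta(1, 1) prior, so the posterior given 7 positives out of 50 is Beta(8, 44). The consumption volume `m = 2` litres is a hypothetical value. `ppois(..., lower.tail = FALSE)` computes the upper tail $P(X \ge 8)$ directly, which avoids the rounding loss of $1 - P(X \le 7)$ when the probability is tiny.

```r
set.seed(1)
n_sim <- 100000
Y <- 7; N <- 50
m <- 2   # hypothetical consumption volume (litres)

# Posterior of p under an ASSUMED uniform Beta(1, 1) prior: Beta(1 + Y, 1 + N - Y)
p <- rbeta(n_sim, 1 + Y, 1 + N - Y)
lambda <- -log(1 - p)                  # concentration per litre implied by each p

# phi(lambda) = P(X >= 8), X ~ Poisson(lambda * m); upper tail computed directly
phi <- ppois(7, lambda * m, lower.tail = FALSE)

mean(phi)                              # P(infection | Y = 7, N = 50, m)
quantile(phi, c(0.025, 0.5, 0.975))    # 95% credible interval for the proportion
```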
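$$E_\pi(h(x)) = \int h(x)\, \pi(x)\, dx = \int h(x)\, \frac{\pi(x)}{g(x)}\, g(x)\, dx = E_g\!\left(h(x)\, \frac{\pi(x)}{g(x)}\right) \approx \frac{1}{N} \sum_{i=1}^{N} h(x_i)\, \frac{\pi(x_i)}{g(x_i)},$$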
where the $x_i$ are drawn from the distribution $g(x)$. In this method, unlike in rejection sampling, we do not reject any generated
values $x_i$. The quantities $\pi(x_i)/g(x_i)$ are called importance ratios, or importance
weights. If these weights can take large values with small probability, the result may be overly
influenced by whether or not such large weights happen to appear in the sample (because we can only generate a limited
sample). The distribution of the weights can be monitored to check whether this could happen: estimates
can be poor if the largest importance weights are 'too large' compared to the others. (The behavior of
small importance weights is less crucial because they have little effect on the estimate anyway.)
Example: estimate $E(X^2)$, where $X \sim N(0, 1)$. In this case we know from theory that $X^2$ is $\chi^2_1$ distributed with mean 1.
We try importance sampling with $g(x)$ as the t-distribution with 3 degrees of freedom.
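x <- rt(10000, 3)                 # draws from the proposal g = t(3)
w <- dnorm(x, 0, 1) / dt(x, 3)    # importance weights pi(x) / g(x)
r <- x * x * w                    # h(x) * w(x) with h(x) = x^2
mean(r)                           # estimate of E(X^2) = 1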
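The weight monitoring discussed above can be illustrated by running the same estimate with two proposals. The t(3) proposal has heavier tails than the N(0, 1) target. The second proposal, N(0, 0.5²), is a deliberately poor lighter-tailed choice, and its weights $\pi/g \propto e^{1.5x^2}$ have infinite variance. The diagnostics are the effective sample size $(\sum w_i)^2 / \sum w_i^2$ (Kish's ESS, a common choice not taken from the text) and the share of the largest single weight. The helper names `ess` and `max_share` are illustrative.

```r
set.seed(1)
n <- 10000
h <- function(x) x^2                           # quantity of interest: E(X^2) = 1 under N(0, 1)

ess       <- function(w) sum(w)^2 / sum(w^2)   # effective sample size of the weights
max_share <- function(w) max(w) / sum(w)       # share of the largest single weight

# Proposal 1: t(3), heavier tails than the N(0, 1) target
x1 <- rt(n, 3)
w1 <- dnorm(x1) / dt(x1, 3)

# Proposal 2: N(0, 0.5^2), lighter tails than the target (a deliberately poor choice)
x2 <- rnorm(n, 0, 0.5)
w2 <- dnorm(x2) / dnorm(x2, 0, 0.5)

est1 <- mean(h(x1) * w1)
est2 <- mean(h(x2) * w2)
rbind(t3     = c(estimate = est1, ess = ess(w1), max_share = max_share(w1)),
      narrow = c(estimate = est2, ess = ess(w2), max_share = max_share(w2)))
```

A much smaller ESS, or a single weight carrying a large share of the total, indicates that the estimate rests on a few draws and should not be trusted.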
MCMC is based on constructing a Markov chain so that its stationary distribution is the
required target distribution.
This means that the ordinary Monte Carlo method is more efficient than MCMC. But in many problems, direct Monte Carlo is not possible.
In contrast, MCMC is an extremely general method and can be widely applied, e.g. for computing posterior densities. We only need to be able to compute values of
the (unnormalized!) target density at all points $x$.
But MCMC needs to be applied carefully, since there is no guarantee that the results are
correct after a finite number of iterations. We need to check for possible convergence problems. The
general MCMC algorithm is of the form:
1. Set an initial value $x_1$. Set the counter $i = 1$.
2. Generate the next value conditionally on the previous one, $x_{i+1} \sim f(x \mid x_i)$, set the counter $i = i + 1$, and repeat.
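The two steps above can be illustrated with the simplest possible chain. The example is not given in the text: it uses an autoregressive transition $f(x \mid x_i) = N(\rho x_i, 1 - \rho^2)$, whose stationary distribution is N(0, 1). If $x_i \sim N(0, 1)$, then $\rho x_i + \sqrt{1-\rho^2}\,\varepsilon$ is again N(0, 1). The value $\rho = 0.9$ and the poor starting value are illustrative choices. They show why the initial part of the chain (burn-in) is discarded and why successive draws are correlated.

```r
set.seed(1)
rho <- 0.9        # illustrative autocorrelation of the chain
n <- 100000
x <- numeric(n)
x[1] <- 10        # step 1: deliberately poor initial value x_1
for (i in 1:(n - 1)) {
  # step 2: x_{i+1} ~ f(x | x_i) = N(rho * x_i, 1 - rho^2)
  x[i + 1] <- rnorm(1, mean = rho * x[i], sd = sqrt(1 - rho^2))
}
kept <- x[-(1:1000)]   # discard the burn-in period
c(mean = mean(kept), var = var(kept),
  lag1_cor = cor(kept[-1], kept[-length(kept)]))
```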
6.8.1
Slice sampling
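Consider simulating from a normal density truncated to the region $x > L$:
$$\pi(x) \propto N(x \mid \mu, \sigma^2)\, \mathbf{1}_{\{x > L\}} \propto e^{-0.5(x-\mu)^2/\sigma^2}\, \mathbf{1}_{\{x > L\}}.$$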
This could also be done by simulating random draws from the untruncated density and accepting
only values greater than $L$. But if $L$ is large (far in the upper tail), this algorithm would run for a long time before
we have a reasonable sample. Slice sampling is actually one version of Gibbs sampling. The iteration
step is as follows (Robert CP, Casella G: Monte Carlo Statistical Methods. Springer 1999):
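Step 1: $x_t \mid z_{t-1} \sim U\left(L,\ \mu + \sqrt{-2\sigma^2 \log\left(z_{t-1} \sqrt{2\pi}\, \sigma\right)}\right)$.
Step 2: $z_t \mid x_t \sim U\left(0,\ \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left(-0.5 (x_t - \mu)^2 / \sigma^2\right)\right)$.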
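The auxiliary variable $z$ is introduced through a joint density $\pi(x, z)$ whose marginal is the target:
$$\int \pi(x, z)\, dz = \pi(x).$$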
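In this example:
$$\pi(x, z) \propto \mathbf{1}_{\{x > L\}}\, \mathbf{1}_{\left\{0 < z < \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left(-0.5 (x - \mu)^2 / \sigma^2\right)\right\}},$$
which is a 2-dimensional uniform density over the area in the $(x, z)$-plane below the density function $\pi(x)$,
in the region where $x > L$.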
[Figure: the normal density in the truncation region $x > L$; x-axis from about 2.4 to 3.8, y-axis: density function value.]
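The two steps of the slice sampler can be sketched as follows. The function name `slice_truncnorm` and the values $\mu = 0$, $\sigma = 1$, $L = 2.5$ are illustrative. The lower limit of the uniform draw is written as $\max(L, \mu - \text{half-width})$. This reduces to $L$, as in Step 1, whenever $L \ge \mu$, and it keeps the sketch valid for smaller $L$ too. The result is compared with the known mean of a truncated normal, $\mu + \sigma\, \phi(\alpha) / (1 - \Phi(\alpha))$ with $\alpha = (L - \mu)/\sigma$.

```r
# Slice sampler for N(mu, sigma^2) truncated to x > L (auxiliary variable z)
slice_truncnorm <- function(n, mu, sigma, L, x0) {
  draws <- numeric(n)
  x <- x0                                    # starting value, must satisfy x0 > L
  for (t in seq_len(n)) {
    # Step 2: z | x ~ U(0, density at x)
    z <- runif(1, 0, dnorm(x, mu, sigma))
    # Step 1: x | z ~ uniform on the slice {x > L : density(x) > z}
    half_width <- sqrt(-2 * sigma^2 * log(z * sqrt(2 * pi) * sigma))
    x <- runif(1, max(L, mu - half_width), mu + half_width)
    draws[t] <- x
  }
  draws
}

set.seed(1)
mu <- 0; sigma <- 1; L <- 2.5    # illustrative values
x <- slice_truncnorm(20000, mu, sigma, L, x0 = L + 0.1)

alpha <- (L - mu) / sigma
exact_mean <- mu + sigma * dnorm(alpha) / (1 - pnorm(alpha))
c(simulated = mean(x), exact = exact_mean)
```

For comparison, naive rejection from the untruncated N(0, 1) would keep only a fraction $1 - \Phi(2.5) \approx 0.006$ of the draws. The slice sampler uses every iteration, at the price of correlated draws.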
6.8.2
Gibbs sampling
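Gibbs sampling is also known as alternating conditional sampling. Slice sampling is a special case of
Gibbs sampling. To demonstrate Gibbs sampling, consider a simple 2D normal density:
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).$$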
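Recall that the 2D normal density function (zero means, unit variances and correlation $\rho$) is
$$\pi(x, y) = \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left( x^2 - 2\rho x y + y^2 \right) \right).$$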
It can be written in the form $\pi(x \mid y)\pi(y)$ or $\pi(y \mid x)\pi(x)$, since the marginal and conditional densities
can be solved from the joint density:
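$$\pi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} x^2\right) = N(0, 1), \qquad \pi(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} y^2\right) = N(0, 1),$$
and
$$\pi(y \mid x) = \frac{\pi(x, y)}{\pi(x)} = \frac{\frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left( x^2 - 2\rho x y + y^2 \right) \right)}{\frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} x^2 \right)} = \frac{1}{\sqrt{2\pi} \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} (y - \rho x)^2 \right) = N(\rho x, 1 - \rho^2).$$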
Sample $x$ from $\pi(x) = N(0, 1)$, then sample $y$ from $\pi(y \mid x) = N(\rho x, 1 - \rho^2)$. Continue until the required
sample is collected. Each sampled pair $(x, y)$ is an independent draw from the joint distribution.
(The same can be done by sampling first $y$ from $\pi(y) = N(0, 1)$, and then $x$ given $y$.)
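The two-stage sampling just described (marginal first, then conditional) is vectorized in R, because `rnorm` accepts a vector of means. The correlation $\rho = 0.8$ is an illustrative value. Since each pair is an independent draw, no burn-in is needed.

```r
set.seed(1)
rho <- 0.8     # illustrative correlation
n <- 10000
x <- rnorm(n)                                        # x ~ pi(x) = N(0, 1)
y <- rnorm(n, mean = rho * x, sd = sqrt(1 - rho^2))  # y | x ~ N(rho * x, 1 - rho^2)
c(cor = cor(x, y), sd_x = sd(x), sd_y = sd(y))
```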
In general, each iteration step of Gibbs sampling consists of a cycle in which every element $\theta_j$ is sampled in turn from its full
conditional distribution, that is, conditional on all other elements $\theta_k$, $k \neq j$, at their current values. Of
course, the distributions are all conditional on the data $X$ (if our aim is to sample from a posterior).
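For the same 2D normal, a Gibbs sampler in the sense of this cycle alternates between the two full conditionals, $x \mid y \sim N(\rho y, 1 - \rho^2)$ and $y \mid x \sim N(\rho x, 1 - \rho^2)$. Each conditional uses the other element at its current value. The sketch uses an illustrative $\rho = 0.8$ and a deliberately poor starting point. Unlike the direct sampling above, successive draws are correlated; for the $x$-chain, the lag-1 autocorrelation is $\rho^2$. The function name `gibbs_bvn` is illustrative.

```r
gibbs_bvn <- function(n, rho, x0 = 0, y0 = 0) {
  s <- sqrt(1 - rho^2)                       # conditional standard deviation
  draws <- matrix(NA_real_, nrow = n, ncol = 2, dimnames = list(NULL, c("x", "y")))
  x <- x0; y <- y0
  for (i in seq_len(n)) {
    x <- rnorm(1, mean = rho * y, sd = s)    # full conditional pi(x | y)
    y <- rnorm(1, mean = rho * x, sd = s)    # full conditional pi(y | x), using the new x
    draws[i, ] <- c(x, y)
  }
  draws
}

set.seed(1)
draws <- gibbs_bvn(20000, rho = 0.8, x0 = 5, y0 = -5)   # deliberately poor start
kept <- draws[-(1:500), ]                               # discard burn-in
c(cor = cor(kept[, "x"], kept[, "y"]), mean_x = mean(kept[, "x"]))
acf(kept[, "x"], plot = FALSE)$acf[2]                   # lag-1 autocorrelation, about rho^2
```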
6.8.3
Metropolis-Hastings algorithm
This is a very general tool that can be used for simulating from complicated distributions, as long as
we check that it is working properly. For the algorithm, we need to choose a proposal distribution $Q$, which
is used to generate a proposed value $x^*$ for the next round of iteration, depending on the value at
the previous step. The proposal density is thus of the form $Q(x^* \mid x_{i-1})$. The proposed value is
accepted with probability
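$$r = \min\left( \frac{\pi(x^* \mid \text{data})\, Q(x_{i-1} \mid x^*)}{\pi(x_{i-1} \mid \text{data})\, Q(x^* \mid x_{i-1})},\ 1 \right).$$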
If it is accepted, it becomes the new value of $x$ at the current iteration step. Otherwise,
the previous value is retained for the current step. As seen from the acceptance probability formula,
it is sufficient to be able to compute the posterior density without its normalizing constant:
all terms that are constant with respect to $x$ cancel out of the ratio. The algorithm can be
slow if the proposal density is badly chosen, either too narrow or too wide. In practice, the acceptance
probability is often computed in logarithmic form, which turns fractions and products into sums that are easier to compute
(avoiding numerical underflow or overflow caused by dividing possibly very small
probability densities). As a special case, we obtain the Gibbs sampler, in which the acceptance probability is
always one.
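A minimal Metropolis-Hastings sketch with a non-symmetric proposal shows where the $Q$-ratio enters and how the log form is used. The target, Gamma(shape 3, rate 1) known only up to a constant as $x^2 e^{-x}$, and the proposal scale are illustrative assumptions. The proposal is log-normal, $x^* = x_{i-1} e^{\varepsilon}$ with $\varepsilon \sim N(0, s^2)$. For this proposal $Q(x_{i-1} \mid x^*) / Q(x^* \mid x_{i-1}) = x^* / x_{i-1}$, which is added on the log scale.

```r
set.seed(1)
# Unnormalized log target: Gamma(shape = 3, rate = 1), pi(x) proportional to x^2 exp(-x), x > 0
log_target <- function(x) 2 * log(x) - x

n <- 50000
s <- 0.5                 # proposal scale on the log scale (illustrative tuning value)
x <- numeric(n)
x[1] <- 1
accepted <- 0
for (i in 2:n) {
  x_prop <- x[i - 1] * exp(rnorm(1, 0, s))   # log-normal proposal Q(x* | x_{i-1}): not symmetric
  # log r = log pi(x*) - log pi(x_{i-1}) + log Q(x_{i-1} | x*) - log Q(x* | x_{i-1});
  # for this proposal the Q-ratio is x* / x_{i-1}
  log_r <- log_target(x_prop) - log_target(x[i - 1]) + log(x_prop) - log(x[i - 1])
  if (log(runif(1)) < log_r) {               # accept with probability min(exp(log_r), 1)
    x[i] <- x_prop
    accepted <- accepted + 1
  } else {
    x[i] <- x[i - 1]                         # keep the previous value
  }
}
kept <- x[-(1:1000)]                         # discard burn-in
c(mean = mean(kept), var = var(kept), acceptance = accepted / (n - 1))
```

The exact mean and variance of Gamma(3, 1) are both 3, which gives a simple check of the chain.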
6.8.4
Metropolis algorithm
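The Metropolis algorithm is a special case of the M-H algorithm in which the proposal density $Q$ is symmetric,
$Q(x^* \mid x_{i-1}) = Q(x_{i-1} \mid x^*)$. Therefore, it cancels out of the acceptance probability, which becomes
$$r = \min\left( \frac{\pi(x^* \mid \text{data})}{\pi(x_{i-1} \mid \text{data})},\ 1 \right).$$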
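A random-walk Metropolis sketch, with the symmetric proposal $x^* = x_{i-1} + N(0, \text{step}^2)$, also illustrates the earlier remark that a proposal can be too narrow or too wide. The unnormalized N(0, 1) target, the starting value and the three step sizes are illustrative choices. A narrow step accepts almost everything but moves slowly. A wide step rarely accepts and stays put for long stretches. A moderate step balances the two.

```r
# Random-walk Metropolis: symmetric proposal x* = x_{i-1} + N(0, step^2), so Q cancels
metropolis <- function(n, log_target, x0, step) {
  x <- numeric(n)
  x[1] <- x0
  accepted <- 0
  for (i in 2:n) {
    x_prop <- x[i - 1] + rnorm(1, 0, step)
    if (log(runif(1)) < log_target(x_prop) - log_target(x[i - 1])) {
      x[i] <- x_prop
      accepted <- accepted + 1
    } else {
      x[i] <- x[i - 1]
    }
  }
  list(x = x, acceptance = accepted / (n - 1))
}

log_target <- function(x) -0.5 * x^2   # unnormalized N(0, 1), an illustrative target

set.seed(1)
steps <- c(narrow = 0.1, moderate = 2.4, wide = 50)
runs <- lapply(steps, function(s) metropolis(5000, log_target, x0 = 5, step = s))
sapply(runs, function(r) r$acceptance)           # acceptance rates
sapply(runs, function(r) mean(r$x[-(1:500)]))    # means after burn-in (target mean 0)
```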