May 2021 Examination Diet School of Mathematics & Statistics MT4537
Each page of your solution must have the page number, module code, and your student
ID number at the top of the page. You must make sure all pages of your solutions are
clearly legible.
(a) Describe the spherical contact distribution function and how it is derived
from the void probability for a point process. Why is it useful? You need
only consider the R^2 case. [2]
(b) Figure 1 shows the plot of the estimated pair correlation function for an
unseen point pattern. What can you infer about the point pattern from the
shape of the curve? [2]
[Figure 1: Estimated pair correlation function g(r), showing the curves ĝ_Ripley(r), ĝ_Trans(r) and g_Pois(r); vertical axis g(r) from 0.0 to 1.5.]
(c) Explain the hard-core distance in this context, estimate it from Figure 1,
and explain why hard-core distances may occur in reality. [2]
(d) The pair correlation function is strongly related to Ripley’s K-function.
Provide details of this relationship; justify why 1 is an important reference num-
(e) Suggest a model class that may be chosen to model a pattern with a pair
correlation function shaped like the one in Figure 1. Justify your choice. [2]
(f) What theoretical K function describes the CSR situation (in R^2)? Give its
form and explain why this is so. [2]
(g) Describe how the weighting for an isotropic edge correction is calculated,
for a pair of events x1 and x2 with associated window W in R^2. Show its
rationale via a sketch. [2]
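The idea behind an isotropic edge-correction weight can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: it assumes a rectangular window, and the function name `isotropic_weight` and the angular resolution `n_angles` are hypothetical. It approximates the weight as the reciprocal of the proportion of the circle centred at x1 and passing through x2 that lies inside W.

```python
import numpy as np

def isotropic_weight(x1, x2, window, n_angles=3600):
    """Approximate an isotropic edge-correction weight for the pair (x1, x2).

    Assumption: `window` is a rectangle given as (xmin, xmax, ymin, ymax).
    The weight is 1 / (proportion of the circle centred at x1 with radius
    |x1 - x2| that falls inside the window), estimated by sampling angles.
    """
    xmin, xmax, ymin, ymax = window
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    r = np.linalg.norm(x2 - x1)

    # Evenly spaced points on the circle through x2, centred at x1.
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    cx = x1[0] + r * np.cos(theta)
    cy = x1[1] + r * np.sin(theta)

    inside = (cx >= xmin) & (cx <= xmax) & (cy >= ymin) & (cy <= ymax)
    fraction = inside.mean()
    if fraction == 0.0:
        raise ValueError("no sampled point of the circle lies inside the window")
    return 1.0 / fraction

unit_square = (0.0, 1.0, 0.0, 1.0)
w_centre = isotropic_weight((0.5, 0.5), (0.6, 0.5), unit_square)
w_corner = isotropic_weight((0.0, 0.0), (0.1, 0.1), unit_square)
```

For a pair well inside the window the whole circle is observable and the weight is 1; near a corner only about a quarter of the circle lies inside W, so the weight is close to 4.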
(a) A type of flower in a field is sampled in three ways: (1) by taking an aerial
photo of the field and counting the flowers in the photo (some of which are
likely to be missed because they are so small), (2) doing the same, but on
a misty day when mist might make flowers in some parts of the field more
difficult to see than others, and (3) a surveyor stands in the middle of the
field and locates all the flowers she can see. Say whether p-, p(x)- or
P(x)-thinning would be most appropriate to model the observed flower locations
in each case, and explain your answer. [3]
(b) What is the relationship between the K function for a homogeneous Poisson
point process, and that of a p-thinned variant? Explain the relationship. [2]
(c) Consider the flower scenario above, in which a surveyor searches from the
middle of the field. Suppose that the probability of her seeing a flower is
given by exp(−r^2/σ^2), where r is radial distance from the surveyor. If the
intensity of flowers in the field is λ throughout the field, what is the apparent
intensity at a point that is a distance r from the surveyor? How would you
simulate this using thinning? [3]
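A location-dependent thinning of this kind can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not a prescribed method: it assumes a square field centred on the surveyor, and the function name `simulate_thinned` and the parameter `half_width` are hypothetical. It simulates a homogeneous Poisson process with intensity λ and retains each point independently with probability exp(−r^2/σ^2).

```python
import numpy as np

def simulate_thinned(lam, sigma, half_width, rng):
    """Simulate flowers and the subset seen by a surveyor at the origin.

    Assumption: the field is the square [-half_width, half_width]^2.
    Returns (all_points, observed_points).
    """
    area = (2.0 * half_width) ** 2
    # Homogeneous Poisson process: Poisson count, then uniform locations.
    n = rng.poisson(lam * area)
    pts = rng.uniform(-half_width, half_width, size=(n, 2))

    # Retention probability depends on squared distance from the surveyor.
    r2 = np.sum(pts ** 2, axis=1)
    keep_prob = np.exp(-r2 / sigma ** 2)

    # Independent thinning: keep each point with its own probability.
    keep = rng.uniform(size=n) < keep_prob
    return pts, pts[keep]

rng = np.random.default_rng(1)
all_pts, seen_pts = simulate_thinned(lam=200.0, sigma=1.0, half_width=4.0, rng=rng)
```

With a field much wider than σ, the expected number of retained points is close to λπσ^2, which gives a quick sanity check on a simulated pattern.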
(a) What are the main properties/assumptions that define a Neyman-Scott
process? [2]
(b) The Matérn and Thomas processes are Neyman-Scott processes. Describe
each of these, including their governing parameters. Further comment on
their applicability to real-world situations. [4]
4. Gibbs processes
(a) State the density of a Gibbs process with a fixed number of points and explain
its components. How does this create regularity in a point pattern? [2]
(b) The probability density for a Strauss process may be expressed as:
where x1, ..., xn are the points of the pattern, α is a normalising constant,
n(x) is the number of points, and s(x) is the number of point-pairs that lie
within r units of one another (r being the interaction radius). γ is the
interaction parameter, with 0 ≤ γ ≤ 1. Explain how altering γ can make this process CSR at one
extreme, or a Gibbs hard-core at the other. [3]
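For reference, one common way of writing a Strauss density that is consistent with the symbols defined in (b) is sketched below. The activity parameter β is an assumption here, since it is not named in the question; the remaining symbols are as defined above.

```latex
f(x_1, \dots, x_n) = \alpha \, \beta^{\,n(x)} \, \gamma^{\,s(x)},
\qquad \beta > 0, \quad 0 \le \gamma \le 1 .
```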
(a) Why does a Gaussian Random Field not generally serve as a driving intensity
for a point process? [1]
6. Geostatistical data
(b) What is the difference between ordinary and universal kriging? Briefly
comment on how this can relate to spline regression. [2]
(c) Pages 6 to 10 present outputs from models applied to data measuring heavy
metals in top-soil on a flood plain beside the river Meuse. There is location
information for the samples (x, y in metres) and several covariates - one
of which is considered here: dist, the normalised distance from the river.
Three models have been fitted to this data in order to predict, and explain,
the levels of zinc in the top-soil.
(iii) Referring to model 2, briefly describe what sort of smooth s(x, y) is,
and how its complexity is controlled. [2]
[Figure: Empirical semi-variogram (semi-variance against distance) with fitted model. Model: Sph; Nugget: 0.07; Sill: 0.28; Range: 724.]
Family: gaussian
Link function: identity
Formula:
log(zinc) ~ s(dist) + s(x, y)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.88578 0.02644 222.6 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[Figure: Model 2 smooth terms: s(dist,3.75) plotted against dist, and a spatial panel with axes x and y.]
[Figure: GAM predictions, Model 2 (predicted values over the study area; scale 5.0 to 7.0).]
Random effects:
Formula: ~Xr - 1 | g
Structure: pdIdnot
....
....
> summary(model3$gam)
Family: gaussian
Link function: identity
Formula:
log(zinc) ~ s(dist)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.84817 0.06071 96.33 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.655
Scale est. = 0.17914 n = 155
[Figure: Model 3 smooth term: s(dist,4.04) plotted against dist.]
[Figure: GAM predictions, Model 3 (predicted values over the study area; scale 5.0 to 6.5).]