Advance Access publication 2023 September 02
GJI Applied and Marine Geophysics
Anandaroop Ray, Yusen Ley-Cooper, Ross C. Brodie, Richard Taylor,* Neil Symington
and Negin F. Moghaddam*
Geoscience Australia, Symonston, Symonston ACT 2609, Australia. E-mail:
Accepted 2023 August 23. Received 2023 June 30; in original form 2023 January 30
palaeovalley mapping (e.g. Eberle & Siemon 2006), hydrogeological investigations (e.g. Auken et al. 2017), national scale surveys for mapping subsurface architecture (e.g. Ley-Cooper et al. 2020) and mapping detailed river valley aquifer systems (e.g. Minsley et al. 2021b).
To estimate the subsurface geoelectric structure responsible for the recorded earth response, we need to convert the data from a time or frequency domain response to subsurface conductivity using Maxwell’s equations and inversion theory (e.g. Parker 1994; Menke 2012). For meaningful interpretation of the inverted electrical conductivity models and their spatial variation in terms of buried geology, we require knowledge of model uncertainties. In principle, these uncertainties can be found by propagating the data uncertainties through the inversion process all the way to the model estimates. However, the electromagnetic inversion process is non-linear and
information gain (Lindley 1956; Chaloner & Verdinelli 1995; Ryan 2003; Valentine & Sambridge 2020). For the information gain computation, we use a covariate shift adaptation technique (Sugiyama et al. 2008b, a) that directly computes probability density ratios from posterior and prior samples. We believe that this has not been used before in near surface geophysics, and shows promise in other fields such as geostatistical learning and online learning from time-series (e.g. Hoffimann et al. 2021; Chen et al. 2021). Although there have been uncertainty analyses of AEM data using Bayesian methods (e.g. Minsley 2011; Hawkins et al. 2018; Blatter et al. 2018; Minsley et al. 2021a) and deterministic spatial resolution investigations (e.g. Bedrosian et al. 2016), we are not aware of studies that have carried out fixed-wing geometry nuisance marginalization, or compared the resulting subsurface uncertainties with those of a low flying helicopter system, while including deterministic inversions
Figure 2. Noisy synthetics with nominal survey specifications and noise levels for a fixed-wing time-domain AEM system. The data correspond to the X- and
Z-component magnetic fields recorded at each time channel. The synthetic model comprises 65 layers and is based on induction logs from an aquifer system
in the Permian basin, TX, USA.
observed magnetic field (e.g. Loseth & Ursin 2007). As a consequence, many AEM providers do not usually provide Y-component data. The advantage of dealing with Bamp as opposed to Bx and Bz jointly, is that irrespective of the rotation of the X and Z coils in the X–Z plane (i.e. regardless of receiver pitch), the amplitude of the joint vector field remains invariant. This obviates the need to invert for the Rx pitch, reducing the number of unknowns in the nuisance estimation. Of course, this comes at the expense of subsurface information which the individual X- and Z-components provide in a conventional joint inversion. However, as we will see, there is not an appreciable difference with the resolving capabilities of the joint inversion, and the amplitude only inversions potentially remove troublesome conductivity artefacts at depth (Ley-Cooper & Brodie 2020). Using the theory of propagation of errors, assuming independence of the data errors in the X- and Z-components, the data error in Bamp at each time channel can be derived from (1) as:
$$\sigma_{B_\mathrm{amp}} = \frac{1}{B_\mathrm{amp}} \sqrt{B_x^2 \sigma_{B_x}^2 + B_z^2 \sigma_{B_z}^2}, \qquad (2)$$
where $\sigma_{B_x}$ and $\sigma_{B_z}$ are the data errors in the X- and Z-components, respectively.
3 HIERARCHICAL BAYESIAN SAMPLING OF AEM NUISANCES AND EARTH PROPERTIES
We now return to the matter of trade-offs between system geometry and earth conductivity. Using Bayesian inference in a hierarchical setting (e.g. Gelman 2006), we can estimate distributions over parameters we are not interested in, to ensure that inferences over parameters of interest are unbiased. Nuisance estimation in this manner has a long and rich history in the Bayesian geophysical literature: for traveltime inversion to estimate data noise (Malinverno & Briggs 2004), in geoacoustics to estimate source waveforms and data noise (Mecklenbrauker & Gerstoft 2000; Dettmer et al. 2010), with receiver functions to parametrize the likelihood (Bodin et al. 2012), with magnetotellurics to estimate water column conductivities (Blatter et al. 2018), to illustrate a few examples. For AEM
The process of finding the posterior probability p(θm|d) for various models θm admissible by the prior is repeated until an ensemble of models representative of the posterior probability density function (PDF) p(θm|d) is obtained. Sampling proportional to the posterior probability is carried out by using the following acceptance probability α to move from an Earth resistivity model vector or nuisance model vector θ to proposed model θ′ in the McMC chain:
$$\alpha(\theta'|\theta) = \min\left[1, \left(\frac{L(\theta')}{L(\theta)}\right)^{1/T}\right]. \qquad (8)$$
We note here, that for a uniform prior over the number of nuclei and when proposing from the prior resistivities for birth and death, for all TDGP moves, (8) provides the acceptance probability. All the TDGP move proposal probabilities q(θ, θ′) are exactly the same as described in detail in Ray (2021). For symmetric fixed-dimensional proposals, when proposing from the prior, and for uniform priors over the number of parameters, eq. (8) also holds for both fixed-dimensional or reversible jump Metropolis-Hastings–Green McMC (Metropolis et al. 1953; Hastings 1970; Geyer 2011). This is what we have used for sampling geometry nuisances and earth conductivities. Due to this choice of proposals, the move probability terms q(.) never explicitly figure in calculation of the acceptance probability term α in our algorithm (e.g. Agostinetti & Malinverno 2010; Dosso et al. 2014). The exponent 1/T in (8) is an annealing factor for parallel tempering (Swendsen & Wang 1987) as described in Dettmer & Dosso (2012). Parallel tempering significantly accelerates the convergence to the posterior distribution (Dosso et al. 2012, 2014; Sambridge 2013) and is used by default in TDGP. The entire McMC algorithm encapsulated within a parallel tempering framework is described in Algorithm B within Appendix B.
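The tempered acceptance rule of eq. (8) can be illustrated with a minimal Python sketch. This is not the TDGP implementation; the function name `accept_tempered` and its arguments are hypothetical, and the ratio is evaluated with log-likelihoods, an assumption made here for numerical stability.

```python
import math
import random


def accept_tempered(log_like_new, log_like_old, temperature=1.0, rng=random):
    """Tempered likelihood-ratio acceptance in the form of eq. (8).

    alpha = min(1, (L_new / L_old) ** (1 / T)), computed in log space.
    Returns the tuple (accepted, alpha).
    """
    log_alpha = (log_like_new - log_like_old) / temperature
    # Cap at 1 before exponentiating so large positive values cannot overflow.
    alpha = 1.0 if log_alpha >= 0.0 else math.exp(log_alpha)
    return rng.random() < alpha, alpha


# Illustrative (made-up) log-likelihood values.
accepted_up, alpha_up = accept_tempered(-10.0, -12.0)                      # better fit
accepted_dn, alpha_dn = accept_tempered(-12.0, -10.0, temperature=2.0)     # worse fit, T = 2
```

A proposal that improves the likelihood is always accepted, while a higher temperature flattens the ratio and makes downhill moves more likely to be accepted.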
4 SYNTHETIC STUDY: INFORMATION THEORY AND MARGINALIZATION OF GEOMETRY NUISANCES
While ground truth is, in principle, the ultimate arbiter of the accuracy of a geophysical investigation method, a synthetic study at one sounding location, with noise levels, Tx–Rx geometries and flying heights typical of the systems under consideration is instructive. The transmitter waveforms, noise levels, nominal geometries and heights are sourced from the Menindee calibration range AEM flights (Barlow 2019) that we will report on in a later section. As in the real data example, the fixed-wing AEM system and the helicopter AEM system compared are widely used and technically mature. Both have been found fit for purpose for various surveys we have undertaken (e.g. Ley-Cooper 2021, 2022) on the basis of competitive bidding from various commercial entities that have included, but not been limited to these two systems.
divergences, as they are known, which can, heuristically but not mathematically, be thought of as ‘distances’ between probability density functions (see Beier et al. 2002). The first such divergence leads to a strictly proper scoring rule (Gneiting & Raftery 2007), the logarithmic score, dating at least to Good (1952). In essence, for a forecast density p( · ), say a posterior density, we assign a score log p(x) when the event x is actually observed. As shown with eqs (D9)–(D11), on average the highest score will be obtained for a forecast probability density that equals the true probability density of the observations, even if the true density is unknown. Naturally, such scoring systems have found heavy use in forecasting and allied decision theoretic fields such as meteorology, quantitative economics and finance, psychology and optimal energy usage (see Carvalho 2016, for a review). In near surface or exploration geophysics, recent use of logarithmic scores can be found in Seillé & Visser (2020) for selecting an optimal likelihood function, and Friedli et al. (2022) for evaluating different McMC proposal schemes for a challenging high-dimensional inverse problem. This brings us to the second divergence, known as the Bayesian information gain, which represents our increased knowledge of the subsurface with a (usually) narrower posterior density than the prior density we began with. In other words, the information gain represents the diminishing overlap between the prior and posterior densities, a natural proxy for Bayesian resolution as shown by Blatter et al. (2018) for an Antarctic AEM survey. To examine these divergences we begin by showing the true model and prior resistivity probability density in the leftmost column of Fig. 4(d). As discussed in Ray & Myer (2019), though the McMC model parameter priors are uniform,
the resulting resistivities interpolated by the GP parametrization are not. However, similar to a bounded uniform distribution, the interpolated prior resistivities do not have a focussed mode.
We follow Seillé & Visser (2020) and treat the posterior marginals as a forecast probability density, and the true model as the eventuating observation. We then fit a kernel density (Sheather & Jones 1991) pz( · ) to the marginal posterior samples at depth z and evaluate the logarithmic score log pz(mz) for the true log10 resistivity mz at depth z. In the middle column of Fig. 4(d), we plot the logarithmic score for each system, with higher scores indicating better representation of the truth. There is, however, a large amount of overlap summarized by the quantile plots at the bottom, showing the overall 5th, 50th and 95th percentiles of the score. Since there is only one score at each depth per inversion type, the summary percentiles are calculated from values across all depths. For decision theoretic problems, the forecast with the highest expected score is usually preferred (Diks et al. 2011). However, there is also a notable outlier score for the helicopter system which will skew the average score for that system downwards. This is because the posterior probability for all three systems at the bottom end of the true conductor is very small. However, it is not that the helicopter system does worse at localizing the conductor; there is a narrow high probability region only slightly to the left, within half a conductivity decade (1 decade = 1 log10 unit of resistivity or conductivity) at the outlying depth of ∼80 m in Fig. 4(c). This problem with locality is a known issue with the logarithmic score, and is discussed in detail by Bröcker & Smith (2007). The negative of the logarithmic score is known as ignorance, and (near) zero prediction probability of the truth implies (high) infinite ignorance. This situation could easily occur as shown above, or for extremely resistive media and inductive EM methods, as these methods are not sensitive to highly resistive media. Barring the outlier, the helicopter inversion does slightly better up shallow, and the fixed-wing joint inversion scores better deeper.
In the rightmost column of Fig. 4(d) we show the Bayesian information gain, by calculating the KLD directly from the posterior and prior marginal samples at every depth. For this purpose, we use a covariate shift method (Sugiyama et al. 2008b), as detailed in Appendix D. Such methods are adept at quantifying a shift between samples from two probability densities, such as samples from the prior and the posterior, or when there are sudden changes in online streaming data (Chen et al. 2021). A recent use of sample based covariate shift methods for geostatistical transfer learning can be found in Hoffimann et al. (2021). The information gain focuses very minutely on the overlap (or the lack of it) between the posterior and prior probability densities. As shown in Appendix D, it is always non-negative, and zero only if the prior and posterior densities are identical. To first order, the information gain is small when posterior widths are large, and large when posterior widths are narrow, such as within the conductor between 50 and 80 m depth. This is an information theoretic counterpart of deterministic sensitivity,
as we know that TE mode inductive EM sources (Loseth & Ursin 2007) are sensitive to conductors. Examining in detail, we see that the information gain both above and below the conductor is small, but is slightly larger at shallow depths for all three inversion types. This is as we would expect, given signal-to-noise considerations at late times as well as conductive shielding effects in electric media. We would particularly like to draw attention to the fact that within the conducting body itself, unlike for the logarithmic score, the information gain remains large. This is due to the information gain not suffering from the aforementioned locality problem. In fact the helicopter system has a tight posterior distribution at ∼80 m depth within half a decade of the true value; accordingly, it has the highest information gain.
In the absence of ancillary information, no inversion/system would clearly outmatch the others. In all cases, from the marginal posterior probabilities of resistivity with depth, we would interpret the following. Starting from the top: resistive geology, between 2.7 and 0.8 log10 Ω m, followed by a conductor starting at ∼50 m depth. Owing to conductivity-thickness trade-offs the fixed-wing systems/inversions do not narrow down the conductor bottom well, while the helicopter estimate of the bottom is slightly too shallow. All three systems have high probability mass in the conductor between 50 and 80 m in the vicinity of 1 Ω m or 1 S m−1 (i.e. 0 in log10). While we could point to information gain within the conductor being higher for the helicopter system, contextually speaking, this gain from 2–3 to 4 bits is not significant. Each bit of information gain leads to reduction of half the prior probability mass (Pinkard & Waller 2022) or equivalently, doubles the concentration of probability mass in the posterior. Since 80 per cent of posterior conductivities are between 0.8 and −0.26 log10 Ω m for the fixed-wing system inversions at ∼70 m, halving the posterior probability mass does not add significantly to inferred knowledge of a conductor. For all three inversion types, at this depth we are able to bracket a 0 log10 Ω m conductor within less than half a decade, in a prior range spanning nearly 4 decades of resistivity. Underneath the conductor, all three inversions indicate a return to resistive geology in the 2.7 to 0.1 log10 Ω m range, with the fixed-wing/joint inversion indicating a slightly narrower high probability region. Finally, the box plots showing the 5th, 50th and 95th percentiles of information gain for each system and their large overlap, are shown at the bottom of the rightmost panel in Fig. 4(d). Again, there is only one value of information gain at each depth for every inversion type; consequently, the summary percentiles are calculated from values across all depths. As before, we see a large overlap and are able to confirm our earlier first order inspection of posterior quantiles with this analysis. This is useful to note since most geophysicists or geologists will not have computed information theoretic divergences readily available. Another useful measure of the overlap between probability densities is the Bhattacharyya distance and related coefficient (Bhattacharyya 1943), which has been used in geophysical
work (Subašić et al. 2019), though it does not have a straightforward Bayesian interpretation.
In conclusion, if it were for this particular synthetic model, the choice of system from a technical standpoint is largely equivalent. We can say this as the CI widths are similar noting the variations discussed above, and neither the scoring rule nor information gain indicates, without qualification, a superior system/inversion type.
4.2 Inversion details and nuisance sampling
All inversions converged to a rms value of 1. The probabilistic inversions for both kinds of fixed-wing inversion were run with 7 log-spaced parallel tempering chains, and for helicopter AEM inversion with 5, with a maximum annealing temperature of 2.5. A greater number of parallel chains are required for the fixed-wing inversions, as the inference problem with geometry nuisances is harder to sample. We had initially achieved stationarity well within 400 000 samples in the target McMC chain at T = 1. However, to sample the near-zero probability regions and establish stable score estimates that avoid infinite ignorance, we ran each inversion type for 1 000 000 samples, discarding the first fifth to preclude the possibility of biased inference. Within the legacy survey noise levels we have accumulated over the years, inverting for height has not made a significant difference to the conductivity model and we do not do so here either. We surmise this is because the helicopter AEM system studied here, which we have observed in operation, has a rigidly mounted Rx coil. The aerodynamically stable Tx–Rx frame and its height are known well enough for this not to be inverted for. For conventional fixed-wing inversions (Fig. 4a), in addition to the earth conductivity, three more parameters need to be sampled (Fig. 5): the receiver pitch, the horizontal Tx–Rx separation (labelled Tx–Rx hsep) and the vertical Tx–Rx separation (equivalently, we have kept fixed the Tx height, and inverted the Rx height, labelled zRx). We make observations of note:
In Fig. 5(a) the most probable nuisance model values and the truth do not coincide. Although other workers have encountered similar phenomena (e.g. Dettmer et al. 2015, most probable versus true values in their fig. 4), we decided to investigate further. We ran the Bayesian inversion for 200 000 more samples with the same noise realization as in Fig. 2, then did an independent run for 1 000 000 samples. The longer run and independent restarts with the same data as in Fig. 2 persistently produced histograms where the true nuisance values are in the tail region. Further, the posterior conductivities among these runs are virtually indistinguishable from Fig. 4(a). When compared with the prior extents (Fig. 5b), the posterior distributions do appear more generally in the vicinity of the true values. Given that the likelihood (Fig. 4) depends on the data noise, on running with a different random noise realization we did indeed observe coincidence of true geometry values with
high probability regions, resulting in minor differences with the posterior conductivities presented in Fig. 4(a).
We are extremely sensitive to the Tx–Rx separation, but receiver pitch and receiver height trade off near linearly. This intuitively makes sense: if the pitch decreases (antenna axis along flight line tilts upwards) this could be compensated by the antenna origin being translated closer to the ground. However, this implies that we cannot resolve both receiver height and pitch, and perhaps the information contained in one ought to be enough for the inverse problem, with a suitable rotation of the nuisance coordinate axes. Unfortunately, this rotation of the axes is data dependent and we need to do an initial sampling run to estimate a principal components rotation. However, we have found that it is indeed much more efficient to sample along such rotated parameter axes as shown by Yardim et al. (2006) for radio-refractivity inversion and Dosso & Dettmer (2011) for geoacoustic inversion. An efficient alternative for sampling nuisances could be pseudomarginal methods and their correlated variant (see Andrieu & Roberts 2009; Friedli et al. 2022, for details).
Though the nuisance prior bounds are based on what we would expect for errors from the onboard inertial measurement unit (IMU) sensor and variability within a flightline, only a very small portion of this prior volume is of posterior importance (i.e. a large part of the boxed region with dashed lines in Fig. 5(b) is empty model space). Finally, the posterior sampling surface is quite rugged (i.e. fat tails with sharp jagged dropoffs) as can be seen from the zoomed in crossplots and marginals for both pitch and zRx in Fig. 5(a).
Most real data McMC AEM inversions converge to stationarity well within 200 000 samples, but in order to draw robust conclusions from synthetics we have massively oversampled as described above. Parallel and high performance computing (HPC) considerations during sampling have been laid out in Appendix C.
For an amplitude only inversion (Fig. 4b) of the same data, we do not need to estimate the Rx pitch and the posterior nuisances are shown in Fig. 6. Immediately, we see that the fraction of prior volume required to solve the problem is greater and the posterior surface is less rugged, denoting that the posterior is easier to sample. While the deterministic estimate of the receiver height nuisance parameter is outside the prior range, the deterministic conductivity model, which is ultimately the earth feature of interest, is within the 90 per cent CI as can be seen from Fig. 4(b). Undoubtedly, tweaking the regularization and constraints for the deterministic nuisances can lead to ‘better’ estimates of conductivity. Since
we do not know the true earth model for real data scenarios, we have not opted for such tweaking to keep the synthetic exercise meaningful.
5 MENINDEE CALIBRATION LINE, NEW SOUTH WALES: COMPARISON WITH BOREHOLES AND INFORMATION GAIN
In the Broken Hill region of New South Wales, Australia, lie Menindee Lakes. Over one of these lakes we operate an AEM testing range. A 12-km-long flight line lies partly over arid ground and partly over a shallow ephemeral lake, co-incident with or very near boreholes with induction logs (Fig. 7). As will be shown later, downhole conductivities from these logs provide a useful comparison of inversion results with ground truth. This in turn allows us to assess within the limits of temporal (seasonal or climatic) variation, the accuracy with which different AEM systems image the subsurface. It is unusual for the same survey line to be systematically flown repeatedly by vendors (see Minsley et al. 2021a, for another example of overflown lines), especially in the presence of well-constrained geology and induction logs. Hence the Menindee test range has become a valuable proving ground for AEM technology. This holds true for testing mechanical features and electronic
Bayesian analysis of AEM uncertainty 1901
Figure 14. Data fits for 100 randomly selected posterior models at well BHGT14M1 for (a) conventional fixed-wing Bx and Bz joint inversion and (b) fixed-wing
amplitude only inversion and (c) helicopter dBz /dt inversion.
instrumentation of the AEM systems, as well as for inversion codes and modelling theory.
In Fig. 8, we show the results from probabilistic inversions of all helicopter EM soundings inverted independently along the test line. At each sounding, McMC was carried out for 400 000 samples on multiple parallelly tempered interacting chains, with the first 80 000 samples discarded in the burn-in phase. The first row from the top shows the mean square misfit or φd, with a reasonable fit to data given by the dashed line at φd = 1. The second, third and fourth rows from the top show the 5th, 50th and 95th percentile posterior conductivities with depth at every sounding. Similar to the synthetic studies shown in Fig. 4, wherever the percentiles show similar values, the CIs are narrow and hence imply greater posterior certainty. Conversely, a large spread from red to blue among the three percentile images indicates a broad CI and greater posterior uncertainty. In exactly the same format as just described, in Figs 9 and 10, we show the results for fixed-wing AEM data flown over the test line. The closest location to a well studied borehole which intersects a known near surface regional conductor has been marked in all three figures for closer inspection later in the text. First of all, it is apparent from Figs 8–10 that all three inversions show remarkably similar posterior uncertainties, with exactly the same priors on conductivity, using the same geometry constraints for fixed-wing data as in the synthetic studies, all the while using measured high-altitude noise levels for the test flights. From the northwest to the southeast, as we descend into the lake bed, the lake clearly shows up in all percentiles as a resistive structure relative to its surroundings. To the southeast, in the near surface, there appears to be a layer of clays which show up as conductive. It must be noted that the helicopter system was flown in 2015 and the fixed-wing system in 2017. Differences in the posterior conductivity percentiles between AEM systems, especially in the shallow tens of metres, could be due to differences in the subsurface water saturation in these years.
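The percentile rows described for Figs 8–10 can be sketched in a few lines of Python. This is only an illustration of how 5th, 50th and 95th percentiles and the resulting 90 per cent CI width might be computed from posterior samples at each depth; the function name `depth_percentiles` and the input layout (one list of log10 conductivity samples per depth) are assumptions, not the authors' plotting code.

```python
import statistics


def depth_percentiles(samples_by_depth):
    """Return (p5, p50, p95, width) for each depth's posterior samples.

    `samples_by_depth` is a list with one list of samples per depth.
    The width p95 - p5 is the 90 per cent credible interval width.
    """
    summary = []
    for samples in samples_by_depth:
        # n=20 gives 19 cut points at 5, 10, ..., 95 per cent.
        cuts = statistics.quantiles(samples, n=20, method="inclusive")
        p5, p50, p95 = cuts[0], cuts[9], cuts[18]
        summary.append((p5, p50, p95, p95 - p5))
    return summary


# Made-up samples: a broad marginal and a narrow one.
broad = [float(i) for i in range(101)]          # 0, 1, ..., 100
narrow = [50.0 + 0.01 * i for i in range(101)]  # 50.00, ..., 51.00
summary = depth_percentiles([broad, narrow])
```

A narrow width at a given depth corresponds to the similar-looking percentile images noted in the text, and a wide one to a large colour spread between them.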
A major point of difference between all three inversions is the presence of relatively certain, deep conductors shown by the joint inversion of fixed-wing data at −200 m relative to the Australian Height Datum (Fig. 9). Notably, these deep features are missing from Fig. 10 which uses the same input AEM data as the joint inversion. The presence of these features cannot be validated as there are no induction logs in their vicinity. Whether the helicopter system and the amplitude-only inversions fail to see these deep conductors, or whether they are artefacts, we are unable to say at this point. A more detailed examination of the fixed-wing nuisances is carried out in Appendix A.
5.1 Comparison with borehole conductivities and
mass moves towards conducting (e.g. near BHGT14M1). We look next in detail at borehole BHGT14M1, after examining the Bayesian information gain at all sounding locations for context.
Fig. 12(a) shows the Bayesian information gain for all surveys and inversions at all locations, calculated using the methods detailed in Appendix D. On average, the helicopter system shows a higher information gain nearer the surface, and the fixed-wing/joint inversion shows higher gain as we go deeper (Fig. 12b). All systems and inversions show increasing information gain in the near surface, where conductors are inferred. Keeping in mind the discussion in Section 4.1 on the synthetic example, we now examine the information gain (Fig. 12c) and resolving capability of all systems in the near surface at BHGT14M1 (Fig. 13), where there is a known strong conductor within the first 25 m, with well established knowledge of the aquifer system.
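To give a feel for information gain measured in bits, the following Python sketch estimates the Kullback-Leibler divergence of the posterior from the prior using samples at a single depth. It deliberately swaps in a simple histogram estimator; it is not the covariate shift density-ratio method of Appendix D used in this work. The function name `kl_bits_histogram`, the bin count and the sample values are all hypothetical.

```python
import math


def kl_bits_histogram(posterior, prior, lo, hi, nbins=20):
    """Histogram estimate of KL(posterior || prior) in bits on [lo, hi]."""

    def normalized_hist(samples):
        counts = [0] * nbins
        for x in samples:
            if lo <= x <= hi:
                k = min(int((x - lo) / (hi - lo) * nbins), nbins - 1)
                counts[k] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p = normalized_hist(posterior)
    q = normalized_hist(prior)
    kl = 0.0
    for p_k, q_k in zip(p, q):
        if p_k > 0.0:
            if q_k == 0.0:
                return math.inf  # posterior mass where the prior histogram is empty
            kl += p_k * math.log2(p_k / q_k)
    return kl


# Made-up example: a prior spread evenly over 4 decades and a posterior
# confined to half of that range should give a gain of 1 bit.
prior_samples = [0.5, 1.5, 2.5, 3.5] * 250
posterior_samples = [0.5, 1.5] * 500
gain_bits = kl_bits_histogram(posterior_samples, prior_samples, lo=0.0, hi=4.0, nbins=4)
```

Histogram estimates are sensitive to the bin width and to empty bins, which is one reason a direct density-ratio estimator may be preferred in practice.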
The inverted conductivities as well as acquisition geometry nuisances are shown in this section, both for the conventional fixed-wing joint
inversion (Fig. A1) and for the same input AEM data, the amplitude only inversion (Fig. A2). The fact that the inverted nuisance values
(blue lines and shaded blue regions) do not always overlap with the measured IMU-provided values (orange) is not surprising, as the IMU
readings could be inaccurate, or it is possible that the inversion has found some trade-off as was shown in the synthetic examples. What is
important however, is that the inversion be parametrized to allow for geometry nuisance inference, as in their absence, the data residual would
be propagated incorrectly into the inverted conductivities.
APPENDIX B: TRANS-DIMENSIONAL ALGORITHM WITH NUISANCE SAMPLING
As shown by the pseudocode provided in Algorithm B, one step of our algorithm encapsulates a reversible jump or trans-dimensional step
(starting at Line 4), followed by a nuisance sampling step (starting at Line 11), contained within a parallel tempering loop (Lines 3-20),
followed by a parallel tempering swap (Lines 21–28).
Details of the trans-dimensional birth and death moves for resistivity models are exactly the same as described for stationary Gaussian
processes (Ray & Myer 2019; Ray 2021). The nuisance sampler uses ordinary Metropolis–Hastings (Metropolis et al. 1953; Hastings 1970)
for the fixed-wing geometry nuisance parameters which are nnuisances in number.
Parallel tempering is used to exchange information between interacting McMC chains to escape local misfit minima (i.e. likelihood
maxima). Temperatures or models are exchanged at the end of each McMC step using the following Metropolis–Hastings criterion (Swendsen
& Wang 1987; Geyer 1991; Earl & Deem 2005; Dettmer et al. 2012; Sambridge 2013):
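$$\alpha_\mathrm{swap}(a, b) = \min\left[1, \left(\frac{L(\theta_{m_b})}{L(\theta_{m_a})}\right)^{1/T_a} \left(\frac{L(\theta_{m_a})}{L(\theta_{m_b})}\right)^{1/T_b}\right]. \qquad (B1)$$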
Figure A3. Posterior conductivities obtained after forward modelling the well log conductivities (yellow line) at borehole BHGT14M1 at Menindee lakes for
(a) joint Bx , Bz inversion, (b) amplitude only B field and (c) helicopter dBz /dt data. As in the main text, the dashed lines represent the 5, 50 and 95 percent
posterior percentiles of conductivity. In each case, the posterior median has been used as the upscaled log for qualitative comparison in Fig. 13.
For a description of why swapping models for escaping local likelihood maxima using (B1) is effective, see section 3.2 of Blatter et al. (2018).
For computational efficiency, temperatures are exchanged during inter-process communication to achieve the exact same effect as swapping
models. No likelihood computations are required in the calculation of the swap probability in (B1), as they have already been evaluated in the
preceding j loop on Line 3 of Algorithm B.
All Markov chains were run at log-spaced temperatures between 1 and 2.5. Details of setting a temperature ladder can be found in Dettmer
et al. (2012) and Ray et al. (2013). The helicopter AEM inversions required 5 temperatures and no nuisance sampling, whereas the fixed-wing inversions used 7 temperatures with nuisance sampling. Larger numbers of temperatures are required to sample rugged likelihoods.
Posterior inference is carried out only from models that are at T = 1 after sorting the chains by temperature.
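The temperature ladder and the swap criterion (B1) can be sketched as follows in Python. This is an illustrative reading of the text rather than the authors' Julia implementation: the helper names `log_spaced_ladder` and `swap_probability` are hypothetical, and a geometric spacing between 1 and the maximum temperature is assumed to be what "log-spaced" means here.

```python
import math


def log_spaced_ladder(n_temps, t_max=2.5):
    """Temperatures spaced evenly in log(T) between 1 and t_max inclusive."""
    if n_temps == 1:
        return [1.0]
    return [t_max ** (k / (n_temps - 1)) for k in range(n_temps)]


def swap_probability(log_like_a, log_like_b, temp_a, temp_b):
    """Swap acceptance probability in the form of eq. (B1), in log space.

    log(alpha) = (log L_b - log L_a) * (1/T_a - 1/T_b), capped at 0.
    The likelihoods are reused from the preceding sampling step, so no
    new forward evaluation is needed.
    """
    log_alpha = (log_like_b - log_like_a) * (1.0 / temp_a - 1.0 / temp_b)
    return 1.0 if log_alpha >= 0.0 else math.exp(log_alpha)


ladder = log_spaced_ladder(7)  # 7 chains, as for the fixed-wing inversions
# Made-up log-likelihoods: the hotter chain holds the better-fitting model.
alpha_good = swap_probability(-10.0, -5.0, ladder[0], ladder[-1])
# The colder chain already holds the better-fitting model.
alpha_poor = swap_probability(-5.0, -10.0, ladder[0], ladder[-1])
```

The swap is always accepted when it moves the better-fitting model to the colder chain, which is how well-fitting models found at high temperature reach the T = 1 chain.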
APPENDIX C: HIGH PERFORMANCE COMPUTING CONSIDERATIONS
It can be seen from Algorithm B that the nuisance sampler does require likelihood evaluation (i.e. a forward call). Thus sampling the fixed-wing
system posterior requires twice as many forward calls as it does to sample the helicopter system posterior. For the real data examples, one
sampling step required 70–80 ms for the fixed-wing system, and 40–45 ms for the helicopter system. This evaluation time is dominated by
the forward evaluation for the 52 layer model. This is fortunate since, if each McMC sample took less time than a few ms, the overhead in
inter-process communication per McMC step would make the parallel computation inefficient. Parallel tempering is not an embarrassingly
parallel algorithm, owing to the requirement that information be exchanged between interacting chains. This can be seen from Algorithm B
on Line 3, where an independent McMC sampling step is made at each temperature, with the exchange of temperatures occurring on Line
21 of the algorithm. Care must be taken in the computer implementation of this exchange step to transfer the bare minimum information
possible, in order to make the inter-process communication efficient.
The real data examples required 200 000 McMC samples for each chain in parallel tempering, though we extended each sampling run to
400 000 samples. For each sounding, 200 000 samples required approximately 4.5 hr for the fixed-wing system on 7 + 1 CPUs, and 2.5 hr
for the helicopter system on 5 + 1 CPUs. The +1 indicates the manager process for inter-CPU communication and each remaining CPU was
assigned to McMC computation at a specified temperature. A significant improvement has been made in the HPC implementation of parallel
tempering in this work compared to our earlier work in Blatter et al. (2018), which was based on the parallel tempering implementation of
Ray et al. (2013). As noted by Blatter et al. (2018), one single-moment helicopter AEM sounding required 5 d to invert probabilistically. In
our current work, soundings were batched in parallel such that 120 soundings for the fixed-wing system, and 160 soundings for the helicopter
system were carried out at the same time using 960 CPUs. To be explicit, an entire batch of 160 soundings was probabilistically inverted
within 2.5 hr for the helicopter system using 960 CPUs. The full set of 720 helicopter soundings over the Menindee test range required five
batches and a total of 12.5 hr. If more CPUs are made available, then a greater number of soundings can be inverted in parallel in a batch, and
fewer batches will be required in total. Similar calculations can be made for the fixed-wing system. On average, the fixed-wing
posterior sampling results require twice as much time with the same computational resources, due to nuisance sampling needing an additional
forward evaluation—though this can perhaps be avoided by reusing precomputed Hankel transform evaluations (future work).
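The batch arithmetic quoted above can be reproduced with a few lines of Python. This is only a bookkeeping sketch: the function names are hypothetical, and the per-step times of 42.5 ms and 75 ms are assumed mid-points of the quoted 40–45 ms and 70–80 ms ranges.

```python
import math

def batch_plan(n_soundings, n_temps, total_cpus, hours_per_batch):
    """Return (soundings per batch, number of batches, total wall time in hr)."""
    cpus_per_sounding = n_temps + 1  # one CPU per temperature plus one manager
    per_batch = total_cpus // cpus_per_sounding
    n_batches = math.ceil(n_soundings / per_batch)
    return per_batch, n_batches, n_batches * hours_per_batch

def sampling_hours(n_samples, ms_per_step):
    """Wall time in hours for n_samples sequential McMC steps, ignoring overheads."""
    return n_samples * ms_per_step / 1000.0 / 3600.0

# Helicopter system: 720 soundings, 5 temperatures, 960 CPUs, 2.5 hr per batch
helicopter = batch_plan(720, 5, 960, 2.5)

# Fixed-wing system: 7 temperatures on the same 960 CPUs
fixed_wing_per_batch = 960 // (7 + 1)

# Consistency check of per-step timings against the quoted run times
helicopter_hours = sampling_hours(200_000, 42.5)  # assumed mid-point of 40-45 ms
fixed_wing_hours = sampling_hours(200_000, 75.0)  # assumed mid-point of 70-80 ms
```

Under these assumptions the per-step timings account for roughly 2.4 hr and 4.2 hr of the quoted 2.5 hr and 4.5 hr per 200 000 samples, with the remainder plausibly attributable to communication and other overheads.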
The Bayesian inversion code, including the forward modeller, is written purely in Julia (v1.6, tested on v1.7–1.8). The deterministic inversion
code, including the forward modeller, is written in modern C++. The parallel computation in this study used 48-core, 3.2 GHz Intel Xeon
Cascade Lake nodes in the NCI’s Gadi cluster.
to receive lower scores than the true forecast density, and it is strictly proper, if the maximum expected score attained is unique. While they
have not been used much in near surface geophysics, we note recent applications in Seillé & Visser (2020) for selecting the best likelihood
function for Bayesian inversion, and in Friedli et al. (2022) for evaluating the performance of different Monte Carlo methods. We shall now
prove that the logarithmic score, which dates back as far as Good (1952), is related to the KLD, by first showing that the KLD is always
greater than or equal to zero. As (D2) provides the general definition of $D_{\mathrm{KL}}(p\|q)$, not just for priors and posteriors, we can write for any two
valid probability densities p and q:
$$-D_{\mathrm{KL}}(p\|q) = \mathbb{E}_{x \sim p(x)}\left[\log \frac{q(x)}{p(x)}\right]. \tag{D3}$$
Using the inequality $1 + \log u \le u$, with exact equality at $u = 1$, we can substitute $u \to q(x)/p(x)$ and take expectations with respect to p. We can thus write
$$1 + \mathbb{E}_{x \sim p(x)}\left[\log \frac{q(x)}{p(x)}\right] \le \mathbb{E}_{x \sim p(x)}\left[\frac{q(x)}{p(x)}\right], \tag{D4}$$
The integral on the left hand side of (D11) is the expectation of the incorrect forecast score over all possible values x ∼ p(x) where p(x) is
the true density. This quantity is always less than or equal to the average of the true forecast score, which is the integral on the right hand
side of (D11). This shows that the requirement of propriety is satisfied by the logarithmic scoring rule. Since the maximum expected forecast
score is reached uniquely for q(x) = p(x) ∀x, the logarithmic scoring rule is also strictly proper.
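The propriety of the logarithmic score can also be checked numerically. The sketch below uses discrete densities on ten outcomes, drawn at random, as an assumed stand-in for the continuous densities p and q; the function names are hypothetical. It verifies that no forecast q attains a higher expected score than the true density p, and that the shortfall equals the KLD.

```python
import numpy as np

def expected_log_score(p, q):
    """Expectation under the true density p of the logarithmic score log q(x)."""
    return float(np.sum(p * np.log(q)))

def kld(p, q):
    """Kullback-Leibler divergence D_KL(p||q) for discrete densities."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(42)
p = rng.dirichlet(np.ones(10))                     # "true" density (illustrative)
forecasts = rng.dirichlet(np.ones(10), size=1000)  # candidate forecast densities

true_score = expected_log_score(p, p)
scores = np.array([expected_log_score(p, q) for q in forecasts])
divergences = np.array([kld(p, q) for q in forecasts])

# Every incorrect forecast scores no better than the truth, and the gap is the KLD
all_proper = bool(np.all(scores <= true_score))
gap_is_kld = bool(np.allclose(true_score - scores, divergences))
```

The identity between the score gap and the KLD is what ties the logarithmic score to the non-negativity argument started in (D3) and (D4).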
the range of the posterior samples. This is important to keep in mind as low probability regions will score very poorly (or even have infinite
ignorance = −log 0) and can bias the average score for a particular posterior density. While other non-local scores exist, they can be relatively
insensitive to the width of probability modes (Smith et al. 2015) and are not considered here. In sum, both logarithmic scores and information
gain are useful analysis tools derived from appropriate usage of the KLD. However, information gain can be estimated when true parameter
values are not exactly known (i.e. most of the earth) for real data. Criteria such as mean absolute error could be used to quantify closeness
to the truth, but again this is not as generally useful as information gain for similar reasons. Further, information gain is not sensitive to true
values falling adjacent to high posterior probability regions—a phenomenon which is common in non-linear geophysical inverse problems.
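The sensitivity of the logarithmic score to true values outside the sampled range can be seen in a small Python sketch. The histogram density estimate used here is an assumption made for illustration, as are the function name and the synthetic Gaussian "posterior" samples; it is not the estimator used in this work.

```python
import numpy as np

def ignorance(samples, true_value, bins=50):
    """Ignorance score -log p(true_value), with p estimated from posterior
    samples by a normalized histogram (assumed estimator)."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    if true_value < edges[0] or true_value > edges[-1]:
        return np.inf  # zero estimated density outside the sample range
    # Bin index containing true_value; the last bin includes its right edge
    k = min(np.searchsorted(edges, true_value, side="right") - 1, bins - 1)
    if density[k] == 0.0:
        return np.inf  # empty bin inside the range
    return float(-np.log(density[k]))

rng = np.random.default_rng(1)
wide_posterior = rng.normal(0.0, 1.0, 10_000)
narrow_posterior = rng.normal(0.0, 0.1, 10_000)

score_wide = ignorance(wide_posterior, 0.0)      # truth at the mode, broad posterior
score_narrow = ignorance(narrow_posterior, 0.0)  # truth at the mode, sharp posterior
score_outside = ignorance(wide_posterior, 10.0)  # truth far outside the samples
```

A single sounding with the true value outside the sampled range makes the average ignorance infinite, which is the bias cautioned against above.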
