Uncertainty Quantification and Predictive Computational Science
A Foundation for Physical Scientists and Engineers
Ryan G. McClarren
Ryan G. McClarren
University of Notre Dame
Department of Aerospace
and Mechanical Engineering
Notre Dame, IN, USA
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Beatrix, Flannery, Lowry, and Cormac for
the joyous uncertainty they add to my life.
Preface
This book began as a collection of notes from a class on “predictive science” that I
started teaching in 2009 at Texas A&M University. Initially, the course was in the
statistics department and taught a group of engineers and statisticians a common
body of knowledge around using simulation to make predictions about reality. That
initial course had sections on code verification, model validation, and uncertainty
quantification (UQ). Each time I taught the course, the UQ section expanded, and
eventually the UQ portion became the entirety of the course. This was in response
to student feedback and the fact that the research and practice of UQ expanded so
much in the intervening years. The content in this work represents what I feel to be
a range of topics that gives engineers and physical scientists the crucial knowledge
of uncertainty quantification and predictive science. I have tried to include as many
examples as possible to give the reader insight on how the methods in the book
behave, as well as guidance in applying the methods to other problems.
This book is geared toward readers who are numerically solving mathematical
models, often in the form of partial differential equations, that have uncertainties
due to the distribution of the inputs, discretization and solver error, and model error.
The topics that are covered give the reader the ability, and the motivating reasons,
to analyze how uncertainties affect computer simulation and ultimately predictions.
A thorough discussion of the landscape and overall setting of uncertainty quantifi-
cation in the context of simulation-based prediction is given in Chap. 1.
Throughout most of the text, the advection-diffusion-reaction equation is used
as a test bed for different UQ methods. This equation in one of its many forms can
be found in most engineering and science disciplines so that, I hope, most readers
will find examples based on this equation relatable to his or her work. The ideas
behind uncertainty quantification can be applied to almost any problem, but having
examples that can be directly connected to the reader’s experience is more powerful.
In my experience many students do not have a deep enough probability and
statistics background to digest all the techniques that are used in UQ. For this reason
Part I of this work gives the reader the necessary background in probability and
statistics. This goes beyond the basic definitions to include topics such as copulas,
Karhunen-Loève expansions, tail dependence, and rejection sampling.
This book includes coverage, in Part II, of the topic of local sensitivity analysis
because I feel that is a good place for a novice to begin to understand the overall
topic of UQ, and local sensitivities can be useful in reducing the input parameter
space. The coverage of local sensitivity goes beyond derivative approximations and
estimation of output variance. Using regression techniques, including regularized
regression, to estimate first- and second-order sensitivities is included. The topic
of adjoint equations as a means to estimate sensitivities can be found in Chap. 6,
wherein a concise procedure for deriving adjoints for nonlinear, time-dependent
problems is presented.
Part III of this work covers what many would call conventional UQ, that is, the
estimation of output uncertainty from parametric, or input, uncertainties. Therein the
topics of Monte Carlo, reliability methods, and stochastic projection are covered.
The chapter on Monte Carlo, Chap. 7, goes beyond simple random sampling to
include Latin hypercube designs (and variants) as well as quasi-Monte Carlo
techniques, and it compares all the sampling-based methods discussed. Reliability
methods are presented in Chap. 8 as an approach to estimate properties of the output
using a small number of simulations.
The exposition of stochastic projection and collocation methods, sometimes
called polynomial chaos techniques, in Chap. 9 is detailed and gives concrete
examples of expansions in several different orthogonal polynomials, as well as
details of the quadrature sets needed. In that chapter I take the liberty of defining
beta and gamma random variables that are slightly different than the standard
definitions to make the expansions much easier to calculate. Chapter 9 also
includes discussions of sparse quadrature for multidimensional integration, the use
of regularized regression to estimate expansion coefficients, and the stochastic finite
element/projection method. The coverage in Chap. 9 is complete and addresses the
common complaint from students that polynomial chaos is difficult to apply because
of the different definitions of orthogonal polynomials and quadratures. One small
downside to this completeness is that there are over 100 numbered equations in
Chap. 9.
Part IV demonstrates how surrogate models (sometimes called emulators) can be
used to fuse experimental and simulation data to make predictions. Chapter 10 intro-
duces Gaussian process regression as a technique to construct surrogate models. The
discussion of calibration and predictive models in Chap. 11 follows that of Kennedy
and O’Hagan for the predictive model form but does include the extension to a
hierarchy of model fidelities. Chapter 11 also provides the requisite background in
Markov chain Monte Carlo and the Metropolis-Hastings algorithm to fit predictive
models. The final chapter, Chap. 12, is devoted to handling uncertainties that do not
have a distributional nature. This chapter shows how interval uncertainties can be
treated and how they affect predictions.
The material in this book can be covered in a single course on uncertainty
quantification. I assume knowledge of the standard mathematical content covered
in the engineering/physical science undergraduate program. Some knowledge of
partial differential equations is assumed, and any topics that I believe would be new
to the reader are introduced gently. The most challenging mathematics is probably
in Chaps. 9, 10, and 11. I have attempted to make these topics as uncomplicated as
possible without making the techniques seem like opaque, black boxes.
Finally, a note about style. I have tried to make the text of this work not be
burdened by an overly pedantic style. I hope that the style does not veer into the
realm of being too conversational. My intent is to make the reader feel as though
we are discussing the material face to face. Of course in discussion, I often make
allusions to topics that are far afield of science and engineering. I have tried to
minimize the number of times the reader will be sent to the nearest search engine to
look up something, but at the same time, I hope some readers learn about more than
just UQ.
Many thanks are in order for making this book possible. I would like to thank
Denise Penrose at Springer who managed a project that was long in the making and
shepherded drafts of the manuscript through the review process. The feedback of
the anonymous reviewers, as well as Martin Frank and Jonas Kusch from Karlsruhe
Institute of Technology (KIT), helped improve the work. My engineering colleagues
during my time at Texas A&M, Marvin Adams, Jim Morel, and Jean Ragusa, were
especially influential in the development of this book. I am also grateful to Bani
Mallick and Derek Bingham for many helpful discussions. I would like to thank KIT
and RWTH Aachen University for hosting me at points during the preparation of
this manuscript, including giving a short course based on Chap. 9 for the AICES EU
Regional School in 2016. Finally, this work would not have been possible without
the support and help of my wife Katie.
Part I Fundamentals
1 Introduction to Uncertainty Quantification and Predictive Science  3
  1.1 The Limits of Prediction  5
  1.2 Verification and Validation  6
    1.2.1 Code and Solution Verification  6
    1.2.2 Validation  7
    1.2.3 Experiments for Validation  8
    1.2.4 Simulation Versus Experiment  9
    1.2.5 Small-Scale Experiments  9
  1.3 What Is Uncertainty Quantification?  10
  1.4 Selecting Quantities of Interest (QoIs)  12
  1.5 Types of Uncertainties  14
    1.5.1 Aleatory Uncertainties  14
    1.5.2 Epistemic Uncertainties  14
  1.6 Physics-Based Uncertainty Quantification  15
  1.7 From Simulation to Prediction  16
    1.7.1 Best Estimate Plus Uncertainty  16
    1.7.2 Quantification of Margins and Uncertainties  17
    1.7.3 Optimization Under Uncertainty  17
    1.7.4 Data-Driven Experimental Design  17
2 Probability and Statistics Preliminaries  19
  2.1 Random Variables  19
    2.1.1 Probability Density and Cumulative Distribution Functions  19
    2.1.2 Discrete Random Variables  23
  2.2 Expected Value  25
    2.2.1 Median and Mode  26
    2.2.2 Variance  26
    2.2.3 Skewness  26
    2.2.4 Kurtosis  27
    2.2.5 Estimating Moments from Samples  29
  2.3 Multivariate Distributions  31
  2.4 Stochastic Processes  35
    2.4.1 Gaussian Processes  36
  2.5 Sampling a Random Variable  37
    2.5.1 Sampling a Multivariate Normal  41
    2.5.2 Sampling a Gaussian Process  42
  2.6 Rejection Sampling  42
  2.7 Bayesian Statistics  44
  2.8 Exercises  49
3 Input Parameter Distributions  53
  3.1 Dependence Between Variables  54
    3.1.1 Pearson Correlation  54
    3.1.2 Spearman Rank Correlation  55
    3.1.3 Kendall’s Tau  56
    3.1.4 Tail Dependence  58
  3.2 Copulas  59
    3.2.1 Normal Copula  60
    3.2.2 t-Copula  61
    3.2.3 Fréchet Copulas  64
    3.2.4 Archimedean Copulas  64
    3.2.5 Sampling from Bivariate Copulas  71
  3.3 Multivariate Copulas  72
    3.3.1 Sampling Multivariate Archimedean Copulas  73
  3.4 Random Variable Reduction: The Singular Value Decomposition  76
    3.4.1 Approximate Data Matrix  78
    3.4.2 Using the SVD to Reduce the Number of Random Variables  78
  3.5 The Karhunen-Loève Expansion  83
    3.5.1 Truncated Karhunen-Loève Expansion  84
  3.6 Choosing Input Parameter Distributions  87
    3.6.1 Choosing Joint Distributions  89
    3.6.2 Distribution Choice as a Source of Epistemic Uncertainty  89
  3.7 Notes and References  90
  3.8 Exercises  90
References  339
Index  343
Part I
Fundamentals
Part I of this text gives the background in scientific computing, probability, and
statistics that will be the baseline for the development of uncertainty quantification
techniques. The first chapter deals with the question of how and why we need to
understand the uncertainty in computer simulation results and sets the stage for
the type of problems that we will solve. The second and third chapters in this part
discuss how we will use probability and statistics, with the last chapter giving in-
depth discussion of the more advanced statistical tools and concepts necessary to
perform uncertainty analyses.
Chapter 1
Introduction to Uncertainty
Quantification and Predictive Science
You shall know the truth, and the truth shall make you odd.
—Flannery O’Connor
Since I was a child, I was enthralled with the idea of using a computer to solve
problems that I could not with pencil and paper. There is a good chance if
you are reading this that you have had a similar experience with the augmented
problem-solving ability that computers allow. Most people in computational science
have, at one point or another, been frustrated with the limited applicability of the
typical toolbox used to solve partial differential equations (e.g., integral transforms,
eigenfunction expansions, etc.). The beauty of computer simulation is that any
problem can be solved provided you can cast the continuum equations in terms
of finite quantities and you have enough computer horsepower at your disposal.
Beyond the fact that computation allows the solution of problems that are
intractable by other means, simulation also allows us to probe areas that experi-
mental measurements cannot. No experiment could give you the temperature profile
at every point on the surface of a space reentry vehicle or the distribution of neutrons
in a nuclear reactor. Solving the equations on a computer gives you this information
at the scale one desires and can give insight into the mechanism of a phenomenon
in ways that experiments can only suggest.
The ability to show what is going on in an experiment can be extrapolated to
make a prediction. It is reasonable to suggest that a computational simulation could
tell a researcher what will happen in an experiment that has yet to be performed.
Such a request occurs often in terms of design, asking how a new system will
perform when it is built. Typically, this exercise uses computation to rule out
certain designs, and the candidate designs that pass the computational test are then
tested in small scale experiments, before production of the new system takes place.
Eliminating some designs will cut down on the possible number of prototypes that
need to be built and tested, leading to significant cost and time savings. Using
computation in this way is entirely justifiable and reasonable, especially when
there is operation history and previous experimental results for systems that are
“nearby” the new candidate design. Take the example of an airplane. From my
1.2.2 Validation
1 This is not just a handwaving argument. There is no unified theory of all the forces in the universe,
i.e., we have not uncovered the equations that underlie the universe at all scales. Therefore, any
single mathematical model will not be accurate for every problem.
Even a fledgling science student knows that the lynchpin of the scientific process
is the use of experiments to support hypotheses or to falsify them. Supporting the
theory that a given mathematical model describes a phenomenon is no different. The
problem is that the process of comparing experiments to numerical results is not as
simple as computing a number and then seeing if it agrees with the experimental
measurement.
Unfortunately, experimental data is often lacking or impossible to gather. This
predicament is not uncommon. The problem of geologic disposal of nuclear waste
is a prime example. We can model the behavior of a repository for nuclear waste
using geology, hydrology, and nuclear engineering considerations in an attempt
to say whether the waste will contaminate the groundwater, but we cannot do an
experiment unless we want to wait 10,000 years (!) for the result of the experiment.
In such a situation, often the best we can do is to forthrightly state the assumptions
in our model and point by point justify each of these assumptions.
For the most part, computer simulations are guilty until proven innocent, in that the
burden of proving that a simulation represents reality lies in the hands of the one
doing the simulation. On the other hand, an experimental result is often widely
accepted as being an accurate picture of reality. Few will question whether the
team who completed the experiment properly characterized and accounted for all
sources of error. Paraphrasing Roache (1998), the state of play is such that nobody
believes the result of a simulation, except the person who performed the simulation,
and everybody believes the result of an experiment, except the person who ran the
experiment.
The outlook of the experimenter is often proper, that is, it is naive to assume
that the result of a single experiment is the final word on a specific phenomenon
or system. This should also be the outlook of the computational scientist that is
attempting to validate a particular model: one number from one experiment should
not make or break a model.
order model. Furthermore, the types of input uncertainties affect the interpretation
of the prediction.
The process and science of UQ is more than just putting error bars on the
simulation. It requires inquisitiveness to ask questions about the impact of results,
physical and engineering intuition to know how to interpret results, and the humility
to understand that not all questions can be answered with certainty.
While this work focuses primarily on using probability theory to estimate
uncertainties, there are other mathematical approaches that we do not cover.
These include fuzzy logic and worst-case analysis. Halpern (2017) discusses these
alternate approaches.
∂u/∂t + v · ∇u = ∇ · ω∇u + R(u),   r ∈ V, t > 0.   (1.1)
with boundary and initial conditions given by
In the situation where Eq. (1.1) is an adequate model for our physical system, we
might be interested in the following quantities:
• The maximum value of u inside a given time range [a, b]:
    max over r ∈ V and t ∈ [a, b] of u(r, t).
• The average value over a particular region of space, D, and time range [a, b]:
    (1/(b − a)) (1/|D|) ∫_D dr ∫_a^b dt u(r, t).
• The outflow of u from the system over a given time range [a, b]:
    ∫_{∂V} dA ∫_a^b dt (v · n − n · ω∇) u(r, t).
Here, s(u) is a function that maps the output of u(r, t) to a scalar, such as the max
function we saw above, and w(u, r, t) is a weight function. As an example, if the
QoI is the reaction rate over a range of time, then s(u) = 0 and
w(u, r, t) = R(u) if t ∈ [a, b], and w(u, r, t) = 0 otherwise.
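As a concrete illustration of evaluating QoIs, the sketch below computes the maximum and the space-time average for a hypothetical solution field u(x, t) on a 1-D grid. The field, grid, and variable names are invented for illustration; they stand in for the output of an actual PDE solver.

```python
import numpy as np

# Hypothetical 1-D solution u(x, t) standing in for a PDE solver's output;
# the field and grid below are made up for illustration.
x = np.linspace(0.0, 1.0, 201)        # spatial points in D = [0, 1]
t = np.linspace(0.0, 2.0, 101)        # times in [a, b] = [0, 2]
X, T = np.meshgrid(x, t, indexing="ij")
u = np.exp(-T) * np.sin(np.pi * X)    # made-up solution field, shape (201, 101)

# QoI 1: maximum of u over space and the time range
q_max = u.max()

# QoI 2: average of u over D and [a, b] via trapezoid quadrature,
#   (1/(b - a)) (1/|D|) * double integral of u over D and [a, b]
dx = x[1] - x[0]
dt = t[1] - t[0]
space_int = np.sum(0.5 * (u[:-1, :] + u[1:, :]) * dx, axis=0)  # over D at each t
total = np.sum(0.5 * (space_int[:-1] + space_int[1:]) * dt)    # then over [a, b]
q_avg = total / ((t[-1] - t[0]) * (x[-1] - x[0]))

print(q_max, q_avg)
```

For this separable made-up field the average can be checked in closed form, which is a useful habit before trusting a QoI routine on real solver output.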
1.5 Types of Uncertainties
There are two main classes of uncertainties in a problem. These are not necessarily
two distinct classes as some uncertainties could be classified into either category.
The nature of the uncertainty does impact how we want to treat the results of the
analysis, as we will demonstrate. Further discussion of these concepts can be found
in Der Kiureghian and Ditlevsen (2009).
1.5.1 Aleatory Uncertainties
Aleatory uncertainties come from the inherent randomness of a system. The term
derives from the Latin aleator for dice player, and this provides a good mental
model for these uncertainties. If we consider every replicate of an experiment or
system fielded, there will be slight differences due to issues such as manufacturing
tolerances, ambient conditions (e.g., weather), and other randomness.
A property of aleatory uncertainty is that the randomness can be described by
a distribution. For example, given a process that manufactures a part of the system
of interest, there will be a distribution of sizes of the component. By looking at
realizations from the manufacturing process, one could fit a distribution for the size
of the component.
Another example of an aleatory uncertainty would be the position of aggregate
(i.e., the rocks) in concrete. The distribution of the position and the size of the rocks
in the concrete will change from concrete sample to concrete sample, and one could
obtain a distribution for the position, shape, size, etc. for the aggregate.
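The moment-matching step described above can be sketched in a few lines. The "measured" component sizes here are synthetic stand-ins for real inspection data, and fitting a normal by sample moments is only one of several reasonable choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are measured sizes (in mm) of a manufactured component;
# in practice they would come from inspecting real parts.
sizes = rng.normal(loc=10.0, scale=0.05, size=500)

# Fit a normal distribution to the aleatory variability by matching moments.
mu_hat = sizes.mean()
sigma_hat = sizes.std(ddof=1)   # unbiased sample standard deviation

print(f"fitted size distribution: N({mu_hat:.4f}, {sigma_hat:.4f}^2)")
```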
1.5.2 Epistemic Uncertainties
Epistemic uncertainties arise from the lack of knowledge about a system. Oftentimes,
these uncertainties are due to an approximate model for the system, but they
can also arise from numerical error. In both cases, errors are likely made in the
approximations; we do not know how large those errors are, and those errors are
not described by a probability distribution. In many cases the best we
can do is bound the uncertainty, but then we are dealing with intervals and not
probabilities.
The epistemic uncertainty could arise from approximations in the analysis of
a system. For instance, when an analyst prescribes a distribution for an aleatoric
uncertainty based on a set of samples from the distribution, an error likely arises.
This uncertainty arises from a lack of knowledge of the true distribution of that
input.
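A quick numerical experiment, with invented numbers, illustrates this point: if each analysis fits a mean from only 20 samples, the fitted value itself scatters from replicate to replicate. That scatter is an epistemic uncertainty about the distribution, separate from the aleatory variability of the quantity itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "analysis" sees only 20 samples of a quantity whose true mean is 1.0
# and true standard deviation is 0.2 (values chosen for illustration).
# The fitted mean differs from replicate to replicate: a lack-of-knowledge
# (epistemic) error on top of the aleatory variability itself.
fitted_means = [rng.normal(1.0, 0.2, size=20).mean() for _ in range(1000)]

spread = np.std(fitted_means)   # scatter of the estimate, about 0.2/sqrt(20)
print(spread)
```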
1.6 Physics-Based Uncertainty Quantification
example, the experiments will have uncertainties in the measurement, the theoretical
models have parameters that will be uncertain, such as a gas constant, and the
simulations will also have uncertainties. The sum total of the uncertainties in these
components is likely to be much smaller than the number of parameters in the table.
Therefore, the true dimension of the uncertainty is not based on the equation of state
table, but on the physics behind the table.
This is an example of physics-based uncertainty quantification, and it is an
important illustration of the power that knowledge of the simulation and the
processes behind it are useful to the uncertainty quantification practitioner. There are
also many other ways that domain expertise can inform a UQ study. With knowledge
of the properties of the inputs and QoIs, the UQ process can be tailored to the
situation and be more efficient and more accurate. For instance, if a parameter is
known to be strictly positive, that will influence the type of distribution it can be.
Also, if a QoI cannot be larger than a given amount, the UQ procedure should respect
that.
These examples of physics-based uncertainty quantification indicate that the
expertise that the scientist has cannot be forgotten when performing a UQ study,
or, to put it the other way, the UQ expert is most effective when expert knowledge is
combined with domain expertise from a scientist or engineer. Furthermore, this type
of domain knowledge is not limited to physics; one could easily speak of chemistry-
based or biology-based UQ or any number of other technical fields.
1.7 From Simulation to Prediction
Given the results of a UQ study, that is, knowledge of the QoI and its uncertainty, the
next question is what one does with that information. There are several scenarios
that serve as the bridge between understanding of parametric uncertainties and the
application of that knowledge to making a prediction. As a way to highlight how
this might be used, we will detail some examples of predictive science in action.
1.7.1 Best Estimate Plus Uncertainty
The term best estimate plus uncertainty is used by regulators in nuclear reactor
certification around the world. The term refers to the use of simulation codes and
models that have been demonstrated to be applicable to the system and conditions
under question. The values of the QoI (typically the probability of failure) are
quoted at the most likely values of the uncertain parameters, this is the best
estimate part of the equation, and then a confidence interval around that estimate,
the uncertainty part. This confidence interval is estimated by sampling uncertain
parameters, running a simulation, looking at the distribution of the outputs, or
building an approximate model based on the outputs.
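The sampling-based procedure just described can be sketched as follows. Here `simulate` is a hypothetical stand-in for an expensive simulation code, and the input distribution and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(k):
    # Stand-in for an expensive simulation code: the QoI as a function of an
    # uncertain parameter k (a made-up response used only for illustration).
    return np.exp(-k)

# Uncertain input: most likely value 1.0, with scatter of 0.1 (assumed).
k_best = 1.0
k_samples = rng.normal(k_best, 0.1, size=10_000)

best_estimate = simulate(k_best)              # QoI at the most likely input
outputs = simulate(k_samples)                 # distribution of the QoI
lo, hi = np.percentile(outputs, [2.5, 97.5])  # a 95% interval on the QoI

print(f"best estimate {best_estimate:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

A surrogate (approximate) model fit to the sampled outputs could replace the direct sampling step when each simulation is expensive.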
Chapter 2
Probability and Statistics Preliminaries
Stars were falling across the sky myriad and random, speeding
along brief vectors from their origins in night to their destinies
in dust and nothingness.
—Cormac McCarthy, Blood Meridian, or the Evening Redness
in the West
2.1.1 Probability Density and Cumulative Distribution Functions
The probability density and cumulative distribution functions are key pieces of
information about a random variable. Sometimes we know these functions, for
instance, when we say an input to a code has a normal distribution, and other times,
for example, a QoI, we would like to determine these functions. In either case, we
will need to know how the two are related and the key properties of each.
For a given real random variable X ∈ R, the cumulative distribution function
(CDF) is defined as
F_X(x) = P(X ≤ x),   (2.1)
that is, the probability that the random variable X is less than or equal to x.
Oftentimes, we will leave out the subscript on F when it is clear what random
variable we are referring to. One of the uses of the CDF is to find the probability
that a random variable is between two numbers. From the above definition, it is
straightforward to see that we can find the probability that X is between a and b via
subtraction:
P(a < X ≤ b) = F_X(b) − F_X(a).
In this equation we note that the probability is strictly greater than a and less than
or equal to b. This comes from the definition of the CDF that we used. Based on the
fact that a probability must be in the closed interval [0, 1], we assert that
F_X(−∞) = 0 and F_X(∞) = 1.
These relations are equivalent to saying that X will take some value between
negative and positive infinity. There is one more property of the CDF that we need,
namely, that the CDF is nondecreasing. One way to state this is to say
F_X(a) ≤ F_X(b) for a ≤ b.
In other words as x increases, the probability that X is less than or equal to x cannot
go down. We will show some examples of CDFs later.
If X is a continuous random variable, that is, X can take any value on the real
line or on some interval of the real line, we define the probability density function
(PDF) as
f(x) = dF_X/dx.   (2.3)
We can “invert” the definition of the PDF to get the CDF in terms of the PDF:
F_X(x) = ∫_{−∞}^{x} f(x′) dx′.
Following this line of thinking further, we deduce that the probability X is between
a and b is given by
P(a < X ≤ b) = ∫_a^b f(x) dx = F_X(b) − F_X(a).
We note here that it is possible for the density to be undefined for a given random
variable. This can occur, for example, when the CDF is not differentiable.
As an example of the PDF and CDF of a distribution, consider the normal
distribution (also known as a Gaussian distribution). This distribution has two
parameters, μ ∈ R and σ > 0. As we will see later, these correspond to the mean
and standard deviation of the distribution. A random variable X that is normally
distributed has a PDF given by
f(x) = 1/(σ√(2π)) exp(−(x − μ)² / (2σ²)).   (2.4)
Fig. 2.1 Probability density functions for a normally distributed random variable with different
values of μ and σ
Fig. 2.2 Cumulative distribution functions for a normally distributed random variable with
different values of μ and σ
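These relationships are easy to check numerically. The sketch below implements the normal PDF of Eq. (2.4) and its CDF (via the error function, a standard closed form) and verifies that the quadrature of f over (a, b] matches F_X(b) − F_X(a); the interval and grid size are arbitrary choices.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Eq. (2.4): the normal probability density function
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Closed form for the normal CDF in terms of the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Check P(a < X <= b) = integral of f over (a, b] = F(b) - F(a)
# using midpoint quadrature on a fine grid.
a, b, n = -1.0, 2.0, 10_000
h = (b - a) / n
integral = sum(normal_pdf(a + (i + 0.5) * h) * h for i in range(n))
print(integral, normal_cdf(b) - normal_cdf(a))
```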
z = (x − μ)/σ,   (2.6)
will create a random variable Z ∼ N (0, 1).
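A small sanity check of the standardization in Eq. (2.6), using synthetic samples with invented parameter values: the standardized values should have mean near 0 and standard deviation near 1.

```python
import numpy as np

rng = np.random.default_rng(7)

mu, sigma = 3.0, 2.0                       # assumed values for illustration
x = rng.normal(mu, sigma, size=100_000)    # samples of X ~ N(3, 2)

z = (x - mu) / sigma                       # Eq. (2.6)
print(z.mean(), z.std())                   # should be close to 0 and 1
```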
For a discrete random variable, that is, a random variable that only takes on a
countable number of values, we cannot use a probability density function because it
does not make sense to talk about a differential volume element. Instead we define
the probability mass function (PMF) for a discrete random variable as
f(x) = P(X = x).
The notation is being somewhat abused by having both the PDF and probability
mass function use f . Nevertheless, by the context it should be clear which we mean,
and in practice this is a distinction without a difference if we think of the probability
mass function as a sum of Dirac delta functions, that is, a function that is nonzero
only at a single point and has a well-defined definite integral. For the CDF of a
discrete random variable, instead of an integral, we have a sum:
FX (x) = f (s), (2.8)
s∈S
Fig. 2.3 Probability mass function for a Bernoulli distributed random variable p = 0.5
Fig. 2.4 Cumulative distribution function for a Bernoulli distributed random variable p = 0.5
If the random variable is a fair coin, then p is 0.5 and we can (arbitrarily) choose
a flip that lands on heads as x = 1 and a flip that lands on tails as x = 0. The PMF
and CDF for a Bernoulli distributed X with p = 0.5 are shown in Figs. 2.3 and 2.4.
Notice the “stair-step” shape of the CDF because the probability that x is less than
or equal to a given number “jumps” when crossing 0 and 1.
2.2 Expected Value
The expected value is a weighted average of g(x) where the weighting function is
the PDF (or PMF).
An important special case of the expected value is the mean, which is the expected
value of x. It is often denoted as μ:

μ = E[X] = ∫_{−∞}^{∞} x f(x) dx. (2.12)
In common parlance, the mean is the value of X one would “expect” when drawing
a random variable. In many cases this is true. For example, if X ∼ N(μ, σ), that is,
X is normally distributed, then the mean of X is

E[X] = ∫_{−∞}^{∞} x/(σ√(2π)) exp(−(x − μ)²/(2σ²)) dx = μ. (2.13)

The above relation can be shown by making the substitution u = x − μ in the
integral: the term proportional to u is odd and integrates to zero, leaving μ times the
normalized PDF.
Equation (2.13) says that μ is the mean of the distribution. It is also true that f(x)
attains its maximum at x = μ, and therefore μ is the most likely value of X.
The mean is not always the most likely value of a random variable; in fact, it may
not even be a possible value of X. Consider the Bernoulli distribution: its mean is
E[X] = ∫_{−∞}^{∞} x f(x) dx = 0 · (1 − p) + 1 · p = p. (2.14)
Therefore, the mean (or expected value of X) is p, even though X can only take the values of
0 or 1. The mean is still useful in this case; we just cannot interpret it as the most
likely value.
An old saying about judging a random variable by its mean goes something like
this: if I put my head in the oven and my feet in ice water, my mean temperature
is just right. In other words, the mean does not tell us everything about the random
variable: do not try to walk across a river that has an average depth of 1 m.
2 Probability and Statistics Preliminaries
There are two useful properties of the distributions that are not related to the
expected value: the median and mode. The median is the point at which the CDF is
equal to one-half, i.e., F(x) = 1/2. This is a useful quantity because it indicates the
point that splits the random variable into two equal parts: in the limit of an infinite
number of realizations, half will be above the median and half will be below the
median. This is not true of the mean. Also, the median is less influenced by outliers.
The mode is the point at which the PDF takes its maximum value. Therefore, it is
the most likely value of the distribution. A distribution with a single mode is said to
be unimodal.
2.2.2 Variance
The expected value of (x − μ)² is called the variance, and often written in shorthand
as σ². It is worth noting that the variance can be expressed in terms of the mean and
E[X²] via

σ² = E[(X − μ)²] = E[X²] − 2E[μX] + E[μ²] = E[X²] − μ².

In this relation, we used the fact that E[X] = μ, E[μX] = μ², and E[μ²] = μ².
One can interpret the variance as the average squared difference between a random
variable and its mean. The larger the value of the variance, the more likely it is that
values fall far from the mean. The square root of the variance is called the standard deviation,
σ . The standard deviation is useful because it will have the same units as X, whereas
the variance has the units of X2 .
For a normally distributed random variable X ∼ N (μ, σ ), the variance of X is
σ 2 . The fact that larger values of σ 2 correspond to values away from the mean being
more likely can be seen in Fig. 2.1. In that figure, the curves with larger values of σ
are much wider. For the Bernoulli distribution, the variance can be shown
to be p(1 − p). Therefore, the maximum value of σ 2 for the Bernoulli distribution
is 0.52 = 0.25 and occurs when p = 0.5.
2.2.3 Skewness
The mean and the variance are related to the expectation of X and X², respectively.
The skewness, γ₁, is related to the third central moment of f(x), that is, the expected
value of (X − μ)³:
γ₁ = E[(X − μ)³]/Var(X)^{3/2}. (2.15)
Fig. 2.5 The PDFs of two distributions demonstrating positive (skewness 1.74) and negative (skewness −1.55) skewness. Notice that for positive skewness in this case the peak of the distribution is to the left of the mean and for negative skewness it is to the right
The skewness tells us something about the symmetry of the distribution about the
mean. The skewness can be counterintuitive because a distribution with positive
skew may look as though it is leaning to the left or negative direction.
As illustrated in Fig. 2.5, the skewness tells us how the distribution goes to zero
away from the mean when the distribution has a single maximum (a unimodal dis-
tribution). For this type of distribution, a negative skewness tells us the distribution
goes to zero more slowly to the left of the mean, whereas a positive skewness says
the opposite. The normal distribution has a skewness of 0 because it is symmetric
about the mean.
2.2.4 Kurtosis
Next on the list of properties of a distribution is the excess kurtosis (usually just
referred to as the kurtosis) which is a measure of “tail fatness” for a distribution.
The kurtosis, Kurt(X), is related to the fourth moment of a random variable’s PDF
and is defined as:
Kurt(X) = E[(X − μ)⁴]/σ⁴ − 3. (2.16)
The minus three is included so that a normal distribution has a kurtosis of 0. The
definition of the kurtosis is such that for a unimodal distribution, the slower the
PDF approaches zero as one moves away from the mode, the higher the kurtosis
will be. Another way of thinking about it is that the sign of the kurtosis tells you
if the distribution has heavier tails than a normal distribution (positive kurtosis) or
if it has thinner tails than a normal distribution (negative kurtosis). There are also
fancier names for these cases. A distribution that has negative kurtosis is said to be
platykurtic from the Greek platy¹ for “flat”, whereas a positive kurtosis indicates a
leptokurtic distribution from the Greek word leptós meaning narrow.² A distribution
with zero kurtosis is mesokurtic.
As an example we look at a uniform distribution, a normal distribution, and the
logistic distribution in terms of kurtosis. A uniform distribution has a PDF that is
uniform over a finite range:
f_uni(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise. (2.17)
A uniform distribution over the range [a, b] is written as X ∼ U(a, b). The kurtosis
of a uniform distribution is −6/5, and its variance is (b − a)²/12. We already noted that
the definition of kurtosis we are using defines a normal distribution as having a
kurtosis of zero. The logistic distribution’s PDF is given by
f_logistic(x) = (1/(4s)) sech²((x − μ)/(2s)), (2.18)
¹ To remember this one can think of a duck-billed platypus having a flat bill, or, for the animal
taxonomy aficionado, the name of the phylum of flat worms, Platyhelminthes.
² There does not exist a great mnemonic for leptós, unfortunately. Leptons are small-mass (i.e.,
narrow) subatomic particles, but one does not typically think of them as narrow. Interestingly, the
word leptós can be found in Mycenaean Greek documents written in Linear B, one of the oldest
recorded forms of Greek.
Fig. 2.6 PDFs for a uniform, normal, and logistic distribution all with mean 0 and variance 1
Fig. 2.7 Detail of Fig. 2.6 where we see that for the logistic distribution, one is more likely to have extreme values (greater than 3 standard deviations from the mean) than for a normal distribution
random variable. The moments are integrals over the probability distribution. To
estimate these quantities, we rely on the naïve estimator:
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx ≈ (1/N) Σ_{i=1}^{N} g(xᵢ), (2.19)
where xi is a sample from the PDF f (x) and N is the number of samples. In
other words, the expected value of g(x) is approximated by the average value of
g(xᵢ). Therefore, the mean of the PDF can be estimated via the
approximation:
μ ≈ (1/N) Σ_{i=1}^{N} xᵢ ≡ x̄. (2.20)
The notation x̄ is used for the estimate of the mean and is known as the sample
mean or sample average. This estimate of the mean will have an error based on the
randomness of the samples involved. One can show, via the central limit theorem,
that the error in the estimate of the mean is proportional to 1/√N as N → ∞.
The variance estimate is similar in that we are trying to estimate an integral. There
is a slight wrinkle, however, because to estimate the variance, we use our estimate of
the mean. The formula for the estimate of the variance based on a sample of random
variables is written as s 2 given by:
Var(X) = ∫_{−∞}^{∞} (x − μ)² f(x) dx ≈ (1/N) Σ_{i=1}^{N} (xᵢ − μ)² ≈ (1/(N − 1)) Σ_{i=1}^{N} (xᵢ − x̄)² ≡ s². (2.21)
The factor 1/(N − 1) comes from the fact that we have to use the estimate
of the mean, x̄, instead of the true mean. This factor is called Bessel’s correction
and comes from the fact that the quantity (xi − x̄) has N values but only N − 1
independent values because the sum of (xi − x̄) must equal zero. Nevertheless, if N
is large, the correction has a small effect.
The skewness has a similar formula for an estimator; it is a combination of the
sample mean, x̄, and the sample variance, s 2 , along with an additional integral
estimate. The skewness estimate for a sample is written as b1 and given by
b₁ = [(1/N) Σ_{i=1}^{N} (xᵢ − x̄)³] / (s²)^{3/2}. (2.22)
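The three estimators in Eqs. (2.20)-(2.22) can be collected into one short function. Below is a minimal sketch assuming NumPy; the N(2, 3) test sample is an arbitrary choice:

```python
import numpy as np

def sample_moments(x):
    """Sample mean (Eq. 2.20), variance with Bessel's correction (Eq. 2.21),
    and the skewness estimator b1 (Eq. 2.22)."""
    n = len(x)
    xbar = x.sum() / n
    s2 = ((x - xbar) ** 2).sum() / (n - 1)
    b1 = ((x - xbar) ** 3).mean() / s2 ** 1.5
    return xbar, s2, b1

rng = np.random.default_rng(1)
x = rng.normal(2.0, 3.0, size=100_000)   # true mean 2, variance 9, skewness 0
xbar, s2, b1 = sample_moments(x)
print(xbar, s2, b1)                      # near 2, 9, and 0
```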
This function is the probability that each random variable is smaller than a given
number. As before, this definition allows the difference of the joint CDFs to give
you the probability that each random variable is within a range:
As before, the derivative of the joint CDF is the joint probability density function
(joint PDF):
f(x) = f(x₁, x₂, . . . , x_p) = ∂^p F(x)/(∂x₁ ∂x₂ · · · ∂x_p). (2.25)
The joint CDF is then the integral of the joint PDF in a similar fashion to the single
variable:
F(x) = ∫_{−∞}^{x₁} dx₁′ ∫_{−∞}^{x₂} dx₂′ · · · ∫_{−∞}^{x_p} dx_p′ f(x′). (2.26)
Using the joint PDF, we can get the PDF of a single variable. For instance, f (x1 )
can be computed by integrating over the other p − 1 variables:
f(x₁) = ∫_{−∞}^{∞} dx₂ · · · ∫_{−∞}^{∞} dx_p f(x). (2.27)
That is, if we integrate over the second through pth variables, we will have a
function of just x₁ that is equal to its PDF. In this case we would call f(x₁) the
marginal probability density function for random variable X1 . Additionally, we
can define a marginal cumulative distribution function for X1 as
F(x₁) = ∫_{−∞}^{x₁} dx₁′ ∫_{−∞}^{∞} dx₂ · · · ∫_{−∞}^{∞} dx_p f(x′). (2.28)
Clearly, the marginal PDF and CDF could be defined for any of the p variables in
the multivariate distribution.
We can generalize the idea of the marginal PDF into the joint marginal PDF of
any subset of the p variables. Say for l < p variables, the joint PDF for these l
variables is
f(x₁, x₂, . . . , x_l) = ∫_{−∞}^{∞} dx_{l+1} · · · ∫_{−∞}^{∞} dx_p f(x). (2.29)
Using the definition of Eq. (2.27), we can simplify this, for f_X(x) ≠ 0, to

f(y|X = x) = f(x, y)/f_X(x),
where we have used the subscript X to indicate that fX is the PDF of the random
variable X. Going back to the more general case, the conditional probability of l
random variables given p − l other variables is
where

f(x_{l+1}, . . . , x_p) = ∫_{−∞}^{∞} dx₁ · · · ∫_{−∞}^{∞} dx_l f(x).
The variance for a collection of random variables is more complicated than that for
a single variable because we can look at how the random variables change together.
The measure of this is called the covariance, and the covariance between Xᵢ and Xⱼ
is written as σᵢⱼ:

σᵢⱼ = E[(Xᵢ − E[Xᵢ])(Xⱼ − E[Xⱼ])].

The covariances form a p × p symmetric matrix with the diagonal being the variance
of each random variable. The covariance matrix is typically denoted by Σ(x) so that

Σᵢⱼ = σᵢⱼ, i, j = 1, . . . , p. (2.35)
There is a special case for a collection of random variables where the joint PDF
can be factored into the product of individual PDFs as

f(x) = Π_{i=1}^{p} f(xᵢ).
and the covariance matrix Σ was defined in Eq. (2.35), with the determinant of the
matrix written as |Σ|. The notation for a random variable X to be a multivariate
normal with mean vector, μ, and covariance matrix Σ is X ∼ N (μ, Σ) (see
Fig. 2.8).
2.4 Stochastic Processes
where A is a random variable given by A ∼ N(0, 1). In this case we can write the
mean function as

μ(x) = ∫_{−∞}^{∞} cos(x + a) e^{−a²/2}/√(2π) da = cos(x)/√e,

and the variance function as

σ²(x) = (e − 1)(e − cos(2x))/(2e²).
Fig. 2.9 Five realizations of a simple stochastic process u(x; ξ) = cos(x + A), x ∈ [0, 2π], where A is a random variable given by A ∼ N(0, 1). The black line is the mean function of the process, μ(x), and the gray band represents μ(x) ± σ(x)
Five realizations of this process are shown in Fig. 2.9. This is a particularly
simple stochastic process because all of the randomness is contained in a single
parameter, and this makes the mean and covariance functions computable.
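For this process, the mean and variance functions can be checked by brute-force sampling. Below is a minimal sketch assuming NumPy; the grid and sample sizes are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 50)

# 50,000 realizations of u(x; A) = cos(x + A) with A ~ N(0, 1); one row per realization.
A = rng.normal(0.0, 1.0, size=50_000)
u = np.cos(x[None, :] + A[:, None])

# Exact results derived in the text.
mean_exact = np.cos(x) / np.sqrt(np.e)
var_exact = (np.e - 1.0) * (np.e - np.cos(2.0 * x)) / (2.0 * np.e ** 2)

print(np.abs(u.mean(axis=0) - mean_exact).max())  # small sampling error
print(np.abs(u.var(axis=0) - var_exact).max())    # small sampling error
```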
Σij = k(xi , xj ).
Fig. 2.10 Five realizations of a Gaussian process defined on x ∈ [0, 1] with μ(x) = 0 and k(x₁, x₂) = exp(−|x₁ − x₂|)
Fig. 2.11 Five realizations of a Gaussian process defined on x ∈ [0, 1] with μ(x) = 0 and k(x₁, x₂) = exp(−(x₁ − x₂)²)
Fig. 2.12 Five realizations of a Gaussian process defined on x ∈ [−0.5, 0.5] with μ(x) = cos(2πx) and k(x₁, x₂) = 0.1 exp(−|x₁ − x₂|)
function that has a range of [0, 1] and is a monotonic, non-decreasing function.
Therefore, the CDF is invertible. With this result we can take a uniformly distributed
random variable between 0 and 1 and invert the CDF to get a sample of the random
variable associated with that CDF. That is,

x = F^{−1}(ξ), ξ ∼ U(0, 1),
will give a sample x that is distributed according to the CDF F (x). Note that if the
CDF has jumps, then the inverse CDF is defined so that it gives the smallest value x
such that F (x) = ξ .
An illustration of this procedure is shown in Fig. 2.13 for a standard normal
random variable. Here we show samples of a uniformly distributed variable between
0 and 1 and the corresponding samples from the distribution after inverting the CDF.
Notice where the CDF is changing more rapidly, there is a higher density of samples.
f (x) = λe−λx , x ≥ 0,
Fig. 2.13 In this figure we show a set of points on the y-axis that are randomly chosen between 0 and 1. Then on the x-axis, we show the corresponding sample points from inverting the CDF (in this case the standard normal CDF). Notice that the resulting samples in x are nonuniformly clustered around 0, as we would expect for samples from a standard normal random variable
Therefore, setting ξ = F(x) = 1 − e^{−λx} and solving for x gives

x = −log(1 − ξ)/λ,
and x will be distributed according to f (x).
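A minimal sketch of this inversion, assuming NumPy (λ = 2 is an arbitrary choice); the sample mean should approach the exponential mean 1/λ:

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 2.0

xi = rng.uniform(0.0, 1.0, size=200_000)  # xi ~ U(0, 1)
x = -np.log(1.0 - xi) / lam               # inverted exponential CDF

print(x.mean())  # should approach 1/lam = 0.5
```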
In this example we will explain a clever way of inverting the CDF for a standard
normal random variable. A sample z from a standard normal random variable can be
transformed to a general normal random variable through the relation

x = μ + σz.

Consider a normal random variable with mean 0 and standard deviation 1. The
associated PDF will be

f(x) = (1/√(2π)) e^{−x²/2}.
The Box-Muller transform gives a way to get two samples at a time. Consider the
product of two PDFs:

f(x) dx f(y) dy = (e^{−(x² + y²)/2}/(2π)) dx dy.
If we change coordinates into polar coordinates so that

dx dy = r dr dθ,

for r = √(x² + y²) and θ = tan⁻¹(y/x), we can write

f(x) f(y) dy dx = e^{−r²/2} r dr (dθ/2π), r ∈ [0, ∞), θ ∈ [0, 2π].
We can separate this expression into two functions,

g(r) = e^{−r²/2} r,

and

h(θ) = 1/(2π).
These functions are both properly normalized PDFs:

∫_{0}^{∞} g(r) dr = ∫_{0}^{2π} h(θ) dθ = 1.
Because h(θ) is a uniform density, sampling θ is simple:

θ = 2πξ₁, ξ₁ ∈ [0, 1].
To sample an r from g(r), we can use the result from the previous example if we
define u = r² and du = 2r dr to get

r = √(−2 log(1 − ξ₂)), ξ₂ ∈ [0, 1].
As a result, drawing two random numbers, ξ1 and ξ2 , gives two samples from the
Gaussian:
x = r cos θ, y = r sin θ.
2.5 Sampling a Random Variable 41
This replaces the brute-force approach of inverting the CDF for a normal
random variable with inverting two simple CDFs. The trade-off is that one needs to
generate two samples at a time.
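The full Box-Muller recipe can be sketched in a few lines (assuming NumPy; the sample size is our choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
xi1 = rng.uniform(size=n)
xi2 = rng.uniform(size=n)

theta = 2.0 * np.pi * xi1               # theta ~ U(0, 2*pi)
r = np.sqrt(-2.0 * np.log(1.0 - xi2))   # r sampled by inverting g(r)

x = r * np.cos(theta)                   # two independent N(0, 1) sample streams
y = r * np.sin(theta)
print(x.std(), y.std(), np.corrcoef(x, y)[0, 1])
```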
Σ = LLT ,
where L is a lower triangular matrix. The Cholesky decomposition exists for any
symmetric matrix of real values that is positive definite. The covariance matrix
satisfies these properties. The Cholesky decomposition requires O(p3 ) floating
point operations to compute and is therefore expensive when p is large.
With the Cholesky decomposition, we then generate p independent samples from
a standard normal random variable, Z = (Z₁, . . . , Z_p)ᵀ, and compute

x = μ + LZ.

To see why this works, note that the covariance matrix of the independent standard
normal samples is the identity:

Σ(Z) = E[ZZᵀ] = I.

Now consider a vector X = LZ. The covariance matrix for this collection of random
variables is

Σ(X) = E[(LZ)(LZ)ᵀ] = E[LZZᵀLᵀ].

From this we can move the L’s outside the expectation operator to get

Σ(X) = L E[ZZᵀ] Lᵀ = LLᵀ = Σ.

To shift this result to a variable with a nonzero mean, we add in the desired mean μ.
Σij = k(xi , xj ), i, j = 1, . . . , I.
The vectors that we sample can be interpreted as the Gaussian process evaluated at
each of these points.
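A sketch of this sampling procedure, assuming NumPy, using the kernel k(x₁, x₂) = exp(−|x₁ − x₂|) from Fig. 2.10 on an arbitrary 50-point grid:

```python
import numpy as np

# Evaluation points and covariance matrix Sigma_ij = k(x_i, x_j).
xs = np.linspace(0.0, 1.0, 50)
Sigma = np.exp(-np.abs(xs[:, None] - xs[None, :]))

# Cholesky factorization Sigma = L L^T; a tiny diagonal jitter guards round-off.
jitter = 1e-10 * np.eye(len(xs))
L = np.linalg.cholesky(Sigma + jitter)

rng = np.random.default_rng(5)
z = rng.standard_normal(len(xs))  # p independent standard normal samples
u = L @ z                         # one realization; add mu(x) here if it is nonzero
print(u.shape)
```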
In some cases it is difficult to create the CDF from the PDF or the CDF may not
be known in closed form or may not be invertible except by expensive numerical
solution. In this case, it can be easier to use rejection sampling. To illustrate how this
works, we will take the PDF for a random variable X, where the random variable
takes values only inside a given range [a, b]. We then draw a rectangle around
the function. The base of the rectangle extends from a to b and the height of the
rectangle is the maximum value of the PDF, called h here. An example of this is
shown in Fig. 2.14. Then we pick points at random in the box, i.e., X ∼ U (a, b),
and Y ∼ U(0, h). If the point is below the PDF, i.e., y ≤ f(x), then we accept it;
if not, it is rejected. The accepted values of X are our samples from the random
variable. Figure 2.15 shows how rejection sampling proceeds as more points are
tried.
2.6 Rejection Sampling
Fig. 2.14 Illustration of drawing a box around the PDF for a random variable for the purpose of rejection sampling
Fig. 2.15 Rejection sampling at two different numbers of attempted samples, 300 (left) and 1000 (right). The points with a “times” symbol were rejected, and the “circled plus” points were accepted
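The procedure can be sketched as follows, assuming NumPy; the density f(x) = (3/2)x² on [−1, 1] is a hypothetical example, not one from the text:

```python
import numpy as np

def rejection_sample(pdf, a, b, h, n, rng):
    """Draw n candidate points uniform in [a, b] x [0, h]; keep those under the PDF."""
    x = rng.uniform(a, b, size=n)
    y = rng.uniform(0.0, h, size=n)
    return x[y <= pdf(x)]

# Example density: f(x) = (3/2) x^2 on [-1, 1], with maximum h = 3/2 at the endpoints.
rng = np.random.default_rng(11)
pdf = lambda t: 1.5 * t ** 2
samples = rejection_sample(pdf, -1.0, 1.0, 1.5, 300_000, rng)

print(len(samples) / 300_000)   # acceptance rate ~ 1/((b - a) h) = 1/3
print((samples ** 2).mean())    # E[X^2] = 3/5 for this density
```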
f(x | μ = 0, σ² = 1) = (1/√(2π)) e^{−x²/2}.
Additionally, in Eq. (2.31) we wrote that the conditional probability was the joint
probability density function divided by the marginal probability density function,
viz.,

f(x|y) = f(x, y)/f_Y(y).
Equating these two expressions and rearranging, we can write Bayes’ law (or Bayes’
theorem or Bayes’ rule)

f(x|y) = f(y|x) f_X(x)/f_Y(y). (2.40)
Writing the marginal density f_Y(y) as an integral over the joint density, this becomes

f(x|y) = f(y|x) f_X(x) / ∫_{−∞}^{∞} f(y|x) f_X(x) dx. (2.41)
Oftentimes, we write Bayes’ law using special notation that indicates the
interpretation of its implications. We define π(x) as the prior probability density
function for X, and π(x|y) as the posterior conditional probability density function
for X given Y = y, and f (y|x) as the conditional likelihood, or just likelihood, of
y given X = x. Using this notation we write
π(x|y) = f(y|x) π(x) / ∫_{−∞}^{∞} f(y|x) π(x) dx. (2.42)
The interpretation of Bayes’ law is that we have a prior density function for x
that we update given the observation that Y = y to get π(x|y).
Assume a drug test is 99% accurate in the sense that the test will produce 99% true
positive and 99% true negative results. Say 0.5% of the population use the drug. An
individual tests positive. What is the probability they are a user?
P(user|+) = P(+|user) P(user) / [P(+|user) P(user) + P(+|non-user) P(non-user)]
= (0.99 · 0.005)/(0.99 · 0.005 + 0.01 · 0.995) = 0.332,

or 33.2%.
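The same arithmetic in code form (plain Python; the variable names are ours):

```python
# Bayes' law for the drug-test example; the numbers are from the text.
p_pos_given_user = 0.99      # true positive rate
p_pos_given_nonuser = 0.01   # false positive rate
p_user = 0.005               # prevalence of use in the population

numerator = p_pos_given_user * p_user
evidence = numerator + p_pos_given_nonuser * (1.0 - p_user)
p_user_given_pos = numerator / evidence
print(p_user_given_pos)      # about 0.332
```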
Say we want to know the fairness of a coin (i.e., is the probability of heads 1/2?). If
I flip the coin 10 times and get 3 heads, what is my estimate of the probability of
getting heads on any toss? Using Bayes’ rule we write the probability of heads as p
and write

f(p|y) = f(y|p) π(p) / ∫_{−∞}^{∞} dp f(y|p) π(p).
In this equation
• f (y|p) = probability density of getting y given a value of p,
• π(p) = prior distribution on p (what I believe given no data), and
• f (p|y) = posterior distribution for p given data y.
For the coin example, we claim to have no idea if the coin is fair, i.e., p could be
anywhere between 0 and 1 with equal likelihood. We express this as
π(p) = 1 for p ∈ [0, 1], and 0 otherwise.
In Fig. 2.17, we show the results of this trial. We see in the posterior the maximum
is at p = 0.3, but it does not rule out the coin being fair. The posterior does rule out,
however, p = 0 or p = 1, because those are not possible given the observation of
only 3 heads.
A useful feature of Bayes’ theorem is that we can update the posterior if new
data comes along in the same way as before. That is, we use the current posterior as
2.7 Bayesian Statistics
Fig. 2.17 Posterior and prior distributions of the probability of getting 3 heads in 10 tosses for a coin of unknown fairness
the prior in another calculation. If we make 990 more flips of the coin and get 430
heads, this makes the likelihood in the numerator
f(430|p) = (990 choose 430) p⁴³⁰ (1 − p)⁵⁶⁰ = 5.127419 × 10²⁹² p⁴³⁰ (1 − p)⁵⁶⁰.
Fig. 2.18 Posterior and prior distributions of the probability of getting heads for the coin tossing example
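Both updates can be reproduced with a simple grid-based version of Bayes’ law (a sketch assuming NumPy; the 2001-point grid is our choice):

```python
import numpy as np
from math import comb

p = np.linspace(0.0, 1.0, 2001)          # grid over possible values of p
prior = np.ones_like(p)                  # uniform prior, pi(p) = 1 on [0, 1]

def update(density, heads, flips):
    """Multiply by the binomial likelihood and renormalize on the grid (Bayes' law)."""
    like = comb(flips, heads) * p ** heads * (1.0 - p) ** (flips - heads)
    post = density * like
    return post / post.sum()             # discrete normalization; the mode is unaffected

post1 = update(prior, 3, 10)             # 3 heads in the first 10 flips
post2 = update(post1, 430, 990)          # the posterior becomes the new prior

print(p[np.argmax(post1)])               # posterior mode after 10 flips, near 0.3
print(p[np.argmax(post2)])               # mode after all 1000 flips
```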
2.8 Exercises
1. Show that the transformation in Eq. (2.6) results in a standard normal random
variable by computing the mean and variance of Z.
2. Consider the random variables X ∼ U(−1, 1) and Y = X². Are these
independent random variables? What is their covariance?
3. Show that a general covariance matrix must be positive definite, i.e., xᵀΣx > 0
for any vector x that is not all zeros.
4. Use rejection sampling to sample from a Gamma random variable X ∼ G (α, β)
where
f(x) = x^α e^{−βx} / (Γ(α + 1) β^{−α−1}), α > −1, β > 0.
f(y|θ) = (1/(σ√(2π))) exp(−(y − θ)²/(2σ²)).
The parameters μ and τ are called hyperparameters. Using Bayes’ theorem find
p(θ |y), and show that it is a normal distribution.
8. Suppose that X is the number of people arriving at a particular tavern during
a given hour. This type of arrival process is naturally described by a Poisson
process:
f(x|θ) = e^{−θ} θ^x / x!, x ∈ {0, 1, 2, . . . }, θ > 0.
We then say that our prior distribution of θ is a Gamma distribution
π(θ) = θ^{α−1} e^{−βθ} / (Γ(α) β^{−α}), α, β > 0.
Compute and plot the marginal PDFs for X and Y . Additionally, compute the
conditional probability distributions, and make plots of f (y|X = μx ) and
f (x|Y = μy ).
11. Consider a covariance function between points in 2-D space:
In this chapter we will explore how we can use the principles of statistics and
probability to model input parameters to simulation models. This discussion will
require that we understand how random variables depend on each other, how we
can model this dependence when we have limited information, and how we can
approximate a collection of random variables, or even a stochastic process, based
on some underlying structure.
In a computer simulation, there will typically be several random variables as
inputs. For a collection of random variables, it is common to not have an expression
for the joint distribution functions (CDF or PDF) for the collection. Rather, the best
one can do is hope to have some measure of the dependence between the pairs
of variables. As we will see, the dependence measures we use are not enough to
uniquely determine the relationship between random variables.
Additionally, later when we try to model the distribution of output quantities
of interest based on input uncertainties, we will see that the number of random
variables we have as input determines the accuracy we can achieve with our
uncertainty quantification given a fixed computational budget. Therefore, we would
like to determine if we can eliminate input random variables if there is an underlying
correlation or approximation. Methods for this type of reduction will be discussed
in this chapter as well.
is that it has units that are the product of the units of X and Y . This can make it
difficult to compare covariances. For instance, Σ(X, Y) > Σ(X, Z) does not imply
that there is a stronger relationship between X and Y than between X and Z, because
of the units.
A normalized measure of the relation between two random variables is the Pearson
correlation coefficient, ρ. Oftentimes, this is simply called the correlation coefficient
or correlation. Considering two random variables, X, and Y , the correlation
coefficient is
ρ(X, Y) = (E[XY] − E[X]E[Y]) / (σ_X σ_Y). (3.2)
That is, the Pearson correlation is the covariance normalized by the standard
deviation of each variable. On this normalized scale, we can say things about how
two variables change together. If the variables are independent, then ρ(X, Y ) = 0.
As with covariance, a correlation of zero between variables does not imply that the
variables are independent.
One property of the correlation coefficient is that if X and Y are linearly related,
i.e., there exist a and b such that Y = aX + b, then ρ(X, Y) = sign(a). As a
corollary, if we define a new random variable X′ = aX + b, we have the relation

ρ(X′, Y) = sign(a) ρ(X, Y),

which can be shown using the properties of the expected value.
When we have a collection of random variables, X = (X1 , X2 , . . . , Xp )T , we
can define a correlation matrix R in terms of the covariance matrix as
R_ij = Σ_ij / (σ_{X_i} σ_{X_j}), (3.3)
Fig. 3.1 The Cauchy distribution with various parameters (x₀ = 0, γ = 1; x₀ = 0, γ = 2; x₀ = 1, γ = 1; x₀ = −2, γ = 0.5) and compared with the standard normal
The mean and variance of the distribution are undefined because the distribution
goes to zero too slowly, but the median and mode are x0 . The PDF for a Cauchy
distribution and its comparison to the standard normal are given in Fig. 3.1.
Another, potentially more important, downside of the Pearson correlation coef-
ficient is that if X is transformed by a nonlinear, strictly increasing function, g(X),
the correlation ρ(X, Y ) will be different than ρ(g(X), Y ). This means that if there
is a nonlinear relation between X and Y , the Pearson correlation coefficient may
under- or overestimate the relation between the two variables.
If we do not know the marginal CDF, but we have samples of the random variables,
we can still estimate the Spearman correlation. Given N samples of X and Y , we
create a function that takes sample xi or yi and gives the rank of that sample among
the N samples:
Using this function we then define the Spearman correlation coefficient for the
samples:
ρ_S(X, Y) = Σ_{i=1}^{N} (rank(xᵢ) − r̄_X)(rank(yᵢ) − r̄_Y) / √(Σ_{i=1}^{N} (rank(xᵢ) − r̄_X)² Σ_{i=1}^{N} (rank(yᵢ) − r̄_Y)²), (3.6)
where

r̄_X = (1/N) Σ_{i=1}^{N} rank(xᵢ).
When computing ρS any ties in the data are assigned the average rank of the tied
scores.
One of the important properties of the Spearman correlation is that if there
exists a strictly increasing function g(X) that relates X to Y as Y = g(X), then
ρS (X, Y ) = 1. Furthermore, a strictly monotonic transformation of X or Y will not
affect the Spearman correlation.
As with the Pearson correlation, we can compute a Spearman correlation matrix
for a collection of random variables X = (X1 , . . . , Xp )T . We will call this matrix
RS , and it is given by
RS,ij = ρS (Xi , Xj ).
The final measure of correlation that we will use is Kendall’s tau or the Kendall rank
correlation coefficient. Similar to the Spearman correlation, it tries to measure the
relation between two variables in terms of the ranks. It is best for looking at a sample
population of random variables because it requires looking at pairs of samples of
random variables. To define Kendall’s tau, consider N samples of random variables
x and y. We examine all the pairs of samples (xᵢ, yᵢ) and (xⱼ, yⱼ) for i ≠ j. There are (1/2)N(N − 1)
such pairs. We look at each pair and say that a pair ij is concordant if xi > xj and
yi > yj or if xi < xj and yi < yj . A pair is discordant if xi > xj and yi < yj or if
xi < xj and yi > yj . If either xi = xj or yi = yj , then the pair is a tie.
3.1 Dependence Between Variables
Fig. 3.2 The comparison of Pearson, Spearman, and Kendall’s tau correlation measures on 300 samples of two pairs of random variables, (x, x + 0.05z) and (x, (x + 0.05z)⁵), where z is a standard normal random variable. The three measures give a correlation of ρ = 0.999, ρ_S = 0.999, and τ = 0.973 for the correlation of (x, x + 0.05z). For the correlation of (x, (x + 0.05z)⁵), the Spearman correlation and Kendall’s tau values do not change, but ρ = 0.843 for this data
The range of τ is [−1, 1]. Kendall’s tau has the property that it is not affected by
performing a nonlinear, increasing transformation on either random variable: this
is the same property Spearman correlation has. We can relate τ to the Pearson
correlation coefficient if the variables X and Y are jointly normally distributed
through the equation
τ(X, Y) = (2/π) arcsin ρ(X, Y).
We will use Kendall’s tau when we want to relate two random variables through
copulas.
As a comparison of the correlation measures, Fig. 3.2 shows how a strictly
increasing transformation of a variable changes the Pearson correlation, but not
the Spearman correlation or Kendall’s tau. In the figure the correlation between
random variables (x, x + 0.05z) and the correlation between (x, (x + 0.05z)5 ),
where z is a standard normal random variable, are computed. The Spearman
and Kendall measures do not change, whereas the Pearson correlation drops
by 15%.
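The comparison behind Fig. 3.2 can be reproduced in outline with simple implementations of the three measures (a sketch assuming NumPy; Spearman is computed as the Pearson correlation of ranks, and Kendall’s tau as the tau-a pair count, with no ties assumed):

```python
import numpy as np

def pearson(x, y):
    return ((x - x.mean()) * (y - y.mean())).mean() / (x.std() * y.std())

def spearman(x, y):
    # Spearman: the Pearson correlation of the ranks (no ties for continuous samples).
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

def kendall_tau(x, y):
    # Kendall's tau-a: (concordant - discordant) / (N(N-1)/2), assuming no ties.
    n = len(x)
    total = 0.0
    for i in range(n - 1):
        total += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return total / (0.5 * n * (n - 1))

rng = np.random.default_rng(8)
x = rng.uniform(-2.0, 2.0, size=300)
z = rng.standard_normal(300)
y1 = x + 0.05 * z
y2 = y1 ** 5                  # a strictly increasing transform of y1

print(pearson(x, y1), spearman(x, y1), kendall_tau(x, y1))
print(pearson(x, y2), spearman(x, y2), kendall_tau(x, y2))
# The rank-based measures are unchanged by the transform; the Pearson value drops.
```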
3 Input Parameter Distributions
This is the probability that Y goes to its lower bound as X goes to its lower bound.
The upper tail dependence is
λ_u(X, Y) = lim_{q→1} P(Y > F_Y^{−1}(q) | X > F_X^{−1}(q)), (3.9)
and measures the probability that X and Y go to their upper bound together.
Tail dependence is different than typical correlation measures in that it is only
interested in extreme values. For example, two variables could have a Pearson
correlation of 0.5, but a tail dependence is much larger, say 0.9. This has been
observed, for example, in the returns of stocks. Many stocks that had low correlation
in typical times had very high lower tail dependence during the financial crisis (they
all went down a lot).
The lower tail dependence can be written in terms of the joint CDF for two
variables. Using the definition of the CDF and law of total probability, we get that
These equations give us formulas for the tail dependences in terms of the joint and
marginal CDFs for each of these variables.
3.2 Copulas
This definition takes the marginal CDF for each variable and creates a joint CDF. A
result known as Sklar’s theorem tells us that such a copula will exist for any joint
CDF, and it is unique if the marginal CDFs are continuous. A copula has the domain
u, v ∈ [0, 1] and a range of [0, 1]. For a given copula, we can define the joint PDF
as
c(u, v) = ∂²C(u, v)/(∂u ∂v). (3.14)
This definition is a special case of Eq. (2.25). Additionally, the conditional CDF
C(v|u) is
C(v|u) = ∂C(u, v)/∂u. (3.15)
The tail dependence for a copula can be obtained by plugging Eq. (3.12) into the
definitions for tail dependence, Eqs. (3.8) and (3.9), to get
λ_l = lim_{q→0} C(q, q)/q, (3.16)
and
λ_u = lim_{q→1} (1 − 2q + C(q, q))/(1 − q). (3.17)
The simplest example is the independence copula, C_I(u, v) = uv.
Copulas are widely used in the finance and insurance industries to model the joint
distributions of risks. Because the mapping from marginal distributions to a joint
distribution is not unique, the way we use copulas requires choices by the user. The
considerations of ease of use, matching observed correlation, and tail dependence
have to be weighed when choosing a copula.
3.2.1 Normal Copula

The normal (or Gaussian) copula is built from the multivariate normal CDF:

CN(u, v) = Φ_R(Φ^{-1}(u), Φ^{-1}(v)),    (3.18)

where Φ is the standard normal CDF, Φ_R is the CDF of a bivariate normal with zero mean and covariance R, and R is a correlation matrix for the intended joint distribution. The normal copula is simple to sample. Given two random variables X and Y with marginal CDFs FX(x) and FY(y), we can generate a sample from CN(FX(x), FY(y)) using the following procedure:
1. Sample from the collection of two random variables Z ∼ N (0, R) using the
Cholesky factorization approach in the previous chapter.
2. Compute u = Φ(z1 ) and v = Φ(z2 ).
3. The samples are x = FX−1 (u) and y = FY−1 (v).
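The three steps above translate directly into code. Here is a sketch with NumPy and SciPy (the function name and the choice of uniform marginals matching Fig. 3.3 are mine, not from the text):

```python
import numpy as np
from scipy import stats

def normal_copula_sample(n, rho, ppf_x, ppf_y, seed=None):
    """Draw n samples joined by a normal copula with correlation rho,
    mapped through the marginal inverse CDFs ppf_x and ppf_y."""
    rng = np.random.default_rng(seed)
    # Step 1: correlated standard normals via the Cholesky factor of R
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    z = rng.standard_normal((n, 2)) @ L.T
    # Step 2: map each component to [0, 1] with the standard normal CDF
    u, v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])
    # Step 3: invert the desired marginals
    return ppf_x(u), ppf_y(v)

# Uniform marginals as in Fig. 3.3: X ~ U(-1, 5), Y ~ U(2, 3)
x, y = normal_copula_sample(10_000, 0.8,
                            stats.uniform(-1, 6).ppf,
                            stats.uniform(2, 1).ppf, seed=42)
```

The resulting samples have uniform marginals, but Kendall's τ near (2/π) arcsin 0.8 ≈ 0.59, consistent with Eq. (3.19).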
Therefore, via the normal copula, we can create a joint distribution that has a
prescribed Pearson correlation where the underlying marginal distributions do not
have to be normal. This is different from saying that the two variables are a multivariate normal with a known correlation. Note that the matrix R has only 1
degree of freedom because the diagonal is 1 and it is symmetric; we can call this
degree of freedom ρ. It can be shown that for a normal copula, the value of Kendall’s
tau is
τ(X, Y) = (2/π) arcsin ρ,    (3.19)
Therefore, given a desired value of Kendall’s tau for the joint distribution, one can
produce it using the normal copula.
The normal copula has zero tail dependence: as one variable approaches ±∞, the
probability that the other variable does the same goes to zero. Therefore, if we are
modeling a system where tail dependence could matter greatly, e.g., analyzing how
the system behaves under input variables near their extremes, the normal copula
may not be appropriate.
The normal copula has been blamed for the financial crisis of 2008 (Jones 2009) because it does not account for the fact that mortgage defaults, while not correlated under normal circumstances, have strong lower tail dependence: if everyone in a neighborhood is foreclosed upon, then housing prices fall, and more mortgages then default. This is a fact that risk assessors never understood or, to be more charitable, did not account for. The possibility of tail dependence needs to be carefully analyzed when quantifying uncertainty in a physical system. In many cases tail dependence could be present, and we need to understand how this may affect our predictions.
In Fig. 3.3, two uniform distributions joined by a normal copula with ρ = 0.8
are shown. Notice how there is a clear correlation between the two random variables
and, as a result, a clustering in the corners of the distributions. An important property
of these samples is that they are not normal; we have just used a normal copula to
join them.
3.2.2 t-Copula

A distribution similar to the normal is the t-distribution: it is unimodal but has more kurtosis than a normal random variable. This distribution can be used to define a t-copula with a parameter ν > 0 and a positive definite, symmetric scale matrix S with a diagonal of ones as

Ct(u, v) = t_{ν,S}(t_ν^{-1}(u), t_ν^{-1}(v)),    (3.20)

where t_{ν,S} is the joint CDF of the bivariate t-distribution with ν degrees of freedom and scale matrix S, and t_ν^{-1} is the inverse CDF of the univariate t-distribution with ν degrees of freedom.
Fig. 3.3 Samples from uniform random variables X ∼ U(−1, 5) and Y ∼ U(2, 3) joined by a normal copula with ρ = 0.8. From these 10^4 samples, the empirical value of τ and the predicted value from Eq. (3.19) are also shown
Fig. 3.4 Samples from uniform random variables X ∼ U(−1, 5) and Y ∼ U(2, 3) joined by a t-copula with r = 0.8 and ν = 4. From these 10^4 samples, the empirical value of τ and the predicted value from Eq. (3.19) are also shown
The samples from the t-copula in Fig. 3.4 are more spread out than those from the normal copula; this is due to the fact that the t-distribution with a small value of ν has more kurtosis than a normal distribution. Therefore, it is more likely to produce extreme values as samples. The fact that the t-copula has tail dependence can also be observed in this figure in the concentration of points near the lower left and upper right corners.
The tail dependence can be seen even more clearly if we use a t-copula to
couple two normal random variables. In Fig. 3.5 the t-copula and normal copulas
are compared. Here, we see that the tail dependence appears as the area that the
samples occupy narrowing as the upper right and lower left corners are approached
in the t-copula, but this is not present in the normal copula. This discrepancy in the
tails exists even though both distributions have the same value for τ and the same
marginal distributions for X and Y . The change in the underlying distribution as a
function of r and ν is shown in Fig. 3.6. In this figure two standard normals are
joined by a t-copula. As τ increases the tail dependence between the distributions
increases.
Fig. 3.5 Samples from standard normal random variables X ∼ N(0, 1) and Y ∼ N(0, 1) joined by a t-copula with r = 0.8 and ν = 4 (left) and the normal copula with ρ = 0.8 (right). From these 10^4 samples, the empirical value of τ and the predicted value from Eq. (3.19) are also shown. Note the tail dependence in the t-copula that is lacking in the normal copula: when one variable is close to ±4, the other variable is also likely to be close to ±4
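The t-copula samples above can be generated with a standard construction (an assumption here; the text does not spell out an algorithm): draw correlated normals, divide them by an independent sqrt(χ²_ν/ν) factor, and push the result through the t CDF.

```python
import numpy as np
from scipy import stats

def t_copula_sample(n, r, nu, ppf_x, ppf_y, seed=None):
    """Sample from a t-copula: correlated normals scaled by an
    independent chi-square draw, then mapped through the t CDF."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.array([[1.0, r], [r, 1.0]]))
    z = rng.standard_normal((n, 2)) @ L.T
    w = rng.chisquare(nu, size=(n, 1)) / nu
    t = z / np.sqrt(w)                       # bivariate t samples
    u, v = stats.t.cdf(t[:, 0], nu), stats.t.cdf(t[:, 1], nu)
    return ppf_x(u), ppf_y(v)

# Standard normal marginals joined by a t-copula, as in Fig. 3.5 (left)
x, y = t_copula_sample(20_000, 0.8, 4, stats.norm.ppf, stats.norm.ppf, seed=1)
```

Because the t-copula is elliptical, its Kendall's τ also satisfies τ = (2/π) arcsin r, which is why Eq. (3.19) predicts τ in both panels of Fig. 3.5.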
3.2.3 Fréchet Copulas

The Fréchet copulas CL and CU are simple copulas that join random variables with Spearman correlation ±1. Furthermore, any other copula is bounded by the relation CL ≤ C ≤ CU. The Fréchet copulas are

CL(u, v) = max(u + v − 1, 0),    CU(u, v) = min(u, v).

CL will give perfect negative dependence between variables, and CU will give perfect positive dependence between variables. We can then combine Fréchet copulas to describe a joint distribution with a Spearman correlation in [−1, 1]:

C(u, v) = A CU(u, v) + (1 − A) CL(u, v),    A ∈ [0, 1].

This simple combination gives a Spearman correlation of 2A − 1.
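A quick way to see the 2A − 1 result empirically: sampling the mixture amounts to taking the comonotone pair (V = U) with probability A and the countermonotone pair (V = 1 − U) otherwise. (This sampling scheme is my own illustration, not from the text.)

```python
import numpy as np
from scipy import stats

def frechet_mixture_sample(n, A, seed=None):
    """Sample (U, V) from the mixture A*CU + (1 - A)*CL:
    with probability A set V = U, otherwise V = 1 - U."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    take_upper = rng.uniform(size=n) < A
    v = np.where(take_upper, u, 1.0 - u)
    return u, v

u, v = frechet_mixture_sample(50_000, 0.75, seed=0)
rho_s, _ = stats.spearmanr(u, v)   # should be near 2A - 1 = 0.5
```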
Fig. 3.6 Samples from standard normal random variables X ∼ N(0, 1) and Y ∼ N(0, 1) joined by a t-copula with several values of r and ν. The value of ν is constant in a row, and the value of r (and the corresponding τ) is constant in each column
3.2.4 Archimedean Copulas

Another important family, the Archimedean copulas, are defined by a generator function, ϕ(t) for t ∈ [0, ∞). Given a generator, we define the quasi-inverse

ϕ̂^{-1}(t) ≡ { ϕ^{-1}(t)   0 ≤ t ≤ ϕ(0)
            { 0           ϕ(0) < t < ∞.    (3.23)
With the generator and quasi-inverse, the Archimedean copula for ϕ(t) is

C(u, v) = ϕ̂^{-1}(ϕ(u) + ϕ(v)).    (3.24)

The term Archimedean arises from the development of the triangle inequality for probability spaces; in that context Archimedes of Syracuse’s name is attached to a particular norm that has the form of Eq. (3.24).
Archimedean copulas are commutative,

C(u, v) = C(v, u),

and associative,

C(C(u, v), w) = C(u, C(v, w)).

The associative property will be used later to easily create Archimedean copulas for arbitrary numbers of variables.
Furthermore, an Archimedean copula can be related to Kendall’s tau via the
formula
τ(U, V) = 1 + 4 ∫₀¹ [ϕ(t)/ϕ′(t)] dt.    (3.25)
There are many Archimedean copulas one could define; we will discuss two
below that are commonly used.
One common Archimedean copula is the Frank copula. This copula has a single parameter, θ ≠ 0, and a generator function given by

ϕF(t) = − log[(e^{−θt} − 1)/(e^{−θ} − 1)].    (3.26)
The inverse is

ϕ̂^{-1}(t) = −(1/θ) log[1 + e^{−t}(e^{−θ} − 1)].    (3.27)
This makes the copula

CF(u, v) = −(1/θ) log[1 + (e^{−θu} − 1)(e^{−θv} − 1)/(e^{−θ} − 1)].    (3.28)
Fig. 3.7 The value of τF for the Frank copula as a function of θ
One property of the Frank copula is that as θ → ∞, the copula becomes the
upper Fréchet copula: CF → CU . As θ → −∞, then the Frank copula approaches
the lower Fréchet copula: CF → CL .
The value of Kendall’s tau for a Frank copula can be calculated from Eq. (3.25) as

τF(U, V) = 1 − [2(3θ² − 6iπθ + 6θ − 6θ log(e^θ − 1) − 6Li₂(e^θ) + π²)]/(3θ²),    (3.29)

where Li_s(z) is the polylogarithm function. A table for matching a desired value of τF to θ is given in Table 3.1. Additionally, the value of τF as a function of θ is
shown in Fig. 3.7. The Frank copula has a tail dependence of zero. Samples from standard normals joined by a Frank copula are shown in Fig. 3.8, where we observe
the lack of tail dependence.
Fig. 3.8 Samples from standard normal random variables X ∼ N(0, 1) and Y ∼ N(0, 1) joined by a Frank copula with θ = 7.92964, chosen to get τ = 0.6 (empirical τ = 0.601). Note the lack of tail dependence in the lack of concentration near the upper right and lower left corners. Note that relative to the normal copula and the t-copula, these points form a rectangular-shaped band. The lack of tail dependence is also apparent in the lack of points along the diagonal
In Fig. 3.9, samples from the Frank copula are shown with the values of θ given in Table 3.1. In this figure we can see that as θ gets larger, the distribution is pinched in the middle, but the tails of the distribution remain spread out.
The Clayton copula has a single parameter, θ > 0, with generator function

ϕC(t) = t^{−θ} − 1    (3.30)

and inverse

ϕ̂^{-1}(t) = (1 + t)^{−1/θ}.    (3.31)

This makes the copula

CC(u, v) = (u^{−θ} + v^{−θ} − 1)^{−1/θ}.    (3.32)
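Eq. (3.25) can be checked numerically for this generator. A sketch (the closed form τ = θ/(θ + 2) for the Clayton copula is a standard result, not stated above, and is easy to verify by hand since ϕ/ϕ′ = (t^{θ+1} − t)/θ):

```python
from scipy.integrate import quad

theta = 3.0

def phi(t):    # Clayton generator, Eq. (3.30)
    return t**-theta - 1.0

def dphi(t):   # derivative of the generator
    return -theta * t**(-theta - 1.0)

# Kendall's tau from Eq. (3.25); quad never evaluates the endpoints,
# so the t -> 0 blow-up of phi and dphi individually is harmless
integral, _ = quad(lambda t: phi(t) / dphi(t), 0.0, 1.0)
tau = 1.0 + 4.0 * integral
# Clayton closed form: tau = theta/(theta + 2), here 0.6
```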
l l
l l l
l
l l ll l l l l l
l l l
l l ll
2 ll l
l l l l ll l
l l l l
l l
l l l l l l l l
l l l l l l lll
ll l l l l l l
l l
lll
ll l l ll l l l l l
l l
l l l ll l l l l ll l l l l l l
l l l l l lll l l lll l lll
l l l l ll l l
l
l l ll l l
ll ll lll
l l
l l l l
l l
l l l l ll l l l
l l l ll ll ll l l l l l
ll l l
l l l l
l lll ll l l
lll l l l lll l l ll l l l lll l l ll l l l ll
l ll llll l l
lll lll
l l l l llll l
l l l l
ll l ll l ll ll llllll lll l l l l lllll l
l l ll l l
l l l l l ll
l l ll l l
l l l ll lll lll
l l l
l
llll
l ll l
l l l
l ll
l
ll l
l l ll lll
l ll ll l l l
l l l l l l ll l ll l l l l llll ll l l
l l lll l l l l l l ll llll llll l l
l lll l
lll
ll
l ll ll l l ll l ll l
ll
llll l ll l
l l
l l ll l ll l l l l
ll l l l lll ll
l
llll lll l l l l
l l l ll l l
ll lll ll l
l
l l l l
ll l l ll ll ll l ll l ll
l l ll ll l l l l l
l l ll lll l l
l l ll l ll l l
l ll l l l l l
ll
ll l llll l l l l l l
l ll ll ll l ll
lll l l l l
llll l ll l l l lll l l l l ll l l llll lll l
lll ll ll
lllllll
l lll lll lll ll l
l
ll ll l
l
l llllll ll lllll lll l ll l l
l ll l
l l l l ll
l
l ll lllll
llllll
l
l
ll l ll
l ll lll l ll
l l llll l lll lll
lll llll l
lll lll l l
l l l
l lllll l lllll ll l l l l ll l ll lll l l
lll ll
lllllll l
l ll l
ll l l l lll l lll l ll l l l l l l lll
llllll
ll l l lll l
l ll lll llll l l l l ll l l l ll l l
l ll l
llll lll l
l l
l l llllllll l l l lllll ll lll ll l l ll l ll
l lll lllll l
llll lll l l l l ll l l lll lll
ll l ll l l ll l lll llll l
ll ll
l
ll l ll
l l l l lll l ll l ll ll
ll l ll l l l
lll
l l
ll
l
l lll
llll
l lll
ll
ll l
lll l l l l ll lllllll llllll l l
lll ll
llll
l ll
l l
l lllll
lll l l
llllllll lllll l
llll
ll
l
ll l lll l l lll
ll
lll l ll
l l lll l lll
l
l
ll l l ll
l
l l ll
llllll ll l
l ll ll l
l l l l ll l
ll
ll llllll
ll
l ll
l
l ll l ll
lllll l
l l l ll
lll
l ll
ll
ll
llll ll l l
lll l l l l ll l l ll ll ll l lll lll l ll
l
ll ll lll l ll
lll
l l llll ll
llllll
l lll ll
l l lllll ll
llll
ll
l l ll lll
0 ll l l l l l l ll l
llll l ll ll l
l ll
lll
ll
llll ll ll lll l
ll l
l ll llll
ll
l
l ll lll lllll
l
llll
l l l ll lll ll l lllll ll l l l l l ll l l
l
lll ll
ll
l
ll llll ll
l ll l ll lll l
lll
l l
l
l lll
lll
llll
lll l ll l ll l ll l l ll l l lll l l l ll
lllll ll l llll lll
llll
ll lllll
lll ll ll l ll l
l l
lll ll lll l l ll
ll l
llll
ll
l ll
lll l
ll lll lll
llllll l
l
lll ll
l
ll
l l
lll ll
llllll ll ll lll l ll l llll
llll
lllllllll
ll l l
l l
l l
lll
l ll
l ll ll ll
l l ll ll l l lll lllll
lll l l l l llll
l ll
ll
l l
l lll lll
lll
ll l
lll ll
l
l l
lll ll
l l l
ll l l ll
l ll ll l l ll l l l
l lll
l
llllll
ll
lll
lll lll l
l
lll l ll
ll
ll
lll
l l
l
l
l l
l l l
lll
ll l ll l l l l l ll l ll lll
l l lll
lll ll
lll lllll l l
ll l l ll ll
ll l l lllll
ll ll l
l
l l l ll ll ll l
l l l l llllllll l
l ll ll l
l l
ll l
l
l l l ll l lll l ll ll l
l
l ll
ll
lll
lllllll l l ll ll l l ll
lll
l
llll llll l
lll
l
l l l l llll
l l
llllll l
lll
llllllll ll l
lll ll l l l lll l llll
l ll ll l l
l l llll
l ll
l
l
ll
l llll
l
l l
l l
l ll l l l l
l llllll
l lll l llll l lll l lll l l l
l l l ll llllllll l
l l
lll l llll l
l lll llll
ll
ll l
lll lll l
lllll
l
lll
llll
l
lll
ll
l
l
l
l
l
ll
lll l l
l ll llll l l l l ll ll l l l l l l
llllll l l
l l l ll l
l ll l ll
l ll l
l
l l l ll ll l l l l l l ll lll lll
ll l
llllll ll l
l ll ll l l l l
l lll ll l l l ll l l l ll lllll l ll ll l ll
l ll llll
l
l
l l l l l llll l ll lll l
l l l l l ll l l l lllll lll
ll
ll
l
ll
l l
l ll l ll llll lll l l ll l l lll
ll llllllll l
ll
l
l
lllllllll
ll ll ll llll l l l lll l l l ll l
ll l l l l l llll l
l lll
ll l l l l l ll l l l ll l ll ll ll
ll
l lll l l
l
l l l ll lll
l
l l ll lll l l l
l ll l l l
ll ll l
ll
l l l l l l l l ll
ll
l
ll lll l l l l l lll l
l l l
l
l l l ll l l l
ll l lll
l
ll
l l
ll l ll lll
l l l l lll lll ll l l l l l l lll
l l
l
l l l lll ll l l l l lll llll l l l l
ll l l l
l ll l ll ll
l l ll l l l
ll l
ll l ll l l l
l l
llll
l l l l l l ll ll l l
l l l l
l l l ll l ll l l l l
ll l l l l l
l
l l
l l l l ll l l l l l l ll l ll
l ll l l l l ll
l ll l l l l ll ll l ll ll ll l l
ll l l ll
l l
l llll l l l l
l ll l l l l
l ll ll lll lll l l l l ll l l l ll
l l l l l l l l l l ll l l l
l l l
l ll l
l
l ll l
l l ll l ll l
l ll l ll lll ll
l ll l ll l l
l l l l l l lll l l
−2 l l l l l ll ll l
l l l l l ll l l l l
l ll l ll l l
l
l l ll l ll l
l l l l l l l l
l l
l l l l
l l l l l l l l
l ll
l l l l
l lll l
l l l
l l l
l ll
l l l
l l l l
l l l l
−4
l
−4 −2 0 2 −4 −2 0 2 −4 −2 0 2
x
Fig. 3.9 Samples from standard normal random variables X ∼ N (0, 1) and Y ∼ N (0, 1) joined
by a Frank copula with several values of θ taken from Table 3.1
The Clayton copula has Kendall's tau for the resulting joint distribution given by

τ_C(U, V) = θ / (θ + 2). (3.33)

Additionally, the Clayton copula has zero upper tail dependence and nonzero lower
tail dependence:

λ_l = 2^{−1/θ}. (3.34)
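Equation (3.33) can be inverted to choose θ for a target rank correlation: θ = 2τ/(1 − τ). The following minimal sketch (the function names are illustrative, not from the text) recovers the θ = 3 used to obtain τ = 0.6 in Fig. 3.10 and evaluates the lower tail dependence of Eq. (3.34):

```python
def clayton_theta_from_tau(tau):
    """Invert Eq. (3.33), tau = theta / (theta + 2), for theta."""
    return 2.0 * tau / (1.0 - tau)

def clayton_lower_tail_dependence(theta):
    """Lower tail dependence of the Clayton copula, Eq. (3.34)."""
    return 2.0 ** (-1.0 / theta)

theta = clayton_theta_from_tau(0.6)          # theta = 3 reproduces tau = 0.6
lam = clayton_lower_tail_dependence(theta)
print(theta, lam)  # theta ≈ 3.0, lambda_l = 2^(-1/3) ≈ 0.794
```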
70 3 Input Parameter Distributions
[Fig. 3.10 graphic: scatter plot with marginal density histograms; panel title τ = 0.6, θ = 3]
Fig. 3.10 Samples from standard normal random variables X ∼ N (0, 1) and Y ∼ N (0, 1)
joined by a Clayton copula with θ chosen to get τ = 0.6. There is strong lower tail dependence in
the samples and zero upper tail dependence
We can use the Clayton copula to produce joint distributions with upper tail
dependence and no lower tail dependence by using the copula C_C(1 − u, 1 − v).
In Fig. 3.10 two standard normals are joined by a Clayton copula and the strong
lower tail dependence can be seen.
The Clayton copula with different values of θ, corresponding to values of
Kendall's tau from 0.1 to 0.9, is shown in Fig. 3.11. As θ increases, the shape
of the distribution becomes more tapered in the middle and the lower tail
dependence becomes more prominent, as predicted, making the samples form something akin
to the celebrate emoji.
3.2 Copulas 71
Fig. 3.11 Samples from standard normal random variables X ∼ N (0, 1) and Y ∼ N (0, 1)
joined by a Clayton copula with several values of θ
We have discussed how to sample from the t- and normal copulas, but these
procedures do not extend to general copulas. There is a straightforward way
to produce samples from a joint distribution produced by copulas. Consider the
marginal CDFs for random variables X and Y, F_X(x) and F_Y(y), and a copula
C(u, v). The procedure to produce samples from the joint distribution given by
C(F_X(x), F_Y(y)) is:
1. Produce two independent uniform random variables ξ1 and ξ2 where ξi ∼ U(0, 1).
2. Set w ≡ C^{−1}(ξ2 | ξ1), where C(v|u) is the conditional CDF of V given U = u.
3. Then the samples x and y are x = F_X^{−1}(ξ1) and y = F_Y^{−1}(w).
This sampling procedure is simple to perform, with the possible exception of not
knowing C^{−1}(v|u). In this case we can use a nonlinear solver to perform the
inversion.
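The three steps above can be sketched in code. The following is a minimal illustration (not from the text) that applies the procedure to the Clayton copula with standard normal marginals, using simple bisection as the nonlinear solver for C^{−1}(v|u); the conditional CDF C(v|u) = ∂C/∂u is written out analytically for Clayton, and the empirical Kendall's tau of the samples is compared against Eq. (3.33):

```python
import random
from statistics import NormalDist

THETA = 3.0  # Clayton parameter; Eq. (3.33) gives tau = theta/(theta + 2) = 0.6

def clayton_conditional(v, u, theta=THETA):
    """C(v|u) = dC/du for the Clayton copula C(u,v) = (u^-t + v^-t - 1)^(-1/t)."""
    return u ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)

def conditional_inverse(xi, u, tol=1e-10):
    """Solve C(v|u) = xi for v by bisection; C(v|u) is increasing in v."""
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clayton_conditional(mid, u) < xi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_pair(rng):
    """Steps 1-3: uniforms -> conditional inversion -> inverse marginal CDFs."""
    xi1, xi2 = rng.random(), rng.random()
    w = conditional_inverse(xi2, xi1)
    phi_inv = NormalDist().inv_cdf  # standard normal marginals, as in Fig. 3.10
    return phi_inv(xi1), phi_inv(w)

rng = random.Random(42)
pairs = [sample_pair(rng) for _ in range(500)]

# Empirical Kendall's tau: fraction of concordant minus discordant pairs.
n = len(pairs)
conc = sum(
    1 if (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1]) > 0 else -1
    for i in range(n) for j in range(i + 1, n)
)
tau_hat = conc / (n * (n - 1) / 2)
print(tau_hat)  # should be close to 0.6
```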
As a demonstration we will show how this works for the Frank copula. This
is a case where the inverse of the conditional CDF, C^{−1}(v|u), can be explicitly
calculated. For the Frank copula, we have

C_F(v|u) = e^θ (e^{θv} − 1) / (−e^θ + e^{θ+θu} − e^{θ(u+v)} + e^{θ+θv}). (3.35)
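Solving Eq. (3.35) for v gives a closed-form inverse; an equivalent standard form (this algebra is a derivation, not quoted from the text) is v = −(1/θ) ln[1 + ξ(e^{−θ} − 1)/(e^{−θu} − ξ(e^{−θu} − 1))]. A quick round-trip check that this inverts (3.35):

```python
import math

THETA = 3.0

def frank_conditional(v, u, theta=THETA):
    """C_F(v|u) as written in Eq. (3.35)."""
    num = math.exp(theta) * (math.exp(theta * v) - 1.0)
    den = (-math.exp(theta) + math.exp(theta + theta * u)
           - math.exp(theta * (u + v)) + math.exp(theta + theta * v))
    return num / den

def frank_conditional_inverse(xi, u, theta=THETA):
    """Closed-form solution of C_F(v|u) = xi for v (derived from Eq. (3.35))."""
    a = math.exp(-theta * u)
    return -math.log(1.0 + xi * (math.exp(-theta) - 1.0) / (a - xi * (a - 1.0))) / theta

# Round trip: invert, then evaluate the forward conditional CDF.
u, xi = 0.3, 0.7
v = frank_conditional_inverse(xi, u)
print(v, frank_conditional(v, u))  # second value recovers xi = 0.7
```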
The idea of a copula can be extended to more than two random variables. For this
discussion we will have a collection of p random variables X = (X1, . . . , Xp)^T.
Each of these random variables has a known marginal CDF F_{X_i}(x_i). A copula, C,
on this collection of random variables is a function that maps a p-dimensional vector
u with each component in [0, 1] to a nonnegative real number. With this copula we
then define a joint CDF for X as C(F_{X_1}(x_1), . . . , F_{X_p}(x_p)). The independence
copula, for example, is

C_I(u) = ∏_{i=1}^{p} u_i. (3.38)
For both of these copulas, we have already given a procedure for sampling from
the joint distributions. The algorithms that we discussed earlier need to draw a
p-dimensional sample from the multivariate normal instead of a two-dimensional
one, and the rest of the algorithm proceeds naturally. The multivariate extensions
of these copulas will have the correlation between variables specified by the R and
S matrices.
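The multivariate normal-copula sampling just described can be sketched as follows (the correlation matrix R, the choice of p = 3, and the exponential marginals are illustrative assumptions, not from the text): draw p correlated standard normals with correlation R, push each component through the standard normal CDF to get uniforms, then apply the inverse marginal CDFs.

```python
import math
import random
from statistics import NormalDist

def cholesky(a):
    """Lower-triangular Cholesky factor of a small SPD matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

# Illustrative correlation matrix R for a normal copula on p = 3 variables.
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
L = cholesky(R)
nd = NormalDist()
rng = random.Random(7)

def sample_copula_uniforms():
    """Draw z ~ N(0, R), then map each component through the normal CDF."""
    e = [rng.gauss(0.0, 1.0) for _ in range(len(R))]
    z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(R))]
    return [nd.cdf(zi) for zi in z]

u = sample_copula_uniforms()
# Each u_i is uniform on (0, 1); apply inverse marginal CDFs, e.g. Exp(1):
x = [-math.log(1.0 - ui) for ui in u]
print(x)
```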
Archimedean copulas in higher dimensions also have a natural extension. These
copulas can be written as

C(u_1, . . . , u_p) = φ^{−1}(φ(u_1) + · · · + φ(u_p)),

where φ is the generator of the copula. Note that each generator needs to have the
same value of θ in this definition. This means that Kendall's tau, and perhaps the
tail dependence, will be the same for all the variables.
F(k) = (1 − e^{−θ})^k / (kθ), k = 1, 2, . . . .
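The weights F(k) form a probability mass function (the logarithmic distribution): with q = 1 − e^{−θ}, the series Σ_{k≥1} q^k/k = −ln(1 − q) = θ, so dividing by θ normalizes it. A quick numerical check (illustrative, not from the text):

```python
import math

theta = 3.0
q = 1.0 - math.exp(-theta)  # 0 < q < 1
# Partial sum of F(k) = q**k / (k * theta); the tail beyond k = 5000 is negligible.
total = sum(q ** k / (k * theta) for k in range(1, 5001))
print(total)  # ≈ 1.0
```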
1 Here we use the notation for a gamma random variable as given in Sect. A.13.
[Pairs-plot graphic of copula samples for variables x1 through x5; visible panel annotations: τ = 0.6066266, τ = 0.5947948, τ = 0.5977217]
ll l
l
l
ll l
l l l
l
ll l
l
l
l
l
l
ll ll ll
ll l l l
lll l
l
ll
ll
l
l
l l l
l
l
l
l l
l
l l
l
l
l
l
ll l
l
ll
l ll
ll l l ll l ll l l l
ll l
l
ll l
l l l l
l l
ll l l l l
l
ll
l
l
l
l
ll
l
ll
ll l l
l ll lll
ll
l
l
l ll l l l
l
l
l ll
l
l ll
l l
ll l ll llll
l
l l
ll
ll
l l
l l l l
l ll
l l l l lll l l l l l l
ll l ll ll ll l l l l
l l ll l l ll l
l l lllll ll ll ll ll l
l ll l
l ll lll
l l
l l
ll l
l lll l l ll lll l ll l ll l ll l l l l lllll l l l l l l llll ll l l
l l
l l l l
ll llllll l
l l l ll ll ll l
l l l
l llll l l ll l l llll
l
lll l l
l l l l
ll l l ll
l
ll l ll
ll l l lll l l lll lll
l l l l lll
ll l
l l ll l l
l l ll
l l l ll
l l l l l l l ll llll l ll ll ll l l l l ll l l l ll l
l l lll ll l l
l l
lll
ll l ll
l
ll l l lll ll l l l l ll l l
l l ll l l l l
ll l ll l
l l l l ll l ll l
ll l l lll l
ll l ll ll l l lll l l ll l ll l ll
l llll lll
l lll l l l l lll l llll
lllll l l l l l l ll l l lllllll l l l
l ll
lll ll ll ll l l
ll lll ll ll l ll ll l
l l l l l l l l ll l ll l ll lll l lllllllll l
l llll lll ll l ll
l l ll l l l l
l
lll ll l ll
ll l l l l ll l l l l l l l l
l llllll l llll
lllll l ll llll l l l
l l
llllll lll
ll lllll
l l ll ll ll l l llll lll ll
lll ll ll l l l l
llll l lll lllllll lll ll ll l l ll
lll
l ll
l lll ll l ll
ll l l l l
l l
ll ll l ll
ll
ll llll
l l ll l l l l l
l ll l
ll l
l ll
l l ll l l
ll ll l lll ll
ll
l l lll l
l
l ll ll l
l ll l
lllll ll ll l l l
l lll ll
l ll ll llll ll l ll l l l llll lll l llllllll ll l l
l
l l l l l l ll
l llll ll l ll
l l l l l l
lll l l ll lll l l l
l l l ll ll l l l ll
l l l l
lll l lll
l ll l l
ll l ll l l ll l l l l l
ll
l
llll ll
ll l ll l ll l lll llll
l l l lll l l l l
l ll l l ll l l ll l
lllllll l
lll
l
l l l l llll ll
l l
l llll lll ll
ll l lll
ll l ll l ll l
l l
ll
lll
l l ll l
l lll l l lll lll
lll l l ll ll l l l l l l l ll lll
ll l ll ll l
x5
ll lllll l l l l l l l ll ll llll l ll ll l l
l lllll l ll l ll l l ll l
l
l l l
llllll l l llll l
ll
ll l
l l
llll l
ll
ll ll l l
ll ll l l l
l
l llllll l ll l ll l l l
lll l l ll lllll ll ll ll
lll l
l ll l l l llll l
ll lll l ll l llll ll lll ll l l l lll l
l l
ll lll ll lll l
l ll l lll l l ll
l
llll
ll
ll
l
l
lll ll ll l l l l l l ll ll lll lll
lllll
l ll ll lll l l l l l ll l lll ll ll ll l
llll ll l
ll lll
l l l l l l l
lll lll l l l l
lllll
l
l ll
ll ll l l ll l lllll ll
lll ll l lll l ll l l
l l ll ll ll lll
ll l llll l
l
ll ll l ll l l l ll l
ll ll l l l l l
l l lllll l
0
ll ll ll l llll ll l ll
l l ll
ll
l l ll l ll l l ll
lll ll l l l ll
l l l
lll ll
ll ll ll l ll l l l ll ll
l ll
ll ll
l
ll
l
l lll l l l l l l
ll l ll
lllll l l
ll
ll l
ll l
l
ll lllll
l lll
l l l l
l l
ll l l ll
ll lll l
l
lll
ll
l l
l
lllll ll lll
lll
l
l ll ll l l l l ll l
lll l ll l
llll lllll l l l
ll l lllll
l ll
l llll
l l
llll l
ll l l
l
l ll
l
l l
l
l l lll ll l l
l ll
l
lll
lll l l ll
l
l
l
l lll ll
l l l l ll l lll l lll lllll
ll
l ll ll
l
lll l l l l l l lll l ll
l llll l lllllllllll
l l
ll ll l l ll
l
l
lll l l ll
l
ll
l
ll ll l ll lll ll l l l l l ll lll l l llll ll llll l l lll
lll
l
ll l lll
l llll
lllllll
l l l ll
l l l ll l ll
ll lllll llll
llll ll l l l ll l l l l lll ll lllll
l ll lllll
l l ll l ll ll ll l l l lll
l ll ll llllllllll llll l l ll l
l l l
ll llll
l l
llll ll
ll ll l l l l
ll l l ll
ll ll ll
l
lll ll l l
lll l l l l ll ll l ll ll l
l
lll l l
ll l ll
ll
l l l
l llll l lll
l
l
l ll
l l ll
ll l ll
l l llll
ll lll
l l l ll l ll l l ll ll l lll llll
l l
l
lll ll
l
l l l l l ll l l l llll l
llllllll l ll
ll ll ll l
l lll l ll lllll
ll l
l l llll
l l
l ll l
l
l
l
llll
llllll
l lll
ll l ll
l ll
l ll l
ll
ll l
ll
l
ll ll lll
l l ll ll ll
l l l
lll llll
l l
l
l
l lll l l
l
llll l
llll
l l llll ll
l
l
l
ll
lll
l l ll
llllllll ll
ll l ll lll l
l
ll
l l
llll l ll l lll lll l
lll l ll l ll ll
l
l
l ll l ll l l
ll
l
ll l lll l
lllll ll l l l
llll l l l
l ll l
ll
ll l
l l l l ll ll l l
llll l
llll l lll l l l l l l
ll
ll
lll
l
ll l
l l l ll l lll lll l
lll llll l l l
lllll
lll l ll l ll ll ll lll ll lllll l l l
l ll l l l l lll lll
l lll
l
ll ll
llllll l lll ll
l l l l l ll lllll
l
l ll
llll ll
l
l l l l l l l
l
ll l l
llll lll lll l lll
l l l
ll
l lll ll l
lllll lll
lll lll l l
l l ll
ll
l
ll ll llll l lllll l l l
l
llll l l l lll l lll l
ll
ll ll l ll
l l l l ll l l l l llll ll ll l
ll l
ll l l
lllll
ll
l l
ll
lll
l
ll ll l l l ll l
llllllllll
ll llll
lll
ll ll l l l lll
lll l
llllll
l
l
lllllll
llllllllll l l l l ll
l
l l
lll
ll
lll
lllll
ll
lll ll l l l
l l
ll
l
l
l
llllllll
lll l l lll ll l
l ll l l l
l lll
ll
ll
l ll lll
l l
l l l ll
l l l
llll
llll lll
l l l l ll
llll ll
lll
lll
l llll
l l ll l
llll l
llll
lllll l l l l l l lll l l ll
l ll l l l l l
lllll
l ll llll llll lllll l l ll ll llllll lll l
l
ll l
l
l lllllll ll l l l
ll ll ll
lll
ll ll l l l ll lllll lll
l l l l l l ll ll ll lll l lll
l ll ll lll
llllll ll l l
l ll
l
l llll llll l ll l l l ll
l ll lllll lllll l l ll l ll lllllll l
lll lll
ll ll l lll ll ll ll ll l l lllll l
l l l
ll llll
l l ll l lll ll ll lll llll ll l l ll lllll l ll l
llllll
ll
l l l ll ll l lll l l l llll ll l ll lll
l l l
lllll
l
l ll
l lll l l l
ll llll ll l l l l l ll llllll l
llll
l ll l l l ll lll
ll l ll l lll ll l l l lll lllll ll ll
l ll l ll l l lllll
ll ll lll l l l ll
l
l
l
ll l
lllll l llllll l
l ll llll ll l ll lll ll ll
ll
l l l lll ll l l l l l ll
l l ll l
l l l llllll llll l l l l l lll l l ll ll
llll l ll l
l lll ll l l l
l l l ll ll l l lllll
−2 l
l
l
l
l l ll
ll l
l
l lll l
l l
l l
lll l
l l
l l
lllll
l
ll l
l
l
l
l ll
l l ll
l
l ll l
ll
l
l
l llll l
l llllll
ll ll l l
l
l
l
l
l
l
l
lll
l ll l
l
l
ll l
ll ll l l l
l
l
l
l l l l
l l l l
−4
−4 −2 0 2 4 −4 −2 0 2 4 −4 −2 0 2 4 −4 −2 0 2 4 −2 0 2
Fig. 3.12 Samples from five standard normal random variables joined by a Clayton copula with
θ = 3. Note how the Kendall’s tau value between each pair of variables is constant
[Scatter-plot matrix for x1, x2, x3, x4, x5 omitted; the panel annotations include the sample Kendall's tau values τ = 0.5642763, τ = 0.3486366, and τ = 0.3285285.]
Fig. 3.13 Samples from five standard normal random variables joined by a normal copula with
a correlation matrix that is not uniform. Note how the Kendall’s tau value between each pair of
variables is different
$$
R = \begin{pmatrix}
1.00 & 0.75 & 0.50 & 0.25 & 0.12\\
0.75 & 1.00 & 0.50 & 0.25 & 0.12\\
0.50 & 0.50 & 1.00 & 0.12 & 0.50\\
0.25 & 0.25 & 0.12 & 1.00 & 0.12\\
0.12 & 0.12 & 0.50 & 0.12 & 1.00
\end{pmatrix}.
$$
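Samples with this dependence structure can be drawn directly: for standard normal marginals, a normal copula with correlation matrix R is simply a multivariate normal. A minimal numpy sketch (the seed and sample size are illustrative, not from the text) draws such samples and checks a pairwise Kendall's tau against the closed form for the normal copula, τ = (2/π) arcsin(ρ):

```python
import numpy as np

# Correlation matrix R from the text (normal copula of Fig. 3.13)
R = np.array([[1.00, 0.75, 0.50, 0.25, 0.12],
              [0.75, 1.00, 0.50, 0.25, 0.12],
              [0.50, 0.50, 1.00, 0.12, 0.50],
              [0.25, 0.25, 0.12, 1.00, 0.12],
              [0.12, 0.12, 0.50, 0.12, 1.00]])

rng = np.random.default_rng(7)            # illustrative seed
L = np.linalg.cholesky(R)                 # R = L @ L.T
x = rng.standard_normal((500, 5)) @ L.T   # correlated standard normals

def kendall_tau(u, v):
    """Sample Kendall's tau for tie-free data (O(n^2); fine for small n)."""
    cu = np.sign(u[:, None] - u[None, :])
    cv = np.sign(v[:, None] - v[None, :])
    n = len(u)
    return (cu * cv).sum() / (n * (n - 1))

# For a normal copula, tau = (2/pi) * arcsin(rho) pair by pair, so the
# tau values differ from pair to pair, as the figure caption notes
tau_hat = kendall_tau(x[:, 0], x[:, 1])
tau_theory = 2 / np.pi * np.arcsin(R[0, 1])
```

With ρ = 0.75 the closed form gives τ ≈ 0.54, consistent with the sampled value reported in the figure panel.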
In this section we will discuss a way to create uncorrelated random variables from a set of correlated random variables. We will do this using the singular value decomposition of the data. This procedure is known by several names, including principal component analysis, the Hotelling transform, and proper orthogonal decomposition. It can be used to reduce the dimension of a data set by revealing a set of uncorrelated random variables that produce the observed correlated random variables.
Consider that we have a collection of p random variables X, with n samples of these random variables and n > p. We can assemble these samples into an n by p matrix (n rows and p columns) of the form
$$
A = \begin{pmatrix}
x_1^{(1)} & x_2^{(1)} & \cdots & x_{p-1}^{(1)} & x_p^{(1)}\\
x_1^{(2)} & x_2^{(2)} & \cdots & x_{p-1}^{(2)} & x_p^{(2)}\\
\vdots & & & & \vdots\\
x_1^{(n)} & x_2^{(n)} & \cdots & x_{p-1}^{(n)} & x_p^{(n)}
\end{pmatrix}. \tag{3.42}
$$
We also assume that the matrix A is such that each column has a mean of zero; this can be done by subtracting the mean of each column from that column.
The matrix A will be rectangular in general, n ≠ p. For such a matrix, we can factor it into what is known as the singular value decomposition (SVD):
$$
A = USV^T. \tag{3.43}
$$
In this decomposition:
• U is an n × p matrix with orthonormal columns, i.e., UᵀU = I, where I is the p × p identity matrix.
• S is a p × p diagonal matrix with nonnegative entries.
• V is a p × p orthogonal matrix, i.e., VᵀV = VVᵀ = I.
The singular value decomposition is related to the eigenvalue decompositions of AAᵀ and AᵀA. To see this, we left-multiply Eq. (3.43) by Aᵀ to get
$$
A^T A = (USV^T)^T USV^T = VSU^T USV^T = VS^2V^T,
$$
and similarly
$$
AA^T = US^2U^T.
$$
3.4 Random Variable Reduction: The Singular Value Decomposition 77
The diagonal entries of S are the singular values, the square roots of the eigenvalues of AᵀA, and we order them so that
$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r,
$$
where r is the rank of A. Using the matrix V, we can transform the data into a new set of variables:
$$
T \equiv AV.
$$
The resulting matrix has columns that are linear combinations of the original columns. The columns of T will have zero covariance between them. This can be seen by multiplying T by its transpose: the matrix TᵀT is the covariance matrix of the data matrix T, and it is given by the diagonal matrix
$$
T^T T = (US)^T US = S^2.
$$
Therefore, the matrix V has columns that give the coefficients for a linear combination of the original variables to create p uncorrelated variables.
The rows of the matrix U are the values of the uncorrelated random variables created by the linear combinations defined by the columns of V, divided by √λᵢ. To see this we can look at the matrix T:
$$
T = AV = US,
$$
or
$$
U = TS^{-1}.
$$
Additionally, the mean of each column in the matrix U is zero. Therefore, each row of U contains the values of p uncorrelated, mean-zero random variables.
To summarize, the SVD transforms the original data matrix into a matrix U of mean-zero, uncorrelated variables that are rescaled linear combinations of the original variables. The linear combinations are defined in V, and the scaling is given in the diagonal matrix S.
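These steps can be checked numerically. A minimal sketch (the data here are synthetic and illustrative, not from the text): center the columns of A, take the SVD, and verify that T = AV = US has uncorrelated columns with TᵀT = S²:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: n samples of p correlated variables
n, p = 500, 3
mix = np.array([[1.0, 0.0, 0.0],
                [0.8, 0.6, 0.0],
                [0.5, 0.3, 0.8]])   # illustrative mixing matrix
A = rng.standard_normal((n, p)) @ mix.T

# Center each column so every variable has mean zero
A = A - A.mean(axis=0)

# Thin SVD: A = U S V^T with U (n x p), s the singular values, V^T (p x p)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# T = AV = US: its columns are uncorrelated, and T^T T = S^2 is diagonal
T = A @ Vt.T
cross = T.T @ T
```

The off-diagonal entries of `cross` vanish to machine precision, and its diagonal equals the squared singular values.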
78 3 Input Parameter Distributions
To examine the way the SVD works and see how we can use it to approximate the original matrix, we write it as a sum
$$
A = \sum_{i=1}^{r} \sqrt{\lambda_i}\, u_i v_i^T, \tag{3.44}
$$
where uᵢ is the ith column of U and vᵢ is the ith column of V. Given that each term of this sum is an n × p matrix, we can write an approximation to A using a subset of the terms. Call the matrix using only k terms in the sum Aₖ, such that
$$
A_k = \sum_{i=1}^{k} \sqrt{\lambda_i}\, u_i v_i^T.
$$
Equivalently, Aₖ = USₖVᵀ, where Sₖ is the matrix S with all but its first k diagonal entries set to zero.
As we will see later, the number of input random variables is a strong determinant
of the computational cost of performing a UQ study. In such an instance, it may be
possible to use the SVD to reduce the number of input random variables. If we say
that there are nominally p input random variables to our simulation and we have n
samples of those random variables, we can form the matrix A as described above.
Then we can perform the SVD on the matrix and determine how many uncorrelated
variables there are. For instance, if r < p, then we know we can exactly represent
the matrix A using fewer than p random variables.
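This reduction is easy to see on synthetic data. In the sketch below (sizes and the mixing matrix are illustrative, not from the text), five observed variables are built from only two independent sources, so the centered data matrix has rank r = 2 and two terms of the SVD reproduce it exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Five observed variables driven by only two independent sources
n = 200
sources = rng.standard_normal((n, 2))
mix = rng.standard_normal((2, 5))       # illustrative mixing
A = sources @ mix
A = A - A.mean(axis=0)                  # column-center

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Fraction of variance explained by the first k terms (cf. Fig. 3.14)
frac = np.cumsum(s**2) / np.sum(s**2)

# Rank-2 truncation A_2 reproduces A: only 2 random variables are needed
A2 = (U[:, :2] * s[:2]) @ Vt[:2, :]
```

Here `frac[1]` is 1 to machine precision and the remaining singular values are numerically zero, signaling that the five variables can be replaced by two uncorrelated ones.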
We can also use the SVD to create a reduced representation of the numerical solutions to our model equations. If we have the numerical solution to our model equations at a finite number of points, we can use the SVD to represent the variability in the numerical solution with a handful of uncorrelated variables.
Fig. 3.14 The fraction of variance explained as a function of k for the SVD of the data in Table 3.2
Fig. 3.15 Scatter plots of X1 versus X2 for k = 1 . . . 9 in the original units. The top left plot is
k = 1 and the bottom right is k = 9 (the original data set)
Fig. 3.16 Composition of the first three uncorrelated variables in the SVD of the normalized data
matrix
data set, the largest value in the second column of U belongs to Mark McGwire in 1998, when he hit 70 home runs in an allegedly steroid-tainted campaign. This is the most extreme power hitter by this measure. The lowest value in this column belongs to Rickey Henderson in 1982, when he set the modern-day record for stolen bases with 130 (and was caught 42 times).
When looking at the SVD results in this light, we can see that the coefficients
are telling us something about the data. In this case it tells us that one measure of a
baseball player is the amount of power versus speed. These results also indicate that
the SVD can be useful even when we are not looking to reduce the data because it can give us a different lens through which to see how a data set is varying.
The Karhunen-Loève expansion (KL expansion) is the analog of the SVD for a
stochastic process. Recall that a stochastic process can be thought of as a collection
of random variables where the number of random variables goes to infinity. In this
case, we represent the stochastic process as an expansion in basis functions instead
of the basis vectors in the V matrix in the SVD. To compute the KL expansion,
we need to know only the mean function, μ(x), and covariance function, k(x1 , x2 ).
With this knowledge we can write the KL expansion of a stochastic process u(x; ξ )
where x ∈ [a, b] is the deterministic spatial variable and ξ denotes the random
component as
$$
u(x; \xi) = \mu(x) + \sum_{\ell=0}^{\infty} \sqrt{\lambda_\ell}\, \xi_\ell g_\ell(x). \tag{3.45}
$$
Notice that this form looks nearly identical to the SVD in Eq. (3.44). The ξℓ are random variables with zero mean and unit variance. The ξℓ are also uncorrelated, but they are not necessarily independent.
The λℓ and gℓ(x) are eigenvalues and eigenfunctions of the covariance operator:
$$
\int_a^b k(x, y)\, g_\ell(y)\, dy = \lambda_\ell g_\ell(x). \tag{3.46}
$$
The functions gℓ(x) are orthonormal, just like the matrix V was orthogonal in the SVD. Also, we order the eigenvalues as we did in the SVD case, λ₁ ≥ λ₂ ≥ ···, and the eigenvalues have a finite sum of squares:
$$
\sum_{\ell=0}^{\infty} \lambda_\ell^2 < \infty.
$$
The KL expansion turns a stochastic process into a sum over random variables. Therefore, if we truncate the expansion, we have effectively discretized it in terms of randomness: rather than an infinite collection of random variables, we write the process as a finite sum of random variables with known properties. Going back to our definition of the UQ problem in Chap. 1, if we have a calculation that depends on a stochastic process as input, we can consider the ξℓ as our uncertain inputs and get a map to the input stochastic process. The number of terms that we need to keep in the expansion depends on the covariance function and how fast the λℓ go to zero in magnitude.
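When the covariance function does not admit closed-form eigenpairs, the eigenproblem in Eq. (3.46) can be solved numerically. One common route, not described in the text, is a Nyström-type discretization: sample the kernel on a grid, apply quadrature weights, and solve the resulting symmetric matrix eigenproblem. A sketch for an exponential covariance (the grid size and kernel parameters are illustrative):

```python
import numpy as np

# Kernel k(x, y) = c * exp(-b|x - y|) on [-a, a]; illustrative parameters
a, b, c = 0.5, 1.0, 1.0
m = 400
x = np.linspace(-a, a, m)
w = np.full(m, x[1] - x[0])     # trapezoidal quadrature weights
w[0] *= 0.5
w[-1] *= 0.5

K = c * np.exp(-b * np.abs(x[:, None] - x[None, :]))

# Symmetrized discretization: sqrt(W) K sqrt(W) shares its eigenvalues
# with the quadrature operator, and eigh keeps everything symmetric
sq = np.sqrt(w)
lam, vec = np.linalg.eigh(sq[:, None] * K * sq[None, :])
lam = lam[::-1]                       # order largest first, as in the SVD
g = (vec[:, ::-1] / sq[:, None]).T    # rows approximate g_l(x) on the grid
```

By Mercer's theorem the eigenvalues sum to the integral of k(x, x) over the domain, here 2ac, which the discrete trace reproduces.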
For the exponential covariance function,
$$
k(x_1, x_2) = c\, e^{-b |x_1 - x_2|}, \tag{3.47}
$$
we can find the eigenvalues and eigenfunctions exactly. The case we will consider has x ∈ [−a, a], but we can use these results over any domain provided we define a shifted spatial variable.
The eigenvectors for this covariance function can be expressed in terms of cosines and sines, and we will write the KL expansion in a slightly different way:
$$
u(x; \xi) = \mu(x) + \sum_{\ell=0}^{\infty} \left( \sqrt{\lambda_\ell}\, \xi_\ell g_\ell(x) + \sqrt{\lambda_\ell^*}\, \xi_\ell^* g_\ell^*(x) \right). \tag{3.48}
$$
The eigenvalues are
$$
\lambda_\ell = \frac{2cb}{\omega_\ell^2 + b^2}, \qquad \lambda_\ell^* = \frac{2cb}{\omega_\ell^{*2} + b^2}, \tag{3.49}
$$
where the frequencies ωℓ and ωℓ* are the solutions of the transcendental equations
$$
b - \omega_\ell \tan(\omega_\ell a) = 0, \qquad \omega_\ell^* + b \tan(\omega_\ell^* a) = 0. \tag{3.50}
$$
The corresponding eigenfunctions are
$$
g_\ell(x) = \frac{\cos(\omega_\ell x)}{\sqrt{a + \dfrac{\sin(2\omega_\ell a)}{2\omega_\ell}}}, \tag{3.51}
$$
and
$$
g_\ell^*(x) = \frac{\sin(\omega_\ell^* x)}{\sqrt{a - \dfrac{\sin(2\omega_\ell^* a)}{2\omega_\ell^*}}}. \tag{3.52}
$$
Fig. 3.17 The eigenvalues λₙ and λₙ* for various values of b, with a = 0.5 and c = 1. The odd n are λₙ*, and the even n are λₙ
Fig. 3.18 A single realization of a Gaussian stochastic process over [−0.5, 0.5] with μ(x) = cos 2πx and an exponential covariance with b = c = 1 using various numbers of expansion terms
Fig. 3.19 Five realizations of a Gaussian stochastic process over [−0.5, 0.5] with μ(x) = cos 2π x
and an exponential covariance with b = c = 1 at different numbers of expansion terms
As the number of terms in the expansion increases, the behavior of the expansion approaches that of the full process. In many cases, the fine-scale structure of the process is not what is important; rather, the overall behavior is of interest. If that is the case, we would likely be able to adequately model this process with just a few terms in the expansion.
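The truncation can also be sketched numerically without the analytic eigenfunctions: a discrete KL expansion comes from eigendecomposing the covariance matrix on a grid, the direct analog of the SVD discussion. The sketch below is an illustration (not from the text); it assumes the exponential covariance with a = 0.5, b = c = 1 and the mean function μ(x) = cos 2πx used in the figures.

```python
import numpy as np

rng = np.random.default_rng(11)

# Grid over [-a, a] and the exponential covariance k(x, y) = c*exp(-b|x - y|).
a, b, c = 0.5, 1.0, 1.0
x = np.linspace(-a, a, 400)
dx = x[1] - x[0]
K = c * np.exp(-b * np.abs(x[:, None] - x[None, :]))

# Discrete analog of the eigenproblem in Eq. (3.46); dx acts as the
# quadrature weight for the integral over y.
lam, g = np.linalg.eigh(K * dx)
order = np.argsort(lam)[::-1]                    # sort eigenvalues descending
lam, g = lam[order], g[:, order] / np.sqrt(dx)   # columns orthonormal in L2

def kl_realization(n_terms):
    """Truncated KL sample: mu(x) + sum_l sqrt(lam_l) * xi_l * g_l(x)."""
    xi = rng.standard_normal(n_terms)            # zero mean, unit variance
    return np.cos(2 * np.pi * x) + g[:, :n_terms] @ (np.sqrt(lam[:n_terms]) * xi)

u = kl_realization(10)
print(lam[:4])   # the eigenvalues decay, so a few terms carry most of the variance
```

The sum of the eigenvalues approximates ∫ k(x, x) dx = 2ac, so the fraction of the total variance captured by a given truncation is easy to monitor.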
One basic question about an uncertain parameter is how we would like to represent that uncertainty. From the preceding chapter, we know that once we have a CDF or PDF for a random variable, we can then compute quantities like the mean, variance, and any number of other properties of the distribution. Nevertheless, it is generally not possible to have a unique mapping the other way: to go from moments of the distribution, e.g., mean, variance, skewness, kurtosis, etc., to a PDF or CDF.
Unfortunately, we usually do not know the distribution of our input parameters.
It is much more typical to have some number of samples from the distribution. For
instance, if the system we are interested in simulating has manufactured parts and
the properties of those parts have a distribution, we will be able to take a number
of parts and measure the properties. This gives us samples from the distribution of
the properties, from which we can estimate moments like the mean and variance.
However, we cannot robustly quantify the behavior of the tails of the distribution
from a small number of samples. This is because, by definition, our samples
will, with high probability, not have any values from the tails of the distribution.
Therefore, the best we can do is make a guess as to the tail behavior of the system. We need to acknowledge that we have made this assumption about the tail behavior of the distribution and not make overly specific claims about the probability of a tail event.
A common approach to modeling a random variable is to select a distribution
from the standard set of distributions (such as those provided in Appendix A). There
are several considerations that are important when selecting a distribution for an
input random variable. For a given parameter, we want the distribution we assume
it follows to be consistent with the parameter in the following regards:
1. The range, e.g., real numbers, positive real numbers, or a certain range
2. The known moments, or other properties of the distribution, e.g., mean, median,
variance, or various quantiles.
The first of these conditions can eliminate many possible distributions. For instance,
if we know the parameter can only take on a range of values or is positive, then we
know that we cannot use a normal distribution without an ad hoc procedure for
ignoring the probability of getting an invalid parameter. The known information
about the parameter’s behavior will also eliminate some possible distributions. As
an example, if the parameter is known to possess some skewness or excess kurtosis,
then a normal distribution will not be able to capture those properties. Once a
distribution is chosen, then one can fit the remaining known information about
the distribution. That is, select the parameters of the distribution so that the input
random variable’s properties are preserved.
Many times it will not be the case that all of the desired properties of the
distribution can be fit with a standard distribution. It may be the case that a
standard distribution is not flexible enough to reproduce the desired properties (e.g.,
there is a fixed relationship between moments of the distribution). In this case
one could compromise and decide to not match all of the desired properties. The
other possibility is to blend distributions together to get the desired properties. For
instance, if the desired distribution is multimodal, i.e., it has multiple local maxima
in the PDF, one could write this as the sum of normal distributions and fit the mean
and standard deviation of each normal to match the desired distribution.
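A sketch of this blending for a bimodal target (the weights, means, and standard deviations below are made-up values): the mixture PDF is the weighted sum of the component normal PDFs, and sampling draws a component first and then a value from it.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = np.array([0.4, 0.6])   # mixture weights; must sum to one
means = np.array([1.0, 4.0])     # one mode near each component mean
stds = np.array([0.3, 0.5])

def mixture_pdf(x):
    """Weighted sum of normal PDFs evaluated at x (array-valued)."""
    z = (np.asarray(x)[..., None] - means) / stds
    comps = np.exp(-0.5 * z**2) / (stds * np.sqrt(2 * np.pi))
    return comps @ weights

def sample_mixture(n):
    """Pick a component for each draw, then sample that normal."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])

x = sample_mixture(100_000)
print(x.mean())   # mixture mean is 0.4*1.0 + 0.6*4.0 = 2.8
```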
In the selection of a distribution for input parameters, there are necessarily assump-
tions that are made. These assumptions are a type of epistemic uncertainty in the
uncertainty modeling. For the distribution of a single parameter, i.e., its marginal
distribution, the behavior of that distribution in the tails could have an impact on
the conclusions of the analysis. For instance, if one is interested in the percentage of time the system's maximum temperature exceeds some threshold, one could get an answer of 0.01% using normal distributions for the input parameters and 0.05% using a t-distribution for the parameters. Given that we do not actually know which is the correct distribution to use, the range 0.01–0.05% is the epistemic uncertainty in the result.
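That spread is easy to compute directly. The sketch below (an illustration; the threshold, mean, scale, and degrees of freedom are assumed values, and SciPy supplies the distribution functions) compares exceedance probabilities under a normal and a heavier-tailed t-distribution with the same location and scale:

```python
from scipy import stats

mu, scale, threshold = 300.0, 10.0, 330.0   # hypothetical temperature problem

p_normal = stats.norm.sf(threshold, loc=mu, scale=scale)
p_t = stats.t.sf(threshold, df=4, loc=mu, scale=scale)   # 4 degrees of freedom

# Same location and scale, very different tails; the spread between the
# two numbers is epistemic uncertainty from the choice of family.
print(f"normal: {p_normal:.5f}  t(4): {p_t:.5f}")
```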
Furthermore, the assumptions on the joint distribution lead to epistemic uncer-
tainty. For a given measure of relation between two variables, there are an infinite
number of joint distributions that could match this quantity. In fact, we discussed
several possible joint distributions when we discussed copulas. Each of these joint
distributions has properties that could affect an uncertainty analysis. For example,
both the Frank copula and the normal copula could match any particular value of Kendall's tau, but the behavior of the joint distribution is not the same: when we look at samples from the joint distributions, the Frank copula gives an almost rectangular distribution versus the elliptically shaped normal copula.
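Calibrating a copula to a dependence measure can be sketched for the normal copula, for which Kendall's tau and the correlation parameter are related by τ = (2/π) arcsin ρ (an illustration; the target τ is arbitrary, and SciPy is used for the CDF and the tau estimate):

```python
import numpy as np
from scipy import stats

def gaussian_copula_samples(tau, n, rng):
    """Uniform pairs with a normal (Gaussian) copula matching Kendall's tau."""
    rho = np.sin(np.pi * tau / 2)            # invert tau = (2/pi)*arcsin(rho)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return stats.norm.cdf(z)                 # transform margins to uniforms

rng = np.random.default_rng(1)
u = gaussian_copula_samples(tau=0.5, n=20_000, rng=rng)
tau_hat, _ = stats.kendalltau(u[:, 0], u[:, 1])
print(tau_hat)   # close to the target 0.5
```

Because Kendall's tau is invariant under monotone transformations, applying any marginal inverse CDFs to u afterward preserves the calibrated dependence.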
It is likely that the number and impact of outliers, that is, low-probability
events, will be underestimated by any finite sample of a distribution. Indeed, if
the distribution looks normal, except for a single sample, the analyst is likely to
“ignore” that sample. One would like the prediction from an uncertainty study to
be robust to the presence of outliers, but this needs to be a conscious decision of
the practitioners, and distributions need to be chosen that give this property. In the
same vein, tail dependence is very difficult to estimate from a set of samples. In
any uncertainty study, it should be carefully considered what the implications of tail
dependence are and how to conservatively estimate the impact of such a dependence.
The topic of principal component analysis is covered in detail by Jolliffe (2002), and
proper orthogonal decomposition for numerical calculations is covered by Schilders
et al. (2008). Additional discussion of copulas can be found in Kurowicka and
Cooke (2006).
3.8 Exercises
of the QoI. Therefore, it is possible to use the local sensitivity analysis to screen
out unimportant parameters before performing a more in-depth analysis. Such a
reduction in the number of parameters will make later analyses more efficient.
Nevertheless, one must be mindful that a parameter that is unimportant in one region
of input space may be important in another.
As we will see in this chapter, the nominal number of evaluations of the QoI needed to perform a sensitivity analysis is the number of parameters plus one. The number goes up significantly if second-order information is calculated. In later chapters we will see how these numbers can be reduced using regularized regression and adjoint techniques. We begin with the straightforward calculation of sensitivities.
From this expansion, we can expect that for small variations to x, the Taylor expansion of the QoI would give an accurate description of the behavior of the QoI under changes to the input parameters. The question remains how to estimate the derivatives in the expansion. Once we know these derivatives, we can predict the behavior with changes to parameters.
The error in the Taylor series is only small for points close to the expansion point,
x̄. As one moves away from the expansion point, the error can become very large,
even if the underlying function is smooth and a high-order series is used. This can
be seen in the fact that the error term is proportional to Δ to some power. Therefore,
when Δ becomes large enough, the error will be large.
4.1 First-Order Sensitivity Approximations
For the expansion in Eq. (4.2), if we neglect the second-order and higher terms, we
can express the behavior of the quantity of interest using only first derivatives of the
QoI. This will give us the ability to predict which parameters have a larger effect on
the QoI and the expected change of the QoI to a small perturbation in a parameter.
This use of derivatives to predict the behavior of a QoI is commonly called local
sensitivity analysis. The first-order derivatives of the QoI are often called the first-
order sensitivities of the QoI.
By ranking the sensitivities by magnitude, we can gauge which uncertain
parameters are likely to have the largest impact. To compare the sensitivities, we
need to cast them in the same units because, for example, the units of sensitivity
i will have the inverse units of xi . One way to do this is with scaled sensitivity
coefficients. The scaled sensitivity coefficient for parameter i is the nominal value
of parameter i, x̄i , multiplied by the derivative of the QoI with respect to xi :
(Scaled Sensitivity Coefficient)_i = x̄_i ∂Q/∂x_i |_{x̄}. (4.3)
This definition of the scaled sensitivity coefficient can use any nominal value of the
user’s choosing. Often the nominal value will be the mean, but it could be any value.
The scaled sensitivity coefficients indicate the parameters to which the QoI is most sensitive near the nominal value. This can be misleading, however, because it is possible that a parameter has a large scaled sensitivity coefficient but a small overall uncertainty, i.e., we know that parameter to within a small degree of uncertainty.
To correct this case, sensitivity indices are used. In this case we multiply by the
characteristic range of variation of the parameter; often this is chosen to be the
standard deviation of the parameter i, σi :
(Sensitivity Index)_i = σ_i ∂Q/∂x_i |_{x̄}. (4.4)
The parameter with the largest product of the derivative and the standard deviation
will have the highest sensitivity index. Note that the parameter σi might be replaced
by some other measure of the variability of the parameter about x̄.
Both of these measures of sensitivity are useful in eliminating parameters that
do not appear to be important to the QoI, at least near their nominal value. The
utility of such knowledge is most evident in a system where there is a large number
of uncertain parameters. Knowing the sensitivities allows the UQ practitioner to
narrow the focus to a smaller number of parameters and then apply the more time-
consuming techniques we shall discuss later, e.g., sampling methods or polynomial
chaos expansions. One must keep in mind, however, the fact that sensitivities
are only local quantities and extrapolating far from the nominal value x̄ may
require understanding of higher-order terms and the interactions between different
parameters.
To estimate the expectation of Q(x)², we use the first-order Taylor expansion from Eq. (4.2) and ignore the second-derivative and higher terms:

Q(x)² ≈ Q(x̄)² + ( Σ_i ∂Q/∂x_i |_{x̄} (x_i − x̄_i) )² + 2Q(x̄) Σ_i ∂Q/∂x_i |_{x̄} (x_i − x̄_i). (4.6)

To the same order, the square of the mean is E[Q(x)]² ≈ Q(x̄)².
Using the expansion from Eq. (4.6) in Eq. (4.5), we get, to second order,

Var(Q) = −Q(x̄)² + ∫ dx [ Q(x̄)² + ( Σ_i ∂Q/∂x_i |_{x̄} (x_i − x̄_i) )² + 2Q(x̄) Σ_i ∂Q/∂x_i |_{x̄} (x_i − x̄_i) ] f(x). (4.7)
Because f(x) integrates to one, the Q(x̄)² term inside the integral contributes exactly Q(x̄)², and this will cancel the other quadratic Q term. In addition, the cross terms are linear in x about the mean and will integrate to zero. The remaining term to deal with is
∫ dx f(x) ( Σ_i ∂Q/∂x_i |_{x̄} (x_i − x̄_i) )²
= Σ_i Σ_j ∂Q/∂x_i |_{x̄} ∂Q/∂x_j |_{x̄} ∫ dx_i ∫ dx_j f_ij(x_i, x_j)(x_i − x̄_i)(x_j − x̄_j), (4.9)

where f_ij(x_i, x_j) is the joint marginal distribution of f(x). The integral in Eq. (4.9) is the covariance matrix that we previously defined. The covariance matrix indicates how parameters vary together and was defined in Sect. 2.3 as

σ_ij = ∫ dx_i ∫ dx_j f_ij(x_i, x_j)(x_i − x̄_i)(x_j − x̄_j). (4.10)
As discussed above, the scaled sensitivity coefficients and sensitivity index require
the derivative of the QoI with respect to each xi . We can approximate these
derivatives easily using finite differences:
∂Q/∂x_i |_{x̄} ≈ [ Q(x̄ + δ_i ê_i) − Q(x̄) ] / δ_i, (4.12)
where δ_i is a small, positive parameter and ê_i is a vector that is one in the ith position and zero elsewhere. Given that we need to compute p derivatives, we need to compute the QoI at p + 1 points (i.e., p + 1 runs of the code): one for the mean value, x̄, and one for each of the p parameters.
This finite difference formula is known as the forward difference formula: it
perturbs the nominal state in the positive, or forward, direction to estimate the
derivative. Other types of finite differences include backward difference where the
perturbation is in the negative direction and central difference where the parameter
is adjusted forward and backward (this can be thought of as an average of the
forward and backward differences). The central difference formula has the benefit
of having an error that decreases at a rate of δi2 , compared with δi for forward and
backward differences, at a cost of requiring two function evaluations per derivative
approximation.
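A minimal sketch of these formulas (an illustration only: the QoI below is a made-up analytic function standing in for a simulation code, and the nominal values and standard deviations are assumed):

```python
import numpy as np

def Q(x):
    """Hypothetical QoI standing in for a code run."""
    return x[0]**2 * np.sin(x[1]) + 3.0 * x[2]

xbar = np.array([2.0, 0.5, 1.0])    # nominal parameter values
sigma = np.array([0.1, 0.2, 0.05])  # assumed parameter standard deviations

def forward_diff_grad(f, x0, delta=1e-6):
    """Forward differences, Eq. (4.12): p + 1 evaluations for p parameters."""
    f0 = f(x0)
    grad = np.zeros_like(x0)
    for i in range(len(x0)):
        xp = x0.copy()
        xp[i] += delta
        grad[i] = (f(xp) - f0) / delta
    return grad

grad = forward_diff_grad(Q, xbar)
scaled = xbar * grad    # scaled sensitivity coefficients, Eq. (4.3)
index = sigma * grad    # sensitivity indices, Eq. (4.4)
print(grad, scaled, index)
```

For this made-up example the two rankings disagree: the third parameter has a large scaled coefficient but, with its small assumed uncertainty, a small sensitivity index.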
v du/dx − ω d²u/dx² + κ(x)u = S(x), (4.13)

u(0) = u(10) = 0,

μ_v = 10, μ_ω = 20,
and variances
Fig. 4.1 Values of the mean function for κ and S for our ADR example
with μ_q = 1, Var(q) = 7.062353 × 10⁻⁴. The mean functions for κ(x) and S(x) are shown graphically in Fig. 4.1. We also prescribe that the parameters, ordered as (v, ω, κ_l, κ_u, q), have the correlation matrix
R = [  1.00   0.10  −0.05   0.00   0.00
       0.10   1.00  −0.40   0.30   0.50
      −0.05  −0.40   1.00   0.20   0.00
       0.00   0.30   0.20   1.00  −0.10
       0.00   0.50   0.00  −0.10   1.00 ]. (4.15)
The QoI for this example will be the total reaction rate in the problem:
Q = ∫_0^10 dx κ(x) u(x). (4.16)
Fig. 4.2 The solution u(x) evaluated at the mean value of the uncertain parameters
Table 4.1 Sensitivities to the five parameters in the ADR reaction rate
Parameter Sensitivity Scaled sensitivity coef. Sensitivity index
v −1.7406 −17.406 −0.46819
ω −0.97020 −19.404 −0.54842
κl 12.868 1.2868 0.037542
κh 17.761 35.523 0.93616
q 52.390 52.390 1.3923
Fig. 4.3 Three realizations of the random process version of κ(x) at 2000 points
The variance in Q can be estimated from Eq. (4.11) using the data from the sensitivity column in Table 4.1 and forming the covariance matrix using the given variances and correlation matrix for the parameters (see Eq. (3.3)). This estimate of the variance gives Var(Q) ≈ 2.0876. If we assume that the parameters are a multivariate normal, the actual variance in Q, as estimated via Monte Carlo¹ with 4 × 10⁴ code runs, is 2.0699, a difference of about 0.85%. This result indicates that for this problem the variance estimate is a reasonable approximation to the true variance.
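The structure of this estimate, the gradient contracted twice with the covariance matrix, can be checked on a small made-up problem (the QoI, means, standard deviations, and correlation matrix below are assumptions for illustration, not the ADR parameters):

```python
import numpy as np

rng = np.random.default_rng(7)

def Q(x):
    """Hypothetical, mildly nonlinear QoI; x can be a batch of samples."""
    return x[..., 0] * x[..., 1] + 0.5 * x[..., 2]**2

mu = np.array([1.0, 2.0, 3.0])
std = np.array([0.05, 0.10, 0.08])
R = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, -0.2],
              [0.0, -0.2, 1.0]])
Sigma = np.diag(std) @ R @ np.diag(std)   # covariance from correlations

# Forward-difference gradient at the mean.
grad = np.zeros(3)
f0 = Q(mu)
for i in range(3):
    xp = mu.copy()
    xp[i] += 1e-6
    grad[i] = (Q(xp) - f0) / 1e-6

var_linear = grad @ Sigma @ grad           # first-order variance estimate

# Monte Carlo check assuming multivariate normal parameters.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
var_mc = Q(samples).var()
print(var_linear, var_mc)   # the two estimates agree closely here
```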
We can significantly increase the size of the parameter space by making the
parameter κ be a Gaussian stochastic process with mean function given by Eq. (4.14)
and covariance function given by
1 The use of Monte Carlo to estimate the variance in a QoI will be discussed in Chap. 7.
Fig. 4.4 Sensitivity of Q to κ(x) using finite differences and 2001 solutions to the ADR equations
the sensitivity via finite differences. For each parameter we perturb by a relative amount of 10⁻⁶ and obtain the sensitivity of Q to the value of κ in each cell. These sensitivities are shown in Fig. 4.4. Notice that the sensitivity looks similar to the solution at the mean function of κ as given in Fig. 4.2. We will explore this connection further when we discuss adjoint methods in Chap. 6.
To estimate the variance using Eq. (4.11), we form a covariance matrix as
Σij = k(xi , xj ),
where xi and xj are the centers of the ith and j th mesh cell, respectively. The
variance estimate from Eq. (4.11) gives Var(Q) ≈ 18.672, compared with a Monte
Carlo estimate of 19.049 using 4 × 104 realizations of κ. This difference is only
about 2%: this indicates that even when there are a large number of parameters, the
first-order variance estimate can be accurate.
The finite difference formula given in Eq. (4.12) could be replaced with other
common finite difference formulas, such as a central difference formula. Alter-
natively, the complex step formula from Lyness and Moler (1967) could be used
if the underlying function is analytic. This method computes the derivative by
perturbing the parameter with an imaginary perturbation to compute a second-order
approximation to the derivative as
∂Q/∂x_i |_{x̄} = I{ Q(x̄ + iδ_i ê_i) } / δ_i + O(δ_i²), (4.17)
Fig. 4.5 Sensitivity of Q to κ(6.25) using forward difference, centered difference, and complex
step methods for different values of δ
where i = √−1 and I{·} denotes the imaginary part of the argument. It has been
demonstrated that this method can produce derivative approximations as accurate as
the floating point arithmetic on a particular computer will allow. This is because the
approximation does not take a small difference divided by a small number, which
can amplify small round-off errors.
To use the complex step method, the computer code must be able to appropriately
handle complex arithmetic, which is not the usual case. However, if this method is
available for use, it can be a powerful technique because it can save one evaluation
of the code by not requiring the evaluation of Q(x̄), and the derivatives can
be approximated more accurately. As an example of this, Fig. 4.5 demonstrates
how different derivative approximations to the sensitivity of Q to the value of
κ(6.25) perform. In this figure we see that as δ → 0 the methods converge to different answers. Initially, when δ is still greater than 10⁻⁵, the methods seem to be converging to the same point, but eventually precision errors in the finite difference calculation dominate. This occurs even when central differences are used for the derivative approximation.
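The behavior in Fig. 4.5 can be reproduced on any analytic function; the sketch below uses a standard complex-step test function rather than the ADR QoI:

```python
import numpy as np

def Q(x):
    """Analytic test function often used to exercise the complex step."""
    return np.exp(x) / np.sqrt(np.sin(x)**3 + np.cos(x)**3)

def forward(f, x, d):
    return (f(x + d) - f(x)) / d

def complex_step(f, x, d):
    # Eq. (4.17): an imaginary perturbation avoids subtractive cancellation.
    return np.imag(f(x + 1j * d)) / d

xbar = 1.5
ref = complex_step(Q, xbar, 1e-20)   # accurate to machine precision

d = 1e-12   # far too small for a real-valued difference quotient
print(abs(forward(Q, xbar, d) - ref))       # swamped by round-off
print(abs(complex_step(Q, xbar, d) - ref))  # still near machine precision
```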
It seems natural to extend the approximation of the local sensitivity to include the
second derivatives in the Taylor series approximation of Q. The number of extra
terms in the expansion is p2 . To estimate these terms, we need to evaluate Q at
more points. For the second derivative with respect to an individual parameter, the simplest formula is

∂²Q/∂x_i² |_{x̄} ≈ [ Q(x̄ + δ_i ê_i) − 2Q(x̄) + Q(x̄ − δ_i ê_i) ] / δ_i². (4.18)
The cross-derivatives can be estimated with the standard four-point formula

∂²Q/∂x_i∂x_j |_{x̄} ≈ [ Q(x̄ + δ_i ê_i + δ_j ê_j) − Q(x̄ + δ_i ê_i − δ_j ê_j) − Q(x̄ − δ_i ê_i + δ_j ê_j) + Q(x̄ − δ_i ê_i − δ_j ê_j) ] / (4 δ_i δ_j).

Upon inspection of this formula, we see that each of the terms in the numerator is contained in neither Eq. (4.12) nor Eq. (4.18). Therefore, each cross-derivative term requires four new function evaluations. For the p(p − 1) cross-derivative terms, there will be 2p(p − 1) additional evaluations of Q; a factor of two is saved by symmetry due to the fact that

∂²Q/∂x_i∂x_j = ∂²Q/∂x_j∂x_i.
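A sketch of the four-point cross-derivative estimate on a made-up function whose mixed derivative is known in closed form:

```python
import numpy as np

def Q(x):
    """Hypothetical QoI: Q = x0^2 * x1^3, so d2Q/dx0 dx1 = 6*x0*x1^2."""
    return x[0]**2 * x[1]**3

def mixed_second(f, x0, i, j, d=1e-4):
    """Four-point central approximation to the mixed partial derivative."""
    def at(si, sj):
        xp = x0.copy()
        xp[i] += si * d
        xp[j] += sj * d
        return f(xp)
    return (at(+1, +1) - at(+1, -1) - at(-1, +1) + at(-1, -1)) / (4 * d * d)

x0 = np.array([2.0, 3.0])
approx = mixed_second(Q, x0, 0, 1)
exact = 6 * x0[0] * x0[1]**2
print(approx, exact)   # both near 108
```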
Fig. 4.6 The actual values of Q, the solid line is the linear approximation, and the dashed line is the quadratic approximation
4.6 Exercises
∂u/∂t + v ∂u/∂x = D ∂²u/∂x² − ωu,
for u(x, t) on the spatial domain x ∈ [0, 10] with periodic boundary conditions
u(0− ) = u(10+ ) and initial conditions
u(x, 0) = 1 for x ∈ [0, 2.5] and 0 otherwise.
The time interval for the problem is t ∈ [0, 5]. Use the solution to compute the
total reactions in a particular part of the domain:
∫_5^6 dx ∫_0^5 dt ω u(x, t).
Compute scaled sensitivity coefficients and sensitivity indices for normal random
variables:
a. μv = 0.5, σv = 0.1,
b. μD = 0.125, σD = 0.03,
c. μω = 0.1, σω = 0.05.
Also, estimate the variance in the total reactions. How do these results change
with changes in Δx and Δt?
Chapter 5
Regression Approximations to Estimate
Sensitivities
1 We have switched the notation for number of parameters here so that when we form matrices the
indices will be the common i and j for row and column, respectively.
which, using gradient notation, allows us to write the total collection of data as a system of linear equations, where the subscripts are ordered so that x_ij is the value of input j for the ith evaluation of Q. We can rearrange the equations so that they can be written in the shorthand form

Xβ = y, (5.1)
The vector y is often called the dependent variables, and X is styled the data matrix of independent variables.
The natural reaction is to seek β by solving Eq. (5.1). Of course, unless I = J, i.e., X is a square matrix, there is not a unique solution or necessarily even a solution. Therefore, we need at least J + 1 simulations to estimate the sensitivities.² Instead, one solves the system

XᵀX β̂ = Xᵀy, (5.2)

where the hat denotes that this is not a solution to Eq. (5.1), but rather an approximation.
We note that the system in Eq. (5.2) is called the system of normal equations, and it will only have a solution if XᵀX is full rank. Additionally, in practice the normal equations are not formed and then solved; typically a QR factorization or the SVD is used to find β̂_LS.
This approximation given by Eq. (5.3) has the often useful property that it
minimizes the total squared error over the data. In particular one can show that the
solution given by Eq. (5.3) is equivalent to the solution to the minimization problem
of finding the β that minimizes the sum of the squared error:
β̂_LS = min_β ½ Σ_{i=1}^I (y_i − β · x_i)², (5.4)
where xi is the ith row of the data matrix. The subscript “LS” denotes that this is the
least-squares solution. As noted above this is the solution we can obtain when I >
J , that is when the number of simulations is greater than the number of parameters—
a case that is not useful to our goal of reducing the number of simulations required
to estimate the sensitivities.
X_ij = (x_ij − x̄_j) / x̄_j, (5.5a)
β̂_ridge = min_β Σ_{i=1}^I (y_i − β · x_i)² + λ‖β‖₂². (5.6)
This new problem seeks a value of β that minimizes the sum of the squares
of the data while also minimizing the 2-norm of the coefficients. To put it more
colloquially, the goal is to get a value of β that matches the data as well as possible
while also having a small magnitude. Another name for the ridge regularization is
Tikhonov regularization.
An equivalent formulation of the regression problem casts the regularization as a constraint rather than a penalty.³ In particular, this form is

β̂_ridge = min_β Σ_{i=1}^I (y_i − β · x_i)² subject to ‖β‖₂ ≤ s. (5.8)
This form helps us to visualize how ridge regression works, and we will do so by
considering a system with J = 2 as shown in Fig. 5.1. The cost function to be
minimized in Eq. (5.8) is a quadratic function in the two coefficients. The contours
of this quadratic function will appear as ellipses in the (β1 , β2 ) plane. In the center
of these ellipses is the LS estimate.
The circle in Fig. 5.1 has a radius s, and the solution must lie inside or on this
circle. Because the LS estimate is outside the circle, the solution will be where the
circle intersects a contour line of the sum of the squared error at the minimal possible
value of the error. Notice that the magnitude of both β1 and β2 has decreased in the
ridge estimate compared with the LS estimate and that both are nonzero.
The solution to the ridge problem can be shown to be equivalent to the solution to the system

( XᵀX + λI ) β̂_ridge = Xᵀy, (5.9)

where I is an identity matrix of size J × J. This system will always have a solution for λ > 0.

³ The constraint form can be changed into the penalty form by considering λ as a Lagrange multiplier.
A feature of ridge regression is that the larger the value of λ, the smaller the
values in β̂ ridge will be in magnitude relative to the values of β̂ LS , when the LS
values exist. This makes λ a free parameter that must be chosen based on another
consideration. We will discuss using cross-validation to choose this parameter later.
As a simple example of ridge regression, consider the problem of estimating a function of the form y = ax + b given the data y(2) = 1. That is, we are interested in fitting a line to a single data point. This problem is formulated as minimizing

(1 − 2a − b)² + λ(a² + b²),

which has the solution

a = 2/(λ + 5), b = 1/(λ + 5),

for λ > 0. From this we can see that the limit of this solution as λ approaches zero from the right is

lim_{λ→0⁺} a = 2/5, lim_{λ→0⁺} b = 1/5.
Notice that for λ > 0, the fitted solution does not pass through the data, that is, 2a + b ≠ 1, even though the original data require 2a + b = 1.
In this example we can see that we can fit a line to a single data point. The result is not necessarily faithful to the original data, but it does give us a means to fit a solution when I < J. This property will be useful for estimating local sensitivities when we have fewer QoI evaluations than parameters.
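The closed-form coefficients can be verified by solving the ridge system of Eq. (5.9) directly; a short sketch:

```python
import numpy as np

# Fit y = a*x + b to the single data point y(2) = 1 with ridge regression.
X = np.array([[2.0, 1.0]])   # one row of data: [x, 1] for coefficients [a, b]
y = np.array([1.0])

def ridge(X, y, lam):
    """Solve (X^T X + lam*I) beta = X^T y, Eq. (5.9)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (1.0, 0.1, 1e-6):
    a_fit, b_fit = ridge(X, y, lam)
    # Matches the closed form a = 2/(lam + 5), b = 1/(lam + 5).
    print(lam, a_fit, b_fit, 2 / (lam + 5), 1 / (lam + 5))
```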
The ridge prescription can be modified to make the penalty be the 1-norm of the
coefficients. The resulting solution is known as the least absolute shrinkage and
selection operator, often shortened to lasso (Tibshirani 1996). This approach tends
to set several coefficients to be zero and, as such, “lassos” the important coefficients.
The lasso problem is given by
β̂_lasso = min_β Σ_{i=1}^I (y_i − β · x_i)² + λ‖β‖₁. (5.10)
The small difference between ridge and lasso is in the choice of norm for the penalty.
It would seem that this small difference would not make a major difference in
the result. Nevertheless, the introduction of the 1-norm makes the solution to the
problem more difficult, and the L1 penalty tends to set some of the coefficients to
zero. Making some of the coefficients zero is said to produce a sparse model or,
alternatively, a model that is parsimonious in that it does not include variables that
are not important.
The property of setting some of the coefficients to zero is precisely what we
would like our method to do in the case where many of the sensitivities are small
and there are a few, large nonzero sensitivities. In fact, lasso will select at most I
nonzero coefficients when I < J . The property of setting certain coefficients to
zero is demonstrated in Fig. 5.2. The L1 norm has a diamond-shaped level curve
with the points of the diamond on the axes. Therefore, the intersection between the
L1 penalty and the level curves of the squared residuals is likely to be on an axis.
This is indeed what happens in Fig. 5.2. Notice that the sum of the squares of the
error in the lasso solution will be larger than that for ridge because the point is on
an ellipse that is farther away from the least-squares solution. This will not always
be the case, however.
The solution to the lasso problem is more difficult because the minimization problem is now non-quadratic: the derivative of the L1 norm is not smooth due to the singular derivative of the absolute value at zero. The problem, however, is
still a convex optimization problem, and there are numerical methods for efficiently
solving convex optimization problems.
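One such method, coordinate descent, cycles through the coefficients and applies a soft-thresholding update to each in turn. The sketch below (an illustration; the synthetic data with fewer evaluations than parameters is made up) shows the resulting sparsity:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for min_beta sum_i (y_i - beta.x_i)^2 + lam*||beta||_1."""
    I, J = X.shape
    beta = np.zeros(J)
    z = (X**2).sum(axis=0)                         # per-coordinate curvature
    for _ in range(n_sweeps):
        for j in range(J):
            r = y - X @ beta + X[:, j] * beta[j]   # residual with beta_j removed
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / z[j]
    return beta

rng = np.random.default_rng(3)
I, J = 40, 100                                 # fewer evaluations than parameters
X = rng.standard_normal((I, J))
beta_true = np.zeros(J)
beta_true[:5] = [5.0, -4.0, 3.0, 4.0, -3.0]    # only five important inputs
y = X @ beta_true + 0.01 * rng.standard_normal(I)

beta = lasso_cd(X, y, lam=2.0)
print(np.flatnonzero(np.abs(beta) > 0.5))      # the important coefficients survive
```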
Elastic net regression (Zou and Hastie 2005) is a combination of the ridge and
lasso penalties that keeps some of the sparsity of lasso but gives more accurate
predictions. The elastic net minimization problem is
β̂_el = min_β Σ_{i=1}^I (y_i − β · x_i)² + λ₁‖β‖₁ + λ₂‖β‖₂². (5.11)
The elastic net solution does promote sparseness in the coefficient vector, like lasso, but it is not limited to finding only I nonzero coefficients. The elastic net tends to set groups of coefficients to nonzero values, as we will see in an example below.
In practice, the trade-off between the L1 and the L2 penalty is quantified by the parameter α:

α = λ₁ / (λ₁ + λ₂).
Fig. 5.5 Comparison of three methods on the problem of fitting y = ax + b to the given data
y(2) = 1 as a function of λ
values of α‖β‖₁ + (1 − α)‖β‖₂² is between the diamond of L1 and the circle of the Euclidean norm. As α approaches one, the solution moves from the ridge result to the lasso value of β.
The effect of the elastic net penalty is further illustrated in Fig. 5.4, where the curves of equal value of α‖β‖₁ + (1 − α)‖β‖₂² for two independent variables are shown.
As α is decreased from 1 to 0, the curves transition from the circle of the L2 norm
to the diamond of the L1 norm. During the transition the points on the axes stay the
same, which gives the curves for 0 < α < 1 a blunted point on the axes (i.e., the
curve is still smooth here for α < 1). This transition gives elastic net the ability to
find sparse solutions but also allows non-sparse solutions to be found.
It is worthwhile to compare elastic net and lasso to the ridge result on the simple
problem of fitting y = ax + b to the given data y(2) = 1. In Fig. 5.5 the results
are shown as a function of λ. For the elastic net regression results, λ1 = αλ and
λ2 = (1 − α)λ. In this figure we can see that even for small values of λ, the lasso
and ridge results are different: the lasso result sets a = 0.5 and b = 0. This is the
“sparsity” we referred to before. Additionally, on this problem with a small number
of coefficients, the elastic net results with α = 0.75 and 0.5 are nearly identical to
the lasso results. When α is reduced to 0.25, the result is something between the
ridge and lasso. Note that both the lasso and ridge result with a small value of λ
exactly match the data.
Fig. 5.6 Comparison of the different regularized regression methods to estimate the scaled
sensitivity coefficients using 40 evaluations of a QoI with 100 parameters
where ε ∼ N(0, σ² = 0.01²). For this QoI, only the scaled sensitivity coefficients for the first five variables are large: the contribution to the QoI from variables 6 through 100 is small. If we knew this ahead of time, we could do finite differences only on these five variables. Of course, we would typically not know this information.
Let us assume that we can afford 40 simulations. We choose 40 values of xi
using Latin hypercube sampling and fit a lasso model using cross-validation. With
the value of λ selected based on a leave-one-out cross-validation and selecting the
largest λ with a mean value of the mean-squared error within one standard error
of the minimum error, we get λ = 0.0055. The results for the scaled sensitivity
coefficients are given in Fig. 5.6. In this figure we can see that lasso does correctly
pick the five largest scaled sensitivity coefficients, with a slight inaccuracy in the
actual values of the coefficients. For the many small sensitivities, it does not give
the correct value of 0.1 but instead estimates several coefficients to be nonzero and
larger than 0.1. On the whole, this is a positive result: we have correctly identified
the important parameters when the number of simulations was much smaller than
the number of parameters.
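Latin hypercube sampling itself is brief to implement: each dimension is split into n equal-probability strata, and independent random permutations pair the strata across dimensions. A minimal sketch for uniform inputs on [0, 1]^d:

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with one point in each of n strata per dimension."""
    # Each row of `strata` is an independent permutation of 0..n-1.
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
    return (strata + rng.random((n, d))) / n   # jitter within each stratum

rng = np.random.default_rng(5)
u = latin_hypercube(8, 2, rng)
# Every column hits each interval [k/8, (k+1)/8) exactly once.
print(np.sort((u * 8).astype(int), axis=0))
```

For non-uniform inputs, the uniform samples can be pushed through each parameter's inverse CDF, as with ordinary Monte Carlo sampling.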
122 5 Regression Approximations to Estimate Sensitivities
Repeating the process for ridge and elastic net with α = 0.5, we obtain the other
results in Fig. 5.6. In these results we see that ridge has difficulty with this problem:
it underestimates the large coefficients and says that far too many variables have
a significant coefficient. Elastic net regression with α = 0.5 gives results similar
to lasso with smaller estimated values for the large coefficients, but unlike lasso
gives more nonzero coefficients to the variables with a low sensitivity. These results
indicate that on a problem like this, where I < J and there are many variables that
have a sensitivity near zero, lasso or elastic net regression is superior to ridge regression.
On another type of problem, we can see the benefit of elastic net with a value
of α close to 0. In Sect. 4.3.2 we defined an ADR problem where the reaction
rate coefficient, κ, was a Gaussian process. As a result a problem with Nx spatial
cells had nominally Nx parameters. In Sect. 4.3.2 we used Nx = 2000, and as a
result, the finite difference estimates of the first-order sensitivities required 2001
simulations to calculate. In this case we solve the same problem and use a Latin
hypercube design to sample from the random process the values of κ to run in
the simulation. The results for the scaled sensitivity coefficients using 100 sample
points and different regularized regression techniques, with λ chosen using leave-
one-out cross-validation, are shown in Fig. 5.7. In the figure the finite difference
estimates are shown as a reference. These results represent 5% of the number of
evaluations of Q required to estimate the points via finite difference. One feature of
the finite difference estimates is that they are nonzero everywhere. In this case lasso
gives us a sparse solution with few nonzero coefficients. At the opposite end of the
spectrum, the ridge result captures the overall trend of the finite difference estimates
without exact quantitative agreement. For example, ridge correctly predicts that the
most significant values are on the edge of the region where κ is low. The elastic
net results with a small value of α have similar values to the ridge results, with
amplified oscillations in the parameters. As α grows the elastic net result transitions
to the lasso value.
To really stress the regression methods, we can reduce the number of simulations
used to estimate the coefficients to 10, i.e., 0.5% of the number of simulations
required for finite difference estimation. The results, shown in Fig. 5.8, indicate
that, once again, lasso does not properly estimate the sensitivities. In this case a
small amount of L1 penalty in the elastic net (α = 10−3 ) gives the best estimates;
the ridge result gives larger amplitude oscillations in the estimates on the left side
of the problem and a flat behavior on the right side.
The following conclusions are in order. On problems with a large number of
parameters and a small number of significant sensitivities, lasso regression or
elastic net regression with α near 1 can estimate the sensitivity coefficients. On
the other hand, if there is correlation between the sensitivities, as there was spatial
correlation between the sensitivities in the random process example, ridge or elastic
net with a small value of α gives an adequate estimate of the qualitative behavior of the
sensitivity, even when the number of simulations is two orders of magnitude smaller
than the number of parameters. Nevertheless, some user judgment is required to
decide which situation a given problem corresponds to.
Fig. 5.7 Estimates of the scaled sensitivity coefficients for the ADR problem with ω defined by a
random process evaluated at 2000 points using several regression techniques and a 100 point Latin
hypercube design. The smooth, solid lines indicate the finite difference estimates
Fig. 5.8 Estimates of the scaled sensitivity coefficients for the ADR problem with ω defined by a
random process evaluated at 2000 points using several regression techniques and a 10 point Latin
hypercube design. The smooth, solid lines indicate the finite difference estimates
In the examples above, I used the glmnet library for R to fit the regression models.
This library has a built-in cross-validation function to choose λ and a reasonable
user interface. It can fit elastic net models and, therefore, ridge and lasso regression
models as well. For python, the sklearn library has an elastic net function in the
sklearn.linear_model module.
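As a concrete sketch of this workflow in Python, the snippet below uses sklearn's `LassoCV` (note that sklearn calls the penalty weight `alpha`, while the book writes λ and reserves α for the elastic net mixing parameter). The QoI here is a hypothetical stand-in with five large sensitivities, not the book's exact test function:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(42)
J, I = 100, 40  # number of parameters vs. affordable simulations (I < J)

# Hypothetical QoI: only the first five parameters have large sensitivities.
beta_true = np.zeros(J)
beta_true[:5] = [5.0, 4.0, 3.0, 2.0, 1.0]

X = rng.uniform(-1, 1, size=(I, J))          # stand-in for a Latin hypercube design
y = X @ beta_true + rng.normal(0, 0.01, I)   # noisy QoI evaluations

# Cross-validation selects the penalty automatically, as cv.glmnet does in R.
model = LassoCV(cv=5).fit(X, y)
est = model.coef_

print("nonzero coefficients:", np.count_nonzero(est))
print("top-5 indices:", np.argsort(np.abs(est))[-5:])
```

For elastic net, `ElasticNetCV` with its `l1_ratio` argument plays the role of the book's α, interpolating between ridge (`l1_ratio` near 0) and lasso (`l1_ratio` = 1).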
$$\cdots + \sum_{j=1}^{J}\sum_{j'=1}^{j-1} \left.\frac{\partial^2 Q}{\partial x_j\,\partial x_{j'}}\right|_{\bar{x}} (x_{ij} - \bar{x}_j)(x_{ij'} - \bar{x}_{j'}). \tag{5.12}$$
We have ignored the higher-order correction terms in this equation. We can write
Eq. (5.12) as a regression system so that the coefficients, β, are the scaled sensitivity
coefficients by making the entries in the data matrix X have the appropriate scaled
values. A common mistake is to forget the inclusion of the factor of one-half in the
single-variable second derivatives.
As with the first-derivative sensitivities, we can estimate the sensitivities in
Eq. (5.12). In this equation there are ½J(J + 3) = J²/2 + 3J/2 sensitivities to estimate:
J first derivatives, J single-variable second derivatives, and ½J(J − 1) terms from
the mixed-variable second derivatives. In this case standard least-squares regression
the mixed-variable second derivatives. In this case standard least-squares regression
may require fewer function evaluations than finite difference, where previously we
showed that the number of function evaluations was 2J 2 +1. Therefore, it is possible
to save a large number of simulations relative to finite difference without needing
regularized regression.
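One way to set up this least-squares system is to build the data matrix with columns for the linear terms, the one-half-scaled squared terms noted above, and the cross products. The helper below is a sketch; the function name and the toy design matrix are invented for illustration:

```python
import numpy as np

def quadratic_design(X, x_bar):
    """Columns of the regression data matrix for Eq. (5.12): (x_j - xbar_j),
    0.5*(x_j - xbar_j)^2 (note the one-half factor), then cross products
    (x_j - xbar_j)(x_j' - xbar_j') for j' < j."""
    D = X - x_bar
    J = D.shape[1]
    cross = [D[:, j] * D[:, jp] for j in range(J) for jp in range(j)]
    blocks = [D, 0.5 * D**2] + ([np.column_stack(cross)] if cross else [])
    return np.hstack(blocks)

# Toy design with J = 2: J(J + 3)/2 = 5 columns (2 linear, 2 squared, 1 cross).
X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0], [0.0, 1.0], [1.5, 1.5]])
A = quadratic_design(X, x_bar=np.zeros(2))
print(A.shape)  # (5, 5)
```

The resulting matrix can be handed to `np.linalg.lstsq` (or a regularized solver) so that the fitted coefficients are the scaled sensitivities directly.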
We can apply regression to the ADR problem we solved in Sect. 4.4. This problem
had J = 5 and, therefore, 20 total first- and second-derivative sensitivities and
requires 51 function evaluations for finite difference. Using regression, and a Latin
hypercube design covering ±10% around the nominal values of the parameters,
we computed the estimates in Fig. 5.9. Given that there are 20 sensitivities, 20
QoI evaluations are sufficient to use least-squares regression. Indeed, with both 20
and 32 samples, the least-squares estimates are closest to the finite difference
estimates. With 32 evaluations of the QoI, the regularized regression estimates are
all accurate. With only 20 evaluations of the QoI, the regularized regression
estimates lose some accuracy; ridge regression, in particular, has large errors in the
estimation of the second-derivative sensitivities.

Fig. 5.9 Comparison of the different regression methods to estimate the first- and second-derivative scaled sensitivity coefficients using 20 and 32 evaluations of a QoI with 5 parameters. The finite difference estimates required 51 QoI evaluations
5.6 Exercises
Be sure to do cross-validation for each fit, and for each method present your best
estimate of the model.
2. Using a discretization of your choice, solve the equation
$$\frac{\partial u}{\partial t} + v\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2} - \omega u,$$
for u(x, t) on the spatial domain x ∈ [0, 10] with periodic boundary conditions
u(0− ) = u(10+ ) and initial conditions
$$u(x, 0) = \begin{cases} 1 & x \in [0, 2.5] \\ 0 & \text{otherwise.} \end{cases}$$
I didn’t think so much of him at first. But now I get it, he’s
everything that I’m not.
—from the film The Royal Tenenbaums
Adjoints are useful in local sensitivity analysis because they can give information
about any perturbed quantity with only one solve of the forward system (i.e., the
system we solve to compute the QoI) and one solve of the adjoint equations. The
adjoint problem is defined as a system where the physics, in a sense, happen in
reverse. Therefore, the adjoint solution allows us to see the effect of perturbations in the
QoI parameters. Importantly, solving the forward problem once, and the adjoint
problem once, allows us to compute the sensitivity to all the parameters. This
compares to the p +1 solutions needed to compute the sensitivities for p parameters
using finite differences.
One important distinction between the adjoint method and finite differences
is that each QoI requires a separate adjoint system to be solved, whereas finite
differences can be applied to any number of QoIs without additional solutions.
On balance, when the number of QoIs is small relative to the number of uncertain
parameters, the adjoint approach can be more efficient. The issue with the adjoint
approach is that it can be difficult to define the adjoint equations. In this section
we will deal with linear, time-independent partial differential equations. In a later
section, we relax this assumption with a concomitant increase in complexity.
We write the inner product of two functions f and g as
$$(f, g) = \int_D f g\, dV,$$
where D is the phase space domain of the functions, f and g are functions that are
square integrable over the domain D, and dV is a differential phase space element.
The adjoint for an operator L is typically denoted L∗ and is defined as
$$(Lf, g) = (f, L^* g), \tag{6.2}$$
using the definition of the inner product above.
the part of the differential equations that operates on the dependent variables, as we
will see in an example soon. Using the definition of the adjoint in Eq. (6.2), it is easy
to show that adjoints make taking inner products of solution variables trivial if the
adjoint solution is known.
For a PDE with differential operator L,
$$Lu = q,$$
and the corresponding adjoint problem,
$$L^* u^* = w,$$
the definition of the adjoint gives
$$(u, w) = (u, L^* u^*) = (Lu, u^*) = (q, u^*). \tag{6.3}$$
In other words, the inner product of u and w is the same as the inner product of q
and u∗.
Now consider a quantity of interest Q given by an integral of the solution u
against a weighting function, w(r):
$$Q = \int_D dV\, w(r) u(r) = (u, w). \tag{6.4}$$
Equation (6.4) indicates that we can define a QoI as an inner product by picking a
weighting function. Using relation Eq. (6.3) above, we see that Q is just (u∗, q).
In other words, the adjoint solution with source w(r) integrated against the source
q gives Q; that is, we can calculate the QoI two ways:
$$Q = (u, w) = (u^*, q). \tag{6.5}$$
This is not magical, however, because the adjoint equation is typically as hard to
solve as the original PDE, as we will see in an example.
We now make the notion of the adjoint concrete for the steady ADR equation
with a linear reaction term for u(x) on the domain (0, X) with zero Dirichlet
boundary conditions. Under these conditions the ADR equation and boundary
conditions are
$$v\frac{du}{dx} - \omega\frac{d^2 u}{dx^2} + \kappa u = q, \qquad u(0) = u(X) = 0, \tag{6.6}$$
and the differential operator is
$$L = v\frac{d}{dx} - \omega\frac{d^2}{dx^2} + \kappa. \tag{6.7}$$
We will postulate an adjoint form of this system and then show that it satisfies
the definition in Eq. (6.2). The form of the adjoint we propose is basically the same
equation, with the sign of the advection term flipped:
$$L^* = -v\frac{d}{dx} - \omega\frac{d^2}{dx^2} + \kappa,$$
with boundary conditions
$$u^*(0) = u^*(X) = 0. \tag{6.9}$$
$$\int_0^X v u^* \frac{du}{dx}\, dx = \Big[v u^* u\Big]_0^X - \int_0^X v u \frac{du^*}{dx}\, dx = -\int_0^X v u \frac{du^*}{dx}\, dx, \tag{6.11}$$
132 6 Adjoint-Based Local Sensitivity Analysis
which is the term on the RHS of Eq. (6.10). The diffusion term just needs integration
by parts twice:
$$\int_0^X u^* \frac{d^2 u}{dx^2}\, dx = \left[u^* \frac{du}{dx}\right]_0^X - \int_0^X \frac{du}{dx}\frac{du^*}{dx}\, dx = \left[u^* \frac{du}{dx} - u\frac{du^*}{dx}\right]_0^X + \int_0^X u \frac{d^2 u^*}{dx^2}\, dx, \tag{6.12}$$
which matches the diffusion term on the RHS of Eq. (6.10).
With the known form of the adjoint ADR equation, we can use it to compute a
QoI. Notice that because the adjoint equation is also an ADR equation, it is no easier
to solve than the original equation.
As an example, if our QoI is the average of u over the middle third of the domain,
this would make w(x):
$$w(x) = \begin{cases} \dfrac{3}{X} & x \in \left[\dfrac{X}{3}, \dfrac{2X}{3}\right] \\ 0 & \text{otherwise,} \end{cases} \tag{6.13}$$
which leads to a Q:
$$Q = \frac{3}{X}\int_{X/3}^{2X/3} u(x)\, dx. \tag{6.14}$$
To compute the QoI, we can solve either
$$Lu = q \quad\text{or}\quad L^* u^* = w, \tag{6.15}$$
and compute
$$Q = (u, w) \quad\text{or}\quad Q = (u^*, q),$$
respectively.
Our interest in the adjoint solution arises from the manner in which it allows first-
order sensitivities to be computed. In some situations, this is called perturbation
analysis, but as we will see it is the same as the sensitivity analyses discussed above.
Consider the perturbed problem
$$(L + \delta L)(u + \delta u) = q + \delta q, \tag{6.17}$$
where δL and δq are perturbations to the original problem and δu is the change
in the solution due to changing the problem. In the ADR example, δL would
involve changing the advection speed, diffusion coefficient, or reaction operator,
and δq would be a change to the source.
Expanding the product on the LHS of Eq. (6.17), using Lu = q, and neglecting the term (δL)(δu) as second order, we get
$$L\,\delta u + (\delta L) u = \delta q.$$
Taking the inner product with the adjoint solution u∗ gives
$$(L\,\delta u, u^*) = (\delta q, u^*) - ((\delta L) u, u^*). \tag{6.20}$$
This equation is useful, except we do not know what δu is. It is simple to compute
the perturbation to L and apply it to a known forward solution u (this just involves
taking derivatives). Similarly, we can compute δq easily because q is a parameter.
To remove the δu from Eq. (6.20), we will use the property of the adjoint that we
can "switch" L and L∗ in the inner product to make the relation
$$(L\,\delta u, u^*) = (\delta u, L^* u^*) = (\delta u, w), \tag{6.21}$$
where L∗u∗ = w was used in the second equality. This makes Eq. (6.20)
$$(\delta u, w) = (\delta q, u^*) - ((\delta L) u, u^*). \tag{6.22}$$
Therefore, if we can get another relation for (δu, w), then we can eliminate δu from
our equations.
The definition of the perturbed QoI is
$$Q + \delta(Q) = \int_D dV\, w u + \int_D dV\, w\,\delta u + \int_D dV\, (\delta w) u + O(\delta^2). \tag{6.23}$$
Here we have allowed for the case where w may be dependent on a parameter by
including the δw. In the ADR equation this could be the case if, for example, the QoI
were the reaction rate in a particular region. Then w would depend on the reaction
coefficient.
Using this result in Eq. (6.22) gives an equation for the perturbation to the QoI in
terms of perturbations to parameters and the forward and adjoint solutions:
$$\delta(Q) = (\delta q, u^*) + (u, \delta w) - \left(u, (\delta L)\, u^*\right). \tag{6.24}$$
That is, if we know u and u∗ , we can compute δ(Q). In general, for a quantity θ ,
we interpret the quotient δQ/δθ as the partial derivative of the QoI with respect to
θ . This interpretation is reasonable because the perturbation can be as small as we
like. Therefore we write
$$\frac{\partial Q}{\partial \theta} = \left(\frac{\partial q}{\partial \theta}, u^*\right) + \left(u, \frac{\partial w}{\partial \theta}\right) - \left(u, \frac{\partial L}{\partial \theta} u^*\right). \tag{6.25}$$
Using the same data as the example in the previous chapter (Sect. 4.3) where the
source and κ varied over space and the QoI was the total reaction rate,
$$Q = (u, \kappa) = \int_0^{10} \kappa(x) u(x)\, dx = \kappa_h \int_0^5 u(x)\, dx + \kappa_l \int_5^{7.5} u(x)\, dx + \kappa_h \int_{7.5}^{10} u(x)\, dx,$$
we can compute the sensitivities using the adjoint. For convenience we will write the
source in the example as q(x) = q̂x(10 − x) so that there is no confusion between
the general source in our adjoint derivations, q, and the source strength in the ADR
example, now set to q̂.
We can use the code from that example to implement the adjoint operator by
simply running the code with v → −v and setting the source in the adjoint
equation to be κ. The adjoint solution u∗ at the mean of all the parameters is
shown in Fig. 6.1. In this case, if we compute the QoI using either the forward or the
adjoint solution via Eq. (6.5), we get a match to 12 digits for our QoI, the total
reaction rate.
The numerical method to solve the adjoint ADR equation is given in Algo-
rithm 6.1. Notice that this algorithm differs from the forward model only by the sign
of v, the right-hand side of the resulting linear system is now κ, and the calculation
of Q now uses the source.
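A self-contained sketch of this forward/adjoint pairing for the steady ADR equation is below. This is not the book's Algorithm 6.1; the central-difference grid and the parameter values are illustrative assumptions, but the structure — flip the sign of v, replace the source with w = κ, then compare (u, w) with (u∗, q) — follows the text:

```python
import numpy as np

def solve_adr(v, omega, kappa, rhs, X=10.0, n=400):
    """Central-difference solve of v u' - omega u'' + kappa(x) u = rhs(x)
    with Dirichlet conditions u(0) = u(X) = 0; returns interior x, u, dx."""
    dx = X / (n + 1)
    x = np.linspace(dx, X - dx, n)
    main = 2 * omega / dx**2 + kappa(x)
    lower = np.full(n - 1, -omega / dx**2 - v / (2 * dx))
    upper = np.full(n - 1, -omega / dx**2 + v / (2 * dx))
    A = np.diag(main) + np.diag(lower, -1) + np.diag(upper, 1)
    return x, np.linalg.solve(A, rhs(x)), dx

# Illustrative parameter values (assumed, not the book's exact setup).
v, omega = 0.5, 0.125
kappa = lambda x: np.where((x >= 5) & (x <= 7.5), 0.1, 2.0)  # kappa_l inside, kappa_h outside
q = lambda x: x * (10 - x)

x, u, dx = solve_adr(v, omega, kappa, q)           # forward: L u = q
_, u_star, _ = solve_adr(-v, omega, kappa, kappa)  # adjoint: flip v, source w = kappa

Q_forward = np.sum(kappa(x) * u) * dx   # (u, w)
Q_adjoint = np.sum(q(x) * u_star) * dx  # (u*, q); should match (u, w)
print(Q_forward, Q_adjoint)
```

With central differences, the discrete adjoint matrix is exactly the transpose of the forward matrix, so the two QoI values agree to roundoff, mirroring the 12-digit match described in the text.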
To compute the inner products we use simple quadrature based on the midpoint
rule. The derivative of Q with respect to v is computed using Eq. (6.25) to get
$$\frac{\partial Q}{\partial v} = -\left(\frac{\partial u}{\partial x}, u^*\right) = -1.74049052049.$$
The derivative of u must be estimated from the forward solution using finite
differences. Here we used forward finite differences because this is consistent with
what we used in the solution technique. This number agrees with the finite difference
result given previously to four digits.
Similarly, the derivative with respect to ω involves the integral of the second
derivative of the forward solution times the adjoint solution:
$$\frac{\partial Q}{\partial \omega} = \left(\frac{\partial^2 u}{\partial x^2}, u^*\right) = -0.970207772262.$$
For κl the derivative is based on an integral only over the range x ∈ (5, 7.5) because
the operator only depends on κl in that range. Additionally, however, w also depends
on κl meaning that we will have two terms in our uncertainty:
$$\frac{\partial Q}{\partial \kappa_l} = \int_5^{7.5} u(x)\, dx - \int_5^{7.5} u(x) u^*(x)\, dx = 12.862742303.$$
The first term here is the derivative with respect to w term in Eq. (6.25), and the
second term is the derivative of L term.
The sensitivity to κh is an integral over the parts of the problem where κ = κh ,
again with terms relating to the derivatives of w and L:
$$\frac{\partial Q}{\partial \kappa_h} = \int_0^5 u(x)\, dx + \int_{7.5}^{10} u(x)\, dx - \int_0^5 u(x) u^*(x)\, dx - \int_{7.5}^{10} u(x) u^*(x)\, dx = 17.7613932101.$$
The final sensitivity to compute involves the source strength, q̂. From Eq. (6.25)
we get
$$\frac{\partial Q}{\partial \hat{q}} = \left(x(10 - x), u^*\right) = 52.3903954692.$$
Notice that with the q̂ sensitivity we only have the derivative with respect to q
contributing to the sensitivity in this case.
These results all agree with the first-order derivative results in Table 4.1 to several
digits.
Using finite differences, we required 2001 solutions to the ADR equations in this
case to get the sensitivity of Q with respect to κ when we used 2000 mesh cells.
With the adjoint approach, we only need a single forward and a single adjoint solve
to arrive at the same answer. To compute the sensitivity of Q to the value of κ in
any mesh cell i, we evaluate
$$\frac{\partial Q}{\partial \kappa_i} = \int_{x_{i-1/2}}^{x_{i+1/2}} u(x)\, dx - \int_{x_{i-1/2}}^{x_{i+1/2}} u(x) u^*(x)\, dx = \int_{x_{i-1/2}}^{x_{i+1/2}} u(x)\big(1 - u^*(x)\big)\, dx \approx u_i (1 - u_i^*)\,\Delta x,$$
where xi+1/2 is the right edge of the ith mesh cell, ui is the average value of u(x)
in mesh cell i, and Δx is the width of the mesh cell.
This calculation gives identical results to those in Sect. 4.3.2 with a factor of 1000
fewer solutions to the ADR equation (2 compared to 2001).
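The per-cell formula can be sketched as a one-line helper; the arrays below are invented stand-ins for cell-averaged forward and adjoint solutions from any solver:

```python
import numpy as np

def dQ_dkappa(u, u_star, dx):
    """Per-cell sensitivity dQ/dkappa_i ≈ u_i (1 - u*_i) * dx,
    combining the two integral terms into one expression."""
    return u * (1.0 - u_star) * dx

# Invented cell-averaged values for illustration:
u = np.array([0.10, 0.30, 0.20])
u_star = np.array([0.40, 0.50, 0.20])
print(dQ_dkappa(u, u_star, dx=0.005))
```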
In the previous section, we had to make some strong assumptions about the
underlying mathematical model to use adjoints, namely, that it was steady in time
and linear. In this section we relax that assumption and show how an adjoint
equation can be formed. To begin we have a time-dependent PDE of the form
$$F(u, \dot{u}) = 0, \qquad x \in V,\ t \in [0, t_f],$$
where F(u, u̇) is an operator, u̇ is the time derivative of u, V is the spatial domain,
and the time domain goes from time 0 to tf. We also have boundary conditions such
that the solution goes to zero on the boundary of the domain of interest:
u(x, t) = 0 x ∈ ∂V ,
We can modify this equation for the QoI by adding a Lagrange multiplier, u∗ , times
F without changing the QoI because F (u, u̇) = 0. We call this new quantity the
adjoined metric and write it as
$$\mathcal{L} = (u, w) - \int_{t_0}^{t_f} (F, u^*)\, dt. \tag{6.28}$$
t0
$$\frac{d\mathcal{L}}{d\theta} = \frac{\partial \mathcal{L}}{\partial u}\frac{\partial u}{\partial \theta} + \frac{\partial \mathcal{L}}{\partial \dot{u}}\frac{\partial \dot{u}}{\partial \theta} + \frac{\partial \mathcal{L}}{\partial \theta},$$
where the partial derivatives hold the other quantities constant, e.g., ∂L/∂θ is taken
with u and u̇ constant. One can think of the first variation as the total derivative of
the functional with respect to a parameter.
Using this definition of the first variation, we can then write for LQ ≡ (u, w),
$$\frac{d\mathcal{L}_Q}{d\theta} = \left(\frac{\partial u}{\partial \theta}, w\right) + \left(u, \frac{\partial w}{\partial \theta}\right) = (u_\theta, w) + (u, w_\theta), \tag{6.29}$$
where uθ ≡ ∂u/∂θ and wθ ≡ ∂w/∂θ.
In order to eliminate uθ from this equation, except for the boundary term, we will
define the Lagrange multiplier so that
$$(1, w) + \frac{d}{dt}\frac{\partial}{\partial \dot{u}}(F, u^*) - \frac{\partial}{\partial u}(F, u^*) = 0. \tag{6.33}$$
The boundary term in Eq. (6.32) evaluates uθ at t = t0 and tf . At the final time,
we can state that u∗ (tf ) = 0 to represent the fact that anything that happens beyond
the final time does not contribute to the quantity of interest. The issue of the final
6.2 Adjoints for Nonlinear, Time-Dependent Equations 139
Notice that to evaluate the equation we need the full forward solution and adjoint
solution for all time to compute the integrals. This could represent a storage problem
for large-scale systems.
The above derivation of the adjoint for a time-dependent problem was fairly abstract.
We shall show that it applies to the ADR problem we have seen before. For the linear
ADR equation we saw before, the system can be written as F(u, u̇) = 0 where
$$F(u, \dot{u}) = \dot{u} + v\frac{\partial u}{\partial x} - \omega\frac{\partial^2 u}{\partial x^2} + \kappa u - S.$$
We also consider a problem domain given by x ∈ [0, X], with u(0, t) = u(X, t) =
0. For a generic QoI weighting function w, terms in Eq. (6.33) are computed below.
We begin with the term involving the derivative with respect to u̇:
$$\frac{d}{dt}\frac{\partial}{\partial \dot{u}}(F, u^*) = \frac{d}{dt}\frac{\partial}{\partial \dot{u}}\int_0^X u^*(x,t)\left(\dot{u} + v\frac{\partial u}{\partial x} - \omega\frac{\partial^2 u}{\partial x^2} + \kappa u - S\right) dx = \int_0^X \frac{\partial u^*}{\partial t}\, dx.$$
Similarly, the term involving the derivative with respect to u is
$$\frac{\partial}{\partial u}(F, u^*) = \frac{\partial}{\partial u}\int_0^X u^*(x,t)\left(\dot{u} + v\frac{\partial u}{\partial x} - \omega\frac{\partial^2 u}{\partial x^2} + \kappa u - S\right) dx$$
$$= \frac{\partial}{\partial u}\int_0^X u(x,t)\left(-v\frac{\partial u^*}{\partial x} - \omega\frac{\partial^2 u^*}{\partial x^2} + \kappa u^*\right) dx = \int_0^X \left(-v\frac{\partial u^*}{\partial x} - \omega\frac{\partial^2 u^*}{\partial x^2} + \kappa u^*\right) dx,$$
where we used integration by parts to move the derivatives onto the adjoint variable.
In doing so we relied on the fact that u(0, t) = u(X, t) = 0 and that we are free to
define the boundary conditions for u∗ to be u∗ (0, t) = u∗ (X, t) = 0.
If we assert that Eq. (6.33) holds at every point in the medium, then we have the
adjoint equation
$$-\frac{\partial u^*}{\partial t} - v\frac{\partial u^*}{\partial x} - \omega\frac{\partial^2 u^*}{\partial x^2} + \kappa u^* = w,$$
with boundary and final conditions
$$u^*(0, t) = u^*(X, t) = 0, \qquad u^*(x, t_f) = 0.$$
This equation is the time-dependent version of the steady-state adjoint ADR equation
from Sect. 6.1. As a result our new approach to deriving an adjoint is equivalent to
the previous one in the steady-state limit; except now we can, in principle, handle
more complicated equations.
$$F(u, \dot{u}) = \rho\dot{u} - \omega\frac{\partial^2 u^4}{\partial x^2} + \kappa u^4 - S. \tag{6.36}$$
The boundary conditions we use are u(0, t) = u(X, t) = 0 and the initial condition
is 0. Notice that for this new form we have the time derivative linear in u and the
other terms involve u4 . For this new form of F we will have to compute
$$\frac{\partial}{\partial u}\int_0^X u^*(x,t)\left(-\omega\frac{\partial^2 u^4}{\partial x^2} + \kappa u^4 - S\right) dx = \int_0^X \left(-4\omega u^3\frac{\partial^2 u^*}{\partial x^2} + 4\kappa u^3 u^*\right) dx.$$
The adjoint equation is then
$$-\rho\frac{\partial u^*}{\partial t} - 4\omega u^3\frac{\partial^2 u^*}{\partial x^2} + 4\kappa u^3 u^* = w.$$
This is a linear equation, but it requires knowledge of u(x, t) at every time and point
in space to evaluate.
To demonstrate how this works, we will solve a problem where X = tf = 2,
$$\kappa(x) = \begin{cases} \kappa_h & 1 \le x \le 1.5 \\ \kappa_l & \text{otherwise,} \end{cases} \qquad S(x) = \begin{cases} q & 0.5 \le x \le 1.5 \\ 0 & \text{otherwise,} \end{cases}$$
and
$$\rho = 1, \quad \omega = 0.1, \quad \kappa_l = 0.1, \quad \kappa_h = 2, \quad q = 1.$$
This makes
$$w(x, t) = \begin{cases} \kappa(x) & x \in [1.5, 1.9],\ t \in [1.8, 2] \\ 0 & \text{otherwise.} \end{cases}$$
Notice that we only need the adjoint solution in the time range t ∈ [1.8, 2].
The solution to the forward and adjoint versions of this problem are shown in
Fig. 6.2.
With this system we will find the sensitivities to ρ, ω, κl, κh, and q using finite
differences and the adjoint methodology. The results are given in Table 6.1. These
results were obtained using 200 mesh cells of finite difference and a second-order
predictor-corrector Runge-Kutta method with Δt = 0.0001. The finite difference
parameter used was δ = 10−6 .
In these results we see that the finite difference and adjoint estimates of the
sensitivities agree to four significant digits, numerically demonstrating that the
adjoint approach can be extended to nonlinear and time-dependent problems.
Fig. 6.2 The solution u(x, 2) and u∗ (x, 1.8) to the nonlinear diffusion-reaction problem
Table 6.1 First-order sensitivity results for the nonlinear diffusion-reaction problem

Parameter | Finite difference | Adjoint estimate | Abs. rel. difference (10⁻⁵)
ρ         | −0.099480         | −0.099484        | 4.584074
ω         |  0.288975         |  0.288994        | 6.309322
κl        | −0.030224         | −0.030226        | 6.013714
κh        |  0.032156         |  0.032158        | 5.221466
q         |  0.096382         |  0.096387        | 5.469452
6.4 Exercises
1.
$$-\nabla^2\phi(x, y, z) + \frac{1}{L^2}\phi(x, y, z) = \frac{Q}{D},$$
for X, Y, Z, L, D, and Q.
2. Using a discretization of your choice, solve the equation
$$v\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2} - \omega u + 1,$$
for u(x) on the spatial domain x ∈ [0, 10] with periodic boundary conditions
u(0) = u(10). Use the solution to compute the total reactions
$$\int_5^6 dx\, \omega u(x).$$
Derive the adjoint equation for this equation, and use its numerical solution to
compute the sensitivities to the following parameters.
(a) μv = 0.5,
(b) μD = 0.125,
(c) μω = 0.1,
Part III
Uncertainty Propagation
In this part we will apply various techniques to understand the distribution of quanti-
ties of interest due to the distribution of input parameters. These methods range from
straightforward, robust, and expensive sampling-based techniques to inexpensive,
but fragile, reliability methods. The content of this part is not independent from the
local sensitivity work in the previous part. Sensitivity analysis can be used to screen
out those inputs that have a small impact on the QoI. This can save considerable
time when propagating uncertainties. At the same time, there is the danger that an
unimportant variable at one point of input space may be important in another.
In the next chapter, we present methods based on random samples from the input
distributions to get samples of the QoI and then use those samples to infer properties
of the QoI distribution.
Chapter 7
Sampling-Based Uncertainty
Quantification: Monte Carlo and Beyond
In its most basic form, we will use Monte Carlo methods to produce samples from
distributions and use those samples to infer information regarding the distribution.
Oftentimes, we are interested in the distribution of a QoI, but the methods we cover
are not restricted to these cases.
Assume that we have N independent and identically distributed samples from
the probability distribution of a QoI, Q(X), where X is a p-dimensional vector
of random variables. These samples are obtained by sampling values of X and
evaluating q(x). We are interested to know the expectation of some function of the
QoI. Using our previous definitions, we have
$$E[g(Q)] = \int dx_1 \cdots \int dx_p\, g(q(x)) f(x), \tag{7.1}$$
where f (x) is the probability density function for the inputs to the QoI. The standard
Monte Carlo estimator defines an estimate of the integral as
$$I_N \equiv \frac{1}{N}\sum_{n=1}^{N} g(q(x_n)). \tag{7.2}$$
The value of IN will limit to E[g(Q)] as N → ∞ by the law of large numbers. This
result will hold even if the variance in any component of X is unbounded.
This result tells us that if we estimate moments of the QoI using samples, we
can get good estimates given “enough” samples. A good question to ask is, what is
enough?
We can estimate the variance in IN using the variance in the samples of X.
Consider the sample variance of Q:
$$\sigma_N^2 = \frac{1}{N-1}\sum_{n=1}^{N}\left(\bar{Q}_N - Q(x_n)\right)^2,$$
where Q̄N is the sample mean of the N samples. If the true variance of the QoI is
finite, then by the central limit theorem, the error in the estimate, IN − E[g(Q)],
will converge, as N → ∞, to a normal distribution with mean zero and variance
given by
$$\mathrm{Var}\left(I_N - E[g(Q)]\right) = \frac{\sigma_N^2}{N}.$$
In other words, the error in the Monte Carlo estimator, as represented by the standard
deviation of the estimator, goes to zero as the square root of the number of samples,
with a constant that depends on the sample variance of the QoI.
A classical example of the Monte Carlo estimator considers the QoI given by
$$Q(x, y) = \begin{cases} 4 & x^2 + y^2 \le 1 \\ 0 & \text{otherwise,} \end{cases} \tag{7.3}$$
with x and y uniformly distributed on [−1, 1], so that the mean is
$$\bar{Q} = \frac{1}{4}\int_{-1}^{1}\int_{-1}^{1} Q(x, y)\, dx\, dy. \tag{7.4}$$
The result of this integral is π because we are taking the area of a circle with
radius 1.
To use a Monte Carlo estimate for this problem, we sample N values of x and y
based on the joint distribution (in this case we can simply sample x and y from a
uniform distribution between −1 and 1). We then evaluate Eq. (7.3) and average the
results over all N .
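This procedure can be sketched in a few lines of Python; the seed and sample sizes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_pi(n):
    """Monte Carlo estimate of the mean of Q(x, y) from Eq. (7.3),
    with x, y uniform on [-1, 1]; returns the estimate and its standard error."""
    x = rng.uniform(-1, 1, n)
    y = rng.uniform(-1, 1, n)
    q = np.where(x**2 + y**2 <= 1, 4.0, 0.0)
    return q.mean(), q.std(ddof=1) / np.sqrt(n)

for n in (100, 10_000, 1_000_000):
    est, se = estimate_pi(n)
    print(n, est, abs(est - np.pi), se)
```

As the theory above predicts, the reported standard error shrinks like N^(−1/2), and the actual error is typically within a few standard errors of zero.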
In Fig. 7.1 the error in the estimate of Q̄ is shown for different values of N. For
each value of N, a single estimate is computed. Therefore, due to the randomness
of the sampling, there is some "noise" in the convergence. The dashed line in this
figure has a slope of −1/2, demonstrating that the error decreases proportional to
N^(−1/2).

Fig. 7.1 The convergence of the error for the Monte Carlo estimate of the mean given by Eq. (7.4) at different numbers of samples, N. For each value of N, a single estimate was computed. The dashed line has a slope of −1/2

Fig. 7.2 The distribution of 5000 estimates of the mean using N = 10⁵ given in Eq. (7.4). The bars are the histogram of the estimates, and the curve is the normal distribution with mean zero and standard deviation given by σ_N/√N
We show the empirical distribution of estimates with N = 105 in Fig. 7.2. In
this figure we show a histogram of 5000 estimates of Q̄. Additionally, for each
estimate we computed the sample variance. In the figure we include the PDF for a
normal distribution with mean zero and a variance given by the mean of the sample
variances divided by 105 . This PDF agrees with the histogram, as predicted by the
theory above.
150 7 Sampling-Based Uncertainty Quantification: Monte Carlo and Beyond
$$\pi(\theta \mid x_1, \ldots, x_N) \propto \prod_{n=1}^{N} f(x_n \mid \theta)\, \pi(\theta). \tag{7.6}$$
For N samples from a normal distribution with θ = (μ, σ²), the likelihood is
$$\prod_{n=1}^{N} f(x_n \mid \theta) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(\mu - x_n)^2}{2\sigma^2}\right] = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N} \exp\left[-\sum_{n=1}^{N}\frac{(\mu - x_n)^2}{2\sigma^2}\right].$$
To maximize this function, we take its derivative and set it to zero. Before taking
the derivative, we use the trick of taking the logarithm first. We can do this because
the likelihood is nonnegative:
$$\log \prod_{n=1}^{N} f(x_n \mid \theta) = -\frac{N}{2}\log\sigma^2 - \frac{N}{2}\log 2\pi - \sum_{n=1}^{N}\frac{(\mu - x_n)^2}{2\sigma^2}.$$
Taking the derivative with respect to μ gives
$$\frac{d}{d\mu}\log \prod_{n=1}^{N} f(x_n \mid \theta) = -\sum_{n=1}^{N}\frac{\mu - x_n}{\sigma^2}. \tag{7.7}$$
Setting this derivative to zero and solving for μ gives
$$\mu = \frac{1}{N}\sum_{n=1}^{N} x_n.$$
This is the standard estimate for the mean that we have used before.
For σ 2 we get the derivative
$$\frac{d}{d\sigma^2}\log \prod_{n=1}^{N} f(x_n \mid \theta) = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{n=1}^{N}(\mu - x_n)^2. \tag{7.8}$$
Setting the derivative to 0 and solving for σ 2 , we get the maximum likelihood
estimate
$$\sigma^2 = \frac{1}{N}\sum_{n=1}^{N}(\mu - x_n)^2,$$
where μ is computed as above. This is the estimate of the variance we have seen
before, except it does not have Bessel’s correction.
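A quick numerical check of these two estimators, using a synthetic sample drawn from a known normal distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)  # synthetic data: mu = 3, sigma^2 = 4

mu_hat = x.mean()                   # MLE of mu: the sample mean
var_hat = np.mean((x - mu_hat)**2)  # MLE of sigma^2: note, no Bessel correction

print(mu_hat, var_hat)
# numpy's default (ddof=0) variance is exactly this MLE:
print(np.isclose(var_hat, x.var(ddof=0)))
```

With ten thousand samples, both estimates land close to the true values of 3 and 4.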
What this example indicates is that we can use the maximum likelihood estimator
to estimate properties of a distribution. To use this we had to assume a form
for the final distribution (a normal in the example). In the case of the normal
distribution, we could do this by hand. In many cases, we may require numerical
root finding to determine the value of θ . As we mentioned in Chap. 1, this selection
of a distribution that we desire to fit to our data could be a source of epistemic
uncertainty. The question of how our choice of distribution affects our conclusions
needs consideration but is not always easy to quantify. For instance, for any set
of samples, we could fit a normal distribution using the procedure detailed in the
example. However, the resulting distribution may not match the histogram of the
data at all. At the very least, one should compare the behavior of the fit distribution
to the samples collected.
Another approach, the method of moments, begins by computing the first K sample
moments:
$$E_N[X^k] = \frac{1}{N}\sum_{n=1}^{N} x_n^k, \qquad k = 1, \ldots, K.$$
Then we equate these K moments to the same moments of the distribution we want
to fit:
$$E_N[X] = \int x f(x \mid \theta)\, dx, \quad \ldots, \quad E_N[X^K] = \int x^K f(x \mid \theta)\, dx. \tag{7.9}$$
The right-hand side of the system in (7.9) can be computed exactly for many
common distributions and will be a function of θ only. Therefore, in principle we
have a soluble system because we have K equations and K unknowns. The values
of θ found then give us a distribution that matches the moments of our samples. The
method of moments distribution will, in general, differ from a distribution fit using
maximum likelihood. For many distributions, the method of moments can be easier
to compute than the maximum likelihood estimate.
We will apply the method of moments to the example of considering N samples
from a distribution we believe to be normal. In this case we have two parameters to
fit, θ = (μ, σ 2 ). Assume that we have computed, from our sample, the mean, which
we will denote as μs here to make it clear that it is the sample mean, and second
moment, E[X2 ]. Performing the integrations in (7.9), we get the system
$$\mu_s = \mu, \qquad \mathbb{E}_N[X^2] = \mu^2 + \sigma^2.$$
7.1 Basic Monte Carlo Methods: Simple Random Sampling 153
The first of these says that the estimate of the mean, μ, is the sample mean. The
second gives
$$\sigma^2 = \mathbb{E}_N[X^2] - \mu^2 = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_s)^2.$$
These estimates are equivalent to the values from the maximum likelihood estimate.
This will not always be the case, but it is a special property of the normal
distribution.
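As a sketch of these estimators (the data, seed, and "true" parameters below are purely illustrative), the method-of-moments fit for a normal distribution can be computed directly from the first two sample moments and checked against the maximum likelihood form:

```python
import random

# Illustrative data: N samples from a normal with hypothetical mean 10, sd 2.
random.seed(1)
samples = [random.gauss(10.0, 2.0) for _ in range(10_000)]

N = len(samples)
mu_s = sum(samples) / N                   # first sample moment, E_N[X]
m2 = sum(x * x for x in samples) / N      # second sample moment, E_N[X^2]

# Method of moments: mu = mu_s, sigma^2 = E_N[X^2] - mu^2.
mu_hat = mu_s
sigma2_hat = m2 - mu_s ** 2               # note: no Bessel correction

# For the normal this agrees with the maximum likelihood estimate:
sigma2_mle = sum((x - mu_s) ** 2 for x in samples) / N
print(mu_hat, sigma2_hat, sigma2_mle)
```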
To further demonstrate the method, we consider fitting a Gumbel distribution to some samples. The Gumbel distribution has support on the real line and has a PDF of
$$f(x\mid m,\beta) = \frac{1}{\beta}\, e^{-(z + e^{-z})}, \qquad \text{where } z = \frac{x-m}{\beta}. \tag{7.10}$$
The two parameters of the distribution are θ = (m, β). The mean of the Gumbel distribution is
$$\mu = m + \beta\gamma,$$
where γ ≈ 0.5772 is the Euler–Mascheroni constant, and the variance is σ² = π²β²/6. Equating these to the sample mean and variance gives β = √(6σ_s²)/π and m = μ_s − βγ.
Fig. 7.3 The distribution of 1000 samples from a Gumbel distribution and a normal distribution
fit to the data using the method of moments, and a Gumbel distribution fit with the method of
moments
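A sketch of the Gumbel fit follows (the sample data and "true" parameters are illustrative; the inversion formulas follow from μ = m + βγ and σ² = π²β²/6):

```python
import math
import random

random.seed(2)
# Illustrative data: Gumbel(m=3, beta=4) samples via the inverse CDF,
# x = m - beta*log(-log(u)) for u ~ U(0,1).
m_true, beta_true = 3.0, 4.0
data = [m_true - beta_true * math.log(-math.log(random.random()))
        for _ in range(50_000)]

n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / n   # sample variance (no Bessel)

# Method of moments: solve mu = m + beta*gamma, sigma^2 = pi^2 beta^2 / 6.
gamma = 0.5772156649015329                    # Euler-Mascheroni constant
beta_hat = math.sqrt(6.0 * s2) / math.pi
m_hat = xbar - beta_hat * gamma
print(m_hat, beta_hat)
```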
Both the method of moments and maximum likelihood have the feature that the
choice of distribution matters. One can use Bayesian model selection (Carlin and
Louis 2008) or other frequentist methods (Hastie et al. 2009) to help decide which
distribution is the best fit. The problem of extreme values remains: without a large
number of samples, our knowledge of the tails of the distribution will be limited or
absent.
7.2 Design-Based Sampling

The previous section explored the use of random sampling to estimate properties of a distribution. In uncertainty quantification we are often interested in QoIs that depend on a large number of parameters. Also, quantities of interest are not always described
by “nice” distributions: they may have non-smooth regions or cliffs in the value
that arise from extreme values of the uncertain parameters. It is often the behavior
at these cliffs or extreme points that we are most interested in. For this reason, we
desire to make sure that our sampled parameters cover the range of possible values
that could occur in the real system.
The process of selecting points to evaluate the QoI at is the problem of designing
an experiment. The design of experiments is itself a subfield of statistics and entire
volumes are devoted to its vagaries. Also, because the intricacies of designing experiments for computer experiments are very different from those of designing experiments for pharmaceutical clinical trials or social science,1 this exercise is called the design of computer experiments. The monograph by Santner et al. (2013) is dedicated to this topic.
In computer experiments we do not need replicates because generally computer
simulations will give the same result given the same inputs, unless the code is
stochastic in some way, such as codes that use Monte Carlo to estimate the outcome
of a stochastic process. Therefore, we focus on filling the space of the inputs, or, in
the parlance of experimental design, we seek a space-filling design.
Simple random sampling, like that we used in the previous section, is often not
adequate because of the tendency of random samples to group near the mode of the
distribution. Also, there is no guarantee that two samples will not be close together.
In pseudo-Monte Carlo (or pseudo-sampling) based on an experimental design, we
address this issue by imposing some structure on the sampling procedure but retain
the stochastic nature of the process.
The simplest such structure is stratified sampling: we divide the unit interval into M equally sized strata,
$$S_m = \left[\frac{m-1}{M}, \frac{m}{M}\right), \qquad m = 1, \dots, M. \tag{7.11}$$
1 For some time, design of experiments for social science has consisted of running a trial until one gets the desired answer and then stopping (John et al. 2012). There are efforts to identify and correct this practice (Collaboration et al. 2015).
Fig. 7.4 A demonstration of stratified sampling for a standard normal distribution. There are four
strata (denoted by solid vertical lines) with five points in each. The y-position of the points is set
randomly
In each stratum we will have N_S uniform random variables, t_n. To get the samples from a random variable X, we evaluate
$$x_n = F_X^{-1}(t_n).$$
Stratified sampling will assure that for M strata there are at least N/M samples
in each of the quantiles [0, M −1 ], [M −1 , 2M −1 ], . . . , [(M − 1)M −1 , 1]. In other
words, we are guaranteed to have some number of samples in the extreme values of
the distribution with some measure of even distribution in between. For example, if
N = M = 100, we know that one point will be in the bottom 1% of possible values
and one point will be in the top 1% of possible values. This is not guaranteed with
random sampling: taking 100 samples may not give any points at this extreme.
In Fig. 7.4, 20 points divided into four strata (five points per stratum) are shown for a standard normal distribution. The strata boundaries define the quartiles of the standard normal distribution. We can see that in each stratum there are exactly five points, giving five samples per quartile in this case.
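A minimal sketch of this variance reduction, using the inverse CDF from Python's standard library with M = N strata and one sample per stratum (the seed and sample counts are arbitrary choices for illustration):

```python
import random
from statistics import NormalDist, pstdev

normal = NormalDist()  # standard normal

def stratified_mean(n):
    """Estimate E[X] with n strata on [0,1] and one sample per stratum."""
    ts = [(m + random.random()) / n for m in range(n)]
    return sum(normal.inv_cdf(t) for t in ts) / n

def srs_mean(n):
    """Estimate E[X] with n simple random samples."""
    return sum(normal.inv_cdf(random.random()) for _ in range(n)) / n

random.seed(3)
n, reps = 100, 50
sd_strat = pstdev([stratified_mean(n) for _ in range(reps)])
sd_srs = pstdev([srs_mean(n) for _ in range(reps)])
print(sd_strat < sd_srs)   # stratified estimates vary far less
```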
We can show that the variance in an estimate from stratified sampling will be lower than one from simple random sampling. As demonstrated by Santner et al. (2013), it is possible to show that the variance in an estimate I_{N,strat} ≈ E[g(X)] can be written as
$$\operatorname{Var}(I_{N,\mathrm{strat}}) = \sum_{m=1}^{M} \frac{V_m^2}{n_m}\,\sigma_m^2, \tag{7.12}$$
where V_m is the volume of the mth stratum, n_m is the number of samples per stratum, and σ_m² is the variance of g(X) over the mth stratum. If the number of strata is equal to the total number of samples, i.e., M = N, the strata are all of the same size, V_m = N^{-1} (with p the dimension of X), and the number of samples per stratum is n_m = 1, then Eq. (7.12) simplifies to
Fig. 7.5 The convergence of the standard deviation of the estimated mean of a standard normal
(left) and a uniform distribution (right) using 50 replicates at each value of N with stratified
sampling where M = N and standard (unstratified) sampling
$$\operatorname{Var}(I_{N,\mathrm{strat}}) = \frac{1}{N^2} \sum_{m=1}^{N} \sigma_m^2. \tag{7.13}$$
If σ̂² denotes the variance of g(X) over the whole domain, this is bounded by the simple random sampling variance:
$$\operatorname{Var}(I_{N,\mathrm{strat}}) \le \frac{1}{N}\,\hat{\sigma}^2.$$
The best possible value of σ̂ 2 can be shown to be CN −2/p (Carpentier and Munos
2012) (the value of C depends on the gradient of g(X)). Therefore, we can say that
the variance in the estimate from stratified sampling with equal-sized strata and a
single sample per stratum is
$$C N^{-\left(1+\frac{2}{p}\right)} \le \operatorname{Var}(I_{N,\mathrm{strat}}) \le \frac{1}{N}\,\hat{\sigma}^2. \tag{7.14}$$
Therefore, stratified sampling will be no worse than simple random sampling and
could be much better.
A simple demonstration of stratified sampling can be seen by sampling from a
distribution and computing the mean. As we discussed before, repeatedly doing this
with simple random sampling will give an estimate of the mean that has a standard
deviation of that scales as N −1/2 , where N is the number of samples. Doing this
with stratified sampling, we see that the standard deviation of the estimate decreases
faster. Results from a numerical experiment where the standard deviation of the
mean is estimated using 50 replicates for each value of N are shown in Fig. 7.5.
The figure shows the results for two underlying distributions: a standard normal and
a uniform distribution. In the figure we see that stratified sampling with N = M
Fig. 7.6 The convergence of the standard deviation of the estimated mean of the 2-D distribution
(left) given by Eq. (7.4) and the sum of two uniform random variables (right) using stratified
sampling compared with standard (unstratified) sampling
yields estimates with a much lower standard deviation than unstratified sampling.
Estimating the mean of the uniform distribution does give the theoretical best
convergence for the standard deviation, O(N −3/2 ), whereas the normal converges
slightly slower, at about O(N −1 ).
The effect of stratifying in multiple dimensions is demonstrated in Fig. 7.6 where
the standard deviation of the mean of a QoI that is a function of two random
variables, given in Eq. (7.4), is shown at different values of N . Once again, stratified
sampling converges faster than simple random sampling, but the rate has decreased
to be less than O(N −1 ). For a simpler estimate, the sum of two uniform random
variables, the standard deviation converges to zero at the theoretical rate of O(N −1 )
given by Eq. (7.14).
One of the drawbacks of stratified sampling is that the number of samples
required for a full stratification grows geometrically as the number of dimensions
increases. For example, to have s strata per dimension will require s d samples if d
is the number of dimensions: this is the dreaded curse of dimensionality. When the
number of dimensions is high, full stratification becomes impossible. This is one
of the reasons to undertake a variable screening study using inexpensive sensitivity
methods.
Latin hypercube sampling addresses this by stratifying each input dimension individually rather than the full joint space. The idea can be demonstrated in 2-D using a Latin square.2 The square has a side length of 1, and the divisions in each dimension correspond to the quantiles of the variable to be sampled. If we desire N total samples, we divide the square into N² equally sized cells. Then in each row, permutations of the integers 1 through N are placed so that no column has an integer repeating. This is reminiscent of the puzzle game sudoku.
For N = 4, two possible examples of these permutations are
$$\begin{pmatrix} 3 & 1 & 4 & 2 \\ 4 & 3 & 2 & 1 \\ 2 & 4 & 1 & 3 \\ 1 & 2 & 3 & 4 \end{pmatrix} \qquad \begin{pmatrix} 4 & 3 & 2 & 1 \\ 3 & 4 & 1 & 2 \\ 2 & 1 & 4 & 3 \\ 1 & 2 & 3 & 4 \end{pmatrix}$$
The next step is to pick a random integer between 1 and N. This integer then selects the N cells to generate a sample in. In our example, if 4 is the integer chosen, the selected cells are those containing a 4: one in each row and each column of the two squares above.
We would then pick a point, at random, in each of the selected cells to get our four samples. The design on the right picked the diagonal, which is not ideal because of the correlation between the dimensions. We will revisit this idea later.
To generalize the Latin square to a hypercube, we define X = (X_1, ..., X_p) as a collection of p independent random variables. To generate N samples, we divide the domain of each X_j into N intervals. In total there are Np such intervals. The intervals are defined by the N + 1 edges:
$$\left\{ F_j^{-1}(0),\; F_j^{-1}\!\left(\tfrac{1}{N}\right),\; F_j^{-1}\!\left(\tfrac{2}{N}\right),\; \dots,\; F_j^{-1}\!\left(\tfrac{N-1}{N}\right),\; F_j^{-1}(1) \right\}.$$
2 The terminology hypercube comes from the fact that we use a square in p dimensions to formulate
the design.
For each dimension j we draw a random permutation (π_{1j}, ..., π_{Nj}) of the integers 1, ..., N, collected into an N × p matrix Π, and set
$$x_{ij} = F_j^{-1}\!\left(\frac{\pi_{ij} - u_{ij}}{N}\right), \tag{7.15}$$
where u_{ij} ∼ U(0, 1). This makes x_i = (x_{i1}, ..., x_{ip}) the ith sample of X.
As an example we look at a case with p = 3 and N = 4. In this case a possible matrix Π is
$$\Pi = \begin{pmatrix} 4 & 1 & 2 \\ 3 & 3 & 1 \\ 2 & 2 & 3 \\ 1 & 4 & 4 \end{pmatrix}.$$
Notice that each of the four intervals in each dimension is only sampled once.
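The construction can be sketched in a few lines. Here the formula of Eq. (7.15) is applied with uniform marginals, so F_j^{-1} is the identity (the seed and sizes are illustrative):

```python
import random

def latin_hypercube(n, p, rng=random):
    """n samples in [0,1)^p with one point in each of the n intervals of
    every dimension; apply F_j^{-1} to each column for other marginals."""
    perms = [rng.sample(range(1, n + 1), n) for _ in range(p)]  # one per dim
    return [[(perms[j][i] - rng.random()) / n for j in range(p)]
            for i in range(n)]

random.seed(4)
design = latin_hypercube(4, 3)
# Every dimension's four intervals are each hit exactly once:
for j in range(3):
    print(sorted(int(x[j] * 4) for x in design))   # [0, 1, 2, 3]
```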
Latin hypercube designs can be shown to be an improvement on simple random sampling by looking at the "main effects" of the function whose expected value we are trying to estimate. Say we are interested in E[g(Q)] as estimated by Eq. (7.2). If the random variable space is p dimensional, there are p main effects, each a function of a single x_j:
$$\alpha_j(x_j) = \int dx_1 \cdots dx_{j-1}\, dx_{j+1} \cdots dx_p \, \big(g(q(\mathbf{x})) - \mathbb{E}[g(Q)]\big)\, f(x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_p). \tag{7.16}$$
The main effects give a measure of how much a single dimension of the input space affects the behavior of g(Q) about its mean. It can be shown (Stein 1987) that unless
the main effects are all zero, Latin hypercube will have a superior convergence of the
error compared to simple random sampling. Indeed, the integral of the main effects
squared gives the improvement of Latin hypercube sampling over simple random
sampling.
We noted above that some designs generated by a Latin hypercube are better than
others. Because the selection of intervals is chosen using random permutations, it is
possible that the design does not fill the space optimally. Also, when the points
are chosen inside the intervals, they could be close together due to the random
placement. To address this we introduce the distance between any two points:
$$\rho_\ell(\mathbf{x}, \mathbf{y}) = \left( \sum_{j=1}^{p} |x_j - y_j|^\ell \right)^{1/\ell}. \tag{7.17}$$
For a given design, X = {x_1, ..., x_N}, the minimum distance between two points can be defined as
$$\rho_{\min}(\mathcal{X}) = \min_{i \ne k} \rho_\ell(\mathbf{x}_i, \mathbf{x}_k).$$
A design that maximizes this minimum distance spreads its points as far apart as possible.
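A sketch of this minimum-distance criterion (the two small designs below are hypothetical; a design chosen to maximize this quantity would prefer the first):

```python
from itertools import combinations

def min_distance(design, ell=2):
    """Smallest pairwise rho_ell distance in a design; larger is better
    for space filling."""
    return min(
        sum(abs(a - b) ** ell for a, b in zip(x, y)) ** (1.0 / ell)
        for x, y in combinations(design, 2)
    )

spread = [(0.1, 0.9), (0.4, 0.1), (0.6, 0.6), (0.9, 0.3)]    # well separated
clumped = [(0.1, 0.1), (0.15, 0.2), (0.6, 0.6), (0.9, 0.9)]  # two close points
print(min_distance(spread) > min_distance(clumped))          # True
```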
Latin hypercube designs assure that a point in each of the N intervals is selected
for each of the p dimensions in X. We could extend this to ask, for example, could
we create a design that selects every pair of intervals, i.e., if I project the design
onto any 2-D plane, the sampling fills the space. This can be generalized to other
groupings of intervals.
To construct designs that have this desired projection property, we can use an orthogonal array. An orthogonal array O of strength t on s intervals is an N × p matrix, where N = λs^t and the number of dimensions p ≥ t, with the property that in every N × t submatrix of the orthogonal array each of the s^t possible rows appears λ times. The parameter λ can be thought of as the number of replicates, so in computer simulations λ is typically set to 1.
To unpack the definition: an orthogonal array creates a design of N points such that when one projects onto any t of the dimensions, every combination of intervals is covered. In this sense, when t = 1, we get a Latin hypercube design because each interval is chosen once. Additionally, orthogonal arrays of strength 2 are the basis for factorial experimental designs.
For an example we consider a four-dimensional space (p = 4), with three intervals in each dimension (s = 3) and a strength of 2 (t = 2). There will be 3² = 9 samples, covering every pair of intervals in each pair of dimensions. An orthogonal array for this situation is
$$O = \begin{pmatrix}
3 & 2 & 1 & 3 \\
1 & 2 & 3 & 2 \\
2 & 1 & 3 & 3 \\
1 & 3 & 2 & 3 \\
2 & 2 & 2 & 1 \\
2 & 3 & 1 & 2 \\
3 & 3 & 3 & 1 \\
3 & 1 & 2 & 2 \\
1 & 1 & 1 & 1
\end{pmatrix}.$$
Each row in this array gives an interval to pick a point in, just like the matrix Π did
for a Latin hypercube. The samples corresponding to each entry in the matrix are
generated using Eq. (7.15). From this example orthogonal array, we get the design
shown in Fig. 7.7.
The generation of orthogonal arrays is not straightforward. For R, the package
DoE.base will generate strength 2 orthogonal arrays with the oa.design
function. Python has the OApackage for generating these arrays as well.
7.3 Quasi-Monte Carlo

Quasi-Monte Carlo dispenses with the notion of using random numbers in the sampling and instead uses deterministic sequences of seemingly random numbers. These sequences can be designed so that they are space filling and can be rapidly generated. The sequences of samples are often called low-discrepancy sequences because there is a measure of uniformity in how they fill the space, i.e., they do not leave large gaps.
The simplest low-discrepancy sequence is the van der Corput sequence. For a
given base, b, this sequence takes the integers n = 1, . . . , N and for each,
Fig. 7.7 A design generated by an orthogonal array on four dimensions, with three intervals
per dimension and strength 2. The headings indicate the dimension shown on the x and y axis,
respectively. Note that every possible 2-D projection fills the nine possible pairs of intervals
1. Writes n in base b,
2. Reflects that number about the ones place to create a rational number, and
3. Writes the resulting number as a decimal.
As an example consider b = 2 and n = 2. In base 2, 2 = (10)_2, where the subscript denotes the base. Reflecting this number gives (.01)_2, which is 2^{-2} = 0.25. With n = 3 we have 3 = (11)_2 and (.11)_2 = 2^{-1} + 2^{-2} = 3/4. The van der Corput sequence in base 2 is
$$\tfrac{1}{2},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{8},\ \tfrac{5}{8},\ \tfrac{3}{8},\ \tfrac{7}{8},\ \tfrac{1}{16},\ \tfrac{9}{16},\ \tfrac{5}{16},\ \tfrac{13}{16},\ \tfrac{3}{16},\ \tfrac{11}{16},\ \tfrac{7}{16},\ \tfrac{15}{16},\ \dots$$
The first eight points of the van der Corput sequence base 2 are shown in Fig. 7.8.
Notice that the sequence moves to fill in the largest gap in the interval for each point
added.
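The three steps can be implemented with repeated division; the digit-reversal loop below is a sketch of the standard radical-inverse construction:

```python
def van_der_corput(n, b=2):
    """Reflect the base-b digits of n about the ones place."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, b)   # peel off the lowest base-b digit
        denom *= b
        x += digit / denom        # place it after the radix point
    return x

print([van_der_corput(n) for n in range(1, 8)])
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```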
Using the van der Corput sequence, we can treat the sequence points as uniform random numbers for sampling. Though the formula will generate numbers for any base b, the base must be prime to avoid repeated numbers. Also, we need to generalize the prescription to sample from multidimensional distributions: if we use van der Corput with the same base in each dimension, we will only sample the diagonal.
Fig. 7.8 The first eight points of the base 2 van der Corput sequence. The first point is 1/2; the paths between subsequent points are labeled
The Halton sequence is the generalization of van der Corput sequences to multiple dimensions: each dimension uses a different prime base, and the points remain simple to generate. However, there is a drawback. When the prime number used for the base is large, the sequence behaves monotonically for many consecutive samples (i.e., sample n + 1 is greater than or less than sample n for many consecutive n), so the Halton sequence fails to behave in a seemingly random manner or to fill the space.
A demonstration of the behavior of Halton sequences for different numbers
of dimensions is shown in Figs. 7.9 and 7.10. In the five-dimensional case in
Fig. 7.9, the space is reasonably filled with a small correlation between the variables.
However, when the dimension is increased to 40, Fig. 7.10, there is a clear
correlation between certain variables, and there are large gaps of unfilled space.
For these reasons, Halton sequences are not suggested for input parameter spaces
larger than about eight dimensions. There are alternatives to the Halton sequence
that utilize van der Corput sequences. One possibility is the Faure sequence which
reorders the van der Corput sequence (Faure 1982).
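A minimal Halton sketch built on the radical inverse, with one prime base per dimension (the bases shown are simply the first three primes):

```python
def radical_inverse(n, b):
    """Van der Corput point for integer n in base b."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, b)
        denom *= b
        x += digit / denom
    return x

def halton(n_points, bases=(2, 3, 5)):
    """n_points samples in [0,1)^p; dimension j uses prime base bases[j]."""
    return [[radical_inverse(n, b) for b in bases]
            for n in range(1, n_points + 1)]

pts = halton(4)
print(pts[0])   # [0.5, 0.3333333333333333, 0.2]
```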
Fig. 7.9 The pairwise projections for the variables X1 through X5 in a 5-D Halton sequence with
100 points. The upper part of the diagram gives the Spearman correlation (ρ) between the two
variables; the diagonal shows a histogram for each variable
One can see in Figs. 7.11 and 7.12 that the performance of Sobol sequences is comparable in five dimensions, and slightly improved at 40 dimensions, relative to the Halton sequence. In 40 dimensions the Sobol sequence does not have the high correlation that the Halton sequence did, but there is a noticeable relation between variables, especially in X11 and X12.
Fig. 7.10 The pairwise projections for the variables X11 through X15 in a 40-D Halton sequence
with 100 points. The upper part of the diagram gives the Spearman correlation (ρ) between the two
variables; the diagonal shows a histogram for each variable
In Python there are separate libraries for Halton and Sobol sequences available. Many of these implementations are based on the work of Fox (1986) and Bratley et al. (1992).
Fig. 7.11 The pairwise projections for the variables X1 through X5 in a 5-D Sobol sequence with
100 points. The upper part of the diagram gives the Spearman correlation (ρ) between the two
variables; the diagonal shows a histogram for each variable
7.4 Comparison of Techniques

To compare these sampling techniques, we return to the reaction rate problem of Sect. 4.3.1, except that rather than each being normal, each parameter will be gamma distributed with the parameters chosen so that the standard deviation is 10% of the mean. Additionally, the normal copula uses the same correlation matrix as in Sect. 4.3.1.
The resulting distributions of the quantity of interest, the total reaction rate, are
shown in Fig. 7.13 for different sampling techniques and numbers of samples. At
a low number of samples, N = 100, none of the methods match the reference
solution, but we do see that the quasi-Monte Carlo designs, namely, Halton and
Sobol sampling, do seem to steadily improve as N increases. Simple random
sampling (denoted as SRS on the plot) demonstrates the most variability when
changing the number of points: as N increases the overall behavior seems to
improve, but there are idiosyncratic spikes in the plot that behave randomly because
the samples are random. The Latin hypercube samples (LHS) are in between SRS
Fig. 7.12 The pairwise projections for the variables X11 through X15 in a 40-D Sobol sequence
with 100 points. The upper part of the diagram gives the Spearman correlation (ρ) between the two
variables; the diagonal shows a histogram for each variable
and quasi-Monte Carlo in these regards. The LHS solution does improve noticeably
as N increases, in a similar manner to quasi-Monte Carlo, but there are still small
artifacts from the random sampling inside the strata in LHS (Fig. 7.14).
To explore how increasing the dimension of the input space affects the performance of sampling methods, we return to the ADR problem where the value of κ was the result of a random process. In Sect. 4.3.2 we defined this problem and solved it with 2000 mesh cells. In this test we will use 40 mesh cells to get a 40-dimensional input space because that was the largest dimension available for the Sobol sampler available for Python. The problem sets all of the parameters at their mean values, except for κ, which is set as a Gaussian random process with known covariance function.
Fig. 7.13 Empirical distributions of the QoI from the ADR problem using different methods and
number of samples. The reference distribution is a Latin hypercube design with 106 points
Fig. 7.14 Convergence of the moments of the QoI from the ADR problem using different methods
and number of samples. The reference distribution is a Latin hypercube design with 106 points
In Fig. 7.14 the Sobol sequence gives a better estimate than LHS for N around 1000, but this appears to be an anomaly because as more points are added, the estimate does not improve, and, indeed, the Sobol estimate of kurtosis has an error of about 100% with N = 10⁵.
These results demonstrate that as the dimensionality of the space gets bigger,
the QMC approaches we have discussed may not be adequate for estimating the
distributions of QoIs. When the dimensionality of the input space is smaller, as we
saw when p = 5, these methods appear to be superior to the random and design-
based sampling techniques. Also in every case we tested, Latin hypercube sampling
was superior to simple random sampling, making it a method that should be used
when possible over a pure Monte Carlo strategy.
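Since Latin hypercube sampling is recommended whenever possible, it is worth noting how little code it requires. The sketch below is not from the text: the two-dimensional toy QoI, the sample size, and the seed are illustrative assumptions, but the stratify-then-permute construction is the standard LHS recipe.

```python
import numpy as np

def latin_hypercube(n, p, rng):
    # One random point per stratum in each dimension; independent
    # permutations pair the strata randomly across dimensions.
    samples = np.empty((n, p))
    for j in range(p):
        strata = rng.permutation(n)              # stratum index for each row
        samples[:, j] = (strata + rng.random(n)) / n
    return samples

rng = np.random.default_rng(42)
n, p = 1000, 2
lhs = latin_hypercube(n, p, rng)
srs = rng.random((n, p))                         # simple random sampling

# Estimate E[x1 + x2] = 1 for inputs uniform on [0, 1]^2; the stratified
# LHS estimate is typically far closer to 1 than the SRS estimate.
q_lhs = lhs.sum(axis=1).mean()
q_srs = srs.sum(axis=1).mean()
```

For a linear QoI like this one, the stratification cancels most of the sampling noise in each coordinate, which is one reason LHS dominated simple random sampling in the tests above.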
Our discussion of sampling did not include the ideas of importance sampling,
biased sampling techniques, or other specialized Monte Carlo variance reduction
techniques. These techniques are often highly customized to the particular problem
at hand and are, therefore, less amenable to a general prescription for uncertainty
7.5 Notes and References 171
Fig. 7.15 Empirical distributions of the QoI from the random process ADR problem using
different methods (Halton, LHS, Sobol, SRS) and numbers of samples (N = 100, 1000, 10^4, 10^5).
The reference distribution is a Latin hypercube design with 10^7 points
quantification purposes. The works of Robert and Casella (2013) and Kalos and
Whitlock (2008) are appropriate references for these topics.
There also has been recent work on using low-resolution calculations in
Monte Carlo estimates and correcting these low-resolution calculations with high-
resolution calculations in a method known as multilevel Monte Carlo (MLMC)
(Giles 2013; Cliffe et al. 2011). The basic idea is that the numerical calculation of
a QoI at a low resolution, Q_0, and at higher resolutions, Q_ℓ for ℓ = 1, . . . , L, can be
combined to form an estimate of the expected value of Q_L, the highest-resolution
estimate, using the linearity of the expected value operator:
$$E[Q_L] = E[Q_0] + \sum_{\ell=1}^{L} E\!\left[Q_\ell - Q_{\ell-1}\right].$$
Then we use different Monte Carlo estimators for each expected value as
$$\widehat{E}[Q_0] = \frac{1}{N_0}\sum_{n=1}^{N_0} Q_{0,n}, \qquad \widehat{E}\!\left[Q_\ell - Q_{\ell-1}\right] = \frac{1}{N_\ell}\sum_{n=1}^{N_\ell}\left(Q_{\ell,n} - Q_{\ell-1,n}\right).$$
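A minimal sketch may make the telescoping estimator concrete. Everything here is an illustrative assumption rather than an example from the text: the QoI is a trapezoidal estimate of the integral of sin(ωx) over [0, 1] with a random frequency ω ∼ U(1, 2), the levels double the mesh, and more samples are spent on the cheap coarse levels.

```python
import numpy as np

def q_level(omega, level):
    # Level-l approximation of the QoI: trapezoid rule on 2**(l+2) cells.
    n_cells = 2**(level + 2)
    x = np.linspace(0.0, 1.0, n_cells + 1)
    f = np.sin(omega[:, None] * x)
    dx = 1.0 / n_cells
    return dx * (0.5 * f[:, 0] + f[:, 1:-1].sum(axis=1) + 0.5 * f[:, -1])

rng = np.random.default_rng(0)
L = 4
n_per_level = [4000, 2000, 1000, 500, 250]    # fewer samples at finer levels

# E[Q_L] = E[Q_0] + sum_l E[Q_l - Q_{l-1}], each term estimated separately.
est = q_level(rng.uniform(1.0, 2.0, n_per_level[0]), 0).mean()
for level in range(1, L + 1):
    omega = rng.uniform(1.0, 2.0, n_per_level[level])
    # The same samples feed both resolutions, so the difference has a
    # much smaller variance than either term alone.
    est += (q_level(omega, level) - q_level(omega, level - 1)).mean()
```

The exact answer is E[(1 − cos ω)/ω] ≈ 0.608, and the multilevel estimate lands close to it while spending most of its model evaluations on the coarsest grid.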
172 7 Sampling-Based Uncertainty Quantification: Monte Carlo and Beyond
Fig. 7.16 Convergence of the moments of the QoI from the random process ADR problem using
different methods and number of samples. The reference distribution is a Latin hypercube design
with 10^7 points
7.6 Exercises
1. For the random variable X ∼ N (0, 1), draw 50 samples and generate histograms
using the following sampling techniques:
(a) Simple random sampling,
(b) Stratified sampling,
$$E_n(x) = \int_1^\infty \frac{e^{-xt}}{t^n}\,dt.$$
$$\mu\,\frac{\partial\psi}{\partial x} + \sigma\,\psi = 0,$$
Reliability methods are a class of techniques that seek to answer the question of
with what probability a QoI will cross some threshold value. The name for the
methods comes from civil engineering where they were originally formulated to
answer the question of when the amount of margin in the system is smaller than
zero, i.e., the system fails. These methods typically try to answer this question using
approximations to the distribution based on a minimal set of evaluations of the QoI.
Reliability methods try to characterize the safety of the system using a
single number, β, which summarizes the probability of not failing: it is the number
of standard deviations above the mean performance at which the failure point of the
system lies. While it is a laudable goal to have a single metric to report to other
stakeholders and decision-makers, as we shall see, many details necessarily get
obfuscated in doing so.
As mentioned, reliability methods try to estimate the system performance using
a minimal number of QoI evaluations to infer system behavior: an endeavor that
necessarily requires extrapolation from a few data points to an entire distribution.
This contrasts with the previous chapter on sampling methods where we used actual
samples from the distribution of the QoI to make statements about a distribution,
at the cost of requiring many evaluations of the QoI. As a result of the fewer
evaluations required in reliability analysis, it can be much faster than sampling.
On the other hand, the simplifications made in these methods make it less robust
than sampling techniques. The assumptions and approximations in a reliability
calculation should be noted by a practitioner.
The simplest and least expensive type of reliability method involves extending the
sensitivity analysis we have already completed to make statements about the values
of the distribution. The first-order second-moment (FOSM) method uses first-order
sensitivities to estimate the variance. Then we use the assumption that the value of
the QoI at the mean of the inputs is the mean of the QoI, i.e.,
$$\bar{Q} \approx Q(\bar{\mathbf{x}}).$$
An additional assumption is that the QoI is normal with a known mean and variance.
We use the covariance matrix for the inputs, Σ, along with the sensitivities, ∂Q/∂X_i, to
estimate the variance as (c.f. Eq. (4.11))
$$\operatorname{Var}(Q) \approx \frac{\partial Q}{\partial \mathbf{X}}^{\mathsf T}\, \Sigma\, \frac{\partial Q}{\partial \mathbf{X}}.$$
With the mean and variance in hand, we can then assume, without any justification
at this point, that Q is normally distributed as
$$Q \sim \mathcal{N}\!\left(Q(\bar{\mathbf{x}}),\ \frac{\partial Q}{\partial \mathbf{X}}^{\mathsf T}\, \Sigma\, \frac{\partial Q}{\partial \mathbf{X}}\right). \qquad (8.2)$$
This assumption will only be valid if the QoI is a linear function of the inputs and
if the inputs are independent and normally distributed.
Reliability analysis typically rescales the QoI so that the point we are interested
in, the so-called failure point, is expressed as a quantity, Z, such that failure occurs
when Z < 0. Therefore, we use the failure value of the QoI, Q_fail, to define Z:
$$Z = Q_{\text{fail}} - Q. \qquad (8.3)$$
Under the normality assumption, the probability of failure is
$$P(Z < 0) = 1 - \Phi(\beta), \qquad (8.4)$$
where Φ(x) is the standard normal CDF. The probability of failure leads to
the definition of the reliability index for the system. The reliability index, β, is
defined as
$$\beta = \frac{Q_{\text{fail}} - Q(\bar{\mathbf{X}})}{\sqrt{\dfrac{\partial Q}{\partial \mathbf{X}}^{\mathsf T}\, \Sigma\, \dfrac{\partial Q}{\partial \mathbf{X}}}}. \qquad (8.5)$$
This makes 1 − Φ(β) the estimated probability of failure. β is simply the number
of standard deviations above 0 the mean system performance is. A larger value of β
indicates that the system is farther from the failure point at the nominal conditions.
In other words, β indicates how many standard deviations of margin are available
when the QoI is evaluated at the mean value of the inputs. Of course, there have been
many assumptions that went into calculating β, and one should consider these when
using β to make quantitative statements. On the other hand, even an approximate
indicator like β can be quite useful when comparing two different systems in terms
of reliability.
As a simple example, consider the QoI defined by the linear combination of
independent normal random variables
$$Q(x, y) = 2x + 0.5y, \qquad X \sim \mathcal{N}(5, 2),\quad Y \sim \mathcal{N}(3, 1).$$
The QoI will then be normally distributed with mean 11.5 and standard deviation
of 4.03 as shown in Fig. 8.1. If the failure point is Q_fail = 16.5, then the reliability
Fig. 8.1 Illustration of the reliability index for the QoI Q(x, y) = 2x + 0.5y, X ∼ N(5, 2),
Y ∼ N(3, 1) with a failure point Q_fail = 16.5 (β = 1.24, 1 − Φ(β) = 0.107). The shaded area is
the probability of failure, and βσ_Q is the distance from the mean to the failure point
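The numbers annotated in Fig. 8.1 follow directly from Eq. (8.5); this short check (not the book's code) reproduces them with only the standard library.

```python
import math

# FOSM for the linear QoI Q(x, y) = 2x + 0.5y with independent
# X ~ N(5, sigma=2) and Y ~ N(3, sigma=1), and Q_fail = 16.5.
grad = (2.0, 0.5)                      # exact sensitivities of a linear QoI
mean_q = 2.0 * 5.0 + 0.5 * 3.0         # Q at the input means: 11.5
var_q = (grad[0] * 2.0)**2 + (grad[1] * 1.0)**2   # grad^T Sigma grad = 16.25
beta = (16.5 - mean_q) / math.sqrt(var_q)          # Eq. (8.5)

# 1 - Phi(beta), the estimated probability of failure
p_fail = 1.0 - 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))
```

Because the QoI is linear and the inputs are independent normals, these FOSM values (β ≈ 1.24, failure probability ≈ 0.107) are exact, matching the figure.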
178 8 Reliability Methods for Estimating the Probability of Failure
Fig. 8.2 Comparison of FOSM with a 4 × 10^4 sample Monte Carlo design to estimate the
probability of failure in the ADR example with multivariate normal inputs (β = 1.806,
1 − Φ(β) = 0.0354, empirical P_fail = 0.03545). The solid line is the distribution fit by FOSM,
and the dashed line is the empirical probability density. The shaded region has an area of 1 − Φ(β)
Fig. 8.3 Comparison of FOSM with a 10^6 sample Monte Carlo design to estimate the probability
of failure for the ADR example where the inputs are gamma random variables joined by a normal
copula (β = 0.509, 1 − Φ(β) = 0.305, empirical P_fail = 0.344). The solid line is the distribution
fit by FOSM, and the dashed line is the empirical probability density
deviation that is 10% of the mean. These variables are joined via a normal copula
where the correlation matrix is given by Eq. (4.15).
In Fig. 8.3 the results from a Monte Carlo estimate using 106 samples and FOSM
are shown. While the FOSM estimate for the probability of failure is close to that
estimated via Monte Carlo (30.55% for FOSM and 34.45% for MC), it appears
to be getting the right answer for the wrong reasons. FOSM predicts that values
of Q much larger than the failure point are more probable. Also, the mode of the
empirical distribution from Monte Carlo is not the value of Q(x̄), as assumed in
FOSM.
To demonstrate that the type of distribution used makes a significant difference in
the results of FOSM, we change the distribution of κ_h to a binomial distribution.
This distribution has the same mean and standard deviation as that used previously
but obviously has a much different character. As before, the parameters are joined by
a normal copula. The covariance matrix overall is different and leads to an estimate
of the variance of the QoI that is larger than before. Nevertheless, in this instance
FOSM gives an estimated probability of failure 32 times smaller than observed
empirically if Qfail is set to 75, as shown in Fig. 8.4. This demonstrates that the
accuracy of FOSM is sensitive to the underlying distributions.
Fig. 8.4 Comparison of FOSM with a 10^6 sample Monte Carlo design to estimate the probability
of failure for the ADR example where the input for κ_h is a binomial distribution with the same
mean and variance as that in the previous example (β = 3.613, 1 − Φ(β) = 0.000152, empirical
P_fail = 0.004919). Note that the vertical axis is proportional to the square root of the probability
density. The solid line is the distribution fit by FOSM, and the dashed line is the empirical
probability density
$$(\mathbf{X} - \boldsymbol{\mu})^{\mathsf T}\, \Sigma^{-1}\, (\mathbf{X} - \boldsymbol{\mu}) = \beta^2.$$
8.2 Advanced First-Order Second-Moment Methods 181
Fig. 8.5 Illustration of the advanced first-order second-moment method. The ellipse
(X − μ)^T Σ^{-1}(X − μ) = β², centered at the design point, X₀, is the smallest circle in a rescaled
coordinate system that touches the failure surface separating the safe and failure regions. The point
where they touch is the most probable failure point, X_MPF
This ellipse and the failure surface are illustrated in Fig. 8.5.
To use AFOSM we need to determine an equivalent normal variable for each of
the inputs. This will allow us to use the standard normal distribution to estimate the
probability of failure. For each variable we need to determine the mean and standard
deviation for this equivalent normal. To do this we equate the distribution of each
input to a normal distribution at the mean of the distribution using the CDF and PDF
at some point x_i:
$$\Phi\!\left(\frac{x_i - \mu_i}{\sigma_i}\right) = F_{X_i}(x_i), \qquad (8.6)$$
$$\frac{1}{\sigma_i}\,\phi\!\left(\frac{x_i - \mu_i}{\sigma_i}\right) = f_{X_i}(x_i). \qquad (8.7)$$
In these relations we have used the fact that a random variable X ∼ N (μ, σ 2 ) has
a PDF of
$$f(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right).$$
It has been noted by Rackwitz and Flessler (1978) that if the original distribution is
skewed, then one can match μ_i to the median, μ_i = F_{X_i}^{-1}(0.5), and then set σ_i
using Eq. (8.6).
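As a sketch of the matching conditions in Eqs. (8.6) and (8.7), not code from the text, the equivalent normal for a Gumbel input can be computed with the standard library; the location, scale, and evaluation point below are illustrative choices.

```python
import math
from statistics import NormalDist

def gumbel_cdf(x, loc, scale):
    return math.exp(-math.exp(-(x - loc) / scale))

def gumbel_pdf(x, loc, scale):
    z = (x - loc) / scale
    return math.exp(-(z + math.exp(-z))) / scale

def equivalent_normal(x, cdf, pdf):
    # Solve Eq. (8.6) for (x - mu)/sigma, then Eq. (8.7) for sigma.
    std = NormalDist()
    u = std.inv_cdf(cdf(x))            # (x - mu_i) / sigma_i
    sigma = std.pdf(u) / pdf(x)
    mu = x - sigma * u
    return mu, sigma

mu_eq, sigma_eq = equivalent_normal(
    1.3, lambda x: gumbel_cdf(x, 0.0, 1.0), lambda x: gumbel_pdf(x, 0.0, 1.0)
)
```

By construction, the normal N(mu_eq, sigma_eq) has the same CDF and PDF values as the Gumbel distribution at the chosen point.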
With these equivalent normal variables, we can infer a multivariate normal
distribution by using the correlation matrix, R, of the inputs. The vector of variables,
Y, defined by
$$Y_i = \frac{x_i - \mu_i}{\sigma_i},$$
with correlation matrix R, will be treated as a multivariate normal with zero mean,
unit variance, and known correlation.
What we want to do is find the nearest point to the failure surface Z(X) = Qfail −
Q(X) = 0 when measured in terms of Y(X). In other words, we want to minimize
$$\beta \equiv \min_{Z(\mathbf{X})=0} \sqrt{\mathbf{Y}^{\mathsf T}(\mathbf{X})\, R^{-1}\, \mathbf{Y}(\mathbf{X})}. \qquad (8.10)$$
Finding this minimum will give the nearest point to the failure surface, relative
to the nominal system performance, where distance is measured in a normalized
coordinate system. This minimum is called the most probable point of failure.
Finding this minimum will require an optimization procedure. Using Lagrange
multipliers, minimizing the function
$$g(\mathbf{X}, \lambda) = \frac{1}{2}\mathbf{Y}^{\mathsf T}(\mathbf{X})\, R^{-1}\, \mathbf{Y}(\mathbf{X}) - \lambda\left(Q_{\text{fail}} - Q(\mathbf{X})\right), \qquad (8.11)$$
will find the minimum β on the failure surface.
Using this objective function, we will find a minimum using an iteration
procedure. We start with a point, X, and its mapping to the equivalent normals,
Y(X). We then seek to find a point, X̂ and the associated Ŷ = Y(X̂) that is on
the failure surface, with a small value of β. Therefore, we take the derivative of
Eq. (8.11) with respect to Ŷ and set it to zero. After some manipulation, we get
$$\hat{\mathbf{Y}} = \lambda\, R\, \nabla_{\mathbf{Y}}^{\mathsf T} Q, \qquad (8.12)$$
with
$$\lambda = \frac{Q_{\text{fail}} - Q(\mathbf{X}) + \nabla_{\mathbf{Y}} Q\, \mathbf{Y}}{\nabla_{\mathbf{Y}} Q\, R\, \nabla_{\mathbf{Y}}^{\mathsf T} Q}, \qquad (8.14)$$
so that
$$\beta = \frac{Q_{\text{fail}} - Q(\mathbf{X}) + \nabla_{\mathbf{Y}} Q\, \mathbf{Y}}{\sqrt{\nabla_{\mathbf{Y}} Q\, R\, \nabla_{\mathbf{Y}}^{\mathsf T} Q}}. \qquad (8.15)$$
This result leads to the iteration procedure for determining the β shown in
Algorithm 8.1. Each iteration of the algorithm requires the calculation of the value
of the QoI at a point and the local derivatives at that point. Therefore, it will require
p + 1 QoI evaluations per iteration. For this reason a good initial guess for the most
probable failure point is in order, if possible.
Algorithm 8.1 Algorithm for finding β and the most probable failure point using
AFOSM
0. Begin with an initial value for the most probable failure point, X₀, and set ℓ = 0.
1. Determine σ_i and μ_i using the value of X_ℓ. Compute Y_ℓ.
2. Compute the derivatives of Q(X) at the point X_ℓ to form ∇_Y Q.
3. Evaluate λ using the formula in Eq. (8.14).
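The update implied by Eqs. (8.12) and (8.14) can be sketched on the linear QoI Q(x, y) = 2x + 0.5y from Fig. 8.1, an illustrative choice rather than the book's test problem. With independent normal inputs the equivalent-normal step is trivial, and for a linear QoI the iteration lands on the most probable failure point in one step, reproducing the FOSM value of β.

```python
import math

mu = [5.0, 3.0]          # input means
sigma = [2.0, 1.0]       # input standard deviations (R = I: independent)
q_fail = 16.5

def q_of_x(x):
    return 2.0 * x[0] + 0.5 * x[1]

x = list(mu)                                       # step 0: start at the mean
for _ in range(3):
    y = [(x[i] - mu[i]) / sigma[i] for i in range(2)]        # step 1
    grad_y = [2.0 * sigma[0], 0.5 * sigma[1]]      # step 2: dQ/dY_i = sigma_i dQ/dx_i
    gg = sum(g * g for g in grad_y)                # grad_Y Q R grad_Y^T with R = I
    lam = (q_fail - q_of_x(x)
           + sum(g * yi for g, yi in zip(grad_y, y))) / gg   # Eq. (8.14)
    y_hat = [lam * g for g in grad_y]              # Eq. (8.12) with R = I
    x = [mu[i] + sigma[i] * y_hat[i] for i in range(2)]
    beta = math.sqrt(sum(yi * yi for yi in y_hat))
```

Each pass costs the p + 1 evaluations noted above (one for Q and p for the gradient); here β converges to (16.5 − 11.5)/√16.25 ≈ 1.24, and the final point sits on the failure surface Q(X) = Q_fail.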
Fig. 8.6 Plot of the convergence of the AFOSM method to the most probable point of failure for
several starting points. The solid curve is the failure surface, below which Q(X) < Q_fail. Ellipses
of different magnitudes of Y^T(X)R^{-1}Y(X) (0.05, 0.2, 0.5, 0.791) are drawn to show that the
most probable failure point touches such an ellipse. AFOSM computes β² ≈ 0.791
where the input is a multivariate normal with mean vector (0.1, −0.05); the
covariance and correlation matrices are
$$\Sigma = \begin{pmatrix} 4 & 3.9 \\ 3.9 & 9 \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & 0.65 \\ 0.65 & 1 \end{pmatrix},$$
which implies σ₁ = 2 and σ₂ = 3. Because the distributions are normal, the values
of σ_i and μ_i are constant over the iterations. The value of Q_fail used in this example
is 100. The value of the gradient for this problem is
$$\nabla_{\mathbf{X}} Q = \left(6x_1^2 + 10x_2 + 1,\ 10x_1 + 9x_2^2 + 1\right),$$
and
$$\nabla_{\mathbf{Y}} Q = \left(12x_1^2 + 20x_2 + 2,\ 30x_1 + 27x_2^2 + 3\right).$$
The results from applying AFOSM to this problem are shown in Fig. 8.6. The
results for two different starting points are shown. In these results we can see that
during the iteration procedure, the iterations do not necessarily stay on the failure
surface if it starts there (see iterations that start at the top left). This is due to the
8.2 Advanced First-Order Second-Moment Methods 185
linear approximation of Q(X). In the results we see that the value of β² computed
by the method is indeed the value of Y^T(X)R^{-1}Y(X) at which the ellipse
touches the failure surface.
For this problem, using β = 0.889, the estimated failure probability is 1 −
Φ(β) = 0.187. A 10^6-sample Monte Carlo estimate gives 0.202 for the failure probability,
a relative difference of about 7%. Using basic FOSM, we get that the probability
of failure is functionally zero, because the estimate of β from Eq. (8.5) is 14.6.
For this problem, AFOSM is necessary to get a reasonable approximation of the
failure rate because the nominal design point X̄ = (0.1, −0.05) has a value much
smaller than Qfail . In such a problem, it is necessary to include the interactions and
nonlinearities in the QoI to get a good estimate of the behavior of the probability of
failure.
To demonstrate that AFOSM will work when the input distributions are not
normal, we change the problem to have x1 and x2 distributed by independent, i.e.,
R = I, Gumbel distributions (see Eq. (7.10) for the PDF of this distribution) with the
same mean and standard deviation as that used in the previous example. Because the
Gumbel distribution is skewed, we use the median of the distributions for the value
of μi for the equivalent normal distributions and then evaluate Eq. (8.6) to get σi .
The value of σi will change each iteration in this case, resulting in a change in the
formula for ∇Y Q each iteration. The results from applying AFOSM to this problem
are shown in Fig. 8.7. Notice that when the underlying distribution is nonnormal, the
surfaces of equal β 2 are no longer ellipses. Additionally, the mean of the inputs is no
longer located at β = 0. As before, despite different starting points, the method con-
verges to the same point, with a value of β = 1.125. For this value of β, we infer a
failure probability of 0.130, compared with 10^6 Monte Carlo samples giving 0.169.
For this problem, as before, the value of β from FOSM is much too large at 14.4.
Therefore, while the approximation of AFOSM is not perfect (it underestimated
the failure probability), it is much better than extrapolating from Q(X̄) using the
gradient.
This example demonstrates that AFOSM can be a large improvement over basic
FOSM. The improvement comes at a price in terms of more function evaluations.
While FOSM requires only p + 1 evaluations of the QoI, AFOSM requires
p + 1 function evaluations per iteration. Despite this increase AFOSM is still
relatively inexpensive relative to sampling-based methods. Nevertheless, it is still an
approximate method, and forgetting that AFOSM is based on normal distributions
is a statistical Pelagian error.1
1 The founding assumption of normal distributions could be called the “original sin” of AFOSM.
The Pelagian heresy, named for the fourth-century British theologian Pelagius, rejected the notion
of original sin, among other things. Therefore, ignoring the ramifications of the normal assumption
could be seen as forgetting about this original sin.
Fig. 8.7 Plot of the convergence of the AFOSM method to the most probable point of failure for
several starting points where the distributions of x₁ and x₂ are independent Gumbel distributions
with the same means and standard deviations as the previous example. The solid curve is the failure
surface, below which Q(X) < Q_fail. Surfaces of different magnitudes of Y^T(X)Y(X) (0.1, 0.4,
0.8, 1.267) are drawn to show that the most probable failure point touches such a surface. AFOSM
computes β² ≈ 1.267. The black circle indicates the mean of the inputs
It is possible to use an estimate of the second derivative of the QoI to improve on the
reliability methods we have discussed so far. However, estimating second derivatives
(and the cross-derivative terms) can be cost prohibitive even in problems with a
modest number of inputs (as we saw in Chap. 4). Therefore, rather than discuss
higher-order reliability methods, we will use more general approximations to the
QoI in the form of polynomial chaos expansions in the next chapter.
Many of the topics in this section are presented in the book on reliability analysis
by Haldar and Mahadevan (2000). Also, the review by Bastidas-Arteaga and
Soubra (2006) is a useful reference. The references in these two works are useful
because much of the reliability analysis literature is contained in domain-specific
publications.
8.5 Exercises
1. Repeat the example in Sect. 8.2 where the distributions of x1 and x2 are Gumbel
distributions with the same mean and standard deviation used previously. Use a
Frank Copula with θ = 0, 1, 5, 10, 20 to join the input parameter distributions.
How does the most probable failure point change with the changes in the copula?
2. Consider the Rosenbrock function: f(x, y) = (1 − x)^2 + 100(y − x^2)^2. Assume
that x = 2T − 1, where T ∼ B(3, 2), and y = 2S − 1, where S ∼ B(1.1, 2).
Estimate β and the probability that f (x, y) is less than 10 using
(a) FOSM
(b) Advanced FOSM
3. Using a discretization of your choice, solve the equation
$$\frac{\partial u}{\partial t} + v\,\frac{\partial u}{\partial x} = D\,\frac{\partial^2 u}{\partial x^2} - \omega u,$$
for u(x, t) on the spatial domain x ∈ [0, 10] with periodic boundary conditions
u(0⁻) = u(10⁺) and initial conditions
$$u(x, 0) = \begin{cases} 1 & x \in [0, 2.5] \\ 0 & \text{otherwise} \end{cases}.$$
Compute the probability that this quantity of interest is greater than 0.035 using
FOSM and AFOSM using the following distributions:
(a) μv = 0.5, σv = 0.1,
(b) μD = 0.125, σD = 0.03,
(c) μω = 0.1, σω = 0.05,
How do these results change with changes in Δx and Δt?
Chapter 9
Stochastic Projection and Collocation
single random variables, before embarking on multivariate expansions and the ideas
of sparse quadrature. A natural starting point is a QoI that is a function of a single,
standard normally distributed random variable.
The quote used at the beginning of the chapter is related to the way many
students and instructors find this subject. For the students much of the notation and
the various competing definitions for basis functions make the application of these
methods precarious. For the instructor the task of giving students adequate coverage
of the topic is difficult without spending large amounts of time defining special
functions and quadrature rules and writing out multidimensional expansions. This
chapter seeks to give adequately explained and detailed projection techniques with
fully worked examples to make the topic readily digestible and applicable to real-
world problems.
The Hermite polynomials,1 Hen (x), are a set of orthogonal polynomials that form a
basis for square-integrable functions on the real line with weight
$$w(x) = e^{-x^2/2}$$
and inner product
$$\langle g(x), h(x)\rangle = \int_{-\infty}^{\infty} g(x)\,h(x)\,e^{-x^2/2}\,dx,$$
i.e., the polynomials form an orthogonal basis for L2 (R, w(x) dx). The Hermite
polynomials are defined as
$$He_n(x) = (-1)^n\, e^{x^2/2}\, \frac{d^n}{dx^n}\, e^{-x^2/2}. \qquad (9.1)$$
1 There are two definitions of the Hermite polynomials that are scalings of each other. We use the
“probabilist” version of the functions because of similarities with the standard normal distribution
in the weighting function. The “physicist” version of the polynomials is slightly different and forms
a natural expression of the quantum harmonic oscillator.
9.1 Hermite Expansions for Normally Distributed Parameters 191
He_0(x) = 1,
He_1(x) = x,
He_2(x) = x^2 − 1,
He_3(x) = x^3 − 3x,
He_4(x) = x^4 − 6x^2 + 3,
He_5(x) = x^5 − 10x^3 + 15x.
The orthogonality relation is
$$\int_{-\infty}^{\infty} He_m(x)\, He_n(x)\, e^{-x^2/2}\, dx = \sqrt{2\pi}\, n!\, \delta_{nm}. \qquad (9.2)$$
Consider a function g(x) where x ∼ N (0, 1). The value of the function is also a
random variable that we will call G ∼ g(x). If we compute the zeroth order constant
in the Hermite expansion of this function, we get
$$c_0 = \int_{-\infty}^{\infty} \frac{g(x)}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = E[G] = \bar{g}. \qquad (9.5)$$
In other words, the constant c0 in the expansion is the mean of the random
variable G.
192 9 Stochastic Projection and Collocation
$$\operatorname{Var}(G) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left(\sum_{n=0}^{\infty} c_n He_n(x)\right)^{\!2} e^{-x^2/2}\, dx - c_0^2 = \frac{1}{\sqrt{2\pi}}\sum_{n=0}^{\infty} c_n^2\,\langle He_n(x), He_n(x)\rangle - c_0^2 = \sum_{n=1}^{\infty} n!\, c_n^2. \qquad (9.6)$$
Here we have used the orthogonality of the Hermite polynomials to get the second
equation, followed by the value of the integral in Eq. (9.2) to get the final result.
As an example, let us consider the function g(x) = cos(x). In this case we can
directly compute the expansion coefficients:
$$c_n = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \cos(x)\, He_n(x)\, e^{-x^2/2}\, dx = \begin{cases} 0 & n\ \text{odd} \\ (-1)^{n/2}\,\dfrac{e^{-1/2}}{n!} & n\ \text{even} \end{cases}. \qquad (9.7)$$
This gives the expansion
$$\cos(x) = e^{-1/2}\sum_{n\ \mathrm{even}} (-1)^{n/2}\, \frac{He_n(x)}{n!}, \qquad x \sim \mathcal{N}(0, 1). \qquad (9.8)$$
This implies that the mean of g(x) is e^{-1/2} and that the variance is
$$\operatorname{Var}(G) = e^{-1}\sum_{n\ \mathrm{even},\ n > 1} \frac{1}{n!} = e^{-1}\left(\cosh(1) - 1\right) \approx 0.19978820.$$
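The mean and variance above are easy to verify numerically; this check (not from the text) sums the series directly with the standard library.

```python
import math

# Hermite coefficients of G = cos(X), X ~ N(0,1): c_n = (-1)^{n/2} e^{-1/2}/n!
# for even n and zero for odd n.  Then Var(G) = sum_{n >= 1} n! c_n^2.
var_series = sum(
    math.factorial(n) * (math.exp(-0.5) / math.factorial(n))**2
    for n in range(2, 40, 2)
)
closed_form = math.exp(-1.0) * (math.cosh(1.0) - 1.0)   # e^{-1}(cosh(1) - 1)
```

Both agree with the quoted value 0.19978820 to the digits shown.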
We can get a baseline for comparison between the expansion and the actual
distribution of G by sampling a value for x from a standard normal and then
evaluating g(x). The resulting distribution is a Monte Carlo approximation to the
true distribution of G. We then can compare that to the values obtained by sampling
x and then evaluating the expansion in Eq. (9.8) with different orders of expansion.2
These results are shown in Fig. 9.1.
In these results we see the improvement obtained as we go to higher order
expansions. The zeroth-order expansion only gives a value of the mean, and there is
a large improvement in going to the second-order expansion. There is a noticeable
² For a function that is expensive to evaluate, we may not be able to estimate the distribution using
Monte Carlo. Nevertheless, sampling x and then evaluating a polynomial of x is basically free.
Fig. 9.1 PDF of the random variable g(x) = cos(x), where x ∼ N(0, 1), and various
approximations (exact and 0th-, 2nd-, 4th-, 6th-, and 8th-order expansions). This figure was
generated from 10^6 samples of x that were used to evaluate g(x) and the various approximations
difference between fourth- and second-order expansions, though beyond that, there
is little difference in the figure. We can track improvement in the higher-order
expansions by looking at the convergence of the variance. In Table 9.2 we show
that adding more terms to the expansion does improve the estimate of the variance,
though modestly beyond the second-order expansion. The values in this table were
computed using Mathematica.
If the random variable is normal, but not standard normal, then we need to change
the procedure a bit. We say that g(x) is a function of the random variable x ∼
N (μ, σ 2 ). In this case we will change variables to express the function as g(Z)
$$E[x] = E[\mu + \sigma Z] = \mu + \sigma E[Z] = \mu.$$
The bounds of the inner product’s integration are not affected because they are
infinite; this may not be the case when we have bounded random variables.
Going back to our example from before where g(x) = cos(x), we now say that
x ∼ N (μ = 0.5, σ 2 = 4). Evaluating the integrals for the coefficients in Eq. (9.10)
gives the following expansion, to fifth order,
$$\cos(x) \approx e^{-2}\left[1 - 2He_2(z) + \frac{2}{3}He_4(z)\right]\cos\frac{1}{2} + e^{-2}\left[-2He_1(z) + \frac{4}{3}He_3(z) - \frac{4}{15}He_5(z)\right]\sin\frac{1}{2}. \qquad (9.11)$$
The mean is
$$\bar{g} = e^{-2}\cos\frac{1}{2} \approx 0.1187678845769458,$$
Fig. 9.2 PDF of the random variable g(x) = cos(x), where x ∼ N(μ = 0.5, σ² = 4), and
various approximations (exact and 0th- through 6th-order expansions). This figure was generated
from 10^6 samples of x that were used to evaluate g(x) and the various approximations
In this example the variance also takes longer to converge. In Table 9.3, we see
that even the sixth-order expansion only has 1 digit correct.
Recall that our ultimate goal is to use polynomial expansions to provide information
about the distribution of output quantities from a computer simulation. To that end
we will need to estimate the coefficients in the Hermite expansion. If we use a
quadrature rule to estimate the integrals in these coefficients, then we would like a
quadrature rule to require as few evaluations of the integrand as possible, because
each evaluation requires running a new simulation at a different point in input space.
The most common way to approximate the required integrals is to use Gauss-
Hermite quadrature, which is a Gauss quadrature rule for computing integrals of the
form
$$\int_{-\infty}^{\infty} f(x)\, e^{-x^2}\, dx \approx \sum_{i=1}^{n} w_i\, f(x_i), \qquad (9.12)$$
where the abscissas, xi , are given by the n roots of Hen (x), and the weights are
given by
$$w_i = \frac{\sqrt{\pi}\, n!}{n^2\left[He_{n-1}\!\left(\sqrt{2}\, x_i\right)\right]^2}. \qquad (9.13)$$
Gauss quadratures are defined so that the maximum degree polynomial is exactly
integrated given a specified number of function evaluations. In particular, a Gauss
quadrature rule using n points will exactly integrate a polynomial of degree 2n − 1.
That this is possible can be seen by noting that a polynomial of degree 2n − 1 has 2n
coefficients and an n point quadrature rule has 2n degrees of freedom: n points and
n weights. To determine the quadrature points and weights, one can use a variety of
computational techniques, such as the Golub and Welsch algorithm. See Townsend
(2015) for an interesting discussion of the history of algorithms for computing the
quadrature rules.
The values of the weights and abscissas up to n = 6 are given in Table 9.4.
Note that the points are symmetric about the origin, so we only give the magnitude
of the abscissas.
This quadrature set has the standard features of Gauss quadrature. The rule will
be exact when f (x) is a polynomial of degree 2n − 1 or less.
Fig. 9.3 PDF of the random variable g(x) = cos(x), where x ∼ N(μ = 0.5, σ² = 4), using
a fifth-order Hermite expansion with various Gauss-Hermite quadrature rules (n = 2, 4, 6, 8, 10,
100) to approximate the coefficients. This figure was generated from 10^5 samples of x that were
used to evaluate g(x) and the various approximations
There is a slight issue in Gauss-Hermite quadrature in that it uses a weight
function of exp(−x²), rather than the exp(−x²/2) that we used in our inner product
definition.³ Therefore, we need to make the change of variable x → x/√2. This
makes the approximation to the inner product
$$\langle g(x), He_m(x)\rangle \approx \sqrt{2}\sum_{i=1}^{n} w_i\, g\!\left(\sqrt{2}\, x_i\right) He_m\!\left(\sqrt{2}\, x_i\right). \qquad (9.14)$$
i=1
3 We could have defined our Gaussian quadrature rule to have the same weight function as we used
in our expansion. Nevertheless, most readily accessible tabulations of Hermite quadrature use the
weighting function used herein.
Table 9.5 The convergence of the first six coefficients in the Hermite polynomial expansion
g(x) = cos(x), where x ∼ N (μ = 0.5, σ 2 = 4) as estimated by different Gauss-Hermite
quadrature rules
n c0 c1 c2 c3 c4 c5
2 −0.365203 −0.435940 −0.000000 0.145313 0.030434 −0.021797
3 0.307609 0.087730 −0.569973 −0.000000 0.142493 −0.004386
4 0.065646 −0.219271 −0.023343 0.173281 0.000000 −0.034656
5 0.130446 −0.103803 −0.322800 0.037629 0.141446 0.000000
6 0.116662 −0.135589 −0.213171 0.104748 0.048382 −0.028531
7 0.119090 −0.128702 −0.242956 0.081489 0.089843 −0.012370
8 0.118725 −0.129931 −0.236549 0.087602 0.076377 −0.018886
9 0.118773 −0.129744 −0.237688 0.086315 0.079768 −0.016907
10 0.118767 −0.129769 −0.237515 0.086541 0.079075 −0.017382
100 0.118768 −0.129766 −0.237536 0.086511 0.079179 −0.017302
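The entries of Table 9.5 can be reproduced with numpy's probabilists' Hermite module. Because `hermegauss` provides a rule for the weight e^{−z²/2} directly, the √2 change of variable of Eq. (9.14) is not needed here; the 40-point rule is an illustrative choice that is converged for this integrand.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

mu, sigma = 0.5, 2.0
z, w = He.hermegauss(40)            # nodes/weights for weight exp(-z^2/2)

def coeff(m):
    # c_m = <cos(mu + sigma*z), He_m(z)> / (sqrt(2*pi) * m!)
    he_m = He.hermeval(z, [0.0] * m + [1.0])     # He_m at the nodes
    integral = np.sum(w * np.cos(mu + sigma * z) * he_m)
    return integral / (math.sqrt(2.0 * math.pi) * math.factorial(m))

c = [coeff(m) for m in range(6)]
```

The values agree with the converged n = 100 row of Table 9.5: c_0 ≈ 0.118768, c_1 ≈ −0.129766, and so on.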
When the input parameter is not normally distributed, we need a different polyno-
mial expansion to approximate the mapping from input parameter to output random
variable. We will cover three such cases, as enumerated in Table 9.1. First, we tackle
uniform random variables.
Consider a random variable x that is uniformly distributed in the range [a, b]. In this
case we write x ∼ U [a, b].4 Additionally, the PDF of x is
$$f(x \mid a, b) = \begin{cases} \dfrac{1}{b-a} & x \in [a, b] \\[4pt] 0 & \text{otherwise} \end{cases}. \qquad (9.15)$$
The mean of a uniform distribution is (a + b)/2, and the variance is (b − a)2 /12.
As with normal random variables, it is useful to convert general uniform random
variables to a standardized random variable.5 In this case, we map the interval [a, b]
to [−1, 1] to correspond with the support with the standard definition of Legendre
polynomials. In particular, if Z ∼ U [−1, 1], then
$$x = \frac{b-a}{2}\, z + \frac{a+b}{2}, \qquad (9.16)$$
Table 9.6 The Legendre polynomials P_n(x) for n = 3, . . . , 10

n    P_n(x)
3    (1/2)(5x^3 − 3x)
4    (1/8)(35x^4 − 30x^2 + 3)
5    (1/8)(63x^5 − 70x^3 + 15x)
6    (1/16)(231x^6 − 315x^4 + 105x^2 − 5)
7    (1/16)(429x^7 − 693x^5 + 315x^3 − 35x)
8    (1/128)(6435x^8 − 12012x^6 + 6930x^4 − 1260x^2 + 35)
9    (1/128)(12155x^9 − 25740x^7 + 18018x^5 − 4620x^3 + 315x)
10   (1/256)(46189x^10 − 109395x^8 + 90090x^6 − 30030x^4 + 3465x^2 − 63)
and
$$z = \frac{a + b - 2x}{a - b}. \qquad (9.17)$$
$$E[g(x)] = \frac{1}{b-a}\int_a^b g(x)\, dx = \frac{1}{2}\int_{-1}^{1} g\!\left(\frac{b-a}{2}z + \frac{a+b}{2}\right) dz. \qquad (9.18)$$
For a function on the range [−1, 1], the Legendre polynomials form an orthogo-
nal basis. The Legendre polynomials are defined as
$$P_n(x) = \frac{1}{2^n n!}\, \frac{d^n}{dx^n}\left(x^2 - 1\right)^n. \qquad (9.19)$$
The first ten of these polynomials are given in Table 9.6.
The orthogonality relation for Legendre polynomials is written as
$$\int_{-1}^{1} P_n(x)\, P_{n'}(x)\, dx = \frac{2}{2n+1}\, \delta_{nn'}. \qquad (9.20)$$
A function g(x) on [a, b] can then be expanded as
$$g(x) = \sum_{n=0}^{\infty} c_n\, P_n\!\left(\frac{a+b-2x}{a-b}\right), \qquad x \in [a, b], \qquad (9.21)$$
where cn is defined by
$$c_n = \frac{2n+1}{2}\int_{-1}^{1} g\!\left(\frac{b-a}{2}z + \frac{a+b}{2}\right) P_n(z)\, dz. \qquad (9.22)$$
As before, the zeroth coefficient is the mean of G:
$$c_0 = \frac{1}{2}\int_{-1}^{1} g\!\left(\frac{b-a}{2}z + \frac{a+b}{2}\right) dz = \frac{1}{b-a}\int_a^b g(x)\, dx = E[G]. \qquad (9.23)$$
Additionally, the variance of G is equivalent to a weighted sum of the squares of the
coefficients with n ≥ 1:
$$\operatorname{Var}(G) = \frac{1}{2}\int_{-1}^{1}\left(\sum_{n=0}^{\infty} c_n P_n(z)\right)^{\!2} dz - c_0^2 = \sum_{n=1}^{\infty} \frac{c_n^2}{2n+1}. \qquad (9.24)$$
As an example, take g(x) = cos(x) with x ∼ U(0, 2π), so that x = πz + π. The coefficients are

$$c_n = \frac{2n+1}{2}\int_{-1}^{1} \cos(\pi z + \pi)\, P_n(z)\, dz = -\frac{2n+1}{2}\int_{-1}^{1} \cos(\pi z)\, P_n(z)\, dz. \tag{9.25}$$
This makes the expansion, through sixth order,

$$\cos(x) \approx \frac{15}{\pi^2}\, P_2(z) + \frac{45\left(4\pi^2 - 42\right)}{2\pi^4}\, P_4(z) + \frac{273\left(7920 - 960\pi^2 + 16\pi^4\right)}{16\pi^6}\, P_6(z), \quad x \sim U(0, 2\pi), \tag{9.26}$$
9.2 Generalized Polynomial Chaos 201
Fig. 9.4 PDF of the random variable g(x) = cos(x), where x ∼ U(0, 2π), and various approximations (exact and 2nd-, 4th-, 6th-, and 8th-order expansions). This figure was generated from 10⁶ samples of x that were used to evaluate g(x) and the various approximations
and z is related to x via Eq. (9.17). The variance of this function is given by

$$\operatorname{Var}(G) = \frac{1}{2\pi}\int_0^{2\pi} \cos^2(x)\, dx = \frac{1}{2}. \tag{9.27}$$
The coefficients can be computed numerically with Gauss-Legendre quadrature, which approximates an integral over [−1, 1] as

$$\int_{-1}^{1} f(z)\, dz \approx \sum_{i=1}^{n} w_i f(z_i), \tag{9.28}$$

where the z_i are the roots of P_n, and the weights are given by

$$w_i = \frac{2}{\left(1 - z_i^2\right)\left[P_n'(z_i)\right]^2}. \tag{9.29}$$
Fig. 9.5 PDF of the random variable g(x) = cos(x), where x ∼ U(0, 2π), using a fifth-order Legendre expansion with various Gauss-Legendre quadrature rules (n = 2, 4, 6, 8, 10, 100) to approximate the coefficients. This figure was generated from 10⁶ samples of x that were used to evaluate g(x) and the various approximations
Table 9.9 The convergence of the first six coefficients in the Legendre polynomial expansion of g(x) = cos(x), where x ∼ U(0, 2π), as estimated by Gauss-Legendre quadrature rules using different values of n
n c0 c1 c2 c3 c4 c5
2 0.240619 0.000000 0.000000 0.000000 −0.842165 0.000000
3 −0.022454 0.000000 1.955092 0.000000 −2.639374 0.000000
4 0.001068 0.000000 1.478399 0.000000 −0.000000 0.000000
5 −0.000031 0.000000 1.521801 0.000000 −0.637516 0.000000
6 0.000001 0.000000 1.519760 0.000000 −0.579819 0.000000
7 0.000000 0.000000 1.519819 0.000000 −0.582523 0.000000
8 0.000000 0.000000 1.519818 0.000000 −0.582445 0.000000
9 0.000000 0.000000 1.519818 0.000000 −0.582447 0.000000
10 0.000000 0.000000 1.519818 0.000000 −0.582447 0.000000
100 0.000000 0.000000 1.519818 0.000000 −0.582447 0.000000
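The quadrature estimates in Table 9.9 are straightforward to reproduce. The following sketch uses NumPy (rather than the Mathematica used elsewhere in this chapter) and assumes only Eqs. (9.22) and (9.28); the function name is ours.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

def legendre_pce_coeffs(g, order, nquad):
    """Estimate c_n = (2n+1)/2 * int_{-1}^{1} g(z) P_n(z) dz, Eq. (9.22),
    with an nquad-point Gauss-Legendre rule, Eq. (9.28)."""
    z, w = leggauss(nquad)   # abscissas (roots of P_nquad) and weights
    gz = g(z)
    return np.array([(2 * n + 1) / 2 * np.sum(w * gz * Legendre.basis(n)(z))
                     for n in range(order + 1)])

# g(x) = cos(x) with x ~ U(0, 2*pi) becomes cos(pi*z + pi) on z in [-1, 1]
c = legendre_pce_coeffs(lambda z: np.cos(np.pi * z + np.pi), order=5, nquad=10)
# c[2] converges to 15/pi^2 = 1.519818 and c[4] to -0.582447, as in Table 9.9
```

The variance estimate from Eq. (9.24), `sum(c[n]**2 / (2*n + 1) for n in range(1, 6))`, is then close to the exact value 1/2 from Eq. (9.27).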
A random variable that takes on a value in the range [−1, 1] can often be described by a beta distribution.⁶ A random variable Z that is beta-distributed is written as

⁶ The definition of the beta distribution used here is not the typical statistician's distribution. That distribution has support on [0, 1] and uses parameters α′ and β′ that are equal to α′ = α + 1,
204 9 Stochastic Projection and Collocation
Z ∼ B(α, β), where α > −1 and β > −1 are parameters. The PDF for Z is given by

$$f(z) = 2^{-(\alpha+\beta+1)}\, \frac{(\alpha+\beta+1)\,\Gamma(\alpha+\beta+1)}{\Gamma(\alpha+1)\,\Gamma(\beta+1)}\,(1+z)^\beta (1-z)^\alpha, \quad z \in [-1,1]. \tag{9.30}$$
The reason that this is sometimes called a beta distribution is that the PDF can be expressed in terms of the beta function,

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}, \tag{9.31}$$

as

$$f(z) = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha+1, \beta+1)}\,(1+z)^\beta (1-z)^\alpha, \quad z \in [-1,1]. \tag{9.32}$$
There is some subtlety regarding the support of z. If α or β is less than 0, then
one or both of the endpoints is excluded due to a singularity. The PDF for various
values of α and β is shown in Fig. 9.6.
As before, we can scale the distribution to a general range x ∈ [a, b] using Eqs. (9.16) and (9.17). The expectation operator in this case is given by

$$\mathrm{E}[g(x)] = \int_{-1}^{1} g\!\left(\frac{b-a}{2}z + \frac{a+b}{2}\right) \frac{2^{-(\alpha+\beta+1)}\,(1+z)^\beta (1-z)^\alpha}{B(\alpha+1, \beta+1)}\, dz. \tag{9.33}$$
From this we get the following for a beta distribution on the range [a, b]:
The Jacobi polynomials, $P_n^{(\alpha,\beta)}(z)$, are orthogonal polynomials under the weight (1 − z)^α (1 + z)^β for the interval z ∈ [−1, 1]. These polynomials can be defined in several ways, including the Rodrigues-type formula:

$$P_n^{(\alpha,\beta)}(z) = \frac{(-1)^n}{2^n n!}\, (1-z)^{-\alpha} (1+z)^{-\beta}\, \frac{d^n}{dz^n}\left[(1-z)^{\alpha} (1+z)^{\beta} \left(1 - z^2\right)^n\right]. \tag{9.35}$$
The general form of these polynomials is given up to order 3 in Table 9.10. Note
that when α = β = 0, these polynomials are the Legendre polynomials.
and β′ = β + 1. As we will see, the definition in Eq. (9.30) is well-suited to expansion in Jacobi polynomials.
Fig. 9.6 PDF of Z ∼ B(α, β) for several values of α and β (α = β = −0.5; α = 4, β = 0; α = 0, β = 4; α = β = 1; α = 1, β = 4). Note that when α = β the distribution is symmetric about z = 0, and swapping α and β creates mirror images
Table 9.10 Jacobi polynomials P_n^{(α,β)}(z)

n    P_n^{(α,β)}(z)
2    (1/2)(α+1)(α+2) + (1/2)(z−1)(α+2)(α+β+3) + (1/8)(z−1)^2 (α+β+3)(α+β+4)
3    (1/6)(α+1)(α+2)(α+3) + (1/4)(z−1)(α+2)(α+3)(α+β+4) + (1/8)(z−1)^2 (α+3)(α+β+4)(α+β+5) + (1/48)(z−1)^3 (α+β+4)(α+β+5)(α+β+6)
where

$$\langle g(z), h(z)\rangle = \int_{-1}^{1} (1-z)^\alpha (1+z)^\beta\, g(z)\, h(z)\, dz. \tag{9.37}$$
Note that if n = 0, then we can use the identity Γ(z + 1) = zΓ(z) to get the normalization constant used in the PDF for the beta distribution:

$$\frac{2^{\alpha+\beta+1}\,\Gamma(\alpha+1)\,\Gamma(\beta+1)}{(\alpha+\beta+1)\,\Gamma(\alpha+\beta+1)} = 2^{\alpha+\beta+1}\, B(\alpha+1, \beta+1).$$
A function that is square-integrable with respect to the inner product in Eq. (9.37) can be written as

$$g(x) = \sum_{n=0}^{\infty} c_n P_n^{(\alpha,\beta)}\!\left(\frac{a+b-2x}{a-b}\right), \quad x \in [a,b], \tag{9.38}$$

where

$$c_n = \left\langle P_n^{(\alpha,\beta)}(z), P_n^{(\alpha,\beta)}(z)\right\rangle^{-1} \int_{-1}^{1} g\!\left(\frac{b-a}{2}z + \frac{a+b}{2}\right) P_n^{(\alpha,\beta)}(z)\,(1-z)^\alpha (1+z)^\beta\, dz. \tag{9.39}$$
The n = 0 coefficient is again the mean:

$$c_0 = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha+1, \beta+1)} \int_{-1}^{1} g\!\left(\frac{b-a}{2}z + \frac{a+b}{2}\right) (1-z)^\alpha (1+z)^\beta\, dz = \mathrm{E}[g(x)]. \tag{9.40}$$
Also, by construction the variance of g(x) is a weighted sum of the squares of the c_n for n > 0:

$$\operatorname{Var}(G) = \mathrm{E}[g^2(x)] - \left(\mathrm{E}[g(x)]\right)^2 = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha+1, \beta+1)} \sum_{n=1}^{\infty} c_n^2 \left\langle P_n^{(\alpha,\beta)}(z), P_n^{(\alpha,\beta)}(z)\right\rangle. \tag{9.41}$$
For our example, g(x) = cos(x) with x = πz + π and Z ∼ B(4, 1), the coefficients are

$$c_n = \left\langle P_n^{(4,1)}(z), P_n^{(4,1)}(z)\right\rangle^{-1} \int_{-1}^{1} \cos(\pi z + \pi)\, P_n^{(4,1)}(z)\,(1-z)^4 (1+z)\, dz. \tag{9.42}$$
Fig. 9.7 Density plot of 10⁶ samples of x = πz + π where Z ∼ B(4, 1). These samples were used to generate the results in Fig. 9.8
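Samples like those in Fig. 9.7 can be drawn with NumPy's beta sampler, keeping footnote 6 in mind: NumPy uses the statistician's convention on [0, 1], so the exponents shift by one and the samples must be rescaled to [−1, 1]. A sketch with our own variable names:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, bta = 4.0, 1.0                    # the example Z ~ B(4, 1)

# NumPy's Beta(a, b) has density proportional to u^(a-1) (1-u)^(b-1) on
# [0, 1]; the book's density proportional to (1+z)^beta (1-z)^alpha on
# [-1, 1] maps to u = (1+z)/2 with a = beta + 1 and b = alpha + 1
u = rng.beta(bta + 1.0, alpha + 1.0, size=10**6)
z = 2.0 * u - 1.0                        # standardized beta samples
x = np.pi * z + np.pi                    # the samples shown in Fig. 9.7
```

As a sanity check on the mapping, the sample mean of z is near (β − α)/(α + β + 2) = −3/7, and the sample mean of cos(x) is near c₀ ≈ −0.0669551 from Eq. (9.43).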
There is not a tidy formula for the coefficients, but we can calculate them (with the help of Mathematica). The mean value of G ∼ cos(x) is

$$c_0 = -\frac{15\left(\pi^2 - 9\right)}{2\pi^4} \approx -0.0669551. \tag{9.43}$$
The expansion, through third order, is

$$\cos(x) \approx -\frac{15\left(\pi^2-9\right)}{2\pi^4} + \frac{6\left(315 - 60\pi^2 + 2\pi^4\right)}{\pi^6}\, P_1^{(4,1)}(z) - \frac{35\left(630 - 75\pi^2 + \pi^4\right)}{2\pi^6}\, P_2^{(4,1)}(z) + \frac{12\left(-51975 + 8190\pi^2 - 315\pi^4 + 2\pi^6\right)}{\pi^8}\, P_3^{(4,1)}(z), \quad Z \sim B(4,1). \tag{9.44}$$
The variance is

$$\operatorname{Var}(G) = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha+1,\beta+1)} \int_{-1}^{1} \cos^2(\pi z + \pi)\,(1-z)^\alpha (1+z)^\beta\, dz - \left(\frac{15\left(\pi^2-9\right)}{2\pi^4}\right)^{2}$$
$$= \frac{1}{64}\left(32 + \frac{135}{\pi^4} - \frac{60}{\pi^2}\right) - \frac{225\left(\pi^2-9\right)^2}{4\pi^8} \approx 0.4221832. \tag{9.45}$$
The convergence of the variance estimate is given in Table 9.12. Notice that at
fourth-order, the estimate is correct to three digits.
The convergence of the approximation to G as a function of the order of the
Jacobi expansion is shown in Fig. 9.8. The “exact” distribution is determined by
evaluating g(x) at the 106 points shown in Fig. 9.7. By the fourth-order expansion,
the overall character of the true distribution is captured. The eighth-order expansion
is indistinguishable from the exact distribution.
Fig. 9.8 PDF of the random variable g(x) = cos(x), where x = πz + π and Z ∼ B(4, 1), and various approximations (exact and 2nd-, 4th-, 6th-, and 8th-order expansions). This figure was generated from 10⁶ samples of x that were used to evaluate g(x) and the various approximations
The coefficients can be estimated with Gauss-Jacobi quadrature, which approximates integrals against the Jacobi weight as

$$\int_{-1}^{1} f(z)\,(1-z)^\alpha (1+z)^\beta\, dz \approx \sum_{i=1}^{n} w_i f(z_i). \tag{9.46}$$

The abscissas, z_i, for the quadrature rule are the n roots of $P_n^{(\alpha,\beta)}(z)$, and the weights are given by

$$w_i = \frac{2n+\alpha+\beta+2}{n+\alpha+\beta+1}\, \frac{\Gamma(n+\alpha+1)\,\Gamma(n+\beta+1)}{\Gamma(n+\alpha+\beta+1)\,(n+1)!}\, \frac{2^{\alpha+\beta}}{P_n^{(\alpha,\beta)\prime}(z_i)\, P_{n+1}^{(\alpha,\beta)}(z_i)}. \tag{9.47}$$
Beyond n = 1 the formulas for the weights and abscissas will not fit on a page, so
they do not appear here.
For our example from above, where Z ∼ B(4, 1), the quadrature rules are given
in Table 9.13. Notice that unlike Gauss-Legendre quadrature rules, these rules are
Table 9.13 Gauss-Jacobi quadrature rules for Z ∼ B(4, 1): abscissas z_i and weights w_i

n    z_i          w_i
2    −2/3         48/35
3    0.273378     0.213558
     −0.313373    1.121472
     −0.778187    0.798303
4    0.451910     0.062182
     −0.037021    0.545298
     −0.497091    1.049649
     −0.840875    0.476204
5    0.573288     0.019805
     0.169240     0.233970
     −0.247188    0.732908
     −0.615377    0.850154
     −0.879964    0.296496
not symmetric about the origin. Moreover, the weights sum to the integral of the weight function over the domain:

$$\sum_{i=1}^{n} w_i = \int_{-1}^{1} (1-z)^4 (1+z)\, dz = \frac{32}{15}. \tag{9.49}$$
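SciPy provides the abscissas and weights of Eqs. (9.46) and (9.47) directly via `scipy.special.roots_jacobi`. The sketch below (our naming) checks the weight sum of Eq. (9.49) and recovers the mean of Eq. (9.43):

```python
import numpy as np
from scipy.special import roots_jacobi, beta as beta_fn

alpha, bta = 4.0, 1.0
z, w = roots_jacobi(10, alpha, bta)   # weight function (1-z)^alpha (1+z)^beta

# c_0 = E[g(x)] from Eq. (9.40), with g = cos and x = pi*z + pi
norm = 2.0 ** (-(alpha + bta + 1.0)) / beta_fn(alpha + 1.0, bta + 1.0)
c0 = norm * np.sum(w * np.cos(np.pi * z + np.pi))
# w.sum() equals 32/15 (Eq. 9.49) and c0 approaches -0.0669551 (Eq. 9.43)
```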
As the final class of random variable, we will consider gamma random variables. These random variables have support on (0, ∞), and if x is a gamma-distributed random variable, we will write x ∼ G(α, β), where the PDF of the random variable is⁷

⁷ There are several definitions of gamma random variables. One common definition has a different parameter α′ = α + 1, but the same parameter β.
Fig. 9.9 PDF of the random variable g(x) = cos(x), where x = πz + π and Z ∼ B(4, 1), using a sixth-order Jacobi expansion with various Gauss-Jacobi quadrature rules (n = 2, 4, 6, 8, 10, 100) to approximate the coefficients. This figure was generated from 10⁶ samples of x that were used to evaluate g(x) and the various approximations
Table 9.14 The convergence of the first seven coefficients in the Jacobi polynomial expansion of g(x) = cos(x), where x = πz + π and Z ∼ B(4, 1), as estimated by Gauss-Jacobi quadrature rules using different values of n
n c0 c1 c2 c3 c4 c5 c6
2 −0.035714 −0.642857 0.000000 0.589286 −0.157292 −0.259369 −0.055473
3 −0.069292 −0.503277 0.282089 0.000000 −0.280037 0.478186 −0.131973
4 −0.066861 −0.514456 0.229440 0.132105 −0.000000 −0.135492 −0.210799
5 −0.066957 −0.513982 0.233355 0.120895 −0.058189 0.000000 0.060564
6 −0.066955 −0.513994 0.233197 0.121391 −0.053616 −0.011632 −0.000000
7 −0.066955 −0.513994 0.233201 0.121378 −0.053807 −0.011110 0.004949
8 −0.066955 −0.513994 0.233201 0.121378 −0.053802 −0.011124 0.004737
9 −0.066955 −0.513994 0.233201 0.121378 −0.053802 −0.011124 0.004742
10 −0.066955 −0.513994 0.233201 0.121378 −0.053802 −0.011124 0.004742
100 −0.066955 −0.513994 0.233201 0.121378 −0.053802 −0.011124 0.004742
$$f(x) = \frac{\beta^{\alpha+1}\, x^\alpha e^{-\beta x}}{\Gamma(\alpha+1)}, \quad x \in (0, \infty),\ \alpha > -1,\ \beta > 0. \tag{9.50}$$
The distribution gets its name from the appearance of the gamma function in the PDF. As with the other random variables, it will be useful to have a standardized gamma random variable. In this case we define Z ∼ G(α, 1), so that Z has the PDF
Fig. 9.10 PDF of x ∼ G(α, β) for several values of α and β (α = 0, β = 0.5; α = 1, β = 2.5; α = 1, β = 1; α = 1, β = 0.5; α = 6.5, β = 1). Note that adjusting α moves the peak of the distribution and β scales the distribution along x
$$f(z) = \frac{z^\alpha e^{-z}}{\Gamma(\alpha+1)}, \quad z \in (0, \infty),\ \alpha > -1, \tag{9.51}$$

where the standardized variable is

$$z = \beta x. \tag{9.52}$$
The PDF for a gamma random variable with several different values for the α
and β parameters is shown in Fig. 9.10. Here we see that α moves the peak of the
distribution and that β, as we mentioned above, scales the distribution.
The expectation operator for a gamma random variable can be written as

$$\mathrm{E}[g(x)] = \int_0^\infty g(x)\, \frac{\beta^{\alpha+1} x^\alpha e^{-\beta x}}{\Gamma(\alpha+1)}\, dx = \int_0^\infty g\!\left(\frac{z}{\beta}\right) \frac{z^\alpha e^{-z}}{\Gamma(\alpha+1)}\, dz. \tag{9.53}$$

The mean and variance of x are

$$\bar{x} = \frac{\alpha+1}{\beta}, \qquad \operatorname{Var}(x) = \frac{\alpha+1}{\beta^2}. \tag{9.54}$$
The generalized Laguerre polynomials, $L_n^{(\alpha)}(x)$, are defined by the Rodrigues-type formula

$$L_n^{(\alpha)}(x) = \frac{x^{-\alpha} e^x}{n!}\, \frac{d^n}{dx^n}\left(e^{-x} x^{n+\alpha}\right). \tag{9.55}$$
Some low-order generalized Laguerre polynomials are given in Table 9.15. The generalized Laguerre polynomials have the following orthogonality condition:

$$\int_0^\infty x^\alpha e^{-x} L_n^{(\alpha)}(x)\, L_m^{(\alpha)}(x)\, dx = \frac{\Gamma(n+\alpha+1)}{n!}\, \delta_{n,m}. \tag{9.56}$$
The generalized Laguerre polynomials form a basis for functions on (0, ∞) that are square-integrable with the inner product

$$\langle g(z), h(z)\rangle = \int_0^\infty z^\alpha e^{-z}\, g(z)\, h(z)\, dz. \tag{9.57}$$
Therefore, we can write a function g(x), where x ∼ G(α, β), using the following expansion:

$$g(x) = \sum_{n=0}^{\infty} c_n L_n^{(\alpha)}(\beta x), \tag{9.58}$$
The value of c_0 is once again the mean of G ∼ g(x) where x ∼ G(α, β):

$$c_0 = \int_0^\infty g\!\left(\frac{z}{\beta}\right) \frac{z^\alpha e^{-z}}{\Gamma(\alpha+1)}\, dz = \mathrm{E}[g(x)]. \tag{9.60}$$
The variance of G is related to the sum of the squares of the expansion coefficients:

$$\operatorname{Var}(G) = \int_0^\infty \left(\sum_{n=0}^{\infty} c_n L_n^{(\alpha)}(z)\right)^{2} \frac{z^\alpha e^{-z}}{\Gamma(\alpha+1)}\, dz - c_0^2 = \sum_{n=1}^{\infty} c_n^2\, \frac{\Gamma(n+\alpha+1)}{\Gamma(\alpha+1)\, n!}. \tag{9.61}$$
For our example of g(x) = cos(x) with x ∼ G(1, 2), the expansion through third order is

$$\cos(x) \approx \frac{12}{25} + \frac{44}{125}\,(2-2x) + \frac{28}{625}\left(2x^2 - 6x + 3\right) + \frac{656}{9375}\left(x^3 - 6x^2 + 9x - 3\right), \quad x \sim \mathcal{G}(1,2). \tag{9.64}$$
Fig. 9.11 PDF of the random variable g(x) = cos(x), where x ∼ G(1, 2), and various approximations (exact and 2nd-, 4th-, 6th-, and 8th-order expansions). This figure was generated from 10⁶ samples of x that were used to evaluate g(x) and the various approximations
The abscissas, z_i, for the quadrature rule are the n roots of $L_n^{(\alpha)}(z)$, and the weights are given by

$$w_i = \frac{\Gamma(n+\alpha)\, z_i}{n!\,(n+\alpha)\left[L_{n-1}^{(\alpha)}(z_i)\right]^2}. \tag{9.67}$$
For n = 1 the quadrature rule is

$$z_1 = 1 + \alpha, \qquad w_1 = \frac{(\alpha+1)\,\Gamma(\alpha+1)}{\alpha+1} = \Gamma(\alpha+1). \tag{9.68}$$

For n = 2 we have

$$z_{1,2} = \alpha + 2 \pm \sqrt{\alpha+2}, \qquad w_{1,2} = \frac{\Gamma(\alpha+2)\left(\alpha+2 \pm \sqrt{\alpha+2}\right)}{2\,(\alpha+2)\left(1 \pm \sqrt{\alpha+2}\right)^2}. \tag{9.69}$$

Beyond second order the quadratures are too lengthy to write for a general value of α. Note that if α = 0, then the quadrature rule reduces to simple Gauss-Laguerre quadrature.
To use the generalized Gauss-Laguerre quadratures to compute the inner products for the Laguerre expansion of a gamma-distributed random variable, x ∼ G(α, β), we write

$$\int_0^\infty f\!\left(\frac{z}{\beta}\right) z^\alpha e^{-z}\, dz \approx \sum_{i=1}^{n} w_i\, f\!\left(\frac{z_i}{\beta}\right). \tag{9.70}$$

For our example from above, where x ∼ G(1, 2), the quadrature rules are given in Table 9.17. In this case the weights sum to the integral of the weight function over the domain:

$$\sum_{i=1}^{n} w_i = \int_0^\infty z e^{-z}\, dz = \Gamma(2) = 1. \tag{9.71}$$
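The same check works for the gamma case with `scipy.special.roots_genlaguerre`, which returns the nodes and weights for the weight z^α e^(−z) used in Eq. (9.70); this is a sketch with our own names:

```python
import numpy as np
from scipy.special import roots_genlaguerre, gamma

alpha, bta = 1.0, 2.0                 # the example x ~ G(1, 2)
z, w = roots_genlaguerre(10, alpha)   # weight function z^alpha e^{-z}

# Eq. (9.70) with f = cos, normalized by Gamma(alpha+1) as in Eq. (9.60)
c0 = np.sum(w * np.cos(z / bta)) / gamma(alpha + 1.0)
# the weights sum to Gamma(alpha+1) = 1 here, and c0 approaches
# 0.480000 = 12/25, the leading coefficient in Eq. (9.64) and Table 9.18
```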
Fig. 9.12 PDF of the random variable g(x) = cos(x), where x ∼ G(1, 2), using a sixth-order Laguerre expansion with various generalized Gauss-Laguerre quadrature rules (n = 2, 4, 6, 8, 10, 100) to approximate the coefficients. This figure was generated from 10⁶ samples of x that were used to evaluate g(x) and the various approximations
We can use our previous example of g(x) = cos(x), where x ∼ G(1, 2), as a test of estimating the coefficients using generalized Gauss-Laguerre quadrature rules. In Fig. 9.12, the distribution, as approximated by a fifth-order Laguerre expansion, is computed using generalized Gauss-Laguerre quadratures with different values of n. At about n = 8 the approximation to the distribution is fairly accurate. We can see the convergence in the coefficients with the number of quadrature points in Table 9.18. This table bears out the observation that n = 8 is an adequate level of approximation.
The examples we have seen so far have involved functions that are simple to evaluate. In such examples, there is no benefit to minimizing the number of function evaluations. For a more realistic example where the function evaluations are expensive, but not too expensive, we will look at a quantity related to the solution of the 2-D Poisson's equation with Dirichlet boundary conditions:

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\right) u(x,y) = -q(x,y). \tag{9.72}$$
Table 9.18 The convergence of the first six coefficients in the generalized Laguerre polynomial expansion of g(x) = cos(x), where x ∼ G(1, 2), as estimated by generalized Gauss-Laguerre quadrature rules using different values of n
n c0 c1 c2 c3 c4 c5
2 0.484528 0.438701 0.000000 −0.219350 −0.223933 −0.140776
3 0.478523 0.343285 0.077209 −0.000000 −0.046325 −0.099540
4 0.480185 0.352313 0.038293 −0.054229 −0.000000 0.036153
5 0.479984 0.352043 0.045559 −0.053931 −0.036908 −0.000000
6 0.480001 0.351990 0.044746 −0.052110 −0.029267 −0.004078
7 0.480000 0.352001 0.044801 −0.052532 −0.029939 −0.000867
8 0.480000 0.352000 0.044800 −0.052475 −0.029968 −0.001564
9 0.480000 0.352000 0.044800 −0.052480 −0.029949 −0.001480
10 0.480000 0.352000 0.044800 −0.052480 −0.029952 −0.001484
100 0.480000 0.352000 0.044800 −0.052480 −0.029952 −0.001485
The center of the normal in the y-coördinate will be a uniform random variable in the range [−0.25, 0.25] (i.e., ω ∼ U(−0.25, 0.25)). We are interested in the integral over a quarter of the domain. Our quantity of interest is therefore

$$g(\omega) = \int_0^1 dx \int_0^1 dy\ u(x, y; \omega). \tag{9.75}$$

The notation u(x, y; ω) denotes that the solution depends on the center of the Gaussian, ω (Table 9.18).
Because ω is a uniform random variable, we will use a Legendre expansion to compute an approximation to G ∼ g(ω). From Eq. (9.22), we are interested in computing the integral

$$c_n = \frac{2n+1}{2} \int_{-1}^{1} g\!\left(\frac{z}{4}\right) P_n(z)\, dz. \tag{9.76}$$
Table 9.19 The convergence of the first six coefficients in the 2-D Poisson’s equation example as
a function of the number of Gauss-Legendre quadrature points used
n c0 c1 c2 c3 c4 c5
1 0.386712 0.000000 −0.966780 0.000000 1.305153 0.000000
2 0.381378 0.000000 −0.000000 −0.000000 −1.334823 −0.000000
3 0.381406 −0.000000 −0.010613 −0.000000 0.014327 0.000000
4 0.381406 −0.000000 −0.010559 0.000000 −0.000000 0.000000
5 0.381406 0.000000 −0.010559 0.000000 0.000071 −0.000000
6 0.381406 −0.000000 −0.010559 0.000000 0.000071 −0.000000
7 0.381406 −0.000000 −0.010559 0.000000 0.000071 −0.000000
8 0.381409 0.000000 −0.010567 −0.000000 0.000079 0.000000
9 0.381406 0.000000 −0.010559 −0.000000 0.000071 −0.000000
10 0.381406 0.000000 −0.010559 −0.000000 0.000071 −0.000000
Note that computing the c_n in this case requires solving Poisson's equation twice, each time with a different source, and computing the integral in Eq. (9.75). There are, at least, a countably infinite number of ways to estimate the solution to Poisson's equation. Here we will use Mathematica's NDSolve function. Solving Poisson's equation with these two values of ω gives

$$g\!\left(-\frac{1}{4\sqrt{3}}\right) = 0.381378, \qquad g\!\left(\frac{1}{4\sqrt{3}}\right) = 0.381378.$$
In Table 9.19, estimates for the expansion coefficients up to c₅ are shown. Note that in the best case, we could only hope for a quadrature rule with n points to integrate up to c_{2n−1} accurately, and this would only be the case if g were a constant function. From this table it seems that the integrals are accurate (though not exact) up to c_n for an n-point quadrature rule once n > 2.
Using the results from Table 9.19, we can create an empirical PDF of G
for different quadrature rules. For a given polynomial expansion, generating 106
samples requires only evaluating that many polynomials. In Fig. 9.13, these PDFs
are shown for quadrature rules using 2, 4, 6, and 10 points as well as the PDF from
3000 Monte Carlo samples of G by randomly selecting ω’s. Note that with only six
function evaluations using the n = 6 quadrature rule, we get a better representation
of G than thousands of Monte Carlo samples and a savings of about 0.75 h on my
laptop.
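The cheapness of sampling the surrogate is easy to see in code: with the converged coefficients copied from Table 9.19, a million samples of G require only a million Legendre-series evaluations (a NumPy sketch, names ours):

```python
import numpy as np
from numpy.polynomial.legendre import legval

# Converged expansion coefficients c_0..c_5 from Table 9.19 (n = 10 row)
c = np.array([0.381406, 0.0, -0.010559, 0.0, 0.000071, 0.0])

rng = np.random.default_rng(42)
z = rng.uniform(-1.0, 1.0, size=10**6)   # standardized samples of omega
g_samples = legval(z, c)                  # surrogate evaluations of g(omega)
# a histogram of g_samples reproduces the expansion curves in Fig. 9.13
```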
Fig. 9.13 PDF of the random variable g(ω) = ∫₀¹ dx ∫₀¹ dy u(x, y; ω), where ω ∼ U(−0.25, 0.25) and u is the solution to Eq. (9.72), using several different Gauss-Legendre quadrature rules (n = 2, 4, 6, 10) and a Monte Carlo simulation using 3 × 10³ numerical solutions of Poisson's equation
Now that we have discussed how to express a QoI as a projection onto a linear combination of orthogonal polynomials, it is a good time to point out some of the warts of this particular method. The projection method uses a single expansion to represent the QoI; such an expansion is called a global expansion. Whatever values the random variable takes, the expansion coefficients do not change. Global expansions work well and have rapid convergence if the underlying function is smooth. However, if the function being projected onto polynomials is not smooth, the expansion demonstrates large oscillations known as Gibbs' phenomena (Boyd 2001). These are especially present when the function is discontinuous. Gibbs' oscillations arise when a global polynomial (that is, a single polynomial) is used to approximate a non-smooth function.
In practice many quantities of interest are discontinuous at some point in random
variable space. An example of this would be a QoI that is zero until a threshold
is met and then jumps up to a nonzero value. Such a function would not be
well represented by projection onto orthogonal polynomials. To demonstrate this, consider the function

$$g(x) = H(x+1) - \frac{1}{2} H(x),$$
9.3 Issues with Projection Techniques 221
Fig. 9.14 Projection results for the approximation to the function g(x) = H(x + 1) − ½H(x), where x is a standard normal random variable. The top panel is the Hermite approximation at various orders (0th, 2nd, 4th, 6th, and 18th), and the bottom panel is the histogram from 10⁶ samples of x
where H(x) is the Heaviside step function and x is a standard normal random variable. Using Hermite expansions of various orders fails to give a reasonable approximation to the function, as shown in Fig. 9.14. The empirical distribution also has issues: even an 18th-order expansion has spurious artifacts near the three possible values of the function. Moreover, the approximations using Hermite polynomials indicate that the probability of g(x) being between 0.5 and 1 is fairly
large even though it is not possible for the true solution to have such a value. The
takeaway from these results is that one needs to use caution when using expansion
techniques if there is a possibility that the QoI is not a smooth function of the random
variables.
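The Hermite coefficients of this discontinuous g can be computed by splitting the expectation integral at the jumps. The sketch below uses probabilists' Hermite polynomials and SciPy quadrature; the names are ours, and the normalization c_n = E[g He_n]/n! is the standard one for standard normal inputs:

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import HermiteE
from scipy.integrate import quad

def phi(x):
    """Standard normal PDF."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def hermite_coeff(n):
    """c_n = E[g(x) He_n(x)] / n! for g(x) = H(x+1) - 0.5 H(x), computed by
    integrating each smooth piece of g separately (no discontinuity seen
    by the quadrature)."""
    He_n = HermiteE.basis(n)
    f = lambda x: He_n(x) * phi(x)
    full, _ = quad(f, -1.0, np.inf)      # contribution of H(x + 1)
    tail, _ = quad(f, 0.0, np.inf)       # contribution of -0.5 H(x)
    return (full - 0.5 * tail) / math.factorial(n)

c = np.array([hermite_coeff(n) for n in range(19)])
# c[0] = P(-1 < x < 0) + 0.5 P(x > 0) ~ 0.5913; the 18th-order partial sum
# built from these c_n still oscillates near the jumps at x = -1 and x = 0
```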
The situation is the same with the collocation methods that we introduce later:
non-smooth functions are not well-suited to global polynomial interpolation. The
shortcomings of global expansions are worse when dealing with stochastic finite
elements, a topic discussed toward the end of this chapter. There are approaches
to correcting oscillations of global expansions including “local” expansions that
use different projections in different domains and spline-based reconstructions that
use piecewise polynomials. A hurdle for the local expansion techniques is that one
often does not know where any discontinuities or other non-smooth features of the
function lie. Therefore, effort (i.e., function evaluations) must be expended to find
these points, potentially dulling the usefulness of the method.
It is likely that in a realistic problem, there will be several sources of uncertainty and
several uncertain parameters. It may also be possible that the different parameters
may have different types of distributions. Let us consider a generic function of d
random variables, θi , with an expansion given by
$$g(\theta_1, \ldots, \theta_d) = \sum_{l_1=0}^{\infty} \cdots \sum_{l_d=0}^{\infty} c_{l_1,\ldots,l_d} P_{l_1,\ldots,l_d}(\theta_1, \ldots, \theta_d), \tag{9.79}$$

where the multidimensional polynomials are products of the appropriate 1-D polynomials,

$$P_{l_1,\ldots,l_d}(\theta_1, \ldots, \theta_d) = \prod_{i=1}^{d} P_{l_i}(\theta_i), \tag{9.80}$$

and w(θ₁, …, θ_d) is the product of the weight functions for the d bases. If the sum is truncated at degree N polynomials, then there will be (1 + N)^d terms in the expansion.
As a simple example, consider the function g = cos(θ1 ) cos(θ2 ) with θi ∼
U (0, 2π ). A second-order expansion would have the form
9.4 Multidimensional Projections 223
A 1-D quadrature rule with n points applied to f is written as

$$Q_n f(x) = \sum_{l=1}^{n} w_l f(x_l), \tag{9.83}$$

and applying a rule in each dimension gives

$$Q_n^{(d)} g(\theta_1, \ldots, \theta_d) = \sum_{l_1=1}^{n} \cdots \sum_{l_d=1}^{n} w_{l_1} \cdots w_{l_d}\, g(\theta_{1,l_1}, \ldots, \theta_{d,l_d}), \tag{9.84}$$

where θ_{i,l_j} is the ith input evaluated at its jth point in the quadrature set. It is sometimes convenient to write Q^{(d)} as a tensor product of 1-D quadrature rules. We define a tensor product of two quadrature rules as
Fig. 9.15 Illustration of the 2-D tensor-product quadrature derived from the six-point Gauss-Legendre quadrature set. The size of a point is proportional to its weight
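Eq. (9.84) amounts to a nested loop over the 1-D rules. A small sketch (our naming) for the separable example g = cos(θ₁)cos(θ₂) with θᵢ ∼ U(0, 2π): because g separates, the tensor-product estimate of c₂,₂ should be the square of the 1-D coefficient c₂ = 15/π².

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, Legendre

z, w = leggauss(10)                # 1-D Gauss-Legendre rule
Z1, Z2 = np.meshgrid(z, z)         # all node pairs (theta_{1,l1}, theta_{2,l2})
W = np.outer(w, w)                 # product weights w_{l1} w_{l2}
G = np.cos(np.pi * Z1 + np.pi) * np.cos(np.pi * Z2 + np.pi)

# project onto P_2(z1) P_2(z2); the (2n+1)/2 normalization of Eq. (9.22)
# appears once per dimension, giving (5/2)^2
P2 = Legendre.basis(2)
c22 = (5 / 2) ** 2 * np.sum(W * G * P2(Z1) * P2(Z2))
# c22 approaches (15/pi^2)^2, the square of the 1-D coefficient
```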
Fig. 9.16 The empirical distribution and a fitted Gamma distribution of the annual percentage change in Coca-Cola stock for each year between 1970 and 2015. The distribution has a mean of 0.154083 and variance of 0.0036984. This corresponds to a Gamma distribution of Σ ∼ G(5.46636, 41.8142)
where

$$F = S e^{(r-q)T}, \tag{9.88}$$

$$v_1 = \frac{\log\frac{S}{K} + \left(r - q + \frac{1}{2}\sigma^2\right) T}{\sigma\sqrt{T}}, \qquad v_2 = v_1 - \sigma\sqrt{T}, \tag{9.89}$$
of the current rate (0.48%). For the dividend rate, we will use a uniform distribution
so that Q ∼ U (0.025, 0.045).
The expansion of p(x, Q, Σ) will have the form

$$p(x, Q, \Sigma) = \sum_{l_x=0}^{\infty} \sum_{l_d=0}^{\infty} \sum_{l_\sigma=0}^{\infty} c_{l_x l_d l_\sigma}\, L^{(0)}_{l_x}(x)\, P_{l_d}\!\left(\frac{2d - 0.07}{0.02}\right) L^{(5.46636)}_{l_\sigma}(41.8142\,\sigma). \tag{9.90}$$
From this equation, we can compute the mean of the distribution, c₀₀₀, as

$$\bar{p} = c_{000} = \int_0^\infty \! dx \int_{0.025}^{0.045} \! dq \int_0^\infty \! dz\ p\!\left(x, q, \frac{z}{41.8142}\right) \frac{z^{5.46636}\, e^{-x-z}}{\Gamma(6.46636)}\, \frac{1}{0.02} \approx 1.56662. \tag{9.91}$$

Note that this is slightly higher than the price the option is trading at, $1.46.
Because the price of the option is a well-behaved function, we will expand p with polynomial degree up to order four:

$$p(x, Q, \Sigma) = \sum_{l_x=0}^{4} \sum_{l_d=0}^{4} \sum_{l_\sigma=0}^{4} c_{l_x l_d l_\sigma}\, L^{(0)}_{l_x}(x)\, P_{l_d}\!\left(\frac{2d - 0.07}{0.02}\right) L^{(5.46636)}_{l_\sigma}(41.8142\,\sigma). \tag{9.92}$$
$$\operatorname{Var}(P) = \sum_{l_x=1}^{\infty} \sum_{l_d=1}^{\infty} \sum_{l_\sigma=1}^{\infty} \frac{\Gamma(l_x+1)\,\Gamma(l_\sigma + 6.46636)}{l_x!\, l_\sigma!\,\Gamma(6.46636)\,(2l_d+1)}\, c_{l_x l_d l_\sigma}^2. \tag{9.93}$$
Fig. 9.17 The magnitude of the coefficients in the expansion of the value of a call option as a function of three uncertain parameters, r = 0.0048x with x ∼ G(0, 1), Q ∼ U(0.025, 0.045), and Σ ∼ G(5.46636, 41.8142). The color and shape of the points indicate the maximum polynomial degree that the coefficient responds to, e.g., c₀₁₁ would be a "1" in the figure. The different panels on the figure indicate the number, n, of Gauss quadrature points in each dimension (n = 2, 4, 6, 8). Those points with a maximum polynomial degree greater than n are not shown, and the coefficients are "floored" to a minimum of 10⁻⁶
The results for this calculation using the expansion coefficients in Fig. 9.17 are
shown in Table 9.20. This table indicates that the n = 2 coefficients estimate the
variance to three digits of accuracy.
We compare 10⁶ random samples from the Black-Scholes solution to the same number of samples of the expansions as estimated with the various quadrature rules, in Fig. 9.18. This figure indicates that, because of the smoothness of the underlying function, an expansion with only a few terms is accurate.
Fig. 9.18 The distribution of the price of an option with a strike price of $44, a stock price of $44.15, and days to expiration of 158. The risk-free interest rate is r = 0.0048x with x ∼ G(0, 1), the dividend rate is Q ∼ U(0.025, 0.045), and the volatility of the stock is Σ ∼ G(5.46636, 41.8142). We compare the polynomial chaos expansion as computed using tensor-products of quadrature rules with n = 2, 4, 6, 8 and compare these distributions to a Monte Carlo distribution with 10⁶ samples
From this example, several things are evident. With a smoothly varying function,
the expansion order required to estimate the distribution of the quantity of interest
and the number of function evaluations needed are small. The results also indicate
that of the many coefficients possible in a high-order expansion, most will be
negligible. In the next sections, we will investigate how to take advantage of this
structure.
$$g(\theta_1, \ldots, \theta_d) \approx \sum_{l_1 + \cdots + l_d < N} c_{l_1,\ldots,l_d}\, P_{l_1,\ldots,l_d}(\theta_1, \ldots, \theta_d), \tag{9.94}$$
$$S_\ell^{(d)} f = \sum_{q=\ell-d}^{\ell-1} (-1)^{\ell-1-q} \binom{d-1}{\ell-1-q} \sum_{|\mathbf{k}|_1 = q+d} Q_{2^{k_1}-1} \otimes \cdots \otimes Q_{2^{k_d}-1}\, f, \tag{9.95}$$

where $|\mathbf{k}|_1 = \sum_{i=1}^{d} |k_i|$. The term in parentheses is (d − 1) choose (ℓ − 1 − q). Looking at this formula, we see that the tensor products where the sum of the number of points in each dimension equals a constant are included. Note that the quadrature rule can have negative weights.
To demonstrate how these rules work, we will look at the quadrature rule with ℓ = 3 and Gauss-Legendre quadrature. In this case we should have a quadrature rule with up to 2³ − 1 points:

$$S_3^{(2)} f = \sum_{q=1}^{2} (-1)^{2-q} \binom{1}{2-q} \sum_{|\mathbf{k}|_1 = q+2} Q_{2^{k_1}-1} \otimes Q_{2^{k_2}-1}\, f = -\sum_{|\mathbf{k}|_1 = 3} Q_{2^{k_1}-1} \otimes Q_{2^{k_2}-1}\, f + \sum_{|\mathbf{k}|_1 = 4} Q_{2^{k_1}-1} \otimes Q_{2^{k_2}-1}\, f.$$

Counting up the total number of points in this rule, there are⁸ 21, compared to 49 for the tensor-product quadrature rule Q₇ ⊗ Q₇. We show the points for S₃⁽²⁾ based on Gauss-Legendre quadrature in Fig. 9.19, as well as the comparable tensor-product quadrature rule, Q₇ ⊗ Q₇.
⁸ The (Q₁ ⊗ Q₃) and (Q₃ ⊗ Q₁) rules are completely redundant with (Q₃ ⊗ Q₃) and the (Q₁ ⊗ Q₇)
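Writing out the combination above, keeping the tensor products with |k|₁ = 3 (negative sign) and |k|₁ = 4 (positive sign), gives S₃⁽²⁾ = −(Q₁⊗Q₃ + Q₃⊗Q₁) + (Q₁⊗Q₇ + Q₃⊗Q₃ + Q₇⊗Q₁). A sketch (our naming) that applies it to x²y², which the ℓ = 3 rule integrates exactly:

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import leggauss

def tensor_quad(f, npts):
    """Apply the tensor-product rule Q_{n1} x ... x Q_{nd} of Eq. (9.84)."""
    rules = [leggauss(n) for n in npts]
    total = 0.0
    for idx in product(*(range(n) for n in npts)):
        pts = [rules[dim][0][i] for dim, i in enumerate(idx)]
        wt = np.prod([rules[dim][1][i] for dim, i in enumerate(idx)])
        total += wt * f(*pts)
    return total

def smolyak_l3_2d(f):
    """S_3^(2): -(Q1xQ3 + Q3xQ1) + (Q1xQ7 + Q3xQ3 + Q7xQ1), per Eq. (9.95)."""
    return (-tensor_quad(f, (1, 3)) - tensor_quad(f, (3, 1))
            + tensor_quad(f, (1, 7)) + tensor_quad(f, (3, 3))
            + tensor_quad(f, (7, 1)))

val = smolyak_l3_2d(lambda x, y: x**2 * y**2)   # exact integral is 4/9
```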
Fig. 9.19 Comparison of the Smolyak sparse quadrature rule of level ℓ = 3 (the S₃⁽²⁾ rule) and the tensor-product rule comprised of 7-point Gauss-Legendre quadrature rules (Q₇ ⊗ Q₇)
$$\begin{pmatrix}1\\ x\quad y\\ x^2\quad xy\quad y^2\end{pmatrix} = \begin{pmatrix}1\\ x\\ x^2\end{pmatrix} + \begin{pmatrix}1\\ y\\ y^2\end{pmatrix} + \begin{pmatrix}1\\ x\quad y\\ xy\end{pmatrix} - \begin{pmatrix}1\\ x\end{pmatrix} - \begin{pmatrix}1\\ y\end{pmatrix}. \tag{9.96}$$
9.5 Sparse Quadrature 231
Fig. 9.20 Demonstration of the construction of the Smolyak quadrature rule with ℓ = 3 in two dimensions comprised of Gauss-Legendre quadrature rules: the grid of tensor products runs from Q₁ ⊗ Q₁ to Q₇ ⊗ Q₇. The Smolyak quadrature is a linear combination of the points below the dashed line
Here we see the reason for the appearance of the $(-1)^{\ell-1-q}$ term in Eq. (9.95). It is worth mentioning that there is an alternate form of the Smolyak rule. For this we will also need to define a difference of quadrature rules; in terms of these differences, Δ, the rule is

$$S_\ell^{(d)} f = \sum_{q=0}^{\ell-1} \sum_{|\mathbf{k}|_1 = q+d} \Delta_{2^{k_1}-1} \otimes \cdots \otimes \Delta_{2^{k_d}-1}\, f. \tag{9.98}$$
Fig. 9.21 The polynomials (1; x, y; x², xy, y²; x³, x²y, xy², y³; x³y, xy³; x³y³) that can be integrated exactly by a two-dimensional tensor-product Gauss quadrature rule comprised of two-point rules. The dashed line encloses the polynomials that the sparse grid will integrate
$$S_3^{(3)} f = \sum_{q=0}^{2} (-1)^{2-q} \binom{2}{2-q} \sum_{|\mathbf{k}|_1 = q+3} Q^{(\sigma)}_{2^{k_1}-1} \otimes Q^{(x)}_{2^{k_2}-1} \otimes Q^{(z)}_{2^{k_3}-1}\, f$$
$$= Q^{(\sigma)}_1 \otimes Q^{(x)}_1 \otimes Q^{(z)}_1 f - 2\left(Q^{(\sigma)}_3 \otimes Q^{(x)}_1 \otimes Q^{(z)}_1 f + Q^{(\sigma)}_1 \otimes Q^{(x)}_3 \otimes Q^{(z)}_1 f + \cdots\right) + \cdots$$

The component 1-D rules in $S_3^{(3)}$ are shown in Table 9.21. Note that only the z points are nested at all (notice the repeated 0).
Table 9.21 The 1-D quadrature rules that comprise the sparse rule S3(3)
βσ wσ x wx z wz
Q1 6.466360 271.060701 1.000000 1.000000 0.000000 2.000000
Q3 13.811184 13.236834 6.289945 0.010389 0.774597 0.555556
7.787369 148.010162 2.294280 0.278518 0.000000 0.888889
3.800528 109.813705 0.415775 0.711093 −0.774597 0.555556
Q7 28.226889 0.000454 19.395728 0.000000 0.949108 0.129485
20.399826 0.129138 12.734180 0.000016 0.741531 0.279705
14.769642 4.663395 8.182153 0.001074 0.405845 0.381830
10.417345 42.165053 4.900353 0.020634 0.000000 0.417959
6.984121 116.015439 2.567877 0.147126 −0.405845 0.381830
4.281556 93.279531 1.026665 0.421831 −0.741531 0.279705
2.185142 14.807693 0.193044 0.409319 −0.949108 0.129485
The nesting of points in the z direction leads to seven redundant points and a total of 50 unique points in the $S_3^{(3)}$ set. The points in the set are shown in Fig. 9.22 (compare these to the full tensor-product rule in Fig. 9.23).
Using the sparse quadrature, we look at calculating the expansion coefficients for the Black-Scholes example in Fig. 9.24. In these results we see that the ℓ = 2 rule exactly integrates the 1-D polynomials up to order 2. The ℓ = 3 rule is accurate for the univariate polynomials up to degree 4. The mixed-degree polynomials are less accurate at ℓ = 3, as observed in the polynomials with maximum order 1–3. At ℓ = 4 the coefficients are as accurate as the tensor-product quadrature set.
The Smolyak sparse quadrature addresses the problem of the number of quadrature points growing exponentially with the number of dimensions and leads to polynomial growth in the number of quadrature points. It does not, however, address the issue regarding how many points will be needed in any single dimension. In fact, our Black-Scholes example indicates that the volatility variable should require more points than the other two. One way to accomplish this is to use an anisotropic quadrature.
Anisotropic quadratures are a way to handle integrals that require more accuracy in a given dimension. A simple way of doing this is to introduce a weight into the selection of quadrature rules. This makes Eq. (9.95)

$$S_{\ell,\mathbf{a}}^{(d)} f = \sum_{q=\ell-d}^{\ell-1} (-1)^{\ell-1-q} \binom{d-1}{\ell-1-q} \sum_{q+d-1 < |\mathbf{k}|_{\mathbf{a}} \le q+d} Q_{2^{k_1}-1} \otimes \cdots \otimes Q_{2^{k_d}-1}\, f, \tag{9.99}$$
234 9 Stochastic Projection and Collocation
Fig. 9.22 Depiction of the points for the $S_3^{(3)}$ quadrature set comprised of the rules in Table 9.21. The diamond-shaped points are the Q7 rules in each dimension, and the points and planes are those from the three permutations of the rule Q3 ⊗ Q3 ⊗ Q1. The star-shaped points are the two nonredundant points from permutations of the Q3 ⊗ Q1 ⊗ Q1 rules
where a is a d-length vector of weights, and $k_a = \sum_{i=1}^{d} |a_i k_i|$.
As an example, if a = (1, 0.5), then the ℓ = 3 quadrature rule with d = 2 would be

$$ S^{(2)}_{3,(1,0.5)} f = -(Q_1 \otimes Q_7)f - (Q_1 \otimes Q_{15})f - (Q_3 \otimes Q_3)f - (Q_3 \otimes Q_1)f + (Q_3 \otimes Q_{15})f + (Q_3 \otimes Q_7)f + (Q_1 \otimes Q_{31})f + (Q_7 \otimes Q_3)f + (Q_7 \otimes Q_1)f. \qquad (9.100) $$
This rule has a maximum of 31 points in one direction and 7 in the other dimension.
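The combination rule in Eq. (9.99) can be sketched directly. The minimal implementation below uses Gauss–Legendre rules with 2^k − 1 points per level in every dimension; the book's example mixes rule families across dimensions, so this is an illustration of the formula rather than a reproduction of the example above, and the enumeration bound `kmax` is our own choice:

```python
import itertools
from math import comb, prod
import numpy as np

def rule_1d(n):
    # 1-D Gauss-Legendre rule with n points on [-1, 1]
    return np.polynomial.legendre.leggauss(n)

def smolyak_terms(level, d, a=None):
    """Enumerate (coefficient, multi-index k) pairs from Eq. (9.99);
    each k_i selects the 1-D rule with 2**k_i - 1 points."""
    a = a or [1.0] * d
    kmax = int((level + d) / min(a)) + 1  # enumeration bound (our own choice)
    terms = []
    for q in range(level - d, level):
        coeff = (-1) ** (level - 1 - q) * comb(d - 1, level - 1 - q)
        for k in itertools.product(range(1, kmax + 1), repeat=d):
            ka = sum(abs(ai * ki) for ai, ki in zip(a, k))
            if q + d - 1 < ka <= q + d:
                terms.append((coeff, k))
    return terms

def smolyak_integrate(f, level, d, a=None):
    # apply the sparse rule to f on [-1, 1]^d
    total = 0.0
    for coeff, k in smolyak_terms(level, d, a):
        rules = [rule_1d(2**ki - 1) for ki in k]
        for pts in itertools.product(*[list(zip(x, w)) for x, w in rules]):
            total += coeff * prod(w for _, w in pts) * f(*[x for x, _ in pts])
    return total

# the isotropic l = 3, d = 2 sparse rule integrates x**4 * y**2 exactly (4/15)
val = smolyak_integrate(lambda x, y: x**4 * y**2, 3, 2)
```

Because the Gauss rules with 3 and 7 points are exact to degrees 5 and 13, the level-3 blend reproduces moderate mixed-degree monomials exactly while using far fewer points than the full tensor product.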
9.5 Sparse Quadrature 235
Fig. 9.23 The points from the Q7 ⊗ Q7 ⊗ Q7 tensor-product quadrature using the points from Table 9.21. The different x levels are colored to distinguish them in the 2-D projection
where I is the set of all indices included in the rule. The adaptive algorithm starts
with I = {(1, · · · , 1)}. Then, we add a point to I with an additional level in the
dimension with the largest value of the tensor product of Δ quadratures, because the
magnitude of a Δ quadrature indicates how much the integral changes when adding
new points. Next, an additional level in the direction of the level just added is considered. The rule
grows by considering those tensor products that are adjacent to terms already in the
set.
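A greedy version of this adaptive algorithm can be sketched as follows. Here the Δ quadratures are differences of successive 1-D Gauss–Legendre rules with 2^k − 1 points, and the admissibility and selection rules are a common simplification of the description above, not the book's exact algorithm:

```python
import itertools
import numpy as np

def apply_rule(f, levels):
    # tensor product of 1-D Gauss-Legendre rules with 2**k - 1 points; Q_0 = 0
    if min(levels) < 1:
        return 0.0
    grids = [np.polynomial.legendre.leggauss(2**k - 1) for k in levels]
    total = 0.0
    for pts in itertools.product(*[list(zip(x, w)) for x, w in grids]):
        total += np.prod([w for _, w in pts]) * f(*[x for x, _ in pts])
    return total

def tensor_delta(f, k):
    # (Delta_{k1} x ... x Delta_{kd}) f with Delta_k = Q_k - Q_{k-1}
    total = 0.0
    for drop in itertools.product((0, 1), repeat=len(k)):
        total += (-1) ** sum(drop) * apply_rule(f, [ki - s for ki, s in zip(k, drop)])
    return total

def adaptive_sparse(f, d, n_steps):
    index_set = {(1,) * d}
    estimate = tensor_delta(f, (1,) * d)
    for _ in range(n_steps):
        cands = set()
        for k in index_set:  # forward neighbors whose backward neighbors are present
            for dim in range(d):
                j = tuple(ki + (i == dim) for i, ki in enumerate(k))
                if j in index_set:
                    continue
                back = all(tuple(ji - (i == dim2) for i, ji in enumerate(j)) in index_set
                           for dim2 in range(d) if j[dim2] > 1)
                if back:
                    cands.add(j)
        best = max(cands, key=lambda j: abs(tensor_delta(f, j)))  # largest surplus
        index_set.add(best)
        estimate += tensor_delta(f, best)
    return estimate, index_set

est, idx = adaptive_sparse(lambda x, y: np.exp(x + y), d=2, n_steps=10)
```

For a smooth, anisotropy-free integrand such as e^{x+y}, a handful of greedy steps already recovers the integral to high accuracy; for anisotropic integrands, the same loop automatically spends levels in the dimensions with large Δ contributions.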
Fig. 9.24 The magnitude of the coefficients in the expansion of the value of a call option as a function of three uncertain parameters, r = 0.0048x with x ∼ G (0, 1), Q ∼ U (0.025, 0.045), and Σ ∼ G (5.46636, 41.8142). The color and shape of the points indicate the maximum polynomial degree that the coefficient responds to, e.g., c011 would be a “1” in the figure. The different panels on the figure indicate the level l of the sparse Gauss quadrature followed by the total number of points in the quadrature (l = 2, 9 points; l = 3, 50 points; l = 4, 218 points). The bottom panel shows the coefficients calculated with a tensor-product quadrature rule with 8 points in the 1-D quadrature rules (512 points). Those coefficients with a total polynomial degree greater than (2 − 1)/2 are not shown, and the coefficients are “floored” to a minimum of 10−6
[Figure panels (c) and (d): nested grid levels D1–D5 along the x and y directions.]
$$ g(x) \approx \sum_{n=0}^{N} c_n \mathrm{He}_n(x). $$

Now consider that we have evaluated g(x) at M values of x. The resulting data gives
us the following system of equations:

$$ y_i = \sum_{n=0}^{N} c_n \mathrm{He}_n(x_i) + \epsilon_i, \qquad i = 1, \ldots, M. $$
Fig. 9.26 The magnitude of the coefficients in the expansion of the value of a call option as a function of three uncertain parameters, r = 0.0048x with x ∼ G (0, 1), Q ∼ U (0.025, 0.045), and Σ ∼ G (5.46636, 41.8142) as computed by elastic net regression with α = 0.75 and λ picked via cross-validation. The different panels on the figure indicate the number, n, of samples of the output used to construct the fits (n = 20, 50, 100, 500, and 50,000). The coefficients are “floored” to a minimum of 10−6
Here we have written the expansion error for each case as εi . This system is M
equations for N + 1 unknowns, the cn coefficients, and therefore has no unique
solution unless M = N + 1. We can write this system using a rectangular matrix as
y = Ac,
Fig. 9.27 The distribution of the price of an option with a strike price of $44, a stock price of $44.15, and days to expiration of 158. The risk-free interest rate is r = 0.0048x with x ∼ G (0, 1), the dividend rate is Q ∼ U (0.025, 0.045), and the volatility of the stock is Σ ∼ G (5.46636, 41.8142). We compare the distributions using a polynomial chaos expansion calculated via elastic net regression as computed using different numbers of function samples (n = 20, 50, 100, 500, and 50,000) and compare these distributions to a Monte Carlo distribution with 10^5 samples
The coefficients estimated with the regularization parameter λ chosen via a cross-validation procedure are shown in Fig. 9.26. In this figure, we notice that with
only 50 samples from the output function, we approximate the coefficients with
magnitude larger than 10−4 well, and further samples do improve some of these
small coefficients. Another observation is that even with 50,000 function evaluations,
the nonzero coefficient for the fourth-degree polynomial in volatility is not captured,
as it was with quadrature using fewer points. This is likely due to the nature
of generating the samples of the output randomly, leading to difficulty estimating
parameters that are otherwise swamped by the randomness of the sampling.
Despite not capturing the low-magnitude coefficients estimated by quadrature,
Fig. 9.27 shows that the distributions produced by regularized regression capture
the features of the true distribution of the output, as calculated via Monte Carlo
sampling of the output.
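The regression route can be sketched with ordinary least squares in place of the elastic net used for Fig. 9.26. The QoI below, g(x) = x³ = He₃(x) + 3He₁(x), is a hypothetical stand-in chosen because its expansion coefficients are known exactly:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(42)

# hypothetical QoI with a known expansion: x**3 = He_3(x) + 3*He_1(x)
def g(x):
    return x**3

M, N = 50, 4                   # number of samples and max polynomial degree
x = rng.standard_normal(M)     # random samples of the standard-normal input
A = hermevander(x, N)          # A[i, n] = He_n(x_i)
c, *_ = np.linalg.lstsq(A, g(x), rcond=None)
print(np.round(c, 8))          # ≈ [0, 3, 0, 1, 0]
```

With M = 50 samples and only N + 1 = 5 unknowns, the overdetermined system recovers the coefficients; a regularized solver such as the elastic net becomes important when the number of candidate basis functions approaches or exceeds the number of samples.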
Thus far in this chapter, we have considered projection methods where the distri-
bution for a QoI is projected onto a polynomial subspace using quadrature. In this
section we seek a polynomial representation of the QoI that exactly matches the
QoI at a set of points of input space. As before, this method involves evaluating the
QoI at particular values of the uncertain inputs. It differs in that it uses the value of
the QoI at those points to create an interpolating polynomial. It is this interpolating
polynomial that can then be evaluated at samples of the inputs to get approximate
samples of the QoI. This procedure is called stochastic collocation because it assures
that the approximation to the QoI is exact at the sampled points and is analogous to
collocation methods for deterministic problems.
The points that are used to evaluate the inputs can be generated by any means
(e.g., random sampling, which would lead to Monte Carlo when combined with
quadrature), but it is most common to use the quadrature points associated with
the distribution of the uncertain inputs or their sparse constructions. Using the
quadrature points allows an equivalence between the polynomial chaos projection
methods and collocation if the QoI is a polynomial. To demonstrate this, consider a
QoI that is a polynomial of degree d of a single input,

$$ Q(x) = \sum_{i=0}^{d} c_i P_i(x), \qquad (9.101) $$
Fig. 9.28 PDF of the random variable g(x) = cos(x), where x ∼ N (μ = 0.5, σ² = 4), using stochastic collocation with various numbers of Gauss-Hermite quadrature points (n = 2, 4, 6, 8, 10, and 100) to evaluate the function for interpolation, compared with the exact density. This figure was generated from 10^6 samples of x that were used to evaluate g(x) and the various approximations
$$ g(x) \approx \sum_{i=1}^{n} f(x_i) \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}. \qquad (9.102) $$
Other polynomial constructions are possible, but will give equivalent polynomials
due to uniqueness of polynomials—only one polynomial of degree n − 1 will pass
through the n points (xi , f (xi )).
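Equation (9.102) can be implemented directly. Here the nodes are the probabilists' Gauss–Hermite points mapped through x = μ + σt for the example x ∼ N(0.5, 4); the function and sample names are illustrative:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def lagrange_interp(xi, fi, x):
    # Eq. (9.102): sum_i f(x_i) * prod_{j != i} (x - x_j) / (x_i - x_j)
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for i in range(len(xi)):
        term = np.full_like(x, fi[i])
        for j in range(len(xi)):
            if j != i:
                term *= (x - xi[j]) / (xi[i] - xi[j])
        total += term
    return total

mu, sigma, n = 0.5, 2.0, 10
t, _ = hermegauss(n)             # probabilists' Gauss-Hermite nodes
nodes = mu + sigma * t           # collocation points for x ~ N(0.5, 4)
vals = np.cos(nodes)

# sample the input and push the samples through the interpolant
samples = mu + sigma * np.random.default_rng(1).standard_normal(10**5)
approx = lagrange_interp(nodes, vals, samples)
```

The interpolant matches g exactly at the nodes, and, because a degree-(n − 1) interpolant through n points is unique, it reproduces any polynomial QoI of degree less than n exactly, which is the equivalence with projection noted above.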
The results for stochastic collocation on this example are shown in Fig. 9.28.
These results appear very similar to those in Fig. 9.3 where several different
quadrature orders are used to approximate a projection onto a fifth-order Hermite
polynomial. The biggest difference between the two examples is that the collocation
example is converging to the exact distribution because as more points are added the
degree of the interpolating polynomial must go up. With the projection technique,
we fixed the number of moments required, and, as a result, the approximation
stopped improving at some point, even as more quadrature points were added.
The other noticeable difference is the n = 2 approximation has very different
character in the two approaches. Using collocation we see a unimodal shape in the
distribution, whereas the projection results with the same number of points have a
shape similar to the exact result, except shifted.
Stochastic collocation will have an error in the approximation of the moments
of the QoI, just as the projection method had errors when too few quadrature
points were used. To estimate the moments, we have two options: we could
estimate empirical distributions by sampling from x and evaluating the collocation
approximation (this is done in Table 9.22), or we could project the collocation
estimate onto the Hermite polynomials up to some degree. This second approach
gives the same approximation as before when the number of collocation points is
sufficient to estimate the moment integrals. As the results in Table 9.22 indicate, the
empirical approach can be problematic because it is possible that a sampled point
will be an extrapolation for the polynomial, i.e., a sampled value x is outside the
range of collocation points, and large errors are possible. This is what happened
in the variance estimate for n = 100 collocation points: extrapolating with a high-
degree polynomial is a risky idea.
The connections between collocation and projection suggest a way to combine
them. Given a fixed number of samples of the QoI that one can afford,
a set of runs can be performed at the appropriate quadrature points or their
sparse version. Then, using those quadrature points, one projects the QoI onto a large
set of orthogonal polynomials. Using the same points to construct a collocation
approximation to the QoI, one can then project the approximation onto different
degree expansions in orthogonal polynomials using different quadrature rules
(i.e., evaluating the interpolating polynomial at different quadrature points). These
quadrature calculations do not require any additional function evaluations. Then,
comparing how these projections converge to the projection using all the quadrature
points, one can decide what degree polynomial to use to represent the QoI.
To illustrate this procedure, we consider the QoI from above, g(x) = cos(x),
where x ∼ N (μ = 0.5, σ 2 = 4). We assume we can only afford ten function
evaluations and use these to project onto a ninth degree Hermite expansion (the
penultimate row in Table 9.23). Then using those ten points as collocation points, we
construct the interpolating polynomial to approximate g(x). With this approximation, we compute projections using different Gauss-Hermite quadrature rules; the results are given in Table 9.23.
Table 9.23 The expansion coefficients of a ninth-degree Hermite expansion of the ninth-degree collocation approximation to g(x) = cos(x), where x ∼ N (μ = 0.5, σ² = 4), as estimated using different Gauss-Hermite quadrature rules

n    c0        c1        c2        c3        c4   c5   c6   c7   c8   c9
2    0.148413  0.024270  0.001767  0.000953
9.7 Stochastic Collocation Methods
$$ F(u, \dot{u}, x) = 0; \qquad \dot{u} = \frac{\partial u}{\partial t}. \qquad (9.103) $$
The function F also depends on x independently of u.
We then write u as a truncated polynomial chaos expansion (cf. Eq. (9.79)) as

$$ \hat{u}(z,t;x) \approx \sum_{\ell_1=0}^{N_1} \cdots \sum_{\ell_p=0}^{N_p} u_{\ell_1,\ldots,\ell_p}(z,t)\, P_{\ell_1,\ldots,\ell_p}(x), \qquad (9.104) $$
9.8 Stochastic Finite Elements 245
where

$$ P_{\ell_1,\ldots,\ell_p}(x) = \prod_{i=1}^{p} P_{\ell_i}(x_i). $$
Using the expansion in Eq. (9.104), we will attempt to determine the coefficients
u_{ℓ1,…,ℓp}(z, t) using the method of weighted residuals. That is, we substitute the expansion
into Eq. (9.103), multiply the result by a weight function w_{ℓ1,…,ℓp}(x), and integrate
over the domain of each of the p random variables. The resulting system of
equations is

$$ \int_{D_1} dx_1 \cdots \int_{D_p} dx_p\; w_{\ell_1,\ldots,\ell_p}(x)\, F(\hat{u}, \dot{\hat{u}}, x) = 0. \qquad (9.106) $$

If we use the P_{ℓ1,…,ℓp}(x) as the weighting functions, this is known as Galerkin
weighting. The result will be a system of coupled partial differential equations for
the u_{ℓ1,…,ℓp}(z, t) functions. The benefit of this approach is that if the polynomials
are chosen such that P_{0,…,0} is the PDF of the joint distribution of x, assuming each
input is independent, then the function u_{0,…,0}(z, t) will be the mean of the random
process u(z, t; x). The variance can be written in terms of the sum of squares of the
expansion functions, as in the projection methods already discussed.
To demonstrate this technique, we use the 2-D Poisson’s equation with an
uncertain source that we have seen in Sect. 9.2.7. The problem we are interested
in solving is
$$ -\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) u(x,y;\tau) = q(x,y;\tau). \qquad (9.107) $$
$$ \frac{14\, e^{-x^2} e^{-\frac{1}{16}(4y+1)^2} \left( 20y(4y-1) - 2e^{y}\bigl(10y(4y+1)+41\bigr) + 82 \right)}{3} + \sqrt{\pi}\, y \left( 80y^2 + 117 \right) \left[ \operatorname{erf}\!\left(\tfrac{1}{4} - y\right) + \operatorname{erf}\!\left(y + \tfrac{1}{4}\right) \right] $$
In the notation of Eq. (9.103) for this problem, we have z = (x, y), x = τ, and

$$ F(u, \dot{u}; x) = -\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) u(x,y;\tau) - q(x,y;\tau) = 0. $$
Given that we have a single, uniform random variable, we expand u(x, y; τ) in terms
of Legendre polynomials. For this example we choose a cubic expansion:

$$ \hat{u}(x,y;\tau) = \sum_{n=0}^{3} u_n(x,y)\, P_n(4\tau). \qquad (9.110) $$
The coefficient functions qn(x, y) are given in Table 9.24. Using these results, we
insert û into F(û, \dot{û}; x) = 0, multiply by a Legendre polynomial, and integrate to
get the four equations:

$$ -\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right) u_n(x,y) = q_n(x,y), \qquad 0 \le n \le 3. \qquad (9.111) $$
These are four uncoupled partial differential equations for the un (x, y). The fact that
they are uncoupled is due to the fact that the uncertainty was in the source term to
the equation.
We can also see that our expansion for û in Eq. (9.110) gives us the mean of the
random process u(x, y; τ ) as
Fig. 9.29 The mean (solid line) and ±2 standard deviations for the function u(0.25, y; τ) from the Poisson’s equation with uncertain source as approximated by the stochastic finite element method with a cubic Legendre expansion and Monte Carlo with 10^4 samples of τ. The mean is coincident on the scale of the plot for both methods
$$ \int_{-1/4}^{1/4} 2\, \hat{u}(x,y;\tau)\, d\tau = \frac{1}{2} \int_{-1}^{1} \hat{u}(x,y;\theta)\, d\theta = u_0(x,y). $$
Similarly, the variance is

$$ \frac{1}{2} \int_{-1}^{1} P_0(\theta)\bigl(\hat{u}(x,y;\theta)\bigr)^2\, d\theta - \bigl(u_0(x,y)\bigr)^2 \approx \sum_{n=1}^{3} \frac{u_n(x,y)^2}{2n+1}. \qquad (9.112) $$
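The mean and variance identities above can be checked on a toy Legendre expansion, independent of the Poisson problem: for f(θ) = θ² with θ ∼ U(−1, 1), the expansion is (1/3)P₀ + (2/3)P₂, so the variance formula gives (2/3)²/5 = 4/45:

```python
import numpy as np

# Gauss-Legendre projection of f(theta) = theta**2 onto P_0..P_3:
# c_n = (2n + 1)/2 * integral_{-1}^{1} f(theta) P_n(theta) dtheta
nodes, weights = np.polynomial.legendre.leggauss(8)
f = nodes**2
c = [(2 * n + 1) / 2 * np.sum(weights * f * np.polynomial.legendre.Legendre.basis(n)(nodes))
     for n in range(4)]

mean = c[0]                                              # u_0 is the mean
var = sum(c[n] ** 2 / (2 * n + 1) for n in range(1, 4))  # the variance formula
print(mean, var)  # 1/3 and 4/45
```

The exact variance is E[θ⁴] − (E[θ²])² = 1/5 − 1/9 = 4/45, matching the sum over the expansion coefficients with the Legendre norms 1/(2n + 1).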
The SFEM method with Galerkin weighted residuals gives equations that are
more difficult to solve when products of random variables occur in the equations.
This was not the case in the above example. We consider the steady ADR equation
for a quantity u(z; x) where x = (x1 , x2 ) as
$$ v(x_1)\frac{du}{dz} - \omega \frac{d^2u}{dz^2} + \kappa(x_2)\, u = q\, z(10-z), \qquad u(0;x) = u(10;x) = 0, \qquad (9.113) $$
The projection of the source term is

$$ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} dx_1\, dx_2\, \frac{e^{-x_1^2/2 - x_2^2/2}}{2\pi\, n!\, m!}\, \mathrm{He}_n(x_1)\mathrm{He}_m(x_2)\, q\, z(10-z) = \delta_{m0}\delta_{n0}\, q\, z(10-z). \qquad (9.115) $$
The terms with v and κ are a bit trickier. To wit, the integrals of the κu terms are
$$ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} dx_1\, dx_2\, \frac{e^{-x_1^2/2 - x_2^2/2}}{2\pi\, n!\, m!}\, \mathrm{He}_n(x_1)\mathrm{He}_m(x_2)\, \kappa(x_2)\, \hat{u}(z; x_1, x_2). $$
Note that this term couples the uncertainties in the two different variables: κ depends
on x2 , but u10 is the x1 dependence of u. This comes about from the product of the
two variables in κ û. The advection term has similar coupling
$$ \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} dx_1\, dx_2\, \frac{e^{-x_1^2/2 - x_2^2/2}}{2\pi\, n!\, m!}\, \mathrm{He}_n(x_1)\mathrm{He}_m(x_2)\,(10 + x_1)\,\frac{d}{dz}\bigl( u_0(z) + u_{10}(z)x_1 + u_{01}(z)x_2 \bigr) $$

$$ = \frac{d}{dz}\begin{cases} 10u_0(z) + u_{10}(z) & n = 0 \text{ and } m = 0 \\ u_0(z) + 10u_{10}(z) & n = 1 \text{ and } m = 0 \\ 10u_{01}(z) & n = 0 \text{ and } m = 1 \\ u_{01}(z) & n = 1 \text{ and } m = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (9.117) $$
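These projections can be verified numerically with Gauss–Hermite quadrature; the numbers used below for u₀, u₁₀, and u₀₁ are arbitrary stand-ins (the d/dz factors out of the integrals):

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, HermiteE

t, w = hermegauss(40)
w = w / w.sum()   # normalize so the rule computes expectations over N(0, 1)

def project(n, m, u0, u10, u01):
    # E[He_n(x1) He_m(x2) (10 + x1)(u0 + u10*x1 + u01*x2)] / (n! m!)
    Hn, Hm = HermiteE.basis(n)(t), HermiteE.basis(m)(t)
    X1, X2 = np.meshgrid(t, t, indexing="ij")
    integrand = np.outer(Hn, Hm) * (10 + X1) * (u0 + u10 * X1 + u01 * X2)
    return np.sum(np.outer(w, w) * integrand) / (factorial(n) * factorial(m))

u0, u10, u01 = 1.3, 0.7, -0.4    # arbitrary stand-in values
vals = {(n, m): project(n, m, u0, u10, u01) for n in (0, 1) for m in (0, 1)}
```

Since the integrand is a polynomial in (x₁, x₂) of low degree, a 40-point rule evaluates each expectation exactly (to floating-point precision), reproducing the four cases for the retained basis functions.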
Putting this all together, we get that the projection of Eq. (9.113) onto
He0(x1)He0(x2) is

$$ 10\frac{du_0}{dz} - \omega\frac{d^2u_0}{dz^2} + \kappa_0(z)\, u_0(z) + \frac{du_{10}}{dz} + \kappa_1(z)\, u_{01}(z) = q\, z(10-z). \qquad (9.118a) $$
Notice that this equation is the equation obtained by assuming the mean values of v and κ(z),
with additional coupling terms from the terms linear in each random variable.
Continuing on, the projection onto He1(x1)He0(x2) is

$$ 10\frac{du_{10}}{dz} - \omega\frac{d^2u_{10}}{dz^2} + \kappa_0(z)\, u_{10}(z) + \frac{du_0(z)}{dz} = 0. \qquad (9.118b) $$
There is a modification to the SFEM procedure that can simplify the application of
the method and allow it to be nonintrusive in many cases. Rather than computing the
expansion coefficients using Eq. (9.106) where the weight function is an orthogonal
polynomial and the expansion of u is an orthogonal polynomial expansion of the
random process û, we evaluate the random process at particular values of x and then
use interpolation to get the value of the random process at other values of the random
variable. As we will see, this method does not require the solution of coupled partial
differential equations.
Consider the generic system from Eq. (9.103). Given a definition for the random
variables x, we then choose points to evaluate the function at based on the quadrature
rule appropriate for each random variable or the sparse version of the quadratures.
This will give us a set of decoupled partial differential equations to solve. These can
then be solved independently and polynomial interpolation can be used to produce
a representation of the full random process in a similar manner as that done for a
single QoI in Sect. 9.7.
On the advection-diffusion-reaction problem solved in the previous section,
defined by Eq. (9.113), the application of collocation would be as follows. Given
that there are two normal random variables, we would evaluate x1 and x2 at the four
combinations of the points xi = ±2^{−1/2} given in Sect. 9.1.3. The resulting equations are

$$ v\!\left(\tfrac{1}{\sqrt{2}}\right)\frac{du_1}{dz} - \omega\frac{d^2u_1}{dz^2} + \kappa\!\left(\tfrac{1}{\sqrt{2}}\right) u_1 = q\, z(10-z), \qquad (9.119a) $$

$$ v\!\left(-\tfrac{1}{\sqrt{2}}\right)\frac{du_2}{dz} - \omega\frac{d^2u_2}{dz^2} + \kappa\!\left(\tfrac{1}{\sqrt{2}}\right) u_2 = q\, z(10-z), \qquad (9.119b) $$

$$ v\!\left(-\tfrac{1}{\sqrt{2}}\right)\frac{du_3}{dz} - \omega\frac{d^2u_3}{dz^2} + \kappa\!\left(-\tfrac{1}{\sqrt{2}}\right) u_3 = q\, z(10-z), \qquad (9.119c) $$

and

$$ v\!\left(\tfrac{1}{\sqrt{2}}\right)\frac{du_4}{dz} - \omega\frac{d^2u_4}{dz^2} + \kappa\!\left(-\tfrac{1}{\sqrt{2}}\right) u_4 = q\, z(10-z). \qquad (9.119d) $$
Using the four resulting ui (z) functions, we then can use Lagrange interpolation to
construct a representation of u(z, x).
Comparing this result to that from Galerkin SFEM, we see that we have one
more equation, but we do not need to solve any coupled differential equations.
Moreover, the method is nonintrusive. A code that can solve the ADR equation can
be wrapped in this collocation procedure—this is a definite benefit of this approach
over standard SFEM. The other benefits of collocation are the same as discussed
previously. If the random variables do not have a polynomial representation, we can
still use collocation, though convergence may be slower.
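A sketch of this nonintrusive procedure for Eq. (9.113), using a central-difference boundary-value solver in place of a production code. The forms v(x₁) = 10 + x₁ and κ(x₂) = 1 + 0.2x₂ and the values ω = 2, q = 1 are assumptions for illustration; only v(x₁) = 10 + x₁ is implied by the derivation above:

```python
import numpy as np

def solve_adr(v, omega, kappa, q=1.0, n=200):
    # central-difference solve of v u' - omega u'' + kappa u = q z (10 - z)
    # with u(0) = u(10) = 0
    z = np.linspace(0.0, 10.0, n)
    h = z[1] - z[0]
    A = np.zeros((n, n))
    b = q * z * (10 - z)
    A[0, 0] = A[-1, -1] = 1.0
    b[0] = b[-1] = 0.0
    for i in range(1, n - 1):
        A[i, i - 1] = -v / (2 * h) - omega / h**2
        A[i, i] = 2 * omega / h**2 + kappa
        A[i, i + 1] = v / (2 * h) - omega / h**2
    return z, np.linalg.solve(A, b)

s = 2.0 ** -0.5
coll = {}
for x1 in (s, -s):
    for x2 in (s, -s):
        # assumed forms: v(x1) = 10 + x1 and kappa(x2) = 1 + 0.2*x2
        z, coll[(x1, x2)] = solve_adr(v=10 + x1, omega=2.0, kappa=1 + 0.2 * x2)

def surrogate(x1, x2):
    # 2-D Lagrange interpolation on the 2 x 2 collocation grid
    l1 = {s: (x1 + s) / (2 * s), -s: (s - x1) / (2 * s)}
    l2 = {s: (x2 + s) / (2 * s), -s: (s - x2) / (2 * s)}
    return sum(l1[a] * l2[b] * coll[(a, b)] for a in (s, -s) for b in (s, -s))

u_new = surrogate(0.1, -0.3)   # approximate field at a new input point
```

Each of the four solves is an independent deterministic problem, and the interpolant reproduces the computed fields exactly at the collocation points, mirroring the single-QoI collocation of Sect. 9.7.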
• There are significant drawbacks to this approach that the practitioner must be
aware of.
– When the number of random variables is large, the number of evaluations
needed to compute even a modest polynomial representation is prohibitively
large due to the geometric increase in the number of function evaluations. This
is the curse of dimensionality.
– Sparse quadratures and regularized regression techniques can help with the
curse of dimensionality but are not a panacea.
– If the QoI is not a smooth function of the random variables, the resulting
expansion can be inaccurate and give spurious results due to the Gibbs’
phenomenon.
9.9.1.2 Collocation
• Collocation takes the value of the QoI at different values of the random variables
and constructs an interpolating polynomial. This nonintrusive method has the
following properties:
– If the QoI can be expressed as a polynomial, collocation constructed by
evaluating the function at the Gauss quadrature points for the appropriate
orthogonal polynomials will give an equivalent representation to projection
onto the same orthogonal polynomials.
– Sparse collocation grids based on sparse quadrature points can be used for
collocation.
– Collocation and projection can be combined when one has a fixed budget of
samples from the QoI and wants to get the best expansion possible from the
available samples.
The drawbacks from collocation are the same as those for projection: the curse
of dimensionality and Gibbs’ phenomena for non-smooth QoIs.
• The curse of dimensionality and Gibbs’ oscillations are still present in Galerkin
SFEM solutions.
• The cost of solving, possibly large, systems of coupled PDEs is typically much
larger than solving a single PDE many times.
We can avoid the coupled PDEs and the intrusive character of Galerkin SFEM by applying
collocation to the solutions in an analogous way to how it is applied to a single QoI.
The curse of dimensionality and Gibbs’ oscillations remain, however.
9.11 Exercises
1. A beam of radiation that strikes a slab of material will have the intensity
decreased by a factor t = exp(−kx) where x is the thickness of the slab and
k is the extinction coefficient, sometimes called a macroscopic cross section. If
K ∼ N (μ = 5, σ 2 = 1) and x = 1, compute the mean and variance of t (K).
Plot the distribution of t (K) as well.
2. Repeat the exercise using K ∼ N (μ = 2, σ 2 = 1).
3. Consider a stochastic medium where the distribution of thicknesses of two
different materials is unknown. In this case the beam transmission will be given
by
t = exp(−k1 x1 − k2 (x − x1 )).
4. The function f (x) = 1/(1 + x 2 ) is called the Witch of Agnesi. If x ∼ U (−2, 2),
find the best approximation to the distribution possible using 10 and 100 function
evaluations to build polynomial chaos projections and stochastic collocation.
Compare your results to the analytic distribution and its moments. Has the witch
cast a spell on the methods?
5. Using a discretization of your choice, solve the equation

$$ \frac{\partial u}{\partial t} + v\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2} - \omega u, $$

for u(x, t) on the spatial domain x ∈ [0, 10] with periodic boundary conditions
u(0−) = u(10+) and initial condition

$$ u(x, 0) = \begin{cases} 1 & x \in [0, 2.5] \\ 0 & \text{otherwise} \end{cases}. $$

Using a polynomial chaos expansion, estimate the mean and variance of the total
number of reactions

$$ \int_5^6 dx \int_0^5 dt\; \omega\, u(x, t). $$
In this part we will make predictions that combine simulation and experimental
data. In many cases we are limited by how many times we can evaluate the QoI in
simulation. To help with this, we discuss the construction of surrogate models for
the simulation. These surrogates allow us to bridge the gap between a limited set of
simulations and experiments to make predictions using both calibrated parameters
and the observed discrepancy between our simulation and previous experiments.
The workhorse for this task is a surrogate model based on Gaussian process
regression, as introduced in the next chapter. Chapter 11 then constructs predictive
models in the framework of Kennedy and O’Hagan to use Gaussian processes to
make data-informed predictions. The final chapter discusses the phenomenon of
epistemic uncertainty and gives tools to address the question of how to deal with
unknown uncertainties.
Chapter 10
Gaussian Process Emulators
and Surrogate Models
The uncertainty in a surrogate model's prediction can be added to other uncertainties in the system. The methods we discuss
are based on Bayesian statistics, and it is this character of the methods that allows
for the estimate of the uncertainty. We will begin by introducing a Bayesian version
of linear regression before generalizing to Gaussian process regression.
We consider the case where we have a dependent variable y, the output, and a set
of p independent variables, x, the inputs. For these input/output pairs, we have
n realizations, that is, for n different values of xi , i = 1, . . . , n, we know the
corresponding yi . We are interested in computing a linear approximation to y as
$$ y = \mathbf{x}^T \mathbf{w} + \epsilon, \qquad (10.1) $$

where ε is a normally distributed error with zero mean and variance σd².
This equation implies that yi | xi, w, σd ∼ N(xiᵀw, σd²), or that the data likelihood
is a normal distribution for yi that has mean xiᵀw and variance σd². Also, the errors
are independent, so we can write the likelihood given all n data points as

$$ f(\mathbf{y}|X,\mathbf{w},\sigma_d) = \prod_{i=1}^{n} \frac{1}{\sigma_d\sqrt{2\pi}} \exp\left( -\frac{(y_i - \mathbf{x}_i^T\mathbf{w})^2}{2\sigma_d^2} \right) \qquad (10.3) $$

$$ = \frac{1}{(2\pi\sigma_d^2)^{n/2}} \exp\left( -\frac{1}{2\sigma_d^2} (\mathbf{y} - X\mathbf{w})^T (\mathbf{y} - X\mathbf{w}) \right), $$
10.1 Bayesian Linear Regression 259
$$ \pi(\mathbf{w}|X,\mathbf{y},\sigma_d) = \frac{f(\mathbf{y}|X,\mathbf{w},\sigma_d)\,\pi(\mathbf{w})}{\displaystyle\int f(\mathbf{y}|X,\mathbf{w},\sigma_d)\,\pi(\mathbf{w})\, d\mathbf{w}}, \qquad (10.4) $$
where π(w) is the prior distribution of the weights. To compute the posterior
distribution of the weights given the data, we need to specify a prior distribution on
the weights. A reasonable prior is w ∼ N (0, Σp ), where Σp is a p × p covariance
matrix. This choice of prior attempts to make the weights close to zero (the mean of
the distribution is zero), and, as we will see, it is a form of regularization.
Using the prior in Eq. (10.4), we find that the posterior distribution of the weights can
be written as

$$ \pi(\mathbf{w}|X,\mathbf{y},\sigma_d) \propto \exp\left( -\frac{1}{2\sigma_d^2} (\mathbf{y}-X\mathbf{w})^T(\mathbf{y}-X\mathbf{w}) \right) \exp\left( -\frac{1}{2}\mathbf{w}^T \Sigma_p^{-1} \mathbf{w} \right) \qquad (10.5) $$

$$ \propto \exp\left( -\frac{1}{2} (\mathbf{w}-\mathbf{w}^*)^T A (\mathbf{w}-\mathbf{w}^*) \right), $$

where

$$ \mathbf{w}^* = \frac{1}{\sigma_d^2} A^{-1} X^T \mathbf{y}, $$

and

$$ A = \frac{1}{\sigma_d^2} X^T X + \Sigma_p^{-1}. $$
Equation (10.5) tells us what the posterior is proportional to, but we also know
that the posterior is a probability distribution so the proportionality constant must
properly normalize the distribution. From this argument, and the form we derived,
we can state that the posterior is a normal distribution with mean w∗ and covariance
matrix A−1 or w|X, y, σd ∼ N (w∗ , A−1 ).
With this posterior we could sample weights from the multivariate normal and
evaluate the model. However, it is more convenient to specify a set of points that
we want to evaluate the model at in a matrix denoted X∗ and average the result
over the posterior distribution of the weights. This average can be thought of as the
probability density of X∗w given the data and X∗:

$$ f(X^*\mathbf{w}|X^*, X, \mathbf{y}, \sigma_d) = \int f(X^*\mathbf{w}|X^*, \mathbf{w})\, \pi(\mathbf{w}|X,\mathbf{y},\sigma_d)\, d\mathbf{w}. \qquad (10.6) $$
260 10 Gaussian Process Emulators and Surrogate Models
Carrying out this integral gives

$$ X^*\mathbf{w}\,|\,X^*, X, \mathbf{y}, \sigma_d \sim \mathcal{N}\left( \frac{1}{\sigma_d^2} X^* A^{-1} X^T \mathbf{y},\; X^* A^{-1} (X^*)^T \right). \qquad (10.7) $$

The result in Eq. (10.7) gives us a way to get the distribution of the prediction
from the linear model at point X∗ by sampling from a multivariate normal. Given
that we have a distribution for the prediction, we can compute the variance in the
prediction, confidence intervals, etc. The form of the covariance matrix indicates
that the uncertainties are quadratic in X∗ so that the uncertainty in the prediction
grows with the magnitude of X∗ . Additionally, the larger the eigenvalues of Σp
are, the larger the uncertainty in the prediction will be. This is sensible because Σp
represents the amount of uncertainty we believe the weights will/should have in the
prior for the weights.
To this point we have not addressed the variance of the error, σd2 . It is likely that
this error will not be known in practice. In that case we could modify the procedure
to allow for a prior on σd2 and allow the data to suggest this value by computing
a posterior for the weights and the variance of the error. This does, unfortunately,
introduce a great deal of algebraic complexity that will not provide much gain for
our studies.
As an example, we consider a simple linear model of the form

$$ y = w_1 + w_2 x_1 + \epsilon, $$

with data x1 = {−5, 1, 5} and y = {−5.1, 0.25, 4.9} and prior covariance Σp = I. For this case,
$$ A = \begin{pmatrix} \dfrac{3}{\sigma_d^2}+1 & \dfrac{1}{\sigma_d^2} \\[6pt] \dfrac{1}{\sigma_d^2} & \dfrac{51}{\sigma_d^2}+1 \end{pmatrix}, \qquad \mathbf{w}^* = \begin{pmatrix} \dfrac{0.05\,\sigma_d^2 - 47.7}{\sigma_d^4 + 54\sigma_d^2 + 152} \\[8pt] \dfrac{50.25\,\sigma_d^2 + 150.7}{\sigma_d^4 + 54\sigma_d^2 + 152} \end{pmatrix}. $$
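These posterior formulas can be reproduced numerically; the grid of prediction points below is arbitrary:

```python
import numpy as np

x1 = np.array([-5.0, 1.0, 5.0])
y = np.array([-5.1, 0.25, 4.9])
X = np.column_stack([np.ones_like(x1), x1])  # columns: intercept, x1
sigma_d2 = 1.0                               # sigma_d**2
Sigma_p = np.eye(2)                          # prior covariance of the weights

A = X.T @ X / sigma_d2 + np.linalg.inv(Sigma_p)
w_star = np.linalg.solve(A, X.T @ y / sigma_d2)  # posterior mean of the weights
cov_post = np.linalg.inv(A)                      # posterior covariance

# predictive mean and 2-sigma half-width on a few new points
xs = np.linspace(-6.0, 6.0, 5)
Xs = np.column_stack([np.ones_like(xs), xs])
mean = Xs @ w_star
band = 2.0 * np.sqrt(np.einsum("ij,jk,ik->i", Xs, cov_post, Xs))
```

The quadratic form in the predictive variance makes the ±2σ band widen with |x1|, which is the behavior visible in Fig. 10.1.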
Fig. 10.1 Plot of results for the Bayesian linear regression mean model (solid line) and the ±2 standard deviation bounds (dashed lines) for the model y = w1 + w2 x1 using data x1 = {−5, 1, 5} and y = {−5.1, 0.25, 4.9} (the three symbols) and σd² = 1. A sample from the posterior of the weight distribution is shown as the dash-dot line
The results for the fit with σd2 = 1 are shown in Fig. 10.1. In this figure we
see the predicted mean model (i.e., the mean value of the weights in the posterior)
and the ±2 standard deviations from the mean model that represents the estimated
uncertainty in the model. The predicted behavior that the uncertainty grows as |x1 |
increases can be seen in the width of the uncertainty bands.
There are ways that we could improve our model; for example, as we noted,
it would be ideal to estimate σd as a function of the data. Depending
on the type of prior distribution we choose for σd, we would likely lose the
ability to write the posterior distribution as a multivariate normal distribution.
As a result we would need a means to evaluate the posterior distribution from
Bayes’ rule. One approach would be numerical integration of the denominator,
but there is a simple means to produce samples from the posterior without
knowing the full distribution. That idea, however, will be left until the next chapter.
Rather we turn to how we can increase the class of functions in our regression
model.
We could attempt to enrich the class of models that we try to fit to include
polynomials in the inputs or other functions. This added complexity is known as
enhancing the feature space because the data used to build the model and any
manipulations of that data are often called model features. Perhaps surprisingly,
we can derive a model that has nearly the same complexity as the linear regression
case but can model a wide class of nonlinear functions. This can be thought of as
finding a transformation of the independent variables to find a new set of variables
that do provide a linear representation for the dependent variable.
Consider the set of monomials of a single variable x up to degree d − 1. We write
the set as

$$ \phi(x) = \left( 1, x, x^2, \ldots, x^{d-1} \right)^T. $$

Applying this map to each of the p inputs and collecting the results into a feature vector φ(x), the model becomes

$$ y_i = \phi(\mathbf{x}_i)^T \mathbf{w} + \epsilon_i, $$
where now there are pd weights. Clearly this model is more general than the pure
linear model, and we would expect it to be able to match a wider class of functions.
Nevertheless, we can replace the data matrix X in our previous derivation of the
Bayesian linear regression model with Φ(X) and arrive at the predictive distribution
at a set of points X∗ as

$$ \Phi(X^*)\mathbf{w}\,|\,X^*, X, \mathbf{y} \sim \mathcal{N}\left( \frac{1}{\sigma_d^2}\Phi(X^*) A^{-1} \Phi(X)^T \mathbf{y},\; \Phi(X^*) A^{-1} \Phi(X^*)^T \right), \qquad (10.8) $$

where now $A = \frac{1}{\sigma_d^2}\Phi(X)^T\Phi(X) + \Sigma_p^{-1}$.
Given this form of the predicted distribution, the evaluation of the action of the
inverse of A could be expensive if the product of p and d is large. However, it is
possible to specify an arbitrarily large d using the kernel trick. The kernel trick takes
advantage of the fact that the feature space only appears as a quadratic form such as
Φ(X)T Φ(X). If we can specify a function, called a kernel function, that is equivalent
to this quadratic form, we do not need to work with the entire feature space. Indeed,
it is possible to specify a kernel function that is equivalent to a quadratic form
involving an infinite number of features, as we will see.
To demonstrate the kernel trick, we rearrange Eq. (10.8) to be

$$ \Phi(X^*)\mathbf{w}\,|\,X^*, X, \mathbf{y} \sim \mathcal{N}\Big( \Phi(X^*)^T \Sigma_p \Phi(X)\,(K + \sigma_d^2 I)^{-1}\mathbf{y},\; \Phi(X^*)^T \Sigma_p \Phi(X^*) - \Phi(X^*)^T \Sigma_p \Phi(X)\,(K + \sigma_d^2 I)^{-1} \Phi(X)^T \Sigma_p \Phi(X^*) \Big), \qquad (10.9) $$
10.2 Gaussian Process Regression 263
with K = Φ(X)T Σp Φ(X). In Eq. (10.9) the feature space only appears in the form
Φ(X̂)T Σp Φ(X̂) with X̂ equal to either X or X∗ . Therefore if we specify a kernel
function of the form

$$ k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T \Sigma_p\, \phi(\mathbf{x}'), \qquad (10.10) $$

we only need to specify weighted inner products of the feature space and never need
to specify the feature space.
At this point we recall some insights from Sect. 2.4 regarding Gaussian pro-
cesses. In that section we defined a Gaussian process as a random process where
a finite collection of points are distributed as a multivariate normal. The predictive
distribution for Φ(X∗ )w is a Gaussian process with a known covariance matrix
related to the kernel function.
As we argued above, the kernel function, also called a covariance function, can
replace the feature space. In other words, specifying a kernel function is the same
as specifying a feature space. The power-exponential kernel is given by

$$ k(\mathbf{x}, \mathbf{x}') = \frac{1}{\lambda} \exp\left( -\sum_{k=1}^{p} \beta_k\, |x_k - x_k'|^{\alpha} \right). \qquad (10.11) $$
It can be shown (Rasmussen and Williams 2006) that this kernel function leads to
a feature space that includes an infinite number of basis functions. The parameter
βk−1 can be thought of as a length scale for variable xk . The power α is related to
the smoothness of the model: a value of α = 2 creates an infinitely differentiable
covariance function. Finally, the larger λ is, the smaller the covariance function is,
and the more the model ignores nearby points when computing the model. The
statistical interpretation of this is that if λ is large, the data are significant and the
model needs to respect that reality. Other kernel functions are possible, though the
power-exponential covariance is the most commonly used in practice.
When using a kernel function on a data matrix, we define the matrix K(X, X′) as
the matrix

$$ K(X, X') = \begin{pmatrix} k(\mathbf{x}_1, \mathbf{x}'_1) & k(\mathbf{x}_1, \mathbf{x}'_2) & \cdots & k(\mathbf{x}_1, \mathbf{x}'_{n'}) \\ \vdots & & & \vdots \\ k(\mathbf{x}_n, \mathbf{x}'_1) & k(\mathbf{x}_n, \mathbf{x}'_2) & \cdots & k(\mathbf{x}_n, \mathbf{x}'_{n'}) \end{pmatrix}. $$

This matrix is of size n × n′, where n is the number of rows in the data matrix X and
n′ is the number of rows in X′.
264 10 Gaussian Process Emulators and Surrogate Models
If there is no uncertainty assumed in the model, that is, the error in the model is assumed to be zero and σ_d = 0, we can simplify Eq. (10.9) to be

\[ \Phi(X^*)\mathbf{w}\,|\,X^*, X, \mathbf{y} \sim N\!\left(K(X^*, X)K(X, X)^{-1}\mathbf{y},\; K(X^*, X^*) - K(X^*, X)K(X, X)^{-1}K(X, X^*)\right). \tag{10.12} \]
This equation defines the mean and covariance function for a Gaussian process.
Computing the mean function involves solving a linear system of equations that is
n × n, as does computing the action of the covariance matrix.
The regression model defined by Eq. (10.12) is called a Gaussian process model
or Gaussian process regression (GPR). This model is clearly flexible because it is
completely defined by the input data and the kernel function. A drawback of these
models is that the covariance matrix and mean function are defined in terms of the
inverse of an n × n matrix. Therefore, if there is a large amount of training data, it
can be expensive to compute.
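A direct (if naive) implementation of Eq. (10.12) makes that cost plain: each quantity requires solving an n × n system. This is a sketch assuming a two-argument kernel function k; the function name is ours.

```python
import numpy as np

def gp_predict_noise_free(X, y, Xstar, k):
    # assemble K(X, X), K(X*, X), and K(X*, X*) from the kernel
    K = np.array([[k(a, b) for b in X] for a in X])
    Ks = np.array([[k(a, b) for b in X] for a in Xstar])
    Kss = np.array([[k(a, b) for b in Xstar] for a in Xstar])
    mean = Ks @ np.linalg.solve(K, y)           # K(X*,X) K(X,X)^{-1} y
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)   # posterior covariance
    return mean, cov
```

At the training points the mean reproduces y exactly and the posterior covariance collapses to zero, which is the interpolation property of the noise-free model.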
The parameters βk, α, and λ are sometimes referred to as hyperparameters. This designation indicates that these values are needed to fit the model and also influence how the model fits the data. Later we will see how we can use the data to choose these values, but for now we assume that they are fixed. The term hyperparameter can refer to more than just these three parameters: it can mean any parameter that influences the model fit but is not given in the data.
We have implicitly assumed in the derivation that the prior mean for the mean
function is the zero function (i.e., a function that is zero everywhere). This comes
from the assumptions we made in the original Bayesian linear regression model.
This prior can be relaxed, or the training data can be mean-centered. Rasmussen and
Williams (2006) demonstrate how such a nonzero mean function can be defined. For
our purposes we will assume the training data is mean-centered.
To demonstrate Gaussian process regression, we use a data set generated from the function y = e^{−x} sin 4x + (x − 1)H(x − 1) − 0.732. The data points we have are

x = {1.475, 1.859, 0.757, 0.665, 0.161, 0.175, 0.185, 1.243, 0.939, 1.606},

with the corresponding values of y obtained by evaluating the function at these points, and we choose x* to be 100 equally spaced points between 0 and 2. If we fit a GPR model using this data and choose β = 1, α = 1.9, and λ = 1, we obtain the results
in Fig. 10.2. In this figure we see that the GPR model interpolates the data and gives estimates of the uncertainty (as shown by the ±2 standard deviation confidence intervals).
Fig. 10.2 Example GPR fit for the function y = e−x sin 4x + (x − 1)H (x − 1) − 0.732, where
H (x) is the Heaviside step function. The 10 points shown were used to fit the model, and the
hyperparameters are β = 1, α = 1.9, and λ = 1. The dark solid line is the true function, the
dashed line is the estimated mean function in Eq. (10.12), and the dotted lines are the ±2 standard
deviation bounds around the mean. Two sample functions are also shown
Fig. 10.3 The effect of varying hyperparameters on the GPR fit for the function y = e−x sin 4x +
(x − 1)H (x − 1) − 0.732. The dark solid line is the true function, the dashed line is the estimated
mean function in Eq. (10.12), and the dotted lines are the ±2 standard deviation bounds around
the mean. Two sample functions for each model are also shown. (a) β = 0.5, α = 1.9, λ = 1.
(b) β = 2, α = 1.9, λ = 1. (c) β = 1, α = 1.1, λ = 1. (d) β = 1, α = 2, λ = 1. (e)
β = 1, α = 1.9, λ = 0.5. (f) β = 1, α = 1.9, λ = 2)
Finally, λ influences the confidence in the model. When λ = 0.5, the confidence intervals are wider than in the nominal example with λ = 1; increasing λ narrows these intervals.
If we relax the assumption that σ_d is zero, we can write the posterior distribution of Φ(X*)w as a finite sample from a Gaussian process by defining

\[ \mathrm{cov}(\mathbf{y}) = K(X, X) + \sigma_d^2 I, \tag{10.13} \]

so that

\[ \Phi(X^*)\mathbf{w}\,|\,X^*, X, \mathbf{y} \sim N\!\left(K(X^*, X)\,\mathrm{cov}(\mathbf{y})^{-1}\mathbf{y},\; K(X^*, X^*) - K(X^*, X)\,\mathrm{cov}(\mathbf{y})^{-1}K(X, X^*)\right). \tag{10.14} \]

The addition of a nonzero σ_d gives a floor to the covariance function and has the effect of making the model less confident near the training data.
We will modify the previous example to include noise to see the effect on the resulting model. We include a measurement uncertainty of σ_d = 0.05 and perturb the values of y in the same way, so that y = e^{−x} sin 4x + (x − 1)H(x − 1) − 0.732 + ε, where ε ∼ N(0, σ_d²). In this case we know the exact value of the measurement
uncertainty, so we can force the GP model to have the correct uncertainty near
the data. In Fig. 10.4 two realizations of this model are shown using 10 and 25
training points. The values for the hyperparameters are β = 1, α = 1.9, and
λ = 1. In the figure we see that the addition of noise makes the uncertainty in the model nonzero at the data points, i.e., the confidence interval has a nonzero width there. Additionally, because the data is less trusted due to the uncertainty, it has less influence on the inferred shape of the underlying function. This is evident in the 10-point results, where the peak between 0 and 0.5 is underestimated because of the noise in the data to the left of the peak.
As more points are added to the training set, in this case making the total 25, we see
that the estimated uncertainty in the model does decrease and the true underlying
function is better approximated by the model.
In this example we knew what σd was for the data, and we assumed values for
the other parameters. In the next chapter, we turn to answering the more common
question of how to fit a GP emulator without knowledge of these parameters.
10.3 Fitting GPR Models

As we saw above, the hyperparameters in the Gaussian process regression model can
have a large impact on the fit of the model. We will discuss how to fit these models
and optimize these parameters. To begin we develop a simple, implementable
version of the GP emulator. To do this we begin with Eq. (10.14). This equation requires the solution of two linear systems involving the matrix cov(y).
Fig. 10.4 The effect of noise in the dependent variable in the GPR fit for the function y = e^{−x} sin 4x + (x − 1)H(x − 1) − 0.732 + ε, where ε ∼ N(0, σ_d²), using different numbers of training
points. The dark solid line is the true function without noise, the dashed line is the estimated mean
function in Eq. (10.14), and the dotted lines are the ±2 standard deviation bounds around the mean.
Two sample functions for each model are also shown. (a) 10 training points. (b) 25 training points
Because cov(y) is symmetric positive definite, we can compute its Cholesky factorization

\[ \mathrm{cov}(\mathbf{y}) = L L^T, \]

where L is a lower triangular matrix.
We also define a vector that is the same length as the number of training data points,
k∗ . This vector holds the covariance between a single prediction point x∗ and the
training data X:
k∗ = K(X, x∗ ). (10.15)
Then using Eq. (10.14) we can write the mean prediction at point x* as

\[ K(\mathbf{x}^*, X)\,\mathrm{cov}(\mathbf{y})^{-1}\mathbf{y} = \mathbf{k}^* \cdot \mathbf{u}, \tag{10.16} \]

with

\[ \mathbf{u} = \left(L^T\right)^{-1} L^{-1} \mathbf{y}. \tag{10.17} \]
The vector u can be calculated by doing two triangular solves; note that this vector
does not depend on the prediction point, only on the data. Therefore, we can obtain
the mean prediction at point x∗ by dotting the covariance function evaluated at the
prediction point with the vector u. In Eq. (10.16) we used the fact that the covariance
kernel is a symmetric function of its arguments.
To evaluate the variance in the prediction at point x∗ , we also need to solve a
linear system, but in this case, it does depend on k∗ . From Eq. (10.14), the variance
at a single prediction point can be written as
\[ K(\mathbf{x}^*, \mathbf{x}^*) - K(\mathbf{x}^*, X)\,\mathrm{cov}(\mathbf{y})^{-1} K(X, \mathbf{x}^*) = K(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{k}^* \cdot \left(L^T\right)^{-1} L^{-1} \mathbf{k}^*. \tag{10.18} \]
Note that this equation will involve the solution of a linear system for each point x∗ .
Also, we have to evaluate the covariance function at the point x∗ .
Therefore, the mean and variance of the prediction f* = Φ(x*)w are

\[ \mathrm{E}[f^*] = \mathbf{k}^* \cdot \mathbf{u}, \qquad \mathrm{Var}(f^*) = K(\mathbf{x}^*, \mathbf{x}^*) - \mathbf{k}^* \cdot \left(L^T\right)^{-1} L^{-1} \mathbf{k}^*. \tag{10.19} \]
Algorithm 10.1 Python code to fit a GP regression model. The covariance function
k is assumed to take only two arguments
import numpy as np

def GPR(X, y, Xstar, k, sigma_n):
    N = y.size
    # build covariance matrix
    K = np.zeros((N, N))
    for i in range(N):
        for j in range(0, i + 1):
            K[i, j] = k(X[i, :], X[j, :])
            if not (i == j):
                K[j, i] = K[i, j]
            else:
                K[i, j] += sigma_n**2
    # compute Cholesky factorization
    L = np.linalg.cholesky(K)
    u = np.linalg.solve(L, y)
    u = np.linalg.solve(np.transpose(L), u)
    # now loop over prediction points
    Nstar = Xstar.shape[0]
    ystar = np.zeros(Nstar)
    varstar = np.zeros(Nstar)
    kstar = np.zeros(N)
    for i in range(Nstar):
        # fill in kstar
        for j in range(N):
            kstar[j] = k(Xstar[i, :], X[j, :])
        ystar[i] = np.dot(u, kstar)
        tmp_var = np.linalg.solve(L, kstar)
        varstar[i] = (k(Xstar[i, :], Xstar[i, :])
                      - np.dot(tmp_var, tmp_var))
    return ystar, varstar
Algorithm 10.2 Example covariance function to be used with the GPR model in
Algorithm 10.1
def cov(x, y, beta, lam, alpha):
    # "lam" stands in for lambda, which is a reserved word in Python
    exponent = np.sum(beta * np.abs(x - y)**alpha)
    return 1 / lam * np.exp(-exponent)

beta = np.array([1.0, 2.0])
lam = 1.0
alpha = 1.9
k = lambda x, y: cov(x, y, beta, lam, alpha)
In cross-validation we remove a single instance from the training data, fit the model to the remaining points, and compute the mean prediction and variance for f* at the point not used to build the model. This type of cross-validation is called “leave-one-out” cross-validation because it repeatedly leaves a single instance out of the training data. Using the prediction for a single point, we can compute the likelihood for the actual value y from a normal
distribution with a mean and variance predicted by the model. This is then repeated N times, and we compute the sum of the likelihoods from each test at the same values of the hyperparameters; we write this sum as ℓ(t), where t is the set of hyperparameters. We then
have an optimization problem to solve: maximize the sum of the likelihoods for the
predictions over the hyperparameters. Solving this optimization problem subject to
reasonable constraints on the hyperparameters will give a GP model which can make
predictions at a new data point that will have a high likelihood of being “correct.”
Algorithm 10.3 gives a Python function that will perform cross-validation to compute the sum of the likelihoods for the predicted points. This function could be an input to one of the optimization functions found in the SciPy package for Python to find the hyperparameters that maximize the predicted likelihood.
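A self-contained sketch of such a leave-one-out objective (our own function name, written for a 1-D input array for simplicity, rather than the text's Algorithm 10.3) might look like:

```python
import numpy as np

def loo_log_likelihood(X, y, k, sigma_n=0.0):
    # sum of leave-one-out predictive log-likelihoods for a GP with kernel k
    N = y.size
    idx = np.arange(N)
    total = 0.0
    for i in range(N):
        m = idx != i                      # hold out point i
        K = np.array([[k(a, b) for b in X[m]] for a in X[m]])
        K += sigma_n**2 * np.eye(N - 1)
        kstar = np.array([k(X[i], b) for b in X[m]])
        mean = kstar @ np.linalg.solve(K, y[m])
        var = k(X[i], X[i]) + sigma_n**2 - kstar @ np.linalg.solve(K, kstar)
        # Gaussian log-likelihood of the held-out value
        total += -0.5 * np.log(2 * np.pi * var) - 0.5 * (y[i] - mean)**2 / var
    return total
```

Passing the negative of this quantity, as a function of the hyperparameters, to `scipy.optimize.minimize` implements the optimization described above.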
To demonstrate the GP fitting with the algorithms defined in this section, we use
a set of simulation runs from the simulation of a laser-driven shock in a disc of
beryllium (Be) as reported in McClarren et al. (2011) and Stripling et al. (2013).
In this data the QoI is the shock breakout time, and there are five parameters that
are varied: the disc thickness, the laser energy, the Be gamma (a parameter in an
ideal gas equation of state), the wall opacity, and the flux limiter constant. For more
details on these parameters, see McClarren et al. (2011). The QoI as a function of
these inputs are shown in Fig. 10.5. From the data in the figure, we can observe that
the disc thickness and the Be gamma are clearly important parameters as the graphs
show a trend in the breakout time as a function of these parameters.
We will now use this data in the GPR functions defined above. To begin we
normalize and center each variable by subtracting the mean of the maximum and
minimum value of the variable and dividing by the range: this makes each parameter
vary between −0.5 and 0.5. Then we use cross-validation and use an optimization
function to find the best values for the hyperparameters. We also want to set bounds
on the hyperparameters so that they are physical. In this case since we are using the
power-exponential covariance function, we set the βi to be in the range [0.001, 10]
and allow λ to vary between [0.001, 10]; α is fixed to be 2, and we do not vary it in
Fig. 10.5 The QoI, shock breakout time, as a function of the five inputs for the laser-driven shock
simulation
this problem. Furthermore, since this is simulation data that is not subject to noise
in the observation (i.e., if we ran the same simulation again, we would get the same
result), we set σd = 0. Another consideration when fitting the model is that we split
the data into a test and training set, with 80% of the data being randomly placed in
the training set. This allows us to ensure that using cross-validation and maximizing
the likelihood of the model are not overfitting the available data.
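The normalization described above (subtract the midpoint of each variable's range and divide by the range) can be sketched as follows; the function name is ours:

```python
import numpy as np

def center_and_scale(X):
    # map each column of X to [-0.5, 0.5]
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - (lo + hi) / 2.0) / (hi - lo)
```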
We use the cross-validation procedure given in Algorithm 10.3 and a minimization function from scipy.optimize to find the maximum likelihood values starting at β = (1, 1, 1.5, 0.01, 0.05) and λ = 1. Results from this fit are shown
in Fig. 10.6. The optimized values were
These values indicate that we started near a local maximum in the likelihood because
the β’s did not change much. It also indicates that the value of Be gamma was
the most important input variable in predicting the breakout time. Because we
normalized the inputs, the β’s give an indication of what variables have the most
impact on the covariance function: a larger value of βi indicates that variable is more
important. The values of βi are sometimes called the relative relevances. In this case
the relative relevances suggest that Be gamma, laser energy, and disc thickness are
the key variables to consider when seeking to affect the shock breakout time.
In the figure we can see that the GPR model exactly predicts the training data (as
expected since we set σd = 0), and for the test data, we see a small disagreement
for some of the points. For many of the test points, the true value was within ±2
standard deviations of the prediction, though some are outside that bound. On the
whole the predictions have an average error of 9.74 ps.
Fig. 10.6 Results for a GPR model fit with 80% of the simulation data. For the predicted versus
actual plot, the error bars are two times the estimated standard deviation in the prediction. (a)
Predicted versus actual breakout times. (b) Relative relevance of inputs
10.4 Drawbacks of GPR Models and Alternatives

The most common complaint about GPR is that it is expensive to build a model
when there is a lot of training data. This is due to the fact that the construction of the
model requires the Cholesky factorization of a dense matrix with a size equal to the
number of training points; Cholesky factorization of such a matrix requires O(N 3 )
operations for a size N matrix. As we saw above and will see in the next chapter, it
is typical to build many GPR models to find an optimal one, so this cost is further
multiplied by the number of models we want to construct. To this end there has been
work on local GP models that only include a subset of the data (Gramacy and Apley
2015) and can be implemented efficiently (Gramacy et al. 2014).
Another issue with the GP models we used here is that the covariance kernel was
applied to all of input space; that is, we assumed a stationary covariance function. In
many problems the character of the covariance function needs to change in different
regimes. This is particularly acute in problems where the dependent variable is
constant for a large region of space and then begins to vary once some threshold
is crossed. To address this issue, Gramacy and Lee (2008) developed a hybrid tree-
Gaussian process model that allows the covariance function to change over the range
of input data.
Indeed GPR is not the only possible approach to model computer simulations.
There has been success using the Bayesian Multiple Adaptive Regression Splines
(MARS) method of Denison et al. (2002, 1998). This method allows for a
distribution of piecewise polynomial functions to be fit to the data. These methods
can automatically handle some of the issues with nonstationary covariances, and the
underlying computation in fitting a Bayesian MARS model is a least-squares solve,
which can be done efficiently.
The techniques of Gaussian process models and Bayesian MARS are but two
examples of machine learning approaches to finding the functions underlying a
10.6 Exercises
1. Show that maximizing the likelihood in Eq. (10.2) over the weights leads to the
standard least-squares regression model.
2. Consider the function

\[ f(x, k) = \frac{1}{1 + e^{kx}} + \epsilon, \]

where ε ∼ N(0, σ² = 0.01). Generate 100 samples from this function for
x ∈ [−2, 2] and k ∈ [1, 10]. Fit a Gaussian process regression model to this
data as a function of x and k using the correct measurement uncertainty, α = 2
and βi = 1. Compare the result to the true function. Repeat the exercise by
finding the most likely value of the hyperparameters starting the search near these
parameters.
Chapter 11

Predictive Models Informed by Simulation, Measurement, and Surrogates
In this chapter we develop the idea of using statistical models to fuse experiments/measurements with simulation data. Our approach will use Gaussian process
models to model both simulation results and discrepancies between simulations and
experiments. The idea behind all of these approaches is to construct a model that
can be trained to assign differences between a measurement and a simulation to a
calibration parameter and, if necessary, find a function for the difference between
the results and the simulation. The discussion of calibration and Kennedy-O’Hagan
models follows the notation and form of Higdon et al. (2004); the interested reader
is encouraged to see that work for more applications of these models. We also
introduce the idea of having a hierarchy of simulations and how to combine them,
including how to use a low-fidelity model that is “free” to evaluate.
11.1 Calibration
In the calibration problem we have measurements of the QoI, y(x), and a simulator η(x, t) that depends on the experimental conditions x and on calibration parameters t; the measurements are modeled as y(x) = η(x, t) + ε. In this problem we have assigned all of the disagreement between the simulation and the measurement of the QoI to the measurement error ε. Additionally, we have purposefully written y as a function of x only and not as a function of t. This is because typically the calibration parameters may not have a physical interpretation: they are parameters that we need to make our code give good answers. In other words, nature does not care about our calibration parameters.
The calibration problem gives a straightforward methodology to attempt to
combine experimental and simulation data to improve the simulation. Nevertheless,
we have not specified how to solve the problem, and at this point, we have not
specified enough information to solve it. A reasonable approach to solving this
problem, due to its statistical nature and the combination of deterministic (the
simulator) and stochastic information (the measurement), would be to use Bayes’
rule.
To use Bayes’ rule, we will need to specify a prior for the calibration parameters
and the measurement error. For the calibration parameters, we will typically have
an interval of values that each can take or have other information that we can use
to construct a prior. The measurement error will typically be reported using some
notion of a distribution by the experimenter that could be used to inform the prior.
As a word of caution, do not assume that the measurement error reported by the experimenter follows a normal distribution, even if the experiment uses the parlance of normal random variables, such as standard deviation. In the author's experience, further investigation will often uncover that some sources of error are non-normal.
Given priors for the calibration parameters and the measurement error, we can use Bayes' rule to update our estimate for t and the measurement error distribution given a set of measurements y as

\[ \pi(\mathbf{t}, \epsilon\,|\,\mathbf{y}, \mathbf{x}) = \frac{f(\mathbf{y}\,|\,\mathbf{x}, \mathbf{t}, \epsilon)\,\pi(\mathbf{t})\,\pi(\epsilon)}{\int d\mathbf{t}\,d\epsilon\; f(\mathbf{y}\,|\,\mathbf{x}, \mathbf{t}, \epsilon)\,\pi(\mathbf{t})\,\pi(\epsilon)}. \tag{11.1} \]
It may be possible that the experimental error is well characterized and we know
the properties of the distribution. We will explore the case of a known measurement
error distribution to illustrate how calibration can be performed. If the measurement
errors are said to be independent and each is normal with mean zero and a known
standard deviation, σ , then we can write the likelihood in Eq. (11.1) as
\[ f(\mathbf{y}\,|\,\mathbf{x}, \mathbf{t}, \epsilon) = \frac{1}{(2\pi)^{N/2}\sigma^N}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y(\mathbf{x}_i) - \eta(\mathbf{x}_i, \mathbf{t})\right)^2\right). \tag{11.2} \]
Because the measurement error distribution is known, the posterior for the calibration parameters simplifies to

\[ \pi(\mathbf{t}\,|\,\mathbf{y}, \mathbf{x}) = \frac{f(\mathbf{y}\,|\,\mathbf{x}, \mathbf{t})\,\pi(\mathbf{t})}{\int d\mathbf{t}\; f(\mathbf{y}\,|\,\mathbf{x}, \mathbf{t})\,\pi(\mathbf{t})}. \tag{11.3} \]
where g in meters per second squared is the calibration parameter. We obtain the
following measurements and model evaluations:
We also know that the measurement error is normally distributed with mean zero
and standard deviation of 0.001 s. The prior distribution1 for g is said to be normal
with mean 9.81 and standard deviation 0.01 m/s2 . Evaluating the posterior using
numerical integration, we get the logarithm of the posterior distribution for g to be
The results for this calibration are shown in Figs. 11.1 and 11.2. The prior and
posterior estimates and confidence intervals for the time are shown in Fig. 11.2. It
is clear that the measurement data selected values of g that agree with the data
within the measurement errors. Additionally, we can see that the five measurements
cause our knowledge of g to improve when we compare the width of the posterior
distribution to the prior for g in Fig. 11.1.
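Grid-based calibration of this kind can be sketched as below. The falling-object model, drop height, and measurement values here are hypothetical stand-ins (the chapter's own data are not listed here); only the known σ = 0.001 s and the N(9.81, 0.01²) prior come from the text.

```python
import numpy as np

# hypothetical model: time for an object to fall a height h (meters)
h = 11.0
def eta(g):
    return np.sqrt(2.0 * h / g)

sigma = 0.001                                      # known measurement std. dev. (s)
y = np.array([1.497, 1.498, 1.499, 1.497, 1.500])  # hypothetical measurements (s)

g_grid = np.linspace(9.75, 9.87, 2001)
log_prior = -0.5 * ((g_grid - 9.81) / 0.01)**2     # N(9.81, 0.01^2), up to a constant
log_like = np.array([-0.5 * np.sum((y - eta(g))**2) / sigma**2 for g in g_grid])
log_post = log_prior + log_like
post = np.exp(log_post - log_post.max())           # unnormalized posterior
dg = g_grid[1] - g_grid[0]
post /= post.sum() * dg                            # normalize by numerical integration
g_mean = np.sum(g_grid * post) * dg                # posterior mean of g
```

Working with the logarithm of the posterior, as done in the text, avoids underflow when the likelihood is sharply peaked.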
1 In actuality this is a very wide range for g, as it has been known to five significant digits since at least the 1960s (Cook 1965; Tate 1968).
Fig. 11.1 Comparison of the posterior and prior distributions for the calibration example after five
measurements
Fig. 11.2 Model results before and after calibration. The lines are results from the model with g
selected at the 5, 15, . . . , 85, 95 percentile of the prior and posterior, respectively. The points are
the experimental measurements with a two standard deviation uncertainty
The example above was simple for several reasons. Two of those reasons regard the
problem formulation: it had only a single calibration variable, and the experimental
measurement uncertainty was known. The fact that the cost to evaluate the QoI was
“free” is what made the calibration simple to perform. It allowed us to perform the
integration in Bayes' rule using numerical integration and obtain a normalized posterior. In practice most QoI calculations will require running a computer code,
and we cannot afford to run it as many times as it would require to perform the
numerical integration for the denominator in Bayes’ rule, and then each evaluation
of the posterior would require another evaluation of the QoI.
To handle this situation, we can generate samples from the posterior distribution
without needing to evaluate the integral in the denominator. This is accomplished
using Markov Chain Monte Carlo, which we will discuss next.
11.2 Markov Chain Monte Carlo

When dealing with Bayes' rule, we can often write down the numerator in the expression for the posterior, but the denominator, which normalizes the distribution, may not be known or may be difficult to compute. Knowledge of the numerator gives us an expression for the posterior distribution, but only up to a multiplicative constant. The Metropolis-Hastings algorithm for Markov Chain Monte Carlo is a method for generating samples from a distribution when one knows that distribution only up to a constant multiple. We can use this algorithm to sample the posterior from Bayes' rule knowing only the data likelihood and the prior.
After discarding the first m samples of the chain as a burn-in period, we can estimate the expected value of a function g from the remaining samples as

\[ \mathrm{E}[g(x)] \approx \frac{1}{n - m}\sum_{t=m+1}^{n} g(x_t). \tag{11.4} \]
We want to construct a Markov chain with stationary distribution that is the posterior
from Bayes’ rule. The Metropolis-Hastings algorithm (MH) provides a means to
accomplish this task. We only need to be able to evaluate the product of the prior and
the likelihood function; we call this unnormalized target distribution p̂(x). MH is a
rejection sampling technique that uses a distribution that is not the target distribution
to generate proposed samples. The algorithm begins with this proposal distribution
that we write as q(y|xt ): the proposal distribution can depend on the current chain
state. In practice the proposal distribution is often chosen to be a multivariate
normal with mean xt . A sample is proposed by sampling y from q(y|xt ). Then
the acceptance probability of y is computed as
\[ \alpha(x_t, y) = \min\left(1,\; \frac{\hat{p}(y)/q(y\,|\,x_t)}{\hat{p}(x_t)/q(x_t\,|\,y)}\right) = \min\left(1,\; \frac{\hat{p}(y)\,q(x_t\,|\,y)}{\hat{p}(x_t)\,q(y\,|\,x_t)}\right). \tag{11.5} \]
The acceptance probability is defined so that if the target density at the proposed point, relative to the probability of proposing it, exceeds the target density at the current chain state, relative to the probability of proposing a move from y back to x_t, the proposal is always accepted. Otherwise, we accept with a probability given by the ratio in Eq. (11.5). This allows the chain to avoid getting stuck at a local maximum
because it can step to a lower likelihood with some probability.
If the proposal y is accepted, then xt+1 = y, otherwise the chain does not change
and xt+1 = xt . The MH algorithm is written in Algorithm 11.1.
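A minimal sketch of the iteration (not the book's Algorithm 11.1 verbatim; our function name), assuming a symmetric Gaussian proposal so that the q terms in Eq. (11.5) cancel:

```python
import numpy as np

def metropolis_hastings(log_p_hat, x0, n_samples, sigma, rng):
    # log_p_hat: log of the unnormalized target density p-hat
    x, lp = x0, log_p_hat(x0)
    chain = np.empty(n_samples)
    for t in range(n_samples):
        y = x + sigma * rng.standard_normal()   # propose from N(x_t, sigma^2)
        lpy = log_p_hat(y)
        if np.log(rng.uniform()) < lpy - lp:    # accept with prob min(1, p(y)/p(x))
            x, lp = y, lpy
        chain[t] = x                            # on rejection, x_{t+1} = x_t
    return chain
```

With a standard normal target and σ = 1, the chain's long-run mean and standard deviation approach 0 and 1, mirroring the behavior discussed around Figs. 11.3 and 11.4.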
MH generates a Markov chain where the stationary distribution is the properly normalized form of p̂. Also, once MH generates a sample from the target distribution, all the subsequent samples will also be from the target distribution.
This explains the necessity of the burn-in period. In the following subsection,
we demonstrate the properties of the stationary distribution created by MH. This
discussion is optional and not essential to comprehend the remainder of this chapter.
We consider the case where the target distribution is the posterior from Bayes’ rule,
i.e.,
\[ \hat{p}(x) \equiv \pi(x\,|\,D)\int p(D\,|\,x)\,\pi(x)\,dx = p(D\,|\,x)\,\pi(x), \tag{11.6} \]
where D denotes the data that we have, p(D|x) is the data likelihood conditional on
a value of x, and π(x) is the prior on x. The definition of α(xt , y) from Eq. (11.5)
gives
\[ \alpha(x_t, y) = \min\left(1,\; \frac{\pi(y\,|\,D)\,q(x_t\,|\,y)}{\pi(x_t\,|\,D)\,q(y\,|\,x_t)}\right) = \min\left(1,\; \frac{\hat{p}(y)\,q(x_t\,|\,y)}{\hat{p}(x_t)\,q(y\,|\,x_t)}\right). \]
Notice that the posterior distribution appears in the expression for α because the
constant of normalization cancels. Upon manipulating this equation, we can get the
equality:
π(xt |D)q(xt+1 |xt )α(xt , xt+1 ) = π(xt+1 |D)q(xt |xt+1 )α(xt+1 , xt ). (11.7)
We note that q(xt+1 |xt )α(xt , xt+1 ) is the probability density of the chain moving
from xt to xt+1 (the probability density of proposing xt+1 times the acceptance
probability). Therefore,

\[ \pi(x_t\,|\,D)\,P(x_{t+1}\,|\,x_t) = \pi(x_{t+1}\,|\,D)\,P(x_t\,|\,x_{t+1}). \tag{11.8} \]
This equation indicates that the probability of transitioning to state xt+1 from xt is
the same as transitioning to state xt from state xt+1 . Such a Markov chain is said to
be reversible.
If we integrate the detailed balance equation over all values of x_t, we get

\[ \int \pi(x_t\,|\,D)\,P(x_{t+1}\,|\,x_t)\,dx_t = \pi(x_{t+1}\,|\,D)\underbrace{\int P(x_t\,|\,x_{t+1})\,dx_t}_{=\,1} = \pi(x_{t+1}\,|\,D). \tag{11.9} \]
The result in Eq. (11.9) gives the posterior evaluated at state xt+1 given that state
xt is a sample from the posterior. Therefore, from the property of the stationary
distribution of the Markov chain, once one sample xt is from the posterior
distribution, all subsequent samples will be as well. As a result, after a long-enough burn-in, the samples x_t will be samples from π(x|D).
After the burn-in, we can also estimate expectations using only every s-th sample, i.e., with a sampling period of s:

\[ \mathrm{E}[g(x)] \approx \frac{1}{n_s}\sum_{t=m+1}^{n} g(x_t)\,\delta_{0,\; t \bmod s}, \tag{11.10} \]

where n_s is the number of points between m + 1 and n that were used in the estimator. This estimator helps to counteract the autocorrelation of the samples. The
value of s may be chosen so that several acceptances are likely to have occurred
between sample points. The autocorrelation of the chain may also be used to select
s: the larger the autocorrelation, the larger s needs to be. The effect of a larger
autocorrelation is, therefore, to reduce the number of samples used in the estimator.
It has been claimed that s = 5 is a default value used in practice (Denison et al.
2002).
[Figure 11.3 shows the four chains; the annotated acceptance rates are 0.234, 0.892 (σ = 0.1), 0.704 (σ = 1), and 0.125 (σ = 10).]
Fig. 11.3 Markov chains generated by the Metropolis-Hastings algorithm using a standard normal
for the target distribution, p̂(x) = φ(x), and a proposal distribution N (xt , σ 2 ) for different values
of σ . Each chain starts at x0 = 3
[Panel annotations: σ = 0.01: std. dev. = 0.817; σ = 0.1: mean = 0.077, std. dev. = 1.021; σ = 1: mean = −0.014, std. dev. = 1.008; σ = 10: mean = 0, std. dev. = 0.992.]
Fig. 11.4 Histogram for Markov chains of length 105 with a burn-in period of 104 and sampling
period of 10 generated by the Metropolis-Hastings algorithm using a standard normal for the target
distribution, p̂(x) = φ(x), and a proposal distribution N (xt , σ 2 ) for different values of σ
For σ ≥ 0.1, the sample means and standard deviations are close to the expected values of 0 and 1, respectively. However, the σ = 0.01 results do not generate samples that approximate a standard normal distribution; one would have to run the chain much longer to produce a reasonable histogram.
11.3 Calibration Using MCMC

With MCMC we are able to sample from the posterior distribution of the calibration parameters, t. We can use this capability to obtain samples from the
posterior with a finite number of simulation outputs. To pose this problem, we
consider the case where we have N measurements at points xi , that is, we have
{y(x1 ), . . . , y(xN )}. At these points we wish to know the value of the calibrated
simulation, {η(x1, tc), . . . , η(xN, tc)}. We also have M simulations at other points
in input space {η(x∗1 , t∗1 ), . . . , η(x∗M , t∗M )}; here the asterisks denote simulations that
do not necessarily correspond to the experimental measurements.
We combine the measurements and simulations into a single vector,

\[ \mathbf{z} = \left(y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N),\; \eta(\mathbf{x}^*_1, \mathbf{t}^*_1), \ldots, \eta(\mathbf{x}^*_M, \mathbf{t}^*_M)\right)^T. \]

Using this vector we can formulate the calibration problem using a Gaussian process regression model for the simulation:

\[ z_i = \hat{\eta}(\mathbf{x}_i, \mathbf{t}_c) + \epsilon_i, \quad i = 1, \ldots, N, \tag{11.11} \]
\[ z_i = \hat{\eta}(\mathbf{x}^*_{i-N}, \mathbf{t}^*_{i-N}), \quad i = N + 1, \ldots, N + M, \]
where η̂ denotes a Gaussian process model for the simulation. Notice that tc is
unknown at this point.
We assume that the measurement uncertainty is normally distributed and that we
know the covariance for the observations. In particular, this means that for , we can
write down an N × N covariance matrix for the measurements, Σy . We also will
assume a covariance function for the simulation that is a power-exponential kernel,
as shown in Sect. 10.2.1. We write the kernel function to explicitly include both the
experimental controls and the calibration parameters:
\[ k(\mathbf{x}, \mathbf{t}, \mathbf{x}', \mathbf{t}') = \frac{1}{\lambda}\exp\left(-\sum_{k=1}^{p}\beta_k\,|x_k - x'_k|^{\alpha}\right)\exp\left(-\sum_{k=1}^{q}\beta_{k+p}\,|t_k - t'_k|^{\alpha}\right). \tag{11.12} \]
Using this kernel we can define a (N +M)×(N +M) matrix given the measurement
and simulation points and a value for tc as
\[ \Sigma_\eta = \begin{pmatrix}
k(\mathbf{x}_1, \mathbf{t}_c, \mathbf{x}_1, \mathbf{t}_c) & \cdots & k(\mathbf{x}_1, \mathbf{t}_c, \mathbf{x}_N, \mathbf{t}_c) & k(\mathbf{x}_1, \mathbf{t}_c, \mathbf{x}^*_1, \mathbf{t}^*_1) & \cdots & k(\mathbf{x}_1, \mathbf{t}_c, \mathbf{x}^*_M, \mathbf{t}^*_M) \\
\vdots & & \vdots & \vdots & & \vdots \\
k(\mathbf{x}_N, \mathbf{t}_c, \mathbf{x}_1, \mathbf{t}_c) & \cdots & k(\mathbf{x}_N, \mathbf{t}_c, \mathbf{x}_N, \mathbf{t}_c) & k(\mathbf{x}_N, \mathbf{t}_c, \mathbf{x}^*_1, \mathbf{t}^*_1) & \cdots & k(\mathbf{x}_N, \mathbf{t}_c, \mathbf{x}^*_M, \mathbf{t}^*_M) \\
k(\mathbf{x}^*_1, \mathbf{t}^*_1, \mathbf{x}_1, \mathbf{t}_c) & \cdots & k(\mathbf{x}^*_1, \mathbf{t}^*_1, \mathbf{x}_N, \mathbf{t}_c) & k(\mathbf{x}^*_1, \mathbf{t}^*_1, \mathbf{x}^*_1, \mathbf{t}^*_1) & \cdots & k(\mathbf{x}^*_1, \mathbf{t}^*_1, \mathbf{x}^*_M, \mathbf{t}^*_M) \\
\vdots & & \vdots & \vdots & & \vdots \\
k(\mathbf{x}^*_M, \mathbf{t}^*_M, \mathbf{x}_1, \mathbf{t}_c) & \cdots & k(\mathbf{x}^*_M, \mathbf{t}^*_M, \mathbf{x}_N, \mathbf{t}_c) & k(\mathbf{x}^*_M, \mathbf{t}^*_M, \mathbf{x}^*_1, \mathbf{t}^*_1) & \cdots & k(\mathbf{x}^*_M, \mathbf{t}^*_M, \mathbf{x}^*_M, \mathbf{t}^*_M)
\end{pmatrix}. \]
Given the assumptions that the simulation is replaced with a Gaussian process
regression model and that the measurements have a normal uncertainty, and given
values of the hyperparameters in Eq. (11.12) and a covariance for the measurement
error, the vector z has a likelihood that is a multivariate normal PDF of the form

$$ f(z \mid t_c, \beta_k, \lambda, \alpha, \Sigma_y) \propto \left| \Sigma_z \right|^{-1/2} \exp\left( -\frac{1}{2} z^{\mathrm T} \Sigma_z^{-1} z \right), \tag{11.13} $$

where the covariance matrix for z combines the model and measurement covariances:

$$ \Sigma_z = \Sigma_\eta + \begin{pmatrix} \Sigma_y & 0 \\ 0 & 0 \end{pmatrix}. $$
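In practice a log-likelihood of this multivariate normal form is evaluated with a Cholesky factorization rather than an explicit matrix inverse; a sketch, up to an additive constant (the function name is illustrative):

```python
import numpy as np

def log_likelihood(z, Sigma_z):
    """Log of the multivariate normal likelihood in Eq. (11.13), up to an
    additive constant. The Cholesky factor gives both the log-determinant
    and the quadratic form without ever forming Sigma_z^{-1}."""
    L = np.linalg.cholesky(Sigma_z)
    w = np.linalg.solve(L, z)                  # solves L w = z
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|Sigma_z| from diag(L)
    return -0.5 * logdet - 0.5 * w @ w
```

This form is numerically stable and is what one would call inside an MCMC loop, where the likelihood must be evaluated at every proposed set of parameters.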
In this likelihood we have assumed that z is standardized to have mean zero. Using
this likelihood function, we can specify a posterior for the calibration parameters
and the Gaussian process regression hyperparameters as

$$ \pi(t_c, \beta_k, \lambda, \alpha \mid z, \Sigma_y) \propto f(z \mid t_c, \beta_k, \lambda, \alpha, \Sigma_y)\, \pi(t_c)\, \pi(\beta_k)\, \pi(\lambda)\, \pi(\alpha). \tag{11.15} $$
Therefore, if we can sample points from this posterior using MCMC, we perform
calibration and build the emulator at the same time. The benefit of this approach is
that measurement data is combined with simulation data in the construction of the
emulator.
To perform MCMC sampling from the posterior in Eq. (11.15), we need to spec-
ify the prior distributions for the hyperparameters and the calibration parameters.
For the calibration parameters, we can typically choose these based on valid limits
for the models they represent, e.g., the parameter must be in some range or be
positive, etc. It is common to set a flat, uniform prior for these variables if we
have no preference for one value over another before we look at data. For the
hyperparameters we follow the prescriptions of Higdon et al. (2004) and set

$$ \pi(\lambda) \propto \lambda^{a-1} e^{-b\lambda}, \qquad \pi(\beta) \propto \prod_{k=1}^{p+q} \left( 1 - e^{-\beta_k} \right)^{1/2} e^{-\beta_k}. $$
Then in the MH algorithm, the value of the logarithm of the acceptance probability
is found from Eq. (11.5) to be
$$ \log \alpha(x_t, y) = \min\left\{ 0,\ \log \hat p(y) + \log q(x_t \mid y) - \log \hat p(x_t) - \log q(y \mid x_t) \right\}. $$
Using this formulation, we then take the logarithm of a uniform random number
between 0 and 1 and accept the proposal if that logarithm is less than the logarithm
of the acceptance probability.
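A minimal sketch of this log-space accept/reject step (the function names are illustrative, and the proposal-density terms default to zero, as they would for a symmetric proposal):

```python
import numpy as np

def log_accept_prob(log_p_prop, log_p_curr, log_q_back=0.0, log_q_fwd=0.0):
    """log alpha = min{0, log p(y) + log q(x_t|y) - log p(x_t) - log q(y|x_t)}."""
    return min(0.0, log_p_prop + log_q_back - log_p_curr - log_q_fwd)

def mh_step(log_p_prop, log_p_curr, rng):
    """Accept the proposal when log(u) < log alpha for u ~ Uniform(0, 1)."""
    return np.log(rng.uniform()) < log_accept_prob(log_p_prop, log_p_curr)
```

Working with logarithms throughout avoids the underflow that the raw likelihood in Eq. (11.13) would suffer when the covariance matrix is large.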
To make predictions from the calibrated model, we need to use the MCMC
samples generated, after the burn-in, to construct a GP model using the algorithms
of the previous chapter. One approach is to draw a sample from the Markov chain
and use those values of the hyperparameters to construct a GP model using the data
and make predictions. This is repeated, drawing new samples each time, several
times to estimate the mean prediction and confidence intervals for the calibrated
model. It is also useful to take the calibrated parameters and use them directly in
new executions of the simulation code for further predictions.
One important point we have not discussed is how to select the points at which to
run the simulation, that is, the $x_i^*$ and $t_i^*$. A space-filling design, orthogonal array, or
quasi-Monte Carlo technique, such as those discussed in Chap. 7, can be employed.
These methods have the benefit of working in a batch mode: one determines the
inputs at which to execute the simulation and then can run all the simulations in
parallel. However, there are also approaches that adaptively sample the input
space: a batch of simulations is run, a GPR is built, and the points with the highest
predicted uncertainty are then added as training points. This does limit the amount
of batching that can be done; however, it can greatly reduce the number of
simulation runs needed to perform calibration.
We can apply this calibration model to the shock breakout data from Chap. 10. In
that data set, the laser energy and disc thickness are experimental parameters, and
the other three inputs (Be gamma, wall opacity, and flux limiter) are calibration
parameters for approximate models in the calculation. Additionally, there are eight
experimental measurements of the shock breakout time. Therefore, we can use these
experiments to find appropriate posterior estimates for the calibration parameters.
We do this using the priors specified above and $10^4$ burn-in samples, and use a
flat distribution over the range of the calibration parameters as the prior for $t$. In
Fig. 11.5 we show the distribution of the β hyperparameters, and Fig. 11.6 compares
the calibrated parameters with the prior distribution.
The results of the calibration indicate that the calibration parameters should be set
to the lower end of their range to best agree with the experimental data. Furthermore,
the disc thickness is the most important parameter in describing the shock breakout
time, followed by the Be gamma and the flux limiter.
Fig. 11.5 The MCMC samples of the βk for the five inputs to the simulation; the first two plots
are x parameters, and the final three are calibration parameters
Fig. 11.6 The empirical density function from the MCMC samples of the three calibration
parameters for the calibration problem. The flat prior distribution for these parameters is shown
with a dashed line
The calibration procedure described above works when the computational model
has the ability to reproduce the experimental results. Taking a cynical view of the
calibration exercise, we could say that calibration only works if there are enough
knobs in the code to turn to get the correct answer. In many cases we might know that
the simulation is not an adequate representation of reality, and we want to develop
a function for the difference between the code and the experimental results. That is,
we want to know how to correct the code to match an experiment.
We will use the predictive model originally proposed by Kennedy and O’Hagan
(2000) and commonly called the Kennedy-O’Hagan model in a flourish of Hibernian
appellation. The idea for the model is that we want to include a term that only
depends on experimental parameters (the x from before), to allow for corrections to
the computer model. To this end we write an experimental observation, $y(x_i)$, as

$$ y(x_i) = \eta(x_i, t_c) + \delta(x_i) + \epsilon_i, \quad i = 1, \ldots, N. $$
The function δ(xi ) is known as the discrepancy function. The question now is how
to modify the calibration problem to estimate a GP model for the discrepancy and
the simulation at the same time.
As before we combine the N measurements and M simulation results into a
single vector z of size N + M. With this formulation the only change between fitting
the Kennedy-O’Hagan model and the calibration problem is the specification of the
covariance matrix Σz . For the predictive model, we include an N × N matrix Σδ :
$$ \Sigma_z = \Sigma_\eta + \begin{pmatrix} \Sigma_y + \Sigma_\delta & 0 \\ 0 & 0 \end{pmatrix}. \tag{11.19} $$
The elements (Σδ )ij are computed by evaluating a kernel function, kδ (xi , xj ), at the
N inputs corresponding to the measurements. We can use the same form for this
kernel function as we used previously:
$$ k_\delta(x, x') = \frac{1}{\lambda_\delta} \exp\left( -\sum_{k=1}^{p} \beta_k^{(\delta)} \left| x_k - x_k' \right|^{\alpha_\delta} \right). \tag{11.20} $$

The hyperparameters of the discrepancy kernel are given priors of the same form as before:

$$ \pi(\lambda_\delta) \propto \lambda_\delta^{a-1} e^{-b\lambda_\delta}, \tag{11.21} $$

$$ \pi(\beta^{(\delta)}) \propto \prod_{k=1}^{p} \left( 1 - e^{-\beta_k^{(\delta)}} \right)^{6/10} e^{-\beta_k^{(\delta)}}; \tag{11.22} $$
to give a flat prior for λδ , we set a = 2 and b = 0.001. The prior for β (δ) is chosen
to encourage the discrepancy function to be flatter than the model of the simulation
as well; this is manifest in the power of 6/10 compared with 1/2 in the prior for β
in the simulation covariance. The logic behind this choice is that we would prefer
to match the experiment with the simulation (if possible) and make the discrepancy
function small.
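These log-priors are simple to evaluate; a sketch using the values a = 2, b = 0.001 from the text (the function names and interface are my own):

```python
import numpy as np

def log_prior_lambda(lam, a=2.0, b=0.001):
    """Eq. (11.21)-style prior, pi(lambda) ∝ lambda^(a-1) exp(-b lambda),
    evaluated in log form up to an additive constant."""
    return (a - 1.0) * np.log(lam) - b * lam

def log_prior_beta(beta, power=0.5):
    """Eq. (11.22)-style prior, pi(beta) ∝ prod_k (1 - e^{-beta_k})^power e^{-beta_k}.
    The text uses power = 6/10 for the discrepancy kernel and 1/2 for the
    simulation kernel, which biases the discrepancy toward flatter functions."""
    beta = np.asarray(beta, dtype=float)
    return float(np.sum(power * np.log1p(-np.exp(-beta)) - beta))
```

Using `log1p` keeps the evaluation accurate when a $\beta_k$ is small and $1 - e^{-\beta_k}$ would lose precision.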
The question naturally arises regarding using the discrepancy to make a predic-
tion. When I want to apply my code to a new experiment, how do I best apply
the predictive model? When the new experiment is an interpolation, that is, the inputs
are inside the convex hull of previous data used to build the predictive model, the
discrepancy function should be used to correct the simulation’s prediction. However,
extrapolating outside the previous experimental data requires care in applying the
discrepancy function. For a point far outside the training data, the GP for the
discrepancy function will return to the mean of the function, in this case zero. This
should not be interpreted to imply that the simulation should be trusted to be correct
in its prediction. In such an extrapolation, we can investigate how the discrepancy
function varies for the known measurements: if the discrepancy function is small
in magnitude, we can use this fact to give credence to the simulation predictions
for extrapolation. Of course it will require expert judgment and considerations of
epistemic uncertainty to be completely transparent with the uncertainties in the
extrapolation, as we will discuss in a later chapter.
To make a prediction using the Kennedy-O’Hagan model, we have to modify the
definition of k∗ from Eq. (10.15) to include the kernel function for the discrepancy
function. Each element of the vector is
$$ (k_*)_i = \begin{cases} k(x_i, t, x^*, t^*) + k_\delta(x_i, x^*), & i = 1, \ldots, N \\[4pt] k(x_i, t, x^*, t^*), & i = N+1, \ldots, N+M, \end{cases} \tag{11.23} $$
where $k(x_i, t, x^*, t^*)$ is the covariance kernel function for the simulations. This
definition of $k_*$ encodes that the covariance between the prediction point and the
measurement training points takes a different form than the covariance between the
prediction point and the simulation training points. Equation (11.23)
is used in Eq. (10.19) to produce the predictions for the predictive model. The
evaluation of the expected simulation result from the predictive model can be
accomplished by removing the kδ (xi , x∗ ) term from Eq. (11.23) to get a prediction
without a discrepancy. This simulation prediction can then be used to evaluate the
discrepancy function via subtraction from the full prediction.
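A sketch of assembling $k_*$ for a new point, with the two kernels passed in as callables (the interface is an illustrative choice, not the text's):

```python
import numpy as np

def kstar_vector(X, Xstar, Tstar, tc, xnew, tnew, k_sim, k_delta):
    """Covariance vector k* of Eq. (11.23) for a prediction point
    (xnew, tnew): the N measurement rows get the discrepancy kernel
    added, while the M simulation rows use only the simulation kernel."""
    top = [k_sim(x, tc, xnew, tnew) + k_delta(x, xnew) for x in X]
    bot = [k_sim(xs, ts, xnew, tnew) for xs, ts in zip(Xstar, Tstar)]
    return np.array(top + bot)
```

Dropping the `k_delta` contribution from the first N entries gives the discrepancy-free simulation prediction described above.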
Fig. 11.7 The MCMC samples of the calibration parameter t and the hyperparameters. The burn-in period used was $10^4$
the true discrepancy between −2 and 2 with large error outside this range. We do
notice that the uncertainty in the discrepancy function is noticeable near x = ±1.
This uncertainty in the discrepancy is mirrored in the simulation estimates, though
due to the scale is less noticeable. In this case we have compensating errors
Fig. 11.8 Prediction from the predictive model versus actual at 20 new measurements generated
from Eq. (11.24). Each point represents the mean of the estimate generated using 100 different
samples from the MCMC chain, and the error bars give the range of those estimates
Fig. 11.9 Prediction from the predictive model as a function of x and the underlying true function
from Eq. (11.24). The predicted curve represents the mean of the estimate generated using 10
different samples from the Markov chain, and the dashed lines give the range of those estimates
Fig. 11.10 The estimated simulator response η(x, t) from the predictive model compared with the
function sin 1.2x (left) and the discrepancy function estimated by the predictive model compared
with the true discrepancy (right). The dashed lines represent the range of estimates produced from
100 samples from the Markov chain. (a) Simulation. (b) Discrepancy
in the simulation and discrepancy estimates: if the simulation is too high, the
discrepancy can be decreased to compensate for the error. These errors cancel when
the simulation and discrepancy are added to get an overall prediction.
Though this is a simple example, it does point out some important features
of predictive models in terms of extrapolation and how the discrepancy function,
and GP for the simulation can have compensating effects on the prediction. These
phenomena occur beyond just toy problems. We will return to this example later
when we want to have a multi-fidelity model.
11.5 Hierarchical Models

In this model the subscript c denotes calibrated quantities, and the hats over the η
functions represent a GP regression approximation to the simulation.
The form of the multi-fidelity predictive model indicates that we calibrate the
low-fidelity model and compute a discrepancy to match the high-fidelity model, and
then the high-fidelity model is calibrated along with a discrepancy function to match
the measurements. There is the complication that the calibration parameters for the
two models are generally different. Therefore, in order for the low-fidelity model to
approximate the high-fidelity simulation, we must include these parameters in the
discrepancy function.
We construct a single vector to hold the measurements and two types of
simulation data: z will be a length N + MH + ML vector containing the left-hand
side of Eq. (11.25). We then write the covariance matrix for the data as
$$ \Sigma_z = \Sigma_{\eta_L} + \begin{pmatrix} \Sigma_y + \Sigma_\delta & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} \Sigma_L & 0 \\ 0 & 0 \end{pmatrix}; \tag{11.26} $$
the covariance kernels for the three Gaussian process components are

$$ k(x, t^L, t^s, x', t'^L, t'^s) = \frac{1}{\lambda} \exp\left( -\sum_{k=1}^{p} \beta_k |x_k - x_k'|^{\alpha} - \sum_{k=1}^{q_L} \beta_{k+p} |t_k^L - t_k'^L|^{\alpha} - \sum_{k=1}^{r} \beta_{k+p+q_L} |t_k^s - t_k'^s|^{\alpha} \right), \tag{11.27a} $$

$$ k_L(x, t^H, t^s, x', t'^H, t'^s) = \frac{1}{\lambda_L} \exp\left( -\sum_{k=1}^{p} \beta_k^{(L)} |x_k - x_k'|^{\alpha_L} - \sum_{k=1}^{q_H} \beta_{k+p}^{(L)} |t_k^H - t_k'^H|^{\alpha_L} - \sum_{k=1}^{r} \beta_{k+p+q_H}^{(L)} |t_k^s - t_k'^s|^{\alpha_L} \right), \tag{11.27b} $$

$$ k_\delta(x, x') = \frac{1}{\lambda_\delta} \exp\left( -\sum_{k=1}^{p} \beta_k^{(\delta)} |x_k - x_k'|^{\alpha_\delta} \right). \tag{11.27c} $$
$$ \pi(t_c^L, t_c^H, t_c^s, \beta, \lambda, \alpha \mid z, \Sigma_y) \propto f(z \mid t_c^L, t_c^H, t_c^s, \beta, \lambda, \alpha, \Sigma_y)\, \pi(t_c^L, t_c^H, t_c^s)\, \pi(\beta)\, \pi(\lambda)\, \pi(\alpha), \tag{11.28} $$
where we have abused notation to write all the β, λ, and α hyperparameters using a
single variable each.
It would be possible to extend the hierarchical model to include further levels
in a straightforward, if notationally messy, manner. Additionally, one can develop a
predictive model that admits several models that do not necessarily have a known
hierarchy. In such a model, we may not know which computational model is better,
but we would like to use simulation data from each model to make predictions. Such
predictive models were studied by Goh (2014).
$$ \eta_H(x_i, t_i^H, t_i^s) = \eta_L(x_i, t_c^L, t_i^s) + \delta_L(x_i, t_i^H, t_i^s), \quad i = N+1, \ldots, N+M_H. \tag{11.30} $$
We then define the vector z to have just the measurements and the high-fidelity
simulations making it a N + MH vector. The covariance matrix for the model
becomes
$$ \Sigma_z = \Sigma_L + \begin{pmatrix} \Sigma_y + \Sigma_\delta & 0 \\ 0 & 0 \end{pmatrix}. \tag{11.31} $$
The low-fidelity model is a Taylor series of the high-fidelity model with two
additional calibration parameters:
$$ \eta_L(x, t_1^L, t_2^L, t^s) = t_1^L + t^s t_2^L x - \frac{1}{2} (t^s)^2 t_1^L x^2 - \frac{1}{6} (t^s)^3 t_2^L x^3. $$
Note that the Taylor series is correct if t1L = sin t H and t2L = cos t H ; these are
the values we expect to recover in the calibration procedure. The measurements are
generated from
Fig. 11.11 The MCMC samples of the calibration parameter t and the hyperparameters for the hierarchical model. The burn-in period used was $10^4$
Fig. 11.12 Prediction from the hierarchical predictive model versus actual at 20 new measure-
ments generated from Eq. (11.33). Each point represents the mean of the estimate generated using
100 different samples from the MCMC chain, and the error bars give the range of those estimates
Fig. 11.13 Prediction from the hierarchical predictive model as a function of x and the underlying
true function from Eq. (11.33). The predicted curve represents the mean of the estimate generated
using 10 different samples from the Markov chain, and the dashed lines give the range of those
estimates
Fig. 11.14 The estimated simulator response for ηH (x, t H , t s ) from the hierarchical predictive
model compared with the function sin(1.2x + 0.1) (left), and the discrepancy function, δ(x)
estimated by the predictive model compared with the true discrepancy (right). The dashed
lines represent the range of estimates produced from 100 samples from the Markov chain. (a)
Simulation. (b) Discrepancy
The predictive model can be used to estimate the high-fidelity model output at
the calibrated inputs, as shown in Fig. 11.14a. Here we can see that the estimate is
accurate inside the range of the training data, though the predictions are slightly high
for x ∈ [−1, 0] and low between 1 and 2. For the discrepancy function, the result
does not match the true linear discrepancy except near x = 0. Additionally, the
uncertainty in the discrepancy estimate is much larger than in the previous, single-
level predictive model.
Despite the inaccuracy in producing the discrepancy function for this data, the
hierarchical predictive model excels at its designed goal: to make a prediction for
the measurement. This is noteworthy because we asked the model to accomplish
four tasks in a single MCMC procedure: estimate a discrepancy function, calibrate
the parameters of both the low- and high-fidelity models, and estimate a GP emulator
for the high-fidelity model.
The models that we demonstrated in this chapter all used the same kernel for the
covariance function. In the literature there are other covariances commonly used.
One to note takes the form
$$ k(x, x') = \frac{1}{\lambda} \prod_{k=1}^{p} \rho_k^{4 (x_k - x_k')^2}, \qquad \rho_k > 0. $$
In this function the smaller the value of ρk , the more important the parameter. As a
prior for ρk , a flat prior with mean near 1 is usually used. This covariance function
is widely used; we chose a single function in our examples for ease of exposition.
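A sketch of this alternative covariance in Python (names are illustrative):

```python
import numpy as np

def rho_kernel(x, xp, rho, lam=1.0):
    """k(x, x') = (1/lam) * prod_k rho_k^{4 (x_k - x'_k)^2}. A smaller
    rho_k makes the covariance decay faster in dimension k, which marks
    that input as more important."""
    x, xp, rho = (np.atleast_1d(np.asarray(v, float)) for v in (x, xp, rho))
    return float(np.prod(rho ** (4.0 * (x - xp) ** 2)) / lam)
```

Note that this form is equivalent to the power-exponential kernel with $\alpha = 2$ under the reparameterization $\rho_k = e^{-\beta_k/4}$, which is why the two appear interchangeably in the literature.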
In addition to the references mentioned above, the use of predictive models
can be found in other papers. Holloway et al. (2011) and Gramacy et al. (2015)
applied a Kennedy-O'Hagan model to the modeling of a radiating shock experiment,
Karagiannis and Lin (2017) combined several types of simulation of unknown
fidelity to make predictions, Zheng and McClarren (2016) used multiple physical
models to calibrate neutron scattering data, and Bayarri et al. (2007) combined a
predictive model with wavelet decompositions of functional data. This list is not
exhaustive, but does point to papers that can be readily understood with the tools
discussed in this chapter.
11.7 Exercises
$$ y(x) = \eta_H(x, 0.3, 0.2) + \frac{10 x_1^2 + 4 x_2^2}{50 x_1 x_2 + 10} + \epsilon, $$
Using the simple model for the particle intensity $\phi(x) = E_2(\sigma x)$, where

$$ E_n(x) = \int_1^{\infty} \frac{e^{-xt}}{t^n}\, dt. $$
With $\sigma \sim \mathcal{G}(8, 0.1)$ and the experimental data just given, derive a posterior
distribution for $\sigma$ (i.e., calibrate $\sigma$). You may assume that the measurement has
an error distributed as $\mathcal{N}(0, \sigma = 0.001)$. Do your answers change if you add a
discrepancy function?
Chapter 12
Epistemic Uncertainties: Dealing with a Lack of Knowledge
I think I was aware that something had happened, but I’m not
fully aware.
—Brady Hoke
Fig. 12.1 The different results that can be obtained when an interval uncertainty is interpreted as
a uniform distribution (a). In reality the distribution of x is highly peaked in the right part of the
interval leading to a QoI distribution peaked in the tail of the distribution inferred from a uniform
distribution. (b) Actual distribution is not uniform, but does lie within the interval
Fig. 12.2 Examples of comparison of observations and the CDF of a QoI estimated from
simulations (black line). The number of observations affects the degree to which we can conclude
the two are in agreement. The shaded area in each figure is the validation metric d. (a) One
measurement. (b) Three measurements. (c) Ten measurements
The validation metric is illustrated in Fig. 12.2 where the shaded area represents
the value of d. As we can see in the left panel, when there is a single measurement
to compare with the computational results, the value of d is not small despite the
fact that the measurement “agrees” with the prediction. If the single measurement
were shifted to the left or right, it would be possible to increase d. When further
measurements are added, they can reduce the magnitude of d. This is a feature of
the validation metric: it naturally indicates the greater confidence in the simulation
model when there are more measurements.
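A minimal numerical sketch of this area metric, forming both CDFs empirically on a common grid (the grid resolution and function interface are my own choices, not the text's):

```python
import numpy as np

def validation_metric(sim_samples, observations, n_grid=4001):
    """Area validation metric: the integral over Q of |F_sim(Q) - F_obs(Q)|,
    i.e. the shaded area between the two CDFs in Fig. 12.2."""
    sim = np.sort(np.asarray(sim_samples, float))
    obs = np.sort(np.atleast_1d(np.asarray(observations, float)))
    lo = min(sim[0], obs[0]) - 1e-9
    hi = max(sim[-1], obs[-1]) + 1e-9
    grid = np.linspace(lo, hi, n_grid)
    # empirical CDFs evaluated on the grid
    F_sim = np.searchsorted(sim, grid, side="right") / sim.size
    F_obs = np.searchsorted(obs, grid, side="right") / obs.size
    gap = np.abs(F_sim - F_obs)
    # trapezoid rule for the area between the curves
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))
```

With a single observation the metric reduces to the area between the simulated CDF and a unit step at the observed value, which is why d is not small even when the one measurement "agrees" with the prediction.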
The realization that d makes it possible to estimate the impact of unknown
uncertainties suggests how we might use it to make a prediction. Given that with
the experimental data the empirical CDF had a given amount of area between it
and the simulated CDF, when we make another prediction, we can assume that the
computed CDF could fall within a range of possible CDFs that have an area between
them and the nominal simulated CDF equal to d. That is, we have extrapolated d to
the prediction. This is our first example of a probability box, a topic of discussion
below.
We have not included the uncertainty in the experimental measurement of the
QoI in the construction of the validation metric: our definition implicitly assumes
that the observation is exact. If the uncertainty in the observation is small, then this is
a reasonable approximation.
The previous section discussed using CDFs derived from simulations based on
propagating aleatory uncertainty. Now we consider the case where there are
epistemic and aleatory uncertainties in the QoI. In this case for each possible value
of the epistemic uncertainties, there is a CDF at that value. This arises from the fact
that the epistemically uncertain inputs have a value that is unknown. At that value
there is a CDF of the QoI based on the aleatory uncertainties. Nevertheless, we do
not know which CDF of all possible CDF functions that is.
To estimate the range of possible outcomes due to epistemic uncertainty, we can
use a technique known as second-order sampling. We assume that it is possible to
produce a CDF of the QoI when the epistemically uncertain inputs are fixed; in
practice this is a strong assumption because this involves uncertainty propagation of
all the aleatory uncertainties. We sample from the epistemic uncertainties as though
they were uniform distributions between their minimum and maximum values. For
each sample, we have a CDF of the QoI. If we plot these, we get what is known as
a horsetail plot. The range of CDFs in the horsetail plot is an estimate of the bounds
of the actual CDF. It is an estimate because we have a finite number of values of the
epistemic uncertainties.
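A small sketch of second-order sampling for a normal QoI whose mean and standard deviation are epistemic, in the spirit of Fig. 12.3; the specific intervals below are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def horsetail_cdfs(n_epistemic=50, n_aleatory=2000):
    """Second-order sampling: each draw of the epistemic parameters
    (here mu and sigma, sampled uniformly on assumed intervals) yields
    one aleatory CDF, i.e. one 'hair' of the horsetail plot."""
    grid = np.linspace(-5.0, 5.0, 201)
    cdfs = []
    for _ in range(n_epistemic):
        mu = rng.uniform(-1.0, 1.0)       # epistemic interval for the mean
        sigma = rng.uniform(0.5, 1.5)     # epistemic interval for the std. dev.
        samples = rng.normal(mu, sigma, n_aleatory)
        cdfs.append(np.searchsorted(np.sort(samples), grid, side="right")
                    / n_aleatory)
    cdfs = np.array(cdfs)
    # the pointwise envelope of the horsetail gives the p-box bounds
    return grid, cdfs.min(axis=0), cdfs.max(axis=0)
```

The returned lower and upper envelopes are exactly the $\underline P(x)$ and $\bar P(x)$ bounds discussed next.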
A horsetail plot for a normal distribution where the mean and standard deviation
of the distribution are epistemic uncertainties is shown in Fig. 12.3.
We denote the upper bound of the CDFs in the horsetail plot as $\bar P(x)$ because the
true CDF is below this function; the lower bound of the CDFs in the horsetail plot
is $\underline P(x)$ because the true CDF is above this function. These functions are shown
in Fig. 12.3. The functions $\bar P(x)$ and $\underline P(x)$ define an area between them called a
probability box, or p-box for short. Now we are in the position where we have a
range of possible CDFs that could represent the system. The question that arises is
how we can quantify the agreement between the model and experiment in this case.
One possible solution is to generalize the validation metric to include the
discrepancy between the experimental values and the p-box. To this end we can
define the validation metric as

$$ d(F_{\mathrm{sim}}, F_{\mathrm{obs}}) = \int_{-\infty}^{\infty} D\left( \bar P(Q), \underline P(Q), F_{\mathrm{obs}}(Q) \right) dQ, \tag{12.2} $$

where $D$ is the distance from the observed CDF to the nearest edge of the p-box,

$$ D(\bar P, \underline P, F) = \begin{cases} F - \bar P, & F > \bar P \\ 0, & \underline P \le F \le \bar P \\ \underline P - F, & F < \underline P. \end{cases} $$
Fig. 12.4 Three examples of the validation metric d when the CDF of the simulation is a p-box
and there is a single experimental measurement. The shaded area is equivalent to the value of d
for each case. In each figure the p-box is the same, but the observed value differs. (a) d ≈ 0. (b)
d ≈ 0.2. (c) d ≈ 2
The value of d when there is a p-box is illustrated in Fig. 12.4. In the left
panel, the value of d is approximately 0. This does not mean that the simulation is
perfect. Rather, we can conclude that there is no evidence of disagreement between
the simulation and measurement. That is, there are values of the epistemically
uncertain parameters that would agree with the measurement. In the second and
third panels of Fig. 12.4, that value of d is larger and represents the area between
the nearest edge of the p-box and the measurement. As before, the validation metric
can improve as more experimental measurements are added that “agree” with the
data, that is, if d is not already approximately zero.
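A sketch of computing d against a p-box, using the nearest-edge distance described above (the piecewise form of D is my reading of the text, and the interface is illustrative):

```python
import numpy as np

def pbox_validation_metric(grid, P_low, P_up, observations):
    """d of Eq. (12.2): the area between the empirical CDF of the
    observations and the nearest edge of the p-box. The integrand is
    zero wherever the observed CDF lies inside the box."""
    obs = np.sort(np.atleast_1d(np.asarray(observations, float)))
    F = np.searchsorted(obs, grid, side="right") / obs.size
    # only one of the two terms can be positive at any Q
    D = np.maximum(F - P_up, 0.0) + np.maximum(P_low - F, 0.0)
    return float(np.sum(0.5 * (D[1:] + D[:-1]) * np.diff(grid)))
```

For a measurement well inside the p-box the metric is near zero, and for one far outside it grows with the distance, matching the three panels of Fig. 12.4.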
When p-boxes are involved, the value of d does not contain any information
about the precision of the computational model when epistemic uncertainty is
included. As shown in Fig. 12.5, a large p-box, which indicates a large degree
of epistemic uncertainty, could easily have a small value of d because its large
range of potential results can bound the measurement. At the same time, a small
p-box, indicative of a model that has a small amount of epistemic uncertainty, could
have the same value of d. The value of d cannot pick out the method or process that
has a smaller degree of epistemic uncertainty; that quantification requires a different
analysis, perhaps computing the area of the p-box.
Fig. 12.5 Demonstration that d = 0 does not mean that the amount of uncertainty in the prediction
is small: in the left plot, the simulation results can agree with the measurement, but epistemic
uncertainty is quite large. (a) d ≈ 0 with low precision. (b) d ≈ 0 with higher precision
The validation metric d measures the agreement between the simulation and
observations. To apply this metric to a prediction, that is, to a new scenario where we
do not have observations, we want to quantify how much we should adjust the CDF
or p-box obtained from a UQ analysis. To aid in this, we consider that the definition
of d yields a quantity that is equal to the average difference between the inverse of
the simulation CDF and the inverse of the observation CDF (or p-boxes). Therefore,
we could add d to both sides of the CDF or p-box obtained from simulations as an
estimate for the possible range of outputs of the prediction. This procedure assumes
that we can extrapolate the result and that d will not be larger in the prediction.
Therefore, given a p-box for the prediction, the adjusted p-box for the prediction
would be defined by

$$ \bar P_{\mathrm{pred}}(Q) = \bar P(Q + d), \qquad \underline P_{\mathrm{pred}}(Q) = \underline P(Q - d). $$
In the case of the CDF, we would apply the shift to the CDF to create a p-box. If the
CDF is given by $F(Q)$, the resulting p-box is

$$ \bar P_{\mathrm{pred}}(Q) = F(Q + d), \qquad \underline P_{\mathrm{pred}}(Q) = F(Q - d). $$
Other types of extrapolation are also possible. For instance, one could compute
the portion of d that is to the left or right of the CDF/p-box and then create a shift
that is not symmetrically applied. For a p-box, these shifts, $d_{\mathrm{left}}$ and $d_{\mathrm{right}}$, are
found by splitting the integral in Eq. (12.2) into the contribution where the observed
CDF lies above the p-box and the contribution where it lies below.
Fig. 12.6 Cantilevered beam with a load of f placed at the edge. The shape of the beam is an
aleatory uncertainty, and the elastic modulus of the beam is an epistemic uncertainty
With these results we can then define $\bar P_{\mathrm{pred}}(Q)$ and $\underline P_{\mathrm{pred}}(Q)$ using $d_{\mathrm{left}}$ and
$d_{\mathrm{right}}$, respectively. There are obvious benefits in calculating the portion of d to the
left/right of the p-box. The downside is that the structure of the discrepancy may not
hold up under extrapolation: a model that consistently predicts values too low in one
scenario may not give low predictions in another.
As an example of adding a p-box to a prediction, we consider the deflection of
an end-loaded cantilevered beam, as shown in Fig. 12.6. The deflection of the beam,
y, is given by the formula
$$ y = \frac{4 f L^3}{E w h^3}, \tag{12.8} $$
where f is a force in Newtons, E is the elastic modulus, and the dimensions L, w,
and h are shown in the figure. Due to manufacturing tolerances, the dimensions of
the beam are normally distributed with
$$ \mu_L = 1\ \text{m}, \quad \sigma_L = 0.05\ \text{m}, \qquad \mu_w = 0.01\ \text{m}, \quad \sigma_w = 0.0005\ \text{m}, $$

and
For this particular material, the elastic modulus is only known to be in an interval:
E ∈ [69, 100] GPa.
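A sketch of second-order sampling for this beam problem, sweeping the epistemic interval on E and propagating the aleatory dimension uncertainties at each value (the sweep and sample counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def beam_deflection_pbox(f=75.0, n_E=30, n_samples=5000):
    """P-box for the deflection y = 4 f L^3 / (E w h^3) of Eq. (12.8):
    for each value of the epistemic modulus E, propagate the normal
    aleatory dimensions and take the envelope of the resulting CDFs."""
    grid = np.linspace(0.0, 0.2, 401)          # deflection values in meters
    lower = np.ones_like(grid)
    upper = np.zeros_like(grid)
    for E in np.linspace(69e9, 100e9, n_E):    # epistemic sweep over E in Pa
        L = rng.normal(1.0, 0.05, n_samples)
        w = rng.normal(0.01, 0.0005, n_samples)
        h = rng.normal(0.02, 0.0005, n_samples)
        y = 4.0 * f * L**3 / (E * w * h**3)
        F = np.searchsorted(np.sort(y), grid, side="right") / n_samples
        lower = np.minimum(lower, F)
        upper = np.maximum(upper, F)
    return grid, lower, upper
```

The nominal deflection at f = 75 N ranges from roughly 0.04 m (E = 100 GPa) to 0.05 m (E = 69 GPa), so the p-box is centered in that region; rerunning with f = 100 N gives the prediction case discussed in Fig. 12.7.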
We begin by performing ten measurements of the deflection at f = 75 N
and compare the resulting CDF to the p-box computed from the model given by
Eq. (12.8) in Fig. 12.7. Using the measurements at 75 N, we compute a value of
d ≈ 0.00572 m and dleft ≈ 0.556 m and dright ≈ 0.0001. Using these values of d,
we then predict the deflection of the beam at 100 N by computing the p-box from
Fig. 12.7 Example of using d to adjust p-box for prediction for the cantilevered beam example.
The model was tested at f = 75 N, and the resulting d is extrapolated to predict performance at
f = 100 N by adding d to the p-box and adding dleft and dright to the p-box (the dotted lines). (a)
f = 75 N. (b) f = 100 N adding d. (c) f = 100 N adding dleft , dright
To this point we have only dealt with epistemic uncertainties characterized by a simple
interval. There are cases where we have more information than just an interval,
but not enough information to have a true distribution. The ideas we use are based
on Dempster-Shafer evidence theory in a simplified form. The approach detailed
here shows how to include information from experts who disagree and have only an
indication of which values of the quantity are more likely.
Consider a parameter θ ∈ [a, b] that represents an epistemic uncertainty in
a model. An expert is solicited to give a basic probability assignment (BPA) to
different ranges of the interval. The BPAs each must be in the range [0, 1] and must
Fig. 12.8 Two illustrative BPA structures for θ ∈ [a, b]: the left example is a simple interval
considered to be more probable in the middle with a left skew, and the right example has
overlapping regions and gaps. Note that in both cases the BPAs sum to one
Fig. 12.9 Example of combining BPA structures from two experts: the BPAs are divided by the
number of experts to get a single BPA structure
sum to 1. For example, if θ ∈ [0, 1] and the expert believes that values in the middle
of the interval are more likely and the value of θ has a small likelihood of being
close to 1, the BPA might be
$$ \mathrm{BPA}(\theta) = \begin{cases} 0.09, & \theta \in [0, 0.35) \\ 0.9, & \theta \in [0.35, 0.8] \\ 0.01, & \theta \in (0.8, 1) \end{cases}. $$
This BPA structure is shown in the left part of Fig. 12.8. The BPA may have overlaps
or gaps depending on the situation; such a BPA structure is also shown in Fig. 12.8.
The BPA is then used in second-order sampling by selecting a BPA interval
based on its value (higher BPAs being more likely to be selected) and then picking
a uniform random number in that interval. This procedure will construct a p-box as
before, but the construction is informed by the BPAs assigned by the expert. The
BPAs only inform the sampling of θ and do not give any weighting to the resulting
CDFs in the construction of the p-box. Therefore, in the limit of an infinite number
of samples of θ in second-order sampling, the result will be the same as treating θ
as a simple interval, but in practice when samples are limited, the resulting p-box
will be informed by the expert opinion.
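A sketch of this BPA-weighted sampling, using the illustrative BPA structure from the text (the function interface is my own):

```python
import numpy as np

rng = np.random.default_rng(7)

def sample_from_bpa(intervals, bpas, size=1):
    """Sample an epistemically uncertain parameter from a BPA structure:
    pick an interval with probability equal to its BPA, then draw a
    uniform random number inside that interval."""
    bpas = np.asarray(bpas, float)
    assert np.isclose(bpas.sum(), 1.0), "BPAs must sum to one"
    idx = rng.choice(len(intervals), p=bpas, size=size)
    lo = np.array([intervals[i][0] for i in idx])
    hi = np.array([intervals[i][1] for i in idx])
    return rng.uniform(lo, hi)

# the BPA from the text: the middle of [0, 1] is most likely,
# with a small weight near 1
theta = sample_from_bpa([(0.0, 0.35), (0.35, 0.8), (0.8, 1.0)],
                        [0.09, 0.9, 0.01], size=1000)
```

Each draw of theta would then feed one aleatory propagation in the second-order sampling loop, so most of the hairs in the resulting horsetail come from the expert's favored subinterval.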
If there are multiple experts that assign BPAs to θ , their knowledge can be
fused by taking the BPAs from each expert and dividing by the number of experts.
This will result in a set of BPAs that sum to 1 and can then be used in second-
order sampling as before. The resulting BPA structure will have high values where
the experts agree while still including information where disagreement exists. An
example of this combination is shown in Fig. 12.9. Here two experts have different
12.6 Kolmogorov-Smirnov Confidence Bounds
BPA structures that are combined by taking the union of the BPA structures and
dividing by 2 (the number of experts).
The resulting BPA after combining experts would be used in second-order
sampling as though the BPA came from a single expert. The properties of the
resulting p-box would be the same as for a single expert.
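The pooling step can be sketched in a few lines; the interval endpoints and weights below are illustrative, not taken from the text:

```python
def combine_bpas(expert_bpas):
    """Fuse BPA structures from several experts by taking the union of all
    intervals and dividing each weight by the number of experts, so the
    combined weights again sum to 1."""
    n_experts = len(expert_bpas)
    combined = []
    for bpa in expert_bpas:
        for interval, weight in bpa:
            combined.append((interval, weight / n_experts))
    return combined

expert_1 = [((0.0, 0.5), 0.6), ((0.5, 1.0), 0.4)]
expert_2 = [((0.0, 0.4), 0.3), ((0.4, 1.0), 0.7)]
pooled = combine_bpas([expert_1, expert_2])
total_weight = sum(w for _, w in pooled)  # sums to 1 again
```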
Fig. 12.10 Illustration of δN : the maximum vertical distance between the true CDF F (x), dashed
line, and empirical CDF, FN (x), solid line, with N samples
The Kolmogorov-Smirnov (KS) statistic $\delta_N = \max_x |F_N(x) - F(x)|$ is the maximum vertical distance between the empirical CDF built from $N$ samples, $F_N(x)$, and the true CDF, $F(x)$, as illustrated in Fig. 12.10. We know that as $x \to \pm\infty$ both the true and empirical CDFs go to 0 and 1. This means that the value of $\delta_N$ is attained at some finite value of $x$. It also means that the difference between the empirical CDF and the true CDF has the character of a random walk with fixed endpoints: the difference is 0 at the endpoints ($\pm\infty$) and takes random values in between. It can be shown using the theory of random walks that if $F_N$ converges to $F$ as $N \to \infty$, then $\sqrt{N}\,\delta_N$ converges to a Kolmogorov distribution. The Kolmogorov distribution has a CDF given by
$$F_K(x) = \frac{\sqrt{2\pi}}{x} \sum_{i=1}^{\infty} \exp\left(-\frac{(2i-1)^2 \pi^2}{8x^2}\right). \tag{12.10}$$
Therefore, if $\sqrt{N}\,\delta_N$ is described by a Kolmogorov distribution, we want to know when $F_K(\sqrt{N}\,\delta_N) = 1 - \alpha$ for a given confidence level $\alpha$ in order to estimate $\delta_N$. In other words, if we wanted to know the value of $\delta_N$ with 95% confidence ($\alpha = 0.05$), we would solve $F_K(\sqrt{N}\,\delta_N) = 0.95$. This task is made difficult by the fact that at small values of $N$, the distribution of $\sqrt{N}\,\delta_N$ is not well approximated by Eq. (12.10). Formulas exist for the distribution as a function of $N$ (Marsaglia et al. 2003) and can be used to determine various confidence levels for $\delta_N$ given a number of samples.
A useful approximate formula for the 95% confidence value of $\delta_N$ is
$$\delta_N^{95\%} \approx \begin{cases} \dfrac{1.1897 N^{3/2} + 0.00863443 N^2 + 1.04231 N - 3.893\sqrt{N} + 4.32736}{N^2} & 5 \le N \le 50 \\[1ex] \dfrac{1.3581}{\sqrt{N}} & N > 50 \end{cases}. \tag{12.11}$$
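Equations (12.10) and (12.11) are straightforward to evaluate numerically; a short sketch, truncating the infinite series at a finite number of terms:

```python
import math

def kolmogorov_cdf(x, terms=100):
    """CDF of the Kolmogorov distribution, Eq. (12.10), with the infinite
    series truncated after `terms` terms."""
    if x <= 0.0:
        return 0.0
    series = sum(math.exp(-((2 * i - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
                 for i in range(1, terms + 1))
    return math.sqrt(2.0 * math.pi) / x * series

def delta_95(n):
    """Approximate 95% confidence value of delta_N from Eq. (12.11)."""
    if n > 50:
        return 1.3581 / math.sqrt(n)
    return (1.1897 * n ** 1.5 + 0.00863443 * n ** 2 + 1.04231 * n
            - 3.893 * math.sqrt(n) + 4.32736) / n ** 2

d10 = delta_95(10)  # approximately 0.4092
```

Note that $F_K(1.3581) \approx 0.95$, which is where the large-$N$ branch of Eq. (12.11) comes from.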
Using this formula for $\delta_N$, we can extend a p-box to account for the finite number of samples in the construction of the CDFs by defining
$$\hat{\overline{P}}(Q) = \min\left(\overline{P}(Q) + \delta_N,\; 1\right), \tag{12.12a}$$
$$\hat{\underline{P}}(Q) = \max\left(\underline{P}(Q) - \delta_N,\; 0\right). \tag{12.12b}$$
The extended p-box is constrained by the fact that the range of the CDF is [0, 1].
Other adjustments may also be possible due to physical restrictions (e.g., the value
of Q is always positive or in some range).
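In code, the adjustment in Eqs. (12.12a) and (12.12b) is a shift-and-clip of the bounding CDF values on a shared grid of $Q$; the array values below are illustrative:

```python
def extend_pbox(p_upper, p_lower, delta_n):
    """Widen the p-box bounding CDFs by the KS margin delta_n, clipping to
    [0, 1] as in Eqs. (12.12a) and (12.12b)."""
    p_hat_upper = [min(p + delta_n, 1.0) for p in p_upper]
    p_hat_lower = [max(p - delta_n, 0.0) for p in p_lower]
    return p_hat_upper, p_hat_lower

# Illustrative bounding-CDF values on a shared grid of Q
upper, lower = extend_pbox([0.2, 0.6, 0.95], [0.1, 0.4, 0.9], 0.1)
```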
To demonstrate the extension of a p-box using δN , we return to the cantilevered
beam example. At f = 75 N, we construct a p-box using second-order sampling.
We use 20 sampled values of the elastic modulus, E, and at each value, we use only
10 samples of the aleatory uncertainties. Using Eq. (12.11) we compute $\delta_N^{95\%}(10) = 0.4092477$. The resulting distribution and its adjusted p-box, $[\hat{\underline{P}}, \hat{\overline{P}}]$, are shown in
Fig. 12.11. Also, in the figure, the p-box with $10^4$ samples used in each CDF is
shown. This example demonstrates that the 95% KS confidence interval does bound
12.7 The Method of Cauchy Deviates 317
Fig. 12.11 A p-box for the cantilevered beam problem at f = 75 N where there are N = 10
samples used to construct the CDFs in second-order sampling (solid, piecewise constant line).
The Kolmogorov-Smirnov 95% confidence interval for the p-box is shown with a dotted line. The
smooth p-box is constructed with $10^4$ samples in each CDF
the true p-box, though near the extremes of the original p-box the confidence interval
is very conservative.
Overall, the KS confidence interval for the p-box is useful when the model is
expensive to evaluate and only rough CDFs can be constructed using second-order
sampling. The results from the method may be large p-boxes, but they can be used
in a wide variety of applications to understand if performance limits or regulations
are potentially at risk in a given scenario.
1 The Witch of Agnesi is the name of the curve $y = 1/(1 + x^2)$ studied by Maria Agnesi in the oldest extant mathematics text written by a woman (Agnesi 1748). The curve was called la versiera, from the Latin for "versed sine curve." This being one letter off of the Italian word avversiera, a term synonymous with "witch," it was mistranslated into English. The resemblance between the millinery of witches in popular depictions and the graph of the function is seemingly coincidental.
The Cauchy distribution with location zero and scale parameter $\Delta$ has the PDF
$$f(x) = \frac{1}{\pi\Delta}\left(1 + \frac{x^2}{\Delta^2}\right)^{-1},$$
and the CDF is
$$F(x) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x}{\Delta}\right). \tag{12.14}$$
The mean and variance of a Cauchy distribution are undefined. The mode of the
distribution is at x = 0. Because the mean and variance are undefined, the central
limit theorem does not apply to a sum of Cauchy random variables: the sum of N
Cauchy random variables will not limit to a normal distribution as N → ∞.
Given a set of $N$ samples, $x_n$, from a Cauchy distribution, the maximum likelihood estimate of $\Delta$ is given by the relation
$$\sum_{n=1}^{N} \frac{1}{1 + \dfrac{x_n^2}{\Delta^2}} = \frac{N}{2}. \tag{12.15}$$
Note that the function on the LHS is monotonically increasing in $\Delta$ and that the solution lies in the interval $\Delta \in (0, \max_n |x_n|]$, so that closed root-finding methods can be used.
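A sketch of solving Eq. (12.15) by bisection, checked on synthetic samples drawn by inverting the CDF in Eq. (12.14); the function names are illustrative:

```python
import math
import random

def cauchy_delta_mle(samples, iters=200):
    """Maximum likelihood estimate of the Cauchy scale parameter Delta,
    solving Eq. (12.15) by bisection on the bracket (0, max |x_n|]."""
    n = len(samples)
    lo, hi = 1e-12, max(abs(x) for x in samples)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lhs = sum(1.0 / (1.0 + (x / mid) ** 2) for x in samples)
        if lhs < 0.5 * n:   # LHS increases monotonically in Delta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic check: invert Eq. (12.14) to draw samples with Delta = 2
rng = random.Random(0)
draws = [2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(5000)]
estimate = cauchy_delta_mle(draws)  # close to 2
```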
Also, a linear combination of $p$ independent Cauchy random variables,
$$\hat{x} = c_1 x_1 + c_2 x_2 + \cdots + c_p x_p,$$
where the $c_i$ are real numbers and each Cauchy random variable $x_i$ has a parameter $\Delta_i$, is also a Cauchy random variable with parameter
$$\hat{\Delta} = |c_1|\Delta_1 + |c_2|\Delta_2 + \cdots + |c_p|\Delta_p.$$
Now write the change in the QoI due to perturbing the epistemic parameters about their nominal values as
$$Q(x, \hat{\theta}_1 + \delta\theta_1, \ldots, \hat{\theta}_p + \delta\theta_p) = Q(x, \hat{\theta}_1, \ldots, \hat{\theta}_p) + \delta Q, \tag{12.16}$$
so that, to first order, $\delta Q \approx c_1\,\delta\theta_1 + \cdots + c_p\,\delta\theta_p$ with $c_i = \partial Q/\partial \theta_i$. Using the properties discussed above, if the $\delta\theta_i$ are Cauchy
distributed, then δQ will be Cauchy distributed with a computable Δ parameter.
Notice that δQ will be a function of x; this makes the linear approximation less
restrictive.
In Algorithm 12.1, a procedure is given for determining the parameter of a Cauchy-distributed $\delta Q$ at $M$ different samples of the aleatory uncertainties $x$. The algorithm
will require M(N + 1) evaluations of the QoI. Also, each iteration will require
the solution of a single nonlinear equation to determine Δm , the parameter of
the distribution at xm . This equation can be solved with a simple closed method,
such as bisection, and does not require any further evaluations of the QoI. In the
algorithm the values of δθi are scaled so that their values are always between
±Δθi . By the way that we defined δQ and normalized the sampling, the value
of Q(xm , θ ) ∈ [Qm,mid − Δm , Qm,mid + Δm ]. Therefore, the algorithm gives an
estimate of the bounds of Q at a particular value of x.
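Algorithm 12.1 is not reproduced in this excerpt; the sketch below implements one common variant of the Cauchy deviates loop for a single value of $x$, with the normalization step keeping every $\delta\theta_i$ inside $\pm\Delta\theta_i$ as described above. The function and variable names are illustrative.

```python
import math
import random

def cauchy_deviates_delta(qoi, theta_hat, half_widths, n_samples=200, seed=1):
    """Estimate the half-width Delta of the QoI interval when each parameter
    theta_i is only known to lie within theta_hat[i] +/- half_widths[i]."""
    rng = random.Random(seed)
    q0 = qoi(theta_hat)
    deviations = []
    for _ in range(n_samples):
        # Standard Cauchy deviates, one per epistemic parameter
        c = [math.tan(math.pi * (rng.random() - 0.5)) for _ in theta_hat]
        k = max(abs(ci) for ci in c)  # normalize: |delta theta_i| <= half_widths[i]
        theta = [t + w * ci / k for t, w, ci in zip(theta_hat, half_widths, c)]
        deviations.append(k * (qoi(theta) - q0))
    # Maximum likelihood estimate of Delta via bisection on Eq. (12.15)
    n = len(deviations)
    lo, hi = 1e-12, max(abs(d) for d in deviations)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(1.0 / (1.0 + (d / mid) ** 2) for d in deviations) < 0.5 * n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For a linear QoI Q = 2*theta_1 + theta_2 the exact half-width is
# |2|*0.1 + |1|*0.2 = 0.4
delta = cauchy_deviates_delta(lambda th: 2 * th[0] + th[1], [0.0, 0.0], [0.1, 0.2])
```

For the linear test QoI the estimate clusters around the exact half-width, up to the finite-sample overestimation discussed next.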
The value of Δm that we obtain from Algorithm 12.1 is an approximation due
to the finite number of samples. As discussed by Kreinovich and Ferson (2004),
using a finite number of samples to calculate an approximate $\Delta$ will result in an overestimation of the true range of the QoI by a factor of up to $(1 + 2\sqrt{2/N})$, where $N$ is the number of samples. Therefore, using 50 samples will yield up to a 40% overestimation of the true interval.
One consideration in using the Cauchy deviates method is the number of
parameters, p. In principle, the range of the interval could be estimated using
numerical differentiation to compute the $c_i$ and then using the linear approximation to compute the range of $Q$. The number of QoI evaluations needed to estimate the derivatives
is p + 1, whereas the Cauchy deviates method does not depend on p. Therefore,
when p is large and we can afford to overestimate the interval size, the Cauchy
deviates method is appropriate.
Fig. 12.12 A p-box for the oscillator problem with 1200 epistemic uncertainties. The solid lines
are the p-box derived from the method of Cauchy deviates, and the dashed lines are the result of
second-order sampling using $10^5$ samples of the epistemic uncertainties at each value of x
The QoI for this problem is
$$Q(x, \mathbf{k}, \mathbf{m}, \mathbf{c}) = \sum_{i=1}^{400} \frac{k_i}{(k_i - m_i x^2)^2 + c_i x^2}. \tag{12.18}$$
The Cauchy deviates method does give a slightly smaller p-box than that derived from second-order sampling.
Nevertheless, the Cauchy deviates result required about 100 times fewer evaluations
of the QoI.
12.9 Exercises

Consider the advection-diffusion-reaction equation
$$\frac{\partial u}{\partial t} + v\frac{\partial u}{\partial x} = D\frac{\partial^2 u}{\partial x^2} - \omega u,$$
for $u(x, t)$ on the spatial domain $x \in [0, 10]$ with periodic boundary conditions $u(0^-) = u(10^+)$ and initial condition
$$u(x, 0) = \begin{cases} 1 & x \in [0, 2.5] \\ 0 & \text{otherwise} \end{cases},$$
with the quantity of interest
$$\int_5^6 dx \int_0^5 dt\, \omega u(x, t).$$
A Cookbook of Distributions

This appendix gives the definitions and properties of a variety of typical distributions
for random variables. Most of these distributions are discussed elsewhere in the
text, but having the definitions in a central location can be useful for reference. The
notation used here is the same as can be found in other standard references, except
where indicated otherwise.
A.1 Bernoulli Distribution

This is a discrete distribution where the random variable takes the value of 1 with
probability p and the value of 0 with probability 1 − p. If we consider the toss
of a fair coin, and we assign the outcome of heads the value of x = 1 and tails
x = 0, then x is Bernoulli distributed with p = 0.5. For simplicity we also define
q = 1 − p.
A.1.1 PMF

$$f(x|p) = \begin{cases} 1-p & x = 0 \\ p & x = 1 \end{cases}$$

A.1.2 CDF

$$F(x|p) = \begin{cases} 0 & x < 0 \\ 1-p & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$$
A.1.3 Properties
• Mean: E[x] = p
• Median:
$$\mathrm{Median} = \begin{cases} 0 & q > p \\ 0.5 & q = p \\ 1 & q < p \end{cases}$$
• Mode:
$$\mathrm{Mode} = \begin{cases} 0 & q > p \\ \{0, 1\} & q = p \\ 1 & q < p \end{cases}$$
• Variance: pq = p(1 − p)
• Skewness:
$$\gamma_1 = \frac{1 - 2p}{\sqrt{pq}}$$
• Excess kurtosis:
$$\mathrm{Kurt} = \frac{1 - 6pq}{pq}$$
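The formulas above can be checked directly from the PMF by computing moments; a quick numerical verification with the illustrative value $p = 0.3$:

```python
# Verify the Bernoulli moment formulas directly from the PMF
p = 0.3
q = 1.0 - p
pmf = {0: q, 1: p}

mean = sum(x * w for x, w in pmf.items())
var = sum((x - mean) ** 2 * w for x, w in pmf.items())
skew = sum((x - mean) ** 3 * w for x, w in pmf.items()) / var ** 1.5
excess_kurt = sum((x - mean) ** 4 * w for x, w in pmf.items()) / var ** 2 - 3.0
# mean = p, var = pq, skew = (1 - 2p)/sqrt(pq), excess_kurt = (1 - 6pq)/(pq)
```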
A.2 Binomial Distribution

The binomial distribution is a discrete distribution that gives the number of binary
events that are successes (i.e., the outcome is 1), out of n ∈ N trials when each
trial has probability p of success. As an example, if I flip a fair coin (p = 0.5) ten
times (n = 10), then the number of heads, x, in those ten tosses will be binomially distributed.
A.2.1 PMF
$$f(x|n, p) = \binom{n}{x} p^x (1-p)^{n-x},$$
A.2.2 CDF
$$F(x|n, p) = I_{1-p}(n-x, 1+x) = (n-x)\binom{n}{x} \int_0^{1-p} t^{n-x-1}(1-t)^x\, dt,$$
A.2.3 Properties
• Mean: E[x] = np
• The median for a binomial distribution does not have a simple formula, but it lies between the integer part of $np$ and the value of $np$ rounded up to the nearest integer, i.e., the median is between $\lfloor np \rfloor$ and $\lceil np \rceil$.
• Mode:
$$\mathrm{mode} = \begin{cases} \lfloor (n+1)p \rfloor & (n+1)p \text{ is 0 or a noninteger} \\ (n+1)p \text{ and } (n+1)p - 1 & (n+1)p \in \{1, \ldots, n\} \\ n & (n+1)p = n+1 \end{cases}$$
• Variance: np(1 − p)
• Skewness:
$$\gamma_1 = \frac{1 - 2p}{\sqrt{np(1-p)}}$$
• Excess kurtosis:
$$\mathrm{Kurt} = \frac{1 - 6p(1-p)}{np(1-p)}$$
A.3 Poisson Distribution

The Poisson distribution is a discrete distribution on the nonnegative integers with a single parameter λ > 0.

A.3.1 PMF
$$f(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$
A.3.2 CDF
$$F(x|\lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^i}{i!}$$
A.3.3 Properties
• Mean: E[x] = λ
• The median is greater than or equal to $\lambda - \log 2$ and less than $\lambda + \frac{1}{3}$.
• When $\lambda$ is an integer, there are two modes, $\lambda$ and $\lambda - 1$; otherwise the single mode is $\lfloor \lambda \rfloor$.
• Variance: λ
• Skewness:
$$\gamma_1 = \frac{1}{\sqrt{\lambda}}$$
• Excess kurtosis:
$$\mathrm{Kurt} = \lambda^{-1}$$
A.4 Normal Distribution

The normal, or Gaussian, distribution is the most well-known continuous distri-
bution. It has two parameters, μ ∈ R and σ 2 > 0, that correspond to the mean
and variance for the distribution. We write a random variable x that is normally
distributed with parameters μ and σ 2 as x ∼ N (μ, σ 2 ).
A.4.1 PDF

$$f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
A.4.2 CDF
$$F(x|\mu, \sigma^2) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$
A.4.3 Properties

• The mean, median, and mode are $\mu$; the variance is $\sigma^2$.
• The standardized variable
$$z = \frac{x-\mu}{\sigma}$$
is distributed as $\mathcal{N}(0, 1)$.
A.5 Multivariate Normal Distribution

The multivariate normal distribution generalizes the normal distribution to a random vector $\mathbf{x} \in \mathbb{R}^k$ with mean vector $\boldsymbol{\mu} \in \mathbb{R}^k$; the covariance matrix $\Sigma$ is a symmetric positive definite matrix with the determinant of the matrix written as $|\Sigma|$. A vector that is distributed as a multivariate normal with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$ is written as $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$.
A.5.1 PDF

$$f(\mathbf{x}|\boldsymbol{\mu}, \Sigma) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right).$$
A.5.2 CDF

The CDF of the multivariate normal has no closed-form expression and must be evaluated numerically.
A.5.3 Properties
1 This name arises because the distribution was popularized by William Sealy Gosset under the pseudonym "Student" (Student 1908) to hide, for competitive reasons, the fact that it was used on samples from the beer-making process at the Guinness brewery in Dublin, Ireland. Brilliant!
A.6 The t-Distribution

The t-distribution has a single parameter, $\nu > 0$, known as the degrees of freedom. When $\nu = 1$, the distribution is equivalent to the Cauchy distribution (see below). The smaller the value of $\nu$, the thicker the tails of the distribution.
Other than its thick tails, the distribution is also used to model the possible errors from having a small number of samples from a normal distribution. Given $n$ samples from a normal distribution, the difference between the sample mean and the true mean of the distribution, scaled by the sample standard error, follows a t-distribution with $\nu = n - 1$.
A.6.1 PDF

$$f(x|\nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}},$$
A.6.2 CDF

$$F(x|\nu) = \frac{1}{2} + x\,\Gamma\left(\frac{\nu+1}{2}\right) \times \frac{{}_2F_1\left(\frac{1}{2}, \frac{\nu+1}{2}; \frac{3}{2}; -\frac{x^2}{\nu}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}$$
A.6.3 Properties
• The median and mode are 0. The mean is also 0 for $\nu > 1$ and is undefined for $\nu \le 1$.
• The variance has three different cases: it can be undefined, infinite, or finite
depending on ν:
$$\mathrm{Var} = \begin{cases} \text{Undefined} & \nu \le 1 \\ \infty & 1 < \nu \le 2 \\ \dfrac{\nu}{\nu-2} & \nu > 2 \end{cases}$$
A.7 Logistic Distribution

The logistic distribution resembles a normal distribution but it has thicker tails (i.e.,
the excess kurtosis is not zero). The distribution gets its name from the fact that
its CDF is the logistic function. The logistic distribution has two parameters, the
real-valued μ and the positive, real s.
A.7.1 PDF

$$f(x|\mu, s) = \frac{e^{-\frac{x-\mu}{s}}}{s\left(1 + e^{-\frac{x-\mu}{s}}\right)^2} = \frac{1}{4s}\,\mathrm{sech}^2\left(\frac{x-\mu}{2s}\right).$$
A.7.2 CDF
$$F(x|\mu, s) = \frac{1}{1 + e^{-\frac{x-\mu}{s}}} = \frac{1}{2} + \frac{1}{2}\tanh\left(\frac{x-\mu}{2s}\right).$$
A.7.3 Properties

• The mean, median, and mode are $\mu$.
• Variance:
$$\mathrm{Var} = \frac{\pi^2}{3} s^2$$
• The skewness is 0.
• The excess kurtosis is 1.2.
A.8 Cauchy Distribution

The Cauchy distribution has a location parameter $x_0 \in \mathbb{R}$ and a scale parameter $\gamma > 0$. As noted in Chap. 12, its mean and variance are undefined.

A.8.1 PDF

$$f(x|x_0, \gamma) = \frac{1}{\pi\gamma}\left[1 + \left(\frac{x-x_0}{\gamma}\right)^2\right]^{-1}.$$
A.8.2 CDF
$$F(x|x_0, \gamma) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right).$$
A.9 Gumbel Distribution

The Gumbel distribution is often used to model the maximum of a random variable.
It has two parameters m ∈ R and β > 0. It has positive skew and excess kurtosis.
The CDF has one of the few occurrences of the exponential of an exponential.
A.9.1 PDF

$$f(x|m, \beta) = \frac{1}{\beta}\, e^{-(z + e^{-z})}, \quad \text{where } z = \frac{x-m}{\beta}.$$
A.9.2 CDF
$$F(x|m, \beta) = e^{-e^{-(x-m)/\beta}}.$$
A.9.3 Properties
• Mean: $\mathrm{E}[x] = m + \beta\gamma_E$, where $\gamma_E \approx 0.5772$ is the Euler-Mascheroni constant.
• Variance:
$$\mathrm{Var} = \frac{\pi^2}{6}\beta^2$$
• The skewness is positive:
$$\gamma_1 = \frac{12\sqrt{6}\,\zeta(3)}{\pi^3} \approx 1.14,$$
A.10 Laplace Distribution

The Laplace distribution resembles the normal distribution except that it has an
absolute value in the exponential, rather than the quadratic exponent. It takes two
parameters, m ∈ R and b > 0. It is a symmetric distribution about m and has
nonzero excess kurtosis.
A.10.1 PDF

$$f(x|m, b) = \frac{1}{2b}\, e^{-\frac{|x-m|}{b}}$$
A.10.2 CDF
$$F(x|m, b) = \begin{cases} \frac{1}{2}\, e^{\frac{x-m}{b}} & x < m \\ 1 - \frac{1}{2}\, e^{-\frac{x-m}{b}} & x \ge m \end{cases}$$
A.10.3 Properties
• The mean, median, and mode are $m$.
• Variance: $\mathrm{Var} = 2b^2$
• The skewness is 0.
• The excess kurtosis is 3.
A.11 Uniform Distribution

A uniform random variable is equally likely to take on any value in the interval
[a, b], with b > a. If x is a uniformly-distributed random variable, we write x ∼
U (a, b).
A.11.1 PDF

$$f(x|a, b) = \begin{cases} \frac{1}{b-a} & x \in [a, b] \\ 0 & \text{otherwise} \end{cases}.$$
A.11.2 CDF
$$F(x|a, b) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & x \in [a, b) \\ 1 & x \ge b \end{cases}.$$
A.11.3 Properties
• Mean: $\mathrm{E}[x] = (a+b)/2$
• Variance: $\mathrm{Var} = (b-a)^2/12$
• The standardized variable
$$z = \frac{a + b - 2x}{a - b}$$
is uniformly distributed on $[-1, 1]$.
A.12 Beta Distribution

The beta distribution describes random variables that take on values in the interval $[-1, 1]$ and can be described by two parameters $\alpha > -1$ and $\beta > -1$. If $x$ is a beta-distributed random variable, we write $x \sim \mathcal{B}(\alpha, \beta)$.

A.12.1 PDF

$$f(x|\alpha, \beta) = 2^{-(\alpha+\beta+1)}\, \frac{\Gamma(\alpha+\beta+2)}{\Gamma(\alpha+1)\,\Gamma(\beta+1)}\, (1+x)^\beta (1-x)^\alpha, \quad x \in [-1, 1].$$

Using the beta function,
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)},$$
this can be written as
$$f(x|\alpha, \beta) = \frac{2^{-(\alpha+\beta+1)}}{B(\alpha+1, \beta+1)}\, (1+x)^\beta (1-x)^\alpha, \quad x \in [-1, 1].$$
A.12.2 CDF

$$F(x|\alpha, \beta) = I_{\frac{1+x}{2}}(\beta+1, \alpha+1),$$
where the regularized incomplete beta function is
$$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)},$$
and
$$B(x; a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\, dt.$$
A.12.3 Properties
• The mean is
$$\mathrm{E}[x] = \frac{-(\alpha+1) + (\beta+1)}{\alpha+\beta+2} = \frac{\beta-\alpha}{\alpha+\beta+2}.$$
• The variance is
$$\mathrm{Var}(x) = \frac{4(\alpha+1)(\beta+1)}{(\alpha+\beta+2)^2(\alpha+\beta+3)}.$$
We can scale a beta random variable, $z \sim \mathcal{B}(\alpha, \beta)$, to have support on the interval $[a, b]$ by writing
$$x = \frac{b-a}{2}\, z + \frac{a+b}{2}.$$
Note: the more common definition for beta random variables uses $\alpha' = \alpha + 1$ and $\beta' = \beta + 1$ and has the distribution supported on $[0, 1]$. In this work we choose our definition so that the PDF for the standardized beta random variable is the weighting function in the orthogonality relation for Jacobi polynomials, which have a domain of $[-1, 1]$.
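The convention can be checked numerically: with the PDF as written above, a midpoint-rule quadrature recovers unit mass and the mean $(\beta - \alpha)/(\alpha + \beta + 2)$. The parameter values below are illustrative.

```python
import math

def beta_pdf(x, alpha, beta):
    """PDF on [-1, 1] in the convention used here (alpha, beta > -1)."""
    b = math.gamma(alpha + 1) * math.gamma(beta + 1) / math.gamma(alpha + beta + 2)
    return 2.0 ** (-(alpha + beta + 1)) / b * (1 + x) ** beta * (1 - x) ** alpha

alpha, beta = 1.5, 0.5
n = 50_000
xs = [-1 + 2 * (i + 0.5) / n for i in range(n)]   # midpoint rule on [-1, 1]
mass = sum(beta_pdf(x, alpha, beta) for x in xs) * (2 / n)
mean = sum(x * beta_pdf(x, alpha, beta) for x in xs) * (2 / n)
# mass is close to 1; mean is close to (0.5 - 1.5)/4 = -0.25
```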
A.13 Gamma Distribution

The gamma distribution describes random variables that take on values on the
positive real line and can be described by two parameters α > −1 and β > 0.
If x is a gamma-distributed random variable, we write x ∼ G (α, β).
A.13.1 PDF

$$f(x|\alpha, \beta) = \frac{\beta^{\alpha+1}\, x^\alpha\, e^{-\beta x}}{\Gamma(\alpha+1)}, \quad x \in (0, \infty),\ \alpha > -1,\ \beta > 0.$$
A.13.2 CDF
$$F(x|\alpha, \beta) = \frac{\gamma(\alpha+1, \beta x)}{\Gamma(\alpha+1)},$$
where $\gamma(\cdot, \cdot)$ is the lower incomplete gamma function.
A.13.3 Properties
The standardized gamma random variable $z = \beta x$ has the PDF
$$f(z|\alpha) = \frac{z^\alpha e^{-z}}{\Gamma(\alpha+1)}, \quad z \in (0, \infty),\ \alpha > -1.$$
Note: the more common definition for gamma random variables uses $\alpha' = \alpha + 1$
but the same parameter β. In this work we choose our definition so that the
PDF for the standardized gamma random variable is the weighting function in the
orthogonality relation for generalized Laguerre polynomials.
A.14 Inverse Gamma Distribution

A.14.1 PDF

$$f(x|\alpha, \beta) = \frac{\beta^\alpha\, x^{-\alpha-1}\, e^{-\beta/x}}{\Gamma(\alpha)}, \quad x \in (0, \infty),\ \alpha > 0,\ \beta > 0.$$
A.14.2 CDF
$$F(x|\alpha, \beta) = \frac{\Gamma\left(\alpha, \frac{\beta}{x}\right)}{\Gamma(\alpha)},$$
where $\Gamma(\cdot, \cdot)$ is the upper incomplete gamma function.
A.14.3 Properties

• Mean: $\mathrm{E}[x] = \beta/(\alpha - 1)$ for $\alpha > 1$
• Variance: $\beta^2/[(\alpha-1)^2(\alpha-2)]$ for $\alpha > 2$
A.15 Exponential Distribution

A.15.1 PDF

$$f(x|\lambda) = \lambda e^{-\lambda x}$$
A.15.2 CDF

$$F(x|\lambda) = 1 - e^{-\lambda x}$$
A.15.3 Properties

• Mean: $\mathrm{E}[x] = 1/\lambda$
• Median: $(\ln 2)/\lambda$
• Variance: $1/\lambda^2$
References

Agnesi M (1748) Instituzioni analitiche ad uso della gioventú italiana. Nella Regia-Ducal Corte
Barth A, Schwab C, Zollinger N (2011) Multi-level Monte Carlo finite element method for elliptic
PDEs with stochastic coefficients. Numer Math 119(1):123–161
Bastidas-Arteaga E, Soubra AH (2006) Reliability analysis methods. In: Stochastic analysis and
inverse modelling, ALERT Doctoral School 2014, pp 53–77
Bayarri MJ, Berger JO, Cafeo J, Garcia-Donato G, Liu F, Palomo J, Parthasarathy RJ, Paulo
R, Sacks J, Walsh D (2007) Computer model validation with functional output. Ann Stat
35(5):1874–1906
Bernoulli J (1713) Ars conjectandi, opus posthumum. Accedit Tractatus de seriebus infinitis, et
epistola gallic scripta de ludo pilae reticularis. Thurneysen Brothers, Basel
Boyd JP (2001) Chebyshev and Fourier spectral methods. Dover Publications, Mineola
Bratley P, Fox BL, Niederreiter H (1992) Implementation and tests of low-discrepancy sequences.
ACM Trans Model Comput Simul 2(3):195–213
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Cacuci DG (2015) Second-order adjoint sensitivity analysis methodology (2nd-ASAM) for
computing exactly and efficiently first- and second-order sensitivities in large-scale linear
systems: I. Computational methodology. J Comput Phys 284:687–699
Carlin BP, Louis TA (2008) Bayesian methods for data analysis. Chapman & Hall/CRC texts in
statistical science, 3rd edn. CRC Press, Boca Raton
Carpentier A, Munos R (2012) Adaptive stratified sampling for Monte-Carlo integration of
differentiable functions. In: Advances in neural information processing systems, vol. 25, pp
251–259
Chowdhary K, Dupuis P (2013) Distinguishing and integrating aleatoric and epistemic variation in
uncertainty quantification. ESAIM Math Model Numer Anal 47(3):635–662
Cliffe KA, Giles MB, Scheichl R, Teckentrup AL (2011) Multilevel Monte Carlo methods and
applications to elliptic PDEs with random coefficients. Comput Vis Sci 14(1):3–15
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716
Collier N, Haji-Ali AL, Nobile F, Schwerin E, Tempone R (2015) A continuation multilevel Monte
Carlo algorithm. BIT Numer Math 55(2):1–34
Collins GP (2009) Within any possible universe, no intellect can ever know it all. Scientific
American
Constantine PG (2015) Active subspaces: emerging ideas for dimension reduction in parameter
studies. SIAM spotlights, vol 2. SIAM, Philadelphia. ISBN 1611973864, 9781611973860
Cook AH (1965) The absolute determination of the acceleration due to gravity. Metrologia
1(3):84–114
Denison DG, Mallick BK, Smith AF (1998) Bayesian MARS. Stat Comput 8(4):337–346
Denison DGT, Holmes CC, Mallick BK, Smith AFM (2002) Bayesian methods for nonlinear
classification and regression. Wiley, Chichester
Der Kiureghian A, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Struct Saf
31(2):105–112
Farrell PE, Ham DA, Funke SW, Rognes ME (2013) Automated derivation of the adjoint of high-
level transient finite element programs. SIAM J Sci Comput 35(4):C369–C393
Faure H (1982) Discrépance de suites associées à un système de numération (en dimension s). Acta
Arith 41(4):337–351
Ferson S, Kreinovich V, Ginzburg L, Myers D, Sentz K (2003) Constructing probability boxes and
dempster-shafer structures. Tech. Rep. SAND2002-4015, Sandia National Laboratories
Ferson S, Kreinovich V, Hajagos J, Oberkampf W, Ginzburg L (2007) Experimental uncertainty
estimation and statistics for data having interval uncertainty. Tech. Rep. SAND2007-0939,
Sandia National Laboratories
Fox BL (1986) Algorithm 647: implementation and relative efficiency of quasirandom sequence
generators. ACM Trans Math Softw 12(4):362–376
Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty
in deep learning. In: International conference on machine learning, pp 1050–1059
Ghanem RG, Spanos PD (1991) Stochastic finite elements: a spectral approach. Springer, Berlin
Giles MB (2013) Multilevel Monte Carlo methods. In: Monte Carlo and Quasi-Monte Carlo
methods 2012. Springer, Berlin, pp 83–103
Gilks W, Spiegelhalter D (1996) Markov chain Monte Carlo in practice. Chapman & Hall, London
Goh J (2014) Prediction and calibration using outputs from multiple computer simulators. PhD
thesis, Simon Fraser University
Goh J, Bingham D, Holloway JP, Grosskopf MJ, Kuranz CC, Rutter E (2013) Prediction
and computer model calibration using outputs from multifidelity simulators. Technometrics
55(4):501–512
Gramacy RB, Apley DW (2015) Local Gaussian process approximation for large computer
experiments. J Comput Graph Stat 24(2):561–578
Gramacy RB, Lee HKH (2008) Bayesian treed Gaussian process models with an application to
computer modeling. J Am Stat Assoc 103(483):1119–1130
Gramacy RB et al (2007) TGP: an R package for Bayesian nonstationary, semiparametric nonlinear
regression and design by treed Gaussian process models. J Stat Softw 19(9):6
Gramacy RB, Niemi J, Weiss RM (2014) Massively parallel approximate Gaussian process
regression. SIAM/ASA J Uncertain Quantif 2(1):564–584
Gramacy RB, Bingham D, Holloway JP, Grosskopf MJ, Kuranz CC, Rutter E, Trantham M,
Drake RP et al (2015) Calibrating a large computer experiment simulating radiative shock
hydrodynamics. Ann Appl Stat 9(3):1141–1168
Griewank A, Walther A (2008) Evaluating derivatives: principles and techniques of algorithmic
differentiation, vol 105. SIAM, Philadelphia
Gunzburger MD, Webster CG, Zhang G (2014) Stochastic finite element methods for partial
differential equations with random input data. Acta Numer 23:521–650
Haldar A, Mahadevan S (2000) Probability, reliability, and statistical methods in engineering
design. Wiley, New York
Halpern JY (2017) Reasoning about uncertainty. MIT Press, Cambridge
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Data Mining,
Inference, and Prediction, 2nd edn. Springer Science & Business Media, New York
Higdon D, Kennedy M, Cavendish JC, Cafeo JA, Ryne RD (2004) Combining field data and
computer simulations for calibration and prediction. SIAM J Sci Comput 26(2):448
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems.
Technometrics 12(1):55–67
Holloway JP, Bingham D, Chou CC, Doss F, Drake RP, Fryxell B, Grosskopf M, van der Holst B,
Mallick BK, McClarren R, Mukherjee A, Nair V, Powell KG, Ryu D, Sokolov I, Toth G, Zhang
Z (2011) Predictive modeling of a radiative shock system. Reliab Eng Syst Saf 96(9):1184–
1193
Holtz M (2011) Sparse grid quadrature in high dimensions with applications in finance and
insurance. Lecture notes in computational science and engineering, vol 77. Springer, Berlin
Humbird KD, McClarren RG (2017) Adjoint-based sensitivity analysis for high-energy density
radiative transfer using flux-limited diffusion. High Energy Density Phys 22:12–16
Humbird K, Peterson J, McClarren R (2017) Deep jointly-informed neural networks.
arXiv:170700784
John LK, Loewenstein G, Prelec D (2012) Measuring the prevalence of questionable research
practices with incentives for truth telling. Psychol Sci 23(5):524–532. https://doi.org/10.1177/
0956797611430953
Jolliffe I (2002) Principal component analysis. Springer series in statistics. Springer, Berlin
Jones S (2009) The formula that felled Wall St. The Financial Times
Kalos M, Whitlock P (2008) Monte Carlo methods. Wiley-Blackwell, Hoboken
Karagiannis G, Lin G (2017) On the Bayesian calibration of computer model mixtures through
experimental data, and the design of predictive models. J Comput Phys 342:139–160
Kennedy MC, O’Hagan A (2000) Predicting the output from a complex computer code when fast
approximations are available. Biometrika 87(1):1–13
Knupp P, Salari K (2002) Verification of computer codes in computational science and engineering.
Discrete mathematics and its applications. CRC Press, Boca Raton
Kreinovich V, Ferson SA (2004) A new Cauchy-based black-box technique for uncertainty in risk
analysis. Reliab Eng Syst Saf 85(1–3):267–279
Kreinovich V, Nguyen HT (2009) Towards intuitive understanding of the Cauchy deviate method
for processing interval and fuzzy uncertainty. In: Proceedings of the 2015 conference of the
international fuzzy systems association and the european society for fuzzy logic and technology
conference, pp 1264–1269
Kreinovich V, Beck J, Ferregut C, Sanchez A, Keller G, Averill M, Starks S (2004) Monte-
Carlo-type techniques for processing interval uncertainty, and their engineering applications.
In: Proceedings of the workshop on reliable engineering computing, pp 15–17
Kurowicka D, Cooke RM (2006) Uncertainty analysis with high dimensional dependence mod-
elling. Wiley, Chichester
Lahman S (2017) Baseball database. http://www.seanlahman.com/baseball-archive/statistics
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Ling J (2015) Using machine learning to understand and mitigate model form uncertainty in
turbulence models. In: 2015 IEEE 14th international conference on machine learning and
applications (ICMLA). IEEE, Piscataway, pp 813–818
Lyness JN, Moler CB (1967) Numerical differentiation of analytic functions. SIAM J Numer Anal
4(2):202–210
Marsaglia G, Tsang WW, Wang J (2003) Evaluating Kolmogorov’s distribution. J Stat Softw
8(18):1–4. https://doi.org/10.18637/jss.v008.i18
McClarren RG, Ryu D, Drake RP, Grosskopf M, Bingham D, Chou CC, Fryxell B, van der Holst
B, Holloway JP, Kuranz CC, Mallick B, Rutter E, Torralva BR (2011) A physics informed
emulator for laser-driven radiating shock simulations. Reliab Eng Syst Saf 96(9):1194–1207
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state
calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.
1063/1.1699114
National Academy of Science (2012) Building confidence in computational models: the science of
verification, validation, and uncertainty quantification. National Academies Press, Washington
Oberkampf WL, Roy CJ (2010) Verification and validation in scientific computing, 1st edn.
Cambridge University Press, New York
Owhadi H, Scovel C, Sullivan TJ, McKerns M, Ortiz M (2013) Optimal uncertainty quantification.
SIAM Rev 55(2):271–345
Owhadi H, Scovel C, Sullivan T (2015) Brittleness of Bayesian inference under finite information
in a continuous world. Electron J Stat 9(1):1–79
Peterson J, Humbird K, Field J, Brandon S, Langer S, Nora R, Spears B, Springer P (2017) Zonal
flow generation in inertial confinement fusion implosions. Phys Plasmas 24(3):032702
Rackwitz R, Flessler B (1978) Structural reliability under combined random load sequences.
Comput Struct 9(5):489–494
Raissi M, Karniadakis GE (2018) Hidden physics models: machine learning of nonlinear partial
differential equations. J Computat Phys 357:125–141
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press,
Cambridge
Roache PJ (1998) Verification and validation in computational science and engineering. Hermosa
Publishers, Albuquerque
Robert C, Casella G (2013) Monte Carlo statistical methods. Springer Science & Business Media,
New York
Roberts GO, Gelman A, Gilks WR (1997) Weak convergence and optimal scaling of random walk
metropolis algorithms. Ann Appl Probab 7(1):110–120
Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D, Saisana M, Tarantola S (2008)
Global sensitivity analysis: the primer. Wiley, Chichester
Saltelli A, Annoni P, Azzini I, Campolongo F, Ratto M, Tarantola S (2010) Variance based
sensitivity analysis of model output. Design and estimator for the total sensitivity index.
Comput Phys Commun 181(2):259–270
Santner TJ, Williams BJ, Notz WI (2013) The design and analysis of computer experiments.
Springer Science & Business Media, New York
Schilders WH, Van der Vorst HA, Rommes J (2008) Model order reduction: theory, research
aspects and applications, vol 13. Springer, Berlin
Sobol IM (1967) On the distribution of points in a cube and the approximate evaluation of integrals.
USSR Comput Math Math Phys 7(4):86–112
Spears BK (2017) Contemporary machine learning: a guide for practitioners in the physical
sciences. arXiv:171208523
Stein M (1987) Large sample properties of simulations using latin hypercube sampling. Techno-
metrics 29(2):143
Stripling HF, McClarren RG, Kuranz CC, Grosskopf MJ, Rutter E, Torralva BR (2013) A
calibration and data assimilation method using the Bayesian MARS emulator. Ann Nucl Energy
52:103–112
Student (1908) The probable error of a mean. Biometrika 6:1–25
Tate DR (1968) Acceleration due to gravity at the national bureau of standards. J Res Natl Bur
Stand Sect C Eng Instrum 72C(1):1
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B
(Methodological) 58(1):267–288
Townsend A (2015) The race for high order Gauss–Legendre quadrature. SIAM News, pp 1–3
Trefethen LN (2013) Approximation theory and approximation practice. Other titles in applied
mathematics. SIAM, Philadelphia
Wagner JC, Haghighat A (1998) Automated variance reduction of Monte Carlo shielding calcula-
tions using the discrete ordinates adjoint function. Nucl Sci Eng 128(2):186–208
Wang Z, Navon IM, Le Dimet FX, Zou X (1992) The second order adjoint analysis: theory and
applications. Meteorol Atmos Phys 50(1–3):3–20
Wilcox LC, Stadler G, Bui-Thanh T, Ghattas O (2015) Discretely exact derivatives for hyperbolic
PDE-constrained optimization problems discretized by the discontinuous Galerkin method. J
Sci Comput 63(1):138–162
Wolpert DH (2008) Physical limits of inference. Phys D Nonlinear Phenom 237(9):1257–1281
Zheng W, McClarren RG (2016) Emulation-based calibration for parameters in parameterized
phonon spectrum of ZrHx in TRIGA reactor simulations. Nucl Sci Eng 183(1):78–95
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser
B (Stat Methodol) 67(2):301–320
Index

A
Adjoint
  linear, steady-state equations, 130
  nonlinear, time-dependent equations, 139
  sensitivity formula, 134, 139
Advection-diffusion-reaction equation, 12, 100, 106, 131, 135, 139, 248
Aleatory uncertainties, 14
Archimedes of Syracuse, 66
Automatic differentiation, 108

B
Bayesian statistics
  Bayes' theorem, 45
  linear regression, 258
Black-Scholes equation, 224, 232

C
Calibration, 276
  simple, 277
  using Markov Chain Monte Carlo, 285
Coca-Cola, 225
Copula
  Archimedean, 64, 74
  Clayton, 68
  definition, 59
  Fréchet, 64
  Frank, 66, 68
  independent, 60
  multivariate, 72
  normal, 60, 64, 72
    blamed for financial crisis, 61
  sampling, 71
    Marshall-Olkin algorithm, 73
  t, 61, 63
Correlation
  Kendall's tau, 56, 57, 61, 62, 66, 67, 69, 70, 73–75, 89, 90
  matrix, 54, 101
  Pearson, 54, 55, 57, 58, 60, 89, 90
  Spearman, 55, 57, 64, 89, 90
Covariance, 33
Cross validation, 120
  leave-one-out, 269
Cumulative distribution function, 20

D
Design of computer experiments, 155
Determinism, 5
Distribution
  Bernoulli, 23
  beta, 203–206, 208
  binomial, 46
  Cauchy, 55, 318
  exponential, 38
  gamma, 210, 213, 214
  Gumbel, 153
  logistic, 28
  multivariate normal, 33
  normal, 21, 191, 194, 195
  t, 61
  uniform, 28, 198–200
Duck-billed platypus, 28

E
Emoji, 70
Emulator, see Surrogate model
S
Scaled sensitivity coefficients, 97, 101, 106, 114
Sensitivity index, 97, 101
Singular value decomposition, 76–79, 82
  uncovering outliers with, 83
Skewness, 26
sklearn, 125, 274
Solution verification, 6, 305
Spectral projection, 189
  applied to PDE, 217, 219
  beta, 203–206, 208
  curse of dimensionality, 223
  gamma, 210, 213, 214
  issues with, 220
  multi-dimensional, 222–224
  normal, 194, 195

T
Tail dependence, 58, 60, 61, 63, 64, 67–69, 73, 89
Taylor series, 96

V
Validation metric
  definition as Minkowski L1 metric, 306
  epistemic uncertainty in, 309
Variance, 26
  first-order sensitivity estimate, 98, 99, 103, 104
Volatility, 224

W
Witch of Agnesi, 254, 317