Ankenman 2008

Proceedings of the 2008 Winter Simulation Conference
S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds.
STOCHASTIC KRIGING FOR SIMULATION METAMODELING
Bruce Ankenman
Barry L. Nelson
Jeremy Staum
Department of Industrial Engineering & Management Sciences

Northwestern University
Evanston, IL, U.S.A.
ABSTRACT

We extend the basic theory of kriging, as applied to the design and analysis of deterministic computer experiments, to the stochastic simulation setting. Our goal is to provide flexible, interpolation-based metamodels of simulation output performance measures as functions of the controllable design or decision variables. To accomplish this we characterize both the intrinsic uncertainty inherent in a stochastic simulation and the extrinsic uncertainty about the unknown response surface. We use tractable examples to demonstrate why it is critical to characterize both types of uncertainty, derive general results for experiment design and analysis, and present a numerical example that illustrates the stochastic kriging method.

1 INTRODUCTION

Discrete-event simulation is a general-purpose tool for analyzing dynamic, stochastic systems. Virtually any level of detail can be modeled and any performance measure estimated, which explains simulation’s popularity. However, simulation models are often tedious to build, need substantial data to parameterize, and require significant time to run, particularly when there are many alternatives to evaluate. The objective of the methodology described in this paper is to get more benefit from a simulation investment. The specific context we have in mind is when time to exercise the simulation model in advance of the decision making it will support is relatively plentiful, but decision-making or decision-maker time is relatively scarce or expensive. Therefore, rather than executing a simulation run whenever a “what if” question is posed, or trying to anticipate every scenario of interest in advance, we use the simulation to “map” the performance response surfaces of interest as functions of the controllable design or decision variables. Ideally, these response surface maps provide the fidelity of the full simulation model with the ease of use of, say, a spreadsheet model.

Using simulation to construct metamodels (models of the simulation model) is not new (see Barton and Meckesheimer 2006 for a review). Starting with classical response-surface modeling in statistics (e.g., Myers and Montgomery 2002), simulation researchers have adapted experiment designs for linear regression models to account for dependence within a replication for steady-state simulations (e.g., Law and Kelton 2000); to permit the use of common random numbers (CRN) and antithetic variates across design points (e.g., Schruben and Margolin 1978, Nozari et al. 1987, Tew and Wilson 1992, 1994); and to compensate for the strong relationship between response variance and customer load in queueing simulations (e.g., Cheng and Kleijnen 1998, Yang, Ankenman and Nelson 2007). However, linear regression models (that are usually polynomials in the design variables and linear in their unknown coefficients) tend to fit well locally but do not provide the sort of robust global maps we desire. Nonlinear models based on queueing theory work very well for queueing simulations, but require domain knowledge of the problem context and specialized fitting algorithms.

We are interested in more general-purpose approaches that assume less structure than linear or queueing-specific nonlinear models; that tend to be more resistant to overfitting than general interpolators (e.g., neural networks, see for instance Sabuncuoglu and Touhami 2002); that facilitate sequential, adaptive experimental design rather than fixed, a priori designs; and that can provide statistical inference about when a good fit is obtained. We also want to account for the reality that the simulation output is stochastic, with variance that usually changes significantly across the design space.

To satisfy these requirements we extend the kriging methodology that is popular, and has been highly successful, in the design and analysis of (deterministic) computer experiments (DACE). DACE methodology is particularly well suited for systematically reducing uncertainty about the unknown response surface as experiments (computer runs at different design settings) are performed and leads to
978-1-4244-2708-6/08/$25.00 ©2008 IEEE 362

interpolation-based models. Our central contribution is to fully account for the sampling variability that is inherent to a stochastic simulation. We show that correctly accounting for both sampling and response-surface uncertainty has an impact on experiment design, response-surface estimation and inference.

In the next section we describe our extended metamodel under the special case that all model parameters are known; this setting allows us to demonstrate why the extension is critical without cluttering the discussion with estimation issues, which are resolved in Section 3. A numerical illustration and conclusions close the paper in Sections 4 and 5, respectively.

2 THE METAMODEL

We describe our approach by refining a sequence of models. We are interested in modeling an unknown performance-measure surface (or surfaces) $y(x)$, where $x = (x_1, x_2, \ldots, x_d)^\top$ is a vector of design variables and $y(x)$ is a deterministic function of $x$. For instance, in a semiconductor fabrication simulation $x$ might represent the release rates of $d$ products and $y$ could be the steady-state mean cycle time of product 1 (however, $y$ need not be a mean).

The classical approach is to assume that the observed response obtained from the $j$th simulation replication at $x$ is described by the model

\[ Y_j(x) = f(x)^\top \beta + \varepsilon_j(x) \tag{1} \]

where $f(x)$ is a vector of known functions of $x$, $\beta$ is a vector of unknown parameters of compatible dimension, and $\varepsilon_j(x)$ has mean 0 and represents the sampling variability inherent in a stochastic simulation. The distribution of $\varepsilon_j(x)$, and in particular its variance, may depend on $x$, although this dependence is often ignored. We refer to $\varepsilon$ as intrinsic uncertainty, because it comes from the nature of the stochastic simulation itself. An experiment design specifies settings of $x$ at which to observe $Y(x)$, and the number of replications to obtain at each $x$. In this paper we primarily address the replication setting (as opposed to the single-run experiment design sometimes used in steady-state simulation).

Now consider the following thought experiment: Suppose that the response $y(x)$ could be observed without noise, but we are still interested in developing a metamodel after observing $y(x)$ at a few design points $x$. This problem is treated in the DACE literature (Kennedy and O’Hagan 2000, Sacks et al. 1989, Stein 1999, Santner et al. 2003). A remarkably successful approach is to cast this deterministic problem into a statistical framework by representing the unknown response surface as

\[ Y(x) = f(x)^\top \beta + M(x) \tag{2} \]

where $M$ is a realization of a mean 0 random field; that is, we think of $M$ as being randomly sampled from a space of functions mapping $\Re^d \to \Re$. The functions in this space are assumed to exhibit spatial correlation, which means that values $M(x)$ and $M(x')$ will tend to be similar if $x$ and $x'$ are close to each other in space. We refer to the stochastic nature of $M$ as extrinsic uncertainty, since it is imposed on the problem (not intrinsic to it) to aid in developing a metamodel. This paradigm embeds a deterministic problem into a probabilistic framework so that statistical concepts such as mean squared error (MSE) of estimation can be brought to bear. Statistical inference about $Y(x)$ at values of $x$ not simulated can aid experiment design and provide estimates of the metamodel’s precision, a feature we want to exploit.

We argue that the following model is more useful than (1) or (2) for representing a stochastic simulation’s output on replication $j$ at design point $x$:

\[ Y_j(x) = f(x)^\top \beta + M(x) + \varepsilon_j(x). \tag{3} \]

The intrinsic noise $\varepsilon_1(x), \varepsilon_2(x), \ldots$ at a design point $x$ is naturally independent and identically distributed across replications, but we allow the possibility that $V(x) \equiv \mathrm{Var}[\varepsilon(x)]$ is not constant and that $\mathrm{Corr}[\varepsilon_j(x), \varepsilon_j(x')] > 0$ to model the effect of CRN. The intent of CRN is to reduce the variance of estimated differences through inducing positive correlation across design points by driving their simulations with the same sequence of pseudorandom numbers (see, for instance, Law and Kelton 2000). Later we propose simultaneously modeling $M$ and $V$, which is a central contribution of this paper.

In our setting an experiment design consists of pairs $(x_i, n_i)$, $i = 1, 2, \ldots, k$, where $n_i$ is the number of simulation replications taken at design setting $x_i$. Let the sample mean at $x_i$ be

\[ \bar{Y}(x_i) = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_j(x_i) \tag{4} \]

and let $\bar{Y} = \left(\bar{Y}(x_1), \bar{Y}(x_2), \ldots, \bar{Y}(x_k)\right)^\top$.

We want a metamodel that predicts the response $Y(x_0) \equiv f(x_0)^\top \beta + M(x_0)$ at any $x_0$, simulated or not. Until further notice we only consider the case $f(x_0)^\top \beta = \beta_0$ (that is, just a constant term representing the overall surface mean), because this model has tended to be the most useful in practice for DACE.
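As a rough illustration of model (3) with a constant trend, the following Python sketch draws one realization of the extrinsic field M at a few design points, adds independent intrinsic noise for each replication (no CRN), and forms the sample means (4). The Gaussian correlation function is the one used later in Section 4; the function names, design points, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_corr(x, theta):
    """Correlation matrix with entries exp(-theta * (x_i - x_j)^2)."""
    d = x[:, None] - x[None, :]
    return np.exp(-theta * d**2)

def simulate_outputs(x, n, beta0, tau2, theta, V, rng):
    """Draw Y_j(x_i) = beta0 + M(x_i) + eps_j(x_i), with n[i] replications at x[i]."""
    k = len(x)
    cov_M = tau2 * gaussian_corr(x, theta)
    # One realization of the extrinsic field M at the design points.
    M = rng.multivariate_normal(np.zeros(k), cov_M)
    # Intrinsic noise: independent across replications and design points.
    return [beta0 + M[i] + rng.normal(0.0, np.sqrt(V[i]), size=n[i])
            for i in range(k)]

def sample_means(outputs):
    """Equation (4): average of the replications at each design point."""
    return np.array([y.mean() for y in outputs])

rng = np.random.default_rng(0)
x = np.array([0.3, 0.5, 0.7, 0.9])    # illustrative design points
n = np.array([20, 20, 20, 20])        # replications per design point
V = np.array([0.1, 0.2, 0.4, 0.8])    # illustrative intrinsic variances
outputs = simulate_outputs(x, n, beta0=2.0, tau2=1.0, theta=5.0, V=V, rng=rng)
ybar = sample_means(outputs)
```

Note that M is drawn once and shared by all replications at a design point, whereas the intrinsic noise is redrawn for every replication; averaging replications therefore reduces only the intrinsic part of the uncertainty.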
As is typical in spatial correlation models, we consider linear predictors of the form

\[ \lambda_0(x_0) + \lambda(x_0)^\top \bar{Y} \tag{5} \]

where $\lambda_0(x_0)$ and $\lambda(x_0)$ are weights that depend on $x_0$ and are chosen to give the predictor good properties, such as minimum MSE for predicting $Y(x_0) = \beta_0 + M(x_0)$. Later, when we make Gaussian assumptions on the intrinsic and extrinsic uncertainty, this form drops out as the best predictor, linear or otherwise.

Let $\Sigma_M(x, x') = \mathrm{Cov}[M(x), M(x')]$ be the covariance implied by the extrinsic spatial correlation model, let $\Sigma_M$ be the $k \times k$ covariance matrix across all design points $x_1, x_2, \ldots, x_k$, and let $\Sigma_M(x_0, \cdot)$ be the $k \times 1$ vector $(\mathrm{Cov}[M(x_0), M(x_1)], \ldots, \mathrm{Cov}[M(x_0), M(x_k)])^\top$. Also let $\Sigma_\varepsilon$ be the $k \times k$ covariance matrix with $(h, i)$ element $\mathrm{Cov}\left[\sum_{j=1}^{n_h} \varepsilon_j(x_h)/n_h,\ \sum_{j=1}^{n_i} \varepsilon_j(x_i)/n_i\right]$ across all design points $x_h$ and $x_i$.

To illustrate the key issues, suppose that $\Sigma_M$, $\Sigma_\varepsilon$ and $\beta_0$ are known (clearly, in a real application they need to be estimated, which is a contribution of our research). We can show that the MSE-optimal predictor of the form (5) is

\[ \widehat{Y}(x_0) = \beta_0 + \Sigma_M(x_0, \cdot)^\top \left[\Sigma_M + \Sigma_\varepsilon\right]^{-1} \left(\bar{Y} - \beta_0 1_k\right) \tag{6} \]

where $1_k$ is the $k \times 1$ vector of ones. We refer to this predictor as stochastic kriging. Notice that the only computationally intensive operation in evaluating (6) is the matrix inversion, which is done once since it is independent of $x_0$. If there were no intrinsic uncertainty due to simulation, $\Sigma_\varepsilon$ would vanish and (6) would reduce to the standard kriging estimator that matches the data $\bar{Y}$ at design points, and predicts $Y(x_0)$ by a weighted average of $\bar{Y}$ elsewhere (e.g., Cressie 1993). Equation (6) clearly shows that the presence of intrinsic uncertainty impacts the prediction everywhere on the surface. We can also show that the optimal MSE is

\[
\begin{aligned}
\mathrm{MSE}^\star &= \Sigma_M(x_0, x_0) - \Sigma_M(x_0, \cdot)^\top \left[\Sigma_M + \Sigma_\varepsilon\right]^{-1} \Sigma_M(x_0, \cdot) \\
&= \left[\Sigma_M(x_0, x_0) - \Sigma_M(x_0, \cdot)^\top \Sigma_M^{-1} \Sigma_M(x_0, \cdot)\right] + \Sigma_M(x_0, \cdot)^\top\, \Xi\, \Sigma_M(x_0, \cdot)
\end{aligned}
\tag{7}
\]

where $\Xi$ is a positive definite matrix that depends on $\Sigma_\varepsilon$ and $\Sigma_M$. The term in brackets in (7) is the usual kriging MSE; the additional term is positive, showing that intrinsic uncertainty inflates MSE.

To actually estimate a stochastic kriging metamodel from data we need $\Sigma_M(\cdot, \cdot)$ to have more structure. In particular, we will assume that $M$ is second-order stationary, meaning that

\[ \Sigma_M(x, x') = \tau^2 R_M(x - x'; \theta) \tag{8} \]

where $\tau^2$ can be interpreted as the variance of $M(x)$ for all $x$, and $R_M$ is the correlation which depends only on $x - x'$ and may be a function of some unknown parameters $\theta$. Further, we will require that $R_M(x - x'; \theta) \to 0$ as the distance between $x$ and $x'$ goes to infinity, and $R_M(0; \theta) = 1$.

Use of kriging for metamodeling in stochastic simulation was first mentioned by Mitchell and Morris (1992), but has only been explored in depth by Kleijnen and his collaborators; the papers most closely related to our work are van Beers and Kleijnen (2003) and Kleijnen and van Beers (2005) (see also Biles et al. 2007 and van Beers and Kleijnen 2007). The central idea in these papers is to first model out any trend using least squares or generalized least-squares techniques, and then to apply kriging to some form of standardized residuals. They do not incorporate a model of the intrinsic uncertainty, which means that they cannot be used for the sort of adaptive design we desire that jointly considers the placement of design points and simulation effort. To illustrate the insights gained from our approach, we examine a tractable example in detail.

Consider the case of $k = 2$ design points $x_1$ and $x_2$ with equal numbers of replications $n_1 = n_2 = n$. Suppose that

\[ \Sigma_M = \tau^2 \begin{pmatrix} 1 & r_{12} \\ r_{12} & 1 \end{pmatrix} \quad \text{and} \quad \Sigma_M(x_0, \cdot) = \tau^2 \begin{pmatrix} r_0 \\ r_0 \end{pmatrix}. \]

The term $\tau^2 > 0$ represents the extrinsic variance of $M$, $r_{12}$ is the extrinsic correlation between $M(x_1)$ and $M(x_2)$, and $r_0$ is the extrinsic correlation between the point to be predicted $Y(x_0)$ and each of the design points (these usually would not be equal). Typically we expect $r_{12}$ and $r_0$ to be positive.

For the intrinsic uncertainty due to sampling at a design point, suppose

\[ \Sigma_\varepsilon = \frac{V}{n} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \]

where in this example the variance at the design points is a common $V > 0$, and $-1 \le \rho \le 1$ represents intrinsic dependence between the design points; for instance, we would expect $\rho > 0$ if we used CRN. Substituting these into (6)–(7), the MSE-optimal predictor of $Y(x_0)$ is

\[ \widehat{Y}(x_0) = \beta_0 + \frac{2 \tau^2 r_0}{(1 + r_{12}) \tau^2 + (1 + \rho) V / n} \left( \frac{\bar{Y}(x_1) + \bar{Y}(x_2)}{2} - \beta_0 \right) \tag{9} \]

with MSE

\[ \mathrm{MSE}^\star = \tau^2 \left( 1 - \frac{2 \tau^2 r_0^2}{(1 + r_{12}) \tau^2 + (1 + \rho) V / n} \right). \tag{10} \]

Equation (9) shows that stochastic kriging is a bit like a control-variate estimator (e.g., Nelson 1990), where a correction term is applied to the mean based on the deviation
of the observed responses from their expectations and the strength of the correlation ($r_0$) between the design points and the response to be predicted.

The MSE (10) is even more revealing: MSE is decreasing in $r_0^2$, meaning the stronger the correlation between the design points and the response at $x_0$, the smaller the MSE because the design points provide more information. However, MSE is increasing in $r_{12}$, since the more correlated the design points themselves are, the less additional information they provide. Intrinsic uncertainty, $V$, also increases MSE, but can be reduced by increasing the sample size $n$. Most interesting is that the assumed impact of CRN, which is to make $\rho > 0$, increases MSE relative to independent sampling. This may seem surprising because in standard linear regression models such as (1) the impact of CRN is to reduce the variance of the slope coefficients. However, the stochastic kriging predictor is a weighted average of the outcomes from the design points, and CRN inflates the variance of averages. In fact, (10) shows that antithetic variates (e.g., Law and Kelton 2000), which try to induce $\rho < 0$, would reduce MSE.

There are two messages in this example: (i) In stochastic kriging there is an important interplay between the placement of design points (through their extrinsic correlation with each other) and the simulation effort at the design points (through their intrinsic variance); and (ii) CRN will not be helpful for predicting $Y(x)$ in general.

3 PARAMETER ESTIMATION

To actually apply stochastic kriging for simulation metamodeling, a method for estimating the unknown parameters is required. The DACE literature contains several methods and refinements when there is only extrinsic uncertainty; see for instance Santner et al. (2003) and Fang et al. (2006). Here we focus on extending the most well-known method, maximum likelihood, to allow for intrinsic uncertainty.

Recall that our model for the simulation output is

\[ Y_j(x) = \beta_0 + M(x) + \varepsilon_j(x). \]

We now adopt the following

Assumption 1 The random field $M$ is a stationary Gaussian random field, and $\varepsilon_1(x_i), \varepsilon_2(x_i), \ldots$ are i.i.d. $N(0, V(x_i))$, independent of $\varepsilon_j(x_h)$ for all $j$ and $h \ne i$ (i.e., no CRN), and independent of $M$.

That $M$ is a stationary Gaussian random field is a standard assumption in DACE. We refer the reader to, for instance, Santner et al. (2003, §2.3.2) for technical details, but in brief this assumption implies that for any finite collection of design points $x_1, x_2, \ldots, x_k$ the random vector $(M(x_1), M(x_2), \ldots, M(x_k))$ has a multivariate normal distribution with constant marginal mean 0, variance $\tau^2 > 0$, and positive definite correlation matrix $R_M$ such that $\mathrm{Corr}(M(x_i), M(x_h))$ depends only on $x_i - x_h$. The normality of $\varepsilon_j(x)$ could be anticipated if, for instance, the output of each replication was itself the average of a large number of more basic random variables (e.g., the average of hundreds of individual product cycle times in the semiconductor fabrication example).

Under Assumption 1, $(Y(x_0), \bar{Y}(x_1), \ldots, \bar{Y}(x_k))$ is multivariate normal and the stochastic kriging predictor (6) is the conditional expectation of $Y(x_0)$ given $\bar{Y}$, making it the minimum MSE predictor (Santner et al. 2003, Theorem 3.2.1).

We begin by assessing the impact of estimating the intrinsic variance $\Sigma_\varepsilon$, then derive the maximum likelihood estimators given $\Sigma_\varepsilon$ and conclude by addressing experiment design.

3.1 Estimating the Intrinsic Variance

In this section we confront the fact that $V$ is typically unknown. In summary, our approach is as follows:

Because we are interested in sequential experiment design, we need a model for $V$. To obtain it, we will assume $V$ is also represented by a spatial correlation model

\[ V(x) = \sigma^2 + Z(x) \tag{11} \]

where $Z$ is a mean zero stationary random field that is independent of $M$. Denote the estimated model by $\widehat{V}(x)$. Since $V(x_i)$ is not observable, even at the design points, we let

\[ S^2(x_i) = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left( Y_j(x_i) - \bar{Y}(x_i) \right)^2 \tag{12} \]

stand in for it. Under Assumption 1, $S^2(x_i)$ is strongly consistent for $V(x_i)$ and has a scaled chi-squared distribution. Because we observe $S^2$, not $V$, there is extrinsic and intrinsic uncertainty, just as in estimating $\beta_0 + M$ from $\bar{Y}$. However, since we are not interested in $V$ except as it impacts our design and analysis, we will ignore the intrinsic uncertainty and fit model (11) using standard kriging as if $S^2$ had no noise. Therefore, $\widehat{V}(x_i) = S^2(x_i)$ at design points $x_i$ since standard kriging interpolates the response at the design points exactly. We will show that the consequences of estimating $V$ in this way are slight as long as the $n_i$ are not too small.

We do not describe estimation of model (11) from $S^2(x_1), S^2(x_2), \ldots, S^2(x_k)$ here, since no new ideas are introduced. In the numerical illustration in Section 4 we cite a specific approach.

Our first key result is that estimating $\Sigma_\varepsilon$ in this way introduces no prediction bias.
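Theorem 1 Let $\widehat{\Sigma}_\varepsilon = \mathrm{Diag}\left\{ \widehat{V}(x_1)/n_1, \widehat{V}(x_2)/n_2, \ldots, \widehat{V}(x_k)/n_k \right\}$ and define

\[ \widehat{\widehat{Y}}(x_0) = \beta_0 + \Sigma_M(x_0, \cdot)^\top \left[ \Sigma_M + \widehat{\Sigma}_\varepsilon \right]^{-1} \left( \bar{Y} - \beta_0 1_k \right). \tag{13} \]

If Assumption 1 holds, then $\mathrm{E}\left[ \widehat{\widehat{Y}}(x_0) - Y(x_0) \right] = 0$.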
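The predictor (6), its plug-in version (13), and the MSE in the first line of (7) each reduce to a single linear solve. The Python sketch below is one possible way to write them, checked against the closed forms (9) and (10) for the two-point example of Section 2. The function names and all numerical values are assumptions made for illustration; they do not come from the authors' implementation.

```python
import numpy as np

def sk_predict(beta0, Sigma_M, sigma_M_x0, Sigma_eps, ybar):
    """Predictor (6): beta0 + Sigma_M(x0,.)^T [Sigma_M + Sigma_eps]^{-1} (ybar - beta0 1_k)."""
    weights = np.linalg.solve(Sigma_M + Sigma_eps, sigma_M_x0)
    return beta0 + weights @ (ybar - beta0)

def sk_mse(var_x0, Sigma_M, sigma_M_x0, Sigma_eps):
    """Optimal MSE from the first line of (7)."""
    return var_x0 - sigma_M_x0 @ np.linalg.solve(Sigma_M + Sigma_eps, sigma_M_x0)

def plug_in_sigma_eps(outputs):
    """Diag{S^2(x_i)/n_i}: the estimate used in (13), with S^2 from (12)."""
    return np.diag([np.var(y, ddof=1) / len(y) for y in outputs])

# Two-point example with illustrative (assumed) parameter values.
beta0, tau2, r12, r0, V, n, rho = 1.0, 1.5, 0.6, 0.8, 2.0, 10, 0.3
Sigma_M = tau2 * np.array([[1.0, r12], [r12, 1.0]])
sigma_M_x0 = tau2 * np.array([r0, r0])
Sigma_eps = (V / n) * np.array([[1.0, rho], [rho, 1.0]])
ybar = np.array([1.4, 0.9])            # hypothetical sample means

pred = sk_predict(beta0, Sigma_M, sigma_M_x0, Sigma_eps, ybar)
mse = sk_mse(tau2, Sigma_M, sigma_M_x0, Sigma_eps)

# Closed forms (9) and (10) for comparison.
denom = (1 + r12) * tau2 + (1 + rho) * V / n
pred_closed = beta0 + 2 * tau2 * r0 / denom * (ybar.mean() - beta0)
mse_closed = tau2 * (1 - 2 * tau2 * r0**2 / denom)
```

Passing `plug_in_sigma_eps(outputs)` in place of `Sigma_eps` gives the predictor (13). Setting `rho = 0` in the example lowers the MSE, which is the CRN effect discussed after (10).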
As a consequence of Theorem 1, our key concern is
how much variance inflation occurs when V is estimated.
Clearly if the ni are large enough there is little inflation.
But how large do they have to be? To answer this question
we consider another tractable example:
Suppose that

\[ \Sigma_M = \tau^2 \begin{pmatrix} 1 & r & \cdots & r \\ r & 1 & \cdots & r \\ \vdots & \vdots & \ddots & \vdots \\ r & r & \cdots & 1 \end{pmatrix}, \]

$\Sigma_M(x_0, \cdot) = \tau^2 (r_0, r_0, \ldots, r_0)^\top$ with $r_0, r \ge 0$, and $\Sigma_\varepsilon = (V/n) I$. This represents a situation in which the extrinsic correlations among the design points are all equal and the design points are equally correlated with the point we wish to predict, which might be (approximately) plausible if the design points are widely separated, say at the extremes of the region of interest, while $x_0$ is central. Note that for the covariance matrix of $(Y(x_0), \bar{Y}(x_1), \ldots, \bar{Y}(x_k))^\top$ to be positive definite we must have $r_0^2 < 1/k + r(k - 1)/k$. The structure of $\Sigma_\varepsilon$ arises because we assume the intrinsic variance is the same across all design points and $n$ replications have been allocated to each of them. Suppose also that we have an estimator $\widehat{V} \sim V \chi^2_{n-1} / (n - 1)$, meaning that $(n - 1)\widehat{V}/V$ has a chi-squared distribution. We use a common estimator of the intrinsic variance rather than estimating it at each design point individually to make the example tractable. Finally, let $\gamma = V/\tau^2$ be the ratio of the intrinsic variance to the extrinsic variance, which is (roughly speaking) a measure of the sampling noise relative to the response surface variation.

For this example we can show that the MSE of $\widehat{Y}(x_0)$, the stochastic kriging predictor with $V$ known, is

\[ \mathrm{MSE}^\star = \tau^2 \left( 1 - \frac{k r_0^2}{1 + (k - 1) r + \frac{\gamma}{n}} \right). \tag{14} \]

On the other hand, the MSE of $\widehat{\widehat{Y}}(x_0)$ obtained by substituting $\widehat{V}$ for $V$ is

\[ \mathrm{MSE} = \tau^2\, \mathrm{E}\left[ 1 + \frac{\left( 1 + (k - 1) r + \frac{\gamma}{n} \right) k r_0^2}{\left( 1 + (k - 1) r + \frac{\gamma}{n} \frac{\widehat{V}}{V} \right)^2} - \frac{2 k r_0^2}{1 + (k - 1) r + \frac{\gamma}{n} \frac{\widehat{V}}{V}} \right]. \tag{15} \]

Figure 1: MSE inflation as a function of $\gamma = V/\tau^2$ when $n = 10$ and correlation $r_0$ is 95% of its maximum possible value.

We assess the inflation by evaluating the ratio of (15) to (14) numerically. The ratio is largest when $n$ is small and $r_0$ and $r$ are large, so Figure 1 shows the inflation as a function of $\gamma = V/\tau^2$ for $n = 10$, $r = 0, 0.1, 0.2$ and $r_0$ at 95% of the maximum value it can take. Even with this small value of $n$ the inflation is slight over an extreme range of $\gamma$ values. As $n$ increases the inflation vanishes. This suggests that the penalty for estimating $V$ will typically be small.

3.2 Maximum Likelihood Estimation

In this section we derive the maximum likelihood estimators of $(\beta_0, \tau^2, \theta)$ assuming $\Sigma_\varepsilon$ is known. To reduce notation, let $V_i \equiv V(x_i)/n_i$; thus, $\Sigma_\varepsilon = \mathrm{Diag}\{V_1, V_2, \ldots, V_k\}$. Also define $R_M(\theta)$ to be the correlation matrix of $M$ across the design points.

For a fixed experiment design $\{(x_i, n_i), i = 1, 2, \ldots, k\}$, and under Assumption 1, the log likelihood function of $(\beta_0, \tau^2, \theta)$ is

\[
\begin{aligned}
\ell(\beta_0, \tau^2, \theta) = {} & -\ln\left[ (2\pi)^{k/2} \right] - \frac{1}{2} \ln \left| \tau^2 R_M(\theta) + \Sigma_\varepsilon \right| \\
& - \frac{1}{2} \left( \bar{Y} - \beta_0 1_k \right)^\top \left[ \tau^2 R_M(\theta) + \Sigma_\varepsilon \right]^{-1} \left( \bar{Y} - \beta_0 1_k \right).
\end{aligned}
\tag{16}
\]

If the $\Sigma_\varepsilon$ terms are removed then this is the log likelihood function for kriging when $M$ is a Gaussian random field. We have been intentionally vague about the covariance
function $R_M(\theta)$, because we want the results to be general, but when we apply stochastic kriging later we will use a standard model from the DACE literature.

Finding the maximum likelihood estimators requires simultaneously solving

\[ \frac{\partial \ell(\beta_0, \tau^2, \theta)}{\partial \beta_0} = 0, \qquad \frac{\partial \ell(\beta_0, \tau^2, \theta)}{\partial \tau^2} = 0, \qquad \frac{\partial \ell(\beta_0, \tau^2, \theta)}{\partial \theta} = 0 \tag{17} \]

for $(\widehat{\beta}_0, \widehat{\tau}^2, \widehat{\theta})$, which is no more computationally difficult than when $\Sigma_\varepsilon$ is not present, and in fact is more likely to be numerically stable.

To summarize, given the data $Y_j(x_i)$, $j = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, k$, a stochastic kriging metamodel is obtained as follows:

1. Estimate $V$ as in Section 3.1 and let $\widehat{\Sigma}_\varepsilon = \mathrm{Diag}\left\{ \widehat{V}(x_1)/n_1, \widehat{V}(x_2)/n_2, \ldots, \widehat{V}(x_k)/n_k \right\}$ where $\widehat{V}(x_i) = S^2(x_i)$.
2. Using $\widehat{\Sigma}_\varepsilon$ instead of $\Sigma_\varepsilon$, solve the likelihood equations (17) for $(\widehat{\beta}_0, \widehat{\tau}^2, \widehat{\theta})$.
3. Predict $Y(x_0)$ by the metamodel

\[ \widehat{\widehat{Y}}(x_0) = \widehat{\beta}_0 + \widehat{\tau}^2 R_M(x_0, \cdot; \widehat{\theta})^\top \left[ \widehat{\tau}^2 R_M(\widehat{\theta}) + \widehat{\Sigma}_\varepsilon \right]^{-1} \left( \bar{Y} - \widehat{\beta}_0 1_k \right) \tag{18} \]

with plug-in MSE estimate

\[
\begin{aligned}
\widehat{\mathrm{MSE}}(x_0) = {} & \widehat{\tau}^2 - \widehat{\tau}^4 R_M(x_0, \cdot; \widehat{\theta})^\top \left[ \widehat{\tau}^2 R_M(\widehat{\theta}) + \widehat{\Sigma}_\varepsilon \right]^{-1} R_M(x_0, \cdot; \widehat{\theta}) \\
& + \delta^\top \delta \left( 1_k^\top \left[ \widehat{\tau}^2 R_M(\widehat{\theta}) + \widehat{\Sigma}_\varepsilon \right]^{-1} 1_k \right)^{-1}
\end{aligned}
\tag{19}
\]

where $\delta = 1 - 1_k^\top \left[ \widehat{\tau}^2 R_M(\widehat{\theta}) + \widehat{\Sigma}_\varepsilon \right]^{-1} R_M(x_0, \cdot; \widehat{\theta})\, \widehat{\tau}^2$.

The last term on the right-hand side of (19) accounts for the variability due to estimating $\beta_0$.

3.3 Experiment Design

In this section we describe an approach to obtain experiment designs with low integrated MSE (IMSE). Our results assume that the extrinsic covariance function $\Sigma_M(\cdot, \cdot)$ and the intrinsic variance function $V(\cdot)$ are known; later in the section we describe how we might use the results when these functions are estimated.

Let $\mathcal{X}$ be the $d$-dimensional experiment design space of interest, and suppose that we have $k$ fixed design points $x_1, x_2, \ldots, x_k$ to which we want to allocate $N$ replications. Let $n^\top = (n_1, n_2, \ldots, n_k)$. Then our goal is to

\[ \text{minimize } \mathrm{IMSE}(n) = \int_{x_0 \in \mathcal{X}} \mathrm{MSE}(x_0; n)\, dx_0 \tag{20} \]

subject to:

\[ n^\top 1_k \le N \tag{21} \]

\[ n_i \in \mathbb{Z}^+ \tag{22} \]

where the integrand $\mathrm{MSE}(x_0; n) = \Sigma_M(x_0, x_0) - \Sigma_M(x_0, \cdot)^\top \left[ \Sigma_M + \Sigma_\varepsilon(n) \right]^{-1} \Sigma_M(x_0, \cdot)$ and $\Sigma_\varepsilon(n) = \mathrm{Diag}\{ V(x_1)/n_1, V(x_2)/n_2, \ldots, V(x_k)/n_k \}$. In words, we minimize the IMSE for the MSE-optimal stochastic kriging estimator as a function of the number of replications allocated to each design point. To obtain an approximate solution to this problem, we relax the integrality constraint (22) and assume only that $n_i \ge 0$. Since we will have repeated need of it, let $\Sigma(n) = \Sigma_M + \Sigma_\varepsilon(n)$.

Assuming $M$ is second-order stationary, as in (8), we can let $\Sigma_M(x_i, x_0) = \tau^2 r_i(x_0)$. We can then show that the optimal solution $n^\star$ to (20), with integrality relaxed, satisfies $n_i^\star \propto \sqrt{V(x_i)\, C_i(n^\star)}$ where

\[ C_i(n) = \left[ \Sigma(n)^{-1} W \Sigma(n)^{-1} \right]_{ii} \]

and $W$ is the $k \times k$ matrix with elements

\[ W_{ij} = \int_{x_0 \in \mathcal{X}} r_i(x_0)\, r_j(x_0)\, dx_0. \]

To gain some insight into this result, suppose that $N$ is large enough that $\Sigma(n) \approx \Sigma_M$ so that

\[ C_i(n) \approx C_i = \left[ \Sigma_M^{-1} W \Sigma_M^{-1} \right]_{ii}. \]

Then

\[ n_i^\star \approx N \frac{\sqrt{V(x_i)\, C_i}}{\sum_{j=1}^{k} \sqrt{V(x_j)\, C_j}}. \tag{23} \]

Notice that $C_i$ is only a function of the extrinsic correlation structure, and $V$ is the intrinsic variance. Expression (23) shows how the response surface, as represented by its correlation structure, distorts the allocation of replications from one that is proportional to only the intrinsic standard deviation at the design point; it tends to favor design points that are centrally located because they do more to reduce MSE throughout the design space. This further emphasizes that both intrinsic and extrinsic uncertainty matter in the experiment design.

In practice neither $\Sigma_M(\cdot, \cdot)$ nor $V(\cdot)$ is known in advance, and the design points are not given. One way to use these results is via a two-stage design strategy:
1. In Stage 1, select a space-filling design of $m$ predetermined design points $x_1, \ldots, x_m$ and allocate $n_0$ replications to each.
2. Fit $\widehat{V}$ and $\widehat{\tau}^2 R_M(\cdot, \cdot; \widehat{\theta})$ as described above.
3. In Stage 2, jointly select $k - m$ additional design points $x_{m+1}, \ldots, x_k$ from a larger set and optimally allocate the $N - m n_0$ additional replications among $x_1, \ldots, x_k$ to minimize IMSE using $\widehat{V}$ and $R_M(\cdot, \cdot; \widehat{\theta})$ in place of the true functions.
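The fitting step in the strategy above requires maximizing the log likelihood (16). The Python sketch below evaluates (16) for a one-dimensional design with the Gaussian correlation function used in Section 4 and searches a coarse grid rather than solving (17) directly. The grid search, the function names, and the data values are assumptions for illustration, not the authors' procedure. For fixed $\tau^2$ and $\theta$ the maximizing $\beta_0$ is the generalized least-squares mean, which the sketch uses to reduce the search to two parameters.

```python
import numpy as np

def sk_cov(tau2, theta, x, V_over_n):
    """tau^2 R_M(theta) + Sigma_eps, with Gaussian correlation and diagonal Sigma_eps."""
    R = np.exp(-theta * (x[:, None] - x[None, :])**2)
    return tau2 * R + np.diag(V_over_n)

def log_likelihood(beta0, tau2, theta, x, ybar, V_over_n):
    """Log likelihood (16) of (beta0, tau2, theta) given the sample means."""
    Sigma = sk_cov(tau2, theta, x, V_over_n)
    resid = ybar - beta0
    _, logdet = np.linalg.slogdet(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def profile_beta0(tau2, theta, x, ybar, V_over_n):
    """Value of beta0 that maximizes (16) for fixed tau2 and theta."""
    w = np.linalg.solve(sk_cov(tau2, theta, x, V_over_n), np.ones(len(x)))
    return (w @ ybar) / w.sum()

def grid_mle(x, ybar, V_over_n, tau2_grid, theta_grid):
    """Crude maximizer of (16) over a grid; returns (loglik, beta0, tau2, theta)."""
    best = None
    for tau2 in tau2_grid:
        for theta in theta_grid:
            b0 = profile_beta0(tau2, theta, x, ybar, V_over_n)
            ll = log_likelihood(b0, tau2, theta, x, ybar, V_over_n)
            if best is None or ll > best[0]:
                best = (ll, b0, tau2, theta)
    return best

x = np.array([0.3, 0.5, 0.7, 0.9])
ybar = np.array([0.45, 1.02, 2.30, 8.60])        # made-up sample means
V_over_n = np.array([0.001, 0.003, 0.02, 0.5])   # made-up S^2(x_i)/n_i
fit = grid_mle(x, ybar, V_over_n,
               np.linspace(0.5, 30.0, 60), np.linspace(0.5, 50.0, 100))
```

In practice a numerical optimizer started from the best grid point would replace the grid search; the grid is used here only to keep the sketch free of additional dependencies.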
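The approximate allocation (23) can be evaluated numerically once the correlation function and the variance function are specified. The Python sketch below does this for a one-dimensional design space with Gaussian correlation, approximating the integrals $W_{ij}$ by the trapezoidal rule. The function name, the quadrature grid, the value of $\theta$, and the use of the M/M/1 variance function from Section 4 as input are all assumptions for illustration; the output is not intended to reproduce the allocation reported in Section 4, which was based on fitted quantities.

```python
import numpy as np

def allocation_23(x, V, theta, N, lo, hi, grid_size=2001):
    """Approximate allocation (23) on the one-dimensional design space [lo, hi]."""
    R = np.exp(-theta * (x[:, None] - x[None, :])**2)     # R_M at the design points
    x0 = np.linspace(lo, hi, grid_size)
    r = np.exp(-theta * (x0[:, None] - x[None, :])**2)    # r_i(x0) on the grid
    # Trapezoidal weights for W_ij = integral of r_i(x0) r_j(x0) dx0.
    w = np.full(grid_size, (hi - lo) / (grid_size - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    W = r.T @ (w[:, None] * r)
    R_inv = np.linalg.inv(R)
    # The factor tau^-4 in C_i is common to all i and cancels in (23).
    C = np.diag(R_inv @ W @ R_inv)
    s = np.sqrt(V * C)
    return N * s / s.sum()

x = np.linspace(0.3, 0.9, 7)                     # design points 0.3, 0.4, ..., 0.9
T = 1000.0
V = 2 * x * (1 + x) / (T * (1 - x)**4)           # variance function quoted in Section 4
n_star = allocation_23(x, V, theta=30.0, N=500, lo=0.3, hi=0.9)
n_rounded = np.round(n_star).astype(int)         # simple rounding; may not sum to N
```

Rounding the relaxed solution is only a heuristic for restoring the integrality constraint (22); a careful implementation would repair the total so that it does not exceed $N$.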
4 ILLUSTRATION

To illustrate the methodology developed in this paper, we consider the steady-state mean number in an M/M/1 queue. The statistic we record from each replication is the average number of customers in the system from time 0 to $T$. For the M/M/1 queue we can initialize each replication in steady state by independently sampling the number in the system at time 0 from the steady-state distribution. We keep the run length per replication $T$ the same for all arrival rates $x$ so that we entirely control intrinsic variance through the number of replications. We do not employ CRN. For fitting the mean and variance models we assume a Gaussian correlation structure of the form $R_M(x_i, x_j; \theta_M) = \exp(-\theta_M (x_i - x_j)^2)$ and $R_V(x_i, x_j; \theta_V) = \exp(-\theta_V (x_i - x_j)^2)$, respectively, with the $\theta$’s unknown. All of the simulation and fitting of the metamodels was done using our own code written in S-PLUS; fitting was via maximum likelihood.

To illustrate stochastic kriging, we consider an experiment that starts with four design points, $x = 0.3, 0.5, 0.7, 0.9$, making 20 replications of length $T = 1000$ time units at each of them (80 replications total). Based on the results we allocate a total of $N = 500$ replications among these four design points, plus 3 additional points $x = 0.4, 0.6, 0.8$, using the approximately optimal allocation formula (23), and view the final fit.

Figure 2 shows the results for the mean number in queue metamodel $\widehat{\widehat{Y}}(x_0)$ from the first-stage experiment. In the plot a circle represents an estimated response from the simulation (the data points); the solid-line curve is the stochastic kriging metamodel, which is surrounded by $\pm\sqrt{\widehat{\mathrm{MSE}}}$ intervals at a fine grid of points; and the dashed-line curve is the true surface. Since this is stochastic kriging, as opposed to ordinary kriging, the fitted surface need not pass through the data points (see especially at $x = 0.9$), and the $\pm\sqrt{\widehat{\mathrm{MSE}}}$ intervals account both for intrinsic and extrinsic uncertainty about the surface. Notice that the true surface is within the $\pm\sqrt{\widehat{\mathrm{MSE}}}$ bounds on the fitted surface.

Figure 2: Fitted via stochastic kriging (solid line) and true (dashed line) expected number in an M/M/1 queue from the first-stage experiment. (Axes: Yhat (V unknown) versus x0.)

The fitted variance curve $\widehat{V}(x_0)$ is shown in Figure 3. Since we use ordinary kriging for this model the fitted curve passes through the data points, and it is clear that the simulation provided a particularly poor estimate of $V(0.9)$. For reference we also plot the known variance function $V(x)/T = 2x(1 + x)/(T(1 - x)^4)$ (Whitt 1989).

Using the results from the first-stage experiment (in particular $\widehat{\theta}_M$ and $\widehat{V}(x)$) we apply (23) to obtain the optimal allocation of $N = 500$ replications to the full set of design points $x = 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9$. The variance model is required as the full design includes design points that were not simulated in the first-stage experiment. The estimated optimal allocation is $n = 2, 80, 11, 81, 33, 165, 128$, respectively. That design points 2 and 4 (0.4 and 0.6) receive large allocations relative to design points 1, 3 and 5 (0.3, 0.5 and 0.7) results mostly from their variance being overestimated by $\widehat{V}$. More interesting is that $x = 0.8$ receives a larger allocation than $x = 0.9$, even though the standard deviation at 0.9 is predicted to be substantially greater than at 0.8 by $\widehat{V}$. This occurs because our optimal allocation considers not only the relative standard deviations at the design points, but also their range of influence in the metamodel; $x = 0.8$ is closer to more points in the design than 0.9 and therefore is more valuable.

Since several of the design points have already received more replications than optimal (always a danger when the initial sample size has to be selected arbitrarily), we reran the experiment allocating the 500 replications optimally (in practice we would not discard the data we already have and would instead allocate as close to the optimal design as possible). Figure 4 shows the result. The most important thing to notice is not the close fit to the true curve as much as the nearly constant $\pm\sqrt{\widehat{\mathrm{MSE}}}$ intervals surrounding the fitted curve.
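The replication scheme described in this section can be sketched in a few lines of Python. The code below simulates the number in system of an M/M/1 queue as a continuous-time Markov chain, starts each replication from the steady-state (geometric) distribution, and records the time-average number in system over $[0, T]$. It assumes a service rate of 1 so that the arrival rate $x$ is also the traffic intensity; that assumption, the function names, and the run parameters are illustrative and are not taken from the authors' S-PLUS code.

```python
import numpy as np

def mm1_time_average(x, T, rng):
    """Average number in system over [0, T]; arrival rate x < 1, service rate 1."""
    # Steady-state start: P(N = m) = (1 - x) x^m for m = 0, 1, 2, ...
    n = rng.geometric(1.0 - x) - 1
    t, area = 0.0, 0.0
    while True:
        rate = x + (1.0 if n > 0 else 0.0)   # arrivals, plus service if busy
        dt = rng.exponential(1.0 / rate)
        if t + dt >= T:
            area += n * (T - t)              # truncate the last sojourn at T
            return area / T
        area += n * dt
        t += dt
        # The event is an arrival with probability x / rate, otherwise a departure.
        n += 1 if rng.random() < x / rate else -1

def replicate(x, T, reps, rng):
    """Sample mean (4) and sample variance (12) of independent replications."""
    y = np.array([mm1_time_average(x, T, rng) for _ in range(reps)])
    return y.mean(), y.var(ddof=1)

rng = np.random.default_rng(1)
ybar, s2 = replicate(x=0.5, T=500.0, reps=50, rng=rng)
true_mean = 0.5 / (1 - 0.5)                                  # x / (1 - x)
approx_var = 2 * 0.5 * (1 + 0.5) / (500.0 * (1 - 0.5)**4)    # variance function above
```

Under these assumptions `ybar` should be close to `true_mean`, and `s2` can be compared with `approx_var`, which is an asymptotic approximation and need not match closely for short runs or heavy traffic.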
Figure 3: Fitted via ordinary kriging (solid line) and true (dashed line) variance of average number in an M/M/1 queue from the first-stage experiment. (Axes: Vhat versus x0.)

Figure 4: Fitted via stochastic kriging (solid line) and true (dashed line) expected number in an M/M/1 queue from the second-stage experiment. (Axes: Yhat (V unknown) versus x0.)
5 CONCLUSIONS
This paper provides a mathematical foundation for stochastic kriging, a method that extends the power of kriging metamodeling for deterministic computer experiments to modeling responses from stochastic simulations. To realize the full potential of this technique we need to address, and are addressing, these follow-up issues:
Our initial results on experimental design should lead to methods for sequential, adaptive design that places design points and allocates simulation effort as we learn more about the response surface being modeled. The ability to capture intrinsic and extrinsic uncertainty in the design is a strength of stochastic kriging.
In our limited experiments it appeared that the Gaussian random field model with Gaussian correlation structure did not work as well for representing estimator variance as it did for the response mean. Alternative models should be explored, as well as whether there is any benefit from fitting a joint model for (M, V).
We largely ignored the possibility of including a trend term, $\mathbf{f}(\mathbf{x})^\top \boldsymbol{\beta}$, in our metamodel. Clearly there are applications for which the form of such a term is known or suspected, and including it may lead to better fits. The presence of a trend term may make the use of CRN worthwhile.
The examples in this paper employed only a one-dimensional design variable x, but the theory holds for general d-dimensional x. In addition to the numerical issues that can arise in fitting high-dimensional kriging models, there is also the practical matter of visualizing and exploring the fitted surface. Tools such as ATSV (Stump et al. 2007) may be particularly helpful in this regard.
ACKNOWLEDGMENTS
This paper is based upon work supported by the National Science Foundation under Grant No. DMI-0555485, by the Semiconductor Research Corporation under Grant No. 2004-OJ-1225, and by General Motors R&D. The authors also acknowledge helpful advice from Dan Apley, Russell Barton, Thomas Santner and Tim Simpson.
REFERENCES
Barton, R. R., and M. Meckesheimer. 2006. Metamodel-based simulation optimization. In Elsevier Handbooks in Operations Research and Management Science: Simulation, ed. S. G. Henderson and B. L. Nelson, 535–574. New York: Elsevier.
Biles, W. E., J. P. C. Kleijnen, W. C. M. van Beers and I. Nieuwenhuyse. 2007. Kriging metamodeling in constrained simulation optimization: An exploratory study. In Proceedings of the 2007 Winter Simulation Conference, ed. S. G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J. D. Tew and R. R. Barton, 355–362. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.
Cheng, R. C. H., and J. P. C. Kleijnen. 1998. Improved design of queueing simulation experiments with highly heteroscedastic responses. Operations Research 47:762–777.
Cressie, N. A. C. 1993. Statistics for spatial data. New York: John Wiley.
Fang, K. T., R. Li and A. Sudjianto. 2006. Design and modeling for computer experiments. Boca Raton, FL: Chapman & Hall/CRC.
Kennedy, M. C. and A. O’Hagan. 2000. Predicting the output from a complex computer code when fast approximations are available. Biometrika 87:1–13.
Kleijnen, J. P. C. and W. C. M. van Beers. 2005. Robustness of Kriging when interpolating in random simulation with heterogeneous variances: Some experiments. European Journal of Operational Research 165:826–834.
Law, A. M. and W. D. Kelton. 2000. Simulation modeling and analysis, 3rd ed. New York: McGraw Hill.
Mitchell, T. J. and M. D. Morris. 1992. The spatial correlation function approach to response surface estimation. In Proceedings of the 1992 Winter Simulation Conference, ed. J. J. Swain, D. Goldsman, R. C. Crain and J. R. Wilson, 565–571. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.
Myers, R. H. and D. C. Montgomery. 2002. Response surface methodology, 2nd ed. New York: John Wiley.
Nelson, B. L. 1990. Control-variate remedies. Operations Research 38:974–992.
Nozari, A., S. F. Arnold and C. D. Pegden. 1987. Statistical analysis for use with the Schruben and Margolin correlation induction strategy. Operations Research 35:127–139.
Sabuncuoglu, I. and S. Touhami. 2002. Simulation metamodeling with neural networks: An experimental investigation. International Journal of Production Research 40:2483–2505.
Sacks, J., W. J. Welch, T. J. Mitchell and H. P. Wynn. 1989. Design and analysis of computer experiments. Statistical Science 4:409–423.
Santner, T. J., B. J. Williams and W. I. Notz. 2003. The design and analysis of computer experiments. New York: Springer.
Schruben, L. W. and B. H. Margolin. 1978. Pseudorandom number assignment in statistically designed simulation and distribution sampling experiments. Journal of the American Statistical Association 73:504–525.
Stein, M. L. 1999. Interpolation of spatial data: Some theory for Kriging. New York: Springer.
Stump, G., S. Lego, M. Yukish, T. W. Simpson and J. A. Donndelinger. 2007. Visual steering commands for trade space exploration: User-guided sampling with example. In ASME Design Engineering Technical Conferences–Design Automation Conference, ed. F. Liou. ASME DETC2007/DAC-34684.
Tew, J. D. and J. R. Wilson. 1992. Validation of simulation analysis methods for the Schruben-Margolin correlation-induction strategy. Operations Research 40:87–103.
Tew, J. D. and J. R. Wilson. 1994. Estimating simulation metamodels using combined correlation-based variance reduction techniques. IIE Transactions 26:2–16.
van Beers, W. C. M. and J. P. C. Kleijnen. 2003. Kriging for interpolation in random simulation. Journal of the Operational Research Society 54:255–262.
van Beers, W. C. M. and J. P. C. Kleijnen. 2007. Customized sequential designs for random simulation experiments: Kriging metamodeling and bootstrapping. European Journal of Operational Research, forthcoming.
Whitt, W. 1989. Planning queueing simulations. Management Science 35:1341–1366.
Yang, F., B. E. Ankenman and B. L. Nelson. 2007. Efficient generation of cycle time-throughput curves through simulation and metamodeling. Naval Research Logistics 54:78–93.
AUTHOR BIOGRAPHIES
BRUCE ANKENMAN is an Associate Professor in the Department of Industrial Engineering & Management Sciences at Northwestern University. His research interests include the statistical design and analysis of experiments. Although much of his work has been concerned with physical experiments, recent research has focused on computer simulation experiments. Professor Ankenman is currently the director of the Masters of Engineering Management Program and the director of the Manufacturing and Design Engineering Program. He co-directs the freshman engineering and design course (EDC), and is the director of undergraduate programs for the Segal Design Institute. His e-mail and web addresses are <ankenman@northwestern.edu> and <users.iems.northwestern.edu/~bea/>.
BARRY L. NELSON is the Charles Deering McCormick Professor of Industrial Engineering and Management Sciences at Northwestern University and is Editor in Chief of Naval Research Logistics. His research centers on the design and analysis of computer simulation experiments on models of stochastic systems. His e-mail and web addresses are <nelsonb@northwestern.edu> and <www.iems.northwestern.edu/~nelsonb/>.
JEREMY STAUM is Associate Professor of Industrial Engineering and Management Sciences at Northwestern University. His research interests include risk management and simulation in financial engineering. Staum is Associate Editor of ACM Transactions on Modeling and Computer Simulation, Naval Research Logistics, and Operations Research, and was Risk Analysis track coordinator at the 2007 Winter Simulation Conference. His e-mail and web addresses are <j-staum@northwestern.edu> and <users.iems.northwestern.edu/~staum/>.