978-1-6654-7661-4/22/$31.00 ©2022 IEEE
B. Feng, G. Pedrielli, Y. Peng, S. Shashaani, E. Song, C.G. Corlu, L.H. Lee, E.P. Chew, T. Roeder, and
P. Lendermann, eds.
ABSTRACT
This paper considers sample average approximation (SAA) of a general class of stochastic optimization
problems over a function space constraint set and driven by “regulated” Gaussian processes. We estab-
lish statistical consistency by proving equiconvergence of the SAA estimator via a sophisticated sample
complexity result. Next, recognizing that implementation over such infinite-dimensional spaces is possible
only if numerical optimization is performed over a finite-dimensional subspace of the constraint set, and if
sample paths of the driving process can be generated over a finite grid, we identify the decay rate of the
SAA estimator’s expected optimality gap as a function of the optimization error, Monte Carlo sampling
error, path generation approximation error, and subspace projection error.
1 INTRODUCTION
We consider infinite-dimensional stochastic optimization problems of the form
\[
\min \; J(F) = \int_{C} \tilde J(x)\, d\pi_F^{\Gamma}(x) = \int_{C} \tilde J \circ \Gamma(F + z)\, d\pi_0(z) \quad \text{s.t. } F \in \mathcal{F}, \tag{OPT}
\]
where $\tilde J : C \to \mathbb{R}$ is some “cost” functional, $C$ is the space of $\mathbb{R}$-valued continuous functions with domain
$[0, T]$, equipped with the supremum norm, $\mathcal{F} \subset C$ is a subspace of $C$, and $F \in \mathcal{F}$ is the “decision
variable”. The functional $\tilde J$ takes as argument “paths” $X^F := \Gamma(F + Z)$, where $\Gamma : C \to C$ is a continuous
“regulator” map that confines $Z + F$ to a subdomain of $C$, $Z$ is a $C$-valued Gaussian random variable that
induces a measure $\pi_0$ on the Borel space $(C, \mathcal{C})$, and $X^F$ induces the ‘push-forward’ measure $\pi_F^{\Gamma}$ (see
Definition 8 below).
control and machine learning, when formulated as stochastic optimization problems driven by Gaussian
processes.
Example 1 Let $\mathcal{F} = W_0^{1,2}$, the Sobolev space consisting of $\mathbb{R}$-valued absolutely continuous functions with
$L^2$-integrable derivatives and initial value 0. If $Z = \sigma B$, where $B$ is a Wiener process with measure $\pi_0$ and
$\sigma > 0$, then $(C, \mathcal{F}, \pi_0)$ is the classic Cameron-Martin-Wiener space. Let $\Gamma$ be the so-called Skorokhod
regulator map (Chen and Yao 2001, Ch. 5), which satisfies $\Gamma(x)(\cdot) = x(\cdot) + \sup_{0 \le s \le \cdot} \max\{-x(s), 0\}$ for
any function $x \in C$. Then, the random variable $X^F$ is a so-called reflected Brownian motion (RBM) with drift
$F$. Consider a cost functional over $x \in C$, $x \mapsto \tilde J(x) := a_1 \int_0^T g(x(s))\,ds + a_2 G(x(T))$, where $(a_1, a_2) \in \mathbb{R}^2$, and
$g : \mathbb{R} \to \mathbb{R}$ and $G : \mathbb{R} \to \mathbb{R}$ are well-defined functions. The corresponding optimization problem represents
a class of ‘open-loop’ optimal control problems over $W_0^{1,2}$, driven by an RBM. This class of problems
arises in nonstationary queueing network control, scheduling and inventory control.
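As an illustration of Example 1, the following minimal sketch applies the Skorokhod regulator to a discretized path and evaluates the cost functional with the trapezoidal rule. The uniform grid, the Gaussian-increment approximation of B, the linear drift, and the choices g(x) = x and G(x) = x^2 are assumptions made only for this example; none of them is prescribed by the paper.

```python
import numpy as np

def skorokhod_map(x):
    """Discrete analogue of Gamma(x)(t) = x(t) + sup_{0<=s<=t} max(-x(s), 0)."""
    x = np.asarray(x, dtype=float)
    running_sup = np.maximum.accumulate(np.maximum(-x, 0.0))
    return x + running_sup

def sample_regulated_path(F, sigma, T, n_steps, rng):
    """One grid approximation of X^F = Gamma(F + sigma * B) on [0, T]."""
    t = np.linspace(0.0, T, n_steps + 1)
    increments = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
    B = np.concatenate(([0.0], np.cumsum(increments)))
    return t, skorokhod_map(F(t) + sigma * B)

def cost(t, x, g, G, a1=1.0, a2=1.0):
    """a1 * int_0^T g(x(s)) ds + a2 * G(x(T)); trapezoidal rule for the integral."""
    gx = g(x)
    integral = np.sum(0.5 * (gx[1:] + gx[:-1]) * np.diff(t))
    return a1 * integral + a2 * G(x[-1])

rng = np.random.default_rng(0)
# Hypothetical drift F(t) = -0.5 t, which satisfies F(0) = 0.
t, x = sample_regulated_path(lambda s: -0.5 * s, sigma=1.0, T=1.0, n_steps=1000, rng=rng)
value = cost(t, x, g=lambda y: y, G=lambda y: y ** 2)
```

The regulated path stays nonnegative by construction, which is the "confinement to a subdomain" that the regulator map is meant to provide.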
Example 2 Suppose that $X^F = F + Z$, $Z = \sigma B$ and $\mathcal{F} := \{F \in C : L_F p = 0,\ p(0) = \delta_{x_0}\}$, where
$L_F \equiv \partial_t + F'(t)\partial_x - \frac{\sigma^2}{2}\partial_{xx}$ is the Fokker-Planck partial differential operator corresponding to $X^F$, $F'(t) = dF(t)/dt$
and $p(0) = \delta_{x_0}$ is the initial condition. The solution of this equation is the marginal density $p_F(t, \cdot)$ of $X^F(t)$.
Consider the cost functional $\tilde J(x) := \log\left(p_F(T, x(T))/p_0(x(T))\right)$, where $p_0(\cdot)$ is a reference density function,
and the optimization problem
\[
\min_{F \in \mathcal{F}} E\left[\tilde J(X^F)\right] = \int_{\mathbb{R}^d} \log\frac{p_F(T, z)}{p_0(z)}\, p_F(T, z)\, dz.
\]
Roughly speaking, this
problem computes arbitrary Gaussian approximations $p_F(T, \cdot)$ to $p_0(\cdot)$ by minimizing the Kullback-Leibler
divergence between these densities. This formulation underlies the use of so-called stochastic normalizing
flows for variational inference (VI) in probabilistic machine learning.
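To make the objective of Example 2 concrete, the sketch below compares a closed-form Kullback-Leibler divergence with its Monte Carlo (sample average) counterpart. It assumes d = 1, that the terminal marginal of X^F is N(F(T), σ²T), and that the reference density p_0 is itself Gaussian; these simplifications, and all numerical values, are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def gaussian_kl(m, s, m0, s0):
    """KL( N(m, s^2) || N(m0, s0^2) ) in closed form."""
    return np.log(s0 / s) + (s ** 2 + (m - m0) ** 2) / (2.0 * s0 ** 2) - 0.5

def log_normal_pdf(x, m, s):
    """Log-density of N(m, s^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi * s ** 2) - (x - m) ** 2 / (2.0 * s ** 2)

def kl_monte_carlo(m, s, m0, s0, n_samples, rng):
    """Sample average of log p_F(T, X) - log p_0(X) with X ~ N(m, s^2)."""
    x = rng.normal(m, s, size=n_samples)
    return np.mean(log_normal_pdf(x, m, s) - log_normal_pdf(x, m0, s0))

sigma, T, F_T = 1.0, 1.0, 1.0        # hypothetical volatility, horizon, and F(T)
m, s = F_T, sigma * np.sqrt(T)       # assumed terminal marginal N(F(T), sigma^2 T)
exact = gaussian_kl(m, s, m0=0.0, s0=2.0)
estimate = kl_monte_carlo(m, s, 0.0, 2.0, 200_000, np.random.default_rng(1))
```

With a few hundred thousand samples the sample average should sit close to the closed-form value, which is the sense in which a Monte Carlo objective approximates the true one.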
where the random variables Z := (Z1 , · · · , ZN ) are independent and identically distributed (iid). The main
idea in SAA is the recognition that since (MC-OPT) is a deterministic optimization problem that in a sense
approximates (OPT), a solution to (MC-OPT) might reasonably be expected to approximate a solution
to (OPT). While this idea is sound in principle, the context raises a number of statistical questions that
need resolution. Accordingly, this paper establishes the following two “first order” results.
1. Asymptotic Consistency. We first demonstrate that the optimal value and optimizers of (MC-OPT) are
asymptotically consistent (in the number of samples $N$ from $\pi_0$) by proving convergence in probability. Our
approach first establishes a novel uniform equiconvergence result over function spaces by showing
that the Gaussian complexity of the SAA estimator of the objective is inversely proportional to $\sqrt{N}$ (for
every $N$), assuming the diameter of the constraint set $\mathcal{F}$ is bounded.
2. Rate of Convergence. The Gaussian paths {Z j , j ≥ 1} from π0 in (MC-OPT) cannot in general be
sampled directly. For instance, if Z1 is a Brownian motion, then sample paths may (only) be approximated
using Euler-Maruyama or Euler-Milstein schemes (Asmussen and Glynn 2007). In other words, the
problem in (MC-OPT) is “fictitious” from the standpoint of computation and a further approximation
to (MC-OPT) is necessary for implementation. Furthermore, since F might be infinite dimensional, the
solving of (MC-OPT) must be performed (only) over a finite-dimensional subspace of the constraint set F
to allow computation using a method such as gradient descent. Our second main result is a convergence
rate result that accounts for the above two sources of error, and quantifies the expected decay rate of the true
Zhou, Honnappa, and Pasupathy
1.3 Literature
The use of SAA methodology toward stochastic optimization in $\mathbb{R}^d$ has an extensive literature, as
comprehensively surveyed in (Shapiro 2003; Shapiro et al. 2009; Kim et al. 2014; Pasupathy 2010; Banholzer
et al. 2019). Corresponding results in the non-Euclidean setting appear in (Dupačová and Wets 1988;
Robinson 1996), where the feasible set $\mathcal{F}$ is assumed to be finite dimensional. Especially relevant to what we
present here is the extensive treatment of consistency and uniform rate properties of M-estimators (van de
Geer 2000; Bose 1998) in normed spaces. (A solution to (OPT) is indeed an M-estimator.) However, we
are not aware of infinite-dimensional SAA rate results in the typical context where the SAA estimator is
not available in “closed form” but is computed using an iterative optimization technique. Nonetheless, as
pointed out in the introduction, there are a number of problems that require optimization over function
spaces, wherein SAA is a natural approximation to such problems.
2 PRELIMINARIES
In this section, we discuss mathematical preliminaries including key definitions, assumptions, and notation.
It can be shown that $x : \mathcal{F} \to \mathbb{R}$ is a bounded linear functional if and only if $x$ is continuous on $\mathcal{F}$, and that
continuity of $x$ at any point $F_0 \in \mathcal{F}$ implies boundedness of $x$. (It is important to note that $x : \mathcal{F} \to \mathbb{R}$ being bounded
does not mean $\sup_{F \in \mathcal{F}} |x(F)| < \infty$; indeed, it is routinely the case that $\|x\| < \infty$ but $\sup_{F \in \mathcal{F}} |x(F)| = \infty$.)
Definition 2 (Dual Space, Adjoint Space, Conjugate Space) The space $\mathcal{F}^*$ of linear functionals on $\mathcal{F}$
is called the algebraic dual space of $\mathcal{F}$. $\mathcal{F}^*$ should be distinguished from the dual space $\mathcal{F}'$, which is
the space of bounded linear functionals on $\mathcal{F}$. $\mathcal{F}^*$ is sometimes also called the adjoint space or the
conjugate space of $\mathcal{F}$.
Definition 3 (Dual Norm) The (operator) norm of the functional $T \in \mathcal{F}^*$ is called the dual norm or
conjugate norm of $T$:
\[
\|T\|_* := \sup\left\{ \frac{|Tx|}{\|x\|} : x \in \mathcal{F},\ x \neq 0 \right\} = \sup\left\{ |Tx| : x \in \mathcal{F},\ \|x\| = 1 \right\}. \tag{1}
\]
Definition 4 (Right and Left Directional Derivatives) The right directional derivative $J'_+(F, v)$ and the left
directional derivative $J'_-(F, v)$ of the functional $J : \mathcal{F} \to \mathbb{R}$ at the point $F \in \mathcal{F}$ are defined as
\[
J'_+(F, v) := \lim_{t \to 0^+} \frac{1}{t}\left(J(F + tv) - J(F)\right); \qquad
J'_-(F, v) := \lim_{t \to 0^-} \frac{1}{t}\left(J(F + tv) - J(F)\right).
\]
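A small numerical illustration of Definition 4 may help: at a kink, the two one-sided derivatives differ. The sketch below uses finite differences as a proxy for the limits, with the supremum norm on a grid standing in for J; the grid, the step size t, and the direction v are assumptions chosen only for illustration.

```python
import numpy as np

def one_sided_derivatives(J, F, v, t=1e-6):
    """Finite-difference proxies for J'_+(F, v) and J'_-(F, v)."""
    right = (J(F + t * v) - J(F)) / t        # t -> 0 from above
    left = (J(F - t * v) - J(F)) / (-t)      # t -> 0 from below
    return right, left

def sup_norm(F):
    """J(F) = ||F||_inf evaluated on the grid."""
    return np.max(np.abs(F))

grid = np.linspace(0.0, 1.0, 101)
F = np.zeros_like(grid)              # the zero function, where the sup norm has a kink
v = np.sin(np.pi * grid)             # hypothetical direction with sup norm 1
right, left = one_sided_derivatives(sup_norm, F, v)
```

Here the right derivative equals the sup norm of v and the left derivative is its negative, so J is directionally differentiable at the zero function but not differentiable there.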
Definition 8 (Push-Forward Measure) This paper focuses on stochastic optimization problems defined with
respect to regulated Gaussian processes, $X^F = \Gamma(Z + F)$, whose paths are confined to a subdomain of $C$. To
define the measure corresponding to such a regulated process, define the shift operator $T_g(x) : (\mathcal{F}, C) \to C$
as $T_g(x) = g + x$, and the push-forward measure corresponding to the shift operator $\pi_g(A) := (T_g)_*(\pi_0)(A) =
\pi_0(T_g^{-1}(A))$ for any $A \in \mathcal{C}$. Then, the push-forward measure corresponding to $X^F$ is defined as $\pi_F^{\Gamma}(A) =
\pi_F(\Gamma^{-1}(A))$, for any $A \in \mathcal{C}$.
where $K_x > 0$ for every sample path $x \in C$, $F_1, F_2 \in \mathcal{F}$, and $E[K_Z^p] < +\infty$ for some $2 \le p < +\infty$.
Assumption 3 The cost functional $\tilde J$ is $\kappa$-Lipschitz in $z \in C$, i.e., for any $F \in \mathcal{F}$ and $z, z' \in C$, we have
\[
|\tilde J(z + F) - \tilde J(z' + F)| \le \kappa \|z - z'\|_\infty. \tag{5}
\]
Assumption 4 The cost functional $\tilde J : C \to \mathbb{R}$ is sufficiently regular such that the composite functional
$\tilde J \circ \Gamma : C \to \mathbb{R}$ is integrable, that is, $\int_C (\tilde J \circ \Gamma)(x + F)\, d\pi_0(x) < +\infty$.
Assumption 5 The composition $\tilde J \circ \Gamma : C \times \mathcal{F} \to C$ is $L_{\Gamma,Z}$-Lipschitz in $F$, that is, for any $Z \in C$,
$\|\tilde J \circ \Gamma(Z + F_1) - \tilde J \circ \Gamma(Z + F_2)\| \le L_{\Gamma,Z} \|F_1 - F_2\|$, where $E[L_{\Gamma,Z}^2] < \infty$.
This assumption is easily satisfied by the Skorokhod regulator, which is 2-Lipschitz continuous in the
space $C$.
We assume that the subspace $\mathcal{F} \subseteq C$ satisfies
Assumption 6 $\mathcal{F}$ has a finite diameter. That is, $\mathrm{diam}(\mathcal{F}) := \sup_{F_1, F_2 \in \mathcal{F}} \|F_1 - F_2\|_\infty < +\infty$.
This is a reasonably strong assumption that is nonetheless satisfied by many problem settings; for instance,
if the function class $\mathcal{F}$ is parameterized by a compact set. We also believe that it should be possible to
relax this condition, at the expense of more complicated computations.
where the expectation is taken with respect to the Gaussian random vector $g \sim N(0, I_{N \times N})$, which is
independent of the iid samples $Z := (Z_1, \dots, Z_N)$.
Next, define the $\mathbb{R}^N$-valued random field $G_\cdot(\cdot)$ as $F \mapsto G_F(Z) := \left(\tilde J(Z_1 + F), \dots, \tilde J(Z_N + F)\right)$. For each
$z \in C^N$ define the set $B \equiv B(z) := \{G_F(z) : F \in \mathcal{F}\} \subseteq \mathbb{R}^N$, and the pseudometric $d : B \times B \to [0, \infty)$,
given by
\[
d(x, y) = \frac{1}{\sqrt{N}} \|K(z)\|_p \|F_x - F_y\|_\infty, \tag{7}
\]
where $F_x, F_y \in \mathcal{F}$ correspond to $x, y$ (respectively) through the map $G_\cdot$, $\|x\|_p := \left(\sum_{i=1}^N |x_i|^p\right)^{1/p}$ for any $x \in \mathbb{R}^N$,
and $K_{z_i}$ are the Lipschitz variables defined in Assumption 2.
Let $\{Y_F(Z) : F \in \mathcal{F}\}$ be the real-valued random field defined as $Y_F(Z) := \frac{1}{\sqrt{N}} \sum_{i=1}^N g_i \tilde J(Z_i + F)$, where $g$ is
an $N$-dimensional standard Gaussian random vector, as before. The next lemma shows that $\{Y_F(Z) : F \in \mathcal{F}\}$
satisfies a sub-Gaussian concentration inequality, conditioned on $Z$.
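Lemma 1 For any $F, G \in \mathcal{F}$ such that $\|F - G\|_\infty \neq 0$ we have
\[
P\left(|Y_F(z) - Y_G(z)| > u \mid Z = z\right) \le 2\exp\left(-\frac{u^2}{2L^2\, d(G_F(z), G_G(z))^2}\right),
\]
where $d(\cdot, \cdot)$ is defined in (7) and $L := \sup_{y \in \mathbb{R}^N : \|y\|_2 = 1} \|y\|_q = 1$ for $q \ge 2$.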
Proof. Fix $F, G \in \mathcal{F}$ such that $F \neq G$. By Hölder’s inequality we have $|Y_F(Z) - Y_G(Z)| \le \frac{1}{\sqrt{N}} \|g\|_q \|G_F(Z) - G_G(Z)\|_p$,
where $\frac{1}{p} + \frac{1}{q} = 1$ and $q \ge 2$. Next, following Assumption 2, we have
\[
\|G_F - G_G\|_p = \left(\sum_{i=1}^N |\tilde J(Z_i + F) - \tilde J(Z_i + G)|^p\right)^{1/p} \le \left(\sum_{i=1}^N |K_{Z_i}|^p \|F - G\|_\infty^p\right)^{1/p} = \|F - G\|_\infty \|K(Z)\|_p.
\]
It follows that
\[
P\left(|Y_F(z) - Y_G(z)| > u \mid Z = z\right) \le P\left(\|g\|_q > \frac{u\sqrt{N}}{\|K(z)\|_p \|F - G\|_\infty} \,\Big|\, Z = z\right). \tag{8}
\]
It is straightforward to see that $x \mapsto \|x\|_q$ is a Lipschitz function from $\mathbb{R}^N$ to $\mathbb{R}$. Then, by (Boucheron, Lugosi,
and Massart 2013, Theorem 5.6), $\|g\|_q$ satisfies the sub-Gaussian concentration inequality
$P(\|g\|_q > \varepsilon) \le 2\exp\left(-\frac{\varepsilon^2}{2L^2}\right)$, where $L$ is defined above. Applying this to (8) completes the proof.
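The random field $Y_F$ used in Lemma 1 is easy to simulate conditional on a fixed sample. The sketch below estimates, by Monte Carlo over the Gaussian vector g, the expected supremum of $Y_F$ over a finite family of candidate drifts. The finite family of linear drifts, the terminal-value cost, the grid discretization of the paths, and the absence of a regulator are all assumptions made for illustration only.

```python
import numpy as np

def cost_matrix(Z, drifts, J):
    """V[f, i] = J(Z_i + F_f) for each candidate drift F_f and sample path Z_i."""
    return np.array([[J(z + F) for z in Z] for F in drifts])

def sup_of_field(V, g_draws):
    """Mean over g of sup_F Y_F, where Y_F = N^{-1/2} * sum_i g_i * V[F, i]."""
    N = V.shape[1]
    Y = g_draws @ V.T / np.sqrt(N)       # shape (n_draws, n_candidates)
    return Y.max(axis=1).mean()

rng = np.random.default_rng(2)
N, m, T = 50, 100, 1.0
t = np.linspace(0.0, T, m + 1)
dB = rng.normal(0.0, np.sqrt(T / m), size=(N, m))
Z = np.concatenate([np.zeros((N, 1)), np.cumsum(dB, axis=1)], axis=1)

drifts = [c * t for c in np.linspace(-1.0, 1.0, 21)]   # hypothetical finite family
V = cost_matrix(Z, drifts, J=lambda x: x[-1] ** 2)     # hypothetical cost x(T)^2
g_draws = rng.normal(size=(2000, N))

full = sup_of_field(V, g_draws)        # supremum over all 21 candidates
half = sup_of_field(V[:11], g_draws)   # supremum over a sub-family
```

Because the supremum is taken pathwise, enlarging the family can only increase the estimate, which mirrors how the complexity term grows with the size of the constraint set.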
Next, we show that an ε-cover under the pseudometric can be “translated” into a corresponding ε-cover
under the supremum-norm.
Lemma 2 Fix $\varepsilon > 0$. Let $z = (z_1, \dots, z_N) \in C^N$ and suppose $B_1, \dots, B_l \subset \mathbb{R}^N$ is an $\varepsilon$-cover of $B = \{G_F(z) :
F \in \mathcal{F}\}$ under the pseudometric (7). Then, there exist subsets $B'_1, \dots, B'_l$ that form an $\varepsilon'$-cover of $\mathcal{F}$ under
the supremum norm $\|\cdot\|_\infty$, with $\varepsilon' = \frac{\varepsilon\sqrt{N}}{\|K(z)\|_p}$.
Proof. It is straightforward to see that $\{Y_F(Z) : F \in \mathcal{F}\}$ is a separable random field. Further, by Lemma 1
$\{Y_F(Z) : F \in \mathcal{F}\}$ is sub-Gaussian. By Assumption 6, and the definition of the pseudometric $d$, we have
$D := \sup_{z_1, z_2 \in B} d(z_1, z_2) < +\infty$. By Dudley’s Theorem for separable random fields it follows that there
exists a constant $0 < C_0 < +\infty$ such that $E_g\left[\sup_{F \in \mathcal{F}} |Y_F - Y_{F_0}| \mid Z = z\right] \le C_0 \int_0^{D/2} \sqrt{\log N(\varepsilon, B, d)}\, d\varepsilon$. By
Lemma 2 it follows that $N(\varepsilon, B, d) = N(\varepsilon', \mathcal{F}, \|\cdot\|_\infty)$, where $\varepsilon' = \frac{\varepsilon\sqrt{N}}{\|K(z)\|_p}$. Then, changing variables in
Note that by Assumption 6 it follows that the right hand side above is finite. Setting C = C0 α−1
α
completes
the proof.
Recall that the sub-Gaussian diameter for a metric probability space $(\mathcal{X}, d, \pi)$ with metric $d$ and measure
$\pi$ is defined as $\Delta_{SG}^2(\mathcal{X}) := \sigma^{*2}(Y)$, where $\sigma^*(Y)$ is the smallest $\sigma$ that satisfies $E\left[e^{\lambda Y}\right] \le e^{\sigma^2\lambda^2/2}$, $\lambda \in \mathbb{R}$,
$Y := \varepsilon\, d(X, X')$ is the symmetrized distance on the metric space $\mathcal{X}$, $\varepsilon = \pm 1$ with probability 1/2 and $X, X'$
are $\mathcal{X}$-valued random variables with measure $\pi$. Consider the following generalization of McDiarmid’s
inequality.
Theorem 1 (Theorem 1 (Kontorovich 2014)) Let $(\mathcal{X}, d, \pi)$ be a metric probability space that satisfies $\Delta_{SG}(\mathcal{X}) < +\infty$,
and let $\varphi : \mathcal{X}^N \to \mathbb{R}$ be 1-Lipschitz. Then $E_\pi[\varphi(Z)] < +\infty$, and
\[
\pi\left(|\varphi(Z) - E_\pi[\varphi(Z)]| > t\right) \le 2\exp\left(-\frac{t^2}{2N\Delta_{SG}^2(\mathcal{X})}\right),
\]
where $Z = (Z_1, \dots, Z_N)$ is an independent sample drawn from $\pi$.
Observe that this result significantly loosens the requirements in McDiarmid’s inequality from
boundedness to Lipschitz continuity.
Proposition 2 Let $Z = (Z_1, \dots, Z_N)$ be $N$ i.i.d. random variables with measure $\pi_0$. Suppose the cost function
satisfies Assumption 3. Suppose that the metric probability space $(C, \|\cdot\|_\infty, \pi_0)$ satisfies $\Delta_{SG}(C) < +\infty$.
Then for any $F \in \mathcal{F}$ and $\delta > 0$, with probability at least $1 - \delta$, we have
\[
J(F) \le \frac{1}{N}\sum_{i=1}^N \tilde J(Z_i + F) + E\left[\sup_{G \in \mathcal{F}}\left\{J(G) - \frac{1}{N}\sum_{i=1}^N \tilde J(Z_i + G)\right\}\right] + \left(\frac{2\kappa^2 \Delta_{SG}^2(C)\log(1/\delta)}{N}\right)^{1/2}. \tag{10}
\]
Remark: We note that the assumption that $\Delta_{SG}(C) < +\infty$ is reasonable; for instance, it is satisfied in the
case where $\pi_0$ is the Wiener measure.
Proof. We start by considering the functional $\varphi : C^N \to \mathbb{R}$ defined as $\varphi(z) :=
\sup_{F \in \mathcal{F}}\left\{J(F) - \frac{1}{N}\sum_{i=1}^N \tilde J(z_i + F)\right\}$, for any $z \in C^N$. Let $z = (z_1, \dots, z_N) \in C^N$, $z' = (z'_1, \dots, z'_N) \in C^N$; the
metric distance between these vectors of functions is given by $\|z - z'\| = \sum_{i=1}^N \|z_i - z'_i\|_\infty$. Also, define the
sequence of vectors $z^1 = (z'_1, z_2, \dots, z_N)$, $z^2 = (z'_1, z'_2, z_3, \dots, z_N)$, ..., $z^N = (z'_1, z'_2, \dots, z'_N) \equiv z'$. Using the
triangle inequality, it is straightforward to see that
where each pair of $z^{k-1}$ and $z^k$ differs only by the $k$th element. Let $z^k(i)$ represent the $i$th element of the
$k$th vector and $F^* \in \mathcal{F}$ be the function that achieves the supremum in $\varphi(z)$. For any such pair of vectors,
we have
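\[
\begin{aligned}
\varphi(z^{k-1}) - \varphi(z^k) &= \sup_{F \in \mathcal{F}}\left\{J(F) - \frac{1}{N}\sum_{i=1}^N \tilde J(z^{k-1}(i) + F)\right\} \\
&\quad - \sup_{F \in \mathcal{F}}\left\{J(F) - \frac{1}{N}\sum_{i=1}^N \tilde J(z^{k-1}(i) + F) + \frac{1}{N}\left(\tilde J(z_k + F) - \tilde J(z'_k + F)\right)\right\} \\
&\le \frac{1}{N}\left(\tilde J(z_k + F^*) - \tilde J(z'_k + F^*)\right) \le \frac{\kappa}{N}\|z_k - z'_k\|_\infty,
\end{aligned}
\]
where the last inequality follows from Assumption 3. Consequently, substituting this into (11) we have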
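Now, by hypothesis we have $\Delta_{SG}^2(C) < +\infty$, and therefore applying Theorem 1 we have
\[
P\left(\varphi - E(\varphi) > t\right) = P\left(\frac{N}{\kappa}(\varphi - E(\varphi)) > \frac{N}{\kappa}t\right) \le \exp\left(-\frac{Nt^2}{2\kappa^2\Delta_{SG}^2(C)}\right).
\]
Now, for any $\delta > 0$, $\exp\left(-\frac{Nt^2}{2\kappa^2\Delta_{SG}^2(C)}\right) \le \delta$
implies that $t \ge \left(\frac{2\kappa^2\Delta_{SG}^2(C)\log(1/\delta)}{N}\right)^{1/2}$. Hence, with probability at least $1 - \delta$, we have
$\varphi < E(\varphi) + \left(\frac{2\kappa^2\Delta_{SG}^2(C)\log(1/\delta)}{N}\right)^{1/2}$, which yields the final expression in (10).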
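The deviation term in (10) can be evaluated directly, and inverted to obtain a sample size for a target accuracy. The sketch below does both; the values chosen for κ, the sub-Gaussian diameter, and δ are placeholders, since the paper does not fix them numerically.

```python
import math

def deviation_radius(kappa, delta_sg, delta, N):
    """(2 kappa^2 Delta_SG^2 log(1/delta) / N)^(1/2), the last term of (10)."""
    return math.sqrt(2.0 * kappa ** 2 * delta_sg ** 2 * math.log(1.0 / delta) / N)

def samples_for_radius(kappa, delta_sg, delta, radius):
    """Smallest integer N for which the deviation term is at most `radius`."""
    return math.ceil(2.0 * kappa ** 2 * delta_sg ** 2 * math.log(1.0 / delta) / radius ** 2)

# Placeholder constants: kappa = 1, Delta_SG = 1, confidence level 1 - delta = 0.95.
r = deviation_radius(kappa=1.0, delta_sg=1.0, delta=0.05, N=1000)
N_needed = samples_for_radius(kappa=1.0, delta_sg=1.0, delta=0.05, radius=0.05)
```

Quadrupling N halves the radius, which is the $1/\sqrt{N}$ behavior that carries through to Theorem 2.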
Now, our main sample complexity result follows by combining Proposition 1 and Proposition 2. By
taking an expectation with respect to $\pi_0$ over (9) it follows that the Gaussian complexity of the function
space $\mathcal{F}$ is
\[
R_N(\mathcal{F}) := \frac{C\, E[\|K(Z)\|_p]}{\sqrt{N}} \left(\frac{1}{2}\,\mathrm{diam}(\mathcal{F})\right)^{\frac{\alpha-1}{\alpha}}, \tag{13}
\]
provided $E_{\pi_0}[\|K(Z)\|_p] < +\infty$; this is a consequence of Assumption 2.
Theorem 2 Let $\mathcal{F} \subseteq C$ satisfy Assumption 6 and suppose that $\log N(\varepsilon, \mathcal{F}, \|\cdot\|_\infty) \le \varepsilon^{-1/\alpha}$ for $\alpha \ge 1$ and
$\varepsilon > 0$. Suppose the cost function $\tilde J$ satisfies Assumption 1, Assumption 2 (for some $1 \le p < +\infty$) and
Assumption 3. Let $Z = (Z_1, \dots, Z_N)$ be an i.i.d. sample drawn from $\pi_0$. Then, for any $\delta > 0$ and some
$1 \le p < +\infty$, with probability at least $1 - \delta$, for any $F \in \mathcal{F}$ we have
\[
J(F) \le \frac{1}{N}\sum_{i=1}^N \tilde J(Z_i + F) + 2R_N(\mathcal{F}) + O\left(\sqrt{\frac{\log(1/\delta)}{N}}\right).
\]
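$J^* < \tilde J_N^*$. In the former case, $|J^* - \tilde J_N^*| < |J(\Pi_n^*) - \tilde J_N^*|$. In the latter case, $|J^* - \tilde J_N^*| < \left|\frac{1}{N}\sum_{i=1}^N \tilde J(Z_i + \pi^*) - J^*\right|$.
Therefore,
\[
|J^* - \tilde J_N^*| < \max\left\{|J(\Pi_n^*) - \tilde J_N^*|,\ \left|\frac{1}{N}\sum_{i=1}^N \tilde J(Z_i + \pi^*) - J^*\right|\right\} < |J(\Pi_n^*) - \tilde J_N^*| + \left|\frac{1}{N}\sum_{i=1}^N \tilde J(Z_i + \pi^*) - J^*\right| \xrightarrow{P} 0
\]
as $N \to \infty$ by Theorem 2.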
4 RATE OF CONVERGENCE
We introduce further notation to keep our exposition clear. Recall that $\mathcal{F}$ is a compact subspace of
the space of continuous functions on $[0, T]$. Suppose $F \in \mathcal{F}$ and that we can generate $N$ independent
realizations of the process $\{Z_h(t), t \in [0, T]\}$ with measure $\pi_{0,h}^{\Gamma}$ and having continuous paths and having
Let $\mathcal{F}_n$ denote an $n$-dimensional ($n < \infty$) closed subspace of $\mathcal{F}$ such that elements in $\mathcal{F}$ can be approached
by a sequence of elements in $\mathcal{F}_n$, that is, for every $F \in \mathcal{F}$, there exists $\{F_n, n \ge 1\}$, $F_n \in \mathcal{F}_n$ such that
$\|F_n - F\| \to 0$. An example of $\mathcal{F}_n$ is the span of the first $n$ Legendre polynomials (Kreyszig 1989, p. 176)
on the interval $[0, T]$. More generally, $\mathcal{F}_n$ can be chosen as the span of the first $n$ elements of any Schauder
basis of $\mathcal{F}$. (Recall that a sequence $\{P_j, j \ge 1\}$ of vectors in a normed space $\mathcal{F}$ is called a Schauder basis
of $\mathcal{F}$ if for every $F \in \mathcal{F}$ there is a unique sequence $\{a_j, j \ge 1\}$ of scalars such that $\|F - \sum_{j=1}^n a_j P_j\| \to 0$
as $n \to \infty$.) Consequently, we assume that
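The approximation property of the Legendre subspaces can be checked numerically. The sketch below fits a function by the first n Legendre polynomials mapped to [0, T] and reports how the approximation error shrinks with n. It uses a least-squares fit on a grid and a root-mean-square error as stand-ins for the projection and norm of the paper, and the test function is a hypothetical choice; all three are assumptions of this illustration.

```python
import numpy as np
from numpy.polynomial import Legendre

def project_onto_legendre(F_values, t, n, T):
    """Least-squares fit of F by the first n Legendre polynomials on [0, T]."""
    series = Legendre.fit(t, F_values, deg=n - 1, domain=[0.0, T])
    return series(t)

T = 1.0
t = np.linspace(0.0, T, 2001)
F = np.abs(t - 0.5) - 0.5            # absolutely continuous with F(0) = 0

errors = {}
for n in (2, 5, 17):
    F_n = project_onto_legendre(F, t, n, T)
    errors[n] = np.sqrt(np.mean((F_n - F) ** 2))   # grid proxy for ||F_n - F||
```

The decreasing sequence of errors plays the role of the projection error g(n) that appears in the rate result.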
Assumption 7 The closed finite-dimensional function subspace Fn ⊂ F is such that
where g(n) → 0 as n → ∞.
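With the above notation in place, the SAA problem (MC-n-OPT) approximating (OPT) is:
\[
\min \; J_{N,h}(F) := \left\{\frac{1}{N}\sum_{j=1}^N \tilde J \circ \Gamma(Z_{h,j} + F),\quad Z_{h,j} \overset{iid}{\sim} \pi_{0,h}\right\} \qquad \text{s.t. } F \in \mathcal{F}_n, \tag{MC-n-OPT}
\]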
where the measure $\pi_{0,h}$ approximates the measure $\pi_0$. For brevity, we will write $Z_{h,j}$ as $Z_j$ in the remainder
of this section. To facilitate a basic result that quantifies the quality of the solution to (MC-n-OPT) as an
estimator of the solution to (OPT) we assume that
Assumption 8 The random functional $F \mapsto \tilde J \circ \Gamma(Z + F)$ is convex in $F$.
Observe that the problem in (MC-n-OPT) is obtained by replacing the expectation appearing in (OPT) by
a Monte Carlo sum obtained by generating $N$ samples of a process $\{X_h^F(t), t \in [0, T]\}$ that approximates the
process $\{X^F(t), t \in [0, T]\}$. We define the following optimal values and optimal solution (sets) corresponding
to (OPT) and (MC-n-OPT), the existence of which will become evident.
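A minimal end-to-end sketch of (MC-n-OPT) may help fix ideas. It makes several simplifying assumptions that are not taken from the paper: the regulator is the identity (as in Example 2), the cost is a Riemann sum of $(x(s) - r(s))^2$ for a hypothetical target path r, the subspace is spanned by the monomials $t, \dots, t^n$, and plain gradient descent is used in place of the mirror descent analyzed later.

```python
import numpy as np

def sample_paths(N, T, h, sigma, rng):
    """N grid approximations of Z = sigma * B with step size h."""
    m = int(round(T / h))
    t = np.linspace(0.0, T, m + 1)
    dB = rng.normal(0.0, np.sqrt(h), size=(N, m))
    Z = sigma * np.concatenate([np.zeros((N, 1)), np.cumsum(dB, axis=1)], axis=1)
    return t, Z

def saa_objective(a, Phi, Z, r, h):
    """Value and gradient of J_{N,h}(F) for F = Phi @ a under the quadratic cost."""
    resid = Z + Phi @ a - r                          # shape (N, m + 1)
    value = h * np.mean(np.sum(resid ** 2, axis=1))
    grad = 2.0 * h * Phi.T @ resid.mean(axis=0)
    return value, grad

rng = np.random.default_rng(3)
N, T, h, n = 200, 1.0, 0.01, 3
t, Z = sample_paths(N, T, h, sigma=0.5, rng=rng)
Phi = np.stack([t ** (j + 1) for j in range(n)], axis=1)   # hypothetical basis of F_n
r = np.sin(np.pi * t)                                      # hypothetical target path

# Step size 1/L, where L = 2 h ||Phi||_2^2 is the Lipschitz constant of the gradient.
step = 1.0 / (2.0 * h * np.linalg.norm(Phi, 2) ** 2)
a = np.zeros(n)
history = []
for _ in range(500):                                       # k = 500 optimization steps
    value, grad = saa_objective(a, Phi, Z, r, h)
    history.append(value)
    a = a - step * grad
```

The four tuning quantities of the section all appear here: the sample size N, the grid step h, the subspace dimension n, and the number of optimization steps k.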
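Lemma 3 Let Assumption 2 and Assumption 6 hold, and suppose there exists $F_0 \in \mathcal{F}$ such that
\[
\sigma_0^2(h) := \mathrm{Var}(\tilde J(X_h^{F_0})) < \infty; \qquad X_h^{F_0} \overset{iid}{\sim} \pi_{F_0,h}^{\Gamma}. \tag{16}
\]
Then,
\[
\sup_{F \in \mathcal{F}} \mathrm{Var}(\tilde J(X_h^F)) \le \left(\sigma_0(h) + \mathrm{diam}(\mathcal{F})\sqrt{E[L_{\Gamma,Z}^2]}\right)^2. \tag{17}
\]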
\[
\mathrm{Var}\left(\tilde J(X_h^F) - \tilde J(X_h^{F_0})\right) \le E[L_{\Gamma,Z}^2]\, \mathrm{diam}^2(\mathcal{F}). \tag{20}
\]
Use (18) and (20) along with (16) to conclude that the assertion of the lemma holds.
We now present the main rate result governing the solution estimator $F_{N,n,k}^*$ of (MC-n-OPT).
Theorem 3 Let Assumptions 2, 6, 7, 8 hold, and suppose that the method used to generate paths $X_h^F \overset{iid}{\sim} \pi_{F,h}^{\Gamma}$
exhibits weak convergence order $\beta$, implying that there exists $\ell_1 < \infty$ such that
\[
\sup_{F \in \mathcal{F}} \left|E[\tilde J(X_h^F)] - J(F)\right| \le \ell_1 h^\beta. \tag{21}
\]
Furthermore, suppose mirror descent (Bubeck 2015, p. 80) is executed for $k$ steps on (MC-n-OPT):
\[
\sup_{F \in \mathcal{F}} \|S_{J_{N,h}}(F)\|_* \le \bar K; \quad S_{J_{N,h}}(F) \in \partial J_{N,h}(F); \quad E[K_{Z_j}^2] < \infty, \tag{23}
\]
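where $S_{J_{N,h}}(F)$ is a subgradient and $\partial J_{N,h}(F)$ the subdifferential of the convex functional $J_{N,h}$ at the point
$F$. Then, for all $k \ge 1$,
\[
0 \le E\left[J(F_{N,n,k}^*)\right] - J(F^*) \le \frac{c_1}{\sqrt{k}} + \frac{c_2}{\sqrt{N}} + c_3 h^\beta + c_4 g(n), \tag{24}
\]
where
\[
\begin{aligned}
c_1 &:= \sqrt{\frac{2}{\rho} E[R^2]} \left(\frac{1}{k}\mathrm{Var}(K_Z) + E[K_Z^2]\right)^{1/2}; \\
c_2 &:= \frac{3}{\sqrt{N}} \left(\mathrm{diam}(\mathcal{F})\sqrt{E[L_{\Gamma,Z}^2]} + \sigma_0(h)\right); \\
c_3 &:= \ell_1; \quad \text{and} \\
c_4 &:= E[K_Z].
\end{aligned} \tag{25}
\]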
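The bound (24) suggests how the four tuning quantities might be balanced against each other. The sketch below evaluates its right-hand side and picks k, h and n so that every term matches the $N^{-1/2}$ sampling rate. It treats $c_1, \dots, c_4$ as fixed constants (ignoring their dependence on k, N and h in (25)) and uses a hypothetical projection error g(n) = 1/n; both are simplifying assumptions of this illustration.

```python
import math

def rate_bound(k, N, h, n, c1, c2, c3, c4, beta, g):
    """Right-hand side of (24): c1/sqrt(k) + c2/sqrt(N) + c3*h^beta + c4*g(n)."""
    return c1 / math.sqrt(k) + c2 / math.sqrt(N) + c3 * h ** beta + c4 * g(n)

def balanced_parameters(N, beta, g_inverse):
    """Choose k, h, n so that each error term is of order N^{-1/2}."""
    target = 1.0 / math.sqrt(N)
    return {"k": N, "h": target ** (1.0 / beta), "n": g_inverse(target)}

def g(n):
    """Hypothetical projection error; the paper only requires g(n) -> 0."""
    return 1.0 / n

params = balanced_parameters(N=10_000, beta=1.0,
                             g_inverse=lambda eps: math.ceil(1.0 / eps))
bound = rate_bound(N=10_000, c1=1.0, c2=1.0, c3=1.0, c4=1.0, beta=1.0, g=g, **params)
```

Under this balancing no single error source dominates, so additional effort spent on any one of k, N, h or n alone would not improve the overall rate.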
Proof. Observe that
\[
\begin{aligned}
0 \le J(F_{N,n,k}^*) - J(F^*) &= J(F_{N,n,k}^*) - J_{N,h}(F_{N,n,k}^*) + J_{N,h}(F_{N,n,k}^*) - J_{N,h}(F_{N,n}^*) \\
&\quad + J_{N,h}(F_{N,n}^*) - J(F_n^*) + J(F_n^*) - J(F^*) \\
&\le J_{N,h}(F_{N,n,k}^*) - J_{N,h}(F_{N,n}^*) + \sum_{F \in \{F_{N,n,k}^*,\, F_{N,n}^*,\, F_n^*\}} |J_{N,h}(F) - J(F)| + J(F_n^*) - J(F^*)
\end{aligned}
\]
where the penultimate inequality in (26) follows from rearrangement of terms and the last inequality follows
upon using the sub-gradient inequality (3) for the convex functional $J(\cdot)$. Now we quantify (in expectation)
each of the error terms appearing on the right-hand side of (26). Applying mirror descent’s complexity
bound (Bubeck 2015, p. 80) on the $\bar K$-smooth function $J_{N,h}(\cdot)$ and taking expectation, we get
\[
0 \le E\left[J_{N,h}(F_{N,n,k}^*) - J_{N,h}(F_{N,n}^*)\right] \le \frac{1}{\sqrt{k}} \sqrt{\frac{2}{\rho} E[R^2]} \left(\frac{1}{k}\mathrm{Var}(K_Z) + E[K_Z^2]\right)^{1/2}. \tag{27}
\]
where the last inequality in (30) is from Assumption 2. Now use (27), (28), (29), and (30) to conclude.
REFERENCES
Asmussen, S., and P. W. Glynn. 2007. Stochastic Simulation: Algorithms and Analysis. New York, NY:
Springer.
Banholzer, D., J. Fliege, and R. Werner. 2019. “On rates of convergence for sample average approximations
in the almost sure sense and in mean”. Mathematical Programming 191(1):307–345.
Bartlett, P. L., and S. Mendelson. 2002. “Rademacher and Gaussian complexities: Risk bounds and structural
results”. Journal of Machine Learning Research 3(Nov):463–482.
Bose, A. 1998. “Bahadur representation of Mm estimates”. The Annals of Statistics 26(2):771–777.
Boucheron, S., G. Lugosi, and P. Massart. 2013. Concentration inequalities: A nonasymptotic theory of
independence. Oxford university press.
Bubeck, S. 2015. “Convex Optimization: Algorithms and Complexity”. Foundations and Trends in Machine
Learning 8(3–4):231–358.
Chen, H., and D. D. Yao. 2001. Fundamentals of queueing networks: Performance, asymptotics, and
optimization, Volume 4. Springer.
Dupačová, J., and R. J. B. Wets. 1988. “Asymptotic behavior of statistical estimators and of optimal solutions
of stochastic optimization problems”. The Annals of Statistics 16(4):1517–1549.
Kim, S., R. Pasupathy, and S. G. Henderson. 2014. “A Guide to SAA”. In Encyclopedia of Operations
Research and Management Science, edited by M. Fu, Hillier and Lieberman OR Series. Elsevier.
Kontorovich, A. 2014. “Concentration in unbounded metric spaces and algorithmic stability”. In Proceedings
of the 31st International Conference on Machine Learning. June 22nd -24th , Beijing, China, 28–36.
Kreyszig, E. 1989. Introductory functional analysis with applications. Wiley Classics Library ed. Wiley
classics library. New York: Wiley.
Nesterov, Y. 2004. Introductory Lectures on Convex Optimization: A Basic Course. New York, NY: Springer
Science + Business Media, LLC.
Pasupathy, R. 2010. “On Choosing Parameters in Retrospective-Approximation Algorithms for Stochastic
Root Finding and Simulation Optimization”. Operations Research 58(4):889–901.
Robinson, S. 1996. “Analysis of Sample-path Optimization”. Mathematics of Operations Re-
search 21(3):513–528.
Shapiro, A. 2003. “Monte Carlo sampling methods”. Handbooks in operations research and management
science 10:353–425.
Shapiro, A., D. Dentcheva, and A. Ruszczynski. 2009. Lectures on Stochastic Programming: Modeling
and Theory. 2nd ed. Philadelphia, Pennsylvania: Society for Industrial and Applied Mathematics.
van de Geer, S. 2000. Empirical Processes in M-estimation. Cambridge, UK: Cambridge University Press.
AUTHOR BIOGRAPHY
ZIHE ZHOU is a graduate student in the School of Industrial Engineering at Purdue University. Her re-
search interests lie broadly in applied probability and simulation. Her email address is zhou408@purdue.edu.
RAGHU PASUPATHY is Professor of Statistics at Purdue University. His current research interests
lie broadly in general simulation methodology, stochastic optimization, statistical inference and sta-
tistical computation. Raghu Pasupathy’s email address is pasupath@purdue.edu, and his web page
https://web.ics.purdue.edu/∼pasupath contains links to papers, software codes, and other material.