Editor: Jianfeng Lu
Abstract
A method for obtaining non-asymptotic guarantees for numerical discretizations of ergodic
SDEs in Wasserstein-2 distance was presented by Sanz-Serna and Zygalakis in "Wasserstein
distance estimates for the distributions of numerical approximations to ergodic stochastic
differential equations". They analyze the UBU integrator, which is strong order two and only
requires one gradient evaluation per step, resulting in desirable non-asymptotic guarantees;
in particular, $\mathcal{O}(d^{1/4}\epsilon^{-1/2})$ steps to reach a distance of $\epsilon > 0$ in Wasserstein-2 distance away
from the target distribution. However, there is a mistake in the local error estimates of
Sanz-Serna and Zygalakis (2021); in particular, a stronger assumption is needed to achieve
these complexity estimates. This note reconciles the theory with the dimension dependence
observed in practice in many applications of interest.
Keywords: Markov Chain Monte Carlo; Langevin diffusion; Bayesian inference; numerical analysis of SDEs; strong convergence
1. Introduction
In Sanz-Serna and Zygalakis (2021), the authors present a framework to analyze the convergence rate and asymptotic bias in Wasserstein-2 distance of numerical approximations to ergodic SDEs. In their framework, they consider underdamped Langevin dynamics on
$\mathbb{R}^{2d}$, which is given by
$$
\begin{aligned}
dv &= -\gamma v\,dt - c\nabla f(x)\,dt + \sqrt{2\gamma c}\,dW(t)\\
dx &= v\,dt,
\end{aligned}
\qquad (1)
$$
with $c, \gamma > 0$. It can be shown under mild assumptions that (1) is ergodic and has invariant
measure $\pi^*$ with density proportional to $\exp\left(-f(x) - \frac{1}{2c}\|v\|^2\right)$.
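As a quick illustration of this invariant measure (not from the paper; all parameter values here are arbitrary), a plain Euler–Maruyama simulation of (1) — UBU itself would treat the Ornstein–Uhlenbeck part exactly — for the standard Gaussian potential $f(x) = x^2/2$ with $\gamma = c = 1$ should reproduce the stationary variances $\mathrm{Var}(x) = 1$ and $\mathrm{Var}(v) = c$ predicted by $\pi^*$:

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, c, h = 1.0, 1.0, 0.01
chains, steps, d = 2000, 4000, 1

grad_f = lambda x: x  # f(x) = ||x||^2 / 2, so the target in x is N(0, 1)

x = np.zeros((chains, d))
v = np.zeros((chains, d))
for _ in range(steps):
    # Semi-implicit Euler step for (1): update v, then x with the new v.
    # This is a crude illustrative scheme with O(h) bias, not UBU.
    dW = rng.normal(scale=np.sqrt(h), size=(chains, d))
    v = v - gamma * v * h - c * grad_f(x) * h + np.sqrt(2 * gamma * c) * dW
    x = x + v * h

# After a long burn-in, the chain states should match the marginals of pi*.
print(x.var(), v.var())  # both should be close to 1
```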
1.1 Assumptions
Let H : Rd → Rd×d be the Hessian of f .
∗. Both authors contributed equally.
Note that this assumption can also be reformulated in terms of the following norm.

Definition 3 For $A \in \mathbb{R}^{d\times d\times d}$,
$$\|A\|_{\{1\}\{2\}\{3\}} = \sup\left\{\sum_{i,j,k=1}^{d} A_{ijk}\,x_i y_j z_k \;:\; \sum_i x_i^2 \le 1,\ \sum_j y_j^2 \le 1,\ \sum_k z_k^2 \le 1\right\}
$$
where $\mathcal{E}(t) = \exp(-\gamma t)$ and $\mathcal{F}(t) = \frac{1 - \exp(-\gamma t)}{\gamma}$. Due to the high strong order properties of the scheme, the UBU scheme shows improved dimension dependence in terms of non-asymptotic guarantees, $\mathcal{O}(d^{1/4})$. This is supported by numerics in some applications in
Zapatero (2017); Chada et al. (2023). In Sanz-Serna and Zygalakis (2021), under Assumptions 1 and 2, they show in Theorem 25 this improved dimension dependence; however, there is a mistake in the local error estimates. In particular, on page 31 they make use of the bound
$$\mathbb{E}\left\|H'(x)[v,v]\right\|^2 \le L_1^2\, \mathbb{E}\|v\|^4 \le 3L_1^2 c^2 d \qquad (7)$$
for $v \sim \mathcal{N}(0, cI_d)$. However, it is straightforward to show that
for all x ∈ Rd .
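To see the issue numerically (an illustrative sketch, not part of the note's argument; parameter values are arbitrary), one can check that for $v \sim \mathcal{N}(0, cI_d)$ the exact value is $\mathbb{E}\|v\|^4 = c^2 d(d+2)$, which grows like $d^2$, so the right-hand side $3c^2 d$ used in (7) cannot bound $\mathbb{E}\|v\|^4$ for any $d \ge 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
c, d, n = 0.5, 50, 100_000

# Sample v ~ N(0, c I_d) and estimate E||v||^4 by Monte Carlo.
v = rng.normal(scale=np.sqrt(c), size=(n, d))
mc = np.mean(np.sum(v**2, axis=1) ** 2)

exact = c**2 * d * (d + 2)   # exact E||v||^4 (scaled chi-squared moment)
claimed = 3 * c**2 * d       # the quantity used as an upper bound in (7)

print(mc, exact, claimed)    # mc ~ exact = 650, far above claimed = 37.5
```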
From the definition, it is clear that $\|A\|_{\{1,2\}\{3\}} \ge \|A\|_{\{1\}\{2\}\{3\}}$, and so by (2), the strong Hessian Lipschitz property (Assumption 6) implies the Hessian Lipschitz property (Assumption 2) with $L_1 = L_1^s$. We will show a result in the other direction in Lemma 8.
It is also easy to check that the strong Hessian Lipschitz property does not introduce dimension dependence for product target distributions: for $A \in \mathbb{R}^{d\times d\times d}$ with $A_{ijk} = 0$ unless $i = j = k$ (diagonal tensors), $\|A\|_{\{1,2\}\{3\}} = \max_i |A_{iii}|$, and so $L_1^s$ equals the maximum over components of the supremum of the absolute value of the third derivative of the corresponding potential. Other examples of interest which do not introduce dimension dependence include Bayesian multinomial regression (see Chada et al. (2023, Lemma H.6)), ridge separable functions and 2-layer neural networks (see Chen and Gatmiry (2023)).
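The diagonal-tensor claim is easy to verify numerically (an illustrative sketch, using a hypothetical random diagonal tensor). It follows directly from the definition that $\|A\|_{\{1,2\}\{3\}}$ is the largest singular value of the $d^2 \times d$ matrix obtained by flattening the first two indices, which for a diagonal tensor reduces to $\max_i |A_{iii}|$:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# Diagonal tensor: A[i, j, k] = 0 unless i == j == k.
diag = rng.normal(size=d)
A = np.zeros((d, d, d))
for i in range(d):
    A[i, i, i] = diag[i]

# ||A||_{1,2}{3} = sigma_max of the (d^2 x d) flattening, since the sup of
# x^T M y over unit x in R^{d^2} and unit y in R^d is the top singular value.
norm_123 = np.linalg.norm(A.reshape(d * d, d), ord=2)

print(norm_123, np.max(np.abs(diag)))  # the two values agree
```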
Paulin and Whalley
This is a special case of Lemma 13 of Chen and Gatmiry (2023), but with an explicit
constant in the bound (their constant was not made explicit).
Lemma 7 Let $p \sim \mathcal{N}(0, I_d)$, and
$$g(p) = \sum_{k \le d} \left(\sum_{i,j \le d} A_{ijk}\, p_i p_j\right)\left(\sum_{l,m \le d} A_{lmk}\, p_l p_m\right).$$
Then
$$\mathbb{E}\, g(p) \le 3d\, \|A\|^2_{\{1,2\}\{3\}}.$$
Proof Due to the independence of the components, $\mathbb{E}(p_i p_j p_l p_m) = 0$ unless the indices contain either two equal pairs or four equal indices. We know that $\mathbb{E}(p_i^4) = 3$. Hence, we have
$$\mathbb{E}(g(p)) = \sum_{i,j,k} \left(A_{iik}A_{jjk} + A_{ijk}^2 + A_{ijk}A_{jik}\right) \le \sum_{i,j,k} \left(A_{iik}A_{jjk} + 2A_{ijk}^2\right).$$
We have that
$$\|A\|_{\{1,2\}\{3\}} = \sup_{x,y}\left\{\sum_{i_1,i_2,i_3} A_{i_1 i_2 i_3}\, x_{i_1 i_2}\, y_{i_3} \;:\; \sum_{i_1,i_2} x_{i_1 i_2}^2 \le 1,\ \sum_{i_3} y_{i_3}^2 \le 1\right\}$$
$$= \sup_{y:\|y\|\le 1}\left(\sum_{i_1,i_2}\left(\sum_{i_3} A_{i_1 i_2 i_3}\, y_{i_3}\right)^2\right)^{1/2} = \sup_{y:\|y\|\le 1}\left(\sum_{i_1,i_2}\left\langle A_{i_1,i_2,\cdot}\,,\, y\right\rangle^2\right)^{1/2}$$
$$= \left\|\sum_{i_1,i_2} A_{i_1,i_2,\cdot}\cdot A_{i_1,i_2,\cdot}^T\right\|^{1/2} = \left\|\sum_{i_1} A_{i_1,\cdot,\cdot}^T\cdot A_{i_1,\cdot,\cdot}\right\|^{1/2}.$$
For the other term $\sum_{i,j,k} A_{iik}A_{jjk}$, we define a matrix $\bar{A} \in \mathbb{R}^{d\times d}$ as $\bar{A}_{ik} = A_{iik}$. Let $e \in \mathbb{R}^d$ be the vector of ones, i.e. $e_1 = 1, \ldots, e_d = 1$. Using these, we have
$$\sum_{i,j,k} A_{iik}A_{jjk} = e^T \bar{A}\,\bar{A}^T e \le d\,\|\bar{A}\|^2.$$
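As a numerical sanity check on Lemma 7 (illustrative only, with a hypothetical random tensor), the moment expansion above gives $\mathbb{E}\,g(p)$ exactly, which can be compared against both a Monte Carlo estimate and the bound $3d\|A\|^2_{\{1,2\}\{3\}}$, computing the norm as the top singular value of the $d^2 \times d$ flattening of $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
A = rng.normal(size=(d, d, d))

# Exact E g(p) from Isserlis' theorem:
# E g(p) = sum_{i,j,k} (A_iik A_jjk + A_ijk^2 + A_ijk A_jik).
exact = (np.einsum('iik,jjk->', A, A)
         + np.einsum('ijk,ijk->', A, A)
         + np.einsum('ijk,jik->', A, A))

# Monte Carlo estimate of E g(p) for p ~ N(0, I_d).
n = 100_000
p = rng.normal(size=(n, d))
q = np.einsum('ijk,ni,nj->nk', A, p, p)   # q[n, k] = sum_{i,j} A_ijk p_i p_j
mc = np.mean(np.sum(q**2, axis=1))

# Bound from Lemma 7, with ||A||_{1,2}{3} as sigma_max of the flattening.
bound = 3 * d * np.linalg.norm(A.reshape(d * d, d), ord=2) ** 2

print(exact, mc, bound)
```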
We have
$$\|A\|_{\{1,2\}\{3\}} = \sup_{x,y}\left\{\sum_{i_1,i_2,i_3} A_{i_1 i_2 i_3}\, x_{i_1 i_2}\, y_{i_3} \;:\; \sum_{i_1,i_2} x_{i_1 i_2}^2 \le 1,\ \sum_{i_3} y_{i_3}^2 \le 1\right\}$$
and we can simply replace $L_1$ by $L_1^s$ in the estimates of Sanz-Serna and Zygalakis (2021). Note that although the strong Hessian Lipschitz assumption is stronger than the previous Hessian Lipschitz assumption (i.e. $L_1 \le L_1^s$), it is possible to show that every Hessian Lipschitz function is also strongly Hessian Lipschitz, due to the following result.
Lemma 8 For any $A \in \mathbb{R}^{d\times d\times d}$, $\|A\|_{\{1,2\}\{3\}} \le \sqrt{d}\,\|A\|_{\{1\}\{2\}\{3\}}$. Hence every $L_1$-Hessian Lipschitz function is $\sqrt{d}L_1$-strongly Hessian Lipschitz.
$$\|A\|_{\{1,2\}\{3\}} = \left\|\sum_{i_1} A_{i_1,\cdot,\cdot}^T\cdot A_{i_1,\cdot,\cdot}\right\|^{1/2}.$$
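This identity can be checked numerically (an illustrative sketch with a hypothetical random tensor): both sides equal the largest singular value of the $d^2 \times d$ flattening of $A$ over the index grouping $\{1,2\}$ versus $\{3\}$:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 7
A = rng.normal(size=(d, d, d))

# Left-hand side: sigma_max of the (d^2 x d) flattening of A.
lhs = np.linalg.norm(A.reshape(d * d, d), ord=2)

# Right-hand side: || sum_{i1} A_{i1}^T A_{i1} ||^{1/2} in spectral norm,
# where A_{i1} is the d x d slice with rows i2 and columns i3.
M = sum(A[i].T @ A[i] for i in range(d))
rhs = np.sqrt(np.linalg.norm(M, ord=2))

print(lhs, rhs)  # the two values agree
```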
where $\rho_h \in (0,1)$ for realizations $(\xi_k^{(i)})_{k\in\mathbb{N}}$, $i = 1, 2$, of the UBU discretization,
and
$$\|\alpha_h(\hat{\xi}_n, t_n)\|_{L^2,P_h} \le C_1 h^{p+1/2}, \qquad \|\beta_h(\hat{\xi}_n, t_n)\|_{L^2,P_h} \le C_2 h^{p+1}. \qquad (11)$$
Theorem 10 (Theorem 23 of Sanz-Serna and Zygalakis (2021)) Assume that the integrator satisfies Assumption 22 of Sanz-Serna and Zygalakis (2021) and that, in addition, there are constants $h_0 > 0$, $r > 0$ such that for $h \le h_0$ the contractivity estimate (9) holds with $\rho_h \le (1 - rh)^2$. Then, for any initial distribution $\pi$, stepsize $h \le h_0$, and $n = 0, 1, \ldots$,
$$W_{P_h}(\pi^*, \Psi_{h,n}\pi) \le (1 - hR_h)^n\, W_{P_h}(\pi^*, \pi) + \left(\frac{2C_1}{\sqrt{R_h}} + \frac{C_2}{R_h}\right) h^p \qquad (12)$$
with
$$R_h = \frac{1}{h}\left(1 - \sqrt{(1 - rh)^2 + C_0 h^2}\right) = r + o(1), \quad \text{as } h \downarrow 0.$$
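The limit $R_h \to r$ is easy to check numerically (illustrative only, with arbitrary values $r = 1$, $C_0 = 2$), using $R_h = \frac{1}{h}\big(1 - \sqrt{(1 - rh)^2 + C_0 h^2}\big)$:

```python
import numpy as np

def R(h, r=1.0, C0=2.0):
    # R_h = (1/h) * (1 - sqrt((1 - r h)^2 + C0 h^2));
    # expanding the square root gives R_h = r - (C0/2) h + O(h^2).
    return (1.0 - np.sqrt((1.0 - r * h) ** 2 + C0 * h ** 2)) / h

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(h, R(h))  # R(h) approaches r = 1 from below as h decreases
```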
The remaining subsections are devoted to correcting the constants in Assumption 22 for the UBU integrator with $p = 2$, and thereby the non-asymptotic guarantees obtained via Theorem 23 of Sanz-Serna and Zygalakis (2021).
$$\left|\left\langle \psi_h(\hat{\xi}_n, t_n) - \psi_h(\xi_n, t_n),\, \alpha_h(\hat{\xi}_n, t_n)\right\rangle_{L^2,P_h}\right| = \left|\mathbb{E}\left(\langle \tilde{v}_{n+1} - v_{n+1}, \alpha_v\rangle\right)\right|.$$
Unfortunately, this is incorrect: due to the matrix inner product defined by $P_h$, we have
$$\left|\left\langle \psi_h(\hat{\xi}_n, t_n) - \psi_h(\xi_n, t_n),\, \alpha_h(\hat{\xi}_n, t_n)\right\rangle_{L^2,P_h}\right| = \left|\mathbb{E}\left(\langle \tilde{v}_{n+1} - v_{n+1}, \alpha_v\rangle + \langle \tilde{x}_{n+1} - x_{n+1}, \alpha_v\rangle\right)\right|.$$
The first term on the right-hand side of this expression was bounded in Sanz-Serna and Zygalakis (2021, Eq. (43)) in terms of $\mathbb{E}(\|\hat{v}_n - v_n\|^2)$, $\mathbb{E}(\|\hat{x}_n - x_n\|^2)$ and $\mathbb{E}(\|\alpha_v\|^2)$. The additional term can be treated by the same argument as the fourth step of Sanz-Serna and Zygalakis (2021, Section 7.6); that is, we estimate,
with (6),
$$\tilde{y}_n = \hat{x}_n + \mathcal{F}(h/2)\,\hat{v}_n + \sqrt{2\gamma c}\int_{t_n}^{t_{n+1/2}} \mathcal{F}(t_{n+1/2} - s)\,dW(s),$$
$$\left(\mathbb{E}\|\tilde{y}_n - y_n\|^2\right)^{1/2} \le \left(\mathbb{E}\|\hat{x}_n - x_n\|^2\right)^{1/2} + \frac{h}{2}\left(\mathbb{E}\|\hat{v}_n - v_n\|^2\right)^{1/2},$$
which yields the bound
$$h\left(\frac{hcL}{2}\left(\mathbb{E}\|\hat{x}_n - x_n\|^2\right)^{1/2} + \left(1 + \frac{h^2 cL}{4}\right)\left(\mathbb{E}\|\hat{v}_n - v_n\|^2\right)^{1/2}\right)\left(\mathbb{E}\|\alpha_v\|^2\right)^{1/2}$$
on the additional term.
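The coefficient $h/2$ above relies on $\mathcal{F}(h/2) \le h/2$, which follows from $\mathcal{F}(t) = (1 - e^{-\gamma t})/\gamma \le t$ for $t \ge 0$. A small numerical check (illustrative only; $\gamma = 2$ is an arbitrary choice), using `expm1` to evaluate $\mathcal{F}$ stably for small $t$:

```python
import numpy as np

GAMMA = 2.0  # arbitrary illustrative friction parameter

def E_fn(t, gamma=GAMMA):
    # E(t) = exp(-gamma t)
    return np.exp(-gamma * t)

def F_fn(t, gamma=GAMMA):
    # F(t) = (1 - exp(-gamma t)) / gamma, computed stably with expm1
    return -np.expm1(-gamma * t) / gamma

t = np.linspace(0.0, 2.0, 1001)
print(np.all(F_fn(t) <= t + 1e-12))  # F(t) <= t, justifying the h/2 factor
```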
$$C_0 = K_0(3 + 2cL), \qquad C_1 = K_1 c^{3/2} L d^{1/2},$$
$$C_2 = K_2\left(\left(1 + 4\sqrt{3}\right)c^{3/2}L + \left(3 + \frac{\sqrt{42}}{2}\right)c^{3/2}L + \sqrt{6}\,cL^{1/2} + 3c^2 L_1^s\right) d^{1/2},$$
Proof Combining estimates of Section 3.1 in the norm $\|(v,x)\|_{P_h}^2 := \|v\|^2 + 2\langle v, x\rangle + 2\|x\|^2$ and using the fact that $\|(x,v)\| \le \sqrt{\frac{2}{3-\sqrt{5}}}\,\|(x,v)\|_{P_h}$, we have
$$\left|\left\langle \psi_h(\hat{\xi}_n, t_n) - \psi_h(\xi_n, t_n),\, \alpha_h(\hat{\xi}_n, t_n)\right\rangle_{L^2,P_h}\right| \le$$
$$h\left(\left(\frac{hcL}{2} + cL\right)\left(\mathbb{E}\|\hat{x}_n - x_n\|^2\right)^{1/2} + \left(3 + \frac{h^2 cL}{4} + \frac{hcL}{2}\right)\left(\mathbb{E}\|\hat{v}_n - v_n\|^2\right)^{1/2}\right)\left(\mathbb{E}\|\alpha_v\|^2\right)^{1/2}$$
$$\le h\,\sqrt{\frac{4}{3-\sqrt{5}}}\,(3 + 2cL)\,\|\hat{\xi}_n - \xi_n\|_{L^2,P_h}\|\alpha_h\|_{L^2,P_h},$$
and the required $C_0$ constant, with $C_2$ obtained by following the argument of Sanz-Serna and Zygalakis (2021, Section 7.6) using Lemma 7.
Acknowledgments
We would like to thank Jesús Sanz-Serna and Kostas Zygalakis for the helpful correspondence regarding their paper.
References
Neil K Chada, Benedict Leimkuhler, Daniel Paulin, and Peter A Whalley. Unbiased Kinetic
Langevin Monte Carlo with Inexact Gradients. arXiv preprint arXiv:2311.05025, 2023.
Yuansi Chen and Khashayar Gatmiry. When does Metropolized Hamiltonian Monte
Carlo provably outperform Metropolis-adjusted Langevin algorithm? arXiv preprint
arXiv:2304.04724, 2023.
Jesús María Sanz-Serna and Konstantinos C. Zygalakis. Wasserstein distance estimates for
the distributions of numerical approximations to ergodic stochastic differential equations.
Journal of Machine Learning Research, 22(242):1-37, 2021.
Alfonso Alamo Zapatero. Word series for the numerical integration of stochastic differential
equations. PhD thesis, Universidad de Valladolid, 2017.