
Journal of Machine Learning Research 25 (2024) 1-9 Submitted 6/24; Revised 12/24; Published 12/24

Correction to “Wasserstein distance estimates for the
distributions of numerical approximations to ergodic
stochastic differential equations”

Daniel Paulin∗ dpaulin@ed.ac.uk
School of Mathematics
University of Edinburgh, United Kingdom

Peter A. Whalley∗ peter.whalley@math.ethz.ch
Seminar for Statistics
ETH Zürich, Zürich, Switzerland

Editor: Jianfeng Lu

Abstract
A method for analyzing non-asymptotic guarantees of numerical discretizations of ergodic
SDEs in Wasserstein-2 distance is presented by Sanz-Serna and Zygalakis in “Wasserstein
distance estimates for the distributions of numerical approximations to ergodic stochastic
differential equations”. They analyze the UBU integrator, which is strong order two and only
requires one gradient evaluation per step, resulting in desirable non-asymptotic guarantees,
in particular O(d^{1/4} ε^{−1/2}) steps to reach a distance of ε > 0 in Wasserstein-2 distance away
from the target distribution. However, there is a mistake in the local error estimates in
Sanz-Serna and Zygalakis (2021); in particular, a stronger assumption is needed to achieve
these complexity estimates. This note reconciles the theory with the dimension dependence
observed in practice in many applications of interest.
Keywords: Markov Chain Monte Carlo; Langevin diffusion; Bayesian inference; numer-
ical analysis of SDEs; strong convergence

1. Introduction
In Sanz-Serna and Zygalakis (2021), the authors present a framework to analyze the con-
vergence rate and asymptotic bias in Wasserstein-2 distance of numerical approximations
to ergodic SDEs. In their framework, they consider underdamped Langevin dynamics on
R^{2d} which is given by

dv = −γv dt − c∇f(x) dt + √(2γc) dW(t),
dx = v dt,                                                                        (1)

with c, γ > 0. It can be shown under mild assumptions that (1) is ergodic and has invariant
measure π* with density proportional to exp(−f(x) − (1/(2c))‖v‖²).

1.1 Assumptions
Let H : R^d → R^{d×d} be the Hessian of f.
∗. Both authors contributed equally.

© 2024 Daniel Paulin and Peter A. Whalley.


License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided
at http://jmlr.org/papers/v25/24-0895.html.

Assumption 1 f : R^d → R is twice differentiable, m-strongly convex and L-smooth, i.e.

∀x ∈ R^d,   m I_{d×d} ≺ H(x) ≺ L I_{d×d},

where I_{d×d} ∈ R^{d×d} is the d-dimensional identity matrix.

Assumption 2 f : R^d → R is three times differentiable and there is a constant L1 ≥ 0
such that at each point x ∈ R^d, for arbitrary (w1, w2) ∈ R^d × R^d,

‖H′(x)[w1, w2]‖ ≤ L1 ‖w1‖ ‖w2‖.

Note that this assumption can also be reformulated in terms of the following norm.

Definition 3 For A ∈ R^{d×d×d},

‖A‖_{1}{2}{3} = sup { Σ_{i,j,k=1}^d A_{ijk} x_i y_j z_k : Σ_i x_i² ≤ 1, Σ_j y_j² ≤ 1, Σ_k z_k² ≤ 1 }.

Then it is easy to check that Assumption 2 is equivalent to

‖H′(x)‖_{1}{2}{3} ≤ L1   for every x ∈ R^d.                                       (2)

1.2 Wasserstein distance


Definition 4 Let π1 and π2 be two probability measures on R^{2d}. The 2-Wasserstein distance
between π1 and π2 with respect to the positive definite matrix P ∈ R^{2d×2d} is given by

WP(π1, π2) = inf_{ζ∈Z} ( ∫_{R^{2d}×R^{2d}} ‖x − y‖_P² dζ(x, y) )^{1/2},           (3)

where Z is the set of all couplings between π1 and π2 and ‖ξ‖_P = (ξ^T P ξ)^{1/2} for all ξ ∈ R^{2d}.
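As a quick illustration of (3) (not from the paper): when π1 and π2 are Gaussians with a common covariance, Jensen's inequality shows the infimum is attained by the synchronous shift coupling Y = X + (m2 − m1), so that WP(π1, π2) = ‖m1 − m2‖_P. A minimal numpy sketch, with an arbitrary positive definite P chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                      # so 2d = 6 and P is 6 x 6

# Arbitrary positive definite P (illustrative choice, not from the paper).
B = rng.standard_normal((2 * d, 2 * d))
P = B @ B.T + (2 * d) * np.eye(2 * d)

def norm_P(xi):
    """Weighted norm ||xi||_P = (xi^T P xi)^{1/2} from Definition 4."""
    return float(np.sqrt(xi @ P @ xi))

m1 = rng.standard_normal(2 * d)
m2 = rng.standard_normal(2 * d)

# Shift coupling: sample X ~ N(m1, Sigma) and set Y = X + (m2 - m1).
# Since X - Y = m1 - m2 is deterministic, the coupling cost equals
# ||m1 - m2||_P exactly, and Jensen's inequality shows no coupling beats it.
Sigma_half = rng.standard_normal((2 * d, 2 * d)) / np.sqrt(2 * d)
X = m1 + rng.standard_normal((1000, 2 * d)) @ Sigma_half.T
Y = X + (m2 - m1)
cost = float(np.sqrt(np.mean(np.einsum('ni,ij,nj->n', X - Y, P, X - Y))))

print(np.isclose(cost, norm_P(m1 - m2)))   # True
```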

2. UBU integrator and error in dimension dependence


In Sanz-Serna and Zygalakis (2021) they considered the UBU integrator, a numerical inte-
grator for (1), which is strong order 2 and only requires one gradient evaluation per iteration.
It is defined by the following update rule for a stepsize h > 0:

vn+1 = E(h)vn − hE(h/2)c∇f(yn) + √(2γc) ∫_{tn}^{tn+1} E(tn+1 − s) dW(s),          (4)

xn+1 = xn + F(h)vn − hF(h/2)c∇f(yn) + √(2γc) ∫_{tn}^{tn+1} F(tn+1 − s) dW(s),     (5)

yn = xn + F(h/2)vn + √(2γc) ∫_{tn}^{tn+1/2} F(tn+1/2 − s) dW(s),                  (6)

where E(t) = exp(−γt) and F(t) = (1 − exp(−γt))/γ. Due to the high strong order properties
of the scheme, the UBU scheme shows improved dimension dependence in terms of non-
asymptotic guarantees, O(d^{1/4}). This is supported by numerics in some applications in


Zapatero (2017); Chada et al. (2023). In Sanz-Serna and Zygalakis (2021), under Assump-
tions 1 and 2, they show in Theorem 25 this improved dimension dependence; however,
there is a mistake in the local error estimates. In particular, on page 31 they make use of
the bound

E(‖H′(x)[v, v]‖²) ≤ L1² E(‖v‖⁴) ≤ 3L1² c² d                                       (7)

for v ∼ N(0, c I_d). However, it is straightforward to show that

E(‖v‖⁴) = c²(d² + 2d),

and under Assumption 2 it is not possible to achieve non-asymptotic guarantees of order
O(d^{1/4}). The focus of this note is to reconcile the numerics with the theory by introducing a
stronger assumption than Assumption 2, used in Chen and Gatmiry (2023), which achieves
non-asymptotic guarantees of order O(d^{1/4}) and is verifiable for many applications. This
additional assumption is introduced in Section 2.1. We also correct an issue in the fifth step
of the local error proof in Sanz-Serna and Zygalakis (2021) and correct the constant C0 in
Theorem 25. This is done in Section 3.1 and in Section 3.2 a new version of Theorem 25 is
stated with the new constants.
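The corrected moment is easy to confirm numerically. A short Monte Carlo sketch (dimension and constants chosen arbitrarily for illustration) comparing E(‖v‖⁴) for v ∼ N(0, c I_d) with c²(d² + 2d), which grows like d² rather than the d of the erroneous bound (7):

```python
import numpy as np

rng = np.random.default_rng(1)
c, d, n = 0.7, 50, 200_000                   # illustrative values

v = np.sqrt(c) * rng.standard_normal((n, d))  # v ~ N(0, c I_d)
est = np.mean(np.sum(v * v, axis=1) ** 2)     # Monte Carlo E||v||^4

exact = c**2 * (d**2 + 2 * d)                 # identity used in the note
print(abs(est / exact - 1) < 0.02)            # True: estimate matches
print(exact / (3 * c**2 * d))                 # ~17x the claimed 3 c^2 d
```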

2.1 Strongly Hessian Lipschitz assumption


Chen and Gatmiry (2023) introduced the notion of strongly Hessian Lipschitz. We are
going to use this new concept (Assumption 6) instead of Assumption 2, which was used in
Sanz-Serna and Zygalakis (2021), to correct dimension dependence in inequality (7).
The strongly Hessian Lipschitz property relies on the following tensor norm.

Definition 5 For A ∈ R^{d×d×d}, let

‖A‖_{1,2}{3} = sup_{x∈R^{d×d}, y∈R^d} { Σ_{i,j,k=1}^d A_{ijk} x_{ij} y_k : Σ_{i,j=1}^d x_{ij}² ≤ 1, Σ_{k=1}^d y_k² ≤ 1 }.

Assumption 6 f : R^d → R is thrice differentiable and L1^s-strongly Hessian Lipschitz if

‖H′(x)‖_{1,2}{3} ≤ L1^s

for all x ∈ R^d.

From the definition, it is clear that ‖A‖_{1,2}{3} ≥ ‖A‖_{1}{2}{3}, and so by (2), the
strong Hessian Lipschitz property (Assumption 6) implies the Hessian Lipschitz property
(Assumption 2) with L1 = L1^s. We will show a result in the other direction in Lemma 8.
It is also easy to check that the strong Hessian Lipschitz property does not introduce
dimension dependency for product target distributions, since for A ∈ R^{d×d×d} with A_{ijk} = 0
unless i = j = k (a diagonal tensor), ‖A‖_{1,2}{3} = max_i |A_{iii}|, and so L1^s equals the
maximum of the suprema of the absolute values of third derivatives amongst the potentials
of all components. Other examples of interest which do not introduce dimension dependency
include Bayesian multinomial regression (see (Chada et al., 2023, Lemma H.6)), ridge
separable functions and 2-layer neural networks (see Chen and Gatmiry (2023)).
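Since ‖A‖_{1,2}{3} is exactly the spectral norm of the d² × d matricization A_(ij),k, it can be computed numerically, which makes the diagonal-tensor claim easy to check. A small sketch (random values, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6

def norm_12_3(A):
    """||A||_{1,2}{3} of Definition 5: spectral norm of the d^2 x d unfolding."""
    d = A.shape[0]
    return np.linalg.norm(A.reshape(d * d, d), ord=2)

# Diagonal tensor: A_ijk = 0 unless i = j = k.
diag = rng.standard_normal(d)
A = np.zeros((d, d, d))
A[np.arange(d), np.arange(d), np.arange(d)] = diag

print(np.isclose(norm_12_3(A), np.max(np.abs(diag))))   # True: max_i |A_iii|
```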


The following lemma allows us to control the quantity

E_{p∼N(0,I_d)}(‖A[p, p, ·]‖²) = E_{p∼N(0,I_d)}[ Σ_{k≤d} ( Σ_{i,j≤d} A_{ijk} p_i p_j )( Σ_{l,m≤d} A_{lmk} p_l p_m ) ].

This is a special case of Lemma 13 of Chen and Gatmiry (2023), but with an explicit
constant in the bound (their constant was not made explicit).
Lemma 7 Let p ∼ N(0, I_d), and

g(p) = Σ_{k≤d} ( Σ_{i,j≤d} A_{ijk} p_i p_j )( Σ_{l,m≤d} A_{lmk} p_l p_m ).

Then

E(g(p)) ≤ 3d ‖A‖²_{12}{3}.
Proof Due to the independence of the components, E(p_i p_j p_l p_m) = 0 unless the indices
contain either two equal pairs or four equal indices. We know that E(p_i⁴) = 3. Hence, we have

E(g(p)) = Σ_{i,j,k} ( A_{iik} A_{jjk} + A_{ijk}² + A_{ijk} A_{jik} ) ≤ Σ_{i,j,k} ( A_{iik} A_{jjk} + 2A_{ijk}² ).

We have that

‖A‖_{12}{3} = sup { Σ_{i1,i2,i3} A_{i1i2i3} x_{i1i2} y_{i3} : Σ_{i1,i2} x_{i1i2}² ≤ 1, Σ_{i3} y_{i3}² ≤ 1 }

            = sup { ( Σ_{i1,i2} ( Σ_{i3} A_{i1i2i3} y_{i3} )² )^{1/2} : Σ_{i3} y_{i3}² ≤ 1 }

            = sup_{y:‖y‖≤1} ( Σ_{i1,i2} ⟨A_{i1,i2,·}, y⟩² )^{1/2} = ‖ Σ_{i1,i2} A_{i1,i2,·} A_{i1,i2,·}^T ‖^{1/2}

            = ‖ Σ_{i1} A_{i1,·,·}^T A_{i1,·,·} ‖^{1/2}.

Now, it is easy to see that

2 Σ_{i,j,k} A_{ijk}² = 2 Tr( Σ_{i1} A_{i1,·,·}^T A_{i1,·,·} ) ≤ 2d ‖A‖²_{12}{3}.

For the other term Σ_{i,j,k} A_{iik} A_{jjk}, we define a matrix Ā ∈ R^{d×d} by Ā_{ik} = A_{iik}.
Let e ∈ R^d be the vector of ones, i.e. e_1 = 1, ..., e_d = 1. Using these, we have

Σ_{i,j,k} A_{iik} A_{jjk} = e^T Ā Ā^T e ≤ d ‖Ā‖².

We have

‖A‖_{12}{3} = sup { Σ_{i1,i2,i3} A_{i1i2i3} x_{i1i2} y_{i3} : Σ_{i1,i2} x_{i1i2}² ≤ 1, Σ_{i3} y_{i3}² ≤ 1 },

and by only considering x such that x_{i1i2} = 0 for i1 ≠ i2,

‖A‖_{12}{3} ≥ sup { Σ_{i1,i3} A_{i1i1i3} x_{i1i1} y_{i3} : Σ_{i1} x_{i1i1}² ≤ 1, Σ_{i3} y_{i3}² ≤ 1 } = ‖Ā‖,

hence Σ_{i,j,k} A_{iik} A_{jjk} ≤ d ‖A‖²_{12}{3}, and the claim of the lemma follows.
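A numerical sanity check of Lemma 7 (a sketch with a random tensor; nothing here is from the paper): we evaluate E g(p) exactly via the moment identity in the proof, confirm it with Monte Carlo, and compare against the bound 3d‖A‖²_{12}{3}:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
A = rng.standard_normal((d, d, d))

# ||A||_{12}{3}: spectral norm of the d^2 x d unfolding A_(ij),k.
norm12_3 = np.linalg.norm(A.reshape(d * d, d), ord=2)

# Exact E g(p) from the fourth-moment computation in the proof:
# sum_{i,j,k} ( A_iik A_jjk + A_ijk^2 + A_ijk A_jik ).
Abar = np.einsum('iik->ik', A)                   # the matrix Abar_ik = A_iik
exact = (np.einsum('ik,jk->', Abar, Abar)
         + np.sum(A * A)
         + np.einsum('ijk,jik->', A, A))

# Monte Carlo estimate of E ||A[p, p, .]||^2 for p ~ N(0, I_d).
p = rng.standard_normal((100_000, d))
Ap = np.einsum('ijk,ni,nj->nk', A, p, p)
mc = np.mean(np.sum(Ap * Ap, axis=1))

print(abs(mc / exact - 1) < 0.1)                 # MC agrees with the identity
print(exact <= 3 * d * norm12_3**2)              # bound of Lemma 7 holds
```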

Using Lemma 7 and starting from the top of page 31, we have

E(‖H′(x)[v, v]‖²) ≤ 3(L1^s)² c² d,

and we can simply replace L1 by L1^s in the estimates of Sanz-Serna and Zygalakis (2021).
Note that although the strongly Hessian Lipschitz assumption is stronger than the pre-
vious Hessian Lipschitz assumption (i.e. L1 ≤ L1^s), it is possible to show that every Hessian
Lipschitz function is also strongly Hessian Lipschitz, due to the following result.

Lemma 8 For any A ∈ R^{d×d×d}, ‖A‖_{12}{3} ≤ √d ‖A‖_{1}{2}{3}. Hence every L1-Hessian
Lipschitz function is √d L1-strongly Hessian Lipschitz.

Proof In the proof of Lemma 7, we have shown that

‖A‖_{12}{3} = ‖ Σ_{i1} A_{i1,·,·}^T A_{i1,·,·} ‖^{1/2}.

Let e_i = (0, ..., 0, 1, 0, ..., 0) ∈ R^d be the unit vector with 1 in component i. Using
Definition 3, we have for every i ≤ d,

‖A‖_{1}{2}{3} = sup { Σ_{i,j,k=1}^d A_{ijk} x_i y_j z_k : Σ_i x_i² ≤ 1, Σ_j y_j² ≤ 1, Σ_k z_k² ≤ 1 }

             ≥ sup { Σ_{j,k=1}^d A_{ijk} y_j z_k : x = e_i, Σ_j y_j² ≤ 1, Σ_k z_k² ≤ 1 } = ‖A_{i,·,·}^T A_{i,·,·}‖^{1/2},

and the claim follows by rearrangement.
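The key step here — each slice norm ‖A_{i,·,·}‖ is at most ‖A‖_{1}{2}{3} (take x = e_i), so ‖A‖²_{12}{3} = ‖Σ_i A_{i,·,·}^T A_{i,·,·}‖ ≤ d max_i ‖A_{i,·,·}‖² — can be checked numerically without optimizing the hard-to-compute {1}{2}{3} norm itself. A sketch with a random tensor (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
A = rng.standard_normal((d, d, d))

norm12_3 = np.linalg.norm(A.reshape(d * d, d), ord=2)        # ||A||_{12}{3}

# Gram identity from the proof of Lemma 7.
gram = sum(A[i].T @ A[i] for i in range(d))
print(np.isclose(norm12_3, np.sqrt(np.linalg.norm(gram, ord=2))))  # True

# sqrt(d) relation: each slice norm ||A_i|| lower-bounds ||A||_{1}{2}{3}.
slice_max = max(np.linalg.norm(A[i], ord=2) for i in range(d))
print(norm12_3 <= np.sqrt(d) * slice_max)                    # True
```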


3. Non-asymptotic guarantees for UBU


In Sanz-Serna and Zygalakis (2021) they consider contraction and discretization bias of
the UBU scheme with γ = 2 and h ≤ 2 in the 2-Wasserstein distance with respect to
Ph := P̂ ⊗ I_d, where

P̂ = [ 1  1 ]
    [ 1  2 ].                                                                     (8)
They define the weighted Hilbert-space norm with respect to the matrix Ph, ‖·‖_{L²,Ph}, and the
corresponding inner product ⟨·, ·⟩_{L²,Ph}. They then define the iterates of UBU, (ξn)_{n∈N} with
ξ0 ∼ π, by the update rule (vn+1, xn+1) = ξn+1 = ψh(ξn, tn), tn = nh, n = 0, 1, ..., h > 0,
where ψh(ξ, tn) is the one-step approximation governed by the UBU integrator with initial
condition ξ ∈ R^{2d} and φh(·, ·) is the exact counterpart with shared Brownian increments.
They introduce at each time level n the random variable ξ̂n ∼ π* given by the optimal
coupling, such that WP(π*, Ψ_{h,n}π) = ‖ξ̂n − ξn‖_{L²,Ph}, where Ψ_{h,n}π is the law of ξn.
They have shown in Sanz-Serna and Zygalakis (2021)[Theorem 18] that for γ = 2,
c = c̄/(L + m), where c̄ ∈ (0, 4), there is an h0 such that, for any h ≤ h0 and n ∈ N,

‖ξ^{(2)}_{n+1} − ξ^{(1)}_{n+1}‖²_{Ph} ≤ ρh ‖ξ^{(2)}_n − ξ^{(1)}_n‖²_{Ph},          (9)

where ρh ∈ (0, 1), for realizations (ξ^{(i)}_k)_{k∈N}, i = 1, 2, of the UBU discretization.
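For a quadratic f the contraction (9) can be checked directly: under synchronous coupling the shared Brownian increments cancel in the difference of two UBU chains, leaving a deterministic linear map. A sketch (the choices of m, L, the constant multiplying 1/(L + m), and h are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)

d = 10
m_, L_ = 1.0, 4.0
H = np.diag(rng.uniform(m_, L_, size=d))   # Hessian of a quadratic f
gamma, h = 2.0, 0.05
c = 1.0 / (L_ + m_)

E = lambda t: np.exp(-gamma * t)
F = lambda t: (1.0 - np.exp(-gamma * t)) / gamma

def diff_step(dv, dx):
    """One UBU step (4)-(6) applied to the difference of two chains driven
    by the same Brownian motion; the noise integrals cancel exactly."""
    dy = dx + F(h / 2) * dv                              # from (6)
    dv1 = E(h) * dv - h * E(h / 2) * c * (H @ dy)        # from (4)
    dx1 = dx + F(h) * dv - h * F(h / 2) * c * (H @ dy)   # from (5)
    return dv1, dx1

def norm_Ph_sq(v, x):
    """||(v,x)||^2_{Ph} = ||v||^2 + 2<v,x> + 2||x||^2 for Ph = Phat tensor I_d."""
    return v @ v + 2 * (v @ x) + 2 * (x @ x)

dv, dx = rng.standard_normal(d), rng.standard_normal(d)
before = norm_Ph_sq(dv, dx)
dv, dx = diff_step(dv, dx)
print(norm_Ph_sq(dv, dx) < before)   # True: one step contracts in the Ph-norm
```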

Assumption 9 (Assumption 22 of Sanz-Serna and Zygalakis (2021)) There is a de-
composition

φh(ξ̂n, tn) − ψh(ξ̂n, tn) = αh(ξ̂n, tn) + βh(ξ̂n, tn),

and positive constants p, h0, C0, C1, C2 such that for n ≥ 0 and h ≤ h0:

⟨ψh(ξ̂n, tn) − ψh(ξn, tn), αh(ξ̂n, tn)⟩_{L²,Ph} ≤ C0 h ‖ξ̂n − ξn‖_{L²,Ph} ‖αh(ξ̂n, tn)‖_{L²,Ph}   (10)

and

‖αh(ξ̂n, tn)‖_{L²,Ph} ≤ C1 h^{p+1/2},   ‖βh(ξ̂n, tn)‖_{L²,Ph} ≤ C2 h^{p+1}.        (11)

Theorem 10 (Theorem 23 of Sanz-Serna and Zygalakis (2021)) Assume that the in-
tegrator satisfies Sanz-Serna and Zygalakis (2021)[Assumption 22] and in addition, there
are constants h0 > 0, r > 0 such that for h ≤ h0 the contractivity estimate (9) holds with
ρh ≤ (1 − rh)². Then, for any initial distribution π, stepsize h ≤ h0, and n = 0, 1, ...,

WPh(π*, Ψ_{h,n}π) ≤ (1 − hRh)^n WPh(π*, π) + ( 2C1/√Rh + C2/Rh ) h^p             (12)

with

Rh = (1/h)( 1 − √((1 − rh)² + C0 h²) ) = r + o(1),   as h ↓ 0.
h
The remaining subsections are devoted to correcting the constants in Assumption 22 for
the UBU integrator with p = 2, and thereby the non-asymptotic guarantees given by
Theorem 23 of Sanz-Serna and Zygalakis (2021).


3.1 The local error of UBU: Error in the fifth step


Throughout this section we use the same notation as Sanz-Serna and Zygalakis (2021)[Sec-
tion 7.6]; in particular, (vn+1, xn+1) = ξn+1, and (ṽn+1, x̃n+1) denotes the velocity and
position components of a UBU step initialized at ξ̂n = (v̂n, x̂n). In the fifth step of Sanz-
Serna and Zygalakis (2021)[Section 7.6] they use the equality

⟨ψh(ξ̂n, tn) − ψh(ξn, tn), αh(ξ̂n, tn)⟩_{L²,Ph} = |E(⟨ṽn+1 − vn+1, αv⟩)|.

Unfortunately, this is incorrect: due to the matrix inner product defined by Ph, we have

⟨ψh(ξ̂n, tn) − ψh(ξn, tn), αh(ξ̂n, tn)⟩_{L²,Ph} = |E(⟨ṽn+1 − vn+1, αv⟩ + ⟨x̃n+1 − xn+1, αv⟩)|.

The first term on the right-hand side of this expression was bounded in Sanz-Serna and
Zygalakis (2021)[Eq. (43)] in terms of E(‖v̂n − vn‖²), E(‖x̂n − xn‖²) and E(‖αv‖²). The
additional term can be treated by the same argument as the fourth step of Sanz-Serna and
Zygalakis (2021)[Section 7.6]; that is, we estimate

|E(⟨x̃n+1 − xn+1, αv⟩)| = |E(⟨x̃n+1 − x̂n − xn+1 + xn, αv⟩)|

                        ≤ E(‖x̃n+1 − x̂n − xn+1 + xn‖²)^{1/2} E(‖αv‖²)^{1/2}.

Now, from (5),

x̃n+1 − x̂n − xn+1 + xn = F(h)(v̂n − vn) − hF(h/2)c(∇f(ỹn) − ∇f(yn))

with, by (6),

ỹn = x̂n + F(h/2)v̂n + √(2γc) ∫_{tn}^{tn+1/2} F(tn+1/2 − s) dW(s),

and thus, since F(h) ≤ h,

E(‖x̃n+1 − x̂n − xn+1 + xn‖²)^{1/2} ≤ h E(‖v̂n − vn‖²)^{1/2} + (h²cL/2) E(‖ỹn − yn‖²)^{1/2}.

Taking into account (6) and the definition of ỹn,

E(‖ỹn − yn‖²)^{1/2} ≤ E(‖x̂n − xn‖²)^{1/2} + (h/2) E(‖v̂n − vn‖²)^{1/2},

and we conclude that |E(⟨x̃n+1 − xn+1, αv⟩)| is bounded above by

h( (hcL/2) E(‖x̂n − xn‖²)^{1/2} + (1 + h²cL/4) E(‖v̂n − vn‖²)^{1/2} ) E(‖αv‖²)^{1/2}.


3.2 Theorem 25 of Sanz-Serna and Zygalakis (2021)


In this section we state the corrected version of Sanz-Serna and Zygalakis (2021)[Theo-
rem 25].

Theorem 11 Assume that f satisfies Assumptions 1 and 6. Set γ = 2 and Ph = P̂ ⊗ I_d.
Then, for h ≤ 2, the UBU discretization satisfies Assumption 22 of Sanz-Serna and
Zygalakis (2021) (Assumption 9) with p = 2,

C0 = K0 (3 + 2cL),
C1 = K1 c^{3/2} L d^{1/2},
C2 = K2 ( (1 + 4√3) c^{3/2} L² + (3 + √42/2) c^{3/2} L^{3/2} + √6 c L^{1/2} + √3 c² L1^s ) d^{1/2},

where Kj, j = 0, 1, 2, are the following absolute constants:

K0 = √( 4/(3 − √5) ),   K1 = √3/12,   K2 = (1/24) √( (3 + √5)/2 ).

Proof Combining the estimates of Section 3.1 in the norm ‖(v, x)‖²_{Ph} := ‖v‖² + 2⟨v, x⟩ + 2‖x‖²,
and using the fact that ‖(x, v)‖ ≤ √(2/(3 − √5)) ‖(x, v)‖_{Ph}, we have

⟨ψh(ξ̂n, tn) − ψh(ξn, tn), αh(ξ̂n, tn)⟩_{L²,Ph}

   ≤ h( (hcL/2 + cL) E(‖x̂n − xn‖²)^{1/2} + (3 + h²cL/4 + hcL/2) E(‖v̂n − vn‖²)^{1/2} ) E(‖αv‖²)^{1/2}

   ≤ h √( 4/(3 − √5) ) (3 + 2cL) ‖ξ̂n − ξn‖_{L²,Ph} ‖αh‖_{L²,Ph},

which gives the required constant C0, with C2 given by following the argument of Sanz-Serna
and Zygalakis (2021)[Section 7.6] using Lemma 7.

Acknowledgments

We would like to thank Jesus Sanz-Serna and Kostas Zygalakis for the helpful correspon-
dence regarding their paper.

References
Neil K Chada, Benedict Leimkuhler, Daniel Paulin, and Peter A Whalley. Unbiased Kinetic
Langevin Monte Carlo with Inexact Gradients. arXiv preprint arXiv:2311.05025, 2023.


Yuansi Chen and Khashayar Gatmiry. When does Metropolized Hamiltonian Monte
Carlo provably outperform Metropolis-adjusted Langevin algorithm? arXiv preprint
arXiv:2304.04724, 2023.

Jesús María Sanz-Serna and Konstantinos C. Zygalakis. Wasserstein distance estimates for
the distributions of numerical approximations to ergodic stochastic differential equations.
Journal of Machine Learning Research, 22(242):1–37, 2021.

Alfonso Alamo Zapatero. Word series for the numerical integration of stochastic differential
equations. PhD thesis, Universidad de Valladolid, 2017.
