1 Introduction

1.1 General overview

Monge–Kantorovich distances, also known as Wasserstein distances, play a central role in statistical mechanics, especially in the theory of propagation of chaos and studying large particle systems’ mean behavior. From the late 1970s, there have been many applications of Wasserstein distances in kinetic theory, as is beautifully described in the bibliographical notes of [60, Chapter 6]. In particular, these distances are frequently used to prove the uniqueness and stability of solutions to kinetic equations, study singular limits, and measure convergence to equilibrium.

The first celebrated result relying on Monge–Kantorovich–Wasserstein distances in non-collisional kinetic theory is the proof by Dobrushin [16] on the well-posedness for Vlasov equations with \(C^{1,1}\) potentials, where existence, uniqueness, and stability are proved via a fixed point argument in the bounded-Lipschitz or the 1-Wasserstein distance. As a consequence of this argument, one also obtains the validity of the mean-field limit for Vlasov equations with smooth potentials. The interested reader may refer to [20, Chapter 1.4] and [40, Chapter 3.3] for a detailed explanation of Dobrushin’s stability estimate, its consequences on the mean-field limit for the Vlasov equation, and of the role of Monge–Kantorovich–Wasserstein distances. Dobrushin’s estimate is at the core of several kinetic theory arguments; see, for example, [10,11,12,13, 15, 18, 22, 29] for some applications.

In recent times, Golse and Paul in [23] introduced a quantum analog of the 2-Wasserstein distance to measure the approximation of the N-body quantum dynamics by its mean-field limit. In [21] the authors prove quantitative stability estimates that are reminiscent of Dobrushin’s, and they show that, in the case of \(C^{1,1}\) potentials, the mean-field limit of the quantum mechanics of N identical particles is uniform in the classical limit.

Another fundamental stability estimate was proved by Loeper [49], who established uniqueness and stability of solutions with bounded density for the Vlasov–Poisson equation. Loeper’s argument relies on the fact that the Coulomb kernel is generated by a potential solving Poisson’s equation and exploits the strong connection between the 2-Wasserstein distance and the \(H^{-1}\)-norm. Besides providing the best-known uniqueness criterion for Vlasov–Poisson, this approach also gives a new proof of uniqueness \(\grave{a}\) la Yudovich for 2D Euler. Loeper’s result has been generalized to less singular kernels [37], and it is the cornerstone for several other stability arguments [7, 14, 36, 42, 44, 48, 58]. Also, Loeper’s uniqueness criterion for Vlasov–Poisson has been extended to solutions whose associated density belongs to some suitable Orlicz spaces [38, 51]. In what follows, we will focus our attention on some applications of Loeper’s stability estimate related to quasi-neutral limit for the Vlasov–Poisson equation [28, 30, 34, 35].

In general, extending Dobrushin’s and Loeper’s estimates is a delicate matter. A possible idea is to introduce an anisotropic metric that weights spatial and momentum coordinates differently. For example, in [43], the author considers a variant of the 2-Wasserstein distance where the cost for moving points in the x-variable is higher than for the v-variable. By suitably selecting the parameters, this allows the author to extend the validity ranges for the mean-field limit for the Vlasov–Poisson system. Also, as shown in [28, 30], an analogous method provides better convergence estimates when considering combined mean-field and quasi-neutral limits in Vlasov–Poisson-type systems. At the same time as this paper was written, another variant of this idea was introduced in [56], where the author improves the trend to equilibrium for 1-D kinetic Fokker–Planck equations via estimates measured in an analog of the 2-Wasserstein metric.

This work aims to push further the idea that, when applied to kinetic problems, Wasserstein distances should be modified to reflect the natural anisotropy between position and momentum variables. Moreover, since these metrics are used to measure the distance between PDEs’ solutions, we will introduce time-dependent counterparts that can vary along with the characteristic flow. Still, it is worth noticing that our method could be applied, beyond the kinetic framework, to equations where the evolution in one of the variables enjoys better regularity properties than the others.

Before stating our main results, let us emphasize that the idea of finding appropriate generalised Wasserstein distances has been used successfully in other contexts in the optimal transport and evolution PDE community; see, for instance, [17, 19, 45, 46, 54, 55] and references therein.

1.2 Definitions and main results

Let us recall the definition of Wasserstein distances (see for instance [1, 60]). In what follows, \({\mathcal {X}}\) will be either the d-dimensional torus \({\mathbb {T}}^d\) or the Euclidean space \(\mathbb {R}^d\).

Definition 1.1

Let \(\mu , \nu \) be two probability measures on \({\mathcal {X}}\times \mathbb {R}^d\). We denote with \(\Pi (\mu , \nu )\) the set of all probability measures on \(({\mathcal {X}}\times \mathbb {R}^d)^2\) with marginals \(\mu \) and \(\nu .\) More precisely, \(\pi \in \Pi (\mu , \nu )\) if

$$\begin{aligned}&\pi [A \times ({\mathcal {X}}\times \mathbb {R}^d)]=\mu [A], \ \ \\&\qquad \pi [(\mathcal X\times \mathbb {R}^d) \times B]= \nu [B], \qquad \text {for all } A,B\subset {\mathcal {X}}\times \mathbb {R}^d \text { Borel.} \end{aligned}$$

We shall call coupling (between \(\mu \) and \(\nu \)) an element in \(\Pi (\mu , \nu ).\)

For \(p\geqq 1\), the p-Wasserstein distance between \(\mu \) and \(\nu \) is defined as

$$\begin{aligned} W_{p} (\mu , \nu ):=\left( \inf _{\gamma \in \Pi (\mu ,\nu )} \int _{(\mathcal {X} \times \mathbb {R}^d)^2} \left( |x-y|^{p} +|v-w|^p\right) \mathrm {d} \gamma (x,v,y,w) \right) ^{1/p}. \end{aligned}$$

(1) A free-flow \(W_1\)-type distance for the Vlasov equations with \(C^{1,1}\) potential. Consider two solutions \(f_1,f_2\) of the Vlasov equation on \({\mathcal {X}}\), namely

$$\begin{aligned} \partial _t f+v\cdot \nabla _x f+F[f]\cdot \nabla _vf=0,\qquad F[f]:=\nabla K*\rho _f,\qquad \rho _f:=\int f \mathrm {d}v, \end{aligned}$$

where \(\Vert D^2 K\Vert _{\infty }=:B<\infty \). The classical Dobrushin’s argument shows that

$$\begin{aligned} W_1(f_1(t),f_2(t))\leqq e^{(1+2B)t}W_1(f_1(0),f_2(0)). \end{aligned}$$

In particular, when the potential K is identically zero, this bound provides an exponential stability for \(W_1\) that is far from optimal. Indeed, since the solution is simply given by \(f(t,x,v)=f(0,x-tv,v)\), it is clear that in this case \(W_1(f_1(t),f_2(t))\sim t\) for \(t \gg 1\).

By introducing a \(W_1\)-type distance adapted to the free flow, we can prove that

$$\begin{aligned} W_1(f_1(t),f_2(t))\leqq \min \left\{ (1+t) e^{\frac{2}{3} B\left( (1+t)^3-1\right) },e^{(1+2B)t}\right\} W_1(f_1(0),f_2(0)) \end{aligned}$$

(see Theorem 2.1 below).

This estimate gives the optimal bound when \(K\equiv 0\). Moreover, for \(B\leqq 1\), this provides a better estimate compared to the usual Dobrushin’s bound when \(t \in [0,T_B]\) with \(T_B\simeq B^{-1/2}\).

(2) An improved \(W_2\)-stability estimate for Vlasov–Poisson with bounded density. For this second application, we focus on the case of the torus for simplicity, but a completely similar analysis works on the whole space.

Consider two solutions \(f_1,f_2\) of the Vlasov–Poisson equation on \({\mathbb {T}}^d\), namely

$$\begin{aligned} \partial _t f+v\cdot \nabla _x f+\nabla U\cdot \nabla _vf=0,\qquad -\Delta U:= \rho _f-1,\qquad \rho _f:=\int f \mathrm {d}v. \end{aligned}$$

As shown in [49], Loeper’s proof provides the following stability estimate whenever \(W_2(f_1(0),f_2(0))\) is sufficiently small (which is the interesting case):

$$\begin{aligned} W_2(f_1(t),f_2(t))\leqq c_d e^{\log \left( \frac{W_2(f_1(0),f_2(0))}{c_d}\right) e^{-Ct}}. \end{aligned}$$

Here \(c_d>0\) is a dimensional constant, while C depends on the \(L^\infty \) norm of \(\rho _{f_1}\) and \(\rho _{f_2}.\) This estimate can then be applied to prove the validity of the quasi-neutral limit for Vlasov–Poisson for initial data that are double-exponential perturbation of analytic functions [34, 35] (see Remark 3.4).

To improve this result, given \((X_i,V_i)\) the characteristics associated to \(f_i\), we consider a nonlinear \(W_2\)-type quantity of the form

$$\begin{aligned} Q(t):= & {} \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ \lambda (t)|X_1(t,x,v)-X_2(t,y,w)|^2\right. \\&\left. + |V_1(t,x,v)-V_2(t,y,w)|^2\right] \mathrm {d}\pi _0(x,v,y,w), \end{aligned}$$

where \(\pi _0\) is an optimal coupling, and \(\lambda (t)=|\log (Q(t))|.\) We then prove that Q(t) is well-defined whenever \(Q(t)\ll 1,\) and finally, comparing Q(t) to \(W_2,\) we show that

$$\begin{aligned} W_2(f_1(t),f_2(t))^2 \leqq 2 e^{-\left( \sqrt{\left| \log \left\{ W_2(f_1(0),f_2(0))^2 \left| \log \left( \frac{1}{2} W_2(f_1(0),f_2(0))^2\right) \right| \right\} \right| } -Ct \right) ^2}. \end{aligned}$$

(see Theorem 3.1). To better understand the improvement of our estimate with respect to Loeper’s, one can think as follows: if \(W_2(f_1(0),f_2(0)= \theta \ll 1\), then Loeper’s estimate implies that \(W_2(f_1(t),f_2(t)) \lesssim 1\) for \(t \in [0,\log |\log \theta |]\). Instead, our bound gives \(W_2(f_1(t),f_2(t)) \lesssim 1\) for \(t \in \bigl [0,|\log \theta |^{1/2}\bigr ]\), so on a much longer time-interval.

Remark 1.2

  • Note that a standard Gronwall estimate of the form \(W_2(f_1(t),f_2(t))\leqq e^{Ct}W_2(f_1(0),f_2(0))\) would imply that \(W_2(f_1(t),f_2(t)) \lesssim 1\) for \(t \in [0,|\log \theta |]\). So, while Loeper’s bound loses an extra logarithm in terms of time-scale, our bound only loses a square root. Since the electric field for a solution with bounded density is at most log-Lipschitz, an estimate of the form \(W_2(f_1(t),f_2(t))\leqq e^{Ct}W_2(f_1(0),f_2(0))\) is not expected to hold in this setting, and we believe our bound to be essentially sharp.

  • Our improvement from \(\log |\log \theta |\) to \(|\log \theta |^{1/2}\) is similar to the one obtained for the \(W_1\) distance, see [38, Remark 1.7]. In that paper, the authors rely crucially on the second-order structure of the Vlasov equation, namely \(\ddot{X}=\nabla U(t,X)\). Our proof, instead, relies only on the fact that \(\dot{X}=a(t,X,V)\), where \(a(t,\cdot ,\cdot )\) is Lipschitz, and it can be generalized to other contexts where the second-order structure fails.

Our new stability estimate has interesting applications for what concerns some singular limits for Vlasov-type equations. In particular, by considering the Vlasov–Poisson system in appropriate dimensionless variables that take the Debye length into account, we prove the validity of the quasi-neutral limit for Vlasov–Poisson for initial data that are an exponential perturbation of analytic functions, see also Remark 3.4.

The paper is structured as follows: in the next two sections, we will present our two main results, and then in the final section of the paper, we will discuss more generally our approach and how it leads to the introduction of a new family of Wasserstein-type distances.

2 Dobrushin’s Estimate Revisited

2.1 The Vlasov equation

The Vlasov equation is a non-linear partial differential equation providing a statistical description for the collective behavior of large numbers of charged particles in mutual, long-range interaction. This model was first introduced by Jeans in the context of Newtonian stellar dynamics [41], and later by Vlasov in his work on plasma physics [61, 62]. The unknown of the Vlasov equation f(txv) is the distribution function of the system at time, that is the number density of particles that are located at the position x and have instantaneous velocity v at time t. The Vlasov equation for the distribution function f reads as

$$\begin{aligned} \partial _t f(t,x,v)+v\cdot \nabla _xf(t,x,v) +F[f](t,x) \cdot \nabla _v f(t,x,v)=0, \end{aligned}$$

where

$$\begin{aligned} F[f](x)= \iint \nabla K(x-y)\,f(\mathrm {d}y,\mathrm {d}w)=\nabla K*_{x,v} f. \end{aligned}$$

In other words, the Vlasov equation for particle systems is a kinetic model where each particle is subject to the acceleration field F[f] created by all the other particles in the system.

The Vlasov equation is a transport equation and, for a sufficiently regular force field, it can be described by the method of characteristics. The initial distribution \(f_0\) is transported by a characteristic flow (XV) generated by the mean-field force F[f]; if we denote

$$\begin{aligned} \left\{ \begin{array}{l} \dot{X}(t,x,v)=V(t,x,v),\\ \dot{V}(t,x,v)=F[f](t,X(t,x,v)),\\ X(0,x,v)=x,\,V(0,x,v)=v, \end{array} \right. \end{aligned}$$

then \(f(t,X(t,x,v),V(t,x,v))=f(0,x,v)\). Also, since the vector field (vF[f]) is divergence free, one has conservation of mass and of all \(L^p\)-norms. For an introduction to this topic we refer to the lecture notes [20].

2.2 An improved Dobrushin’s estimate

Consider the Vlasov equation with smooth kernel. More precisely,

$$\begin{aligned} \partial _t f+v\cdot \nabla _x f+F[f]\cdot \nabla _vf=0,\qquad F[f]:=\nabla K*\rho _f,\qquad \rho _f:=\int f \mathrm {d}v,\nonumber \\ \end{aligned}$$
(2.1)

where \(\Vert D^2 K\Vert _{\infty }=:B<\infty \). As explained in the introduction, our goal is to provide a stability estimate for solutions that is optimal in the regime as B tends to zero. Here is our result:

Theorem 2.1

Let \(f_1,f_2\) be two solution of (2.1). Then

$$\begin{aligned} W_1(f_1(t),f_2(t))\leqq \min \left\{ (1+t) e^{\frac{2}{3} B\left( (1+t)^3-1\right) },e^{(1+2B)t}\right\} W_1(f_1(0),f_2(0)). \end{aligned}$$

Proof

Let \((X_i,V_i)\) denote the characteristic flow associated to \(f_i\), that is,

$$\begin{aligned} \left\{ \begin{array}{l} \dot{X}_i(t,x,v)=V_i(t,x,v),\\ \dot{V}_i(t,x,v)=\nabla \bigl (K*\rho _{f_i}\bigr )(t,X_i(t,x,v)),\\ X_i(0,x,v)=x,\,V_i(0,x,v)=v. \end{array} \right. \end{aligned}$$

Note that, since \(\nabla K\) is Lipschitz, the characteristic flow is well-defined thanks to Cauchy–Lipschitz theory (see [20, Chapter 2]). To prove Theorem 2.1, we consider \(\pi _0\) an optimal \(W_1\)-coupling between \(f_{1}(0)\) and \(f_{2}(0)\), and we define the quantity

$$\begin{aligned} Q(t):= & {} \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \big [|X_1(t,x,v)-t V_1(t,x,v)-(X_2(t,y,w)-tV_2(t,x,v))|\\&+|V_1(t,x,v)-V_2 (t,y,w)|\big ]\mathrm {d}\pi _0(x,v,y,w). \end{aligned}$$

Note that

$$\begin{aligned} Q(0)=\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ |x-y|+|v-w|\right] \mathrm {d}\pi _0(x,v,y,w)=W_1(f_1(0),f_2(0)).\nonumber \\ \end{aligned}$$
(2.2)

Also

$$\begin{aligned} \begin{aligned}&\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \big [|X_1(t,x,v)-X_2(t,y,w)|+|V_1(t,x,v)-V_2(t,y,w)|\big ]\\&\qquad \mathrm {d}\pi _0(x,v,y,w)\\&\leqq \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \big [|X_1(t,x,v)-t V_1(t,x,v)-(X_2(t,y,w)-tV_2(t,x,v))|\\&\qquad +|V_1(t,x,v)-V_2 (t,y,w)|\big ]\mathrm {d}\pi _0(x,v,y,w)\\&\qquad +t \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |V_1(t,x,v)-V_2(t,y,w)|\mathrm {d}\pi _0(x,v,y,w)\\&\leqq (1+t)Q(t). \end{aligned} \end{aligned}$$
(2.3)

Since \(\frac{d}{dt}(X_i-tV_i)=-t\dot{V}_i\), one has

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}Q(t)&\leqq (1+t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ |\dot{V}_1(t,x,v)-\dot{V}_2(t,y,w)|\right] \mathrm {d}\pi _0(x,v,y,w)\\&=(1+t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left| \nabla \bigl (K*\rho _{f_1}\bigr )(t,X_1(t,x,v)) \right. \\&\quad \left. - \nabla \bigl (K*\rho _{f_2}\bigr )(t,X_2(t,y,w))\right| \mathrm {d}\pi _0(x,v,y,w)\\&\leqq (1+t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left| \nabla \bigl (K*\rho _{f_1}\bigr )(t,X_1(t,x,v))\right. \\&\quad \left. - \nabla \bigl (K*\rho _{f_2}\bigr )(t,X_1(t,x,v))\right| \mathrm {d}\pi _0(x,v,y,w)\\&\qquad +(1+t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left| \nabla \bigl (K*\rho _{f_2}\bigr )(t,X_1(t,x,v)) \right. \\&\quad \left. - \nabla \bigl (K*\rho _{f_2}\bigr )(t,X_2(t,y,w))\right| \mathrm {d}\pi _0(x,v,y,w)\\&=:(1+t)\bigl [T_1+T_2\bigr ]. \end{aligned}$$

We now observe that, since \(\nabla K\) is B-Lipschitz, we can bound

$$\begin{aligned} T_2\leqq B\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \big [|X_1(t,x,v)-X_2(t,y,w)|\big ]\mathrm {d}\pi _0(x,v,y,w) \leqq B(1+t)Q(t), \end{aligned}$$

where the second inequality follows from (2.3). For \(T_1\), we note that

$$\begin{aligned}&\left| \nabla \bigl (K*\rho _{f_1}\bigr )(t,X_1(t,x,v)) - \nabla \bigl (K*\rho _{f_2}\bigr )(t,X_1(t,x,v))\right| \\&\quad =\left| \int _{\mathbb {T}^d} \nabla K(X_1(t,x,v)-z)\mathrm {d}\,\bigl (\rho _{f_1(t)}(z)-\rho _{f_2(t)}(z)\bigr ) \right| . \end{aligned}$$

Here, similarly to Dobrushin’s argument, we use that \(W_1\) admits the following dual formulation:

$$\begin{aligned} W_1(\mu ,\nu )=\sup _{\psi \,\text {1-Lip}}\int \psi \, \mathrm {d}(\mu -\nu ). \end{aligned}$$
(2.4)

Thanks to this fact, since \(z\mapsto \nabla K(X_1(t,x,v)-z)\) is B-Lipschitz, we deduce that

$$\begin{aligned} \left| \int _{\mathbb {T}^d} \nabla K(X_1(t,x,v)-z)\!\mathrm {d}\bigl (\rho _{f_1(t)}(z)-\rho _{f_2(t)}(z)\bigr ) \right| \leqq B\,W_1(\rho _{f_1(t)},\rho _{f_2(t)}), \end{aligned}$$

and therefore

$$\begin{aligned} T_1\leqq B\,W_1(\rho _{f_1(t)},\rho _{f_2(t)}) \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2}\mathrm {d}\pi _0(x,v,y,w)= B\,W_1(\rho _{f_1(t)},\rho _{f_2(t)}). \end{aligned}$$

Let \(\gamma _t=(X_1(t,\cdot ,\cdot ),X_2(t,\cdot ,\cdot ))_\#\pi _0 \in \Pi (\rho _{f_1},\rho _{f_2}).\) Then, by the definition of \(W_1\) (see Definition 1.1),

$$\begin{aligned}&W_1(\rho _{f_1(t)},\rho _{f_2(t)}) \le \int _{(\mathbb {T}^d)^2} |x-y| \mathrm {d}\gamma _t(x,y)\\&\quad =\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |X_1(t,x,v)-X_2(t,y,w)| \mathrm {d}\pi _0(x,v,y,w), \end{aligned}$$

so using (2.3) we conclude that \(T_1 \leqq B(1+t)Q(t)\).

In conclusion, we proved that

$$\begin{aligned} Q'(t)\leqq 2B(1+t)^2Q(t), \end{aligned}$$

therefore

$$\begin{aligned} Q(t)\leqq e^{\frac{2}{3} B\left( (1+t)^3-1\right) }Q(0). \end{aligned}$$

Recalling (2.2) and (2.3), this yields

$$\begin{aligned} W_1(f_1(t),f_2(t))\leqq (1+t) e^{\frac{2}{3} B\left( (1+t)^3-1\right) }W_1(f_1(0),f_2(0)). \end{aligned}$$
(2.5)

As noted in the introduction, this estimate is more powerful than the usual Dobrushin’s estimateFootnote 1

$$\begin{aligned} W_1(f_1(t),f_2(t))\leqq e^{(1+2B)t}W_1(f_1(0),f_2(0)) \end{aligned}$$

when B is small. On the other hand, for large times, the term \((1+t)^3\) in our estimate provides a worse bound (2.6). Hence, both bounds are helpful depending on the mutual sizes of B and t, and one can choose to apply whichever gives the stronger bound. In conclusion, one has

$$\begin{aligned} W_1(f_1(t),f_2(t))\leqq \min \left\{ (1+t) e^{\frac{2}{3} B\left( (1+t)^3-1\right) },e^{(1+2B)t}\right\} W_1(f_1(0),f_2(0)), \end{aligned}$$

as desired.\(\square \)

3 Stability Estimates for Vlasov–Poisson and Quasi-Neutral Limits

3.1 The Vlasov–Poisson system

The Vlasov–Poisson system is the classical kinetic model describing dilute, totally ionised, unmagnetized plasma. In its most common form, f is the distribution function of the electrons moving in a self-induced electrostatic field, while the ions are assumed to act as a fixed background. In this section, we consider the phase space to be \({{\mathbb {T}}}^d\times {{\mathbb {R}}}^d,\) for reasons that will be explained later.

$$\begin{aligned} (VP):= \left\{ \begin{array}{l}\partial _t f+v\cdot \nabla _x f+ E\cdot \nabla _v f=0, \\ E=-\nabla U, \\ \Delta U=1- \int _{{{\mathbb {R}}}^d} f\, \mathrm {d}v= 1- \rho _f,\\ f\vert _{t=0}=f_{0}\ge 0,\ \ \int _{{{\mathbb {T}}}^d \times {{\mathbb {R}}}^d} f_{0}\,\mathrm {d}x\,\mathrm {d}v=1. \end{array} \right. \end{aligned}$$
(3.1)

The well-posedness theory of this system has been extensively studied, see, for example, the survey paper [32]. Global-in-time classical solutions have been constructed under various conditions on the initial data (see for example [4, 6, 47, 53, 57, 59]), while global-in-time weak solutions were presented in [2] and [39] for \(L^p\) initial data (see also [3, 5]). In this section, we will focus on an important contribution to the uniqueness theory made by Loeper [49], who proved uniqueness for solutions of (3.1) with bounded density by means of a strong–strong stability estimate in Wasserstein.

3.2 Quasi-neutral limits

Since plasmas are excellent conductors of electricity, and any charges that develop are readily neutralized, they can be treated as being quasi-neutral. On the other hand, at small spatial and time scales, the quasi-neutrality is no longer verified. The distance over which quasi-neutrality may break down can be described in terms of the Debye length \(\lambda _D\), and varies according to the physical characteristics of the plasma. The Debye length is usually much shorter than the typical observation scale. Therefore, we can define the parameter \(\varepsilon := \lambda _D/L\) and consider the limit as \(\varepsilon \) tends to zero. This procedure is known as quasi-neutral limit.

When we take the Debye length into account, in appropriate dimensionless variables, the Vlasov–Poisson system becomes

$$\begin{aligned} (VP)_{\varepsilon } : = \left\{ \begin{array}{l}\partial _t f_\varepsilon +v\cdot \nabla _x f_\varepsilon + E_\varepsilon \cdot \nabla _v f_\varepsilon =0, \\ E_\varepsilon =-\nabla _x U_\varepsilon , \\ -\varepsilon ^2 \Delta _x U_\varepsilon =\rho _{f_{\varepsilon }} - 1 ,\\ f_\varepsilon \vert _{t=0}=f_{0,\varepsilon }\ge 0,\ \ \int _{\mathbb {T}^d \times \mathbb {R}^d} f_{0,\varepsilon }\,\mathrm {d}x\,\mathrm {d}v=1, \end{array} \right. \end{aligned}$$
(3.2)

and the energy of the rescaled system is the following:

$$\begin{aligned} \mathcal {E}(f_\varepsilon (t)):= \frac{1}{2}\int _{\mathbb {T}^d \times \mathbb {R}^d} f_\varepsilon |v|^2 \mathrm {d}x \mathrm {d}v + \frac{\varepsilon ^2}{2}\int _{\mathbb {T}^d}|\nabla _x U_\varepsilon |^2 \mathrm {d}x. \end{aligned}$$

The quasi-neutral limit corresponds to a singular limit for the rescaled system (3.2), in which the formal limiting system is the Kinetic Isothermal Euler system

$$\begin{aligned} (KIE) :=\left\{ \begin{array}{l}\partial _t f +v\cdot \nabla _x f+ E\cdot \nabla _v f=0, \\ E= -\nabla _x U, \\ \rho = 1,\\ f\vert _{t=0}=f_{0}\ge 0,\ \ \int _{\mathbb {T}^d \times \mathbb {R}^d} f_0\,\mathrm {d}x\,\mathrm {d}v=1. \end{array} \right. \end{aligned}$$
(3.3)

The force \(E=-\nabla _x U\) is defined implicitly through the incompressibility constraint \(\rho =1\), and may be thought of as a Lagrange multiplier associated to this constraint. In other words, electrons move under the effect of a gradient in such a way that their density remains equal to 1 everywhere. Thus (KIE) is a “kinetic” version of the incompressible Euler equations. As shown in [8], the potential U formally satisfies the Laplace equation

$$\begin{aligned} -\Delta _x U= \sum _{i,j} \partial _{x_i}\partial _{x_j} \int _{{{\mathbb {R}}}^d} v_i v_j f \!\mathrm {d}\,v. \end{aligned}$$

As discussed in [31], the justification of this limit is very delicate. In particular, it can fail even for smooth initial data. Still, a series of positive results are available. In particular, as shown in [34, 35], a way to get the validity of the quasi-neutral limit for a large class of data can be achieved if one can prove some quantitative strong-strong stability at the level of the \((VP)_{\varepsilon }\) system. Also, the stronger the stability estimate, the larger the class of initial data for which the quasi-neutral limit hold. In [34, 35] the authors prove that the quasi-neutral limit holds for initial data that are an extremely small perturbation of an analytic function. Here, by introducing a suitable non-linear version of the Wasserstein distance, we can considerably improve that results.

Here is our main theorem, which provides us with a new \(W_2\) stability estimate. We prove the result with a general parameter \(\varepsilon \leqq 1\) as this is necessary for the study of the quasi-neutral limit. The reader interested in the Vlasov–Poisson case can simply apply our estimate with \(\varepsilon =1\).

Theorem 3.1

Let \(\varepsilon \leqq 1\), and let \(f_1, f_2\) be two weak solutions of the \((VP)_{\varepsilon }\) system (3.2), and set

$$\begin{aligned} \rho _1:= \int _{\mathbb {R}^d} f_1 \, \mathrm {d}v, \quad \rho _2= \int _{\mathbb {R}^d} f_2 \, \mathrm {d}v. \end{aligned}$$

Define the function

$$\begin{aligned} A(t):=\Vert \rho _1(t)\Vert _{L^\infty (\mathbb T^d)}+\Vert \rho _2(t)\Vert _{L^\infty ({\mathbb {T}}^d)}, \end{aligned}$$

and assume that \(A(t) \in L^1([0,T])\) for some \(T>0\). There exist a dimensional constant \(C_d>0\) and a universal constant \(c_0>0\) such that the following holds: if \(W_2(f_1(0),f_2(0))\) is sufficiently small so that \(W_2(f_1(0),f_2(0))\leqq c_0 \varepsilon \) and

$$\begin{aligned}&\sqrt{\left| \log \left( \varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \left| \log \left( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2\right) \right| \right) \right| }\nonumber \\&\quad \geqq \frac{C_d}{\varepsilon }\int _0^TA(s)\,\mathrm {d}s+\sqrt{\left| \log \left( \frac{\varepsilon }{e}\right) \right| }, \end{aligned}$$
(3.4)

then

$$\begin{aligned}&W_2(f_1(t),f_2(t))^2 \\&\quad \leqq 2 e^{-\left( \sqrt{\left| \log \left\{ \varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \left| \log \left( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2\right) \right| \right\} \right| }- \frac{C_d}{\varepsilon }\int _0^tA(s)\,\mathrm {d}s\right) ^2}\\&\qquad \text {for all }t \in [0,T]. \end{aligned}$$

Remark 3.2

The assumption (3.4) depends on the time interval [0, T]. If T is very small so that

$$\begin{aligned} \frac{C_d}{\varepsilon }\int _0^TA(s)\,\mathrm {d}s\le \sqrt{\left| \log \left( \frac{\varepsilon }{e}\right) \right| } \end{aligned}$$

then (3.4) corresponds to \(W_2(f_1(0),f_2(0))\le \varepsilon ^3.\) Of course this is not the relevant regime since the time interval is usually at least of size 1. In particular, since \(A(t)\ge 2\),Footnote 2

$$\begin{aligned} \frac{C_d}{\varepsilon }\int _0^TA(s)\,\mathrm {d}s= \frac{C_T}{\varepsilon } \qquad \text {for some constant } C_T\gtrsim 1. \end{aligned}$$

Therefore (3.4) corresponds to asking \(W_2(f_1(0),f_2(0))\) being bounded by \(e^{-C\varepsilon ^{-2}}.\) This requirement is very natural in this context, as also discussed in Remark 3.4.

As in [34, 35], Theorem 3.1 yields the validity of the quasi-neutral limit for \(W_2\)-perturbations of analytic data. However, our estimate is stronger with respect to the previous results and provides an almost optimal rate in the quasi-neutral limit. More broadly, we believe that our approach for proving Theorem 3.1 has its own interest and could be used in other settings.

To state our application to the quasi-neutral limit, we need to recall some notation introduced by Grenier [26] in one of the first mathematical works on this topic. In [26] the author relies on an interpretation of the plasma as a superposition of a—possibly uncountable—collection of fluids and he shows that the quasi-neutral limit holds when the sequence of initial data \(f_{0,\varepsilon }\) enjoys uniform analytic regularity with respect to the space variable. As explained in [34] (see the discussion after Definition 1.4), this decomposition is purely a technical tool and it does not impose any restriction on the initial datum. This result has been improved by Brenier [9], who gives a rigorous justification of the quasi-neutral limit in the so called “cold electron” case, i.e. when the initial distribution \(f_{0,\varepsilon }\) converges to a monokinetic profile

$$\begin{aligned} f_0(x,v) = \rho _0(x) \delta _{v= v_0(x)}, \end{aligned}$$

where \(\delta _v\) denotes the Dirac measure in velocity, see also [9, 25, 50].

Let us define a suitable analytic norm, as in [26]: given \(\delta >0\) and a function \(g:\mathbb {T}^d \rightarrow {\mathbb {R}}\), we define

$$\begin{aligned} \Vert g \Vert _{B_\delta } := \sum _{k \in \mathbb {Z}^d} | \widehat{g}(k) | \delta ^{|k|}, \end{aligned}$$

where \(\widehat{g}(k)\) is the k-th Fourier coefficient of g. We define \(B_\delta \) as the space of functions g such that \(\Vert g \Vert _{B_\delta }<+\infty \).

Corollary 3.3

Let \(d=2, 3\), and let \(\gamma \), \(\delta _0\), and \(C_0\) be positive constants. Consider a sequence \((f_{0,\varepsilon })\) of non-negative initial data in \(L^1\) for (3.2) such that for all \(\varepsilon \in (0,1)\), and all \(x \in \mathbb {T}^d\),

  • (uniform estimates)

    $$\begin{aligned} \Vert f_{0,\varepsilon }\Vert _\infty \leqq C_0, \quad \mathcal {E}(f_{0,\varepsilon }) \leqq C_0, \end{aligned}$$
  • (compact support in velocity)

    $$\begin{aligned} f_{0,\varepsilon }(x,v)= 0 \quad \text {if } |v| > \frac{1}{\varepsilon ^\gamma }, \end{aligned}$$
  • (analytic + perturbation) Assume the following decomposition:

    $$\begin{aligned} f_{0,\varepsilon } = g_{0,\varepsilon } + h_{0,\varepsilon }, \end{aligned}$$

    where \((g_{0,\varepsilon })\) is a sequence of continuous functions satisfying

    $$\begin{aligned} \sup _{\varepsilon \in (0,1)}\sup _{v \in \mathbb {R}^d} \, (1+|v|^2) \Vert g_{0,\varepsilon } (\cdot ,v)\Vert _{B_{\delta _0}} \leqq C_0, \end{aligned}$$

    admitting a limit \(g_0\) in the sense of distributions. Furthermore, \((h_{0,\varepsilon })\) is a sequence of functions satisfying for all \(\varepsilon >0\)

    $$\begin{aligned} W_2(f_{0,\varepsilon },g_{0,\varepsilon }) \leqq e^{-K \varepsilon ^{-2\zeta }}\qquad \text {with }\zeta =\left\{ \begin{array}{ll} 1+2\max \{\beta ,\gamma \}) &{} \text {if }d=2,\\ 1+\max \{38,3\gamma \}) &{} \text {if }d=3,\\ \end{array} \right. \end{aligned}$$
    (3.5)

    for some constants \(K>0\) and \(\beta >2\).

For all \(\varepsilon \in (0,1)\), consider \(f_\varepsilon (t)\) a global weak solution of (3.2) with initial condition \(f_{0,\varepsilon }\), and define the filtered distribution function

$$\begin{aligned} {\widetilde{f}}_\varepsilon (t,x,v) := f_\varepsilon \Big (t,x,v-\frac{1}{i}(d_+(t,x)e^{\frac{it}{\sqrt{\varepsilon }}}-d_-(t,x)e^{-\frac{it}{\sqrt{\varepsilon }}})\Big ) \end{aligned}$$

where \((d_\pm )\) are the correctors are defined as the solution of

$$\begin{aligned} \mathrm{curl} \ d_{\pm }= & {} 0, \qquad \mathrm{div} \bigg (\partial _t d_{\pm }+\left( \int \rho _\theta v_\theta \mu (\mathrm {d}\theta ) \cdot \nabla \right) d_{\pm } \bigg ) =0, \\ \mathrm{div} d_{\pm } (0)= & {} \underset{\varepsilon \rightarrow 0}{\lim } \mathrm{div} \frac{\sqrt{\varepsilon }E^\varepsilon (0) \pm i j^\varepsilon (0)}{2},\qquad j^\varepsilon := \int \rho ^\varepsilon _\theta v^\varepsilon _ \theta \mu (\mathrm {d}\theta ). \end{aligned}$$

Then there exist \(T>0\), and g(t) a weak solution on [0, T] of (3.3) with initial condition \(g_0,\) such that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \sup _{t \in [0,T]} W_1(\widetilde{f}_\varepsilon (t), g(t)) = 0. \end{aligned}$$

Remark 3.4

Already in the one dimensional case, there is a negative result stating that an initial rate of convergence of the form \(W_2(f_{0,\varepsilon },g_{0,\varepsilon }) \leqq \varepsilon ^k\) for some \(k>0\) is not sufficient to ensure the validity of the quasi-neutral limit for positive times. This is the consequence of instability mechanisms described in [27] and [33]. Hence, our assumption on the size of \(W_2(f_{0,\varepsilon },g_{0,\varepsilon }) \) considerably improves the results in [34, 35], where a double exponential \(\exp \big (-\exp ({K \varepsilon ^{-\zeta }})\big )\) was required.

Remark 3.5

In Corollary 3.3 we consider sequences of initial conditions with compact support in velocity (yet, we allow the support to grow polynomially as \(\varepsilon \) goes to zero). The reason is that we need \(L^\infty \) bounds on the density \(\rho _{f_\varepsilon }(t)=\int f_\varepsilon (t)\,\mathrm {d}v\), so a control on the support in velocity is needed. We have decided to put these assumptions because they are the same as in [6] and so we can rely on some estimates proved in that paper. However, using the argument in [52] (see also [29]) one could relax the assumptions and require only a moment condition on \(f_{0,\varepsilon }\). Providing this extension is not difficult, but it would require some work that would go beyond the main goal of this paper.

Before proving Proof of Theorem 3.1, we first show how it implies Corollary 3.3.

Proof of Corollary 3.3

Let \(g_\varepsilon (t)\) denote the solution of (3.2) starting from \(g_{0,\varepsilon }\). As shown in [34, Sect. 4], under the assumptions in the statement, the following bounds hold:

$$\begin{aligned} \Vert \rho _{g_\varepsilon }\Vert _{L^\infty ([0,T]\times \mathbb {T}^d)}\leqq {\bar{C}},\qquad \Vert \rho _{f_\varepsilon }\Vert _{L^\infty ([0,T]\times \mathbb {T}^d)}\leqq \bar{C}\varepsilon ^{-(\zeta -1)}, \end{aligned}$$

where \(\zeta =\zeta (d)\) is as in the statement. Hence Theorem 3.1 and (3.5) yield

$$\begin{aligned}&\sup _{[0,T]}W_2(f_\varepsilon (t),g_\varepsilon (t)) \\&\quad \leqq 2 e^{-\left( \sqrt{\left| \log \left\{ \varepsilon ^{-2}W_2(f_{0,\varepsilon },g_{0,\varepsilon })^2 \left| \log \left( \frac{1}{2}\varepsilon ^{-2}W_2(f_{0,\varepsilon },g_{0,\varepsilon })^2\right) \right| \right\} \right| } - C_d{\bar{C}} T\varepsilon ^{-\zeta } \right) ^2}, \end{aligned}$$

provided that \(C_d^2{\bar{C}}^2T^2<K\) (which can be guaranteed by taking T smaller if necessary). This implies that \(\sup _{[0,T]}W_2(f_\varepsilon (t),g_\varepsilon (t)) \rightarrow 0\) as \(\varepsilon \rightarrow 0,\) and we can now conclude as in [34, Proof of Theorem 1.7]. \(\quad \square \)

3.3 Proof of Theorem 3.1

Before starting the proof we recall [34, Lemma 3.2], see also [29, Lemma 3.3].

Lemma 3.6

Let \(\Psi _i:{\mathbb {T}}^d\rightarrow {\mathbb {R}}\) solve

$$\begin{aligned} -\varepsilon ^2 \Delta \Psi _i=\rho _i-1,\qquad i=1,2. \end{aligned}$$

Then

$$\begin{aligned} \varepsilon ^2 \Vert \nabla \Psi _1-\nabla \Psi _2\Vert _{L^2({\mathbb {T}}^d)}\leqq & {} \Bigl [\max \bigl \{\Vert \rho _1\Vert _{L^\infty ({\mathbb {T}}^d)}, \Vert \rho _2\Vert _{L^\infty ({\mathbb {T}}^d)}\bigr \} \Bigr ]^{1/2}\,W_2(\rho _1,\rho _2),\\ \varepsilon ^2|\nabla \Psi _i(x) - \nabla \Psi _i(y)|\leqq & {} C\,|x-y|\,\log \biggl (\frac{4 \sqrt{d}}{|x-y|} \biggr ) \,\Vert \rho _i-1\Vert _{L^\infty ({\mathbb {T}}^d)}\qquad \\&\qquad \forall \,x,y \in \mathbb T^d,\, i=1,2. \end{aligned}$$

Let \((X_i,V_i)\) denote the characteristic flow associated to \(f_i\), that is

$$\begin{aligned} \left\{ \begin{array}{l} \dot{X}_i(t,x,v)=V_i(t,x,v),\\ \dot{V}_i(t,x,v)=E_i(t,X_i(t,x,v)),\\ X_i(0,x,v)=x,\,\,V_i(0,x,v)=v, \end{array} \right. \qquad E_i=\nabla U_i,\qquad \varepsilon ^2\Delta U_i=\rho _{f_i}-1. \end{aligned}$$

To prove Theorem 3.1, we consider \(\pi _0\) an optimal \(W_2\)-coupling between \(f_{1}(0)\) and \(f_{2}(0)\), and we define the quantity Q(t) defined as the unique constant (assuming it exists) such that

$$\begin{aligned} Q(t)= & {} \frac{1}{2}\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ \varepsilon ^{-2}|\log Q(t)| \,|X_1(t,x,v)-X_2(t,y,w)|^2\right. \\&\left. +|V_1(t,x,v) -V_2(t,y,w)|^2\right] \mathrm {d}\pi _0(x,v,y,w). \end{aligned}$$

In other words, we are considering a quantity of the form

$$\begin{aligned} Q(t)= & {} \frac{1}{2}\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ \lambda (t)|X_1(t,x,v)-X_2(t,y,w)|^2\right. \\&\left. +| V_1(t,x,v)-V_2(t,y,w)|^2\right] \mathrm {d}\pi _0(x,v,y,w), \end{aligned}$$

with \(\lambda (t)\) depending on time, and we are assuming that actually \(\lambda (t)\) is a function of Q(t) itself. The particular choice \(\lambda (t)=\varepsilon ^{-2}|\log Q(t)|\) is specific to this problem: the logarithm will help to compensate for the log-Lipschitz regularity of the electric fields, while \(\varepsilon ^{-2}\) is the natural scaling in the current setting.

Note that a priori is not clear that Q(t) is well-defined. This will be proved in Lemma 3.7 below. However, assuming for now that Q(t) is well-defined, we show how this quantity allows us to prove the result. We have

$$\begin{aligned} Q'(t)= & {} \frac{1}{2}\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \lambda '(t)|X_1(t,x,v)-X_2(t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\\&\qquad +\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ \lambda (t)(X_1(t,x,v)\right. \\&\left. \qquad -X_2(t,y,w))\cdot (V_1(t,x,v)-V_2(t,y,w)\right] \mathrm {d}\pi _0(x,v,y,w)\\&\qquad -\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2}\left[ (V_1(t,x,v) -V_2(t,y,w)\cdot (E_1(t, X_1(t,x,v))\right. \\&\left. \qquad -E_2(t, X_2(t,y,w))) \right] \mathrm {d}\pi _0(x,v,y,w) \end{aligned}$$

By Cauchy–Schwarz inequality and recalling the definition of Q(t) we have,

$$\begin{aligned} Q'(t)\le & {} \frac{1}{2}\lambda '(t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |X_1(t,x,v)-X_2(t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\\&+\lambda (t)\left( \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2}|X_1(t,x,v) -X_2(t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\right) ^{\frac{1}{2}}\\&\cdot \left( \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |V_1(t,x,v)-V_2(t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\right) ^{\frac{1}{2}}\\&+\left( \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2}|V_1(t,x,v)-V_2 (t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\right) ^{\frac{1}{2}}\\&\cdot \left( \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |E_1(t, X_1(t,x,v))\right. \\&\left. -E_2(t, X_2(t,y,w))|^2\,\mathrm {d}\pi _0(x,v,y,w) \right) ^{\frac{1}{2}}\\\le & {} \frac{1}{2}\lambda '(t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |X_1(t,x,v)-X_2(t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\\&+2\sqrt{\lambda (t)}Q(t)+\sqrt{ Q(t)}\Vert E_1(t, X_1(t,x,v))\\&-E_2(t, X_2(t,y,w))\Vert _{L^2(\mathrm {d}\pi _0(x,v,y,w))}. \end{aligned}$$

Adding and subtracting \(-E_2(t,X_1)\) we obtain,

$$\begin{aligned} Q'(t)&\le \frac{1}{2}\lambda '(t)\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |X_1(t,x,v)-X_2(t,y,w)|^2\,\mathrm {d}\pi _0(x,v,y,w)\nonumber \\&+2\sqrt{\lambda (t)}Q(t) +\sqrt{Q(t)}\left( T_1+T_2\right) , \end{aligned}$$
(3.6)

where

$$\begin{aligned} T_1= & {} \Vert E_2(t, X_1(t,x,v))-E_2(t, X_2(t,y,w))\Vert _{L^2(\mathrm {d}\pi _0(x,v,y,w))},\\ T_2= & {} \Vert E_1(t, X_1(t,x,v))-E_2(t,X_1(t,x,v))\Vert _{L^2(\mathrm {d}\pi _0(x,v,y,w))}. \end{aligned}$$

Thanks to Lemma 3.6 and by the very same argument in [34] we can bound \(T_1\) and \(T_2\) as follows:Footnote 3

$$\begin{aligned} T_2\le \frac{C}{\varepsilon ^2} A(t)\sqrt{\frac{Q(t)}{\lambda (t)}}, \qquad \text {and}\qquad T_1\le \frac{C}{\varepsilon ^2}A(t) \sqrt{\phi \left( \frac{Q(t)}{\lambda (t)}\right) } \end{aligned}$$

where we have

$$\begin{aligned} \phi (s)=\left\{ \begin{array}{ll} s\log ^2(s) &{} \text{ for }\, s\in (0,1/e]\\ s&{} \text{ for }\, s>1/e . \end{array} \right. \end{aligned}$$

We now recall that \(\lambda (t)=\varepsilon ^{-2}|\log (Q(t))|\) and we substitute this expression in the derivative of Q(t). Notice that in this estimate we are interested in small values of Q(t) and in particular, as we will show below, we will always be in the regime \(\varepsilon ^2 Q(t)/|\log (Q(t))|\in (0,1/e).\) Therefore we have

$$\begin{aligned} T_1\le \frac{C}{\varepsilon ^2}A(t)\sqrt{\frac{\varepsilon ^2\,Q(t)}{|\log (Q(t))|}\log ^2 \left( \frac{\varepsilon ^2\,Q(t)}{|\log (Q(t))|}\right) }, \end{aligned}$$

so, by equation (3.6), we have,

$$\begin{aligned} Q'(t)&\le \left( -\frac{1}{2}\frac{Q'(t)}{|\log (Q(t))|}\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2}|X_1(t,x,v)-X_2(t,y,w) |^2\,\mathrm {d}\pi _0(x,v,y,w)\right) \\&\qquad +\left( 2\frac{\sqrt{|\log (Q(t))|}}{\varepsilon }+\frac{C\,A(t)}{\varepsilon \sqrt{|\log (Q(t))|}}\right) Q(t)\\&\qquad +C\,A(t)\frac{\sqrt{Q(t)}}{\varepsilon }\sqrt{\frac{Q(t)}{|\log (Q(t))|}\log ^2 \left( \frac{\varepsilon ^2 \,Q(t)}{|\log (Q(t))|}\right) }. \end{aligned}$$

We now consider two cases, depending on the sign of \(Q'(t)\). If \(Q'(t)\leqq 0\), then we do not do anything. If instead \(Q'(t)>0\), then the first term in the right-hand side above is negative, and therefore

$$\begin{aligned} Q'(t)&\le \left( 2\frac{\sqrt{|\log (Q(t))|}}{\varepsilon }+\frac{C\,A(t)}{\varepsilon \sqrt{|\log (Q(t))|}}\right) Q(t)\\&\qquad +C\,A(t)\frac{\sqrt{Q(t)}}{\varepsilon } \sqrt{\frac{Q(t)}{|\log (Q(t))|}\log ^2 \left( \frac{\varepsilon ^2 \,Q(t)}{|\log (Q(t))|}\right) }. \end{aligned}$$

Since the right-hand side above is nonnegative, independently of the sign of \(Q'(t)\) we know that the bound above holds. We now observe that, as long as \(Q(t)\leqq \varepsilon \), then

$$\begin{aligned} \log ^2 \left( \frac{\varepsilon ^2 \,Q(t)}{|\log (Q(t))|}\right) \leqq C\log ^2(Q(t)). \end{aligned}$$

Thus,

$$\begin{aligned}&Q'(t)\le \left( 2\frac{\sqrt{|\log (Q(t))|}}{\varepsilon }+\frac{C\,A(t)}{\varepsilon \sqrt{|\log (Q(t))|}}+\frac{C\,A(t)\sqrt{|\log (Q(t))|}}{\varepsilon }\right) Q(t) \qquad \\&\qquad \text {provided } Q(t)\leqq \varepsilon . \end{aligned}$$

Since \(A(t)\geqq 1\)Footnote 4, provided \(|\log (Q(t))|\geqq 1\), the above bound reduces to

$$\begin{aligned} Q'(t)\le \frac{2C_d\,A(t)}{\varepsilon }Q(t)\sqrt{|\log (Q(t))|}, \end{aligned}$$

where \(C_d\) is a dimensional constant. Note that the two conditions \(Q(t)\leqq \varepsilon \) and \(|\log (Q(t))|\geqq 1\) are guaranteed if \(Q(t)\leqq \frac{\varepsilon }{e}\) (recall that \(\varepsilon \leqq 1\) by assumption).

Hence, provided that we are in the regime \(Q(s)\leqq \frac{\varepsilon }{e}\) on [0, t], this implies that

$$\begin{aligned} Q(t) \leqq R(t):=e^{-\left( \sqrt{|\log (Q(0))|} - \frac{C_d}{\varepsilon }\int _0^tA(s)\,\mathrm {d}s\right) ^2}. \end{aligned}$$
(3.7)

We observe that the bound (3.7) guarantees that

$$\begin{aligned} \sup _{s\in [0,t]}Q(s)\leqq \frac{\varepsilon }{e}\quad \text { holds if } \quad \sup _{s \in [0,t]}R(s)\leqq \frac{\varepsilon }{e}. \end{aligned}$$

In particular, (3.7) holds if

$$\begin{aligned} \sqrt{|\log (Q(0))|}\geqq \frac{C_d}{\varepsilon } \int _0^tA(s)\,\mathrm {d}s+\sqrt{\left| \log \left( \frac{\varepsilon }{e}\right) \right| }. \end{aligned}$$
(3.8)

We now compare the quantity Q to the Wasserstein distance. First of all, since \(Q(t)\leqq \frac{\varepsilon }{e}\), \(\varepsilon ^{-2}|\log (Q(t))| \geqq 1\), therefore

$$\begin{aligned}&\frac{1}{2} W_2(f_1(t),f_2(t))^2\\&\quad \leqq \frac{1}{2} \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ |X_1(t,x,v) -X_2(t,y,w)|^2+|V_1(t,x,v)-V_2(t,y,w)|^2\right] \\&\qquad \mathrm {d}\pi _0(x,v,y,w) \\&\leqq Q(t). \end{aligned}$$

On the other hand, since \(\varepsilon ^{-2}|\log (Q(0))| \geqq 1\) and \(\pi _0\) is an optimal plan,

$$\begin{aligned} Q(0)\leqq & {} \frac{1}{2}\varepsilon ^{-2}|\log (Q(0))| \int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} \left[ |x-y|^2+|v-w|^2\right] \mathrm {d}\pi _0(x,v,y,w)\\= & {} \frac{1}{2}\varepsilon ^{-2}|\log (Q(0))|W_2(f_1(0),f_2(0))^2, \end{aligned}$$

or equivalently

$$\begin{aligned} \frac{Q(0)}{|\log (Q(0))|}\leqq \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2, \end{aligned}$$

We now observe that, near the origin, the inverse of the function \(s\mapsto \frac{s}{|\log s|}\) behaves like \(\tau \mapsto \tau |\log \tau |\). In particular, there exists a universal small constant \(c_0>0\) such that

$$\begin{aligned} \frac{s}{|\log s|}\leqq \tau \qquad \text {for some }0 \leqq \tau \leqq c_0 \qquad \Rightarrow \qquad s \leqq 2 \tau |\log \tau |. \end{aligned}$$

Hence, if \( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \leqq c_0\), we deduce that

$$\begin{aligned} Q(0)\leqq \varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \left| \log \left( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2\right) \right| . \end{aligned}$$

Combining these bounds with (3.7), and recalling (3.8), this implies

$$\begin{aligned}&W_2(f_1(t),f_2(t))^2 \\&\qquad \leqq 2 e^{-\left( \sqrt{\left| \log \left( \varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \left| \log \left( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2\right) \right| \right) \right| } - \frac{C_d}{\varepsilon }\int _0^tA(s)\,\mathrm {d}s\right) ^2} \end{aligned}$$

provided \( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \leqq c_0\) and

$$\begin{aligned}&\sqrt{\left| \log \left( \varepsilon ^{-2}W_2(f_1(0),f_2(0))^2 \left| \log \left( \frac{1}{2}\varepsilon ^{-2}W_2(f_1(0),f_2(0))^2\right) \right| \right) \right| } \\&\qquad \geqq \frac{C_d}{\varepsilon }\int _0^tA(s)\,\mathrm {d}s+ \sqrt{\left| \log \left( \frac{\varepsilon }{e}\right) \right| }. \end{aligned}$$

Finally, to complete the proof, we show the following:

Lemma 3.7

With the notation and assumptions of the theorem, the quantity Q(t) is well defined and it is locally Lipschitz continuous where \(Q(t)>0\). In particular it is differentiable a.e.

Proof

Set

$$\begin{aligned} D(t):= & {} \frac{1}{2}\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2} |X_1(t,x,v)-X_2(t,y,w)|^2 \mathrm {d}\pi _0(x,v,y,w),\\ E(t):= & {} \frac{1}{2}\int _{(\mathbb {T}^d\times \mathbb {R}^d)^2}|V_1(t,x,v)-V_2(t,y,w)|^2 \mathrm {d}\pi _0(x,v,y,w). \end{aligned}$$

We can assume that D(t) and E(t) are nonzero, otherwise we are in the “degenerate” situation where \(f_1\equiv f_2\), in which case Q(t) is trivially 0. Also, since D(t) and E(t) are written in terms of the characteristic flow, it is standard to check that they are differentiable.Footnote 5

We note that the quantity Q(t) is implicitly defined via the relation

$$\begin{aligned} Q(t)= \varepsilon ^{-2}|\log Q(t)| D(t)+E(t), \end{aligned}$$
(3.9)

or equivalently, for each fixed time t, Q(t) is the solution of the equation

$$\begin{aligned} F(q, D(t), E(t))=0 \qquad \text {with } F(q, r , s) :=q+\varepsilon ^{-2} \log q\,r-s\qquad \text {for } q \in (0,1). \end{aligned}$$

Since the function \(q\mapsto q+\varepsilon ^{-2}\log q\, D(t)\) is strictly increasing on (0, 1) and its image covers the interval (0, 1), we deduce that the equation above has a unique solution provided \(E(t)< 1\). Hence, this proves that \(Q(t) \in (0,1)\) is well defined provided \(E(t)< 1\). In addition, thanks to the implicit function theorem applied to the function \(F\in C^1_{\mathrm{loc}}((0,1)\times {{\mathbb {R}}}\times {{\mathbb {R}}})\), we deduce the existence of a \(C^1_{\mathrm{loc}}\) function G such that \(Q(t)=G(D(t),E(t))\). Now, differentiating the relation (3.9) with respect to t we obtain

$$\begin{aligned} Q'(t)\biggl (1+\varepsilon ^{-2}\frac{D(t)}{Q(t)}\biggr ) =\varepsilon ^{-2}|\log Q(t)| D'(t)+E'(t). \end{aligned}$$

Hence, since D and E are uniformly bounded and Lipschitz, for any \(\delta >0\) we deduce that

$$\begin{aligned} |Q'(t)| \leqq \frac{\varepsilon ^{-2}|\log Q(t)|\,| D'(t)|+|E'(t)|}{1+\varepsilon ^{-2}\frac{D(t)}{Q(t)}} \leqq C_\delta \qquad \text {where }Q(t)>\delta . \end{aligned}$$

This proves that, for any \(\delta >0\), the function \(t\mapsto Q(t)\) is uniformly Lipschitz continuous inside the set \(\{Q(t)>\delta \}\). This proves that Q(t) is locally Lipschitz continuous inside the region \(\{Q(t)>0\}\).

So, to conclude the proof, we need to ensure that \(E(t)< 1\). Note that, since by assumption \(E(0)\leqq \frac{1}{2} W_2(f_1(0),f_2(0))^2 \ll 1\), by continuity we have that \(E(t)<1\) for \(t>0\) small. So Q(t) is well defined for \(t>0\) small. Also, as long as Q(t) is well defined, we have that \(Q(t)\geqq E(t)\). Hence, as long as Q(t) is well defined, we have that

$$\begin{aligned} E(t)\leqq Q(t) \leqq e^{-\left( \sqrt{|\log (Q(0))|} - \frac{C_d}{\varepsilon }\int _0^tA(s)\,\mathrm {d}s\right) ^2}. \end{aligned}$$

Since, by our smallness assumption on \(W_2(f_1(0),f_2(0))\), the right hand side above remain small on [0, T], the bound above guarantees that \(E(t)\ll 1\) for all \(t \in [0,T]\). This proves that Q(t) is well-defined on [0, T], which concludes the proof. \(\quad \square \)

Remark 3.8

In the previous proof we considered \(\lambda (t)=\varepsilon ^{-2}|\log (Q(t))|\) and in Lemma 3.7 we proved that Q(t) is well-defined provided it is small enough. This restriction is due to the fact that the function \({{\mathbb {R}}}^+ \ni s\mapsto \varepsilon ^{-2}|\log s|\) is decreasing only for \(s\in (0,1).\) An alternative choice could have been to define

$$\begin{aligned} \Phi (s)=\left\{ \begin{array}{ll} |\log s|&{} \text{ for }\, s\in (0,1/e]\\ e^{-1}s^{-1}&{} \text{ for }\, s>1/e, \end{array} \right. \end{aligned}$$

and \(\lambda (t):= \varepsilon ^{-2}\Phi (Q(t)).\) With this choice, since \({{\mathbb {R}}}^+ \ni s\mapsto \Phi (s)\) is decreasing and of class \(C^1\), one can define Q(t) as the unique solution of

$$\begin{aligned} F(q, D(t), E(t))=0 \qquad \text {with } F(q, r , s):=q+ \varepsilon ^{-2}\Phi (q)\,r-s\qquad \text {for } q \in (0,\infty ). \end{aligned}$$

With this definition, the proof of Lemma 3.7 shows that Q(t) is always well defined (without any restriction on the size of E(t)), and it is locally Lipschitz continuous where \(Q(t)>0\).

Since in our setting we are interested in the case \(Q(t)\ll 1,\) there is no advantage in using this latter definition of \(\lambda .\) However this observation could be useful in other situations, see also Section 4 below.

4 Summary, Generalizations, and Perspectives

As we have seen in the last two sections, suitably modifying Wasserstein distances can be particularly useful in a kinetic setting to take advantage of the asymmetry between x and v. More precisely, let \(\mathcal {X}={\mathbb {T}}^d\) or \(\mathcal {X}=\mathbb R^d\), and let \(\mu \) and \(\nu \) be two probability measures on \(\mathcal {X}\times \mathbb {R}^d\). Also, let \(\Pi (\mu ,\nu )\) denote the collection of all measures on \((\mathcal {X} \times \mathbb {R}^d)^2\) with marginals \(\mu \) and \(\nu \) on the first and second factors respectively.

The first natural generalization, given \(p\geqq 1\) and \(\lambda \in \mathbb {R}^+\), is to consider

$$\begin{aligned}&KW_{\lambda ,\, p} (\mu , \nu )\\&\quad :=\left( \inf _{\pi \in \Pi (\mu ,\nu )} \int _{(\mathcal {X} \times \mathbb {R}^d)^2} \left( \lambda |x-y|^{p} +|v-w|^p\right) \mathrm {d} \pi (x,v,y,w) \right) ^{1/p}, \end{aligned}$$

as done in [28, 30, 43].

An alternative way, introduced in [56] for \(p=2\), would be to consider three parameters \(a,b,c>0\) such that \(\sqrt{ac}>b\) and define

$$\begin{aligned}&KW_{a,b,c,\, p} (\mu , \nu )\\&\quad :=\left( \inf _{\pi \in \Pi (\mu ,\nu )} \int _{(\mathcal {X} \times \mathbb {R}^d)^2} \left( a |x-y|^{2} +2b(x-y)\cdot (v-w)+c|v-w|^2\right) ^{p/2}\right. \\&\left. \quad \mathrm {d} \pi (x,v,y,w) \right) ^{1/p}, \end{aligned}$$

In this paper, we have introduced two different generalizations.

  1. (i)

    First, we considered the nonlinear version of the \(KW_{\lambda ,\, p} (\mu , \nu )\) by choosing \(\lambda \) depending on the distance itself. We defined this along a flow, but that can be also be defined in a general setting as follows: given \(p\geqq 1\) and a decreasing function \(\Phi :{{\mathbb {R}}}^+\rightarrow {{\mathbb {R}}}^+\), for every \(\pi \in \Pi (\mu ,\nu )\) and \(\lambda \) we define \(D_p(\pi ,\Phi )\) as the unique number s such that

    $$\begin{aligned}&s- \Phi (s)\int _{(\mathcal {X} \times \mathbb {R}^d)^2} |x-y|^{p}\mathrm {d} \pi (x,v,y,w) \\&\quad = \int _{(\mathcal {X} \times \mathbb {R}^d)^2} |v-w|^p\mathrm {d} \pi (x,v,y,w) \end{aligned}$$

    (arguing as in the proof of Lemma 3.7 it is easy to check that \(D(\pi ,\Phi )\) is well defined, see also Remark 3.8). Then, we set

    $$\begin{aligned} KW_{\Phi ,p}(\mu ,\nu ):=\left( \inf _{\pi \in \Pi (\mu ,\nu )} D_p(\pi ,\Phi ) \right) ^{1/p}. \end{aligned}$$

    This definition with \(\Phi (s)=\varepsilon ^{-2}|\log s|\) for \(s \in (0,1/e)\) and \(p=2\) essentially corresponds to the quantity used in the proof of Theorem 3.1, although there we considered the quantity D(t) where we did not take the infimum over couplings \(\pi \in \Pi (\mu ,\nu ),\) since it was not needed for our purpose.

  2. (ii)

    In a different direction, we modified the \(W_1\) distance by introducing a shift in position. Note that this second quantity cannot be defined as a “static” distance since the shift \(x -tv\) depends on the time t. Hence, one can generalize it only as a time dependent quantity as follows:

    $$\begin{aligned}&{\widetilde{KW}}_{t,\, p} (\mu , \nu )\\&\quad :=\left( \inf _{\pi \in \Pi (\mu ,\nu )} \int _{(\mathcal {X} \times \mathbb {R}^d)^2} \big (|(x-tv)-(y-tw)|^{p} +|v-w|^p\big )\right. \\&\left. \mathrm {d} \pi (x,v,y,w) \right) ^{1/p}. \end{aligned}$$

Of course, these approaches can be further combined by mixing the different quantities defined above. Note that there is no universal “best” choice, and each problem requires its adaptation. Still, we believe, as this paper shows, that this approach can lead to an improvement to several existing results, as well as to prove new estimates. In addition, the approach is very general and can be useful in any situation where there is an asymmetry between the variables involved.

To mention some concrete applications, our ideas could also be applied in the setting of quantum systems by suitably modifying the quantum Wasserstein distances introduced in [21, 24]. Also, our new Loeper-type estimate may be helpful to obtain stability estimates in \(W_2\) when the density belongs to some suitable Orlicz spaces, in analogy to [38] where stability estimates have been proved for \(W_1\).