Abstract
In this paper we study theoretical properties of the entropy-transport functional with repulsive cost functions. We provide sufficient conditions for the existence of a minimizer in a class of metric spaces and prove the \(\Gamma \)-convergence of the entropy-transport functional to a multi-marginal optimal transport problem with a repulsive cost. We point out that our construction can deal with the case when the space X is a domain in \({\mathbb {R}}^d\), answering a question raised in Benamou et al. (Numer Math 142:33–54, 2019). Finally, we also prove the entropy-regularized version of the Kantorovich duality.
1 Introduction
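We consider the following multi-marginal entropy-transport problem
\[ I_{\varepsilon }[\rho ] = \inf _{\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )} \left\{ C_0[\gamma ] + \varepsilon E[\gamma ] \right\} , \tag{1.1} \]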
where \(C_0[\gamma ] = \int _{X^N} c\,{\mathrm d}\gamma \) is the transportation cost related to a cost function c, \(E[\gamma ]\) is the entropy, and \(\varepsilon \ge 0\) is a parameter, see Sect. 2 for details. We consider the setting where \((X,d,{\mathfrak {m}})\) is a Polish measure space, and \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) is an absolutely continuous probability measure with respect to the reference measure \({\mathfrak {m}}\). An element \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) is called a symmetric coupling (or transport plan), that is, a symmetric probability measure in \(X^N\) having all marginals equal to \(\rho {\mathfrak {m}}\).
We are interested in a class of repulsive cost functions \(c:X^N\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) of the form
\[ c(x_1,\ldots ,x_N) = \sum _{1\le i<j\le N} f(d(x_i,x_j)). \]
We assume \(f:]0,\infty [\rightarrow {\mathbb {R}}\) to be a continuous and decreasing function that approaches \(+\infty \) if \(d(x_i,x_j)\rightarrow 0\). Among the examples of such cost functions we have the Coulomb cost \(f(z) = 1/\vert z\vert \), the Riesz cost \(f(z) = 1/\vert z\vert ^s, n \ge s\ge \max \lbrace n-2, 0\rbrace \) (in \({\mathbb {R}}^n\)) and the logarithmic cost \(f(z) = -\log (\vert z\vert )\). We observe that when \(\varepsilon =0\), this entropy-transport problem reduces to the classical multi-marginal optimal transport problem with repulsive costs [4, 6, 7, 12].
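As a small illustration of this class of costs, the following sketch evaluates a pairwise repulsive cost for the three profiles mentioned above. It assumes the pairwise form \(c(x_1,\ldots ,x_N)=\sum _{i<j} f(d(x_i,x_j))\) on the real line with \(d(x,y)=\vert x-y\vert \); the function names (`coulomb`, `riesz`, `log_cost`, `repulsive_cost`) are hypothetical and chosen only for this example.

```python
import itertools
import math


def coulomb(z):
    """Coulomb profile f(z) = 1/z, set to +inf at z = 0."""
    return math.inf if z == 0 else 1.0 / z


def riesz(z, s):
    """Riesz profile f(z) = 1/z^s, set to +inf at z = 0."""
    return math.inf if z == 0 else z ** (-s)


def log_cost(z):
    """Logarithmic profile f(z) = -log(z), set to +inf at z = 0."""
    return math.inf if z == 0 else -math.log(z)


def repulsive_cost(points, f, dist=lambda a, b: abs(a - b)):
    """Sum of f(d(x_i, x_j)) over all unordered pairs i < j (assumed form)."""
    return sum(f(dist(a, b)) for a, b in itertools.combinations(points, 2))


if __name__ == "__main__":
    pts = [0.0, 1.0, 3.0]
    print(repulsive_cost(pts, coulomb))
    print(repulsive_cost(pts, lambda z: riesz(z, 0.5)))
    print(repulsive_cost(pts, log_cost))
```

Configurations in which two points coincide receive cost \(+\infty \), which is the repulsive behaviour described above.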
The motivation of this paper comes from both theory and numerics. For repulsive cost functions, the entropy term in (1.1) plays the role of a regularizer for computing numerically a solution \(\gamma \) of the multi-marginal optimal transport problem \(I_{0}[\rho ]\), see [2]. Numerical experiments suggest that when the regularization parameter \(\varepsilon \) goes to 0, the minimizer \(\gamma _{\varepsilon }\) converges to a minimizer of \(I_{0}[\rho ]\) having minimal entropy among the minimizers of \(I_{0}[\rho ]\).
From a theoretical viewpoint, this type of functional has direct relevance in Density Functional Theory. By choosing the parameter \(\varepsilon \) carefully, the functional (1.1) provides a lower bound for the Hohenberg–Kohn functional in Density Functional Theory [15, 24, 27]. This is an immediate consequence of the Log-Sobolev Inequality.
The entropy-transport problem has appeared previously in the literature in the attractive case, in particular when \(c(x_1,x_2) = d(x_1,x_2)^2\). We mention briefly below some of the connections of the entropy-transport with other fields and point out the relevance in the Coulomb case.
Brief comments on some applications of the entropy-transport
Optimal transport and Sinkhorn algorithm: The entropy-transport (1.1) was introduced by Cuturi [9] in order to compute numerically the optimal transport plan for the distance squared cost in the 2-marginals case via the Sinkhorn algorithm. Due to its reasonable computational cost, it has been applied to a wide range of problems in various research areas, including Information Theory, Computer Graphics, Statistical Inference, Machine Learning, and Mean-Field Games. The entropic regularization method was also considered in the (attractive) multi-marginal case in the so-called barycenter problem introduced by Agueh and Carlier [1] (see also [5, 11]) and in numerical methods in the time discretization of Brenier’s relaxed formulation of the incompressible Euler equation [3]. For a thorough presentation of the computational aspects we refer to Cuturi and Peyré’s book [25].
Second-order calculus on RCD spaces: Gigli and Tamanini [8, 17] studied the entropic-transport problem on a class of metric spaces with (Riemannian) Ricci curvature bounded from below (2-marginals case, \(c(x_1,x_2) = d(x_1,x_2)^2\)). The entropic regularization procedure was crucial for establishing a second-order differential structure in that setting.
Schrödinger problem: In 1926, E. Schrödinger introduced the (linear) Schrödinger equations describing the non-relativistic evolution of a single particle in an electric field with potential energy and also established an equivalence between such equations and a system of diffusion equations [26]. Roughly speaking, the variational problem (see (1.1) with \(X = C([0,1],{\mathbb {R}}^d)\) and \(N=2\)) arises in Schrödinger’s manuscript while studying the limit \(k\rightarrow \infty \) \((N=2)\) of the empirical measures associated to the evolution of k i.i.d. Brownian motions. We refer the reader to Léonard’s survey [21] for technical details and historical notes.
Lower bound on the Hohenberg–Kohn functional in density functional theory: This is the particular case where the entropy-transport problem with Coulomb cost comes into play. It has been shown in [24, 27] that the functional (1.1) provides a lower bound for computing the ground state energy of the Hohenberg–Kohn functional [4, 6, 7, 12, 22]. Below we give a brief description of the result. Notice that in this context \(X = {\mathbb {R}}^d\) and \({\mathfrak {m}}\) is the Lebesgue measure on \({\mathbb {R}}^d\).
Assume that \(\gamma \in \Pi _N(\rho )\) is such that \(\sqrt{\gamma } \in H^1({\mathbb {R}}^{dN})\). This is the case, for example, when \(\gamma (x_1,\ldots ,x_N) = \vert \psi (x_1,\ldots ,x_N)\vert ^2\), where \(\psi \in H^1({\mathbb {R}}^{dN})\) is a ground-state wave function solving the N-electron Schrödinger Equation (see [6, 7, 12, 15, 27] for details). Then, we can define the Hohenberg–Kohn functional by
Now, as a consequence of the logarithmic Sobolev inequality for the Lebesgue measure [18], the following result holds: if \(\rho {\mathcal {L}}^d\in {\mathcal {P}}({\mathbb {R}}^d)\) and \(\sqrt{\gamma }\in H^1({\mathbb {R}}^{dN})\) then
1.1 Examples of optimal entropy couplings
Let us present some computational examples of minimizers of \(I_\varepsilon [\rho ]\) illustrating the role of the parameter \(\varepsilon \). Before this, we recall a result on the characterization of minimizers in the one-dimensional case [10]. In particular, according to it the minimizer of \(I_0[\rho ]\) is concentrated on finitely many graphs and thus singular with respect to the product reference measure.
Theorem 1.1
[10] Let \(\mu \in {\mathcal {P}}({\mathbb {R}})\) be an absolutely continuous probability measure and \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) a strictly convex, bounded from below and non-increasing function. Then there exists a unique optimal symmetric plan \(\gamma \in {\Pi _N^{\mathrm {sym}} (\mu )}\) that solves
Moreover, this plan is induced by an optimal cyclical map T, that is, \(\gamma _{\mathrm {sym}}=\left( \gamma _T\right) ^S\), where \(\gamma _T=(\mathrm {id},T,T^{(2)} , \ldots , T^{(N-1)})_{\sharp } \mu \). An explicit optimal cyclical map is
Here \(F_{\mu }(x)=\mu (( -\infty , x])\) is the distribution function of \(\mu \), and \(F_{\mu }^{-1}\) is its lower semicontinuous left inverse.
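To make the role of \(F_{\mu }\) and \(F_{\mu }^{-1}\) concrete, here is a minimal sketch of a cyclical map for a Gaussian \(\mu \). It assumes the form usually stated for this result in [10], namely \(T(x) = F_{\mu }^{-1}(F_{\mu }(x)+1/N)\) when \(F_{\mu }(x)+1/N<1\) and \(T(x) = F_{\mu }^{-1}(F_{\mu }(x)+1/N-1)\) otherwise; this formula is an assumption of the sketch and should be checked against [10]. The name `cyclical_map` is hypothetical.

```python
from statistics import NormalDist


def cyclical_map(x, dist, N):
    """Shift the quantile level F(x) by 1/N, wrapping around at 1.

    Assumed form of the map T; `dist` is any object exposing cdf() and
    inv_cdf(), here a statistics.NormalDist. The single level where the
    shifted value hits exactly 0 or 1 (a set of mu-measure zero) is not handled.
    """
    shifted = dist.cdf(x) + 1.0 / N
    if shifted >= 1.0:
        shifted -= 1.0
    return dist.inv_cdf(shifted)


if __name__ == "__main__":
    mu = NormalDist(mu=0.0, sigma=5.0)
    N = 3
    x = 1.0
    orbit = [x]
    for _ in range(N):
        orbit.append(cyclical_map(orbit[-1], mu, N))
    # After N applications the orbit returns (numerically) to its start.
    print(orbit)
```

Under this assumption, applying the map N times shifts the quantile level by a full unit, so the orbit closes up, which is what "cyclical" refers to.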
1.1.1 One-dimensional entropic-transport with Coulomb cost and a Gaussian measure
Let \(\rho \) be the normal distribution on the real line with zero mean and standard deviation \(\sigma = 5\). We compute numerically the solution of the entropic-transport problem with Coulomb cost on the real line using the Sinkhorn algorithm [9]. Notice that by Theorem 1.1, we know that the minimizer of \(I_0[\rho ]\) is concentrated on a graph. See Fig. 1 for an illustration of the computational results. Our code is based on the Python implementation available in the POT library [14].
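A computation of this kind can be sketched as follows for \(N=2\). This is not the code used for the figure: it is a plain NumPy version of the Sinkhorn iteration written for illustration, and the grid size, truncation of the Gaussian to \([-4\sigma ,4\sigma ]\), value of \(\varepsilon \), and number of iterations are all assumptions. The function name `sinkhorn_coulomb_1d` is hypothetical.

```python
import numpy as np


def sinkhorn_coulomb_1d(n=100, sigma=5.0, eps=0.5, iters=1000):
    """Two-marginal Sinkhorn iteration for the Coulomb cost c(x, y) = 1/|x - y|.

    Both marginals are a discretized centred Gaussian with standard deviation
    sigma, truncated to [-4*sigma, 4*sigma] (an assumption of this sketch).
    """
    x = np.linspace(-4.0 * sigma, 4.0 * sigma, n)
    rho = np.exp(-x**2 / (2.0 * sigma**2))
    rho /= rho.sum()

    dist = np.abs(x[:, None] - x[None, :])
    # Gibbs kernel exp(-c/eps); on the diagonal c = +inf, so the kernel is 0.
    with np.errstate(divide="ignore"):
        K = np.exp(-1.0 / (eps * dist))

    a = np.ones(n)
    b = np.ones(n)
    for _ in range(iters):
        a = rho / (K @ b)      # match the first marginal
        b = rho / (K.T @ a)    # match the second marginal
    gamma = a[:, None] * K * b[None, :]
    return x, rho, gamma


if __name__ == "__main__":
    x, rho, gamma = sinkhorn_coulomb_1d()
    print("max marginal error:", np.abs(gamma.sum(axis=1) - rho).max())
```

Decreasing `eps` in this sketch makes the computed plan concentrate further away from the diagonal, while the kernel entries \(e^{-c/\varepsilon }\) become very small, so in practice a log-domain variant may be needed for small \(\varepsilon \).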
1.2 Organization of the paper
In Sect. 2, we introduce the setting and study sufficient conditions for the existence of minimizers for the entropy-transport problem (1.1). Section 3 is devoted to the proof of the \(\Gamma \)-convergence of the entropic-transport functional \(C_{\varepsilon }[\gamma ]\) to the multi-marginal optimal transport with repulsive costs \(C_0[\gamma ]\). In Sect. 4, we study the Kantorovich duality for the entropic-transport problem.
1.3 The strategy of the main proof and some technical remarks
The main result of this paper is Theorem 3.1, in which we prove the \(\Gamma \)-convergence of the entropic-regularized functional \(C_{\varepsilon }[\gamma ]\) to \(C_0[\gamma ]\). The technical difficulty in dealing with the \(\Gamma \)-convergence comes from the fact that while for the entropic part \(E[\gamma ]\) the minimizer \(\gamma \) tends to be as spread out as possible with respect to \({\mathfrak {m}}\), for the cost \(C_{0}[\gamma ]\) a minimizer can be very singular and have infinite entropy.
We divide the proof into two parts. Part (I), the \(\liminf \)-inequality, follows basically from the lower semicontinuity of the costs \(C_0[\gamma ]\) and \(C_{\varepsilon }[\gamma ]\), which is obtained from the assumption \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) on the marginal measure \(\rho {\mathfrak {m}}\), giving a lower bound on the entropy. Part (II), the \(\limsup \)-inequality, is more involved. In Sect. 3.2, we construct a block approximation \(\gamma '_n\) for a coupling \(\gamma \) with \(C_{0}[\gamma ] <+\infty \). The construction is done in several steps, since we need a competitor \(\gamma '_n\) such that \(E[\gamma '_n] <\infty \) and \(\gamma '_n \in \Pi ^{\mathrm {sym}}_N(\rho )\). The main idea and the rigorous construction are given in Sect. 3.2.
Furthermore, we point out that our construction can deal with the case when the space X is a domain in \({\mathbb {R}}^d\), answering a question raised in [3]. There the \(\Gamma \)-convergence was proven using convolutions, an approach that does not seem easy to implement for domains or in general metric spaces.
Related works: A proof of the \(\Gamma \)-convergence of (1.1) to the Monge-Kantorovich problem for \(c(x,y) = d(x,y)^p\) first appeared in [20, 23] via probabilistic methods. In [5], G. Carlier, V. Duval, G. Peyré and B. Schmitzer provided an alternative and more analytical proof carrying out a similar block approximation procedure for the two-marginal squared distance cost in the Euclidean space and the Wasserstein Barycenter.
2 The entropy-regularized repulsive costs
Let (X, d) be a Polish space and \({\mathfrak {m}}\) be a reference measure on X. We denote by \({\mathcal {P}}(X)\) the set of Borel probability measures on X, and \({\mathcal {P}}^{ac}(X)\) the set of Borel probability measures on X that are absolutely continuous with respect to \({\mathfrak {m}}\). We denote by \({\mathfrak {m}}_{N}\) the product measure \({\mathfrak {m}}\otimes {\mathfrak {m}}\otimes \cdots \otimes {\mathfrak {m}}\). This is the reference measure we use on the product space \(X^N\). On \(X^N\) we use the sup-metric, which we denote by \(d_N\).
The class of cost functions \(c:X^N\rightarrow {\mathbb {R}}\cup \{+\infty \}\) of our interest is given by functions of the form
where \(f:[0,\infty [\rightarrow {\mathbb {R}}\cup \{+\infty \}\) satisfies the following conditions
Above and from now on, we denote by \((x_1,\ldots , x_N)\) points in \(X^N\), so \(x_i\in X\) for each i.
We denote by
the set of couplings or transport plans, where \(\mathtt {pr}^i\) is the projection
A measure \(\gamma \in {\mathcal {P}}(X^N)\) is symmetric if
for all permutations \(\overline{\sigma }\) of the N symbols \((x_1,\ldots , x_N)\). We denote by \({\mathcal {P}}^{\mathrm {sym}}(X^N)\) the set of symmetric probability measures in \(X^N\), and by
the set of symmetric couplings of \(\rho \).
Let us also introduce the notation for symmetrising measures. If \(\gamma \) is a Borel measure on \(X^N\), we denote by \(\gamma ^S\) the symmetrized measure
where \({\mathcal {S}}_N\) is the set of permutations of the N coordinates \((x_1,\ldots , x_N)\).
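In a discrete setting the symmetrization can be sketched as below. The sketch assumes that \(\gamma ^S\) is the average of the push-forwards of \(\gamma \) under all coordinate permutations, that is \(\gamma ^S = \frac{1}{N!}\sum _{\sigma \in {\mathcal {S}}_N}\sigma _{\sharp }\gamma \), and represents a measure on \(X^N\) with finite X as an N-dimensional array. The names `symmetrize` and `marginals` are hypothetical.

```python
import itertools

import numpy as np


def symmetrize(gamma):
    """Average an N-dimensional array over all permutations of its axes."""
    perms = list(itertools.permutations(range(gamma.ndim)))
    return sum(np.transpose(gamma, p) for p in perms) / len(perms)


def marginals(gamma):
    """Return the N one-dimensional marginals of an N-dimensional array."""
    N = gamma.ndim
    return [
        gamma.sum(axis=tuple(j for j in range(N) if j != i)) for i in range(N)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gamma = rng.random((4, 4, 4))
    gamma /= gamma.sum()
    gamma_s = symmetrize(gamma)
    # All marginals of the symmetrized measure coincide.
    for m in marginals(gamma_s):
        print(np.round(m, 4))
```

If all marginals of \(\gamma \) already equal \(\rho \), the same holds for the symmetrized array, since each marginal of the average is an average of marginals of \(\gamma \).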
We define the functional \(C_0[\gamma ]\) to be the cost related to the coupling \(\gamma \)
Because of the symmetry of the cost c, we immediately have
Proposition 2.1
For every \(\rho \in {\mathcal {P}}(X)\), we have that
Moreover, if the infimum is attained on one side of the above equality, then it is attained on both sides.
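Given \(\varepsilon \ge 0\), we denote by \(C_\varepsilon [\gamma ]\) the entropy-regularized cost
\[ C_\varepsilon [\gamma ] = C_0[\gamma ] + \varepsilon E[\gamma ], \]
where the entropy \(E[\gamma ]:{\mathcal {P}}(X^N)\rightarrow {\mathbb {R}}\cup \lbrace -\infty ,+\infty \rbrace \) is defined as
\[ E[\gamma ] = {\left\{ \begin{array}{ll} \int _{X^N} \rho _\gamma \log \rho _\gamma \,{\mathrm d}{\mathfrak {m}}_{N}, &{} \text {if } \gamma \ll {\mathfrak {m}}_{N},\\ +\infty , &{} \text {otherwise.} \end{array}\right. } \]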
The notation \(\rho _\gamma \) stands for the Radon-Nikodym derivative of \(\gamma \) with respect to the reference measure \({\mathfrak {m}}_{N}\) and \(\gamma \ll {\mathfrak {m}}_{N}\) means that \(\gamma \) is absolutely continuous with respect to the reference measure \({\mathfrak {m}}_{N}\). Let \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\). In this paper we are interested in the following infimum
In order to guarantee the lower semicontinuity for \(C_\varepsilon [\cdot ]\), we will assume \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\). This will take care of the entropy part \(E[\cdot ]\) of the cost. In order to establish the lower semicontinuity for the functional \(C_0[\cdot ]\), we assume that the measure \(\rho \) satisfies the following two conditions:
Above we have, by an abuse of notation, denoted the measure \(\rho {\mathfrak {m}}\) by only the density \(\rho \); we will use the same abbreviation in the rest of the paper if there is no risk of confusion. The Condition (B) is a similar assumption to requiring, in the case of the quadratic cost, that the marginal measures have finite second moments. The Condition (A) guarantees that the cost is finite.
If we endow the spaces \({\mathcal {P}}(X^N)\) and \({\mathcal {P}}(X)\) with \(w^*\)-topology then, by Prokhorov’s theorem, any subset of \({\mathcal {P}}(X)\) (or \({\mathcal {P}}(X^N)\)) is tight if and only if it is relatively compact.
Remark 2.2
(Entropy-transport seen as a Kullback–Leibler divergence) If \(\mu \) and \(\nu \) are measures on a set X, the Kullback–Leibler divergence of \(\mu \) with respect to \(\nu \) is defined as
Now, if both measures \(\mu \) and \(\nu \) are absolutely continuous with respect to some reference measure R of the space X with densities \(\rho _\mu \) and \(\rho _\nu \), respectively, we can write:
Considering the entropy-regularized MOT problem, we see that the cost functional \(C_{\varepsilon }[\gamma ]\) can be alternatively written as the Kullback–Leibler divergence between \(\gamma \) and a kernel \(\kappa \) defined below
where \(\kappa = e^{-c/\varepsilon }{\mathfrak {m}}_N\).
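The identity behind this remark is a one-line computation. The following derivation is a sketch under two assumptions: the divergence is taken in the form \(\int \log ({\mathrm d}\mu /{\mathrm d}\nu )\,{\mathrm d}\mu \) (without mass-correction terms), and \(\gamma \ll {\mathfrak {m}}_N\) with density \(\rho _\gamma \) and \(C_0[\gamma ]<\infty \), so that \(\gamma \) is concentrated on the set where the density \(e^{-c/\varepsilon }\) of \(\kappa \) is positive.

```latex
\[
\begin{aligned}
\varepsilon \int_{X^N} \log\Big(\frac{\mathrm{d}\gamma}{\mathrm{d}\kappa}\Big)\,\mathrm{d}\gamma
&= \varepsilon \int_{X^N} \log\big(\rho_\gamma\, e^{c/\varepsilon}\big)\,\mathrm{d}\gamma \\
&= \varepsilon \int_{X^N} \rho_\gamma \log \rho_\gamma \,\mathrm{d}\mathfrak{m}_N
   + \int_{X^N} c \,\mathrm{d}\gamma \\
&= \varepsilon E[\gamma] + C_0[\gamma] = C_\varepsilon[\gamma].
\end{aligned}
\]
```

Minimizing \(C_\varepsilon \) over couplings is therefore, up to the factor \(\varepsilon \), a projection of the kernel \(\kappa \) onto the set of couplings in the Kullback–Leibler sense.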
For the most part, in this paper we have chosen to consider as a reference measure the measure \({\mathfrak {m}}_N\). However, as the following lemma shows, we could also assume the reference measure to be \((\rho {\mathfrak {m}})^{\otimes N}\) since the minimizers of the entropy-regularized MOT problem (2.4) do not depend on the choice of the reference measure, at least if there exists a minimizer with finite cost. To state the lemma, let us introduce the notation of relative entropy: for each reference measure R of a Polish space Y, and for each \(\gamma \in {\mathcal {P}}(Y)\), we denote by \(E[\gamma \,|\,R]\) the relative entropy of \(\gamma \) with respect to R, defined as
Now we may consider two, a priori different, entropy-regularized MOT problems: the one introduced in (2.4)
and the problem with the reference measure chosen to be \((\rho {\mathfrak {m}})^{\otimes N}\)
The following Lemma 2.3 is used only to go from the compact to the general case in the duality Theorem 4.2. The proof in [11] can be directly applied here to prove Lemma 2.3.
Lemma 2.3
Let \((X,d,{\mathfrak {m}})\) be a Polish measure space, \(\rho {\mathfrak {m}}\in {\mathcal {P}}(X)\) a measure satisfying (A) and (B), and c a cost function satisfying (F1) and (F2). Then for all \(\varepsilon > 0\) we have
Moreover, whenever at least one side of the equality above is finite, the problems (2.5) and (2.6) have the same minimizers.
2.1 Some properties of the entropy functional
Let us start by noting that the minimum of the entropy is attained by the product measure and that its value is not \(-\infty \).
Proposition 2.4
Let \((X,d, {\mathfrak {m}})\) be a Polish metric measure space, and let \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) with \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) . Then
Proof
As we will see, the minimality is an immediate consequence of Jensen’s inequality. Let \(\gamma \in \Pi _N(\rho )\). Then
\(\square \)
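The minimality of the product measure can be checked numerically on a finite space. The sketch below takes \(N=2\), the counting measure as reference measure, and a one-parameter family of couplings \(t\,\rho \otimes \rho + (1-t)\,\mathrm {diag}(\rho )\), all of which have both marginals equal to \(\rho \); these choices and the names `entropy` and `mixed_coupling` are assumptions made only for the illustration.

```python
import numpy as np


def entropy(gamma):
    """Entropy sum(gamma * log(gamma)) w.r.t. counting measure, 0 log 0 = 0."""
    g = gamma[gamma > 0]
    return float(np.sum(g * np.log(g)))


rho = np.array([0.1, 0.2, 0.3, 0.4])
product = np.outer(rho, rho)


def mixed_coupling(t):
    """Coupling of rho with itself: mixture of product and diagonal plans."""
    return t * product + (1.0 - t) * np.diag(rho)


if __name__ == "__main__":
    print("product entropy:", entropy(product))
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, entropy(mixed_coupling(t)))
```

In this example the entropy of the product plan equals \(2\sum _i \rho _i\log \rho _i\), and every other coupling in the family has larger entropy, in line with the Jensen argument.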
Using Proposition 2.4 we immediately get the lower semicontinuity of the entropy functional by representing the entropy as relative entropy against the probability measure \(\otimes _{i=1}^N(\rho {\mathfrak {m}})\). See for instance [28, Lemma 4.1] for the lower semicontinuity of the entropy when the reference measure is finite.
Corollary 2.5
Let \((X,d, {\mathfrak {m}})\) be a Polish metric measure space, and let \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) with \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\). Then \(E[\cdot ]\) is lower semicontinuous in the set \(\Pi ^{\mathrm {sym}}_N(\rho )\).
Now we are ready to prove the existence of the minimizers for entropy-regularized MOT:
Proposition 2.6
Let \((X,d,{\mathfrak {m}})\) be a Polish metric measure space. Assume that the measure \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) satisfies \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) along with Conditions (A) and (B). Assume that \(c:X^N\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) satisfies the conditions (F1) and (F2). Then, for each \(\varepsilon \ge 0\), there exists a minimizer \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) for the entropic-regularized cost \(C_{\varepsilon }[\gamma ]\).
Proof
We notice that the set \(\Pi ^{\mathrm {sym}}_N(\rho )\) is compact in the \(w^*\)-topology [19]. The functional E is lower semicontinuous by Corollary 2.5, and in our setting the lower semicontinuity of \(C_0\) is proven as a part of the proof of [16, Proposition 3.1]. Since for each \(\varepsilon \ge 0\) the functional \(C_\varepsilon \) is lower semicontinuous, we conclude that it has a minimizer in the set \(\Pi ^{\mathrm {sym}}_N(\rho )\). \(\square \)
3 The \(\Gamma \)-convergence of entropic-regularized cost
Now let us turn to the \(\Gamma \)-convergence. From now on, \((\tau _n)_{n\in {\mathbb {N}}}\) is any sequence of positive real numbers decreasing to zero. Let us introduce the following functionals: for each \(n\in {\mathbb {N}}\)
and
The goal of this section is to prove that the sequence \(({\mathcal {C}}_n)_{n\in {\mathbb {N}}}\) \(\Gamma \)-converges to \({\mathcal {C}}\) in the space \({\mathcal {P}}^{\mathrm {sym}}(X^N)\).
Theorem 3.1
Let \((X,d,{\mathfrak {m}})\) be a Polish metric measure space. Let \(\rho \in {\mathcal {P}}^{ac}(X)\) with \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) satisfying (A) and (B). Then the sequence \(({\mathcal {C}}_n)\) \(\Gamma \)-converges to \({\mathcal {C}}\) in the space \({\mathcal {P}}^{\mathrm {sym}}(X^N)\).
Let us fix \(\gamma \in {\mathcal {P}}^{\mathrm {sym}}(X^N)\). We need to show that
The proof of Theorem 3.1 is divided into two parts. The proof of the first part, the liminf-inequality (I), is short and is established in the next subsection. The remainder of this section is then divided into subsections in which the second part, the limsup-inequality (II) is proven.
3.1 Proof of condition (I)
We fix a sequence \((\gamma _n)_{n\in {\mathbb {N}}}\) that converges to \(\gamma \). If \(\gamma \notin \Pi _N(\rho )\), then since the set \(\Pi _N(\rho )\) is compact, for large indices we also have \(\gamma _n\notin \Pi _N(\rho )\), so both sides of inequality (I) are \(+\infty \), and we are done. Hence we may assume that \(\gamma \) and \(\gamma _n\)’s are elements of the set \(\Pi _N(\rho )\). Since now \(\gamma _n\in \Pi _N(\rho )\), the claim (I) follows from the lower-semicontinuity of \(\gamma \mapsto \int c\,{\mathrm d}\gamma \) and from the entropy lower bound shown in Proposition 2.4. \(\square \)
3.2 Constructing an approximation of the coupling \(\gamma \)
First of all, we need to construct an approximation of \(\gamma \) only in the case where \(C_0[\gamma ] < \infty \): if this is not the case then any sequence \((\gamma _n)\) converging to \(\gamma \) can be used to prove Condition (II). The idea of the construction is to redefine a large part of \(\gamma \) to be a product measure on finitely many Borel sets with small diameter. In order not to increase the cost by too much, the Borel sets we are using have to be far away from the diagonal compared to the diameter of the sets. We call the part of the measure defined in this way the core part of the approximation. For the rest of the measure, we take another finite combination of product measures. However, this time the sets do not need to have small (or even bounded) diameter, but just small measure. This part will be called the remainder part of the approximation.
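The core idea, replacing a coupling by a product measure on each small cube while keeping the mass of the cube and the marginals, can be sketched in a discrete two-marginal setting as follows. This is only an illustration of the block approximation step: it ignores the removal of a neighbourhood of the diagonal and the treatment of the remainder part described above. The name `block_approximation` and the representation of the partition as a list of index arrays are hypothetical.

```python
import numpy as np


def block_approximation(gamma, blocks):
    """Replace gamma on each cube B_i x B_j by a product measure.

    On every cube the new measure has the same total mass as gamma and is
    proportional to the product of the marginals restricted to B_i and B_j.
    `blocks` is a list of index arrays partitioning range(n).
    """
    mu = gamma.sum(axis=1)   # first marginal
    nu = gamma.sum(axis=0)   # second marginal
    approx = np.zeros_like(gamma, dtype=float)
    for Bi in blocks:
        for Bj in blocks:
            cube = np.ix_(Bi, Bj)
            mass = gamma[cube].sum()
            if mass == 0:
                continue  # nothing to spread on this cube
            prod = np.outer(mu[Bi], nu[Bj])
            approx[cube] = mass * prod / (mu[Bi].sum() * nu[Bj].sum())
    return approx


if __name__ == "__main__":
    n = 6
    gamma = np.fliplr(np.eye(n)) / n          # singular plan on a graph
    blocks = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
    approx = block_approximation(gamma, blocks)
    print(np.round(approx, 4))
```

The approximating plan has the same marginals as the original one and is absolutely continuous with a bounded density on each cube, which is what makes its entropy finite.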
We start the construction by taking out a small part of \(\gamma \) that will later be used to deal with the remainder part of the approximation. For this we take a sequence of radii defined as \(r_n = 1/n\). Since \(C_0[\gamma ] < \infty \), there exists a point \(x =(x_1,\ldots , x_N)\in \mathop {\mathrm{spt}}\nolimits (\gamma )\) with
Moreover, since \(\gamma \in \Pi _N(\rho )\) and \(\rho \) satisfies (A), we have
Thus, using again \(C_0[\gamma ] < \infty \), there exists another point \(x'=(x_{N+1},\ldots , x_{2N}) \in \mathop {\mathrm{spt}}\nolimits (\gamma )\), so that
From now on, we consider \(x,x'\) fixed. Therefore, for \(n \in {\mathbb {N}}\) sufficiently large we have
Let us denote by
the balls around x and \(x'\) with radii \(r_n/10\) in the sup-metric of the product space. So,
if and only if
and analogously for \(B_n'\) with the relevant index modifications.
Let us now define
Observe that \(\gamma _{B_n}\) and \(\gamma _{B_n'}\) are symmetric probability measures. Since the marginals of a symmetric measure are the same, we may denote by \(\rho _{B_n}\) the marginal of \(\gamma _{B_n}\) and similarly by \(\rho _{B_n'}\) the marginal of \(\gamma _{B_n'}\). Let us further denote \({{\tilde{B}}}_n:=\mathop {\mathrm{spt}}\nolimits \gamma _{B_n}\), \({{\tilde{B}}}_n':=\mathop {\mathrm{spt}}\nolimits \gamma _{B_n'}\) and
We then define a measure
The idea behind the measure \(\gamma _{0,n}\) is that we have chopped off a small part of the measure around the points x and \(x'\) (symmetrically) for later use. Since we are working with a singular cost, we still need to take out a small neighbourhood of the diagonals before approximating by product measures. We do this now.
We fix a compact \(K_n\subset X\) such that
and take a small enough \(\delta _n \in (0,r_n)\) so that
denotes the \(\alpha \)-neighbourhood of the pairwise diagonals. Using \(K_n\) and \(\delta _n\) we then define
The measure \(\gamma _{1,n}\) is now the core part of the measure that we approximate. We denote by \(\rho _{1,n}\) the marginals of the symmetric measure \(\gamma _{1,n}\).
Let us then approximate the measure \(\gamma _{1,n}\). We take \(\lambda _n \in (0,\delta _n/n)\) so that
Such \(\lambda _n\) exists by the uniform continuity of f on the compact set \([\delta _n/2,2\mathop {\mathrm{diam}}\nolimits (K_n)]\). Since the set \(K_n\) is compact, we may fix a finite Borel partition \(\{B_n^i\}_{i=1}^{M_n}\) of the set \(\mathop {\mathrm{spt}}\nolimits (\rho _{1,n})\) such that
We are now ready to define the core part approximants \(\gamma _{1,n}^a\) as
Now let us handle the main part of the remainder of the measure, namely the measure
Because \(\gamma _{0,n}\) and the set where we restrict it are symmetric, \(\gamma _{2,n}\) is as well. We may thus denote its marginals by \(\rho _{2,n}\).
In order to determine which part of the remaining marginal measure should be coupled where, we define a partition \(\{A_{i,n}\}_{i=1}^N\) of the space X by setting, for all \(i\in \{1,\ldots , N-1\}\)
and
Condition (3.1) guarantees that the sets \(A_{i,n}\) are pairwise disjoint.
Now we approximate \(\gamma _{2,n}\) by the measure
where for all i the measure \(\eta _{n,i}\) is the product
By the definition of the sets \(A_{i,n}\), for every \((y_1,\ldots ,y_N) \in \mathop {\mathrm{spt}}\nolimits (\gamma _{2,n}^a)\) we have for each \(i \ne j\)
where we have assumed (which we can do without loss of generality) that \(y_j \in {\overline{B}}(x_j,r_n/10)\).
What we have done using the measure \(\gamma _{2,n}^a\) is that we have coupled the marginals of the measure \(\gamma _{2,n}\) with some suitable parts of the marginals of the reserved measure that was taken out around the point x. In this way we have used unevenly the marginals of this reserved part. To handle the rest of the reserved part of the measure around the point x, we now use the reserved measure around the point \(x'\). So, we need to redefine the coupling for the part of the marginal given by
We define it as
where each \(\phi _{n,i}\) is defined as
Since \(\mathop {\mathrm{spt}}\nolimits (\rho _{3,n}) \subset \mathop {\mathrm{spt}}\nolimits (\rho _{B_n})\), we have that for every \((y_1,\ldots ,y_N) \in \mathop {\mathrm{spt}}\nolimits (\gamma _{3,n}^a)\) and each \(i \ne j\)
where \(k(i)\ne k(j)\) are the indices for which \(y_j \in {\overline{B}}(x_{k(j)},r_n/10)\) and \(y_i \in {\overline{B}}(x_{k(i)},r_n/10)\).
What remains is the part of the measure around \(x'\) that was not used for \(\gamma _{3,n}^a\). Since \(\gamma _{3,n}^a\) used the marginals from this part of the reserved measure evenly, we may simply couple the rest by a measure
with b being the correct scaling constant. Similarly as for the previous remainder part, we have that for every \((y_1,\ldots ,y_N) \in \mathop {\mathrm{spt}}\nolimits (\gamma _{4,n}^a)\) and each \(i \ne j\) the inequality (3.9) holds.
Now we are ready to define the full approximation as
By construction \(\gamma _n' \in \Pi _N^\mathrm {sym}(\rho )\).
3.3 Narrow convergence of the approximations
Let us now prove that the sequence \((\gamma _n')_n\) narrowly converges to \(\gamma \). We could argue this by using the Wasserstein distance. However, let us do it here directly using the definition of narrow convergence.
Lemma 3.2
The sequence \((\gamma _n')_n\) narrowly converges to \(\gamma \).
Proof
Let \(\varphi \in C_b(X^N)\) and \(\varepsilon > 0\). We need an index \(N_0\in {\mathbb {N}}\) such that
Let us denote \(M:=\sup _{x\in X^N}|\varphi (x)|\); we may assume that \(M>0\). Since \(\rho \) is inner regular, we can fix a compact set \(K\subset X\) such that
Since \(\gamma \in \Pi _N(\rho )\), we now have
The function \(\varphi \), when restricted to \(K^N\), is uniformly continuous. Hence there exists \(\delta > 0\) so that
Now, let \(N_0 \in {\mathbb {N}}\) be so large that
Let us show that this choice of \(N_0\) satisfies (3.10). First we note that for all \(n\ge N_0\) we have
where in the last inequality we have used the following facts: \(\gamma (X^N)-\gamma _{1,n}(X^N)<3\varepsilon _n\) for all n, and for the remainder part of the measure \(\gamma _n'\) we have
It remains to show that for all \(n\ge N_0\) we have
We first estimate the integrals in the set \(K^N\). Let us fix, for each \((k_1,\ldots , k_N)\in M_n^N\) for which the set
is nonempty, an element
Now we have, for a fixed \((k_1,\ldots , k_N)\), denoting for simplicity
where in a) we have used (3.11), and in b) the fact that the total measures of \(\gamma \) and \(\gamma _a\) coincide on ’cubes’ Q. Summing the estimate above over all cubes \(Q=B_{k_1}\times \cdots \times B_{k_N}\), \((k_1,\ldots , k_N)\in M_n^N\), gives
where in inequality a) we have used the fact that \(\rho (X\setminus K)<\frac{\varepsilon }{12MN}\) and, since the marginals of \(\gamma _{1,n}\) and \(\gamma _{1,n}^a\) are restrictions of \(\rho \), we can bound both \(\gamma _{1,n}(X^N\setminus K^N)\) and \(\gamma _{1,n}^a(X^N\setminus K^N)\) by \(\frac{\varepsilon }{12M}\). For the same reason, we have
Combining estimates (3.14) and (3.15) gives
proving (3.13). \(\square \)
3.4 Convergence of the cost functional
In order to prove the \(\Gamma \)-limsup inequality (II), we need the cost \(C_0[\cdot ]\) to converge along the approximating sequence \(\gamma _n'\). We prove this in the following lemma.
Lemma 3.3
We have \(C_0[\gamma _n'] \rightarrow C_0[\gamma ]\) as \(n \rightarrow \infty \).
Proof
Let us first consider the remainder part. Recall that for all \(n \in {\mathbb {N}}\) we have
Thus, using the lower bounds (3.8) and (3.9) for distances in the support of the remainder part, and the definition (3.2) of \(\varepsilon _n\), we get
as \(n \rightarrow \infty \). By continuity of the integral, we get
as \(n \rightarrow \infty \).
Let us now estimate the core part of the approximation. By the construction (3.7) of \(\gamma _{1,n}^a\) and the choice (3.6) of \(\lambda _n\), we have
Combining (3.16), (3.17) and (3.18) we get
as \(n \rightarrow \infty \). \(\square \)
3.5 Finiteness of the entropy for the approximations
Next we show that the entropy is finite for the approximating sequence. Notice that, in order to prove (II), we do not need a better estimate on the entropy.
Lemma 3.4
For each \(n\in {\mathbb {N}}\) we have \(E[\gamma _n'] < \infty \).
Proof
In order to see the finiteness of the entropy, it suffices to notice that each \(\gamma _n'\) is a sum of finitely many measures \(({\tilde{\gamma }}_{n,k})_{k=1}^{N_n}\) each of which is of the form \({\tilde{\gamma }}_{n,k} = {\tilde{\rho }}_1^k{\mathfrak {m}}\otimes \cdots \otimes {\tilde{\rho }}_N^k{\mathfrak {m}}\) with \({\tilde{\rho }}_i^k \ll \rho \) and \(\frac{{\mathrm d}{\tilde{\rho }}_i^k}{{\mathrm d}\rho } \le 1\). Indeed, by Proposition 2.4, the entropy is always bounded from below, and so we can make a crude estimate:
\(\square \)
3.6 Proof of condition (II)
We are now ready to prove the \(\Gamma \)-\(\limsup \) inequality (II). By Lemma 3.2 we already know that \((\gamma _n')_n\) converges to \(\gamma \). However, \({\mathcal {C}}_n[\gamma _n']\) need not converge to \({\mathcal {C}}[\gamma ]\). This can be remedied by slowing down the convergence of \((\gamma _n')_n\): we repeat each measure sufficiently (but finitely) many times before moving to the next one. We define k(n) for every \(n \in {\mathbb {N}}\) as
By definition, \(1 \le k(n) \le n\). Moreover, since for every \(j\in {\mathbb {N}}\) we have \(E[\gamma _j'] < \infty \) by Lemma 3.4 and \(\tau _n \rightarrow 0\) by definition, we have that \(k(n) \rightarrow \infty \) as \(n\rightarrow \infty \). Thus, defining \(\gamma _n = \gamma _{k(n)}'\), for large enough \(n\in {\mathbb {N}}\) we have
Recalling that by Lemma 3.3 we have \(C_0[\gamma _{k(n)}'] \rightarrow C_0[\gamma ]\), we conclude the proof. \(\square \)
In Proposition 2.6 the existence of a minimizer for the entropy-regularized cost was established. Now that we know that measures \(\gamma \) for which \(C_0[\gamma ]<\infty \) can be approximated by measures with not only finite costs but also finite entropy, we can say more:
Corollary 3.5
Let \((X,d,{\mathfrak {m}})\) be a Polish metric measure space. Assume that \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) satisfies \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) and Conditions (A) and (B). Assume that \(c:X^N\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) satisfies Conditions (F1) and (F2). Then, for each \(\varepsilon > 0\), there exists a unique minimizer \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) for the entropic-regularized cost \(C_{\varepsilon }[\gamma ]\).
Proof
Our marginal measure satisfies Conditions (A) and (B), so there exists a measure \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) that minimizes \(C_0\) with \(C_0[\gamma ]<\infty \). Note that this measure may have infinite entropy. However, by the approximation result presented in the proof of Condition (II) above, there exists a measure \(\gamma '\in \Pi ^{\mathrm {sym}}_N(\rho )\) such that \(C_\varepsilon [\gamma ']<\infty \). Hence \(C_\varepsilon \) is not identically \(+\infty \) on \(\Pi ^{\mathrm {sym}}_N(\rho )\), and the existence of a minimizer follows as in Proposition 2.6. Uniqueness follows since the functional \(\gamma \mapsto C_\varepsilon [\gamma ]\) is strictly convex for \(\varepsilon >0\). \(\square \)
4 Entropic-Kantorovich duality for Coulomb-type costs
We start by recalling the classical Fenchel–Rockafellar Theorem. We refer to the book of I. Ekeland and R. Témam [13, Theorem 4.2] for a more complete presentation and references.
Theorem 4.1
(Fenchel–Rockafellar) Let \({\mathcal {X}}\) and \({\mathcal {Y}}\) be Banach spaces and \(A:{\mathcal {X}}\rightarrow {\mathcal {Y}}\) be linear and continuous. Let \(F:{\mathcal {X}}\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) and \(G:{\mathcal {Y}}\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) be proper and convex functions. Then
where \(A^*:{\mathcal {Y}}^*\rightarrow {\mathcal {X}}^{*}\) denotes the adjoint operator of A.
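As a sanity check of the kind of identity this theorem provides, here is a one-dimensional toy example in Python. It assumes one common form of the Fenchel–Rockafellar identity, \(\inf _x \lbrace F(x)+G(Ax)\rbrace = \sup _y \lbrace -F^*(A^*y)-G^*(-y)\rbrace \), which may differ in sign conventions from the formulation used in the theorem. The choices \(F(x)=x^2/2\), \(G(y)=(y-1)^2/2\) and \(Ax=2x\) are illustrative and not taken from the paper; the conjugates are computed by hand and the extrema are found by a plain grid search.

```python
def F(x):
    return 0.5 * x * x


def G(y):
    return 0.5 * (y - 1.0) ** 2


def F_star(p):
    # Legendre-Fenchel conjugate of x^2/2 is p^2/2.
    return 0.5 * p * p


def G_star(q):
    # Conjugate of (y-1)^2/2: the supremum of q*y - (y-1)^2/2 is at y = q + 1.
    return 0.5 * q * q + q


A = 2.0  # A x = 2x on the real line; its adjoint is multiplication by the same scalar


def grid(lo, hi, steps):
    h = (hi - lo) / steps
    return [lo + i * h for i in range(steps + 1)]


points = grid(-3.0, 3.0, 6000)

# Primal side: inf_x F(x) + G(Ax).
primal = min(F(x) + G(A * x) for x in points)

# Dual side (assumed sign convention): sup_y -F*(A* y) - G*(-y).
dual = max(-F_star(A * y) - G_star(-y) for y in points)

if __name__ == "__main__":
    print(primal, dual)
```

In this example both sides can also be computed by hand: the primal minimum is attained at \(x=2/5\) and the dual maximum at \(y=1/5\), and both values equal \(1/10\).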
Next we prove the Entropic-Kantorovich duality for the problem (2.4).
Theorem 4.2
(Entropic Duality for repulsive costs) Let \((X,d,{\mathfrak {m}})\) be a Polish measure space. Suppose that \(\rho {\mathfrak {m}}\in {\mathcal {P}}(X)\) is such that (A) and (B) hold and \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\), and that \(c:X^N\rightarrow [0,\infty ]\) is a cost function of the form
where \(f:[0,+\infty [\rightarrow [0,\infty ]\) is a function satisfying (F1) and (F2). Then, for \(\varepsilon > 0\), the duality holds
where \(v_1\oplus \cdots \oplus v_N\) denotes the operator \((v_1\oplus \cdots \oplus v_N)(x_1,\ldots , x_N) = v_1(x_1)+ \cdots + v_N(x_N)\).
Proof
First, let us assume that X is a compact space. We set \({\mathcal {X}}= (C_b(X))^N\) and \({\mathcal {Y}}=C_b(X^N)\), where \(C_b(X)\) is the space of continuous and bounded functions on X, and similarly for \(X^N\). By the Riesz representation theorem, the dual of \({\mathcal {Y}}\) is the space \({\mathcal {M}}(X^N)\) of signed regular Borel measures on \(X^N\). Thus, we may define the Legendre–Fenchel transform \(G^{*}\) of a functional \(G:{\mathcal {Y}}\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) by
We define the functionals
and
and the operator
Now, F and G are proper and convex functionals and A is a linear and continuous operator. So, we may apply Fenchel–Rockafellar duality Theorem 4.1 to get
This gives (since for every set S we have \(\inf (S)=-\sup (-S)\) and \(\sup S=-\inf (-S)\))
It remains to show that the above expression has exactly the form of our duality claim. The claim that the right-hand sides correspond to each other follows immediately from our choices of \({\mathcal {X}}\), F, and G. So, it remains to show that
To prove it, let \(\gamma \in {\mathcal {M}}(X^N)\). Now we have
Let us then compute \(G^*[\gamma ]\):
If \(\gamma \) is not absolutely continuous with respect to \({\mathfrak {m}}_{N}\), we have \(G^*[\gamma ]=+\infty \). If \(\gamma \ll {\mathfrak {m}}_{N}\), then the supremum (that appears in the definition of \(G^{*}[\gamma ]\)) is realized at \(\psi =\varepsilon \log \rho _\gamma +c\); this holds also if the function \(\rho _\gamma \) is not continuous since it can be approximated by a sequence of continuous functions. Thus, we get for \(\gamma \ll {\mathfrak {m}}_{N}\)
Hence, if \(\gamma \in \Pi _N(\rho )\), we have
This concludes the duality proof when X is a compact space.
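A finite set is a compact space, so the compact case can be checked numerically. The Python sketch below is an illustration under assumptions, not the authors' construction. It takes \(N=2\), five equally spaced points in [0, 1] with uniform reference weights \(m_i=1/5\) and density \(\rho \equiv 1\), the Coulomb cost \(c(x,y)=1/\vert x-y\vert \) (infinite on the diagonal), and the entropy \(E[\gamma ]=\int \rho _\gamma \log \rho _\gamma \,{\mathrm d}{\mathfrak {m}}_2\). A plain Sinkhorn iteration is swapped in to produce a candidate optimizer. The dual functional is assumed to have the form \(D(u,v)=\int u\rho \,{\mathrm d}{\mathfrak {m}}+\int v\rho \,{\mathrm d}{\mathfrak {m}}-\varepsilon \int \exp ((u\oplus v-c)/\varepsilon )\,{\mathrm d}{\mathfrak {m}}_2+\varepsilon \), which is consistent with the maximizer \(\psi =\varepsilon \log \rho _\gamma +c\) found above. All function and variable names are hypothetical.

```python
import math


def sinkhorn_duality_check(n=5, eps=1.0, iters=2000):
    """Compare primal and dual values on a finite space (illustrative setup)."""
    xs = [i / (n - 1) for i in range(n)]
    m = [1.0 / n] * n   # reference measure weights (assumed uniform)
    rho = [1.0] * n     # density of the marginal with respect to m
    inf = float("inf")
    # Coulomb cost, +infinity on the diagonal.
    c = [[inf if i == j else 1.0 / abs(xs[i] - xs[j]) for j in range(n)]
         for i in range(n)]
    # Gibbs kernel exp(-c/eps); it vanishes on the diagonal.
    K = [[0.0 if i == j else math.exp(-c[i][j] / eps) for j in range(n)]
         for i in range(n)]

    a = [1.0] * n
    b = [1.0] * n
    for _ in range(iters):
        # Alternately enforce the row and the column marginal constraints.
        a = [rho[i] / sum(K[i][j] * b[j] * m[j] for j in range(n)) for i in range(n)]
        b = [rho[j] / sum(K[i][j] * a[i] * m[i] for i in range(n)) for j in range(n)]

    # Density of the plan with respect to the product measure m x m.
    g = [[a[i] * K[i][j] * b[j] for j in range(n)] for i in range(n)]

    # Primal value: cost plus eps times entropy (zero-density cells contribute 0).
    primal = 0.0
    for i in range(n):
        for j in range(n):
            if g[i][j] > 0.0:
                primal += (c[i][j] + eps * math.log(g[i][j])) * g[i][j] * m[i] * m[j]

    # Dual value at the potentials u = eps*log(a), v = eps*log(b);
    # note exp((u_i + v_j - c_ij)/eps) equals g_ij.
    u = [eps * math.log(a[i]) for i in range(n)]
    v = [eps * math.log(b[j]) for j in range(n)]
    dual = sum((u[i] + v[i]) * rho[i] * m[i] for i in range(n))
    dual -= eps * sum(g[i][j] * m[i] * m[j] for i in range(n) for j in range(n))
    dual += eps

    row_err = max(abs(sum(g[i][j] * m[j] for j in range(n)) - rho[i])
                  for i in range(n))
    return primal, dual, row_err


if __name__ == "__main__":
    print(sinkhorn_duality_check())
```

Once the marginal error is negligible, the two returned values agree up to numerical precision, and the primal value is finite even though the cost is infinite on the diagonal, because the plan puts no mass there.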
The noncompact case. Due to Lemma 2.3, it suffices to prove the claim in the case where the reference measure is \(\rho {\mathfrak {m}}\) instead of \({\mathfrak {m}}\); the finiteness of the measure \(\rho {\mathfrak {m}}\) now gives access to inner regularity and to the approximability by compact sets. For simplicity, we will denote \(\rho :=\rho {\mathfrak {m}}\).
The claim is
For simplicity, let us denote
We may assume that \(\sup _{u\in C_b(X)}D_\rho (u)>-\infty \); indeed, since we can test with the function \(u\equiv 0\), this always holds for cost functions that are bounded from below.
Let us make, in the notation of the primal functional, the dependence on the reference measure explicit by the notation \(\gamma \mapsto C_\varepsilon [\gamma \,|\,\mu ]\) when the reference measure on the space X is \(\mu \). Thus the original notation \(\gamma \mapsto C_\varepsilon [\gamma ]\) corresponds to \(\gamma \mapsto C_\varepsilon [\gamma \,|\,{\mathfrak {m}}]\).
Since the measures \(\rho \) and \(\gamma \) are inner regular, there exists a sequence \((K_n)_{n\in {\mathbb {N}}}\) of compact subsets of X such that
Let us denote \(\gamma _n:=\frac{1}{\gamma (K_n^N)}\gamma \big |_{K_n^N}\) and \(\rho _n:=\frac{1}{\rho (K_n)}\rho \big |_{K_n}\). Let us also denote by \(\gamma _n^{\text {min}}\) the minimizer of the problem \(I_\varepsilon [\rho _n]\). Since \(\gamma \) is the minimizer of the problem \(I_\varepsilon [\rho ]\) and since (due to the absolute continuity of the integral and continuity of the function \(t\mapsto t\log t\))
we have
By the duality result proven above for compact spaces, we have for all \(n \in {\mathbb {N}}\)
Again, due to the absolute continuity of the integral, we have
Putting these conditions together, we get for all \(n\in {\mathbb {N}}\)
The claim follows by letting \(n\rightarrow \infty \). \(\square \)
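The restriction and renormalization step used in the noncompact case can be illustrated on a countable space. The Python sketch below rests on assumptions that are not in the text: X is the set of nonnegative integers with the counting measure as reference, the plan is the product of two geometric laws \(p_i=2^{-(i+1)}\) (cut off at a large index for computation), and only the entropy term is tracked, since this product plan charges the diagonal and would have infinite cost for a repulsive c. It shows that the entropy of the normalized restriction to \(K_n^2\), with \(K_n=\lbrace 0,\ldots ,n\rbrace \), approaches the entropy of the full plan, which equals \(-4\log 2\) here.

```python
import math


def restrict_and_normalize(gamma, K):
    """Restrict a discrete plan to K x K and rescale it to total mass 1.

    gamma maps pairs (i, j) to masses. Returns the rescaled plan together
    with the mass gamma(K x K) that was kept.
    """
    mass = sum(w for (i, j), w in gamma.items() if i in K and j in K)
    restricted = {(i, j): w / mass
                  for (i, j), w in gamma.items() if i in K and j in K}
    return restricted, mass


def entropy(gamma):
    """Sum of g*log(g) with respect to the counting measure."""
    return sum(w * math.log(w) for w in gamma.values() if w > 0.0)


CUTOFF = 60  # computational stand-in for the whole space (assumption)
full_plan = {(i, j): 2.0 ** (-(i + j + 2))
             for i in range(CUTOFF) for j in range(CUTOFF)}


def truncated_entropy(n):
    """Entropy of the normalized restriction to {0..n}^2 and the mass kept."""
    K = set(range(n + 1))
    plan_n, mass = restrict_and_normalize(full_plan, K)
    return entropy(plan_n), mass


if __name__ == "__main__":
    for n in (2, 5, 10, 40):
        print(n, truncated_entropy(n))
```

The kept mass \(\gamma (K_n^2)\) tends to 1 and the entropies converge, which is the discrete counterpart of the appeal to the absolute continuity of the integral and the continuity of \(t\mapsto t\log t\).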
References
Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43, 904–924 (2011)
Benamou, J.-D., Carlier, G., Nenna, L.: A numerical method to solve multi-marginal optimal transport problems with Coulomb cost. In: Glowinski, R., Osher, S., Yin, W. (eds.) Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computation. Springer, Cham (2016)
Benamou, J.-D., Carlier, G., Nenna, L.: Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. Numer. Math. 142, 33–54 (2019)
Buttazzo, G., De Pascale, L., Gori-Giorgi, P.: Optimal-transport formulation of electronic density functional theory. Phys. Rev. A 85, 062502 (2012)
Carlier, G., Duval, V., Peyré, G., Schmitzer, B.: Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal. 49, 1385–1418 (2017)
Cotar, C., Friesecke, G., Klüppelberg, C.: Density functional theory and optimal transportation with Coulomb cost. Commun. Pure Appl. Math. 66, 548–599 (2013)
Cotar, C., Friesecke, G., Klüppelberg, C.: Smoothing of transport plans with fixed marginals and rigorous semiclassical limit of the Hohenberg–Kohn functional. Arch. Ration. Mech. Anal. 228, 891–922 (2018)
Gigli, N., Tamanini, L.: Second order differentiation formula on \({RCD}^*({K},{N})\) spaces. Accepted at JEMS. arXiv:1802.02463 (2018)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)
Di Marino, S., De Pascale, L., Colombo, M.: Multimarginal optimal transport maps for \(1 \)-dimensional repulsive costs. Can. J. Math. 67, 350–368 (2015)
Di Marino, S., Gerolin, A.: An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm. arXiv preprint arXiv:1911.06850 (2019)
Di Marino, S., Gerolin, A., Nenna, L.: Optimal transport theory for repulsive costs. In: Topological Optimization and Optimal Transport: In the Applied Sciences, vol. 17 (2017)
Ekeland, I., Témam, R.: Analyse convexe et problèmes variationnels. Dunod, Gauthier-Villars, Paris, ix+340 p (1974)
Flamary, R., Courty, N.: POT Python Optimal Transport library (2017). https://github.com/rflamary/POT
Gerolin, A., Grossi, J., Gori-Giorgi, P.: Kinetic correlation functionals from the entropic regularisation of the strictly-correlated electrons problem. J. Chem. Theory Comput. (2019)
Gerolin, A., Kausamo, A., Rajala, T.: Duality theory for multi-marginal optimal transport with repulsive costs in metric spaces. ESAIM Control Optim. Calc. Var. 25, 62 (2019)
Gigli, N., Tamanini, L.: Second order differentiation formula on compact \({R}{C}{D}^\ast ({K}, {N})\) spaces. arXiv:1701.03932 (2017)
Gozlan, N., Léonard, C.: Transport inequalities: a survey. Markov Process. Related Fields 16, 635–736 (2010)
Kellerer, H.G.: Duality theorems for marginal problems. In: Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 67 (1984)
Léonard, C.: From the Schrödinger problem to the Monge–Kantorovich problem. J. Funct. Anal. 262, 1879–1920 (2012)
Léonard, C.: A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Continuous Dyn. Syst. 34, 1533–1574 (2014)
Lieb, E.H.: Density functionals for Coulomb systems. In: Loss, M., Ruskai, M.B. (eds.) Inequalities. Springer, Berlin, Heidelberg (2002)
Mikami, T.: Monge’s problem with a quadratic cost by the zero-noise limit of h-path processes. Probab. Theory Related Fields 129, 245–260 (2004)
Nenna, L.: Numerical Methods for Multi-Marginal Optimal Transportation, PhD thesis, Université Paris-Dauphine (2016)
Peyré, G., Cuturi, M.: Computational Optimal Transport, vol. 11. Now Publishers Inc, Hanover (2019)
Schrödinger, E.: Über die Umkehrung der Naturgesetze. Verlag Akademie der Wissenschaften in Kommission bei Walter de Gruyter u. Company (1931)
Seidl, M., Di Marino, S., Gerolin, A., Giesbertz, K., Nenna, L., Gori-Giorgi, P.: The strictly-correlated electron functional for spherically symmetric systems revisited. Accepted in Phys. Rev. A. arXiv:1702.05022 (2016)
Sturm, K.-T.: On the geometry of metric measure spaces. Acta Math. 196, 65–131 (2006)
Communicated by A. Malchiodi.
The authors acknowledge the support of the Academy of Finland, Projects Nos. 274372, 284511, 312488, and 314789. A.G. also acknowledges funding by the European Research Council under H2020/MSCA-IF “OTmeetsDFT” (Grant ID: 795942). A.K. also wants to thank the Vilho, Yrjö and Kalle Väisälä Foundation for funding.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gerolin, A., Kausamo, A. & Rajala, T. Multi-marginal entropy-transport with repulsive cost. Calc. Var. 59, 90 (2020). https://doi.org/10.1007/s00526-020-01735-3