
Transient subtraction: A control variate method for computing transport coefficients

Pierre Monmarché^a, LJLL and LCT, Sorbonne Université, Paris, France Renato Spacek^b, MATHERIALS team, Inria Paris, France CERMICS, École des Ponts, France Gabriel Stoltz^c, CERMICS, École des Ponts, France MATHERIALS team, Inria Paris, France

Abstract

In molecular dynamics, transport coefficients measure the sensitivity of the invariant probability measure of the stochastic dynamics at hand with respect to some perturbation. They are typically computed using either the linear response of nonequilibrium dynamics, or the Green–Kubo formula. The estimators for both approaches have large variances, which motivates the study of variance reduction techniques for computing transport coefficients. We present an alternative approach, called the transient subtraction technique (inspired by early work by Ciccotti and Jacucci in 1975), which amounts to simulating a transient dynamics, from which we subtract a sensibly coupled equilibrium trajectory, resulting in an estimator with smaller variance. We provide the mathematical formulation of the transient subtraction technique, give error estimates on the bias and variance of the associated estimator, and demonstrate the relevance of the method through numerical illustrations for various systems.

1 Introduction

When considering large systems of interacting particles, quantities of interest are typically macroscopic properties, such as temperature and pressure, rather than microscopic ones. Generally, full microscopic descriptions are not only too large to be reasonably considered, but also largely uninteresting. From a numerical viewpoint, molecular dynamics provides an effective way of bridging the microscopic and macroscopic properties of such systems through computer simulations; see [tuckerman2010, leimkuhler2015, allen2017] for reference textbooks. These simulations are typically done via the numerical realization of a stochastic differential equation (SDE), such as the Langevin dynamics, which evolves the positions $q$ and momenta $p$ as

\displaystyle\begin{cases}dq_{t}=M^{-1}p_{t}\,dt,\\ dp_{t}=-\nabla V(q_{t})\,dt-\gamma M^{-1}p_{t}\,dt+\sqrt{\dfrac{2\gamma}{\beta}}\,dW_{t},\end{cases}

(1)

where $V$ is the potential energy function, $M$ the mass matrix, $\gamma>0$ the damping coefficient, $\beta>0$ the inverse temperature and $W_{t}$ a standard multidimensional Brownian motion.
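As a minimal illustration of how (1) might be realized numerically, the following Python sketch performs one Euler–Maruyama step for a scalar mass, $M=m\,\mathrm{Id}$. The function name `langevin_em_step`, the scalar-mass simplification and the harmonic potential in the usage lines are assumptions made for this example only; splitting schemes are often preferred over Euler–Maruyama in practice.

```python
import numpy as np

def langevin_em_step(q, p, grad_V, dt, gamma, beta, mass, rng):
    """One Euler-Maruyama step for the Langevin dynamics (1), with M = mass * Id."""
    xi = rng.standard_normal(np.shape(p))      # Brownian increment divided by sqrt(dt)
    q_new = q + dt * p / mass                  # dq = M^{-1} p dt
    p_new = (p
             - dt * grad_V(q)                  # conservative force -grad V(q)
             - dt * gamma * p / mass           # friction -gamma M^{-1} p
             + np.sqrt(2.0 * gamma * dt / beta) * xi)  # fluctuation term
    return q_new, p_new

# Usage: a short trajectory in the (hypothetical) harmonic potential V(q) = q^2 / 2.
rng = np.random.default_rng(0)
q, p = np.array([1.0]), np.array([0.0])
for _ in range(1000):
    q, p = langevin_em_step(q, p, lambda x: x, dt=1e-2, gamma=1.0,
                            beta=1.0, mass=1.0, rng=rng)
```

Setting `gamma=0` removes both the friction and the noise, so the step reduces to an explicit Euler update of Hamiltonian dynamics, which gives a simple deterministic sanity check.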

One particular application of molecular dynamics is the computation of transport coefficients (such as the diffusivity, mobility and shear viscosity), which encode important physical properties of materials, and in particular measure how quickly a perturbed system returns to steady-state. At the microscopic level, transport coefficients are defined as the proportionality constant between the magnitude $\eta\ll 1$ of some external forcing exerted on the system, and some flux induced by this forcing. The flux is represented as the steady-state average $\mathbb{E}_{\eta}(R)$ for some given observable $R$ with average 0 with respect to the equilibrium system $(\eta=0)$ . This can be made precise through the framework of linear response theory; see [chandler1987, Chapter 8] for an introduction. To numerically realize this, one considers a nonequilibrium system by adding a perturbation of magnitude $\eta$ to the reference dynamics at hand (e.g. Langevin dynamics), and the appropriate flux is then measured as a time-average over a long trajectory; this is known as the nonequilibrium molecular dynamics (NEMD) method.

Alternatively, the linear response can be reformulated as an equilibrium integrated correlation, known as the Green–Kubo (GK) formula [green1954, kubo1957]. Both the NEMD and Green–Kubo methods are commonly used, and each has its advantages and drawbacks; see [stoltz2024] for a detailed numerical comparison of the two approaches.

Although less common, a third class of techniques consists of methods based on transient dynamics, which typically rely on monitoring the system’s relaxation to steady-state after an initial perturbation (unlike the NEMD and GK methods, which are based on steady-state averages). As will be made precise in Section 2.2, transient methods can be applied in two different ways: (i) starting from an equilibrium system with perturbed initial conditions, and allowing the system to relax back to its equilibrium steady-state, e.g. the momentum impulse relaxation [arya2000] and the approach-to-equilibrium molecular dynamics methods [lampin2013]; or a somewhat dual approach, carried out by (ii) applying a driving force to an equilibrium system and monitoring its relaxation towards a nonequilibrium steady-state, such as the transient-time correlation function method [morriss1987, evans1988].

All three classes of methods suffer from severe numerical difficulties, the main challenge being the large statistical error, as made precise in Section 2.2. For NEMD, the statistical error mainly arises from the small signal-to-noise ratio (due to the small magnitude of the perturbation $\eta$), which requires long integration times to offset the variance. For Green–Kubo, the variance scales linearly with the integration time $T$: once correlations have decayed, one keeps integrating a small quantity plagued by a large statistical error [sousaoliveira2017]. There have been several attempts at more efficient methods to compute transport coefficients in the context of variance reduction [pavliotis2023, spacek2023, blassel2024]. In particular, one such method is known as the subtraction technique, developed and investigated in [ciccotti1975], and further explored in [ciccotti1979]. The method is based on NEMD, and its key idea relies on estimating the equilibrium quantity $\mathbb{E}_{0}(R)$ in addition to the usual nonequilibrium response $\mathbb{E}_{\eta}(R)$. One would then subtract the estimated equilibrium trajectory (which has average 0 with respect to the stationary distribution by definition, thus leaving the transport coefficient unchanged) from the perturbed trajectory, yielding an estimator with lower variance provided that the two trajectories are sufficiently correlated.

As discussed in both works [ciccotti1975, ciccotti1979] (which consider a deterministic framework), the high correlation between trajectories is a natural artifact of the deterministic dynamics for reasonably short integration times. This allows for the statistical error to be effectively subtracted out through the equilibrium trajectory, thus making the subtraction step an effective control variate. In stochastic settings, however, using independent noises for equilibrium and nonequilibrium trajectories (corresponding to $\eta=0$ and $\eta\neq 0$ , respectively) results in uncorrelated trajectories. This suggests the need for constructing a sensible coupling between the two systems; otherwise, the subtraction step would essentially amount to adding two independent Gaussians, doubling the variance of the estimator at hand.

One way to overcome this issue is to consider couplings, which have been used as a control variate to compute transport coefficients [goodman2009, garnier2022]. One particular example of a common coupling strategy is synchronous coupling, which amounts to using the same noise for both dynamics. A major challenge with coupling techniques, however, is ensuring that trajectories stay coupled for long times. This is especially problematic for systems which rely on long-time averages for convergence, e.g. NEMD. Typically, one hopes to obtain convergence of time-averages before trajectories decouple, but this cannot be assumed in general without additional (and often restrictive) requirements.

Synchronous coupling, for instance, typically requires conditions such as global dissipativity in order to ensure long-time couplings of the trajectories. In general, however, global dissipativity is only obtained under strong conditions. One such example is when the drift $b$ is given by $b=-\nabla V$ for some strongly convex potential $V$, which is too restrictive a requirement for actual applications in MD. This suggests that synchronous coupling is typically impractical for estimators that require long-time integration such as NEMD since, without global dissipativity, the decoupling time is much shorter than the time needed for convergence. In some cases, however, for instance at high temperatures, synchronously coupled trajectories might not decouple at all even without global dissipativity; see [monmarche2023]. A natural way to address this problem would be to construct couplings which guarantee long-time coupling under milder conditions, but this remains challenging (see for instance [darshan2024]).

We adopt an alternative viewpoint: we consider methods for which convergence of an observable is feasible over short times. In particular, we devise a transient method, consisting of an initially perturbed trajectory relaxing to equilibrium. We thus do not require long-time averages for convergence, which suggests that we can use synchronous coupling under weak conditions, provided that the relaxation time is smaller than the decoupling time. In fact, even if the dynamics start to decouple before relaxation, our analysis shows that the control variate can still reduce the total variance.

Outline

This work is organized as follows. We discuss in Section 2 some standard numerical methods for approximating transport coefficients, and present an approach based on integrating dynamics in the transient regime. Then, by applying the subtraction technique to the transient method, we construct in Section 3 an improved transient subtraction estimator, and provide an error analysis of its bias and variance. We then illustrate the efficacy of our method with numerical results for several systems in Section 4, namely by computing the mobility for one-dimensional Langevin dynamics, and the mobility and shear viscosity for a Lennard–Jones fluid. Finally, conclusions and extensions are discussed in Section 5.

2 Transient method to compute transport coefficients

We discuss in this section the definition and computation of transport coefficients, and in particular the use of a transient method for their approximation. We start by presenting in Section 2.1 the general setting used to compute transport coefficients for a general SDE, then overview their standard numerical approximations and associated numerical difficulties in Section 2.2. We then introduce the transient method we consider in this work in Section 2.3.

2.1 General setting

Consider a general time-homogeneous SDE with additive noise defined on the state-space $\mathcal{X}$ , where $\mathcal{X}$ is typically $\mathbb{R}^{d}$ or $\mathbb{T}^{d}$ (with $\mathbb{T}=\mathbb{R}/\mathbb{Z}$ the one-dimensional torus):

dX_{t}=b(X_{t})\,dt+\sigma\,dW_{t},

(2)

where $b\colon\mathcal{X}\to\mathbb{R}^{d}$ is a smooth function, $\sigma\in\mathbb{R}^{d\times m}$ is a constant matrix and $W_{t}$ is a standard $m$ -dimensional Brownian motion. We assume that (2) admits a unique strong solution (which is the case for instance when $b$ is globally Lipschitz). We restrict ourselves to cases where $\sigma$ is constant, as the dynamics of interest considered later on involve additive noise, namely Langevin dynamics, and also because the coupling method we introduce in Section 3.1 is considerably easier to formulate in this setting. The dynamics (2) has associated infinitesimal generator

\mathcal{L}=b^{\mathsf{T}}\nabla+\frac{1}{2}\sigma\sigma^{\mathsf{T}}\colon\nabla^{2}=\sum_{i=1}^{d}b_{i}\partial_{x_{i}}+\frac{1}{2}\sum_{i,j=1}^{d}\sum_{k=1}^{m}\sigma_{ik}\sigma_{jk}\partial_{x_{i}x_{j}}^{2},

(3)

where $\colon$ denotes the Frobenius inner product. Throughout this work, we assume that (2) admits a unique invariant probability measure $\mu$ with a positive density with respect to the Lebesgue measure. We denote by

L^{2}_{0}(\mu)=\left\{\varphi\in L^{2}(\mu)\,\middle|\,\int_{\mathcal{X}}\varphi\,d\mu=0\right\}

the space of $L^{2}(\mu)$ functions with average 0 with respect to $\mu$ .

Transport coefficients measure how the steady state of the reference dynamics (2) changes when some external forcing is applied to it. This external forcing typically arises as an extra drift term of magnitude $\eta$ , with $|\eta|$ small in order for the forcing to be considered as a small perturbation. In this context, the transport coefficient $\rho$ is defined as the proportionality constant between the steady-state flux of some observable $R$ of interest, and the magnitude of the external forcing needed to induce it, known as the linear response; see [chandler1987, Chapter 8] for an introduction to linear response theory, and for instance [spacek2023, Section 2] for a synthetic presentation. We assume that the observable $R$ has average 0 with respect to $\mu$ (without loss of generality, as it can always be recentered in case it has a nonzero average). The linear response can be reformulated in terms of an integrated time-correlation function, known as the Green–Kubo formula. For simplicity, we do not further recall the framework of linear response theory and instead directly write the Green–Kubo formula:

\displaystyle\rho=\int_{0}^{+\infty}\mathbb{E}_{\mu}(R(X_{t})S(X_{0}))\,dt,

(4)

where $S\in L^{2}_{0}(\mu)$ is the conjugate response function, which depends on the extra drift term added to perturb the dynamics (see [lelievre2016, Section 5.2.3] for a precise definition), and where the expectation $\mathbb{E}_{\mu}$ is taken with respect to all initial conditions $X_{0}\sim\mu$ , and over all realizations of the dynamics (2). Let us emphasize that the conjugate response $S$ has average 0 by construction.

2.2 Numerical techniques to compute transport coefficients

Transport coefficients can be numerically estimated using a variety of techniques. Generally, such techniques fall into one of three main categories (see [lelievre2016, stoltz2024] for a detailed discussion and numerical comparison):

(1)

Equilibrium techniques based on the Green–Kubo formula (4). In order to numerically realize (4), one constructs an estimator by (i) truncating the time-integral to finite integration time $T$ ; and (ii) approximating the expectation with an average over $K$ independent trajectories of the system $(X_{t}^{k})_{t\geqslant 0}$ with $1\leqslant k\leqslant K$ . This leads to the following natural estimator:

\widehat{\rho}^{T,K}_{\rm GK}=\frac{1}{K}\sum_{k=1}^{K}\int_{0}^{T}R(X_{t}^{k})S(X_{0}^{k})\,dt.

(5)

The sources of error associated with the estimator (5) are

•

A statistical error $\mathrm{O}(T)$ , which scales linearly with the time lag [sousaoliveira2017, plechac2022, gastaldello2024] and is typically the largest source of error;
•

An integration time truncation bias, which is small as correlations are typically exponentially decaying (as discussed in [plechac2022]);
•

A discretization bias, which arises from the finiteness of the timestep used to discretize (2), and from quadrature formulas for the time integral [leimkuhler2016, lelievre2016].

The various sources of error suggest carefully choosing $T$ in order to minimize the error as a tradeoff between $T$ large enough for the time truncation bias to be small, but not too large in order to limit the increase in variance.
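To make the estimator (5) concrete, the following Python sketch evaluates it on a toy model chosen purely for illustration (it is not one of the systems studied in this work): the one-dimensional Ornstein–Uhlenbeck process $dX_t=-X_t\,dt+\sqrt{2}\,dW_t$ with $\mu=\mathcal{N}(0,1)$ and the assumed observables $R(x)=S(x)=x$. For this model $\mathbb{E}_{\mu}(X_tX_0)=\mathrm{e}^{-t}$, so that (4) gives $\rho=1$. The function name `green_kubo_ou`, the Euler–Maruyama discretization and the left-point quadrature are choices made for this sketch.

```python
import numpy as np

def green_kubo_ou(T=5.0, dt=0.01, K=20_000, seed=0):
    """Estimator (5) for dX = -X dt + sqrt(2) dW with R(x) = S(x) = x."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x0 = rng.standard_normal(K)          # X_0 ~ mu = N(0, 1), K independent copies
    x = x0.copy()
    integral = np.zeros(K)
    for _ in range(n_steps):
        integral += dt * x * x0          # left-point quadrature of R(X_t) S(X_0)
        x += -x * dt + np.sqrt(2.0 * dt) * rng.standard_normal(K)
    # Empirical mean over trajectories and its estimated standard error.
    return integral.mean(), integral.std(ddof=1) / np.sqrt(K)

rho_hat, std_err = green_kubo_ou()
print(f"GK estimate: {rho_hat:.3f} +/- {std_err:.3f} (exact value: 1)")
```

Increasing `T` at fixed `K` reduces the truncation bias $\mathrm{e}^{-T}$ but is expected to increase the reported standard error, which is the tradeoff described above.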

(2)

Nonequilibrium steady-state techniques. This method is based on linear response theory. It amounts to permanently adding an external forcing to the system, which induces a nonzero flux in the steady-state. The transport coefficient is then obtained by dividing the average flux by the magnitude of the perturbation, for small values of the perturbation in order to ensure one stays in the linear response regime.

There are several sources of error associated with this technique. In particular, the main concern is the statistical error, much larger than the usual asymptotic variance for standard time averages due to the small magnitude of the forcing. See [lelievre2016, Section 5], [spacek2023, Section 2] and [leimkuhler2016, Section 3] for a more detailed discussion on the numerical analysis of nonequilibrium methods.
(3)

Transient methods. While both Green–Kubo and nonequilibrium methods are based on steady-state dynamics, transient methods provide an alternative framework by monitoring the system’s relaxation to a steady-state after an initial perturbation, and can be classified into two main approaches:
(3a)
  
  Equilibrium relaxation: A typical scenario is to perturb an equilibrium system by creating an initial profile of momentum or energy, for instance, which is then allowed to relax to an equilibrium steady-state through the time-evolution of the equilibrium dynamics. During relaxation, the corresponding transient profiles are monitored, which are often fit to a macroscopic effective PDE parametrized by the transport coefficient at hand in order to estimate this transport coefficient by some form of inverse problem fitting. Examples include the method proposed in [hulse2005] to compute the thermal conductivity, the momentum impulse relaxation method [arya2000], and the approach-to-equilibrium molecular dynamics method [lampin2013].
(3b)
  
  Nonequilibrium relaxation: In a somewhat dual approach, one can alternatively start with an equilibrium system and drive it towards a nonequilibrium steady-state by applying an external forcing to the dynamics. The relaxation to a nonequilibrium steady-state is then monitored, from which the transport coefficient can be obtained. One example is the transient-time-correlation function (TTCF) method [morriss1987, evans1988], which generalizes the Green–Kubo relations to nonlinear regimes.

The limitations and drawbacks listed above suggest that there is space for alternative approaches, in particular in the context of variance reduction; this motivates the construction of the transient subtraction method.

2.3 Transient dynamics method

As discussed in Section 2.2, an alternative approach to the NEMD and GK methods for computing transport coefficients is based on transient dynamics. The fundamental idea is that, instead of applying an external forcing to the dynamics, or computing correlations for the equilibrium dynamics, we start from an initially perturbed system, and monitor its relaxation to steady-state by evolving equilibrium dynamics.

Mathematical formulation

The transient method relies on two main ingredients: (i) perturbing the distribution of initial conditions at order $\mathrm{O}(\eta)$ with $\eta\ll 1$ ; and (ii) monitoring return to stationarity via time integration. More precisely, we consider a process $X_{t}^{\eta}$ which evolves according to the reference dynamics (2), with $X_{0}^{\eta}\sim\widetilde{\mu}_{\eta}$ . The probability measure $\widetilde{\mu}_{\eta}$ is assumed to be a first-order perturbation of the invariant probability measure of the reference dynamics $\mu$ , satisfying

\widetilde{\mu}_{\eta}=(1+\eta S)\mu+\mathrm{O}(\eta^{2}).

(6)

We then evolve the process $X_{t}^{\eta}$ according to the reference equilibrium dynamics, which relaxes over time to its equilibrium steady-state. In particular, although not immediately clear, the time integral of the expectation of $R(X_{t}^{\eta})$ , when divided by $\eta$ , converges to the transport coefficient $\rho$ as $\eta$ goes to $0$ :

\rho=\lim_{\eta\to 0}\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}(R(X_{t}^{\eta}))\,dt.

(7)

To motivate the equality (7), we consider finite $\eta\ll 1$ . By writing the expectation in terms of the semigroup, and using that $\mathrm{e}^{t\mathcal{L}}R$ has average 0 with respect to $\mu$ (by the invariance of $\mu$ by the dynamics and the fact that $R$ has average 0 with respect to $\mu$ ), we have, informally,

\begin{align}
\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}(R(X_{t}^{\eta}))\,dt &= \frac{1}{\eta}\int_{0}^{+\infty}\int_{\mathcal{X}}\bigl(\mathrm{e}^{t\mathcal{L}}R\bigr)\,d\widetilde{\mu}_{\eta}\,dt \tag{8}\\
&= \frac{1}{\eta}\int_{0}^{+\infty}\int_{\mathcal{X}}\mathrm{e}^{t\mathcal{L}}R\,d\mu\,dt+\int_{0}^{+\infty}\int_{\mathcal{X}}\bigl(\mathrm{e}^{t\mathcal{L}}R\bigr)S\,d\mu\,dt+\mathrm{O}(\eta) \tag{9}\\
&= \int_{0}^{+\infty}\int_{\mathcal{X}}\bigl(\mathrm{e}^{t\mathcal{L}}R\bigr)S\,d\mu\,dt+\mathrm{O}(\eta) \tag{10}\\
&= \int_{0}^{+\infty}\mathbb{E}_{\mu}\bigl(R(X_{t})S(X_{0})\bigr)\,dt+\mathrm{O}(\eta). \tag{11}
\end{align}

It is clear that by letting $\eta\to 0$ we get the correct result, i.e. (7) is equivalent to the Green–Kubo formula (4):

\lim_{\eta\to 0}\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}(R(X_{t}^{\eta}))\,dt=\int_{0}^{+\infty}\mathbb{E}_{\mu}(R(X_{t})S(X_{0}))\,dt.

(12)

We recall that $\mathbb{E}_{\mu}$ denotes the expectation with respect to the reference dynamics started at equilibrium, while $\mathbb{E}$ on the left-hand side denotes the expectation with respect to the reference dynamics initialized as $X_{0}^{\eta}\sim\widetilde{\mu}_{\eta}$ .

The above discussion is an informal presentation of the method, and is done for motivational purposes; see Proposition 1 for the formal meaning of the initial distribution (6), and the rigorous form of the computation (8)–(11).

Estimators of transient dynamics

In practice, numerically estimating (7) requires first approximating the limit with (sufficiently small) finite $\eta$ , truncating the time integral to finite $T$ , and approximating the expectation with an average over $K$ realizations of the dynamics started from i.i.d. initial conditions $X_{0}\sim\widetilde{\mu}_{\eta}$ . This leads to the following estimator for (7):

\widehat{\rho}^{T,K,\eta}_{\rm trans}=\frac{1}{\eta K}\sum_{k=1}^{K}\int_{0}^{T}R(X_{t}^{\eta,k})\,dt.

(13)

Although these approximations lead to several sources of bias in (13), which are made precise in Section 3.2.2, the primary concern associated with (13) is its very large variance, as we discuss next. This disqualifies it as an appropriate numerical method.

Asymptotic variance of usual transient estimator

The asymptotic variance of the estimator (13) is

\lim_{T\to+\infty}T^{-1}\operatorname{Var}\bigl(\widehat{\rho}^{T,K,\eta}_{\rm trans}\bigr)=\frac{2}{K\eta^{2}}\int_{\mathcal{X}}R\left(-\mathcal{L}^{-1}R\right)\,d\mu.

(14)

It corresponds to the usual asymptotic variance for time averages of ergodic equilibrium dynamics, except for the very large prefactor $1/\eta^{2}$ .

Unlike the usual NEMD or Green–Kubo estimators of transport coefficients discussed in Section 2.2, the variance of (13) is magnified by two distinct contributions. First, as with NEMD, we divide (13) by $\eta\ll 1$ which gives rise to the $\mathrm{O}(\eta^{-2})$ factor. Second, since the estimator is not a time average but a time integral as with GK, the variance also scales linearly in $T$ , as opposed to the typical scaling $\mathrm{O}(1/T)$ for time-averages. This leads to variance of order $\mathrm{O}(T\eta^{-2})$ , much higher than its NEMD and GK counterparts (although if $T=\mathrm{O}(1)$ , the transient method is comparable to NEMD).
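The $\eta^{-2}$ scaling can be observed on a toy model chosen for illustration only: the Ornstein–Uhlenbeck process $dX_t=-X_t\,dt+\sqrt{2}\,dW_t$ with $\mu=\mathcal{N}(0,1)$ and $R(x)=S(x)=x$, for which $\rho=1$. In this Gaussian setting, $\mathcal{N}(\eta,1)$ has density $(1+\eta x)$ with respect to $\mu$ up to $\mathrm{O}(\eta^{2})$, so it is an admissible choice of $\widetilde{\mu}_{\eta}$ in (6). The function name `transient_ou` and the Euler–Maruyama discretization are assumptions of this sketch.

```python
import numpy as np

def transient_ou(eta, T=5.0, dt=0.01, K=20_000, seed=0):
    """Per-trajectory contributions to (13) for the OU process with R(x) = x."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = eta + rng.standard_normal(K)     # X_0 ~ N(eta, 1), perturbed initial law
    integral = np.zeros(K)
    for _ in range(n_steps):
        integral += dt * x               # left-point quadrature of R(X_t)
        x += -x * dt + np.sqrt(2.0 * dt) * rng.standard_normal(K)
    return integral / eta                # averaging these values gives (13)

vals_large = transient_ou(eta=0.5)
vals_small = transient_ou(eta=0.05)      # same seed, hence same noise
print("estimate (eta = 0.5):", vals_large.mean())
print("variance ratio:", vals_small.var() / vals_large.var())
```

Since both runs share the same random numbers and the model is linear, dividing $\eta$ by 10 should multiply the per-trajectory variance by a factor very close to 100, in line with the prefactor in (14).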

This result calls for modifying the estimator with the use of variance reduction techniques, in particular to get rid of the $\eta^{-2}$ contribution. To this end, we consider the use of couplings as a control variate, which are discussed more precisely in the next section.

3 Transient subtraction method

We propose in this section a method called the transient subtraction technique, which applies a subtraction technique similar to the one suggested in [ciccotti1975] to the transient dynamics method discussed in Section 2.3 as a means of variance reduction. We first outline in Section 3.1 the construction of the method, then present the numerical analysis of its associated estimators in Section 3.2.

3.1 Constructing the method

In the transient dynamics setting of Section 2.3, one can consider the use of couplings as a control variate approach to construct an estimator with lower variance than (13). To this end, we consider the coupling $(X_{t}^{\eta},Y_{t}^{0})$ , where the processes $X_{t}^{\eta}$ and $Y_{t}^{0}$ are evolved according to the same underlying reference dynamics and have different initial conditions:

\begin{cases}\begin{aligned} dY_{t}^{0}&=b(Y_{t}^{0})\,dt+\sigma\,dW_{t},\qquad Y_{0}^{0}\sim\mu,\\ dX_{t}^{\eta}&=b(X_{t}^{\eta})\,dt+\sigma\,d\widetilde{W}_{t},\qquad X_{0}^{\eta}\sim\widetilde{\mu}_{\eta},\end{aligned}\end{cases}

(15)

where $W_{t}$ and $\widetilde{W}_{t}$ are standard $m$ -dimensional Brownian motions. The transport coefficient $\rho$ can then be computed as

\rho=\lim_{\eta\to 0}\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}\left(R(X_{t}^{\eta})-R(Y_{t}^{0})\right)\,dt.

(16)

Note that $\int_{0}^{+\infty}R(Y_{t}^{0})\,dt$ acts as a control variate since $\mathbb{E}(R(Y_{t}^{0}))=0$ for all $t\geqslant 0$ . The expression (16) admits the following natural estimator, carried out with independent initial conditions for the couple $(X_{t}^{\eta,k},Y_{t}^{0,k})_{t\geqslant 0}$ for $1\leqslant k\leqslant K$ and independent realizations of the dynamics (15):

\widehat{\rho}^{T,K,\eta}_{\rm sub}=\frac{1}{\eta K}\sum_{k=1}^{K}\int_{0}^{T}\left[R(X_{t}^{\eta,k})-R(Y_{t}^{0,k})\right]\,dt.

(17)

A sufficient condition for (17) to have smaller variance than the standard estimator (13) is for the trajectories to start $\eta$ close, and to stay close for times of order $1/\lambda$ , where $\lambda$ is the relaxation rate of the system to the stationary state (see Assumption 3). More precisely,

(A)

The initial distance $|X_{0}^{\eta}-Y_{0}^{0}|$ must be of order $\eta$ ;
(B)

The dynamics must remain $\eta$ close for finite times as the copies of the system evolve, i.e. $|X_{t}^{\eta}-Y_{t}^{0}|$ must be of order $\eta$ for $t\leqslant T$ .

Condition (A) amounts to finding a coupling measure which is concentrated along the diagonal in the $(x,y)$ space, so that initial conditions are $\eta$ close. We emphasize that, although $\widetilde{\mu}_{\eta}$ is by construction a $\mathrm{O}(\eta)$ perturbation of $\mu$, this is not enough to guarantee that the trajectories start $\eta$ close when the initial conditions are independent; we therefore require a coupling on the initial conditions.

We discuss in Section 3.1.1 a natural way of coupling the dynamics (15), and outline sufficient conditions for condition (B) to hold. Then, we formally construct the coupling measure on the initial conditions and discuss its properties in Section 3.1.2.

Remark 1 (Tangent dynamics).

The expression (16) of the transport coefficient can be formulated in terms of tangent dynamics [assaraf2017]. Denote by $\mathcal{T}_{t}\in\mathbb{R}^{d}$ the tangent vector, where

\mathcal{T}_{t}=\lim_{\eta\to 0}\frac{X_{t}^{\eta}-X_{t}^{0}}{\eta}.

(18)

This vector evolves according to a random ordinary differential equation, obtained by linearizing (2). Moreover, (16) can be written as

\rho=\lim_{\eta\to 0}\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}\left(R(X_{t}^{\eta})-R(Y_{t}^{0})\right)\,dt=\int_{0}^{+\infty}\mathbb{E}\left(\mathcal{T}_{t}\cdot\nabla R(X_{t}^{0})\right)\,dt.

(19)

3.1.1 Synchronous coupling

A natural way to ensure that the dynamics remain close is via synchronous coupling, which amounts to using the same Brownian motion for both processes, i.e. setting $\widetilde{W}=W$. It is known that synchronous coupling performs well in the presence of global dissipativity. Without it, however, trajectories may decouple and we cannot control the coupling distance for long times. For the transient subtraction method, we do not require long-time results, as the relaxation time of the estimator (17) is typically of order $\mathrm{O}(1/\lambda)$, with $\lambda$ the exponential convergence rate from Assumption 3. This suggests that synchronous coupling is an admissible control variate as long as the relaxation time is smaller than the decoupling time.

In order to more precisely state some results on the coupling distance, we give a sufficient condition for trajectories to decouple at most exponentially in time in Assumption 1.

Assumption 1.

There exists $B\in\mathbb{R}$ such that the drift $b\colon\mathcal{X}\to\mathbb{R}^{d}$ satisfies

\forall(x,y)\in\mathcal{X}^{2},\qquad\langle x-y,b(x)-b(y)\rangle\leqslant B|x-y|^{2}.

(20)

A sufficient condition for (20) to be satisfied is that the drift $b$ be globally Lipschitz with constant $\|b\|_{\rm Lip}$, in which case one can take $B=\|b\|_{\rm Lip}$. In some fortunate cases where $B<0$, the drift is globally dissipative. In particular, global dissipativity ensures uniform exponential decay of the coupling distance $|X_{t}^{\eta}-Y_{t}^{0}|$. We next state a standard result providing an upper bound for how fast trajectories decouple based on the estimate considered in Assumption 1.

Lemma 1.

Suppose that Assumption 1 holds. Then, almost surely,

\forall t\geqslant 0,\qquad|X_{t}^{\eta}-Y_{t}^{0}|\leqslant\mathrm{e}^{tB}|X_{0}^{\eta}-Y_{0}^{0}|.

(21)

Proof.

In order to bound the distance between the trajectories at time $t$ in terms of the initial distance, we use that the noise terms cancel under synchronous coupling ($\widetilde{W}=W$), so that, by Itô's formula and Assumption 1,

\begin{align}
d\bigl(|X_{t}^{\eta}-Y_{t}^{0}|^{2}\bigr) &= 2\langle X_{t}^{\eta}-Y_{t}^{0},dX_{t}^{\eta}-dY^{0}_{t}\rangle \tag{22}\\
&= 2\langle X_{t}^{\eta}-Y_{t}^{0},b(X_{t}^{\eta})-b(Y_{t}^{0})\rangle\,dt \tag{23}\\
&\leqslant 2B|X_{t}^{\eta}-Y_{t}^{0}|^{2}\,dt. \tag{24}
\end{align}

Grönwall’s lemma then gives the claimed bound. ∎
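The bound (21) can be checked numerically on a simple non-dissipative example. The Python sketch below uses the illustrative drift $b(x)=-x+2\sin(x)$, i.e. $b=-V'$ with the non-convex potential $V(x)=x^{2}/2+2\cos(x)$; since $b'(x)=-1+2\cos(x)\leqslant 1$, Assumption 1 holds with $B=1$. The drift, the function name `coupled_distance`, the value $\sigma=\sqrt{2}$ and the Euler–Maruyama discretization are all assumptions of this sketch. The initial condition for $Y$ is drawn from a standard Gaussian rather than from $\mu$, which is harmless here because (21) is a pathwise statement.

```python
import numpy as np

def b(x):
    # Illustrative drift -V'(x) with V(x) = x^2/2 + 2 cos(x); b'(x) <= 1, so B = 1.
    return -x + 2.0 * np.sin(x)

def coupled_distance(eta=1e-2, T=3.0, dt=1e-3, seed=0):
    """Distance |X_t - Y_t| for two copies driven by the same Brownian increments."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    y = rng.standard_normal()            # arbitrary start for Y (not exactly mu)
    x = y + eta                          # initial conditions are eta close
    dist = np.empty(n_steps + 1)
    dist[0] = abs(x - y)
    for n in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal()   # shared noise: synchronous coupling
        x, y = (x + b(x) * dt + np.sqrt(2.0) * dw,
                y + b(y) * dt + np.sqrt(2.0) * dw)
        dist[n + 1] = abs(x - y)
    times = dt * np.arange(n_steps + 1)
    return times, dist

B = 1.0
times, dist = coupled_distance()
bound = np.exp(B * times) * dist[0]      # right-hand side of (21)
print("bound respected:", bool(np.all(dist <= bound * (1 + 1e-9))))
```

For this drift the discrete difference satisfies $|X_{n+1}-Y_{n+1}|\leqslant(1+\Delta t)|X_{n}-Y_{n}|$ for the chosen time step, so the discretized trajectories obey the same exponential bound as the continuous ones.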

3.1.2 Properties of initial conditions

We consider a coupling measure $\mu_{\rm coup}(dx\,dy)$ with marginals $\widetilde{\mu}_{\eta}(dx)$ and $\mu(dy)$ . In order to ensure that the initial conditions are $\eta$ close, the coupling measure must be concentrated along the diagonal (more precisely, within $\eta$ distance from the diagonal), as illustrated in Figure 1. A natural way of achieving this is to formulate $X_{0}^{\eta}$ as a deterministic map of $Y_{0}^{0}$ , i.e. to look for $\Phi_{\eta}\colon\mathcal{X}\to\mathcal{X}$ such that $X_{0}^{\eta}=\Phi_{\eta}(Y_{0}^{0})$ , with $\Phi_{\eta}$ close to the identity function. The function $\Phi_{\eta}$ should be chosen such that

\widetilde{\mu}_{\eta}=\Phi_{\eta}\#\mu=(1+\eta S)\mu+\mathrm{O}(\eta^{2}),

(25)

where $\Phi_{\eta}\#\mu$ denotes the image measure of $\mu$ by $\Phi_{\eta}$, i.e. for any bounded measurable test function $\varphi\colon\mathcal{X}\to\mathbb{R}$,

\int_{\mathcal{X}}\varphi\,d\widetilde{\mu}_{\eta}=\int_{\mathcal{X}}\varphi\circ\Phi_{\eta}\,d\mu.

(26)

We look for a map $\Phi_{\eta}$ of the form

\Phi_{\eta}(x)=x+\eta\varphi_{1}(x),

(27)

where $\varphi_{1}$ is determined by (25). It is in fact given by a solution to the partial differential equation (PDE) (43) below, as made precise in Proposition 1. Note that (27) can be extended to a map of higher order in $\eta$; see Section 3.2.2 for a discussion of this point.
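The first-order matching (25) can be checked by quadrature in a simple case. The sketch below assumes, for illustration only, that $\mu=\mathcal{N}(0,1)$, $S(x)=x$ and $\varphi_{1}\equiv 1$, so that $\Phi_{\eta}(x)=x+\eta$; the pushforward $\Phi_{\eta}\#\mu$ is then $\mathcal{N}(\eta,1)$. It compares both sides of (26) against the linearized measure $(1+\eta S)\mu$ using Gauss–Hermite quadrature from NumPy; the helper name `first_order_gap` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite rule for the weight exp(-x^2 / 2), normalised so that it
# integrates against the standard Gaussian mu = N(0, 1).
nodes, weights = hermegauss(40)
weights = weights / weights.sum()

def first_order_gap(phi, eta):
    """| int phi(Phi_eta) dmu - int phi (1 + eta S) dmu | for Phi_eta(x) = x + eta, S(x) = x."""
    pushforward = np.sum(weights * phi(nodes + eta))
    linearised = np.sum(weights * phi(nodes) * (1.0 + eta * nodes))
    return abs(pushforward - linearised)

for eta in (0.2, 0.1, 0.05):
    print(eta, first_order_gap(lambda x: x**2, eta), first_order_gap(np.sin, eta))
```

For $\varphi(x)=x^{2}$ the gap equals $\eta^{2}$ exactly, since $\int(x+\eta)^{2}\,d\mu=1+\eta^{2}$ while $\int x^{2}(1+\eta x)\,d\mu=1$; this is consistent with the $\mathrm{O}(\eta^{2})$ remainder in (25).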

The transient subtraction technique then amounts to evolving synchronously coupled equilibrium dynamics starting from initial conditions which are deterministically related:

\begin{cases}\begin{aligned} dY_{t}^{0}&=b(Y_{t}^{0})\,dt+\sigma\,dW_{t},\qquad Y_{0}^{0}\sim\mu,\\ dX_{t}^{\eta}&=b(X_{t}^{\eta})\,dt+\sigma\,dW_{t},\qquad X_{0}^{\eta}=\Phi_{\eta}(Y_{0}^{0}).\end{aligned}\end{cases}

(28)

We next perform error analysis on the transient subtraction technique estimator (17) for $(X_{t}^{\eta})_{t\geqslant 0}$ and $(Y_{t}^{0})_{t\geqslant 0}$ given by (28).
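Before turning to the analysis, the following Python sketch illustrates the estimator (17) with the dynamics (28) on a toy model which is not one of the systems studied in this work: the Ornstein–Uhlenbeck process $dX_t=-X_t\,dt+\sqrt{2}\,dW_t$ with $\mu=\mathcal{N}(0,1)$, $R(x)=S(x)=x$ and the shift map $\Phi_{\eta}(x)=x+\eta$, for which $\rho=1$. For comparison, a third copy starts from the same perturbed initial condition but is driven by independent noise. The function name `transient_subtraction_ou` and the Euler–Maruyama discretization are assumptions of this sketch.

```python
import numpy as np

def transient_subtraction_ou(eta=0.05, T=5.0, dt=0.01, K=2_000, seed=0):
    """Per-trajectory contributions to (17), with and without synchronous coupling."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    y = rng.standard_normal(K)           # Y_0 ~ mu = N(0, 1)
    x = y + eta                          # X_0 = Phi_eta(Y_0), illustrative shift map
    x_ind = x.copy()                     # same start, but independent noise below
    sub = np.zeros(K)
    ind = np.zeros(K)
    for _ in range(n_steps):
        sub += dt * (x - y)              # integrand R(X_t) - R(Y_t), coupled copies
        ind += dt * (x_ind - y)          # same integrand, uncoupled copies
        dw = np.sqrt(dt) * rng.standard_normal(K)
        dw_ind = np.sqrt(dt) * rng.standard_normal(K)
        y += -y * dt + np.sqrt(2.0) * dw
        x += -x * dt + np.sqrt(2.0) * dw             # same increments as Y
        x_ind += -x_ind * dt + np.sqrt(2.0) * dw_ind
    return sub / eta, ind / eta

sub_vals, ind_vals = transient_subtraction_ou()
print("coupled:   mean", sub_vals.mean(), "variance", sub_vals.var())
print("uncoupled: mean", ind_vals.mean(), "variance", ind_vals.var())
```

In this linear example the difference $X_{t}^{\eta}-Y_{t}^{0}=\eta\,\mathrm{e}^{-t}$ is deterministic under synchronous coupling, so the coupled estimator has essentially zero variance; this is a degenerate best case, and for nonlinear drifts one would only expect a reduction rather than a complete cancellation.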

Figure 1: Illustration of the coupling measure on initial conditions.

3.2 Numerical analysis of the transient subtraction method

In this section, we perform error analysis on the transient subtraction estimator (17) realized with the dynamics (28). We start by making precise the functional setting and stating some estimates in Section 3.2.1. We then make precise in Section 3.2.2 the bounds on the bias, and finally discuss its variance in Section 3.2.3.

3.2.1 Functional estimates

Consider a family of Lyapunov functions $(\mathcal{K}_{n})_{n\in\mathbb{N}}$ with $\mathcal{K}_{n}\colon\mathcal{X}\to[1,+\infty)$ such that

\forall n\in\mathbb{N},\qquad\mathcal{K}_{n}\leqslant\mathcal{K}_{n+1}.

(29)

The associated weighted $B^{\infty}$ spaces are

B_{n}^{\infty}=\left\{\varphi\,\rm{measurable}\;\middle|\;\|\varphi\|_{B_{n}^{% \infty}}:=\sup_{x\in\mathcal{X}}\left|\frac{\varphi(x)}{\mathcal{K}_{n}(x)}% \right|<+\infty\right\}.

(30)

We next introduce the space $\mathscr{S}$ of smooth functions $\varphi$ belonging to the space $B^{\infty}_{n}$ for some $n$ , and whose derivatives also belong to $B^{\infty}_{m}$ for some $m$ :

\mathscr{S}=\left\{\varphi\in C^{\infty}(\mathcal{X})\;\middle|\;\forall k\in% \mathbb{N}^{d},\;\exists n\in\mathbb{N},\;\partial^{k}\varphi\in B_{n}^{\infty% }\right\}.

(31)

We finally define the subspace $\mathscr{S}_{0}$ of functions in $\mathscr{S}$ with average 0 with respect to $\mu$ .

We make the following assumptions on the Lyapunov functions.

Assumption 2 (Lyapunov estimates).

There exist $n\in\mathbb{N}$ and $C_{n}\in\mathbb{R}^{+}$ such that

|x|\leqslant C_{n}\mathcal{K}_{n}(x).

(32)

Furthermore, for any $n\in\mathbb{N}$ ,

\lVert\mathcal{K}_{n}\rVert_{L^{1}(\mu)}<+\infty.

(33)

Moreover, we assume that the Lyapunov functions are stable by products: for any $n,n^{\prime}\in\mathbb{N}$ , there exist $m\in\mathbb{N}$ and $C_{n,n^{\prime}}\in\mathbb{R}^{+}$ such that

\left\lparen\mathcal{K}_{n}\mathcal{K}_{n^{\prime}}\right\rparen(x)\leqslant C_{n,n^{\prime}}\mathcal{K}_{m}(x).

(34)

We also assume stability by compositions: for any $n,n^{\prime}\in\mathbb{N}$ and $\alpha^{*}\in\mathbb{R}^{+}$ , there exist $m\in\mathbb{N}$ and $C_{n,n^{\prime},\alpha^{*}}\in\mathbb{R}^{+}$ such that

\forall\alpha\in[0,\alpha^{*}],\qquad\mathcal{K}_{n}(\alpha\mathcal{K}_{n^{% \prime}}(x))\leqslant C_{n,n^{\prime},\alpha^{*}}\mathcal{K}_{m}(x).

(35)

Lastly, we assume that $\mathcal{K}_{n}$ is nondecreasing:

\left\lparen\forall i=1,\dotsc,d,\quad|y_{i}|\leqslant|z_{i}|\right\rparen% \implies\mathcal{K}_{n}(y)\leqslant\mathcal{K}_{n}(z).

(36)

A useful corollary of Assumption 2, which we will use in our estimates, is the following: for any $f\in B^{\infty}_{n}$ and $g=(g_{1},\dotsc,g_{d})$ with $g_{i}\in B^{\infty}_{n^{\prime}}$ ,

|f\circ g|(x)\leqslant\|f\|_{B_{n}^{\infty}}\,\mathcal{K}_{n}\circ g(x)\leqslant\|f\|_{B_{n}^{\infty}}\,\mathcal{K}_{n}\left\lparen\|g\|_{B^{\infty}_{n^{\prime}}}\mathcal{K}_{n^{\prime}}(x)\right\rparen\leqslant\lVert f\rVert_{B^{\infty}_{n}}C_{n,n^{\prime},\|g\|_{B^{\infty}_{n^{\prime}}}}\mathcal{K}_{m}(x),

(37)

with $m$ depending on $n$ and $n^{\prime}$ .

A typical choice for $\mathcal{K}_{n}$ is the family of polynomial Lyapunov functions $\mathcal{K}_{n}(x)=1+|x|^{n}$. This is a standard choice for Langevin dynamics; see [mattingly2002, talay2002]. This choice satisfies Assumption 2 when $\mu$ has moments of all orders, which is a mild requirement.
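For the polynomial choice $\mathcal{K}_{n}(x)=1+|x|^{n}$, the product stability (34) can be checked by hand: since $(1-r^{n})(1-r^{n^{\prime}})\geqslant 0$ for $r\geqslant 0$, one has $\mathcal{K}_{n}\mathcal{K}_{n^{\prime}}\leqslant 2\mathcal{K}_{n+n^{\prime}}$. The short sketch below verifies this numerically on a grid; the helper name `lyapunov` and the grid are illustrative assumptions.

```python
import numpy as np

def lyapunov(n, r):
    """Polynomial Lyapunov function K_n evaluated at |x| = r."""
    return 1.0 + r**n

r = np.linspace(0.0, 50.0, 5001)          # values of |x| on which the bound is tested
for n, n_prime in [(1, 2), (2, 3), (4, 4)]:
    ratio = lyapunov(n, r) * lyapunov(n_prime, r) / lyapunov(n + n_prime, r)
    print(n, n_prime, "largest ratio on the grid:", ratio.max())
```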

We also make an assumption on the convergence of the semigroup $\mathrm{e}^{t\mathcal{L}}$ in weighted $B^{\infty}$ spaces. To this end, we introduce the space $B^{\infty}_{n,0}$ of functions in $B^{\infty}_{n}$ with average 0 with respect to $\mu$:

B_{n,0}^{\infty}=\left\{\varphi\in B^{\infty}_{n}\;\middle|\;\int_{\mathcal{X}% }\varphi\,d\mu=0\right\}.

(38)

Assumption 3 (Decay estimates on semigroup operator).

For any $n\in\mathbb{N}$ , there exist $L_{n}\in\mathbb{R}^{+}$ and $\lambda_{n}>0$ such that

\forall\varphi\in B_{n,0}^{\infty},\qquad\|\mathrm{e}^{t\mathcal{L}}\varphi\|_% {B^{\infty}_{n}}\leqslant L_{n}\mathrm{e}^{-\lambda_{n}t}\lVert\varphi\rVert_{% B^{\infty}_{n}}.

(39)

As a direct corollary of Assumption 3, the operator $\mathcal{L}$ is invertible on $B_{n,0}^{\infty}$ , with

\mathcal{L}^{-1}=-\int_{0}^{+\infty}\mathrm{e}^{t\mathcal{L}}\,dt.

(40)

Moreover, the following bound holds

\lVert\mathcal{L}^{-1}\rVert_{B^{\infty}_{n}}\leqslant\frac{L_{n}}{\lambda_{n}}.

(41)

We refer for instance to [lelievre2016, Section 2] for a discussion on sufficient conditions for Assumption 3 to hold (based on [reybellet2006, hairer2011]), and for the proof of (40).
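The identity (40) can be illustrated on a finite-state analogue, which is only a sketch under our own assumptions: the generator is replaced by a hypothetical symmetric $3\times 3$ rate matrix `Q` (rows summing to zero, uniform invariant measure), and the time integral is truncated and approximated by the trapezoidal rule.

```python
import numpy as np

# Hypothetical symmetric generator of a 3-state Markov jump process.
Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -3.0, 2.0],
              [1.0, 2.0, -3.0]])
phi = np.array([1.0, -2.0, 1.0])          # average 0 with respect to the uniform measure

eigval, eigvec = np.linalg.eigh(Q)

def semigroup(t):
    """Matrix exponential e^{tQ} from the spectral decomposition of the symmetric Q."""
    return (eigvec * np.exp(t * eigval)) @ eigvec.T

# psi approximates -int_0^infty e^{tQ} phi dt (integral truncated at t = 20).
ts = np.linspace(0.0, 20.0, 20001)
vals = np.array([semigroup(t) @ phi for t in ts])
h = ts[1] - ts[0]
psi = -(vals[:-1] + vals[1:]).sum(axis=0) * h / 2.0

print("Q psi =", Q @ psi)
print("phi   =", phi)
```

Up to quadrature error, `Q @ psi` reproduces `phi`, which is the discrete counterpart of $\mathcal{L}\mathcal{L}^{-1}\varphi=\varphi$ on mean-zero functions.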

3.2.2 Analysis of the bias

There are several sources of bias arising from the estimator (17), such as the time truncation and time discretization bias when considering numerical schemes to integrate the dynamics. Quantifying such biases is standard practice for estimators of this form. Additionally, there is a bias arising from the finiteness of $\eta$ , which is the main result of this section. This is made precise in Corollary 1, which builds upon the estimates on the coupling measures provided by Proposition 1 below.

To state the result, we denote by $\mathcal{A}^{*}$ the adjoint of a closed operator $\mathcal{A}$ on $L^{2}(\mu)$ : for any test functions $\varphi,\phi\in C^{\infty}$ with compact support,

\int_{\mathcal{X}}(\mathcal{A}\varphi)\phi\,d\mu=\int_{\mathcal{X}}\varphi(% \mathcal{A}^{*}\phi)\,d\mu.

(42)

Proposition 1 (Finite $\eta$ bias).

Suppose that Assumption 2 holds true and that, for $S\in\mathscr{S}_{0}$ , there exist solutions $\varphi_{1}=(\varphi_{1,x_{1}},\dotsc,\varphi_{1,x_{d}})\in(B^{\infty}_{n})^{d}$ and $\varphi_{2}=(\varphi_{2,x_{1}},\dotsc,\varphi_{2,x_{d}})\in(B^{\infty}_{n})^{d}$ for some $n\in\mathbb{N}$ to the equations

\nabla^{*}\varphi_{1}=\sum_{i=1}^{d}\partial_{x_{i}}^{*}\varphi_{1,x_{i}}=S,

(43)

and

\nabla^{*}\varphi_{2}=-\frac{1}{2}\sum_{i,j=1}^{d}\partial_{x_{i}}^{*}\partial% _{x_{j}}^{*}(\varphi_{1,x_{i}}\varphi_{1,x_{j}})=-\frac{1}{2}(\nabla^{*})^{2}% \colon\varphi_{1}\otimes\varphi_{1}.

(44)

Fix $\eta_{*}>0$ and $f\in\mathscr{S}$. Then, there exists $\mathcal{C}_{f,\eta_{*}}\in\mathbb{R}^{+}$ (which depends on $f$ and $\eta_{*}$) such that, for any $|\eta|\leqslant\eta_{*}$,

\qquad\left|\int_{\mathcal{X}}f\circ\Phi^{\alpha}_{\eta}\,d\mu-\int_{\mathcal{% X}}f\,d\mu-\eta\int_{\mathcal{X}}fS\,d\mu\right|\leqslant\eta^{\alpha+1}% \mathcal{C}_{f,\eta_{*}},

(45)

with $\alpha=1$ or $\alpha=2$ , and

\begin{cases}\begin{aligned} &\Phi_{\eta}^{1}(x)=x+\eta\varphi_{1}(x),\\ &\Phi_{\eta}^{2}(x)=x+\eta\varphi_{1}(x)+\eta^{2}\varphi_{2}(x).\end{aligned}% \end{cases}

(46)

Remark 2.

The smoothness condition on $f$ can be weakened, as it would be enough to have derivatives of $f$ up to order 3 in $B^{\infty}_{m}$ . In order to simplify the presentation of the result, however, we suppose $f\in\mathscr{S}$ here.

This result states that the finite $\eta$ bias in the linear response is of order $\mathrm{O}(\eta^{\alpha})$ for a map $\Phi_{\eta}$ which includes well-chosen terms up to order $\mathrm{O}(\eta^{\alpha})$ . It is of course possible to construct higher-order corrections in order to further decrease the bias. The associated PDEs for the corresponding $\varphi_{i}$ terms, however, become increasingly cumbersome to solve, rendering it an impractical approach. In any case, a second-order map leads to an estimator with $\mathrm{O}(\eta^{2})$ bias, which is sufficiently small in general.
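The scaling in (45) can be checked numerically in a simple setting, chosen here for illustration only: $\mu=\mathcal{N}(0,\beta^{-1})$ in one dimension, with $S(p)=\beta p$, $\varphi_{1}=1$ and $\varphi_{2}=-S/2$, which solve (43) and (44) for this measure. The test function `f` and the use of Gauss–Hermite quadrature are our assumptions. The sketch estimates the exponent of $\eta$ in the left-hand side of (45) for $\alpha=1,2$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

beta = 1.0
nodes, weights = hermegauss(60)
p = nodes / np.sqrt(beta)                 # quadrature nodes for mu = N(0, 1/beta)
w = weights / np.sqrt(2.0 * np.pi)        # normalized weights (they sum to 1)

f = lambda x: np.exp(0.5 * x)             # hypothetical smooth test function
S = beta * p                              # conjugate response at the nodes

def lhs_of_45(eta, order):
    """Left-hand side of (45) for the map of the given order (1 or 2)."""
    phi = p + eta                         # first-order map, phi_1 = 1
    if order == 2:
        phi = phi - 0.5 * eta**2 * S      # second-order correction, phi_2 = -S / 2
    return abs(np.sum(w * f(phi)) - np.sum(w * f(p)) - eta * np.sum(w * f(p) * S))

slopes = {}
for order in (1, 2):
    slopes[order] = np.log2(lhs_of_45(0.1, order) / lhs_of_45(0.05, order))
    print("order", order, "observed exponent of eta:", slopes[order])
```

The observed exponents should be close to $\alpha+1$, so that the bias after division by $\eta$ is of order $\eta^{\alpha}$.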

Remark 3 (Well-posedness of PDEs).

Although (43) might look difficult to solve, one can show that it admits gradient solutions of the form $\varphi_{1}=\nabla\psi$ under some conditions on $\mu$ . Indeed, (43) can then be written as

\nabla^{*}\nabla\psi=S,

(47)

which has a unique solution when $\nabla^{*}\nabla$ has a spectral gap as an operator on $L^{2}(\mu)$ (which is implied by $\mu$ satisfying a Poincaré inequality). Note that the solutions $\varphi_{1}$ and $\varphi_{2}$ are defined up to an element of the kernel of $\nabla^{*}$, i.e. if $\nabla\psi$ is a solution to (43), then $\varphi_{1}=\nabla\psi+g$ with $\nabla^{*}g=0$ is also an admissible solution.

Proof of Proposition 1.

It suffices to prove the result for $\alpha=2$ , from which the result for $\alpha=1$ can be trivially deduced. By a Taylor expansion of $f(\Phi_{\eta}(x))$ ,

\begin{aligned} f(\Phi_{\eta}(x))=f(x)&+\eta\nabla f(x)^{\mathsf{T}}(\varphi_{1}(x)+\eta\varphi_{2}(x))\\ &+\frac{\eta^{2}}{2}(\varphi_{1}(x)+\eta\varphi_{2}(x))^{\mathsf{T}}\nabla^{2}f(x)(\varphi_{1}(x)+\eta\varphi_{2}(x))\\ &+\frac{\eta^{3}}{6}\nabla^{3}f\left\lparen\Theta_{\eta}(x)\right\rparen\cdot(\varphi_{1}(x)+\eta\varphi_{2}(x))^{\otimes 3},\end{aligned}

(48)–(50)

where $\nabla^{3}f$ denotes the third-order derivative tensor (and $\nabla^{2}f$ denotes the Hessian), and $\Theta_{\eta}(x)$ interpolates between $x$ and $\Phi_{\eta}(x)$ :

\Theta_{\eta}(x)=(1-\theta_{\eta}(x))x+\theta_{\eta}(x)\Phi_{\eta}(x),\qquad% \theta_{\eta}(x)\in[0,1].

(51)

Integrating the Taylor expansion above yields

\displaystyle\int_{\mathcal{X}}f\circ\Phi_{\eta}\,d\mu

\displaystyle=\int_{\mathcal{X}}f\,d\mu+\eta\int_{\mathcal{X}}\nabla f^{% \mathsf{T}}\varphi_{1}\,d\mu+\eta^{2}\int_{\mathcal{X}}\left\lparen\varphi_{2}% ^{\mathsf{T}}\nabla f+\frac{1}{2}\varphi_{1}^{\mathsf{T}}(\nabla^{2}f)\varphi_% {1}\right\rparen\,d\mu+\eta^{3}\mathcal{R}_{3,\eta},

(52)

with $\mathcal{R}_{3,\eta}$ given by

\mathcal{R}_{3,\eta}=\int_{\mathcal{X}}\left[\frac{1}{6}\nabla^{3}f\left(% \Theta_{\eta}\right)\cdot(\varphi_{1}+\eta\varphi_{2})^{\otimes 3}+\varphi_{1}% ^{\mathsf{T}}(\nabla^{2}f)\varphi_{2}+\frac{\eta}{2}\varphi_{2}^{\mathsf{T}}(% \nabla^{2}f)\varphi_{2}\right]\,d\mu.

(53)

We next show that the remainder term $\mathcal{R}_{3,\eta}$ is uniformly bounded. We start with the first term of the integrand on the right-hand side of (53). Since $\varphi_{1},\varphi_{2}$ have components in $B^{\infty}_{n}$ for some $n\in\mathbb{N}$, so do the components of $\Phi_{\eta}$ (the identity has components in $B^{\infty}_{n}$ upon possibly increasing $n$, in view of (32)), and hence also those of $\Theta_{\eta}$. Therefore, since $f\in\mathscr{S}$ and in view of (35) and (37), there exist $m,n^{\prime}\in\mathbb{N}$ and $C\in\mathbb{R}^{+}$ such that, for all $x\in\mathcal{X}$ and all $\eta\in[-\eta_{*},\eta_{*}]$,

\displaystyle\left\lvert\nabla^{3}f\left\lparen\Theta_{\eta}(x)\right\rparen% \right\rvert\leqslant C\smash{\sum_{i,j,k=1}^{d}}\left\lVert\partial_{x_{i},x_% {j},x_{k}}^{3}f\right\rVert_{B_{m}^{\infty}}\mathcal{K}_{n^{\prime}}(x),

(54)

so that

\begin{aligned} &\left\lvert\int_{\mathcal{X}}\nabla^{3}f\left\lparen\Theta_{\eta}(x)\right\rparen\cdot(\varphi_{1}(x)+\eta\varphi_{2}(x))^{\otimes 3}\,d\mu\right\rvert\\ &\qquad\leqslant C\sum_{i,j,k=1}^{d}\left\lVert\partial_{x_{i},x_{j},x_{k}}^{3}f\right\rVert_{B_{m}^{\infty}}\left\lparen\lVert\varphi_{1}\rVert_{B_{n}^{\infty}}+|\eta|\lVert\varphi_{2}\rVert_{B_{n}^{\infty}}\right\rparen^{3}\int_{\mathcal{X}}\mathcal{K}_{n}^{3}\mathcal{K}_{n^{\prime}}\,d\mu,\end{aligned}

(55)–(56)

where (35) ensures uniformity in $\eta$ for the bounds above. By (34), there exist $C\in\mathbb{R}^{+}$ and $\ell\in\mathbb{N}$ such that $\mathcal{K}_{n}^{3}\mathcal{K}_{n^{\prime}}\leqslant C\mathcal{K}_{\ell}$. Since $\mathcal{K}_{\ell}\in L^{1}(\mu)$ by (33), we conclude that (56) is uniformly bounded for $|\eta|\leqslant\eta_{*}$. A similar computation shows that the remaining two terms of the integrand in (53) are also uniformly bounded.

To prove (45), it remains to show that the first and second-order terms in $\eta$ in (45) and (52) coincide, which is by design of $\varphi_{1}$ and $\varphi_{2}$ . Indeed, by taking $L^{2}(\mu)$ -adjoints and in view of condition (43) on $\varphi_{1}$ ,

\int_{\mathcal{X}}\nabla f^{\mathsf{T}}\varphi_{1}\,d\mu=\int_{\mathcal{X}}f% \nabla^{*}\varphi_{1}\,d\mu=\int_{\mathcal{X}}fS\,d\mu.

(57)

Similarly, applying $L^{2}(\mu)$ -adjoints to the second-order $\eta$ terms in (52) yields

\int_{\mathcal{X}}\left\lparen\varphi_{2}^{\mathsf{T}}\nabla f+\frac{1}{2}\varphi_{1}^{\mathsf{T}}(\nabla^{2}f)\varphi_{1}\right\rparen\,d\mu=\int_{\mathcal{X}}f\left\lparen\nabla^{*}\varphi_{2}+\frac{1}{2}(\nabla^{*})^{2}\colon\varphi_{1}\otimes\varphi_{1}\right\rparen\,d\mu,

(58)

which vanishes when $\varphi_{2}$ satisfies (44). This allows us to conclude the proof. ∎

The above result allows us to quantify the bias, as made precise in the corollary below. Before stating it, we require an additional assumption.

Assumption 4.

The generator $\mathcal{L}$ is invertible on $\mathscr{S}_{0}$ . In other words, for any $\phi\in\mathscr{S}_{0}$ , there exists a unique solution $\Psi\in\mathscr{S}_{0}$ to the Poisson equation $-\mathcal{L}\Psi=\phi$ .

Assumption 4 can be shown to hold for overdamped and underdamped Langevin dynamics under certain conditions on the potential $V$ [talay2002, kopec2014, kopec2015], and is a standard result in the literature.

Applying Proposition 1 together with the decay estimates from Assumption 3, as well as Assumption 4, to the transient subtraction estimator (17) yields the following result on the bias of the estimator (17).

Corollary 1.

Under the assumptions of Proposition 1 as well as Assumptions 3 and 4, there exists $C\in\mathbb{R}^{+}$ such that, for any $T>0$ and $\eta\in[-\eta_{*},\eta_{*}]\setminus\{0\}$ ,

\left|\mathbb{E}\left\lparen\widehat{\rho}^{T,K,\eta,\alpha}_{\rm sub}\right% \rparen-\rho\right|\leqslant C\left\lparen\eta^{\alpha}+\frac{\mathrm{e}^{-% \lambda T}}{\eta}\right\rparen,

(59)

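where $\widehat{\rho}^{T,K,\eta,\alpha}_{\rm sub}$ is defined as (17) with the dynamics (28) and $\Phi_{\eta}^{\alpha}$ given by (46), and $\lambda=\lambda_{n}$ is the decay rate of Assumption 3 for an index $n$ such that $R\in B^{\infty}_{n}$.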

This result, obtained as a direct consequence of Proposition 1, shows that the bias of the transient (subtraction) technique has two distinct contributions: an exponentially decaying term arising from the time truncation, as in the Green–Kubo method (here magnified by a factor $\eta^{-1}$); and a term of order $\eta^{\alpha}$ due to the finiteness of $\eta$, corresponding to deviations from the linear regime.
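The two contributions in (59) pull in opposite directions as $\eta$ varies. As a rough guide, and assuming $\eta>0$ and a known decay rate (the names `bias_bound` and `eta_minimizing_bound` and the numerical values below are hypothetical), the bound $\eta^{\alpha}+\mathrm{e}^{-\lambda T}/\eta$ is minimized at $\eta=(\mathrm{e}^{-\lambda T}/\alpha)^{1/(\alpha+1)}$, which the following sketch compares with a grid search.

```python
import numpy as np

def bias_bound(eta, T, alpha, lam, C=1.0):
    """Right-hand side of (59) for eta > 0."""
    return C * (eta**alpha + np.exp(-lam * T) / eta)

def eta_minimizing_bound(T, alpha, lam):
    """Stationary point of eta -> eta**alpha + exp(-lam * T) / eta on eta > 0."""
    return (np.exp(-lam * T) / alpha) ** (1.0 / (alpha + 1))

T, alpha, lam = 5.0, 2, 1.0               # illustrative values only
eta_star = eta_minimizing_bound(T, alpha, lam)
grid = np.logspace(-4, 0, 4001)
eta_grid = grid[np.argmin(bias_bound(grid, T, alpha, lam))]
print("closed form:", eta_star, "grid search:", eta_grid)
```

This only optimizes the bias bound; in practice the constant $C$ and the rate $\lambda$ are unknown, and the variance must also be taken into account.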

Remark 4.

Although the constant $C$ does not depend on $T$ (which suggests taking $T$ as large as possible to minimize the truncation bias), the variance depends on $T$ (see Proposition 2). This calls for balancing the two contributions in order to obtain the smallest overall error.

Proof of Corollary 1.

Fix $\eta\in[-\eta_{*},\eta_{*}]\setminus\{0\}$ . Since $R$ has average 0 with respect to $\mu$ , it holds that

\begin{aligned} \left\lvert\mathbb{E}\left\lparen\widehat{\rho}^{T,K,\eta,\alpha}_{\rm sub}\right\rparen-\rho\right\rvert&=\frac{1}{\eta}\left\lvert\int_{0}^{+\infty}\mathbb{E}\bigl(R(X^{\eta}_{t})-R(Y^{0}_{t})\bigr)\,dt-\int_{T}^{+\infty}\mathbb{E}\bigl(R(X^{\eta}_{t})-R(Y^{0}_{t})\bigr)\,dt-\eta\rho\right\rvert\\ &\leqslant\frac{1}{\eta}\left\lvert\int_{0}^{+\infty}\mathbb{E}\bigl(R(X^{\eta}_{t})\bigr)\,dt-\eta\rho\right\rvert+\frac{1}{\eta}\left\lvert\int_{T}^{+\infty}\mathbb{E}\bigl(R(X^{\eta}_{t})\bigr)\,dt\right\rvert.\end{aligned}

(60)–(61)

We first consider the second term in (61). By the semigroup definition of the expectation and the fact that $\widetilde{\mu}_{\eta}=\Phi_{\eta}\#\mu$ , it holds that

\displaystyle\left\lvert\int_{T}^{+\infty}\mathbb{E}\bigl{(}R(X^{\eta}_{t})% \bigr{)}\,dt\right\rvert

\displaystyle=\left\lvert\int_{T}^{+\infty}\int_{\mathcal{X}}\bigl{(}\mathrm{e% }^{t\mathcal{L}}R\bigr{)}\circ\Phi_{\eta}\,d\mu\,dt\right\rvert.

(62)

To bound the above quantity, we apply the semigroup decay estimate (39) and use (37):

\begin{aligned} \left\lvert\int_{T}^{+\infty}\mathbb{E}\bigl(R(X^{\eta}_{t})\bigr)\,dt\right\rvert&\leqslant\int_{T}^{+\infty}\int_{\mathcal{X}}\left\lvert\bigl(\mathrm{e}^{t\mathcal{L}}R\bigr)\circ\Phi_{\eta}\right\rvert\,d\mu\,dt\\ &\leqslant\int_{T}^{+\infty}\int_{\mathcal{X}}\left\lVert\mathrm{e}^{t\mathcal{L}}R\right\rVert_{B^{\infty}_{n}}\mathcal{K}_{n}\circ\Phi_{\eta}\,d\mu\,dt\\ &\leqslant\int_{T}^{+\infty}\left\lVert\mathrm{e}^{t\mathcal{L}}R\right\rVert_{B^{\infty}_{n}}\int_{\mathcal{X}}\mathcal{K}_{n}\left\lparen\lVert\Phi_{\eta}\rVert_{B^{\infty}_{n^{\prime}}}\mathcal{K}_{n^{\prime}}\right\rparen\,d\mu\,dt\\ &\leqslant\lVert R\rVert_{B^{\infty}_{n}}\int_{T}^{+\infty}L_{n}C_{n,n^{\prime},\eta_{*}}\mathrm{e}^{-\lambda_{n}t}\left\lparen\int_{\mathcal{X}}\mathcal{K}_{m}\,d\mu\right\rparen\,dt\\ &\leqslant\widetilde{C}_{m,n,n^{\prime},\eta_{*}}\int_{T}^{+\infty}\mathrm{e}^{-\lambda_{n}t}\,dt=\frac{\widetilde{C}_{m,n,n^{\prime},\eta_{*}}\mathrm{e}^{-\lambda_{n}T}}{\lambda_{n}}.\end{aligned}

(63)–(67)

We now consider the first term on the right-hand side of (61). Once again applying the semigroup definition of the expectation as well as the operator identity (40), it holds that

\displaystyle\int_{0}^{+\infty}\mathbb{E}\bigl{(}R(X^{\eta}_{t})\bigr{)}\,dt=% \int_{\mathcal{X}}\int_{0}^{+\infty}\bigl{(}\mathrm{e}^{t\mathcal{L}}R\bigr{)}% \circ\Phi_{\eta}\,dt\,d\mu=\int_{\mathcal{X}}\bigl{(}-\mathcal{L}^{-1}R\bigr{)% }\circ\Phi_{\eta}\,d\mu.

(68)

Writing the expectation in terms of the semigroup, and in view of the operator identity (40), we write the Green–Kubo formula (4) as

\displaystyle\rho=\int_{0}^{+\infty}\mathbb{E}_{\mu}\bigl{(}R(Y_{t}^{0})S(Y_{0% }^{0})\bigr{)}\,dt=\int_{\mathcal{X}}(-\mathcal{L}^{-1}R)S\,d\mu.

(69)

Thus, by Proposition 1 with $f=-\mathcal{L}^{-1}R\in\mathscr{S}_{0}$ , it follows that

\left\lvert\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}\bigl{(}R(X^{\eta}_{t})% \bigr{)}\,dt-\rho\right\rvert\leqslant\eta^{\alpha}\mathcal{C}_{-\mathcal{L}^{% -1}R,\eta_{*}}.

(70)

This allows us to obtain the desired result. ∎

3.2.3 Analysis of the variance

We state in this section some results on the scaling of the variance of the estimator (17) used with the dynamics (28).

Proposition 2 (Variance of transient subtraction estimator).

Suppose that Assumption 1 holds and that $R$ is globally Lipschitz with Lipschitz constant $\|R\|_{\rm Lip}$ . Then, for any $T>0$ ,

\operatorname{Var}\left\lparen\widehat{\rho}^{T,K,\eta,\alpha}_{\rm sub}\right% \rparen\leqslant\frac{\|R\|_{\rm Lip}^{2}}{K}\frac{\mathbb{E}\left[|X_{0}^{% \eta}-Y_{0}^{0}|^{2}\right]}{\eta^{2}}\left\lparen\int_{0}^{T}\mathrm{e}^{tB}% \,dt\right\rparen^{2}.

(71)

This result shows that the bound on the variance grows at most exponentially fast in time, which is the case when $B>0$. For dissipative drifts, i.e. $B<0$, the time-dependent factor in (71) is bounded by $B^{-2}$ uniformly in $T$. Lastly, for $B=0$ this factor equals $T^{2}$, so that the bound grows quadratically in $T$.
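The dependence on $T$ in (71) enters only through the factor $\bigl(\int_{0}^{T}\mathrm{e}^{tB}\,dt\bigr)^{2}$, which has a closed form. The sketch below evaluates it in the three regimes; the function name `time_factor` is a hypothetical helper.

```python
import numpy as np

def time_factor(B, T):
    """Value of (int_0^T exp(t * B) dt) ** 2, the T-dependent factor in (71)."""
    if B == 0.0:
        return T**2                        # the integrand is identically 1
    return (np.expm1(B * T) / B) ** 2      # ((exp(B * T) - 1) / B) ** 2

for B in (-2.0, 0.0, 0.5):
    print("B =", B, [time_factor(B, T) for T in (1.0, 5.0, 10.0)])
```

For $B<0$ the factor saturates below $B^{-2}$, for $B=0$ it equals $T^{2}$, and for $B>0$ it grows like $\mathrm{e}^{2BT}/B^{2}$.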

Note that the variance is uniformly bounded in $\eta$ . In particular, since the functions $\varphi_{1},\varphi_{2}$ (defined in (43) and (44), respectively, and assumed to be in $B^{\infty}_{n}$ for some $n\in\mathbb{N}$ ) belong to $L^{2}(\mu)$ by (33)–(34), it holds that

\frac{\mathbb{E}\left[\lvert X_{0}^{\eta}-Y_{0}^{0}\rvert^{2}\right]}{\eta^{2}% }=\begin{cases}\lVert\varphi_{1}\rVert^{2}_{L^{2}(\mu)}&\text{if }\alpha=1,\\ \lVert\varphi_{1}+\eta\varphi_{2}\rVert^{2}_{L^{2}(\mu)}&\text{if }\alpha=2.% \end{cases}

(72)

Plugging (72) into (71) immediately implies a bound on the variance uniform in the perturbation parameter $|\eta|\leqslant\eta_{*}$ .

For the result stated in Proposition 2, we do not consider the asymptotic variance as for the transient estimator in Section 2.3, as we cannot observe the $T\to+\infty$ limit due to the couplings we consider. As discussed in Section 3.1.1, the dynamics will decouple at large times, leading to a substantial increase in variance. We thus need to truncate the integration time $T$.

Remark 5 (Comparison with transient method).

Variance reduction is obtainable for synchronous coupling even without strong conditions on the drift of (2). In particular, when we have no dissipativity (i.e. $B>0$), the subtraction technique is better than the usual transient method discussed in Section 2.3 provided that $\mathrm{e}^{2BT}\ll T/\eta^{2}$, which is roughly $T\ll-\log(\eta)/B$. As one would typically consider $\eta\ll 1$, this suggests that the subtraction technique should be preferred.

Proof of Proposition 2.

It suffices to consider the estimator (17) for $K=1$ , since $\operatorname{Var}(\widehat{\rho}^{T,K,\eta,\alpha}_{\rm sub})=K^{-1}% \operatorname{Var}(\widehat{\rho}^{T,1,\eta,\alpha}_{\rm sub})$ . To simplify the notation, we write $\widehat{\rho}^{T,\eta,\alpha}_{\rm sub}$ instead of $\widehat{\rho}^{T,1,\eta,\alpha}_{\rm sub}$ :

\widehat{\rho}^{T,\eta,\alpha}_{\rm sub}=\frac{1}{\eta}\int_{0}^{T}(R(X_{t}^{% \eta})-R(Y_{t}^{0}))\,dt.

(73)

Since $R$ is Lipschitz, and using Lemma 1 to bound the coupling distance $|X_{t}^{\eta}-Y_{t}^{0}|$ in terms of the initial distance $|X_{0}^{\eta}-Y_{0}^{0}|$ , we have

\displaystyle\left\lvert\widehat{\rho}^{T,\eta,\alpha}_{\rm sub}\right\rvert% \leqslant\|R\|_{\rm Lip}\int_{0}^{T}\frac{|X_{t}^{\eta}-Y_{t}^{0}|}{\eta}\,dt% \leqslant\|R\|_{\rm Lip}\int_{0}^{T}\frac{\mathrm{e}^{tB}|X_{0}^{\eta}-Y_{0}^{% 0}|}{\eta}\,dt.

(74)

We bound the variance as

\operatorname{Var}\left\lparen\widehat{\rho}^{T,\eta,\alpha}_{\rm sub}\right\rparen\leqslant\mathbb{E}\left[\left\lvert\widehat{\rho}^{T,\eta,\alpha}_{\rm sub}\right\rvert^{2}\right]\leqslant\|R\|_{\rm Lip}^{2}\mathbb{E}\left[\left\lvert\int_{0}^{T}\frac{\mathrm{e}^{tB}|X_{0}^{\eta}-Y_{0}^{0}|}{\eta}\,dt\right\rvert^{2}\right],

(75)

which leads to the desired result. ∎

4 Application to Langevin dynamics

To illustrate the theoretical results obtained in Section 3, we apply the transient subtraction technique to compute the mobility and shear viscosity for a Lennard–Jones fluid, and to a low-dimensional example, with the Langevin dynamics (1) serving as the underlying dynamics for all cases. We present our numerical results in three parts:

•

In Section 4.1, we formulate the transient subtraction technique for Langevin dynamics by making precise $\varphi_{1}$ and $\varphi_{2}$ for the conjugate responses $S$ of interest.
•

In Section 4.2, we numerically illustrate the finite $\eta$ bias results from Corollary 1, which apply (45) to the subtraction estimator (17). In particular, we demonstrate the bias scaling for first and second-order maps $\Phi_{\eta}^{\alpha}$. This is done with the one-dimensional Langevin dynamics, which allows us to directly compute (45), at the level of operators, by discretizing the associated PDE. This enables a clear and effective presentation of the result, which would otherwise have been challenging to achieve with usual stochastic approaches.
•

Finally, we compute in Section 4.3 the mobility and shear viscosity for a Lennard–Jones fluid. The aim is to demonstrate the usefulness and viability of the method in more practical, high-dimensional molecular dynamics settings, particularly by highlighting its variance reduction capabilities.

4.1 Transient methods for Langevin dynamics

Although our transient method only considers equilibrium dynamics, it encodes the relevant nonequilibrium information through the conjugate response function $S$, which is the key quantity allowing one to obtain the transport coefficient. This is expressed through the first-order perturbation PDE (43), whose solution depends on $S$. To define it, let $F(q)\in\mathbb{R}^{d}$ represent an external forcing, chosen appropriately based on the transport coefficient under consideration. Particular choices for $F(q)$ are made precise for each scenario we consider in Section 4.3. For all such scenarios, the associated conjugate response function $S$ is given by

S(q,p)=\beta F(q)^{\mathsf{T}}M^{-1}p.

(76)

We remark that the formal definition of $S$ is based on the associated nonequilibrium dynamics, and relies on linear response theory to be rigorously derived. In the interest of clarity, we do not provide such a rigorous discussion, and instead refer the reader to [lelievre2016, Section 5.2.3] for a comprehensive discussion.

Having identified the appropriate conjugate response function $S$ , one can now construct the map $\Phi_{\eta}^{\alpha}$ , for $\alpha=1,2$ , by solving the associated PDEs (43) and (44).

First-order map $\varphi_{1}$

For convenience, let us first recall the expression for (43):

\nabla^{*}\varphi_{1}=\sum_{i=1}^{d}\partial_{x_{i}}^{*}\varphi_{1,x_{i}}=S.

(77)

For Langevin dynamics, it is natural to consider the position and momentum components of $\varphi_{1}$ by writing $\varphi_{1}(q,p)=(\varphi_{1,q}(q,p),\varphi_{1,p}(q,p))$, so that we can write $\nabla^{*}\varphi_{1}=\nabla^{*}_{q}\varphi_{1,q}+\nabla^{*}_{p}\varphi_{1,p}$. More precisely, the actions of the adjoint operators are given by

\partial^{*}_{q_{i}}=-\partial_{q_{i}}+\beta\partial_{q_{i}}V,\qquad\partial^{% *}_{p_{i}}=-\partial_{p_{i}}+\beta(M^{-1}p)_{i},

(78)

which can be obtained via integration by parts as in (42). In view of (78) and (76), we can write (43) more explicitly for Langevin dynamics as

-\operatorname{div}_{q}(\varphi_{1,q})-\operatorname{div}_{p}(\varphi_{1,p})+% \beta\nabla V^{\mathsf{T}}\varphi_{1,q}+\beta p^{\mathsf{T}}M^{-1}\varphi_{1,p% }=\beta F(q)^{\mathsf{T}}M^{-1}p.

(79)

Therefore, a natural solution for (43) in any dimension is

\varphi_{1}(q,p)=\begin{pmatrix}\varphi_{1,q}(q,p)\\ \varphi_{1,p}(q,p)\end{pmatrix}=\begin{pmatrix}0\\ F(q)\end{pmatrix},

(80)

and the transformation $\Phi^{1}_{\eta}$ is then given by

\Phi_{\eta}^{1}(q,p)=\begin{pmatrix}q\\ p+\eta F(q)\end{pmatrix}.

(81)

Thus, constructing the initial conditions for a first-order transient trajectory simply amounts to shifting the initial momentum $p_{0}^{0}$ of some associated stationary equilibrium process by $\eta F(q_{0}^{0})$ .

Second-order map $\varphi_{2}$

Constructing the second-order map amounts to finding $\varphi_{2}$ by solving (44), which we recall for convenience:

\nabla^{*}\varphi_{2}=-\frac{1}{2}\sum_{i,j=1}^{d}\partial_{x_{i}}^{*}\partial_{x_{j}}^{*}(\varphi_{1,x_{i}}\varphi_{1,x_{j}})=-\frac{1}{2}(\nabla^{*})^{2}\colon\varphi_{1}\otimes\varphi_{1}.

(82)

Substituting the solution (80) for $\varphi_{1}$ in (44) leads to

\displaystyle\nabla^{*}\varphi_{2}

\displaystyle=-\frac{1}{2}(\nabla^{*})^{2}\colon\begin{pmatrix}0\\ F\end{pmatrix}\otimes\begin{pmatrix}0\\ F\end{pmatrix}\equiv-\frac{1}{2}(\nabla_{p}^{*})^{2}\colon F\otimes F.

(83)

Thus, as in the first-order case, one can choose $\varphi_{2,q}=0$ so that $\varphi_{2}=(0,\varphi_{2,p}(q,p))$. Next, recalling from (78) that $\partial^{*}_{p_{i}}=-\partial_{p_{i}}+\beta(M^{-1}p)_{i}$,

\begin{aligned} -\frac{1}{2}(\nabla_{p}^{*})^{2}\colon F\otimes F&=-\frac{1}{2}\sum_{i,j=1}^{d}\partial_{p_{i}}^{*}\partial_{p_{j}}^{*}\left\lparen F_{i}F_{j}\right\rparen\\ &=-\frac{\beta}{2}\sum_{i,j=1}^{d}\left[-\partial_{p_{i}}+\beta(M^{-1}p)_{i}\right]\left\lparen(M^{-1}p)_{j}F_{i}F_{j}\right\rparen\\ &=-\frac{\beta^{2}}{2}\sum_{i,j=1}^{d}(M^{-1}p)_{i}(M^{-1}p)_{j}F_{i}F_{j}+\frac{\beta}{2}\sum_{i,j=1}^{d}\partial_{p_{i}}\left\lparen(M^{-1}p)_{j}\right\rparen F_{i}F_{j}\\ &=-\frac{\beta^{2}}{2}\sum_{i,j=1}^{d}(M^{-1}p)_{i}(M^{-1}p)_{j}F_{i}F_{j}+\frac{\beta}{2}\sum_{i,j=1}^{d}M^{-1}_{j,i}F_{i}F_{j}\\ &=-\frac{1}{2}\left\lparen\beta p^{\mathsf{T}}M^{-1}F\right\rparen^{2}+\frac{1}{2}\beta F^{\mathsf{T}}M^{-1}F.\end{aligned}

(84)–(88)

Thus, a possible solution for the second-order term $\varphi_{2}$ is

\varphi_{2}(q,p)=\begin{pmatrix}0\\ -\dfrac{\beta F(q)^{\mathsf{T}}M^{-1}p}{2}F(q)\end{pmatrix}=\begin{pmatrix}0\\ -\dfrac{1}{2}S(q,p)F(q)\end{pmatrix}.

(89)

This leads to the second-order transformation $\Phi_{\eta}^{2}$:

\Phi_{\eta}^{2}(q,p)=\begin{pmatrix}q\\ p+\eta F(q)-\dfrac{\eta^{2}}{2}F(q)S(q,p)\end{pmatrix}.

(90)
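The maps (81) and (90) translate directly into code. The following sketch is one possible implementation, under the assumption that the forcing is supplied as a callable `force_field(q)` returning $F(q)$ and that $M^{-1}$ is available as a dense matrix; all function and argument names are hypothetical.

```python
import numpy as np

def conjugate_response(q, p, force_field, beta, M_inv):
    """S(q, p) = beta * F(q)^T M^{-1} p, as in (76)."""
    return beta * force_field(q) @ M_inv @ p

def first_order_map(q, p, eta, force_field):
    """Phi_eta^1 from (81): positions unchanged, momenta shifted by eta * F(q)."""
    return q, p + eta * force_field(q)

def second_order_map(q, p, eta, force_field, beta, M_inv):
    """Phi_eta^2 from (90): adds the correction -(eta^2 / 2) * S(q, p) * F(q)."""
    F = force_field(q)
    S = conjugate_response(q, p, force_field, beta, M_inv)
    return q, p + eta * F - 0.5 * eta**2 * S * F

# Illustrative usage with a constant forcing on the first component.
q0 = np.array([0.3, -0.7])
p0 = np.array([2.0, -1.0])
constant_force = lambda q: np.array([1.0, 0.0])
q1, p1 = second_order_map(q0, p0, eta=0.1, force_field=constant_force,
                          beta=1.5, M_inv=np.eye(2))
print(q1, p1)
```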

4.2 One-dimensional Langevin dynamics

We next present some numerical results showcasing the scaling of the finite $\eta$ bias for the first and second-order transformations $\Phi_{\eta}^{\alpha}$ derived in Section 4.1. As stated in Corollary 1, in particular (70), an estimator of order $\alpha$ has bias $\mathrm{O}(\eta^{\alpha})$:

\left\lvert\frac{1}{\eta}\int_{0}^{+\infty}\mathbb{E}\bigl{(}R(X^{\eta}_{t})% \bigr{)}\,dt-\rho\right\rvert\leqslant\mathcal{C}\eta^{\alpha}.

(91)

Note that we did not truncate the time-integral in the estimator above as the finite-time integration bias vanishes as $T\to+\infty$ , allowing us to solely quantify the $\eta$ bias. In view of (68), we can rewrite (91) as

\left\lvert\frac{1}{\eta}\int_{\mathcal{X}}(-\mathcal{L}^{-1}R)\circ\Phi_{\eta% }^{\alpha}\,d\mu-\int_{\mathcal{X}}(-\mathcal{L}^{-1}R)S\,d\mu\right\rvert% \leqslant\mathcal{C}\eta^{\alpha},

(92)

where we used that $\mathcal{L}^{-1}R$ has average 0 with respect to $\mu$ . We write the bias in the form (92) since the low-dimensionality of the system in consideration allows us to directly compute the bias by discretizing $\mathcal{L}$ and solving the associated PDE. Note that the bias result presented here holds for both the naive transient (13) and subtraction (17) estimators.

Choice of observable

We consider the following observable, which has average 0 with respect to $\mu$ by construction:

R(q,p)=\left\lparen\cos(q)-\sin(q)\right\rparen\mathrm{e}^{\beta V(q)}.

(93)

This choice is also made in order to avoid symmetries in the response function (which may occur for typical observables such as $p$ and $\nabla V$), so that the results are clearly presented; see [spacek2023, Section 4.2] for a more detailed discussion regarding the symmetries and the observable. Furthermore, the forcing $F$ under consideration is a normalized constant force, i.e. $F=1$.

Numerically estimating the bias

The low dimensionality of this example allows us to evaluate (92) without Monte Carlo sampling, by computing $-\mathcal{L}^{-1}R$ via finite-difference methods and using quadratures for the associated integrals over the phase space; see [spacek2023, Appendix B] for precise details on the numerical implementation of the finite-difference scheme. The unbounded momentum domain is truncated to $[-p_{\mathrm{max}},p_{\mathrm{max}}]$, with $p_{\mathrm{max}}=5$. The domain $[-\pi,\pi]\times[-p_{\mathrm{max}},p_{\mathrm{max}}]$ is then discretized into $m_{q}=200$ by $m_{p}=400$ points with uniform step sizes $\Delta q=2\pi/m_{q}$ and $\Delta p=2p_{\mathrm{max}}/(m_{p}-1)$.
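The grid described above can be set up as follows. This is a minimal sketch and not the implementation of [spacek2023]: the potential `V`, the value of $\beta$ and the unit mass are assumptions made for illustration. The sketch also checks that the observable (93) has zero average with respect to the discretized Gibbs measure, which holds for any periodic potential since the factor $\mathrm{e}^{\beta V}$ cancels the position marginal.

```python
import numpy as np

beta, p_max, m_q, m_p = 1.0, 5.0, 200, 400     # beta is an assumed value
dq = 2.0 * np.pi / m_q
dp = 2.0 * p_max / (m_p - 1)
q = -np.pi + dq * np.arange(m_q)               # periodic in q: right endpoint excluded
p = np.linspace(-p_max, p_max, m_p)            # truncated momentum domain, endpoints included
Q, P = np.meshgrid(q, p, indexing="ij")

V = lambda x: 0.5 * (1.0 - np.cos(x))          # hypothetical periodic potential
mu = np.exp(-beta * (V(Q) + 0.5 * P**2))       # unnormalized Gibbs density, unit mass
mu /= mu.sum() * dq * dp                       # normalization on the grid

R = (np.cos(Q) - np.sin(Q)) * np.exp(beta * V(Q))   # observable (93)
mean_R = np.sum(R * mu) * dq * dp
print("grid shape:", Q.shape, "average of R:", mean_R)
```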

We consider two maps $\Phi_{\eta}^{\alpha}$ , for $\alpha=1,2$ , constructed from $\varphi_{1}$ and $\varphi_{2}$ obtained in Section 4.1, with particular forms

\begin{cases}\begin{aligned} &\Phi_{\eta}^{1}(q,p)=\begin{pmatrix}q\\ p+\eta\end{pmatrix},\\ &\Phi_{\eta}^{2}(q,p)=\begin{pmatrix}q\\ p+\eta-\eta^{2}\dfrac{\beta}{2}p\end{pmatrix}.\end{aligned}\end{cases}

(94)

The bias was computed for various values of $\eta$, and the results are shown in Figure 2 in a log-log scale, with reference lines included. This confirms that the bias associated with a map of order $\alpha$ is itself of order $\alpha$, which is the main estimate of Proposition 1.

4.3 Mobility and shear viscosity for Lennard–Jones fluids

We next present some numerical results highlighting the variance-reduction potential of the transient subtraction method. The example in consideration is the computation of shear viscosity and mobility for a Lennard–Jones fluid. The system is composed of $N$ particles in spatial dimension $D=3$ (so that $d=DN$ ), evolving according to the underdamped Langevin dynamics (1). The potential energy corresponds to the sum of pairwise interactions

V(q)=\sum_{1\leqslant i<j\leqslant N}v(\lVert q_{i}-q_{j}\rVert),

(95)

with $v(r)$ given by the standard 12-6 Lennard–Jones interaction potential:

v(r)=4\varepsilon\left[\left\lparen\frac{\sigma}{r}\right\rparen^{12}-\left% \lparen\frac{\sigma}{r}\right\rparen^{6}\right].

(96)

The parameter $\varepsilon$ represents the depth of the potential well, and $\sigma$ is the distance at which the pair potential vanishes, i.e. $v(\sigma)=0$ (the force changes from repulsive to attractive at the minimum of $v$, located at $r=2^{1/6}\sigma$). In practice, one truncates the range of (96) at some value $r_{\mathrm{c}}$, after which interactions can be deemed negligible. We employ the truncated shifted-force cutoff method with a cutoff value of $r_{\mathrm{c}}=2.5\sigma$, resulting in the modified potential

v_{\mathrm{SF}}(r)=[v(r)-v(r_{\mathrm{c}})-(r-r_{\mathrm{c}})v^{\prime}(r_{\mathrm{c}})]\mathbf{1}_{r\leqslant r_{\mathrm{c}}}.

(97)
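As an illustration of (96) and (97), the following minimal sketch evaluates the shifted-force potential and its derivative in reduced units. The function names are hypothetical and this is not the Molly.jl implementation; it only serves to show that both $v_{\mathrm{SF}}$ and $v_{\mathrm{SF}}^{\prime}$ vanish at $r=r_{\mathrm{c}}$, so that neither the energy nor the force jumps at the cutoff.

```python
def lj(r, eps=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential (96)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_prime(r, eps=1.0, sigma=1.0):
    """Derivative v'(r) of the 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def lj_shifted_force(r, r_c=2.5, eps=1.0, sigma=1.0):
    """Truncated shifted-force potential (97)."""
    if r > r_c:
        return 0.0
    return (lj(r, eps, sigma) - lj(r_c, eps, sigma)
            - (r - r_c) * lj_prime(r_c, eps, sigma))


def lj_shifted_force_prime(r, r_c=2.5, eps=1.0, sigma=1.0):
    """Derivative of (97): the pair force is shifted so it vanishes at r_c."""
    if r > r_c:
        return 0.0
    return lj_prime(r, eps, sigma) - lj_prime(r_c, eps, sigma)


if __name__ == "__main__":
    for r in (1.0, 2.0 ** (1.0 / 6.0), 2.0, 2.5, 3.0):
        print(r, lj(r), lj_shifted_force(r), lj_shifted_force_prime(r))
```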

We numerically integrate the dynamics using the BAOAB splitting scheme. The simulations were conducted using the Molly.jl package [greener2024] in the Julia language, and were performed in dimensionless reduced units with $\sigma=\varepsilon=k_{B}=1$ and the mass matrix $M=\mathrm{Id}$. For both shear viscosity and mobility computations, the results we present correspond to averages over $10^{5}$ realizations of the system with i.i.d. initial conditions.

We next describe the strategy for initializing and evaluating the trajectories, which is procedurally identical for the mobility and shear cases. Each independent realization of the system is initialized as follows. For the equilibrium control system, initial momenta were sampled from the Boltzmann–Gibbs measure, while initial positions were placed on a uniform grid. The system was then evolved for a thermalization time of $T_{\mathrm{therm}}=1$ with a timestep $\Delta t=10^{-3}$ in reduced units. We ensured that the thermalization time was sufficiently long to melt the crystal structure and relax the system to a stationary state, as monitored by the stabilization of kinetic and potential energies, and by visual inspection of the molecular structure. Next, we initialize the transient trajectory by applying the transformation (98) to a copy of the stationary equilibrium system:

\begin{pmatrix}q_{0}^{\eta}\\ p_{0}^{\eta}\end{pmatrix}=\Phi_{\eta}(q_{0}^{0},p_{0}^{0})=\begin{pmatrix}q_{0}^{0}\\ p_{0}^{0}+\eta F(q_{0}^{0})\end{pmatrix},

(98)

where the expression for $F(q)$ is made precise for mobility and shear viscosity in Sections 4.3.1 and 4.3.2, respectively. The equilibrium and transient trajectories are then evolved simultaneously according to synchronously coupled standard equilibrium dynamics. The integration time $T$ should not be much larger than the relaxation time of the transient trajectory, as decoupling of the two trajectories then becomes a significant source of error. Nevertheless, this can be handled in postprocessing, where one can choose the appropriate truncation time for the estimator. Preliminary runs should be performed beforehand to estimate the relaxation time (which varies significantly depending on the system at hand); reasonably coarse and inexpensive runs suffice for this purpose. All simulation parameters are made precise in Table 1.
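The procedure above (thermalize the control system, kick a copy with the map (98), then evolve both copies with the same noise) can be sketched on a toy problem. The example below is an illustration under stated assumptions and not the Molly.jl code used for the results: it uses a single particle in a hypothetical periodic potential $V(q)=1-\cos(q)$, unit mass, $F=1$, the observable $R(q,p)=p$, illustrative parameter values, and a time-integrated estimator whose exact form is assumed here for the purpose of the comparison.

```python
import math
import random
import statistics


def force(q):
    # Hypothetical 1D periodic potential V(q) = 1 - cos(q), so -V'(q) = -sin(q).
    return -math.sin(q)


def baoab_step(q, p, dt, gamma, beta, xi):
    """One BAOAB step with unit mass; xi is a standard Gaussian draw."""
    p += 0.5 * dt * force(q)                          # B: half kick
    q += 0.5 * dt * p                                 # A: half drift
    a = math.exp(-gamma * dt)
    p = a * p + math.sqrt((1.0 - a * a) / beta) * xi  # O: exact OU update
    q += 0.5 * dt * p                                 # A: half drift
    p += 0.5 * dt * force(q)                          # B: half kick
    return q, p


def one_realization(eta, T, dt, gamma, beta, t_therm, rng):
    """Time-integrated (naive, subtraction) responses for one realization."""
    # Equilibrium control system: rough initial condition, then thermalization.
    q0 = rng.uniform(-math.pi, math.pi)
    p0 = rng.gauss(0.0, 1.0 / math.sqrt(beta))
    for _ in range(round(t_therm / dt)):
        q0, p0 = baoab_step(q0, p0, dt, gamma, beta, rng.gauss(0.0, 1.0))
    # Transient copy: momentum kick as in (98), here with F = 1.
    q1, p1 = q0, p0 + eta
    naive = sub = 0.0
    for _ in range(round(T / dt)):
        naive += dt * p1 / eta           # transient observable only
        sub += dt * (p1 - p0) / eta      # coupled equilibrium value subtracted
        xi = rng.gauss(0.0, 1.0)         # synchronous coupling: shared noise
        q0, p0 = baoab_step(q0, p0, dt, gamma, beta, xi)
        q1, p1 = baoab_step(q1, p1, dt, gamma, beta, xi)
    return naive, sub


def compare_variances(eta, K=200, T=1.0, dt=0.01, gamma=1.0, beta=1.0,
                      t_therm=1.0, seed=0):
    """Empirical variances of the naive and subtraction estimators over K runs."""
    rng = random.Random(seed)
    samples = [one_realization(eta, T, dt, gamma, beta, t_therm, rng)
               for _ in range(K)]
    var_naive = statistics.variance([s[0] for s in samples])
    var_sub = statistics.variance([s[1] for s in samples])
    return var_naive, var_sub


if __name__ == "__main__":
    for eta in (0.01, 0.1, 1.0):
        print(eta, compare_variances(eta))
```

Sharing the Gaussian draw `xi` between the two `baoab_step` calls is what implements the synchronous coupling in this sketch; drawing independent noises instead would remove the correlation that the subtraction relies on.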

Parameter	Shear visc.	Mobility
Integration time ( $T$ )	3.5	2.0
Thermalization time ( $T_{\mathrm{therm}}$ )	1	1
No. of realizations ( $K$ )	$10^{5}$	$10^{5}$
Timestep ( $\Delta t$ )	$10^{-3}$	$10^{-3}$
Inverse temp. ( $\beta$ )	1.25	0.8
Damping ( $\gamma$ )	1	1
No. of particles ( $N$ )	1000	1000
Particle density ( $\varrho$ [ $N$ /vol.])	0.7	0.6
LJ cutoff ( $r_{\mathrm{c}}$ )	2.5	2.5
Mass matrix $M$	Id	Id
LJ param. $(\sigma)$	1	1
LJ param. $(\varepsilon)$	1	1
Boltzmann const. ( $k_{B}$ )	1	1

Table 1: Simulation parameters

4.3.1 Mobility

The mobility quantifies the drift velocity of particles subjected to a constant external field $F$, in the direction of the applied field. In our Lennard–Jones fluid example, we consider a constant force applied in the $x$-direction, and in particular we consider colored drift, which amounts to perturbing half the particles in one direction, and the other half in the opposite direction:

F=\frac{1}{\sqrt{N}}(F_{1},F_{2},\dotsc,F_{N})^{\mathsf{T}}\in\mathbb{R}^{3N},\qquad F_{i}=((-1)^{i+1},0,0),\qquad i=1,\dotsc,N.

(99)

The observable we consider is the velocity in the direction $F$, which is the standard choice for mobility computations [lelievre2016, Section 5.2.2]:

R(q,p)=F^{\mathsf{T}}M^{-1}p.

(100)
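To make (99) and (100) concrete, the following sketch builds the colored-drift direction, evaluates the observable, and applies the momentum kick of (98). The function names are hypothetical and $M=\mathrm{Id}$ is assumed, as in Table 1. Since $\lVert F\rVert=1$ by the normalization in (99), the kick raises the observable by exactly $\eta$, which the example checks.

```python
import math


def colored_drift(n_particles):
    """Force direction (99): alternating signs along x, normalized by sqrt(N)."""
    s = 1.0 / math.sqrt(n_particles)
    # Python index k = i - 1, so (-1)**(i + 1) == (-1)**k.
    return [((-1) ** k * s, 0.0, 0.0) for k in range(n_particles)]


def observable(F, p):
    """R(q, p) = F^T M^{-1} p from (100), assuming M = Id."""
    return sum(f * v for Fi, pi in zip(F, p) for f, v in zip(Fi, pi))


def kick(F, p, eta):
    """Momentum part of the map (98): p -> p + eta * F."""
    return [tuple(v + eta * f for f, v in zip(Fi, pi)) for Fi, pi in zip(F, p)]


if __name__ == "__main__":
    F = colored_drift(4)
    p = [(0.3, -0.1, 0.2), (-0.5, 0.4, 0.0), (0.1, 0.1, -0.3), (0.2, -0.2, 0.6)]
    eta = 0.1
    print(observable(F, kick(F, p, eta)) - observable(F, p))  # equals eta up to rounding
```

For an even number of particles the components of $F$ sum to zero, so the kick injects no net momentum into the system.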

	Variance at $T=1$
$\eta$	Naive	Subtraction	Ratio
0.01	$2.66\times 10^{3}$	$4.69\times 10^{-3}$	$5.67\times 10^{5}$
0.1	26.6	$4.65\times 10^{-3}$	$5.71\times 10^{3}$
1.0	0.265	$4.37\times 10^{-3}$	60.6

(a) Data at $T=1$ (start of decoupling)

	Variance at $T=2$
$\eta$	Naive	Subtraction	Ratio
0.01	$5.70\times 10^{3}$	11.8	485
0.1	57.2	5.52	10.4
1.0	0.564	0.286	1.97

(b) Data at final time $T=2$ (total decoupling)

Table 2: Comparison of variances between naive and subtraction transient estimators for various values of $\eta$ for the computation of mobility.

Each trajectory is integrated for a physical time $T=2$. Although the system relaxes well before this time, we deliberately chose a longer integration time in order to observe the decoupling point, which can be easily spotted in Figure 3, where the average trajectories are shown for two values of $\eta$. Due to the large signal-to-noise ratio of the mobility response, the uniform-in-$\eta$ bound on the variance of the subtraction estimator makes a clear difference, as can readily be seen from Figure 3. The associated error bars shown in Figure 3 were computed with empirical averages over the independent realizations.

To quantitatively assess the variance reduction and the decoupling effect, we consider the variance values for both the naive transient and subtraction methods at two different times $T$: one right before trajectories decouple ($T=1$), and one at the final time ($T=2$). These results are summarized in Table 2. At $T=1$, the variance of the subtraction estimator is indeed bounded uniformly in $\eta$, while the variance of the naive estimator scales as $\eta^{-2}$. Additionally, we notice that at $T=2$, long after relaxation has occurred and despite significant decoupling, the subtraction method still provides a substantial variance reduction.
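These scalings can be read off directly from the entries of Table 2(a). Dividing the naive variances at successive values of $\eta$, which differ by a factor of 10, gives

\[
\frac{2.66\times 10^{3}}{26.6}=100,\qquad \frac{26.6}{0.265}\approx 100.4,
\]

consistent with a factor $\eta^{-2}$, whereas the same ratios for the subtraction estimator are

\[
\frac{4.69\times 10^{-3}}{4.65\times 10^{-3}}\approx 1.01,\qquad \frac{4.65\times 10^{-3}}{4.37\times 10^{-3}}\approx 1.06,
\]

that is, essentially independent of $\eta$ over two orders of magnitude.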

4.3.2 Shear viscosity

The shear viscosity of a fluid can be computed in a variety of ways; see [todd2007] for a review of computational techniques. In this work, we consider a setting based on the transverse force-field method [joubaud2012] with a sinusoidal forcing profile. We denote by $F_{i}\in\mathbb{R}^{3}$ the force acting on the $i$th particle:

F_{i}=(f(q_{i,y}),0,0)^{\mathsf{T}},\qquad f(y)=\sin\left\lparen\frac{2\pi y}{L_{y}}\right\rparen.

(101)

The force acts on the $x$-component of the momenta based on the particle’s $y$-coordinate position. The observable $R$ of interest is the imaginary part of the first empirical Fourier coefficient $U_{1}$:

R(q,p)=\operatorname{Im}(U_{1}),\qquad U_{1}=\frac{1}{N}\sum_{n=1}^{N}(M^{-1}p)_{n,x}\exp\left\lparen\frac{2\mathrm{i}\pi q_{n,y}}{L_{y}}\right\rparen.

(102)
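The forcing (101) and the observable (102) can be evaluated as in the sketch below, which assumes $M=\mathrm{Id}$ (so velocities and momenta coincide) and uses hypothetical function names. As a sanity check, when the $x$-velocities follow exactly the sinusoidal forcing profile on a uniform set of $y$-positions, the observable equals the average of $\sin^{2}$, namely $1/2$.

```python
import cmath
import math


def sinusoidal_force(qy, L_y):
    """Per-particle force (101): x-component depends on the y-coordinate."""
    return [(math.sin(2.0 * math.pi * y / L_y), 0.0, 0.0) for y in qy]


def first_fourier_coefficient(px, qy, L_y):
    """Empirical Fourier coefficient U_1 of the x-velocity profile, with M = Id."""
    n = len(px)
    return sum(v * cmath.exp(2j * math.pi * y / L_y) for v, y in zip(px, qy)) / n


def shear_observable(px, qy, L_y):
    """R(q, p) = Im(U_1) as in (102)."""
    return first_fourier_coefficient(px, qy, L_y).imag


if __name__ == "__main__":
    L_y, n = 10.0, 8
    qy = [k * L_y / n for k in range(n)]              # uniform y-positions
    px = [F[0] for F in sinusoidal_force(qy, L_y)]    # velocities aligned with forcing
    print(shear_observable(px, qy, L_y))              # average of sin^2 over one period
```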

The initialization and evaluation of trajectories for this system were performed as described in Section 4.3. We report the same numerical quantities as in Section 4.3.1, with largely the same interpretations and conclusions. A first difference, clearly seen from Figure 4, is the magnitude of the error for the naive trajectories. This is a trivial artifact of the observable $R(q,p)$: for the mobility, the observable is $\mathrm{O}(\sqrt{N})$, while it is $\mathrm{O}(1)$ for the shear case, since (102) corresponds to some spatial averaging. Secondly, the relaxation time for the shear trajectories is significantly longer than the one for mobility, and in fact almost coincides with the decoupling time. Nonetheless, Table 3 shows that variance reduction is still obtainable.

	Variance at $T=2$
$\eta$	Naive	Subtraction	Ratio
0.01	7.32	0.0260	281
0.1	0.0726	$5.49\times 10^{-3}$	13.2

(a) Data at $T=2$ (start of decoupling)

	Variance at $T=3.5$
$\eta$	Naive	Subtraction	Ratio
0.01	15.0	2.45	6.12
0.1	0.150	0.0487	3.07

(b) Data at final time $T=3.5$ (total decoupling)

Table 3: Comparison of variances between naive and subtraction transient estimators for various values of $\eta$ for the computation of shear viscosity.

5 Conclusion and perspectives

We presented a variance reduction method to compute transport coefficients based on a transient approach with a control variate, where the relaxing trajectories are coupled to equilibrium ones. The numerical results in Section 4 show significant variance-reduction potential, suggesting this method is viable as its implementation is neither complex nor expensive; in fact, it is roughly twice the cost of a typical run due to the control system, so that the computational overhead is more than compensated by the variance reduction. For general systems, the bottleneck lies in the construction of the transformation $\Phi_{\eta}$, as the PDEs (43) and (44) might not have a readily available solution for some given conjugate response $S$ of interest.

This work calls for several extensions. A particularly appealing one is to explore other types of couplings. For the systems we considered, synchronous coupling was largely successful in keeping the trajectories sufficiently close during the transient relaxation. For the shear viscosity example, relaxation and decoupling almost coincided, which suggests that a less dissipative system is likely to undergo decoupling significantly before convergence. Such scenarios motivate exploring more robust coupling strategies to delay decoupling, such as the ones described in [guillin2012, monmarche2023a, chak2024].
