1. Introduction
Langevin dynamics is a system of stochastic differential equations of the form
Here
represent vectors of the positions and momenta of the particles comprising together
degrees of freedom. The mass matrix
is symmetric positive definite. The force
is usually taken to be conservative, i.e.,
for some potential energy function
U. Langevin dynamics describes a physical system of particles moving under prescribed interaction forces and subject to collisions with particles of a “heat bath.” The friction matrix
, which is here allowed to vary with position, typically models drag on a set of distinguished particles due to the interactions with the surrounding environment, whereas the matrix
, also potentially varying with position, characterizes the stochastic effects of collisions.
represents a vector of independent and uncorrelated Wiener processes in
,
, where the components of its formal time derivative
are independent white noise processes such that
and
In case
where
is a nonnegative scalar and constant, one may take
, where
,
T is the temperature and
is Boltzmann’s constant. The resulting system can then be shown, under conditions reviewed in
Section 2, to have a unique invariant distribution with density
, where
represents the total energy of the isolated deterministic system and
.
The emphasis in this article is on the general form (
1) and (2), for which we will relax several typical assumptions. First, the system is not assumed to admit a universal fluctuation-dissipation relation, instead we assume only certain nondegeneracy conditions. Second, we do not assume a conservative force field. Such generalized forms of Langevin dynamics can be used to model diffusion (or thermal) gradients by particle simulation [
1,
2,
3], as well as a variety of models for flocking [
4,
5,
6], protein folding [
7] and bacterial suspensions [
8]. Problems with nonconservative forces are considered often in the physics literature, but the precise characterization of the convergence to stationary conditions is rarely discussed. In the general case, the form of the stationary distribution is non-obvious and the uniqueness of the steady state and its asymptotic stability are not well understood. An important contribution of this article is to show the existence of a steady state for the model (
1) and (2) under relaxed conditions and to explicitly construct a Lyapunov function, thus allowing us to conclude the geometric convergence of observable averages.
Our approach to proving ergodicity is dependent on the fact that the friction and noise terms in the equations depend only on positions. On the other hand, within this class, and subject to the nondegeneracy condition on the friction and noise tensors, our treatment is general and has a considerable range of applications. Formally (
1) and (2) includes dissipative particle dynamics (DPD), which is a momentum-conserving, 2nd order, gradient-type system [
9,
10], but we specifically exclude in this article cases for which
is not positive definite, which occur as a direct consequence of momentum conservation in DPD; ergodic properties for DPD systems in one dimension were discussed in previous work of Shardlow and Yan [
11]. See also [
12,
13,
14] for some special cases of Langevin-type systems with velocity dependent coefficients to which our formulation is not applicable.
To perform numerical discretization of the system (
1) and (2), we proceed by splitting. This involves performing an additive decomposition of the stochastic vector field into several component parts, each of which can be exactly integrated (in the sense of distributions). The resulting stochastic maps can be composed to approximate the solution in a single timestep. Even for a particular choice of decomposition, there are many ways to combine the sequence of steps. The properties of various choices of the splitting method have been examined in detail for constant
and
in [
15]. In that work it has been shown that the numerical methods constructed using certain components inherit the ergodicity properties of the SDE system, whereas the invariant measure of the numerical method differs from that of the SDE. The error in the invariant measure can be characterized in terms of the order of accuracy of averages of test functions. In general, it is found that symmetric compositions are preferred as they exhibit, under mild conditions, even order of accuracy for the invariant measure (thus one obtains second order accuracy for the same computational work as a first order scheme).
In the case of the more general systems considered in this article, the calculation of the Ornstein-Uhlenbeck solution, which is required at each step of our splitting-based numerical methods, becomes potentially demanding from a computational standpoint. In the methods of [
15,
16] this is done exactly, however in the current setting the solution of a Lyapunov equation would need to be obtained at each step in time, a calculation that would dominate the computational load in a large scale simulation. Therefore, we offer an efficient numerical procedure based on multiple timestepping [
17], relying on a further splitting of the OU equation at each interior timestep.
In the last section of this article, we illustrate the theory in three numerical examples. The first is a particle system with an imposed diffusion gradient, as in [
7,
18,
19,
20,
21]. Here the issue is to capture the correct density fluctuations as a function of position. In the second model, we incorporate a stirring force which alters the equilibrium state of a physical model. In the last example, we discuss the use of a blended Langevin model which draws on ideas from the literature of flocking, in particular the Cucker-Smale system. We use our theory to infer attractive steady states for the system, and characterize flocking tendencies by the use of two order parameters: one modelling the formation of
consensus and the other characterizing the
peculiarity which can be viewed as the average internal energy of isolated clumps of matter.
2. Stationary States of SDEs and Their Stability
In this section, we briefly outline the general theory on which our analysis of the ergodic properties of the system (
1) and (2) in the next section is based. We are following in large parts the presentation in the review articles [
22,
23] and we refer the reader to these articles for a more detailed and comprehensive presentation.
Let
denote the solution of an Itô diffusion process of the form
taking values in a suitable domain. In this general presentation,
may represent either a position vector or else the combined vector of positions and momenta, i.e.,
. In the latter case, we would typically either assume a compact configurational domain, as for example when periodic boundary conditions are used in the position space, i.e.,
, where
and
denotes the 1-torus, or else an unbounded domain, e.g.,
.
represents a standard Wiener process in
and the coefficients
,
are assumed to be smooth, i.e.,
, and
. The initial state of
is specified by the probability measure
, which throughout this article we assume to be such that
has finite mean and variance, i.e.,
2.1. The Associated Semigroup of Evolution Operators and Their Adjoints
The time evolution of the expectations of observables under the dynamics of the SDE is described by the semigroup of
evolution operators
for
, where the expectation is taken with respect to the Wiener measure associated with the driving noise process
and
denotes a set of
test functions or
observables, which is contained in
, the set of all real valued measurable functions defined on the domain
. If the test function set
is chosen appropiately, (e.g.,
, where
denotes the set of all smooth bounded real valued functions defined on
) the action of
corresponds to the solution of an initial value problem, i.e.,
, where
u solves
The operator
, defined such that
for all
, is referred to as the
infinitesimal generator. As the action of the operators
is given as the solution of the differential Equation (
6), it is common to use the notation
. In the case of an Itô diffusion process (
5), and with the regularity assumptions on the coefficients
stated above, it can be shown that the infinitesimal generator takes the form
where “
” denotes the Frobenius product, i.e.,
See, e.g., [
24] for the case
, and [
25] for extensions from that
core to more general test function sets such as the ones considered below.
It is often convenient to adopt a dual perspective by considering the evolution of the density of the law
of
in time. Let
, for
, in the sense of distributions. The corresponding semigroup
of
transfer operators is then naturally defined as the formal adjoint operators of
, i.e.,
assuming
.
Similarly as for its adjoint, the action of
corresponds to the solution of an initial value problem, which in this case is known as the
Fokker-Planck equation. More specifically,
where
denotes the probability density of
and
—the
Fokker-Planck operator—can be shown to correspond to the
-adjoint of the infinitesimal generator
, i.e.,
thus
2.2. Hypoellipticity and Existence of a Smooth Transition Kernel
Note that (
7) is in general to be interpreted in a weak sense as, up to this point, we have not made any assumptions on the regularity of
. However, within the scope of this article it is sufficient to consider the case where the differential operator
is
hypoelliptic. A differential operator
A is said to be hypoelliptic, if for any
g solving the differential equation
, it follows that
g is of higher regularity than
f in the sense that
with
, where
denotes the local Sobolev space of order
. This means that if
is hypoelliptic, then the solution of (
7) is smooth in the sense that
irrespective of the regularity of
. A common way to establish hypoellipticity of a differential operator is via Hörmander’s theorem ([
26], Theorem 22.2.1, on p. 353):
Theorem 1. Let A be a differential operator of the formwhere are vector fields in and † indicates the formal adjoint. Iteratively define a collection of vector fields bywheredenotes the commutator of vector fields and their Jacobian matrices. Ifthen the operator A is hypoelliptic. The condition (
9) is commonly referred to as
Hörmander’s condition. In the particular case of
one can easily verify that (
9) is exactly satisfied if
with
and
refers to the
i-th column of the diffusion tensor
in (
5). This particular version of Hörmander’s condition adapted to the parabolic PDE of the form (
7) is referred to as the
parabolic Hörmander condition.
The direct consequence of the parabolic Hörmander condition is the smoothness of the underlying transition kernel describing the evolution of the probability measure associated to the SDE.
2.3. Ergodicity and Convergence in Law
Let now
be a probability measure with density
. When we say that
is an
invariant measure of the SDE (
5) or that the latter preserves the probability measure
, we mean that the probability density
is a stationary solution of the Fokker-Planck equation associated to the SDE, i.e.,
Note that for (
10) to be well posed in a strong sense it is sufficient that
is hypoelliptic, i.e., the Fokker-Planck operator
satisfies the Hörmander condition. It is also straightforward to see that every convex combination of two invariant measures is again an invariant measure of the respective SDE which by definition means that the set of invariant measures of a particular SDE is convex.
Let for
and some suitable observable
,
denote the finite time average of
evaluated along the path of the solution
of (
5). Similarly, we write
for the
-weighted spatial average of
. The process
is said to be
ergodic with respect to the invariant probability measure
if for all
and for almost all realizations of the Wiener process
, and
-almost all initial values
, trajectory averages coincide with expectations with respect to the measure
in the asymptotic limit
, i.e.,
It can be shown (see [
27]) that ergodicity follows if (i) there exists an invariant measure with positive smooth density and (ii)
is
hypoelliptic.
Assume that
converges in law towards a unique invariant measure
, i.e.,
for all
,
-almost all
. A common way to characterize the convergence of the expectation
, or, in semi group notation the convergence of
to the
-weighted average
, is via functional decay estimates of the semi-group operators
. For this purpose a set of test functions
is fixed and equipped with a norm
such that
forms a Banach-space. Of particular interest in this context is exponential convergence of
towards
in the respective norm, i.e.,
where
are positive constants, the latter corresponding to the spectral gap of the generator
in the functional space
, where
denotes the subset of test functions with vanishing mean, i.e.,
Let the operator
denote the orthogonal projection from
onto
, i.e.,
Denote further by
the operator norm
of an operator
. Equation (
13) implies that
when considered as an operator on
E is bounded in the operator norm
as
2.4. Finite-Time Averages and the Central Limit Theorem
Ergodicity of an SDE ensures that time averages of infinitely long trajectories almost surely coincide with spatial averages with respect to the target measure. However, ergodicity per se does not address the statistical properties of finite-time averages apart from the convergence of the time average
as
. For practical applications, where for a given unique invariant measure, the aim is to approximate the
-weighted average
of a test function
, it is important that fluctuations of finite-time averages
(i.e., the Monte Carlo error in the finite-time approximations) around the infinite-time value
can be quantified. For this purpose it typically regarded as necessary that a central limit theorem holds, i.e.,
where
is commonly referred to as the asymptotic variance of the observable
. A sufficient condition for a central limit theorem of the form (
15) to hold for the solution
of (
5) and the observable
is that the Poisson equation
has a solution which belongs to
(see [
28]). The asymptotic variance then takes the form
Let
denote the subspace of
consisting of functions with vanishing mean. Note that for
it is a priori not clear how to interpret (
16) since only under additional regularity assumptions (
16) can be interpreted in a weak sense. A common way to makes sense of (
16) is by deriving bounds for the operator
in
where
is some subspace of
. If
is bounded in
, this then directly implies
in (
16) for
. The relationship between the spectral properties of the operator
and the convergence properties of the solution process
, or more specifically the decay properties of the semi-group operator
as
can be made more precise via the formal identity
which follows from
for
. Using the identity (
17), one directly finds that (
14) is a sufficient condition for
to be bounded since
We conclude that the exponential decay (
13) implies the central limit theorem (
15) for
in the corresponding function space
. Estimates for
can be obtained using the framework of hypocoercivity as presented in [
29]. In [
30] techniques are introduced to show the decay estimate (
13) for
. In this article we will use Lyapunov function-based techniques which allow to show exponential convergence as in (
13) in some weighted
spaces, which we specify in the next section.
2.5. Exponential Convergence in Weighted Spaces
Another way of deriving exponential decay estimates for the semi-group
, which are sufficient to establish a central limit theorem for certain observables, is by means of well established Lyapunov techniques. These techniques have been formulated originally for discrete-time Markov processes/Markov chains [
31,
32,
33] and have been subsequently extended to continuous time solutions processes of SDEs [
22,
23,
34,
35]. The function space on which decay estimates are shown in these references are Banach spaces of the form
, with
where
is a positive function, and
Exponential convergence in the sense of
is typically shown provided that the following two assumptions are satisfied.
Assumption 1 (Infinitesimal Lyapunov condition)
. There is a function with , and real numbers such that, Assumption 2. For some there exists a constant and a probability measure ν such thatwhere for some where are the same constants as in (23). As outlined in [
23,
35], Assumption 2 follows if
satisfies the parabolic Hörmander condition and the SDE (
5) is controllable in the sense that there is a
such that for any pair
, there exists a continuous control
, such that the solution
of the differential equation
satisfies
and
. The following theorem is derived in [
22] using results from [
33]. Similar results can be found e.g., in [
23,
36].
Theorem 2. Suppose that Assumptions 1 and 2 hold. Then, the solution of the SDE (5) admits a unique invariant probability measure μ, such thatMoreover, there exist and such that (22) holds for all and any . Finally, let us demonstrate how decay estimates in spaces
can be used to derive a central limit theorem for certain functions. Let
be a Lyapunov function such that the conditions for Theorem 2 are satisfied for
. Note that if the conditions of Theorem 2 are also satisfied for
, then this implies that a central limit theorem holds for all observables
, since (
24) being valid for
implies
Thus, the inequality (
19) for
again implies that the solution
of (
16) is contained in
for
, so that by [
28] indeed a central limit theorem of the form (
15) holds for
. This motivates to show (
22) for a wide class of Lyapunov functions. In the next section, we consider the case of the Langevin dynamics with position dependent coefficients (
1) and (2) and show (
22) for Lyapunov functions being of the form of polynomials of even but arbitrarily high degree.
4. Numerical Discretization
We next describe the construction of numerical methods for the Langevin system. The discretization schemes described here are based on a general splitting framework as explained in the introduction, adapted for the variable coefficient structure.
Following [
15], we break the Langevin system (
1) and (2) into three parts: A and B corresponding to the deterministic flow:
and
and O, which is associated to the isolated momenta diffusion process defined by an Ornstein-Uhlenbeck type equation, i.e., the stochastic system
A stochastic splitting method is then obtained by concatenating the (stochastic) flow maps corresponding to the B, A and O part. Although other decompositions can be used for splitting, experience in the constant coefficient case has shown that it is beneficial to maintain the same balance of noise and dissipation of the original equations by keeping these terms together. Among stochastic splitting schemes based on this decomposition, the symmetric integration sequence BAOAB was observed to yield a lower discretization bias for ergodic averages in comparison to other integration sequences requiring only one evaluation of the gradient
[
16]; moreover, in the case of constant coefficients, this integration sequence has been shown to yield a superconvergent (4th order) error in configurational (
-dependent) observables [
15].
In the case of non-constant coefficients
, exact solution of the O-step can be computationally costly. More specifically, bearing in mind that
is fixed during the isolated Ornstein Uhlenbeck process (
58), a time-step
can be written:
where
, and
The matrix
is related to
as
where
is the solution of the Lyapunov equation
This means that in order to compute an exact solution of the O-part as an (
59), first the Lyapunov Equation (
62) has to solved, followed by the computation of the Cholesky decomposition (
61) and matrix exponential (
60). Each of these operations is without any additional assumptions on the structure of the matrices
and
of computational complexity
. We circumvent these computations by instead integrating the O-part using a numerical method. We construct a symmetric splitting method based on the decomposition
and use the integration sequence DFD, which results in an update of the form
with
In order to avoid large numerical errors induced by this approximation we employ a multiple timestepping approach [
17], meaning that instead of executing a single integration step of (
63) for a full time step, we repeat for
, the integrations step (
63)
K-times using a step size
.
We refer this variant of the BAOAB integration scheme using a multiple time stepping solution of the
O-part as “multiple time stepping BAOAB” (m-BAOAB). We provide the implementation of this method in Algorithm 1. The performance evaluation of schemes such as this can be demanding as there is a trade off between computational costs and accuracy of the approximation of the solution of the Ornstein-Uhlenbeck process, which in turn affects the accuracy of the invariant measure of the numerical method. The parameter
K in the m-BAOAB algorithm allows control of the accuracy with which the OU process is resolved. The method was tested in the example of
Section 5.3, below, but the detailed numerical analysis of this method will be taken up in a forthcoming article.
Algorithm 1 m-BAOAB |
- 1:
INPUT - 2:
- 3:
- 4:
- 5:
for to K do - 6:
- 7:
end for - 8:
- 9:
- 10:
return
|