Optimal Prediction With Memory
Alexandre J. Chorin, Ole H. Hald, Raz Kupferman
Abstract
Optimal prediction methods estimate the solution of nonlinear time-dependent problems when that solution is too complex
to be fully resolved or when data are missing. The initial conditions for the unresolved components of the solution are
drawn from a probability distribution, and their effect on a small set of variables that are actually computed is evaluated via
statistical projection. The formalism resembles the projection methods of irreversible statistical mechanics, supplemented
by the systematic use of conditional expectations and new methods of solution for an auxiliary equation, the orthogonal
dynamics equation, needed to evaluate a non-Markovian memory term. The result of the computations is close to the best
possible estimate that can be obtained given the partial data. We present the constructions in detail together with several useful
variants, provide simple examples, and point out the relation to the fluctuation-dissipation formulas of statistical physics.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 05.10.-a; 05.40.-a
Keywords: Optimal prediction; Memory; Langevin equations; Orthogonal dynamics; Underresolution; Hamiltonian systems; Hermite
polynomials
1. Introduction
Many problems in science and engineering are described by nonlinear differential equations whose solutions
are too complicated to be properly resolved and/or where needed data are unavailable. The problem of predicting
the evolution of such systems has been addressed by the present authors and others in [1–11]. Nothing can be
predicted without some knowledge about the unresolved (subgrid) degrees of freedom. In the papers just cited it
is assumed that one possesses, as one often does, prior statistical information about the system in the form of an
initial probability distribution; what is sought is a mean solution with respect to this initial distribution, compatible
with the partial information available initially as well as with the limitations on the computing power one can bring
to bear. This mean solution is the conditional expectation of the solution given the partial initial data, and is the best
available estimate of the solution of the full problem.
Corresponding author.
E-mail addresses: chorin@math.berkeley.edu (A.J. Chorin), hald@math.berkeley.edu (O.H. Hald), raz@math.huji.ac.il (R. Kupferman).
The simplest construction of this conditional expectation, first-order optimal prediction ([3–5], see also below), produces a small system of ordinary differential equations and works well for a time that depends on the degree of underresolution and on the uncertainty in the initial conditions. This approximation is optimal in the class of Markovian approximations [10], but eventually exhibits errors because the influence of partial initial data on the distribution of the solutions weakens in time, and this loss of information is not captured; see [6,7]. As shown in the present paper, an accurate estimate of a subset of variables requires the addition of a memory term, and the resulting prediction scheme becomes a generalized Langevin equation, similar to those in irreversible statistical mechanics [12–16].
Some of the relations between conditional expectations and irreversible statistical mechanics have been discussed
in [6]. The memory depends on a solution of an auxiliary equation, the orthogonal dynamics equation, and in the
present paper we also present algorithms for finding this solution. We also explain how the machinery can lead to
novel ways of using prior measurements to predict the future behavior of complex systems. We apply our methods
to a simple model problem. Related, partial and more heuristic, constructions have been presented in [1,17].
Consider a system of ordinary differential equations
$$\frac{\mathrm d\varphi}{\mathrm dt} = R(\varphi), \qquad \varphi(0) = x, \tag{1}$$
where $\varphi$ and $x$ are $n$-dimensional vectors with components $\varphi_i$ and $x_i$, and $x$ ranges over a phase space $\Gamma$. We assume that only the first $m < n$ components of the initial datum, collected in $\hat x = (x_1, \ldots, x_m)$, are known, while the remaining components $\tilde x = (x_{m+1}, \ldots, x_n)$ are sampled from a given probability density; we write $\varphi(x,t)$ for the solution of (1) and $\hat\varphi(x,t)$ for its first $m$ components. If (1) is a Hamiltonian system with Hamiltonian $H$, one can use as initial distribution the canonical distribution with density $\rho(x) = Z^{-1}\,\mathrm e^{-H(x)}$, where $Z$ is a normalization constant. Hamiltonian systems are often of interest, and the canonical distribution is often natural for physical reasons. These choices simplify parts of the analysis.
Given $\rho$, functions on $\Gamma$ can be viewed as random variables, and one can use the terminology of probability theory. We define the expected value of $g$ by
$$E[g] = \int g(x)\,\rho(x)\,\mathrm dx.$$
We endow the space of functions on $\Gamma$ with the inner product $(f,g) = E[fg]$, which makes it a Hilbert space $L^2(\Gamma, \rho)$ ($L^2$ for brevity). If (1) is a Hamiltonian system and the probability density is $\rho(x) = Z^{-1}\exp(-H(x))$, then the Liouvillian $L$, defined by $Lg = \sum_{i=1}^n R_i(x)\,\partial g/\partial x_i$ (so that $\varphi_j(x,t) = \mathrm e^{tL}x_j$), is skew-adjoint in this Hilbert space.
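For completeness, here is the standard integration-by-parts argument behind the skew-adjointness (a sketch under the usual assumption that boundary terms vanish thanks to the decay of $\mathrm e^{-H}$); writing $x = (q,p)$ in canonical coordinates, so that $L = \sum_j (H_{p_j}\partial_{q_j} - H_{q_j}\partial_{p_j})$ as in (29) below,
$$(Lf, g) = \frac1Z\int \sum_j \left(H_{p_j}\,\partial_{q_j}f - H_{q_j}\,\partial_{p_j}f\right) g\,\mathrm e^{-H}\,\mathrm dx = -\frac1Z\int f \sum_j \left(H_{p_j}\,\partial_{q_j}g - H_{q_j}\,\partial_{p_j}g\right)\mathrm e^{-H}\,\mathrm dx = -(f, Lg),$$
because, after integrating by parts, the term produced by differentiating $H_{p_j}\mathrm e^{-H}$ with respect to $q_j$ cancels the one produced by differentiating $H_{q_j}\mathrm e^{-H}$ with respect to $p_j$.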
We now derive an equation, often referred to as the generalized Langevin equation, which is a reformulation of the equations of motion (1) for the resolved variables $\hat\varphi(x,t)$. The derivation uses projection operators: functions in $L^2$ are projected onto the subspace $\hat L^2$ of functions of the $m$-dimensional vector $\hat x$. Several different projections $P$ are considered:
(1) Let $f \in L^2$, and consider the orthogonal projection of $f$ onto the span of all functions of $\hat x$, given by
$$(Pf)(\hat x) = \frac{\int f(x)\,\rho(x)\,\mathrm d\tilde x}{\int \rho(x)\,\mathrm d\tilde x}, \qquad \mathrm d\tilde x = \mathrm dx_{m+1}\cdots\mathrm dx_n.$$
In the language of probability, $(Pf)(\hat x)$ is the conditional expectation of $f$ given $\hat x$ and is denoted by $E[f\,|\,\hat x]$; see [20]. $E[f\,|\,\hat x]$ is the best approximation of $f$ by a function of $\hat x$:
$$\|f - E[f\,|\,\hat x]\| \le \|f - h(\hat x)\|$$
for all functions $h$ in $\hat L^2$. $P$ is the nonlinear projection, used in [16] with a different interpretation, as well as in [3,4,6].
(2) Given $f \in L^2$, define
$$(Pf)(\hat x) = \sum_{i,j=1}^{m} a^{-1}_{ij}\,(f, x_i)\,x_j,$$
where $a^{-1}_{ij}$ are the entries of a matrix whose inverse has entries $a_{ij} = (x_i, x_j)$. This is the linear projection widely used in irreversible statistical mechanics (see [12,13,21]).
(3) More generally, pick a set of functions of $\hat x$, say $h_\beta(\hat x)$, $\beta = 1, \ldots, M$; for convenience, make them orthonormal: $(h_\beta, h_\gamma) = \delta_{\beta\gamma}$. Define a projection
$$(Pf)(\hat x) = \sum_{\beta=1}^{M} (f, h_\beta)\,h_\beta(\hat x).$$
If the $h_\beta$ span $\hat L^2$ as $M$ increases, the result approximates the conditional expectation $E[f\,|\,\hat x]$. This is the finite-rank projection; it interpolates between the linear projection and the conditional expectation (a small numerical illustration follows).
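To make the finite-rank construction concrete, here is a small numerical illustration in Python (a toy example of our own devising, unrelated to the model problem of Section 3): take two independent standard Gaussian variables, let $\hat x = x_1$, and project $f(x) = x_2^2 + x_1 x_2 + x_1$ onto the first three normalized Hermite polynomials in $x_1$, for which the exact conditional expectation is $E[f\,|\,x_1] = 1 + x_1$.

import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100_000)
x2 = rng.normal(size=100_000)

f = x2**2 + x1 * x2 + x1                                      # a function of the full state
basis = [np.ones_like(x1), x1, (x1**2 - 1.0) / np.sqrt(2.0)]  # orthonormal functions of x1

coeffs = [np.mean(f * h) for h in basis]                      # (f, h_beta) by Monte Carlo

# Evaluate the finite-rank projection at, e.g., x1 = 0.7; it should be close to 1.7.
x1_eval = 0.7
basis_eval = [1.0, x1_eval, (x1_eval**2 - 1.0) / np.sqrt(2.0)]
Pf_eval = sum(c * h for c, h in zip(coeffs, basis_eval))

The same pattern, with the basis (11) below, is what is used for the model problem of Section 3.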
We now follow the Mori–Zwanzig procedure [12,14,15,21]. We consider the equation of motion for a resolved coordinate $\varphi_j(x,t) = \mathrm e^{tL}x_j$, and split the time derivative, $R_j(\varphi(x,t)) = \mathrm e^{tL}Lx_j$, into a term that depends only on $\hat\varphi(x,t)$ plus a remainder:
$$\frac{\partial}{\partial t}\,\mathrm e^{tL}x_j = \mathrm e^{tL}Lx_j = \mathrm e^{tL}PLx_j + \mathrm e^{tL}QLx_j, \tag{3}$$
where $Q = I - P$.
The second term in (3) is rewritten with the help of the Dyson formula
$$\mathrm e^{tL} = \mathrm e^{tQL} + \int_0^t \mathrm e^{(t-s)L}\,PL\,\mathrm e^{sQL}\,\mathrm ds, \tag{5}$$
which yields the generalized Langevin equation
$$\frac{\partial}{\partial t}\varphi_j(x,t) = \hat R_j(\hat\varphi(x,t)) + \int_0^t K_j(\hat\varphi(x,t-s), s)\,\mathrm ds + F_j(x,t), \tag{6}$$
where $\hat R_j(\hat x) = (PLx_j)(\hat x)$, $F_j(x,t) = \mathrm e^{tQL}QLx_j$, and $K_j(\hat x, t) = PLF_j(x,t)$.
The various terms in Eq. (6) have conventional interpretations. The first term on the right-hand side is the Markovian contribution to $\partial_t\varphi_j(x,t)$: it depends only on the instantaneous value of the resolved $\hat\varphi(x,t)$. The second term depends on $\hat x$ through the values of $\hat\varphi(x,s)$ at times $s$ between 0 and $t$, and embodies a memory, a dependence on the past values of the resolved variables. Finally, the third term, which depends on full knowledge of the initial conditions $x$, lies in the null space of $P$ and can be viewed as noise with statistics determined by the initial conditions. The fact that the memory depends on the noise is known in the physics literature as a fluctuation-dissipation theorem. The specific form of this relation given in physics books is obtained when $P$ is the linear projection.
The last step is the multiplication of (6) by $P$ (the noise term drops out because $PF_j(x,t) = 0$):
$$\frac{\partial}{\partial t}\,P\varphi_j(x,t) = P\hat R_j(\hat\varphi(x,t)) + \int_0^t PK_j(\hat\varphi(x,t-s), s)\,\mathrm ds. \tag{7}$$
This identity involves only the known components $\hat x$ of the initial data. When $P$ is the conditional expectation, $P\varphi(x,t) = E[\varphi(x,t)\,|\,\hat x]$, the right-hand side of Eq. (7) is exactly what we want: the derivative of the average of $\varphi(x,t)$ conditioned by the initially given data.
Eqs. (6) and (7) are identities; their solution is exactly equivalent to the solution of the full problem (1) followed
by averaging. In practice these equations have to be solved approximately; we shall show below how to perform the
approximation term by term.
3. A model problem
We introduce a model problem to illustrate the formalism of the previous section and to test the accuracy of
various approximations described below.
Consider the dynamical system in $\Gamma = \mathbf R^4$:
$$\frac{\mathrm d\varphi_1}{\mathrm dt} = \varphi_2, \qquad \frac{\mathrm d\varphi_2}{\mathrm dt} = -\varphi_1(1+\varphi_3^2), \qquad \frac{\mathrm d\varphi_3}{\mathrm dt} = \varphi_4, \qquad \frac{\mathrm d\varphi_4}{\mathrm dt} = -\varphi_3(1+\varphi_1^2). \tag{8}$$
Eqs. (8) are derived from the Hamiltonian
$$H(x) = \tfrac12\left(x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_1^2 x_3^2\right),$$
with $(x_1, x_2)$ and $(x_3, x_4)$ canonical pairs of coordinates. This system describes two oscillators with a nonlinear coupling.
We take initial data randomly distributed with the canonical probability density $\rho(x) = Z^{-1}\mathrm e^{-H(x)}$, thus endowing the space of functions on $\Gamma$ with the inner product
$$(f,g) = \frac{\int f(x)\,g(x)\,\mathrm e^{-H(x)}\,\mathrm dx}{\int \mathrm e^{-H(x)}\,\mathrm dx}, \tag{9}$$
where $\mathrm dx = \mathrm dx_1\cdots\mathrm dx_4$. We retain only two of the four variables, $\varphi_1$ and $\varphi_2$; thus $\hat x = (x_1, x_2)$ and $\tilde x = (x_3, x_4)$.
The goal is to compute the averages of $\varphi_1(x,t)$ and $\varphi_2(x,t)$ over all initial choices of $x_3$ and $x_4$.
In Fig. 1, we plot the time evolution of the mean values of $\varphi_1(x,t)$ and $\varphi_2(x,t)$, given $\varphi_1(x,0) = 1$ and $\varphi_2(x,0) = 0$. This graph was obtained by a Monte Carlo calculation: we generated a collection of $5\times10^4$ initial conditions $x$ by fixing $\hat x = (1,0)$ and sampling $x_3, x_4$ from the canonical distribution. Each initial datum was evolved in time with an ODE solver. At each $t > 0$ the mean values of $\varphi_1(x,t)$ and $\varphi_2(x,t)$ were evaluated by averaging over the set of solutions. Note the decay of the mean solution towards its equilibrium value; this phenomenon has been explained in [6].
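For concreteness, here is a minimal Python sketch of such a Monte Carlo computation (ensemble size, time grid, solver, and tolerances are illustrative choices, not the ones used in the paper). Conditioned on $\hat x = (1,0)$, the canonical density makes $x_3$ and $x_4$ independent centered Gaussians with variances $1/(1+x_1^2)$ and 1, which is what the sampler below uses.

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, u):
    # Right-hand side of the model problem (8): two nonlinearly coupled oscillators.
    x1, x2, x3, x4 = u
    return [x2, -x1 * (1.0 + x3**2), x4, -x3 * (1.0 + x1**2)]

rng = np.random.default_rng(0)
n_samples = 5_000                  # illustrative; the paper uses 5 x 10^4
x1_0, x2_0 = 1.0, 0.0              # the given resolved initial data
t_grid = np.linspace(0.0, 10.0, 201)

means = np.zeros((len(t_grid), 2))
for _ in range(n_samples):
    # Sample the unresolved components from the conditional canonical density.
    x3_0 = rng.normal(scale=1.0 / np.sqrt(1.0 + x1_0**2))
    x4_0 = rng.normal()
    sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), [x1_0, x2_0, x3_0, x4_0],
                    t_eval=t_grid, rtol=1e-8, atol=1e-10)
    means += sol.y[:2].T
means /= n_samples                 # estimates of E[phi_1 | x_hat], E[phi_2 | x_hat] on t_grid

Later sketches in this paper reuse the imports and the function rhs defined here.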
We now write down explicitly each of the projections defined in Section 2:
(1) The conditional expectation of a function $f(x)$ is
$$(Pf)(\hat x) = E[f\,|\,\hat x] = \frac{\int f(x)\,\mathrm e^{-H(x)}\,\mathrm d\tilde x}{\int \mathrm e^{-H(x)}\,\mathrm d\tilde x} \tag{10}$$
with $\mathrm d\tilde x = \mathrm dx_3\,\mathrm dx_4$. The density $\mathrm e^{-H(x)}$ is Gaussian when $x_1, x_2$ are fixed, thus the integrals in (10) can often be calculated explicitly. For example,
$$Px_3^{2n} = \frac{(2n)!}{2^n n!}\,(1+x_1^2)^{-n}, \qquad Px_4^{2n} = \frac{(2n)!}{2^n n!}, \qquad n = 1, 2, \ldots.$$
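For the record, the computation behind these moments is elementary (a short derivation, using only the form of $H$ given above): with $x_1, x_2$ fixed,
$$\mathrm e^{-H(x)} \propto \mathrm e^{-\frac12(1+x_1^2)x_3^2}\,\mathrm e^{-\frac12 x_4^2},$$
so that, conditionally on $\hat x$, $x_3$ is a centered Gaussian with variance $(1+x_1^2)^{-1}$ and $x_4$ a centered Gaussian with variance 1. The even moments of a centered Gaussian variable with variance $\sigma^2$ are $\frac{(2n)!}{2^n n!}\,\sigma^{2n}$, and the odd moments vanish, which yields the expressions above.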
(2) It is easy to verify that $(x_1, x_1) = c = 0.715$, $(x_2, x_2) = 1$, and $(x_1, x_2) = 0$, and so the linear projection is
$$(Pf)(\hat x) = \frac{(f, x_1)}{c}\,x_1 + (f, x_2)\,x_2.$$
(3) For the finite-rank projection we need the inner product of functions of $\hat x$; integrating out $x_3$ and $x_4$ in (9) gives
$$(f,g) = \frac{1}{Z}\int f(\hat x)\,g(\hat x)\,\frac{\mathrm e^{-(1/2)x_1^2}\,\mathrm e^{-(1/2)x_2^2}}{2\pi\sqrt{1+x_1^2}}\,\mathrm d\hat x,$$
where
$$Z = \int \frac{\mathrm e^{-(1/2)x_1^2}\,\mathrm e^{-(1/2)x_2^2}}{2\pi\sqrt{1+x_1^2}}\,\mathrm d\hat x = 0.78964.$$
Let $\alpha > -\tfrac12$ be a parameter. For each fixed value of $\alpha$ the following family of functions constitutes an orthonormal basis in the space of square-integrable functions of $\hat x$:
$$h_\beta(\hat x) = Z^{1/2}\,(1+x_1^2)^{1/4}\,H^\alpha_{\beta_1}(x_1)\,H^\alpha_{\beta_2}(x_2), \qquad \beta = (\beta_1, \beta_2), \tag{11}$$
where $H^\alpha_k(z) = (1+2\alpha)^{1/4}\,H_k(\sqrt{1+2\alpha}\,z)\,\mathrm e^{-\alpha z^2/2}$, and the $H_k(z)$ are normalized Hermite polynomials satisfying the recursion relation
$$H_0(z) = 1, \qquad H_1(z) = z, \qquad H_k(z) = \frac{1}{\sqrt k}\,z\,H_{k-1}(z) - \sqrt{\frac{k-1}{k}}\,H_{k-2}(z).$$
For future use, we also note that
$$\frac{\mathrm d}{\mathrm dz}H^\alpha_k(z) = \sqrt{(1+2\alpha)k}\;H^\alpha_{k-1}(z) - \alpha z\,H^\alpha_k(z). \tag{12}$$
The span of a finite collection of these functions changes when $\alpha$ changes; we will use the freedom of choosing $\alpha$ to optimize the rate at which the finite-rank projection converges to the conditional expectation.
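A minimal Python sketch of the recursion quoted above for the normalized Hermite polynomials (the function name and the vectorized interface are our own choices; numpy is assumed imported as in the earlier sketch):

def hermite_normalized(kmax, z):
    # Return the array [H_0(z), ..., H_kmax(z)] using the three-term recursion
    # H_k = z H_{k-1} / sqrt(k) - sqrt((k-1)/k) H_{k-2}.
    z = np.asarray(z, dtype=float)
    H = np.empty((kmax + 1,) + z.shape)
    H[0] = 1.0
    if kmax >= 1:
        H[1] = z
    for k in range(2, kmax + 1):
        H[k] = z * H[k - 1] / np.sqrt(k) - np.sqrt((k - 1) / k) * H[k - 2]
    return H

With this normalization the $H_k$ are orthonormal with respect to the standard Gaussian weight, so sample averages of $H_j(X)H_k(X)$ over standard normal draws $X$ should be close to $\delta_{jk}$; this is a convenient check of the implementation.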
Consider the problem (1) and suppose we know a large number of its solutions $\varphi(x,t)$ for various initial conditions $x$ drawn from the distribution $\rho$; these solutions may come from a prior Monte Carlo computation or from experiment. The best $L^2$ estimate of the solution when $\hat x$ is given is $E[\varphi(x,t)\,|\,\hat x]$.
Let $h_\beta(\hat x)$, $\beta$ in some finite index set $I$, be a set of orthonormal basis functions; we approximate the conditional expectation $E[\varphi_j(x,t)\,|\,\hat x]$ by a finite-rank projection
$$E[\varphi_j(x,t)\,|\,\hat x] \approx \sum_{\beta\in I} (\varphi_j(t), h_\beta)\,h_\beta(\hat x) \equiv \sum_{\beta\in I} c_{j\beta}(t)\,h_\beta(\hat x), \tag{13}$$
where $c_{j\beta}(t) = (\varphi_j(t), h_\beta)$; these coefficients are unconditional averages and do not refer to any particular initial datum $\hat x$.
To see how the choice of the scaling parameter $\alpha$ affects the accuracy of such expansions, consider first a one-dimensional example: the function $\cos(xt)$, regarded as a function of $x$ with $t$ a parameter, square integrable with respect to the inner product
$$(f,g) = \frac{1}{\sqrt{2\pi}}\int f(x)\,g(x)\,\mathrm e^{-(1/2)x^2}\,\mathrm dx, \tag{14}$$
is approximated by the finite sum
$$\cos(xt) \approx \sum_{k=0}^{N} a_k(t)\,H^\alpha_k(x), \tag{15}$$
where
$$a_k(t) = \frac{1}{\sqrt{2\pi}}\int \cos(xt)\,H^\alpha_k(x)\,\mathrm e^{-(1/2)x^2}\,\mathrm dx.$$
In Fig. 2, we compare the variation in time of the $L^2$ norm of the function $\cos(xt)$ (solid line) with the $L^2$ norm of the finite sum (15) with $N = 7$ terms (dashed line). The dotted lines represent the contributions from $a_k^2(t)$ for $k = 1, 3, 5, 7$; by symmetry $a_{2k}(t) = 0$. The four graphs correspond to four different values of the parameter $\alpha$. The finite sums approximate $\cos(xt)$ well for short times; the larger $t$, the more modes are needed for an accurate approximation. These graphs demonstrate that a proper scaling of the basis is important for accuracy with an acceptable number of terms.
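As an illustration of how such coefficients can be computed, here is a sketch for the unscaled case $\alpha = 0$, in which $H^\alpha_k$ reduces to the normalized Hermite polynomial $H_k$; the coefficients $a_k(t)$ are then plain expectations over a standard normal variable and can be estimated by Monte Carlo (sample size and time grid are arbitrary choices, and hermite_normalized is the helper defined above):

rng = np.random.default_rng(2)
X = rng.normal(size=200_000)                 # samples of a standard normal variable
t_vals = np.linspace(0.0, 6.0, 121)
H = hermite_normalized(7, X)                 # H_0(X), ..., H_7(X)

# a_k(t) = E[cos(X t) H_k(X)], estimated for every k and t by averaging over the samples.
a = np.array([[np.mean(np.cos(X * t) * H[k]) for t in t_vals] for k in range(8)])

norm_exact = 0.5 * (1.0 + np.exp(-2.0 * t_vals**2))   # ||cos(xt)||^2 = E[cos^2(Xt)]
norm_truncated = (a**2).sum(axis=0)                    # ||finite sum||^2 = sum_k a_k(t)^2

Comparing norm_truncated with norm_exact as a function of $t$ reproduces the kind of information displayed in Fig. 2.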
We now return to the model problem (8) and approximate the conditional expectation $E[\varphi_1(x,t)\,|\,\hat x]$ by a finite-rank projection of the form (13), with the $h_\beta(\hat x)$ given by (11). The functions $c_{1\beta}(t)$ are evaluated by averaging the products $\varphi_1(x,t)\,h_\beta(\hat x)$ over a collection of $5\times10^4$ numerical solutions of (8) with initial conditions drawn from the canonical distribution.
Fig. 2. Solid lines: the variation of the $L^2$ norm of $\cos(xt)$ in the inner product space (14). Dashed lines: the $L^2$ norm of the finite-rank approximation (15) with $N = 7$ terms. Dotted lines: $a_k^2(t)$ for $k = 1, 3, 5, 7$. The four plots correspond to: (a) $\alpha = 0$, (b) $\alpha = 0.5$, (c) $\alpha = 1$, and (d) $\alpha = 2$.
Fig. 3. $c_{1\beta}(t)$ versus $t$ for $\alpha = 0$ and various values of $\beta$. The two dominant functions correspond to $\beta = (1,0)$ and $\beta = (0,1)$.
In Fig. 3, we plot several of the functions $c_{1\beta}(t)$ with $\alpha = 0$, and observe that the two dominant contributions come from the components $\beta = (1,0)$ and $\beta = (0,1)$. This is consistent with the assumption often made in physics that the lower order terms are the most significant.
In Fig. 4, we compare the mean solution $E[\varphi_1(x,t)\,|\,\hat x]$, $\hat x = (1,0)$, generated by Monte Carlo sampling (dotted lines), with the finite-rank projection (13) (solid lines). The top two graphs correspond to $\alpha = 0$ and to (a) $3\times3$ and (b) $6\times6$ basis functions. In the first case, the finite-rank approximation deviates from the true solution already at short times, although the qualitative properties of the mean solution are well captured. With four times as many basis functions, better accuracy is preserved for a long time. In Fig. 4(c) and (d), the value of $\alpha$ is modified, indicating how a significant reduction in computational effort can be obtained by scaling the basis functions.
Finally, note that the integrals that define the coefficients in the expansions are over all the components of $x$ while the basis functions are functions only of $\hat x$; therefore the series expansion is an expansion, not of $\varphi(x,t)$, which may be very noisy, but of $E[\varphi(x,t)\,|\,\hat x]$, which is much smoother.
We now examine the Markovian term $\hat R(\hat\varphi(x,t))$ in the Langevin equation (6). For the model problem (8) and $P$ the conditional expectation, this term can be calculated explicitly:
$$\hat R(\hat\varphi(x,t)) = \begin{pmatrix} \varphi_2(x,t) \\[6pt] -\varphi_1(x,t)\left(1 + \dfrac{1}{1+\varphi_1^2(x,t)}\right) \end{pmatrix}.$$
Fig. 4. Comparison of the mean solution $E[\varphi_1(x,t)\,|\,\hat x]$ (dotted lines) and the finite-rank approximation $\sum_{\beta\in I} c_{1\beta}(t)\,h_\beta(\hat x)$ (solid lines) for the initial data $\hat x = (1,0)$. The different graphs correspond to: (a) $I = \{0,1,2\}^2$ and $\alpha = 0$, (b) $I = \{0,\ldots,5\}^2$ and $\alpha = 0$, (c) $I = \{0,1,2\}^2$ and $\alpha = 1$, (d) $I = \{0,\ldots,5\}^2$ and $\alpha = 2$.
This expression is a function of all the components of $x$, not only of the $m = 2$ that we wish to work with. Next we apply the projection $P$, and we do so now by interchanging the evaluations of $\hat R$ and $P$:
$$P\hat R(\hat\varphi(x,t)) \approx \hat R(P\hat\varphi(x,t)).$$
The reader should not be horrified by the commutation of an average with a nonlinear function. The randomness in the problem is a reflection of the unresolved degrees of freedom in $x$. One alternative to our methodology is neglecting these degrees of freedom, which removes the randomness and makes the commutation perfectly legitimate. One is better off preserving these degrees of freedom and mistreating them slightly rather than omitting them. Another possible construction, more accurate but more expensive, consists of storing samples of $\hat R(\hat\varphi(x,t))$ for initial data drawn from the initial distribution and then projecting $\hat R$ just as we projected $\hat\varphi(x,t)$ in the previous section.
In what was called first-order optimal prediction in [3–5], the second and third terms in the generalized Langevin equation (6) are dropped; writing $\psi(t) = P\hat\varphi(x,t)$ one obtains the approximate equations:
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)), \qquad \psi(0) = \hat x. \tag{16}$$
A convergence proof for this approximation applied to a nonlinear partial differential equation can be found in [10].
For the model problem (8), $\hat\varphi(x,t) = (\varphi_1(x,t), \varphi_2(x,t))$, and the first-order optimal prediction equations are
$$\frac{\mathrm d\psi_1}{\mathrm dt} = \psi_2, \qquad \frac{\mathrm d\psi_2}{\mathrm dt} = -\psi_1\left(1 + \frac{1}{1+\psi_1^2}\right). \tag{17}$$
As observed in [6], Eqs. (16) are Hamilton's equations derived from the effective Hamiltonian
$$\hat H(\hat x) = -\log \int \mathrm e^{-H(x)}\,\mathrm d\tilde x.$$
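A minimal sketch of integrating the first-order optimal prediction system (17), reusing the imports of the earlier Monte Carlo sketch (time horizon and tolerances are illustrative):

def op_rhs(t, psi):
    # First-order optimal prediction (17) for the model problem.
    p1, p2 = psi
    return [p2, -p1 * (1.0 + 1.0 / (1.0 + p1**2))]

sol_op = solve_ivp(op_rhs, (0.0, 10.0), [1.0, 0.0],
                   t_eval=np.linspace(0.0, 10.0, 201), rtol=1e-8)
# sol_op.y[0] approximates E[phi_1(x,t) | x_hat = (1,0)] at short times, but it does not
# reproduce the decay of the conditional mean at later times: (17) conserves the
# effective Hamiltonian, so its solutions keep oscillating.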
We turn now to a general formalism for the evaluation of the noise $F(x,t)$ and the memory kernel $K(\hat x,t)$. For each $j \le m$, the component $F_j(x,t)$ of the noise is the solution of the orthogonal dynamics equation
$$\frac{\partial}{\partial t}F_j(x,t) = QLF_j(x,t) = LF_j(x,t) - PLF_j(x,t), \tag{18}$$
$$F_j(x,0) = QLx_j = R_j(x) - \hat R_j(\hat x).$$
Each $F_j$ is computed independently of the others. Eq. (18) is equivalent to the Dyson formula:
$$F_j(x,t) = \mathrm e^{tL}F_j(x,0) - \int_0^t \mathrm e^{(t-s)L}\,PLF_j(x,s)\,\mathrm ds. \tag{19}$$
This is a Volterra integral equation for $F_j(x,t)$, which we next decompose into its components in an orthonormal basis. Let $P$ be a finite-rank projection. Then
$$K_j(\hat x, s) = PLF_j(x,s) = \sum_{\beta\in I} a_{j\beta}(s)\,h_\beta(\hat x), \tag{20}$$
where
$$a_{j\beta}(s) = (LF_j(s), h_\beta).$$
Consequently,
$$\mathrm e^{(t-s)L}\,PLF_j(x,s) = \sum_{\beta\in I} a_{j\beta}(s)\,h_\beta(\hat\varphi(x,t-s)).$$
Substitute this equation into (19), multiply both sides by $L$, and take the inner product with $h_\gamma$; the result is
$$(LF_j(t), h_\gamma) = (L\,\mathrm e^{tL}F_j(0), h_\gamma) - \int_0^t \sum_{\beta\in I} a_{j\beta}(s)\,(L\,\mathrm e^{(t-s)L}h_\beta, h_\gamma)\,\mathrm ds.$$
This is a Volterra integral equation for the functions $a_{j\gamma}(t)$, which can be rewritten as follows:
$$a_{j\gamma}(t) = f_{j\gamma}(t) - \int_0^t \sum_{\beta\in I} a_{j\beta}(s)\,g_{\beta\gamma}(t-s)\,\mathrm ds, \tag{21}$$
where
$$f_{j\gamma}(t) = (L\,\mathrm e^{tL}F_j(0), h_\gamma), \qquad g_{\beta\gamma}(t) = (L\,\mathrm e^{tL}h_\beta, h_\gamma).$$
The functions $f_{j\gamma}(t)$, $g_{\beta\gamma}(t)$ can be found by averaging over a collection of experiments or simulations, with initial conditions drawn from the initial distribution without reference to any specific realization; they are time-correlation functions (i.e., inner products of the form $(\mathrm e^{tL}g_1, g_2)$ for some functions $g_1, g_2$) and do not depend on the orthogonal dynamics. The number of components $a_{j\beta}$ depends only on the resolution needed in the space of functions of $\hat x$. Once the $a_{j\beta}(t)$ have been calculated, the memory kernel is given by (20).
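A time correlation $(\mathrm e^{tL}g_1, g_2)$ is simply the average of $g_1$ evaluated along the flow times $g_2$ evaluated at the initial point, over initial data drawn from the canonical density. Here is a minimal Monte Carlo sketch (the sampler, ensemble size, and function names are our own; rhs and the imports come from the earlier sketch in Section 3):

def sample_x1(rng):
    # Rejection sampling from the x1 marginal, proportional to exp(-x1^2/2)/sqrt(1+x1^2),
    # with the standard normal as the envelope.
    while True:
        z = rng.normal()
        if rng.uniform() < 1.0 / np.sqrt(1.0 + z * z):
            return z

def correlation(g1, g2, t_grid, n_samples=2000, seed=1):
    # Monte Carlo estimate of (e^{tL} g1, g2) = E[g1(phi(x,t)) g2(x)], x ~ exp(-H)/Z.
    rng = np.random.default_rng(seed)
    acc = np.zeros(len(t_grid))
    for _ in range(n_samples):
        x1 = sample_x1(rng)
        x0 = np.array([x1, rng.normal(),
                       rng.normal(scale=1.0 / np.sqrt(1.0 + x1**2)), rng.normal()])
        sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), x0, t_eval=t_grid, rtol=1e-8)
        acc += np.array([g1(sol.y[:, k]) for k in range(len(t_grid))]) * g2(x0)
    return acc / n_samples

The functions $f_{j\gamma}$ and $g_{\beta\gamma}$ above are of exactly this form, with $g_1 = LF_j(0)$ or $Lh_\beta$ (known functions of the full state $x$) and $g_2 = h_\gamma$ (a function of $\hat x$).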
Finally, we perform the projection (7) of the Mori–Zwanzig equation. The finite-rank projection of the memory term
$$\int_0^t P\,\mathrm e^{(t-s)L}\,K_j(\hat x, s)\,\mathrm ds$$
is
$$\int_0^t \sum_{\beta,\gamma\in I} a_{j\beta}(s)\,\Phi_{\beta\gamma}(t-s)\,h_\gamma(\hat x)\,\mathrm ds,$$
where
$$\Phi_{\beta\gamma}(t) = (\mathrm e^{tL}h_\beta, h_\gamma).$$
This implementation of $P$ relies on an expansion of the several $h_\beta(\hat\varphi(x,t))$ in scaled Hermite series and cannot be expected to be accurate with a moderate number of terms.
To formulate the algorithms compactly we introduce the matrices $A$, $F$, $G$, $\Phi$: $A_{j\beta} = a_{j\beta}$, $F_{j\beta} = f_{j\beta}$, $G_{\beta\gamma} = g_{\beta\gamma}$, and $\Phi_{\beta\gamma}$ as above, for $1 \le j \le m$ and $\beta, \gamma \in I$. First let $F$, $G$ be given and solve
$$A(t) = F(t) - \int_0^t A(s)\,G(t-s)\,\mathrm ds$$
for $A$. Writing $\psi(t) \approx P\hat\varphi(x,t)$ as before, and letting $h$ denote the vector of basis functions $(h_\beta)_{\beta\in I}$, one then integrates either
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)) + \int_0^t A(s)\,h(\psi(t-s))\,\mathrm ds, \qquad \psi(0) = \hat x, \tag{22}$$
or
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)) + \int_0^t A(s)\,\Phi(t-s)\,h(\hat x)\,\mathrm ds, \qquad \psi(0) = \hat x. \tag{23}$$
Some results are shown in Fig. 6. The thick line represents the exact solution $E[\varphi_1(x,t)\,|\,\hat x]$, the thinner line is the function $\psi_1(t)$ resulting from the integration of (22), whereas the dashed line results from the integration of (23). The quadratures in the Volterra equation were performed by the trapezoidal rule, 21 basis functions $h_\beta$ were used, the parameter in the Hermite expansion was $\alpha = 7/6$, and the ordinary differential equations were solved by a Runge–Kutta method, modified to take into account the integral term. The Monte Carlo summations used $10^4$ sample solutions.
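For reference, here is a minimal sketch of a trapezoidal-rule solver for the matrix Volterra equation $A(t) = F(t) - \int_0^t A(s)G(t-s)\,\mathrm ds$ on a uniform time grid (array shapes and the interface are assumptions of ours, not the code used for Fig. 6):

import numpy as np

def solve_volterra(F, G, dt):
    # F has shape (n_steps, m, M) and G has shape (n_steps, M, M), both sampled on a
    # uniform grid with spacing dt; the result A has the same shape as F.
    n_steps, m, M = F.shape
    A = np.zeros_like(F)
    A[0] = F[0]                           # the integral vanishes at t = 0
    lhs = np.eye(M) + 0.5 * dt * G[0]     # coefficient of the unknown A[n] (end-point weight)
    for n in range(1, n_steps):
        rhs = F[n] - 0.5 * dt * A[0] @ G[n]
        for k in range(1, n):
            rhs -= dt * A[k] @ G[n - k]
        A[n] = np.linalg.solve(lhs.T, rhs.T).T   # solves A[n] (I + dt/2 G[0]) = rhs
    return A

The computed A can then be inserted into (22) or (23), whose integral terms are discretized with the same trapezoidal weights inside the Runge–Kutta steps.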
To improve the accuracy of these calculations one can: (1) increase the number of Monte Carlo samples, (2) decrease the time step, (3) allow the coefficient $\alpha$ to change with time, (4) truncate the integral kernel at an appropriate time.
7. Short-memory approximations
In some situations of interest one can expect the support in time of the integrand in Eq. (7) to be small, and this can simplify the calculations. To analyze this situation, start from the Dyson formula (5), which can be the starting point of a perturbative evaluation of $\mathrm e^{tQL}$. The zeroth-order approximation of this relation is
$$\mathrm e^{tQL} \approx \mathrm e^{tL}, \tag{24}$$
in which one replaces the flow in the orthogonal complement of the range of $P$ by the real flow induced by $L$. Now consider the second term in Eq. (6):
$$\int_0^t \mathrm e^{(t-s)L}\,PL\,\mathrm e^{sQL}QLx_j\,\mathrm ds = \int_0^t \mathrm e^{(t-s)L}\,PLQ\,\mathrm e^{sQL}QLx_j\,\mathrm ds,$$
where the insertion of the extra $Q$ is legitimate since $\mathrm e^{sQL}$ maps functions in the null space of $P$ back into the same subspace. Adding and subtracting equal quantities, we find:
$$\int_0^t \mathrm e^{(t-s)L}\,PLQ\,\mathrm e^{sQL}QLx_j\,\mathrm ds = \int_0^t \mathrm e^{(t-s)L}\,PLQ\,\mathrm e^{sL}QLx_j\,\mathrm ds + \int_0^t \mathrm e^{(t-s)L}\,PLQ\left(\mathrm e^{sQL} - \mathrm e^{sL}\right)QLx_j\,\mathrm ds.$$
If the correlations $(L\,\mathrm e^{sQL}QLx_j, h_\beta)$ are significant only over short times, the approximation (24) provides an acceptable approximation without requiring the solution of the orthogonal dynamics equation. In statistical physics, one often makes an even more drastic approximation, in which it is assumed that the correlations vanish for $t \neq 0$ (see, e.g., the high frequency approximation in [12, p. 86]). Some applications of the short-memory approximation have been presented in [1].
A short-memory approximation of particular interest can be derived as follows. Write the memory term of the Langevin equation as
$$\int_0^t K_j(\hat\varphi(x,t-s), s)\,\mathrm ds. \tag{25}$$
Expand the integrand in a Taylor series about $s = 0$, retaining only the leading term. The memory term reduces to
$$\int_0^t K_j(\hat\varphi(x,t), 0)\,\mathrm ds = t\,S_j(\hat\varphi(x,t)), \tag{26}$$
where
$$S_j(\hat x) = PLQLx_j.$$
An alternative derivation uses the formalism of the previous section: write (25) as
$$\int_0^t \mathrm e^{(t-s)L}\,PL\,\mathrm e^{sQL}QR_j(x)\,\mathrm ds = \int_0^t L\,\mathrm e^{(t-s)L}\,\mathrm e^{sQL}QR_j(x)\,\mathrm ds - \int_0^t \mathrm e^{(t-s)L}\,\mathrm e^{sQL}QLQR_j(x)\,\mathrm ds,$$
where we have used the commutation of $L$ and $QL$ with $\mathrm e^{tL}$ and $\mathrm e^{tQL}$, respectively. At this point, make the approximation (24) and replace the evolution operator of the orthogonal dynamics by the evolution operator of the Liouvillian flow, which eliminates the $s$ dependence of both integrands, and (26) follows readily.
Writing $\psi(t) \approx P\hat\varphi(x,t)$, the $t$-damping equations are
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)) + t\,S(\psi(t)), \qquad \psi(0) = \hat x. \tag{27}$$
The form of this equation is surprising. All that remains of the memory is the coefficient t and one is left with a
non-autonomous system of ordinary differential equations. No previously computed averages are needed.
For the model problem (8), Eq. (27) takes the explicit form
$$\frac{\mathrm d\psi_1}{\mathrm dt} = \psi_2, \qquad \frac{\mathrm d\psi_2}{\mathrm dt} = -\psi_1\left(1 + \frac{1}{1+\psi_1^2}\right) - 2t\,\frac{\psi_1^2\,\psi_2}{(1+\psi_1^2)^2}. \tag{28}$$
In Fig. 7, we present a comparison of the component $\psi_1(t)$ of the solution of Eq. (28) with $E[\varphi_1(x,t)\,|\,\hat x]$. The new term, $tS(\psi(t))$, leads to a damping of the solution.
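A minimal sketch of integrating the non-autonomous $t$-damping system (28), reusing the imports of the earlier sketches (horizon and tolerances are illustrative):

def t_damping_rhs(t, psi):
    # t-damping equations (28): the first-order optimal prediction right-hand side
    # plus the memory term approximated by t * S(psi).
    p1, p2 = psi
    markov = [p2, -p1 * (1.0 + 1.0 / (1.0 + p1**2))]
    damping = [0.0, -2.0 * t * p1**2 * p2 / (1.0 + p1**2) ** 2]
    return [markov[0] + damping[0], markov[1] + damping[1]]

sol_damped = solve_ivp(t_damping_rhs, (0.0, 10.0), [1.0, 0.0],
                       t_eval=np.linspace(0.0, 10.0, 201), rtol=1e-8)
# sol_damped.y[0] can be compared with the Monte Carlo estimate of E[phi_1 | x_hat]
# computed in Section 3; unlike (17), this solution decays.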
We now show that the last term in Eq. (28), which approximates the memory term, leads to a decay just like the original term did. Set $q = (\hat q, \tilde q)$, $p = (\hat p, \tilde p)$, and write $L$, $P$, $H$ as
$$L = \sum_{j=1}^{n} \left(H_{p_j}\,\partial_{q_j} - H_{q_j}\,\partial_{p_j}\right), \tag{29}$$
Thus, if the Hamiltonian $H(q,p)$ is separable into a kinetic and a potential energy, then the effective Hamiltonian $\hat H$ for the solution of the $t$-damping equation (32) is a decreasing function of time. Similar results can be obtained for Fourier methods applied to Euler's equations in two- and three-dimensional flows.
Proof. We shall write Eq. (32) in terms of $H$ and $\hat H$. It follows from (29)–(31) that
$$Lp_i = -H_{q_i}, \qquad PLp_i = -PH_{q_i} = -\hat H_{q_i}, \qquad QLp_i = (I-P)Lp_i = -H_{q_i} + \hat H_{q_i}$$
for $i = 1, 2, \ldots, m$. Since $H_{q_i}$ and $\hat H_{q_i}$ do not depend on $p$ (due to the assumed separability of $H$), Eqs. (29) and (30) imply
$$PLQLp_i = -\sum_{j=1}^{n} P\left[H_{p_j}\left(H_{q_i} - \hat H_{q_i}\right)_{q_j}\right]. \tag{33}$$
9. Conclusions
We have presented a general method for finding the best approximation of part of the solution of a partially
resolved initial value problem, conditioned by partial initial data. We presented a formula, Eq. (6), which describes
exactly the evolution in time of a few components of an underresolved problem. We have also introduced a collection
of methods for approximating the general formula, in general (but not always) requiring prior information that can be obtained by Monte Carlo or by experiment without reference to specific initial data. Some of the methods are expensive in
practice but provide theoretical insight; some require additional information, for example about correlation times,
which may be available; some are limited in accuracy by the properties of expansions in orthogonal functions.
Many variants are possible. Theoretical issues remain to be discussed in greater depth. The existence of orthogonal
dynamics is established in [22].
We applied the methods to a model problem which is easy to understand and analyze but is not typical of the
problems one encounters in practice. We have not discussed properties of real problems such as separation of scales
or convergence to a partial differential equation which help the application of our methods. We have not explored
the effect of the choice of initial distribution, for example the possible advantages of a microcanonical distribution.
Once the general framework has been established, further variants and assumptions are best discussed within the
context of specific applications.
Acknowledgements
We would like to thank Prof. G.I. Barenblatt, Dr. E. Chorin, Mr. E. Ingerman, Dr. A. Kast, Mr. K. Lin, Mr. P.
Okunev, Mr. B. Seibold, Mr. P. Stinis, Prof. J. Strain, and Prof. B. Turkington for helpful discussions and comments.
This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy
Research of the US Department of Energy under Contract DE-AC03-76-SF00098, and in part by the National
Science Foundation under Grant DMS98-14631. RK was supported by the Israel Science Foundation founded by
the Israel Academy of Sciences and Humanities and by the Alon Fellowship.
References
[1] J. Bell, A. Chorin, W. Crutchfield, Stochastic optimal prediction with application to averaged Euler equations, in: C. Lin (Ed.), Proceedings of the Seventh National Conference on Comput. Fluid Mech., Pingtung, Taiwan, 2000, pp. 1–13.
[2] A. Chorin, Probability, Mechanics, and Irreversibility, Lecture Notes, Mathematics Department, University of California, Berkeley, CA, 2000.
[3] A. Chorin, A. Kast, R. Kupferman, Optimal prediction of underresolved dynamics, Proc. Natl. Acad. Sci. USA 95 (1998) 4094–4098.
[4] A. Chorin, A. Kast, R. Kupferman, On the prediction of large-scale dynamics using unresolved computations, AMS Contemporary Math. 238 (1999) 53–75.
[5] A. Chorin, A. Kast, R. Kupferman, Unresolved computation and optimal prediction, Comm. Pure Appl. Math. 52 (1999) 1231–1254.
[6] A. Chorin, O. Hald, R. Kupferman, Optimal prediction and the Mori–Zwanzig representation of irreversible processes, Proc. Natl. Acad. Sci. USA 97 (2000) 2968–2973.
[7] A. Chorin, R. Kupferman, D. Levy, Optimal prediction for Hamiltonian partial differential equations, J. Comp. Phys. 162 (2000) 267–297.
[8] N. Goldenfeld, A. McKane, Q. Hou, Block spins for partial differential equations, J. Statist. Phys. 93 (1998) 699–714.
[9] O. Hald, Optimal prediction of the Klein–Gordon equation, Proc. Natl. Acad. Sci. USA 96 (1999) 4774–4779.
[10] O. Hald, R. Kupferman, Convergence of optimal prediction for nonlinear Hamiltonian systems, SIAM J. Numer. Anal. 39 (2001) 983–1000.
[11] A. Kast, Optimal prediction of stiff oscillatory mechanics, Proc. Natl. Acad. Sci. USA 97 (2000) 6253–6257.
[12] D. Evans, G. Morriss, Statistical Mechanics of Nonequilibrium Liquids, Academic Press, London, 1990.
[13] E. Fick, G. Sauermann, The Quantum Statistics of Dynamical Processes, Springer, Berlin, 1990.
[14] H. Mori, Transport, collective motion, and Brownian motion, Prog. Theoret. Phys. 33 (1965) 423–450.
[15] R. Zwanzig, Nonlinear generalized Langevin equations, J. Statist. Phys. 9 (1973) 215–220.
[16] R. Zwanzig, Problems in nonlinear transport theory, in: L. Garrido (Ed.), Systems Far from Equilibrium, Springer, New York, 1980, pp. 198–225.
[17] A. Chorin, O. Hald, R. Kupferman, Non-Markovian optimal prediction, Monte Carlo Meth. Appl. 7 (2001) 99–109.
[18] E. Hopf, Statistical hydromechanics and functional calculus, J. Rat. Mech. Anal. 1 (1952) 87–114.
[19] A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer, New York, 1983.
[20] K. Chung, A Course in Probability Theory, Academic Press, New York, 1974.
[21] D. Forster, Hydrodynamic Fluctuations, Broken Symmetry and Correlation Functions, Benjamin, Reading, MA, 1975.
[22] D. Givon, O. Hald, R. Kupferman, Existence of orthogonal dynamics, Preprint, 2001.
[23] A. Chorin, Hermite expansions in Monte-Carlo computations, J. Comp. Phys. 8 (1971) 472–482.
[24] D. Gottlieb, S. Orszag, Numerical Analysis of Spectral Methods, SIAM, Philadelphia, PA, 1977.