Optimal Prediction With Memory
Alexandre J. Chorin, Ole H. Hald, Raz Kupferman
Abstract
Optimal prediction methods estimate the solution of nonlinear time-dependent problems when that solution is too complex
to be fully resolved or when data are missing. The initial conditions for the unresolved components of the solution are
drawn from a probability distribution, and their effect on a small set of variables that are actually computed is evaluated via
statistical projection. The formalism resembles the projection methods of irreversible statistical mechanics, supplemented
by the systematic use of conditional expectations and new methods of solution for an auxiliary equation, the orthogonal
dynamics equation, needed to evaluate a non-Markovian memory term. The result of the computations is close to the best
possible estimate that can be obtained given the partial data. We present the constructions in detail together with several useful
variants, provide simple examples, and point out the relation to the fluctuation-dissipation formulas of statistical physics.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 05.10.-a; 05.40.-a
Keywords: Optimal prediction; Memory; Langevin equations; Orthogonal dynamics; Underresolution; Hamiltonian systems; Hermite
polynomials
1. Introduction
Many problems in science and engineering are described by nonlinear differential equations whose solutions
are too complicated to be properly resolved and/or where needed data are unavailable. The problem of predicting
the evolution of such systems has been addressed by the present authors and others in [1–11]. Nothing can be
predicted without some knowledge about the unresolved (subgrid) degrees of freedom. In the papers just cited it
is assumed that one possesses, as one often does, prior statistical information about the system in the form of an
initial probability distribution; what is sought is a mean solution with respect to this initial distribution, compatible
with the partial information available initially as well as with the limitations on the computing power one can bring
to bear. This mean solution is the conditional expectation of the solution given the partial initial data, and is the best
available estimate of the solution of the full problem.
Corresponding author.
E-mail addresses: chorin@math.berkeley.edu (A.J. Chorin), hald@math.berkeley.edu (O.H. Hald), raz@math.huji.ac.il (R. Kupferman).
The simplest construction of this conditional expectation, first-order optimal prediction ([3–5], see also below), produces a small system of ordinary differential equations and works well for a time that depends on the degree of underresolution and on the uncertainty in the initial conditions. This approximation is optimal in the class of Markovian approximations [10], but eventually exhibits errors because the influence of partial initial data on the distribution of the solutions weakens in time, and this loss of information is not captured; see [6,7]. As shown in the present paper, an accurate estimate of a subset of variables requires the addition of a memory term, and the resulting prediction scheme becomes a generalized Langevin equation, similar to those in irreversible statistical mechanics [12–16].
Some of the relations between conditional expectations and irreversible statistical mechanics have been discussed
in [6]. The memory depends on a solution of an auxiliary equation, the orthogonal dynamics equation, and in the
present paper we also present algorithms for finding this solution. We also explain how the machinery can lead to
novel ways of using prior measurements to predict the future behavior of complex systems. We apply our methods
to a simple model problem. Related, partial and more heuristic, constructions have been presented in [1,17].
Consider a system of ordinary differential equations
$$\frac{\mathrm d\varphi}{\mathrm dt} = R(\varphi), \qquad \varphi(0) = x, \tag{1}$$
where $\varphi$ and $x$ are $n$-dimensional vectors with components $\varphi_i$ and $x_i$, and $x$ ranges over a phase space $\Gamma$. We assume that only the first $m < n$ components of the initial datum, collected in $\hat x = (x_1, \ldots, x_m)$, are known, while the remaining components $\tilde x = (x_{m+1}, \ldots, x_n)$ are sampled from a given probability density; we write $\varphi(x,t)$ for the solution of (1) and $\hat\varphi(x,t)$ for its first $m$ components. If (1) is a Hamiltonian system with Hamiltonian $H$, one can use as initial distribution the canonical distribution with density $\rho(x) = Z^{-1}\,\mathrm e^{-H(x)}$, where $Z$ is a normalization constant. Hamiltonian systems are often of interest, and the canonical distribution is often natural for physical reasons. These choices simplify parts of the analysis.
Given $\rho$, functions on $\Gamma$ can be viewed as random variables, and one can use the terminology of probability theory. We define the expected value of $g$ by
$$E[g] = \int g(x)\,\rho(x)\,\mathrm dx.$$
We endow the space of functions on $\Gamma$ with the inner product $(f,g) = E[fg]$, which makes it a Hilbert space $L^2(\Gamma, \rho)$ ($L^2$ for brevity). If (1) is a Hamiltonian system and the probability density is $\rho(x) = Z^{-1}\exp(-H(x))$, then the Liouvillian $L$, defined by $Lg = \sum_{i=1}^n R_i(x)\,\partial g/\partial x_i$ (so that $\varphi_j(x,t) = \mathrm e^{tL}x_j$), is skew-adjoint in this Hilbert space.
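For completeness, here is the standard integration-by-parts argument behind the skew-adjointness (a sketch under the usual assumption that boundary terms vanish thanks to the decay of $\mathrm e^{-H}$); writing $x = (q,p)$ in canonical coordinates, so that $L = \sum_j (H_{p_j}\partial_{q_j} - H_{q_j}\partial_{p_j})$ as in (29) below,
$$(Lf, g) = \frac1Z\int \sum_j \left(H_{p_j}\,\partial_{q_j}f - H_{q_j}\,\partial_{p_j}f\right) g\,\mathrm e^{-H}\,\mathrm dx = -\frac1Z\int f \sum_j \left(H_{p_j}\,\partial_{q_j}g - H_{q_j}\,\partial_{p_j}g\right)\mathrm e^{-H}\,\mathrm dx = -(f, Lg),$$
because, after integrating by parts, the term produced by differentiating $H_{p_j}\mathrm e^{-H}$ with respect to $q_j$ cancels the one produced by differentiating $H_{q_j}\mathrm e^{-H}$ with respect to $p_j$.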
We now derive an equation, often referred to as the generalized Langevin equation, which is a reformulation of the equations of motion (1) for the resolved variables $\hat\varphi(x,t)$. The derivation uses projection operators: functions in $L^2$ are projected onto the subspace $\hat L^2$ of functions of the $m$-dimensional vector $\hat x$. Several different projections $P$ are considered:
(1) Let $f \in L^2$, and consider the orthogonal projection of $f$ onto the span of all functions of $\hat x$, given by
$$(Pf)(\hat x) = \frac{\int f(x)\,\rho(x)\,\mathrm d\tilde x}{\int \rho(x)\,\mathrm d\tilde x}, \qquad \mathrm d\tilde x = \mathrm dx_{m+1}\cdots\mathrm dx_n.$$
In the language of probability, $(Pf)(\hat x)$ is the conditional expectation of $f$ given $\hat x$ and is denoted by $E[f\,|\,\hat x]$; see [20]. $E[f\,|\,\hat x]$ is the best approximation of $f$ by a function of $\hat x$:
$$\|f - E[f\,|\,\hat x]\| \le \|f - h(\hat x)\|$$
for all functions $h$ in $\hat L^2$. $P$ is the nonlinear projection, used in [16] with a different interpretation, as well as in [3,4,6].
(2) Given $f \in L^2$, define
$$(Pf)(\hat x) = \sum_{i,j=1}^{m} a^{-1}_{ij}\,(f, x_i)\,x_j,$$
where $a^{-1}_{ij}$ are the entries of a matrix whose inverse has entries $a_{ij} = (x_i, x_j)$. This is the linear projection widely used in irreversible statistical mechanics (see [12,13,21]).
(3) More generally, pick a set of functions of $\hat x$, say $h_\beta(\hat x)$, $\beta = 1, \ldots, M$; for convenience, make them orthonormal: $(h_\beta, h_\gamma) = \delta_{\beta\gamma}$. Define a projection
$$(Pf)(\hat x) = \sum_{\beta=1}^{M} (f, h_\beta)\,h_\beta(\hat x).$$
If the $h_\beta$ span $\hat L^2$ as $M$ increases, the result approximates the conditional expectation $E[f\,|\,\hat x]$. This is the finite-rank projection; it interpolates between the linear projection and the conditional expectation (a small numerical illustration follows).
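To make the finite-rank construction concrete, here is a small numerical illustration in Python (a toy example of our own devising, unrelated to the model problem of Section 3): take two independent standard Gaussian variables, let $\hat x = x_1$, and project $f(x) = x_2^2 + x_1 x_2 + x_1$ onto the first three normalized Hermite polynomials in $x_1$, for which the exact conditional expectation is $E[f\,|\,x_1] = 1 + x_1$.

import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100_000)
x2 = rng.normal(size=100_000)

f = x2**2 + x1 * x2 + x1                                      # a function of the full state
basis = [np.ones_like(x1), x1, (x1**2 - 1.0) / np.sqrt(2.0)]  # orthonormal functions of x1

coeffs = [np.mean(f * h) for h in basis]                      # (f, h_beta) by Monte Carlo

# Evaluate the finite-rank projection at, e.g., x1 = 0.7; it should be close to 1.7.
x1_eval = 0.7
basis_eval = [1.0, x1_eval, (x1_eval**2 - 1.0) / np.sqrt(2.0)]
Pf_eval = sum(c * h for c, h in zip(coeffs, basis_eval))

The same pattern, with the basis (11) below, is what is used for the model problem of Section 3.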
We now follow the Mori–Zwanzig procedure [12,14,15,21]. We consider the equation of motion for a resolved coordinate $\varphi_j(x,t) = \mathrm e^{tL}x_j$, and split the time derivative, $R_j(\varphi(x,t)) = \mathrm e^{tL}Lx_j$, into a term that depends only on $\hat\varphi(x,t)$ plus a remainder:
$$\frac{\partial}{\partial t}\,\mathrm e^{tL}x_j = \mathrm e^{tL}Lx_j = \mathrm e^{tL}PLx_j + \mathrm e^{tL}QLx_j, \tag{3}$$
where $Q = I - P$.
The second term in (3) is rewritten with the help of the Dyson formula
$$\mathrm e^{tL} = \mathrm e^{tQL} + \int_0^t \mathrm e^{(t-s)L}\,PL\,\mathrm e^{sQL}\,\mathrm ds, \tag{5}$$
which yields the generalized Langevin equation
$$\frac{\partial}{\partial t}\varphi_j(x,t) = \hat R_j(\hat\varphi(x,t)) + \int_0^t K_j(\hat\varphi(x,t-s), s)\,\mathrm ds + F_j(x,t), \tag{6}$$
where $\hat R_j(\hat x) = (PLx_j)(\hat x)$, $F_j(x,t) = \mathrm e^{tQL}QLx_j$, and $K_j(\hat x, t) = PLF_j(x,t)$.
The various terms in Eq. (6) have conventional interpretations. The first term on the right-hand side is the Markovian contribution to $\partial_t\varphi_j(x,t)$: it depends only on the instantaneous value of the resolved $\hat\varphi(x,t)$. The second term depends on $\hat x$ through the values of $\hat\varphi(x,s)$ at times $s$ between 0 and $t$, and embodies a memory, a dependence on the past values of the resolved variables. Finally, the third term, which depends on full knowledge of the initial conditions $x$, lies in the null space of $P$ and can be viewed as noise with statistics determined by the initial conditions. The fact that the memory depends on the noise is known in the physics literature as a fluctuation-dissipation theorem. The specific form of this relation given in physics books is obtained when $P$ is the linear projection.
The last step is the multiplication of (6) by $P$ (the noise term drops out because $PF_j(x,t) = 0$):
$$\frac{\partial}{\partial t}\,P\varphi_j(x,t) = P\hat R_j(\hat\varphi(x,t)) + \int_0^t PK_j(\hat\varphi(x,t-s), s)\,\mathrm ds. \tag{7}$$
This identity involves only the known components $\hat x$ of the initial data. When $P$ is the conditional expectation, $P\varphi(x,t) = E[\varphi(x,t)\,|\,\hat x]$, the right-hand side of Eq. (7) is exactly what we want: the derivative of the average of $\varphi(x,t)$ conditioned by the initially given data.
Eqs. (6) and (7) are identities; their solution is exactly equivalent to the solution of the full problem (1) followed
by averaging. In practice these equations have to be solved approximately; we shall show below how to perform the
approximation term by term.
3. A model problem
We introduce a model problem to illustrate the formalism of the previous section and to test the accuracy of
various approximations described below.
Consider the dynamical system in $\Gamma = \mathbf R^4$:
$$\frac{\mathrm d\varphi_1}{\mathrm dt} = \varphi_2, \qquad \frac{\mathrm d\varphi_2}{\mathrm dt} = -\varphi_1(1+\varphi_3^2), \qquad \frac{\mathrm d\varphi_3}{\mathrm dt} = \varphi_4, \qquad \frac{\mathrm d\varphi_4}{\mathrm dt} = -\varphi_3(1+\varphi_1^2). \tag{8}$$
Eqs. (8) are derived from the Hamiltonian
$$H(x) = \tfrac12\left(x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_1^2 x_3^2\right),$$
with $(x_1, x_2)$ and $(x_3, x_4)$ canonical pairs of coordinates. This system describes two oscillators with a nonlinear coupling.
We take initial data randomly distributed with the canonical probability density $\rho(x) = Z^{-1}\mathrm e^{-H(x)}$, thus endowing the space of functions on $\Gamma$ with the inner product
$$(f,g) = \frac{\int f(x)\,g(x)\,\mathrm e^{-H(x)}\,\mathrm dx}{\int \mathrm e^{-H(x)}\,\mathrm dx}, \tag{9}$$
where $\mathrm dx = \mathrm dx_1\cdots\mathrm dx_4$. We retain only two of the four variables, $\varphi_1$ and $\varphi_2$; thus $\hat x = (x_1, x_2)$ and $\tilde x = (x_3, x_4)$.
The goal is to compute the averages of $\varphi_1(x,t)$ and $\varphi_2(x,t)$ over all initial choices of $x_3$ and $x_4$.
In Fig. 1, we plot the time evolution of the mean values of $\varphi_1(x,t)$ and $\varphi_2(x,t)$, given $\varphi_1(x,0) = 1$ and $\varphi_2(x,0) = 0$. This graph was obtained by a Monte Carlo calculation: we generated a collection of $5\times10^4$ initial conditions $x$ by fixing $\hat x = (1,0)$ and sampling $x_3, x_4$ from the canonical distribution. Each initial datum was evolved in time with an ODE solver. At each $t > 0$ the mean values of $\varphi_1(x,t)$ and $\varphi_2(x,t)$ were evaluated by averaging over the set of solutions. Note the decay of the mean solution towards its equilibrium value; this phenomenon has been explained in [6].
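For concreteness, here is a minimal Python sketch of such a Monte Carlo computation (ensemble size, time grid, solver, and tolerances are illustrative choices, not the ones used in the paper). Conditioned on $\hat x = (1,0)$, the canonical density makes $x_3$ and $x_4$ independent centered Gaussians with variances $1/(1+x_1^2)$ and 1, which is what the sampler below uses.

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, u):
    # Right-hand side of the model problem (8): two nonlinearly coupled oscillators.
    x1, x2, x3, x4 = u
    return [x2, -x1 * (1.0 + x3**2), x4, -x3 * (1.0 + x1**2)]

rng = np.random.default_rng(0)
n_samples = 5_000                  # illustrative; the paper uses 5 x 10^4
x1_0, x2_0 = 1.0, 0.0              # the given resolved initial data
t_grid = np.linspace(0.0, 10.0, 201)

means = np.zeros((len(t_grid), 2))
for _ in range(n_samples):
    # Sample the unresolved components from the conditional canonical density.
    x3_0 = rng.normal(scale=1.0 / np.sqrt(1.0 + x1_0**2))
    x4_0 = rng.normal()
    sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), [x1_0, x2_0, x3_0, x4_0],
                    t_eval=t_grid, rtol=1e-8, atol=1e-10)
    means += sol.y[:2].T
means /= n_samples                 # estimates of E[phi_1 | x_hat], E[phi_2 | x_hat] on t_grid

Later sketches in this paper reuse the imports and the function rhs defined here.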
We now write down explicitly each of the projections defined in Section 2:
(1) The conditional expectation of a function $f(x)$ is
$$(Pf)(\hat x) = E[f\,|\,\hat x] = \frac{\int f(x)\,\mathrm e^{-H(x)}\,\mathrm d\tilde x}{\int \mathrm e^{-H(x)}\,\mathrm d\tilde x} \tag{10}$$
with $\mathrm d\tilde x = \mathrm dx_3\,\mathrm dx_4$. The density $\mathrm e^{-H(x)}$ is Gaussian when $x_1, x_2$ are fixed, thus the integrals in (10) can often be calculated explicitly. For example,
$$Px_3^{2n} = \frac{(2n)!}{2^n n!}\,(1+x_1^2)^{-n}, \qquad Px_4^{2n} = \frac{(2n)!}{2^n n!}, \qquad n = 1, 2, \ldots.$$
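For the record, the computation behind these moments is elementary (a short derivation, using only the form of $H$ given above): with $x_1, x_2$ fixed,
$$\mathrm e^{-H(x)} \propto \mathrm e^{-\frac12(1+x_1^2)x_3^2}\,\mathrm e^{-\frac12 x_4^2},$$
so that, conditionally on $\hat x$, $x_3$ is a centered Gaussian with variance $(1+x_1^2)^{-1}$ and $x_4$ a centered Gaussian with variance 1. The even moments of a centered Gaussian variable with variance $\sigma^2$ are $\frac{(2n)!}{2^n n!}\,\sigma^{2n}$, and the odd moments vanish, which yields the expressions above.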
(2) It is easy to verify that $(x_1, x_1) = c = 0.715$, $(x_2, x_2) = 1$, and $(x_1, x_2) = 0$, and so the linear projection is
$$(Pf)(\hat x) = \frac{(f, x_1)}{c}\,x_1 + (f, x_2)\,x_2.$$
(3) For the finite-rank projection we need the inner product of functions of $\hat x$; integrating out $x_3$ and $x_4$ in (9) gives
$$(f,g) = \frac{1}{Z}\int f(\hat x)\,g(\hat x)\,\frac{\mathrm e^{-(1/2)x_1^2}\,\mathrm e^{-(1/2)x_2^2}}{2\pi\sqrt{1+x_1^2}}\,\mathrm d\hat x,$$
where
$$Z = \int \frac{\mathrm e^{-(1/2)x_1^2}\,\mathrm e^{-(1/2)x_2^2}}{2\pi\sqrt{1+x_1^2}}\,\mathrm d\hat x = 0.78964.$$
Let $\alpha > -\tfrac12$ be a parameter. For each fixed value of $\alpha$ the following family of functions constitutes an orthonormal basis in the space of square-integrable functions of $\hat x$:
$$h_\beta(\hat x) = Z^{1/2}\,(1+x_1^2)^{1/4}\,H^\alpha_{\beta_1}(x_1)\,H^\alpha_{\beta_2}(x_2), \qquad \beta = (\beta_1, \beta_2), \tag{11}$$
where $H^\alpha_k(z) = (1+2\alpha)^{1/4}\,H_k(\sqrt{1+2\alpha}\,z)\,\mathrm e^{-\alpha z^2/2}$, and the $H_k(z)$ are normalized Hermite polynomials satisfying the recursion relation
$$H_0(z) = 1, \qquad H_1(z) = z, \qquad H_k(z) = \frac{1}{\sqrt k}\,z\,H_{k-1}(z) - \sqrt{\frac{k-1}{k}}\,H_{k-2}(z).$$
For future use, we also note that
$$\frac{\mathrm d}{\mathrm dz}H^\alpha_k(z) = \sqrt{(1+2\alpha)k}\;H^\alpha_{k-1}(z) - \alpha z\,H^\alpha_k(z). \tag{12}$$
The span of a finite collection of these functions changes when $\alpha$ changes; we will use the freedom of choosing $\alpha$ to optimize the rate at which the finite-rank projection converges to the conditional expectation.
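A minimal Python sketch of the recursion quoted above for the normalized Hermite polynomials (the function name and the vectorized interface are our own choices; numpy is assumed imported as in the earlier sketch):

def hermite_normalized(kmax, z):
    # Return the array [H_0(z), ..., H_kmax(z)] using the three-term recursion
    # H_k = z H_{k-1} / sqrt(k) - sqrt((k-1)/k) H_{k-2}.
    z = np.asarray(z, dtype=float)
    H = np.empty((kmax + 1,) + z.shape)
    H[0] = 1.0
    if kmax >= 1:
        H[1] = z
    for k in range(2, kmax + 1):
        H[k] = z * H[k - 1] / np.sqrt(k) - np.sqrt((k - 1) / k) * H[k - 2]
    return H

With this normalization the $H_k$ are orthonormal with respect to the standard Gaussian weight, so sample averages of $H_j(X)H_k(X)$ over standard normal draws $X$ should be close to $\delta_{jk}$; this is a convenient check of the implementation.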
Consider the problem (1) and suppose we know a large number of its solutions $\varphi(x,t)$ for various initial conditions $x$ drawn from the distribution $\rho$; these solutions may come from a prior Monte Carlo computation or from experiment. The best $L^2$ estimate of the solution when $\hat x$ is given is $E[\varphi(x,t)\,|\,\hat x]$.
Let $h_\beta(\hat x)$, $\beta$ in some finite index set $I$, be a set of orthonormal basis functions; we approximate the conditional expectation $E[\varphi_j(x,t)\,|\,\hat x]$ by a finite-rank projection
$$E[\varphi_j(x,t)\,|\,\hat x] \approx \sum_{\beta\in I} (\varphi_j(t), h_\beta)\,h_\beta(\hat x) \equiv \sum_{\beta\in I} c_{j\beta}(t)\,h_\beta(\hat x), \tag{13}$$
where $c_{j\beta}(t) = (\varphi_j(t), h_\beta)$; these coefficients are unconditional averages and do not refer to any particular initial datum $\hat x$.
To see how the choice of the scaling parameter $\alpha$ affects the accuracy of such expansions, consider first a one-dimensional example: the function $\cos(xt)$, regarded as a function of $x$ with $t$ a parameter, square integrable with respect to the inner product
$$(f,g) = \frac{1}{\sqrt{2\pi}}\int f(x)\,g(x)\,\mathrm e^{-(1/2)x^2}\,\mathrm dx, \tag{14}$$
is approximated by the finite sum
$$\cos(xt) \approx \sum_{k=0}^{N} a_k(t)\,H^\alpha_k(x), \tag{15}$$
where
$$a_k(t) = \frac{1}{\sqrt{2\pi}}\int \cos(xt)\,H^\alpha_k(x)\,\mathrm e^{-(1/2)x^2}\,\mathrm dx.$$
In Fig. 2, we compare the variation in time of the $L^2$ norm of the function $\cos(xt)$ (solid line) with the $L^2$ norm of the finite sum (15) with $N = 7$ terms (dashed line). The dotted lines represent the contributions from $a_k^2(t)$ for $k = 1, 3, 5, 7$; by symmetry $a_{2k}(t) = 0$. The four graphs correspond to four different values of the parameter $\alpha$. The finite sums approximate $\cos(xt)$ well for short times; the larger $t$, the more modes are needed for an accurate approximation. These graphs demonstrate that a proper scaling of the basis is important for accuracy with an acceptable number of terms.
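As an illustration of how such coefficients can be computed, here is a sketch for the unscaled case $\alpha = 0$, in which $H^\alpha_k$ reduces to the normalized Hermite polynomial $H_k$; the coefficients $a_k(t)$ are then plain expectations over a standard normal variable and can be estimated by Monte Carlo (sample size and time grid are arbitrary choices, and hermite_normalized is the helper defined above):

rng = np.random.default_rng(2)
X = rng.normal(size=200_000)                 # samples of a standard normal variable
t_vals = np.linspace(0.0, 6.0, 121)
H = hermite_normalized(7, X)                 # H_0(X), ..., H_7(X)

# a_k(t) = E[cos(X t) H_k(X)], estimated for every k and t by averaging over the samples.
a = np.array([[np.mean(np.cos(X * t) * H[k]) for t in t_vals] for k in range(8)])

norm_exact = 0.5 * (1.0 + np.exp(-2.0 * t_vals**2))   # ||cos(xt)||^2 = E[cos^2(Xt)]
norm_truncated = (a**2).sum(axis=0)                    # ||finite sum||^2 = sum_k a_k(t)^2

Comparing norm_truncated with norm_exact as a function of $t$ reproduces the kind of information displayed in Fig. 2.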
We now return to the model problem (8) and approximate the conditional expectation $E[\varphi_1(x,t)\,|\,\hat x]$ by a finite-rank projection of the form (13), with the $h_\beta(\hat x)$ given by (11). The functions $c_{1\beta}(t)$ are evaluated by averaging the products $\varphi_1(x,t)\,h_\beta(\hat x)$ over a collection of $5\times10^4$ numerical solutions of (8) with initial conditions drawn from the canonical distribution.
Fig. 2. Solid lines: the variation of the $L^2$ norm of $\cos(xt)$ in the inner product space (14). Dashed lines: the $L^2$ norm of the finite-rank approximation (15) with $N = 7$ terms. Dotted lines: $a_k^2(t)$ for $k = 1, 3, 5, 7$. The four plots correspond to: (a) $\alpha = 0$, (b) $\alpha = 0.5$, (c) $\alpha = 1$, and (d) $\alpha = 2$.
Fig. 3. $c_{1\beta}(t)$ versus $t$ for $\alpha = 0$ and various values of $\beta$. The two dominant functions correspond to $\beta = (1,0)$ and $\beta = (0,1)$.
In Fig. 3, we plot several of the functions $c_{1\beta}(t)$ with $\alpha = 0$, and observe that the two dominant contributions come from the components $\beta = (1,0)$ and $\beta = (0,1)$. This is consistent with the assumption often made in physics that the lower order terms are the most significant.
In Fig. 4, we compare the mean solution $E[\varphi_1(x,t)\,|\,\hat x]$, $\hat x = (1,0)$, generated by Monte Carlo sampling (dotted lines), with the finite-rank projection (13) (solid lines). The top two graphs correspond to $\alpha = 0$ and to (a) $3\times3$ and (b) $6\times6$ basis functions. In the first case, the finite-rank approximation deviates from the true solution already at short times, although the qualitative properties of the mean solution are well captured. With four times as many basis functions, better accuracy is preserved for a long time. In Fig. 4(c) and (d), the value of $\alpha$ is modified, indicating how a significant reduction in computational effort can be obtained by scaling the basis functions.
Finally, note that the integrals that define the coefficients in the expansions are over all the components of $x$ while the basis functions are functions only of $\hat x$; therefore the series expansion is an expansion, not of $\varphi(x,t)$, which may be very noisy, but of $E[\varphi(x,t)\,|\,\hat x]$, which is much smoother.
We now examine the Markovian term $\hat R(\hat\varphi(x,t))$ in the Langevin equation (6). For the model problem (8) and $P$ the conditional expectation, this term can be calculated explicitly:
$$\hat R(\hat\varphi(x,t)) = \begin{pmatrix} \varphi_2(x,t) \\[6pt] -\varphi_1(x,t)\left(1 + \dfrac{1}{1+\varphi_1^2(x,t)}\right) \end{pmatrix}.$$
Fig. 4. Comparison of the mean solution $E[\varphi_1(x,t)\,|\,\hat x]$ (dotted lines) and the finite-rank approximation $\sum_{\beta\in I} c_{1\beta}(t)\,h_\beta(\hat x)$ (solid lines) for the initial data $\hat x = (1,0)$. The different graphs correspond to: (a) $I = \{0,1,2\}^2$ and $\alpha = 0$, (b) $I = \{0,\ldots,5\}^2$ and $\alpha = 0$, (c) $I = \{0,1,2\}^2$ and $\alpha = 1$, (d) $I = \{0,\ldots,5\}^2$ and $\alpha = 2$.
This expression is a function of all the components of $x$, not only of the $m = 2$ that we wish to work with. Next we apply the projection $P$, and we do so now by interchanging the evaluations of $\hat R$ and $P$:
$$P\hat R(\hat\varphi(x,t)) \approx \hat R(P\hat\varphi(x,t)).$$
The reader should not be horrified by the commutation of an average with a nonlinear function. The randomness in the problem is a reflection of the unresolved degrees of freedom in $x$. One alternative to our methodology is neglecting these degrees of freedom, which removes the randomness and makes the commutation perfectly legitimate. One is better off preserving these degrees of freedom and mistreating them slightly rather than omitting them. Another possible construction, more accurate but more expensive, consists of storing samples of $\hat R(\hat\varphi(x,t))$ for initial data drawn from the initial distribution and then projecting $\hat R$ just as we projected $\hat\varphi(x,t)$ in the previous section.
In what was called first-order optimal prediction in [3–5], the second and third terms in the generalized Langevin equation (6) are dropped; writing $\psi(t) = P\hat\varphi(x,t)$ one obtains the approximate equations:
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)), \qquad \psi(0) = \hat x. \tag{16}$$
A convergence proof for this approximation applied to a nonlinear partial differential equation can be found in [10].
For the model problem (8), $\hat\varphi(x,t) = (\varphi_1(x,t), \varphi_2(x,t))$, and the first-order optimal prediction equations are
$$\frac{\mathrm d\psi_1}{\mathrm dt} = \psi_2, \qquad \frac{\mathrm d\psi_2}{\mathrm dt} = -\psi_1\left(1 + \frac{1}{1+\psi_1^2}\right). \tag{17}$$
As observed in [6], Eqs. (16) are Hamilton's equations derived from the effective Hamiltonian
$$\hat H(\hat x) = -\log \int \mathrm e^{-H(x)}\,\mathrm d\tilde x.$$
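A minimal sketch of integrating the first-order optimal prediction system (17), reusing the imports of the earlier Monte Carlo sketch (time horizon and tolerances are illustrative):

def op_rhs(t, psi):
    # First-order optimal prediction (17) for the model problem.
    p1, p2 = psi
    return [p2, -p1 * (1.0 + 1.0 / (1.0 + p1**2))]

sol_op = solve_ivp(op_rhs, (0.0, 10.0), [1.0, 0.0],
                   t_eval=np.linspace(0.0, 10.0, 201), rtol=1e-8)
# sol_op.y[0] approximates E[phi_1(x,t) | x_hat = (1,0)] at short times, but it does not
# reproduce the decay of the conditional mean at later times: (17) conserves the
# effective Hamiltonian, so its solutions keep oscillating.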
We turn now to a general formalism for the evaluation of the noise $F(x,t)$ and the memory kernel $K(\hat x,t)$. For each $j \le m$, the component $F_j(x,t)$ of the noise is the solution of the orthogonal dynamics equation
$$\frac{\partial}{\partial t}F_j(x,t) = QLF_j(x,t) = LF_j(x,t) - PLF_j(x,t), \tag{18}$$
$$F_j(x,0) = QLx_j = R_j(x) - \hat R_j(\hat x).$$
Each $F_j$ is computed independently of the others. Eq. (18) is equivalent to the Dyson formula:
$$F_j(x,t) = \mathrm e^{tL}F_j(x,0) - \int_0^t \mathrm e^{(t-s)L}\,PLF_j(x,s)\,\mathrm ds. \tag{19}$$
This is a Volterra integral equation for $F_j(x,t)$, which we next decompose into its components in an orthonormal basis. Let $P$ be a finite-rank projection. Then
$$K_j(\hat x, s) = PLF_j(x,s) = \sum_{\beta\in I} a_{j\beta}(s)\,h_\beta(\hat x), \tag{20}$$
where
$$a_{j\beta}(s) = (LF_j(s), h_\beta).$$
Consequently,
$$\mathrm e^{(t-s)L}\,PLF_j(x,s) = \sum_{\beta\in I} a_{j\beta}(s)\,h_\beta(\hat\varphi(x,t-s)).$$
Substitute this equation into (19), multiply both sides by $L$, and take the inner product with $h_\gamma$; the result is
$$(LF_j(t), h_\gamma) = (L\,\mathrm e^{tL}F_j(0), h_\gamma) - \int_0^t \sum_{\beta\in I} a_{j\beta}(s)\,(L\,\mathrm e^{(t-s)L}h_\beta, h_\gamma)\,\mathrm ds.$$
This is a Volterra integral equation for the functions $a_{j\gamma}(t)$, which can be rewritten as follows:
$$a_{j\gamma}(t) = f_{j\gamma}(t) - \int_0^t \sum_{\beta\in I} a_{j\beta}(s)\,g_{\beta\gamma}(t-s)\,\mathrm ds, \tag{21}$$
where
$$f_{j\gamma}(t) = (L\,\mathrm e^{tL}F_j(0), h_\gamma), \qquad g_{\beta\gamma}(t) = (L\,\mathrm e^{tL}h_\beta, h_\gamma).$$
The functions $f_{j\gamma}(t)$, $g_{\beta\gamma}(t)$ can be found by averaging over a collection of experiments or simulations, with initial conditions drawn from the initial distribution without reference to any specific realization; they are time-correlation functions (i.e., inner products of the form $(\mathrm e^{tL}g_1, g_2)$ for some functions $g_1, g_2$) and do not depend on the orthogonal dynamics. The number of components $a_{j\beta}$ depends only on the resolution needed in the space of functions of $\hat x$. Once the $a_{j\beta}(t)$ have been calculated, the memory kernel is given by (20).
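A time correlation $(\mathrm e^{tL}g_1, g_2)$ is simply the average of $g_1$ evaluated along the flow times $g_2$ evaluated at the initial point, over initial data drawn from the canonical density. Here is a minimal Monte Carlo sketch (the sampler, ensemble size, and function names are our own; rhs and the imports come from the earlier sketch in Section 3):

def sample_x1(rng):
    # Rejection sampling from the x1 marginal, proportional to exp(-x1^2/2)/sqrt(1+x1^2),
    # with the standard normal as the envelope.
    while True:
        z = rng.normal()
        if rng.uniform() < 1.0 / np.sqrt(1.0 + z * z):
            return z

def correlation(g1, g2, t_grid, n_samples=2000, seed=1):
    # Monte Carlo estimate of (e^{tL} g1, g2) = E[g1(phi(x,t)) g2(x)], x ~ exp(-H)/Z.
    rng = np.random.default_rng(seed)
    acc = np.zeros(len(t_grid))
    for _ in range(n_samples):
        x1 = sample_x1(rng)
        x0 = np.array([x1, rng.normal(),
                       rng.normal(scale=1.0 / np.sqrt(1.0 + x1**2)), rng.normal()])
        sol = solve_ivp(rhs, (t_grid[0], t_grid[-1]), x0, t_eval=t_grid, rtol=1e-8)
        acc += np.array([g1(sol.y[:, k]) for k in range(len(t_grid))]) * g2(x0)
    return acc / n_samples

The functions $f_{j\gamma}$ and $g_{\beta\gamma}$ above are of exactly this form, with $g_1 = LF_j(0)$ or $Lh_\beta$ (known functions of the full state $x$) and $g_2 = h_\gamma$ (a function of $\hat x$).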
Finally, we perform the projection (7) of the Mori–Zwanzig equation. The finite-rank projection of the memory term
$$\int_0^t P\,\mathrm e^{(t-s)L}\,K_j(\hat x, s)\,\mathrm ds$$
is
$$\int_0^t \sum_{\beta,\gamma\in I} a_{j\beta}(s)\,\Phi_{\beta\gamma}(t-s)\,h_\gamma(\hat x)\,\mathrm ds,$$
where
$$\Phi_{\beta\gamma}(t) = (\mathrm e^{tL}h_\beta, h_\gamma).$$
This implementation of $P$ relies on an expansion of the several $h_\beta(\hat\varphi(x,t))$ in scaled Hermite series and cannot be expected to be accurate with a moderate number of terms.
To formulate the algorithms compactly we introduce the matrices $A$, $F$, $G$, $\Phi$: $A_{j\beta} = a_{j\beta}$, $F_{j\beta} = f_{j\beta}$, $G_{\beta\gamma} = g_{\beta\gamma}$, and $\Phi_{\beta\gamma}$ as above, for $1 \le j \le m$ and $\beta, \gamma \in I$. First let $F$, $G$ be given and solve
$$A(t) = F(t) - \int_0^t A(s)\,G(t-s)\,\mathrm ds$$
for $A$. Writing $\psi(t) \approx P\hat\varphi(x,t)$ as before, and letting $h$ denote the vector of basis functions $(h_\beta)_{\beta\in I}$, one then integrates either
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)) + \int_0^t A(s)\,h(\psi(t-s))\,\mathrm ds, \qquad \psi(0) = \hat x, \tag{22}$$
or
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)) + \int_0^t A(s)\,\Phi(t-s)\,h(\hat x)\,\mathrm ds, \qquad \psi(0) = \hat x. \tag{23}$$
Some results are shown in Fig. 6. The thick line represents the exact solution $E[\varphi_1(x,t)\,|\,\hat x]$, the thinner line is the function $\psi_1(t)$ resulting from the integration of (22), whereas the dashed line results from the integration of (23). The quadratures in the Volterra equation were performed by the trapezoidal rule, 21 basis functions $h_\beta$ were used, the parameter in the Hermite expansion was $\alpha = 7/6$, and the ordinary differential equations were solved by a Runge–Kutta method, modified to take into account the integral term. The Monte Carlo summations used $10^4$ sample solutions.
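For reference, here is a minimal sketch of a trapezoidal-rule solver for the matrix Volterra equation $A(t) = F(t) - \int_0^t A(s)G(t-s)\,\mathrm ds$ on a uniform time grid (array shapes and the interface are assumptions of ours, not the code used for Fig. 6):

import numpy as np

def solve_volterra(F, G, dt):
    # F has shape (n_steps, m, M) and G has shape (n_steps, M, M), both sampled on a
    # uniform grid with spacing dt; the result A has the same shape as F.
    n_steps, m, M = F.shape
    A = np.zeros_like(F)
    A[0] = F[0]                           # the integral vanishes at t = 0
    lhs = np.eye(M) + 0.5 * dt * G[0]     # coefficient of the unknown A[n] (end-point weight)
    for n in range(1, n_steps):
        rhs = F[n] - 0.5 * dt * A[0] @ G[n]
        for k in range(1, n):
            rhs -= dt * A[k] @ G[n - k]
        A[n] = np.linalg.solve(lhs.T, rhs.T).T   # solves A[n] (I + dt/2 G[0]) = rhs
    return A

The computed A can then be inserted into (22) or (23), whose integral terms are discretized with the same trapezoidal weights inside the Runge–Kutta steps.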
To improve the accuracy of these calculations one can: (1) increase the number of Monte Carlo samples, (2) decrease the time step, (3) allow the coefficient $\alpha$ to change with time, (4) truncate the integral kernel at an appropriate time.
7. Short-memory approximations
In some situations of interest one can expect the support in time of the integrand in Eq. (7) to be small, and this can simplify the calculations. To analyze this situation, start from the Dyson formula (5), which can be the starting point of a perturbative evaluation of $\mathrm e^{tQL}$. The zeroth-order approximation of this relation is
$$\mathrm e^{tQL} \approx \mathrm e^{tL}, \tag{24}$$
in which one replaces the flow in the orthogonal complement of the range of $P$ by the real flow induced by $L$. Now consider the second term in Eq. (6):
$$\int_0^t \mathrm e^{(t-s)L}\,PL\,\mathrm e^{sQL}QLx_j\,\mathrm ds = \int_0^t \mathrm e^{(t-s)L}\,PLQ\,\mathrm e^{sQL}QLx_j\,\mathrm ds,$$
where the insertion of the extra $Q$ is legitimate since $\mathrm e^{sQL}$ maps functions in the null space of $P$ back into the same subspace. Adding and subtracting equal quantities, we find:
$$\int_0^t \mathrm e^{(t-s)L}\,PLQ\,\mathrm e^{sQL}QLx_j\,\mathrm ds = \int_0^t \mathrm e^{(t-s)L}\,PLQ\,\mathrm e^{sL}QLx_j\,\mathrm ds + \int_0^t \mathrm e^{(t-s)L}\,PLQ\left(\mathrm e^{sQL} - \mathrm e^{sL}\right)QLx_j\,\mathrm ds.$$
If the correlations $(L\,\mathrm e^{sQL}QLx_j, h_\beta)$ are significant only over short times, the approximation (24) provides an acceptable approximation without requiring the solution of the orthogonal dynamics equation. In statistical physics, one often makes an even more drastic approximation, in which it is assumed that the correlations vanish for $t \neq 0$ (see, e.g., the high frequency approximation in [12, p. 86]). Some applications of the short-memory approximation have been presented in [1].
A short-memory approximation of particular interest can be derived as follows. Write the memory term of the Langevin equation as
$$\int_0^t K_j(\hat\varphi(x,t-s), s)\,\mathrm ds. \tag{25}$$
Expand the integrand in a Taylor series about $s = 0$, retaining only the leading term. The memory term reduces to
$$\int_0^t K_j(\hat\varphi(x,t), 0)\,\mathrm ds = t\,S_j(\hat\varphi(x,t)), \tag{26}$$
where
$$S_j(\hat x) = PLQLx_j.$$
An alternative derivation uses the formalism of the previous section: write (25) as
$$\int_0^t \mathrm e^{(t-s)L}\,PL\,\mathrm e^{sQL}QR_j(x)\,\mathrm ds = \int_0^t L\,\mathrm e^{(t-s)L}\,\mathrm e^{sQL}QR_j(x)\,\mathrm ds - \int_0^t \mathrm e^{(t-s)L}\,\mathrm e^{sQL}QLQR_j(x)\,\mathrm ds,$$
where we have used the commutation of $L$ and $QL$ with $\mathrm e^{tL}$ and $\mathrm e^{tQL}$, respectively. At this point, make the approximation (24) and replace the evolution operator of the orthogonal dynamics by the evolution operator of the Liouvillian flow, which eliminates the $s$ dependence of both integrands, and (26) follows readily.
Writing $\psi(t) \approx P\hat\varphi(x,t)$, the $t$-damping equations are
$$\frac{\mathrm d\psi}{\mathrm dt}(t) = \hat R(\psi(t)) + t\,S(\psi(t)), \qquad \psi(0) = \hat x. \tag{27}$$
The form of this equation is surprising. All that remains of the memory is the coefficient t and one is left with a
non-autonomous system of ordinary differential equations. No previously computed averages are needed.
For the model problem (8), Eq. (27) takes the explicit form
$$\frac{\mathrm d\psi_1}{\mathrm dt} = \psi_2, \qquad \frac{\mathrm d\psi_2}{\mathrm dt} = -\psi_1\left(1 + \frac{1}{1+\psi_1^2}\right) - 2t\,\frac{\psi_1^2\,\psi_2}{(1+\psi_1^2)^2}. \tag{28}$$
In Fig. 7, we present a comparison of the component $\psi_1(t)$ of the solution of Eq. (28) with $E[\varphi_1(x,t)\,|\,\hat x]$. The new term, $tS(\psi(t))$, leads to a damping of the solution.
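A minimal sketch of integrating the non-autonomous $t$-damping system (28), reusing the imports of the earlier sketches (horizon and tolerances are illustrative):

def t_damping_rhs(t, psi):
    # t-damping equations (28): the first-order optimal prediction right-hand side
    # plus the memory term approximated by t * S(psi).
    p1, p2 = psi
    markov = [p2, -p1 * (1.0 + 1.0 / (1.0 + p1**2))]
    damping = [0.0, -2.0 * t * p1**2 * p2 / (1.0 + p1**2) ** 2]
    return [markov[0] + damping[0], markov[1] + damping[1]]

sol_damped = solve_ivp(t_damping_rhs, (0.0, 10.0), [1.0, 0.0],
                       t_eval=np.linspace(0.0, 10.0, 201), rtol=1e-8)
# sol_damped.y[0] can be compared with the Monte Carlo estimate of E[phi_1 | x_hat]
# computed in Section 3; unlike (17), this solution decays.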
We now show that the last term in Eq. (28), which approximates the memory term, leads to a decay just like the original term did. Set $q = (\hat q, \tilde q)$, $p = (\hat p, \tilde p)$, and write $L$, $P$, $H$ as
$$L = \sum_{j=1}^{n} \left(H_{p_j}\,\partial_{q_j} - H_{q_j}\,\partial_{p_j}\right), \tag{29}$$
Thus, if the Hamiltonian $H(q,p)$ is separable into a kinetic and a potential energy, then the effective Hamiltonian $\hat H$ for the solution of the $t$-damping equation (32) is a decreasing function of time. Similar results can be obtained for Fourier methods applied to Euler's equations in two- and three-dimensional flows.
Proof. We shall write Eq. (32) in terms of $H$ and $\hat H$. It follows from (29)–(31) that
$$Lp_i = -H_{q_i}, \qquad PLp_i = -PH_{q_i} = -\hat H_{q_i}, \qquad QLp_i = (I-P)Lp_i = -H_{q_i} + \hat H_{q_i}$$
for $i = 1, 2, \ldots, m$. Since $H_{q_i}$ and $\hat H_{q_i}$ do not depend on $p$ (due to the assumed separability of $H$), Eqs. (29) and (30) imply
$$PLQLp_i = -\sum_{j=1}^{n} P\left[H_{p_j}\left(H_{q_i} - \hat H_{q_i}\right)_{q_j}\right]. \tag{33}$$
9. Conclusions
We have presented a general method for finding the best approximation of part of the solution of a partially
resolved initial value problem, conditioned by partial initial data. We presented a formula, Eq. (6), which describes
exactly the evolution in time of a few components of an underresolved problem. We have also introduced a collection
of methods for approximating the general formula, in general (but not always) requiring prior information that can be obtained by Monte Carlo or by experiment without reference to specific initial data. Some of the methods are expensive in
practice but provide theoretical insight; some require additional information, for example about correlation times,
which may be available; some are limited in accuracy by the properties of expansions in orthogonal functions.
Many variants are possible. Theoretical issues remain to be discussed in greater depth. The existence of orthogonal
dynamics is established in [22].
We applied the methods to a model problem which is easy to understand and analyze but is not typical of the
problems one encounters in practice. We have not discussed properties of real problems such as separation of scales
or convergence to a partial differential equation which help the application of our methods. We have not explored
the effect of the choice of initial distribution, for example the possible advantages of a microcanonical distribution.
Once the general framework has been established, further variants and assumptions are best discussed within the
context of specific applications.
Acknowledgements
We would like to thank Prof. G.I. Barenblatt, Dr. E. Chorin, Mr. E. Ingerman, Dr. A. Kast, Mr. K. Lin, Mr. P.
Okunev, Mr. B. Seibold, Mr. P. Stinis, Prof. J. Strain, and Prof. B. Turkington for helpful discussions and comments.
This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy
Research of the US Department of Energy under Contract DE-AC03-76-SF00098, and in part by the National
Science Foundation under Grant DMS98-14631. RK was supported by the Israel Science Foundation founded by
the Israel Academy of Sciences and Humanities and by the Alon Fellowship.
References
[1] J. Bell, A. Chorin, W. Crutchfield, Stochastic optimal prediction with application to averaged Euler equations, in: C. Lin (Ed.), Proceedings of the Seventh National Conference on Comput. Fluid Mech., Pingtung, Taiwan, 2000, pp. 1–13.
[2] A. Chorin, Probability, Mechanics, and Irreversibility, Lecture Notes, Mathematics Department, University of California, Berkeley, CA, 2000.
[3] A. Chorin, A. Kast, R. Kupferman, Optimal prediction of underresolved dynamics, Proc. Natl. Acad. Sci. USA 95 (1998) 4094–4098.
[4] A. Chorin, A. Kast, R. Kupferman, On the prediction of large-scale dynamics using unresolved computations, AMS Contemporary Math. 238 (1999) 53–75.
[5] A. Chorin, A. Kast, R. Kupferman, Unresolved computation and optimal prediction, Comm. Pure Appl. Math. 52 (1999) 1231–1254.
[6] A. Chorin, O. Hald, R. Kupferman, Optimal prediction and the Mori–Zwanzig representation of irreversible processes, Proc. Natl. Acad. Sci. USA 97 (2000) 2968–2973.
[7] A. Chorin, R. Kupferman, D. Levy, Optimal prediction for Hamiltonian partial differential equations, J. Comp. Phys. 162 (2000) 267–297.
[8] N. Goldenfeld, A. McKane, Q. Hou, Block spins for partial differential equations, J. Statist. Phys. 93 (1998) 699–714.
[9] O. Hald, Optimal prediction of the Klein–Gordon equation, Proc. Natl. Acad. Sci. USA 96 (1999) 4774–4779.
[10] O. Hald, R. Kupferman, Convergence of optimal prediction for nonlinear Hamiltonian systems, SIAM J. Numer. Anal. 39 (2001) 983–1000.
[11] A. Kast, Optimal prediction of stiff oscillatory mechanics, Proc. Natl. Acad. Sci. USA 97 (2000) 6253–6257.
[12] D. Evans, G. Morriss, Statistical Mechanics of Nonequilibrium Liquids, Academic Press, London, 1990.
[13] E. Fick, G. Sauermann, The Quantum Statistics of Dynamical Processes, Springer, Berlin, 1990.
[14] H. Mori, Transport, collective motion, and Brownian motion, Prog. Theoret. Phys. 33 (1965) 423–450.
[15] R. Zwanzig, Nonlinear generalized Langevin equations, J. Statist. Phys. 9 (1973) 215–220.
[16] R. Zwanzig, Problems in nonlinear transport theory, in: L. Garrido (Ed.), Systems Far from Equilibrium, Springer, New York, 1980, pp. 198–225.
[17] A. Chorin, O. Hald, R. Kupferman, Non-Markovian optimal prediction, Monte Carlo Meth. Appl. 7 (2001) 99–109.
[18] E. Hopf, Statistical hydromechanics and functional calculus, J. Rat. Mech. Anal. 1 (1952) 87–114.
[19] A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer, New York, 1983.
[20] K. Chung, A Course in Probability Theory, Academic Press, New York, 1974.
[21] D. Forster, Hydrodynamic Fluctuations, Broken Symmetry and Correlation Functions, Benjamin, Reading, MA, 1975.
[22] D. Givon, O. Hald, R. Kupferman, Existence of orthogonal dynamics, Preprint, 2001.
[23] A. Chorin, Hermite expansions in Monte-Carlo computations, J. Comp. Phys. 8 (1971) 472–482.
[24] D. Gottlieb, S. Orszag, Numerical Analysis of Spectral Methods, SIAM, Philadelphia, PA, 1977.