The Concept of Stability in Numerical Mathematics
Wolfgang Hackbusch
Springer Series in Computational Mathematics 45
Editorial Board: R. Bank, R.L. Graham, J. Stoer, R. Varga, H. Yserentant
Wolfgang Hackbusch, MPI für Mathematik in den Naturwissenschaften, Leipzig, Germany
ISSN 0179-3632
ISBN 978-3-642-39385-3 ISBN 978-3-642-39386-0 (eBook)
DOI 10.1007/978-3-642-39386-0
Springer Heidelberg New York Dordrecht London
Contents
1 Introduction
3 Quadrature
3.1 Setting of the Problem and Examples
3.1.1 Quadrature Formulae
3.1.2 Interpolatory Quadrature
3.1.3 Newton–Cotes Quadrature
3.1.4 Gauss Quadrature
3.2 Consistency
3.3 Convergence
3.3.1 Definitions and Estimates
3.3.2 Functionals, Dual Norm, and Dual Space
3.4 Stability
3.4.1 Amplification of the Input Error
3.4.2 Definition of Stability
3.4.3 Stability of Particular Quadrature Formulae
3.4.4 Romberg Quadrature
3.4.5 Approximation Theorem of Weierstrass
3.4.6 Convergence Theorem
4 Interpolation
4.1 Interpolation Problem
4.2 Convergence and Consistency
4.3 Stability
4.4 Equivalence Theorem
4.5 Instability of Polynomial Interpolation
4.6 Is Stability Important for Practical Computations?
4.7 Tensor Product Interpolation
4.8 Stability of Piecewise Polynomial Interpolation
4.8.1 Case of Local Support
4.8.2 Spline Interpolation as an Example for Global Support
4.9 From Point-wise Convergence to Operator-Norm Convergence
4.10 Approximation
References
Index
List of Symbols

Greek Letters
φ(x_i, η_i, [η_{i+1},] h; f)   function defining an explicit [implicit] one-step method; cf. §5.1.2, §5.4.1
φ(x_j, η_{j+r−1}, …, η_j, h; f)   function defining a multistep method; cf. (5.20a)
ψ(ζ)   characteristic polynomial of a multistep method; cf. (5.21a)
Ω_n   grid for difference method; cf. §7.2
Latin Letters
analytical tools from functional analysis are involved, namely Weierstrass' approximation theorem and the uniform boundedness theorem.
Interpolation, treated in Chapter 4, follows the same pattern as Chapter 3. In both chapters one can ask how important the stability statement sup C_n < ∞ is if one wants to perform only a single quadrature or interpolation for a fixed n. In fact, polynomial interpolation is unstable, but when applied to functions of certain classes it behaves quite well.
This is different in Chapter 5, where one-step and multistep methods for the solution of ordinary initial-value problems are treated. Computing approximations requires an increasing number of steps as the step size approaches zero. Often an instability leads to exponential growth of an error, eventually causing termination due to overflow.
For ordinary differential equations instability occurs only for proper multistep
methods, whereas one-step methods are always stable. This is different for partial
differential equations, which are investigated in Chapter 6. Here, difference methods
for hyperbolic and parabolic differential equations are treated. Stability describes the
uniform boundedness of powers of the difference operators.
Also in the case of elliptic differential equations discussed in Chapter 7, stability
is needed to prove convergence. In this context, stability describes the boundedness
of the inverse of the difference operator or the finite element matrix independently
of the step size.
The final chapter is devoted to Fredholm integral equations. Modern projection methods lead to a very easy proof of stability, consistency, and convergence. However, the Nyström method (the first discretisation method based on quadrature) requires a more involved analysis. We conclude the chapter with the analysis of the corresponding eigenvalue problem.
corresponding eigenvalue problem.
Despite the general concept of stability, there are different aspects to consider
in the subfields. One aspect is the practical importance of stability (cf. §4.6),
another concerns a possible conflict between a higher order of consistency and
stability (cf. §3.5.2, Remark 4.15, Theorem 5.47, §6.6, §7.5.9).
Chapter 2
Stability of Finite Algorithms
If Φ is realisable by at least one algorithm, then there are even infinitely many
algorithms of this kind. Therefore, there is no one-to-one correspondence between
a task and an algorithm.
A finite algorithm can be described by a sequence of vectors

x^(0) = (x_1, …, x_n),  x^(1), …,  x^(j) = (x_1^(j), …, x_{n_j}^(j)),  …,  x^(p) = (y_1, …, y_m),

where the values x_i^(j) from level j can be computed by elementary operations from the components of x^(j−1).
Example 2.1. The scalar product y = ⟨(x_1, x_2), (x_3, x_4)⟩ = x_1·x_3 + x_2·x_4 has the input vector (x_1, …, x_4).
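The level structure of a finite algorithm can be made explicit in a short sketch (plain Python; the function name is our own):

```python
def scalar_product_levels(x1, x2, x3, x4):
    """Evaluate the scalar product of Example 2.1 as a finite algorithm:
    each level x^(j) is computed from x^(j-1) by elementary operations."""
    level0 = (x1, x2, x3, x4)           # input vector x^(0)
    level1 = (level0[0] * level0[2],    # x1 * x3
              level0[1] * level0[3])    # x2 * x4
    level2 = (level1[0] + level1[1],)   # output vector x^(p) = (y)
    return level2[0]

print(scalar_product_levels(1.0, 2.0, 3.0, 4.0))  # 1*3 + 2*4 = 11.0
```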
such an infinite process, one finally obtains a finite algorithm producing an approximate result.
Here we only want to motivate the concept that algorithms should be constructed
carefully regarding stability. It remains to analyse various concrete numerical
methods (see, e.g., the monograph of Higham [5]).
The following problem involves the family of Bessel functions. In such a case, one is well advised to look into a handbook of special functions. One learns that the n-th Bessel function (also called cylinder function) can be represented by a power series or as an integral:

J_n(x) = Σ_{k=0}^{∞} (−1)^k (x/2)^{n+2k} / (k!·(n+k)!)   for n ∈ N_0,   (2.2)

J_n(x) = ((−i)^n/π) ∫_0^π e^{ix·cos(φ)} cos(nφ) dφ   for n ∈ Z.   (2.3)
but not the value of J_5(0.6). Furthermore, assume that the book contains the recurrence relation

J_{n+1}(x) + J_{n−1}(x) = (2n/x)·J_n(x)   for n ∈ Z   (2.5)

as well as the property

Σ_{n=−∞}^{∞} J_n(x) = 1   for all x ∈ R.   (2.6)
Exercise 2.2. Prove convergence of the series (2.2) for all x ∈ C (i.e., Jn is an
entire function).
An obvious algorithm solving our problem uses the recursion (2.5) for n =
1, 2, 3, 4 together with the initial values (2.4):
J_2(0.6) = −J_0(0.6) + (2/0.6)·J_1(0.6) = −0.9120 + (2/0.6)·0.2867 = 4.36667·10^{−2},
J_3(0.6) = −J_1(0.6) + (4/0.6)·J_2(0.6) = −0.2867 + (4/0.6)·4.36667·10^{−2} = 4.41111·10^{−3},
J_4(0.6) = −J_2(0.6) + (6/0.6)·J_3(0.6) = −4.36667·10^{−2} + (6/0.6)·4.41111·10^{−3} = 4.44444·10^{−4},   (2.7)
J_5(0.6) = −J_3(0.6) + (8/0.6)·J_4(0.6) = −4.41111·10^{−3} + (8/0.6)·4.44444·10^{−4} = 1.51481·10^{−3}.
The result is obtained using only eight elementary operations. The underlying equations are exact. Nevertheless, the computed result for J_5(0.6) is completely wrong; even the order of magnitude is incorrect! The exact result is J_5(0.6) = 1.99482·10^{−5}.
Why does the computation fail? Are the tabulated values (2.4) misprinted? No,
they are as correct as they can be. Is the (inexact) computer arithmetic, used in (2.5),
responsible for the deviation? No, even exact arithmetic yields the same results. For
those who are not acquainted with numerical effects, this might look like a paradox:
exact computations using exact formulae yield completely wrong results.
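The failure can be reproduced in a few lines. The following sketch (plain Python; the function names are ours) runs the forward recursion (2.5) from the four-digit table values and compares the result with the power series (2.2):

```python
from math import factorial

def j_series(n, x, terms=25):
    """Bessel function J_n(x) via the power series (2.2)."""
    return sum((-1)**k * (x / 2)**(n + 2*k) / (factorial(k) * factorial(n + k))
               for k in range(terms))

# Forward recursion (2.5): J_{n+1}(x) = (2n/x) J_n(x) - J_{n-1}(x),
# started from the four-digit table values for J_0(0.6) and J_1(0.6).
x = 0.6
j = [0.9120, 0.2867]
for n in range(1, 5):
    j.append(2 * n / x * j[n] - j[n - 1])

print(j[5])            # about 1.5148e-3: completely wrong
print(j_series(5, x))  # about 1.9948e-5: the correct value
```

Even with exact starting values to four digits, the forward recursion amplifies the tiny table error until the result is wrong by a factor of about 76.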
and calculating

j'_{n−1} = (2n/0.6)·j'_n − j'_{n+1}   for n = m, m−1, …, 0,   (2.9)

we get quantities j'_n with the property j_n = j_m·j'_n. Now the unknown value j_m can be determined from (2.6). From (2.2) or even from (2.5) we can derive that J_{−n}(x) = (−1)^n J_n(x). Hence, (2.6) is equivalent to

J_0(x) + 2 Σ_{k=1}^{∞} J_{2k}(x) = 1.
2.2.3 Explanation
As stated above, the tabulated values (2.4) are correct, since all four digits offered in the table are correct. It turns out that we have a fortunate situation, since even the fifth digit is zero: the precise values with six digits are J_0(0.6) = 0.912005 and J_1(0.6) = 0.286701. However, the sixth and seventh decimals cause the absolute errors¹

ε_abs^(0) = 0.9120049 − 0.9120 = 4.9·10^{−6},   ε_abs^(1) = 0.28670099 − 0.2867 = 9.9·10^{−7}   (2.11)

and the relative errors

ε_rel^(0) = 4.9·10^{−6} / 0.912005 = 5.3·10^{−6},   ε_rel^(1) = 9.9·10^{−7} / 0.286701 = 3.4·10^{−6}.
Both are relatively small. This leads to the delusive hope that the same accuracy holds for J_5(0.6). Instead, we observe the absolute and relative errors

ε_abs^(5) = 1.5·10^{−3},   ε_rel^(5) = 75.   (2.12)

As we see, the absolute error has increased (from about 10^{−6} in (2.11) to 10^{−3} in (2.12)). Additionally, the small value J_5(0.6) ≪ J_1(0.6) causes the large relative error in (2.12).
In order to understand the behaviour of the absolute error, we consider the recursion j_{n+1} = (2n/0.6)·j_n − j_{n−1} (cf. (2.5)) for general starting values j_0, j_1. Obviously,
¹ Let x̃ be any approximation of x. Then ε_abs = x − x̃ is the absolute error, while ε_rel = (x − x̃)/x is the relative error.
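The stable alternative sketched above (the backward recursion (2.9), normalised via (2.6) in the form J_0 + 2(J_2 + J_4 + …) = 1) can be reproduced in a few lines; the start index m = 12 is our own choice:

```python
# Miller-type backward recursion for J_5(0.6): run (2.9) downward from
# artificial start values, then normalise with (2.6).
x = 0.6
m = 12
jp = [0.0] * (m + 2)          # jp[n] plays the role of j'_n
jp[m + 1], jp[m] = 0.0, 1.0
for n in range(m, 0, -1):
    jp[n - 1] = 2 * n / x * jp[n] - jp[n + 1]

# Normalisation: J_0 + 2*(J_2 + J_4 + ...) = 1
s = jp[0] + 2 * sum(jp[k] for k in range(2, m + 1, 2))
j5 = jp[5] / s                # approximates J_5(0.6)
print(j5)                     # close to 1.99482e-5
```

In the backward direction the wanted solution is the dominant one, so errors are damped instead of amplified, and the normalised value agrees with the exact J_5(0.6) to many digits.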
2.3 Accuracy of Elementary Operations
For simplicity, we ignore the boundedness of the exponent E which may lead to
overflow or underflow.
Exercise 2.4. (a) What is the best bound ε in sup_{ξ∈R} min_{x∈M} |x − ξ|/|x| ≤ ε? (b) What is ε in the special case of the binary base b = 2?
Here, fl(…) indicates the evaluation of the operations indicated by '…' in the sense of floating-point computer arithmetic. A similar estimate is assumed for elementary operations with one argument (e.g., fl(√a)). Furthermore, we assume eps ≪ 1. Note that (2.13) controls the relative error.
More details about computer arithmetic can be found, e.g., in [7, §16] or [8, §2.5].
2.4.1 Cancellation
The reason for the disastrous result for J5 (0.6) in the first algorithm can be discussed
by a single operation:
y = x1 − x2 . (2.14)
On the one hand, this operation is harmless, since the consistency (2.13) of the subtraction guarantees that the floating-point result ỹ = fl(x_1 − x_2) satisfies the estimate²

|ỹ − y| ≤ eps·|y|.
On the other hand, this estimate holds only for x_1, x_2 ∈ M. The more realistic problem occurs when replacing the exact difference η = ξ_1 − ξ_2 with the difference x_1 − x_2 of machine numbers. As long as |η| ∼ |x_i| (i = 1, 2), the situation is under control, since also the relative error has size eps. Dramatic cancellation appears when |η| ≪ |x_1| + |x_2|. Then the relative error is ≲ eps·(|x_1| + |x_2|)/|η|. In the worst case, |η| is of the size eps·(|x_1| + |x_2|), so that the relative error equals O(1).
Cancellation takes place for values ξ_1 and ξ_2 having the same sign and similar size. On the other hand, the sum of two non-negative numbers is always safe, since |η| = |ξ_1 + ξ_2| ≥ |x_i| and |η − ỹ|/|η| ≤ 3·eps. The factor 3 corresponds to the three truncations ξ_1 ↦ x_1, ξ_2 ↦ x_2, x_1 + x_2 ↦ fl(x_1 + x_2).
² If x_1 and x_2 are close, the exact difference is often a machine number, so that ỹ = y.
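Cancellation is easy to observe in double precision; the numbers in this sketch are our own choice:

```python
# Subtract two nearly equal numbers: xi1 = 1 + 1e-15, xi2 = 1.
# The exact difference is eta = 1e-15, but xi1 must first be rounded to the
# nearest machine number, so most significant digits of eta are lost.
xi1, xi2 = 1.0 + 1e-15, 1.0
eta_exact = 1e-15
eta_computed = xi1 - xi2          # the subtraction itself is exact here

rel_error = abs(eta_computed - eta_exact) / eta_exact
print(eta_computed)  # 1.1102230246251565e-15, not 1e-15
print(rel_error)     # about 0.11: an eleven percent relative error
```

The subtraction of the two machine numbers is exact; the damage was done by the single rounding ξ_1 ↦ x_1, exactly as described above.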
2.4 Error Amplification
Given the mapping y = Φ(x) in (2.1), we have to check how errors ∆x_i of the input values x̃_i = x_i + ∆x_i affect the output ỹ = Φ(x̃).
Assume that Φ is continuously differentiable. The Taylor expansion of ỹ_j = Φ_j(x_1, …, x_i + ∆x_i, …, x_n) yields ỹ_j = Φ_j(x_1, …, x_n) + (∂Φ_j/∂x_i)·∆x_i + o(∆x_i). Hence,

∂Φ_j(x_1, …, x_n)/∂x_i   (2.15)

is the amplification factor of the input error ∆x_i, provided that we consider the absolute errors. The amplification of the relative error equals

|∂Φ_j(x_1, …, x_n)/∂x_i| · |x_i|/|y_j|.   (2.16)
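For the subtraction Φ(x_1, x_2) = x_1 − x_2, formula (2.16) gives the relative amplification factor |x_1|/|x_1 − x_2|, which links the cancellation effect of §2.4.1 to the condition of the problem. A small sketch with our own example values:

```python
# Relative amplification factor (2.16) for Phi(x1, x2) = x1 - x2:
# dPhi/dx1 = 1, so the factor is |x1| / |x1 - x2|.
x1, x2 = 1.000001, 1.0
y = x1 - x2
factor = abs(x1) / abs(y)           # predicted amplification, about 1e6

# Check: a relative input perturbation of size 1e-10 in x1 ...
delta = 1e-10
y_pert = x1 * (1 + delta) - x2
rel_out = abs(y_pert - y) / abs(y)  # ... produces a relative output error
print(factor, rel_out / delta)      # the observed ratio matches the factor
```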
describe the transfer from x^(q) to x^(p) = (y_1, …, y_m) (for q = p, the empty product is Φ^(p) = id). While Φ^(0) = Φ depends only on the underlying problem, the mappings Φ^(q) for 1 ≤ q < p depend on the algorithm. The derivative ∂Φ_j^(q)/∂x_i^(q) corresponds to (2.15). It is the amplification factor describing ∆y_j/∆x_i^(q), where ∆x_i^(q) is the error of x_i^(q) and ∆y_j is the induced error of y_j. Since, by definition, the intermediate result x_i^(q) is obtained by an elementary operation, its relative error is controlled by (2.13).
Let κ be the condition number of the problem Φ. If all amplification factors |∂Φ_j^(q)/∂x_i^(q)| are at most of size κ, the algorithm is called stable. Otherwise, the algorithm is called unstable.
Note that the terms stability/instability are not related to the problem, but to the
algorithm. Since there are many algorithms for a given problem, one algorithm for
problem Φ may be unstable, while another one could be stable.
Furthermore, we emphasise the relative relation: if the condition κ is large, the amplification factors |∂Φ_j^(q)/∂x_i^(q)| of a stable algorithm may also be large. In the case of the example in §2.2, the algorithms are stable. The difficulty is the large condition number.
We summarise: If a problem Φ possesses a large condition number, the disastrous amplification of the input errors cannot be avoided³ by any algorithm realising Φ. If the problem is well-conditioned, one has to take care to choose a stable algorithm.
On purpose, the definitions of condition and stability are vague⁴. It is not fixed whether we consider relative or absolute errors, single components, or a norm of the error. Furthermore, it remains open which amplification factors are considered to be moderate, large, or very large.
We consider the simple (scalar) problem Φ(x) = exp(x) for x = ±20 under the hypothetical assumption that exp does not belong to the elementary operations.
First, we consider the condition. Here, it is natural to ask for the relative errors. According to (2.16), the condition number equals

|d exp(x)/dx| · |x|/|exp(x)| = |x|.
³ The only exception occurs when the input values are exact machine numbers, i.e., ∆x_i = 0.
⁴ A systematic definition is attempted by de Jong [2].
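A classical illustration of the gap between a moderate condition number (here |x| = 20) and an unstable algorithm is the term-by-term summation of the Taylor series of exp for x = −20: the alternating terms grow to about 4·10⁷ before decaying, so cancellation destroys the result of size 2·10⁻⁹. The sketch below (the truncation index 120 is our choice) contrasts it with the stable variant 1/exp(20):

```python
from math import exp

def exp_taylor(x, terms=120):
    """Sum the Taylor series of exp(x) term by term (naive algorithm)."""
    s, term = 1.0, 1.0
    for k in range(1, terms):
        term *= x / k
        s += term
    return s

unstable = exp_taylor(-20.0)        # cancellation among huge alternating terms
stable = 1.0 / exp_taylor(20.0)     # all terms positive: no cancellation

true = exp(-20.0)
print(abs(unstable - true) / true)  # O(1) relative error
print(abs(stable - true) / true)    # close to machine precision
```

Both algorithms use exact formulae; only the second keeps all intermediate amplification factors of the size of the condition number.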
The following example shows that standard concepts, e.g., from linear algebra, may lead to a completely unstable algorithm. We consider the eigenvalue problem for symmetric matrices (here, too, an approximation is unavoidable, since, in general, eigenvalues are not computable by a finite algorithm). Computing the eigenvalues of symmetric matrices is well-conditioned, as the following theorem shows. Since the notation of the spectral norm is needed in the theorem, we recall some facts about matrix norms. Given some vector norm ‖·‖, the associated matrix norm is defined by

‖A‖ := max{‖Ax‖/‖x‖ : x ≠ 0}

(cf. (3.23) for the operator case). If ‖·‖ = ‖·‖_∞ is the maximum norm, the associated matrix norm is denoted by the identical symbol ‖·‖_∞ and called the row-sum norm (cf. §5.5.4.1). The choice ‖·‖ = ‖·‖_2 of the Euclidean vector norm leads us to the spectral norm ‖·‖_2 of matrices. An explicit description of the spectral norm is

‖A‖_2 = max{√λ : λ eigenvalue of AᵀA}.
Theorem 2.6. Let A, ∆A ∈ R^{n×n} be matrices with A symmetric (or normal), and set Ã := A + ∆A. Then, for any eigenvalue λ̃ of Ã, there is an eigenvalue λ of A such that |λ̃ − λ| ≤ ‖∆A‖_2.
The identity x̃ = (λ̃I − A)^{−1}∆A·x̃, valid for an eigenvector x̃ of Ã when λ̃ is not an eigenvalue of A (otherwise nothing is to be shown), proves ‖(λ̃I − A)^{−1}∆A‖_2 ≥ 1 for the spectral norm. We continue the inequality by 1 ≤ ‖(λ̃I − A)^{−1}∆A‖_2 ≤ ‖(λ̃I − A)^{−1}‖_2 · ‖∆A‖_2. Since we may assume A = diag{λ_1, …} (a unitary diagonalisation of the normal matrix A leaves the spectral norm invariant), the norm equals ‖(λ̃I − A)^{−1}‖_2 = 1/min_{1≤i≤n} |λ̃ − λ_i|. Altogether, there is an eigenvalue λ of A such that |λ̃ − λ| = min_i |λ̃ − λ_i| = 1/‖(λ̃I − A)^{−1}‖_2 ≤ ‖∆A‖_2. ⊓⊔
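Theorem 2.6 can be checked numerically. A small sketch using NumPy; the matrices are our own example data:

```python
import numpy as np

# A symmetric matrix and a symmetric perturbation (example data).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
dA = 1e-3 * np.array([[1.0, 0.5, 0.0],
                      [0.5, -1.0, 0.2],
                      [0.0, 0.2, 0.3]])

lam = np.linalg.eigvalsh(A)            # eigenvalues of A
lam_tilde = np.linalg.eigvalsh(A + dA) # eigenvalues of the perturbed matrix
bound = np.linalg.norm(dA, 2)          # spectral norm of the perturbation

# Theorem 2.6: every perturbed eigenvalue lies within ||dA||_2 of some
# eigenvalue of A.
for lt in lam_tilde:
    assert min(abs(lt - l) for l in lam) <= bound + 1e-12
print(bound)
```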
Since in linear algebra eigenvalues are introduced via the characteristic polynomial, one might get the following idea:
(a) Determine the characteristic polynomial P(x) = Σ_{k=0}^{n} a_k x^k.
(b) Compute the eigenvalues as the roots of P: P(x) = 0.
For simplicity we assume that the coefficients a_k of P(x) = det(xI − A) can be determined exactly. Then the second part remains to be investigated: How are the zeros of P affected by perturbations of the a_k? Here, the following famous example of Wilkinson is very informative (cf. [11] or [12, p. 54ff]).
We prescribe the eigenvalues (roots) 1, 2, …, 20. They are the zeros of

P(x) = ∏_{i=1}^{20} (i − x) = a_0 + … + a_19·x^19 + a_20·x^20

(a_0 = 20! = 2 432 902 008 176 640 000, …, a_19 = −210, a_20 = 1). The large value of a_0 shows the danger of an overflow during computations with polynomials. The determination of the zeros of P seems to be rather easy, because P has only simple zeros and these are clearly separated.
We perturb only the coefficient a_19 into ã_19 = a_19 − 2^{−23} (2^{−23} = 1.192·10^{−7} corresponds to 'single precision'). The zeros of the perturbed polynomial P̃ are
The (absolute and relative) errors are not only of size O(1); the appearance of five complex pairs shows that even the structure of the real spectrum is destroyed.
For sufficiently small perturbations, conjugate complex pairs of zeros cannot occur. Considering the perturbation ã_19 = a_19 − 2^{−55} (2^{−55} = 2.776·10^{−17}), we obtain the zeros
For instance, the perturbation of the 15-th zero shows an error amplification by
5910 -9/2−55 = 2.110 9 .
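This amplification factor can be predicted by first-order perturbation theory: perturbing a_19 by ∆a moves the zero x = 15 by approximately ∆x ≈ −∆a·15^19/P′(15), and since P(x) = ∏(x − i), one has P′(15) = ∏_{i≠15}(15 − i) = −14!·5!. A sketch in exact integer arithmetic:

```python
from math import factorial, prod

# First-order sensitivity of the zero x0 = 15 of P(x) = prod_{i=1..20}(i - x)
# to a perturbation of the coefficient a_19:
#   P(x) + da * x^19 = 0  =>  dx ~ -da * x0^19 / P'(x0).
x0 = 15
dPdx = prod(x0 - i for i in range(1, 21) if i != x0)   # P'(15) = -14! * 5!
amplification = abs(x0**19 / dPdx)

assert dPdx == -factorial(14) * factorial(5)
print(amplification)   # about 2.1e9, matching the amplification in the text
```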
Stable algorithms for computing the eigenvalues directly use the entries of the matrix and avoid the detour via the characteristic polynomial (at least in the form P(x) = Σ_{k=0}^{n} a_k x^k; cf. Quarteroni et al. [8, §5]). In general, one should avoid polynomials in the classical representation as sums of monomials.
The central difference quotient

D_h f(x) := (f(x+h) − f(x−h))/(2h) = f′(x) + (h²/6)·f‴(ξ)

with some x − h < ξ < x + h yields the error estimate C_3·h², where C_3 is a bound of f‴. However, for h → 0 rounding errors spoil the computation because of cancellation. A very optimistic assumption is that the numerical evaluation of the function yields values f̃(x ± h) with the relative accuracy

|f̃(x ± h) − f(x ± h)| ≤ |f(x ± h)|·eps ≤ C_0·eps,

so that the total error behaves like

ε_h := C_3·h² + C_0·eps/(2h).
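The trade-off between the truncation error (proportional to h²) and the cancellation error (proportional to eps/h) is easy to observe; in this sketch the test function f = sin at x = 1 is our own choice:

```python
from math import sin, cos

def central_diff(f, x, h):
    """Central difference quotient D_h f(x) = (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

true = cos(1.0)
for h in (1e-1, 1e-4, 1e-13):
    err = abs(central_diff(sin, 1.0, h) - true)
    print(f"h = {h:8.0e}   error = {err:.3e}")
# The error first decreases like h^2, then grows again as cancellation in
# f(x+h) - f(x-h) dominates; the optimum lies near h of order eps^(1/3).
```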
f : R² × R² → R²,

f_1(x, y) = x_1·y_1,
f_2(x, y) = x_2·y_1 + x_1·y_2,   (2.18)

f_{ε,1}(x, y) = x_1·y_1 + ε²·x_2·y_2,
f_{ε,2}(x, y) = x_2·y_1 + x_1·y_2

for some ε > 0. Obviously, for ε small enough, we obtain results of any prescribed accuracy. The second function f_ε can be evaluated by the algorithm

(x, y) ↦ ( s_1 := (x_1 + ε·x_2)·(y_1 + ε·y_2),  s_2 := (x_1 − ε·x_2)·(y_1 − ε·y_2) ) ↦ f_ε := ( (s_1 + s_2)/2,  (s_1 − s_2)/(2ε) ),

which requires only two multiplications. The expression (s_1 − s_2)/(2ε) shows the analogy to numerical differentiation, so that again cancellation occurs.
In fact, the numbers of multiplications (3 and 2, respectively) have their origin in tensor properties. We may rewrite (2.18) as

f_i = Σ_{1≤j,k≤2} v_{ijk}·x_j·y_k   for i = 1, 2.
for some vectors a^(μ), b^(μ), c^(μ) ∈ R² (cf. [4, Definition 3.32]). The solution for r = 3 is

⁶ Note that the minimal r in v_{ij} = Σ_{μ=1}^{r} a_i^(μ)·b_j^(μ) is the usual matrix rank of (v_{ij}).
References
1. Bini, D., Lotti, G., Romani, F.: Approximate solutions for the bilinear form computational
problem. SIAM J. Comput. 9, 692–697 (1980)
2. de Jong, L.S.: Towards a formal definition of numerical stability. Numer. Math. 28, 211–219
(1977)
3. Deuflhard, P.: A summation technique for minimal solutions of linear homogeneous difference
equations. Computing 18, 1–13 (1977)
4. Hackbusch, W.: Tensor spaces and numerical tensor calculus, Springer Series in Computa-
tional Mathematics, Vol. 42. Springer, Berlin (2012)
5. Higham, N.J.: Accuracy and stability of numerical algorithms. SIAM, Philadelphia (1996)
6. Oliver, J.: The numerical solution of linear recurrence relations. Numer. Math. 11, 349–360
(1968)
7. Plato, R.: Concise Numerical Mathematics. AMS, Providence (2003)
8. Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics, 2nd ed. Springer, Berlin (2007)
9. Stoer, J., Bulirsch, R.: Introduction to Numerical Analysis. North-Holland, Amsterdam (1980)
10. Van der Cruyssen, P.: A reformulation of Olver’s algorithm for the numerical solution of
second-order linear difference equations. Numer. Math. 32, 159–166 (1979)
11. Wilkinson, J.H.: Rounding errors in algebraic processes. Prentice-Hall, Englewood Cliffs
(1964). Reprinted by Dover Publications, New York, 1994
12. Wilkinson, J.H.: Rundungsfehler. Springer, Berlin (1969)
13. Zeidler, E. (ed.): Oxford Users’ Guide to Mathematics. Oxford University Press, Oxford
(2004)
Chapter 3
Quadrature
In this chapter some facts from interpolation are used. Therefore, the reader may
first have a look at §4.1 of the next chapter, which is devoted to interpolation. We
prefer to start with quadrature instead of interpolation, since a projection between
function spaces (interpolation) is more involved than a functional (quadrature).
Problem 3.1. Assume that f ∈ C([0, 1]). Compute an approximate value of the integral

∫_0^1 f(x) dx.
Other intervals and additional weight functions can be treated, but these general-
isations are uninteresting for our purpose.
For all n ∈ N_0 we define a quadrature formula

Q_n(f) = Σ_{i=0}^{n} a_{i,n} f(x_{i,n}).   (3.1)
Here a_{i,n} are the quadrature weights and x_{i,n} the (pairwise distinct) quadrature points.
A sequence {Q_n : n ∈ N_0} yields a family of quadrature formulae (sometimes also called a quadrature rule, meaning that this rule generates quadrature formulae for all n).
The usual way to derive (3.1) is 'interpolatory quadrature' via a (family of) interpolation methods (see §4.1). Consider an interpolation

f(x) ≈ f_n(x) := Σ_{i=0}^{n} f(x_{i,n}) Φ_{i,n}(x)

with Lagrange functions Φ_{i,n} (i.e., Φ_{i,n} belongs to the desired function space and satisfies Φ_{i,n}(x_{j,n}) = δ_{ij}; cf. (4.2)). Integrating f_n instead of f, one obtains

∫_0^1 f(x) dx ≈ ∫_0^1 f_n(x) dx = Σ_{i=0}^{n} f(x_{i,n}) ∫_0^1 Φ_{i,n}(x) dx,

where the integrals ∫_0^1 Φ_{i,n}(x) dx =: a_{i,n} are the quadrature weights.
The standard choice is polynomial interpolation. In this case, Φ_{i,n} = L_{i,n} are the Lagrange polynomials

L_{i,n}(x) := ∏_{k∈{0,…,n}\{i}} (x − x_{k,n})/(x_{i,n} − x_{k,n}).   (3.2)
The latter statement does not exclude that even more functions (e.g., polynomials
of higher order) are also integrated exactly.
The next exercise refers to the case of polynomials, where Φi,n = Li,n are the
Lagrange polynomials (3.2).
Exercise 3.3. The definition a_{i,n} = ∫_0^1 L_{i,n}(x) dx is less helpful for its computation. Show that instead one can obtain the values a_{i,n} by solving the following system of linear equations:

Σ_{i=0}^{n} a_{i,n} x_{i,n}^k = 1/(k+1)   for k = 0, 1, …, n.

Hint: Verify that Q_n(x^k) = ∫_0^1 x^k dx.
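The system of Exercise 3.3 can be solved in exact rational arithmetic. The sketch below (the helper names are ours) computes the weights for the equidistant nodes x_{i,n} = i/n and also evaluates the sum Σ|a_{i,n}|, which reappears as the stability constant C_n in §3.4; for n = 2 it reproduces Simpson's rule:

```python
from fractions import Fraction

def newton_cotes_weights(n):
    """Solve sum_i a_i * x_i^k = 1/(k+1), k = 0..n, for nodes x_i = i/n
    (Exercise 3.3) by Gauss-Jordan elimination over the rationals."""
    xs = [Fraction(i, n) for i in range(n + 1)]
    # augmented matrix of the (n+1) x (n+1) moment system
    M = [[x**k for x in xs] + [Fraction(1, k + 1)] for k in range(n + 1)]
    for col in range(n + 1):
        piv = next(r for r in range(col, n + 1) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n + 1):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n + 1] / M[i][i] for i in range(n + 1)]

a2 = newton_cotes_weights(2)
print(a2)                               # Simpson's rule: [1/6, 2/3, 1/6]
a10 = newton_cotes_weights(10)
print(min(a10) < 0)                     # True: negative weights appear
print(float(sum(abs(a) for a in a10)))  # sum|a_i| > 1 (cf. the table in 3.4.3)
```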
So far the quadrature points xi,n are not fixed. Their choice determines the
(family of) quadrature formulae.
x_{i,n} = i/n   for i = 0, 1, …, n.
The Newton–Cotes quadrature² is the interpolatory quadrature based on the polynomial interpolation at the nodes x_{i,n}.
For later use we cite the following asymptotic statement about the Newton–Cotes weights from Ouspensky [6] (also mentioned in [3, p. 79]):

a_{i,n} = ((−1)^{i−1} n! / (i!·(n−i)!·n·log²n)) · ( 1/i + (−1)^n/(n−i) ) · (1 + O(1/log n))   (3.3)

for 1 ≤ i ≤ n − 1.
3.2 Consistency
For the interpolatory quadrature defined via polynomial interpolation one uses the
following consistency definition.
¹ The family of Newton–Cotes formulae is defined for n ∈ N only, not for n = 0. Formally, we may add Q_0(f) := f(0).
² See Newton's remarks in [11, pp. 73–74].
³ The original publication of Gauss [4] is from 1814. Christoffel [2] generalises the method to integrals with a weight function. For a modern description, see Stroud–Secrest [10].
The quadrature families mentioned above satisfy (3.4) with the following values
of g(n).
Qn (Pn ) = 0 follows from Part (i), while in the case of the Newton–Cotes quadra-
ture Qn ((x − 1/2)n+1 ) = 0 can be concluded from Exercise 3.4.
(iii) Let P be a polynomial of degree 2n + 1 and consider the Gauss quadrature. The Euclidean algorithm allows us to divide P by the (transformed) Legendre polynomial L_{n+1}: P = p·L_{n+1} + q. Both polynomials p and q are of degree ≤ n. The integral ∫_0^1 p(x)L_{n+1}(x) dx vanishes because of the orthogonality property of L_{n+1}. On the other hand, Q_n(p·L_{n+1}) = 0 follows from the fact that L_{n+1} vanishes at the quadrature points. Hence, ∫_0^1 P(x) dx = ∫_0^1 q(x) dx =_{(i)} Q_n(q) = Q_n(P) proves the assertion. ⊓⊔
Remark 3.8 (Peano kernel). Let Q_n have maximal consistency order g_n. For some m ∈ N with m ≤ g_n suppose that f ∈ C^{m+1}([0, 1]). Then the quadrature error equals

∫_0^1 f(x) dx − Q_n(f) = ∫_0^1 π_m(x, y) f^(m+1)(y) dy   (π_m defined in the proof).   (3.6a)

The error ε_n := ∫_0^1 f(x) dx − Q_n(f) is estimated by

If π_m(x, y) does not change sign, the following error equality holds for a suitable intermediate value ξ ∈ [0, 1]:

∫_0^1 f(x) dx − Q_n(f) = α_m f^(m+1)(ξ)   with α_m := ∫_0^1 π_m(x, y) dy.   (3.6c)
Proof. The Taylor representation with remainder term yields f(x) = P_m(x) + r(x), where the polynomial P_m of degree ≤ m is irrelevant, since its quadrature error vanishes. Hence the quadrature error of f is equal to that of r. The explicit form of r is

r(x) = (1/m!) ∫_0^x (x − y)^m f^(m+1)(y) dy = (1/m!) ∫_0^1 (x − y)_+^m f^(m+1)(y) dy,

where (t)_+ = t for t ≥ 0 and (t)_+ = 0 otherwise. The quadrature error of r equals

∫_0^1 r(x) dx − Q_n(r)
= (1/m!) [ ∫_0^1 ∫_0^1 (x − y)_+^m f^(m+1)(y) dy dx − Σ_{i=0}^{n} a_{i,n} ∫_0^1 (x_{i,n} − y)_+^m f^(m+1)(y) dy ]
= ∫_0^1 (1/m!) [ ∫_0^1 (x − y)_+^m dx − Σ_{i=0}^{n} a_{i,n} (x_{i,n} − y)_+^m ] f^(m+1)(y) dy,
3.3 Convergence
(for ‖·‖_∞ compare §3.4.7.1). However, this estimate may be too pessimistic. An optimal error analysis can be based on the Peano kernel (cf. Remark 3.8). In any case one obtains bounds of the form

ε_n(f) ≤ c_n ‖f^(k_n)‖_∞,   (3.8)
The answer is that in this case c_n → 0 cannot hold. For a proof, modify the constant function f = 1 in η-neighbourhoods of the quadrature points x_{i,n} (0 ≤ i ≤ n) such that 0 ≤ f̃ ≤ f and f̃(x_{i,n}) = 0. Because of f̃(x_{i,n}) = 0, we conclude that Q_n(f̃) = 0, while for sufficiently small η, the integral ∫_0^1 f̃(x) dx is arbitrarily close to ∫_0^1 f(x) dx = 1 (the difference is bounded by δ := 2nη). Since ‖f‖_∞ = ‖f̃‖_∞ = 1, we obtain no better error estimate than

ε_n(f̃) = ∫_0^1 f̃(x) dx ≥ 1 − δ = 1·‖f̃‖_∞ − δ;   i.e., c_n ≥ 1.
Above we made use of the Banach space X = C([0, 1]) equipped with the maximum norm ‖·‖_X = ‖·‖_∞ from (3.5). The dual space of X consists of all linear and continuous mappings φ from X into R. Continuity of φ is equivalent to boundedness; i.e., the following dual norm must be finite:

‖φ‖_X* := sup{|φ(f)| : f ∈ X, ‖f‖_X = 1} = sup{|φ(f)|/‖f‖_X : 0 ≠ f ∈ X}.

is a linear functional on X = C([0, 1]) with ‖I‖_X* = 1. Another functional is the Dirac function(al) δ_a ∈ X* defined by δ_a(f) := f(a). Here ‖δ_a‖_X* = 1 holds.
The quadrature formula (3.1) is a functional which may be expressed in terms of Dirac functionals:

Q_n = Σ_{i=0}^{n} a_{i,n} δ_{x_{i,n}}.   (3.11)
Lemma 3.14. Let X = C([0, 1]). The quadrature formula (3.11) has the dual norm

‖Q_n‖_X* = Σ_{i=0}^{n} |a_{i,n}|.

holds because of |f(x_{i,n})| ≤ ‖f‖_X = 1. This proves ‖Q_n‖_X* ≤ Σ_{i=0}^{n} |a_{i,n}|. The equality sign is obtained by choosing the following particular function g ∈ X. Set g(x_{i,n}) := sign(a_{i,n}) and interpolate linearly between the quadrature points and the end points 0, 1. Then g is continuous, i.e., g ∈ X, and ‖g‖_X = 1. The definition yields the reverse inequality

‖Q_n‖_X* = sup{|Q_n(f)| : f ∈ X, ‖f‖_X = 1} ≥ |Q_n(g)| = Σ_{i=0}^{n} |a_{i,n}|.   ⊓⊔
3.4 Stability
It follows that |δf_{i,n}| ≤ ‖f − f̃‖_∞.
The constant Cn from (3.13b) is the error amplification factor of the quadrature Qn .
To avoid an increasing error amplification, we have to require that Cn be uniformly
bounded. This leads us directly to the stability definition.
From Part (b) of the remark and after replacing f by f − g, one obtains the
following estimate analogous to (3.13a).
Corollary 3.18. |Q_n(f) − Q_n(g)| ≤ C_stab·‖f − g‖_∞ holds for all f, g ∈ C([0, 1]) and all n ∈ N_0.
(ε_n(f) from (3.7)). If stability holds, the error is bounded by C_stab·‖f̃ − f‖_∞ + ε_n(f) (cf. Corollary 3.18). Provided that ε_n(f) → 0, the total error approaches the level of numerical noise C_stab·‖f̃ − f‖_∞, which is unavoidable since it is caused by the input error ‖f̃ − f‖_∞.
In the case of instability, C_n → ∞ holds. While the term ε_n(f) approaches zero, C_n·‖f̃ − f‖_∞ tends to infinity. Hence, an enlargement of n does not guarantee a better result. If one does not have further information about the behaviour of ε_n(f), it is difficult to find an n such that the total error is as small as possible.
In spite of what has been said about the negative consequences of instability,
we have to state that the quality of a quadrature rule for fixed n has no relation to
stability or instability. The sensitivity of Qn to input errors is given only by the
amplification factor Cn . Note that the size of Cn is not influenced by whether the
sequence (Cn )n∈N is divergent or convergent.
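For a stable family the amplification bound can be observed directly. A sketch with the compound trapezoidal rule (all weights positive and summing to 1, hence C_n = 1); the sample data are our own:

```python
import math

def trapezoid(values, h):
    """Compound trapezoidal rule for equidistant samples values[k] = f(k*h)."""
    return h * (values[0] / 2 + sum(values[1:-1]) + values[-1] / 2)

N = 100
h = 1.0 / N
f = [math.exp(k * h) for k in range(N + 1)]

# Worst-case input perturbation of size delta: since all weights are
# positive and sum to 1, the result moves by exactly C_n * delta = delta.
delta = 1e-6
f_noisy = [v + delta for v in f]
diff = abs(trapezoid(f_noisy, h) - trapezoid(f, h))
print(diff)   # equals delta up to rounding: no error amplification
```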
Under the minimal condition that Q_n be exact for constants (polynomials of order zero; i.e., g(n) ≥ 0 in (3.4)), one concludes that 1 = ∫_0^1 dx = Q_n(1) = Σ_{i=0}^{n} a_{i,n}:

Σ_{i=0}^{n} a_{i,n} = 1.   (3.16)
Conclusion 3.19. Assume (3.16). (a) If the quadrature weights are non-negative
(i.e., ai,n ≥ 0 for all i and n), then the family {Qn } is stable with
The latter statement is based upon (3.16). A weaker formulation is provided next.
Exercise 3.20. Conclusion 3.19 remains valid if (3.16) is replaced by the condition lim_{n→∞} Q_n(1) = 1.
Lemma 3.21. The Gauss quadrature has non-negative weights: a_{i,n} ≥ 0. Hence the family of Gauss quadratures is stable with constant C_stab = 1.

Proof. Define the polynomial P_{2n}(x) := ∏_{0≤k≤n, k≠i} (x − x_{k,n})² of degree 2n. Obviously, ∫_0^1 P_{2n}(x) dx > 0. Since Q_n is exact for polynomials of degree ≤ 2n + 1, also a_{i,n}·P_{2n}(x_{i,n}) = Q_n(P_{2n}) > 0. The assertion follows from P_{2n}(x_{i,n}) > 0. ⊓⊔
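The positivity of the Gauss weights can be observed numerically, e.g., with NumPy's Gauss–Legendre routine (which works on [−1, 1]; mapping to [0, 1] halves the weights):

```python
import numpy as np

for n in (5, 20, 80):
    nodes, weights = np.polynomial.legendre.leggauss(n)  # rule on [-1, 1]
    w01 = weights / 2                      # transformed weights for [0, 1]
    assert (w01 > 0).all()                 # non-negative weights (Lemma 3.21)
    assert abs(w01.sum() - 1.0) < 1e-12    # (3.16): the weights sum to 1
    print(n, w01.sum())
```

Hence C_n = Σ|a_{i,n}| = Σ a_{i,n} = 1 for every n, as stated in the lemma.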
 n  | 1–7 |  8   | 9 |  10   |  11   |  12   |  14   |  16   |  18   |  20   |  22  |  24
 Cn |  1  | 1.45 | 1 | 3.065 | 1.589 | 7.532 | 20.34 | 58.46 | 175.5 | 544.2 | 1606 | 9923

Obviously, C_n increases exponentially to infinity; i.e., the Newton–Cotes formulae seem to be unstable. An exact proof of instability can be based on the asymptotic description (3.3) of a_{i,n}. For even n, the following inequality holds:

C_n = Σ_{i=0}^{n} |a_{i,n}| ≥ |a_{n/2,n}|   (a_{i,n} from (3.3)).
Exercise 3.22. (a) Recall Stirling's formula for the asymptotic representation of n! (cf. [13, §1.14.16], [5, Anhang 1]).
(b) Using (a), study the behaviour of |a_{n/2,n}| and conclude that the family of Newton–Cotes formulae is unstable.
The existence of negative weights a_{i,n} is not yet a reason for instability, as long as Σ_{i=0}^{n} |a_{i,n}| stays uniformly bounded. The following Romberg quadrature is an example of a stable quadrature involving negative weights.
For h = 1/N with N ∈ ℕ, the sum

    T(f, h) := h [ (1/2) f(0) + f(h) + f(2h) + … + f(1 − h) + (1/2) f(1) ]

represents the compound trapezoidal rule. Under the assumption f ∈ C^m([0, 1]), m even, one can prove the asymptotic expansion

    T(f, h) = ∫₀¹ f(x) dx + h² e₂(f) + … + h^{m−2} e_{m−2}(f) + O(h^m ‖f^{(m)}‖∞)   (3.17)
(cf. Bulirsch [1], [7, §9.6]). Hence, the Richardson extrapolation is applicable: compute T(f, h_i) for different h_i, i = 0, …, n, and extrapolate the values

    (h_i², T(f, h_i)),   i = 0, …, n.
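As a numerical illustration (assumption: the classical step sizes h_i = 2^{−i}; the helper names are ours), the following Python sketch performs this extrapolation of the pairs (h_i², T(f, h_i)) to h = 0 by the Neville scheme:

```python
import math

def T(f, h):
    # compound trapezoidal rule on [0, 1] with step h = 1/N
    N = round(1 / h)
    return h * (0.5 * (f(0.0) + f(1.0)) + sum(f(k * h) for k in range(1, N)))

def romberg(f, n):
    # extrapolate (h_i^2, T(f, h_i)), h_i = 2^{-i}, to h = 0 (Neville scheme)
    hs = [2.0 ** (-i) for i in range(n + 1)]
    t = [T(f, h) for h in hs]
    for j in range(1, n + 1):
        for i in range(n, j - 1, -1):
            q = (hs[i - j] / hs[i]) ** 2     # ratio of the abscissae h^2
            t[i] = t[i] + (t[i] - t[i - 1]) / (q - 1)
    return t[n]

err_trap = abs(T(math.exp, 1 / 16) - (math.e - 1))    # O(h^2) error of T alone
err_romb = abs(romberg(math.exp, 4) - (math.e - 1))   # far smaller after extrapolation
```

The gain of many orders of magnitude for the smooth integrand exp reflects the elimination of the h², …, h⁸ terms of the expansion (3.17).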
Exercise 3.24. Prove (a) 1 + x ≤ eˣ for all real x, and (b) 1/(1 − x) ≤ 1 + ϑx with ϑ = 1/(1 − x₀) for all 0 ≤ x ≤ x₀ < 1.
(i) Part (b) with ϑ = 1/(1 − α²) yields

    ∏_{j=1}^{m} 1/(1 − α^{2j}) ≤ ∏_{j=1}^{m} (1 + ϑα^{2j}) ≤ ∏_{j=1}^{m} exp(ϑα^{2j}) ≤ ∏_{j=1}^{∞} exp(ϑα^{2j}) =: A,

where A = exp(Σ_{j=1}^{∞} ϑα^{2j}) = exp(ϑ · α²/(1 − α²)) = exp(α²ϑ²). This implies that

    ∏_{j=1}^{m} α^{2j}/(1 − α^{2j}) ≤ A ∏_{j=1}^{m} α^{2j} = A α^{m(m+1)} ≤ A α^m   for all m ≥ 0.
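The chain of inequalities can be checked numerically; a small sketch with the illustrative sample values α = 0.5 and m = 10 (these values are our choice, not from the text):

```python
import math

alpha, m = 0.5, 10                          # sample values for illustration
theta = 1 / (1 - alpha ** 2)
A = math.exp(alpha ** 2 * theta ** 2)       # A = exp(alpha^2 * theta^2)

prod_inv = 1.0    # prod_{j=1}^m 1/(1 - alpha^{2j})
prod_full = 1.0   # prod_{j=1}^m alpha^{2j}/(1 - alpha^{2j})
for j in range(1, m + 1):
    prod_inv *= 1 / (1 - alpha ** (2 * j))
    prod_full *= alpha ** (2 * j) / (1 - alpha ** (2 * j))
```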
(ii) Split the product ∏_{ν≠i} in (3.18) into the partial products ∏_{ν=0}^{i−1} and ∏_{ν=i+1}^{n}. The first one is estimated by

    ∏_{ν=0}^{i−1} h_ν²/(h_ν² − h_i²) = ∏_{ν=0}^{i−1} 1/(1 − h_i²/h_ν²) ≤ ∏_{ν=0}^{i−1} 1/(1 − α^{2(i−ν)}) = ∏_{j=1}^{i} 1/(1 − α^{2j}) ≤ A,

where the first inequality uses (3.20) and the last one follows from Part (i).
Lemma 3.25. The family of the Romberg quadratures {Qn } in (3.18) is stable.
Proof. The compound trapezoidal rule T(f, h_i) = Σ_{k=0}^{N_i} τ_{k,i} f(k h_i) has the weights τ_{k,i} = h_i for 0 < k < N_i and τ_{k,i} = h_i/2 for k = 0, N_i. In particular,

    Σ_{k=0}^{N_i} |τ_{k,i}| = Σ_{k=0}^{N_i} τ_{k,i} = 1

and

    Q_n(f) = Σ_i c_{i,n} Σ_{k=0}^{N_i} τ_{k,i} f(k h_i) = Σ_j a_{j,n} f(x_{j,n})   with a_{j,n} := Σ_{(i,k): k h_i = x_{j,n}} c_{i,n} τ_{k,i}.

Now, Σ_j |a_{j,n}| ≤ Σ_i |c_{i,n}| Σ_{k=0}^{N_i} |τ_{k,i}| = Σ_i |c_{i,n}| ≤ C (Lemma 3.23) proves the stability condition. ⊓⊔
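The proof can be mirrored numerically. The sketch below (our own illustration; it assumes the classical step sizes h_i = 2^{−i} and takes the extrapolation coefficients c_{i,n} = ∏_{ν≠i} h_ν²/(h_ν² − h_i²) from the Lagrange form of the extrapolation) accumulates the weights a_{j,n} in exact arithmetic and evaluates Σ_j |a_{j,n}|:

```python
from fractions import Fraction

def romberg_weights(n):
    # step sizes h_i = 2^{-i}; abscissae of the extrapolation are x_i = h_i^2
    x = [Fraction(1, 4 ** i) for i in range(n + 1)]
    # Lagrange coefficients of extrapolation to 0: c_i = prod_{v != i} x_v/(x_v - x_i)
    c = []
    for i in range(n + 1):
        ci = Fraction(1)
        for v in range(n + 1):
            if v != i:
                ci *= x[v] / (x[v] - x[i])
        c.append(ci)
    # accumulate the quadrature weights on the finest grid (step 2^{-n})
    N = 2 ** n
    a = [Fraction(0)] * (N + 1)
    for i in range(n + 1):
        Ni = 2 ** i
        hi = Fraction(1, Ni)
        step = N // Ni
        for k in range(Ni + 1):
            tau = hi if 0 < k < Ni else hi / 2   # trapezoidal weights
            a[k * step] += c[i] * tau
    return a

a3 = romberg_weights(3)
stab3 = float(sum(abs(w) for w in a3))   # bounded independently of n
```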
Lemma 3.26. The family of the Romberg quadratures {Qn } in (3.18) is consistent.
Proof. Let P be a polynomial of degree ≤ g(n) := 2n + 1. In (3.17), the remainder term for m := 2n + 2 vanishes, since P^{(m)} = 0. This proves that T(P, h) is a polynomial of degree ≤ n in the variable h². Extrapolation eliminates the terms h_i^j e_j(P), j = 2, 4, …, 2n, so that Q_n(P) = ∫₀¹ P(x) dx. Since g(n) = 2n + 1 → ∞ for n → ∞, consistency according to Definition 3.5 is shown. ⊓⊔
The later Theorem 3.36 will prove convergence of the Romberg quadrature.
Exercise 3.27. Condition h_{i+1} ≤ αh_i from (3.19) can be weakened. Prove: if an ℓ ∈ ℕ and an α ∈ (0, 1) exist such that h_{i+ℓ} ≤ αh_i for all i ≥ 0, then Lemma 3.23 remains valid.
For the next step of the proof we need the well-known approximation theorem of
Weierstrass.
Theorem 3.28. For all ε > 0 and all f ∈ C([0, 1]) there is a polynomial P = Pε,f
with kf − P k∞ ≤ ε.
An equivalent formulation is: the set P of all polynomials is a dense subset of
C([0, 1]).
In the following we prove a more general form (Stone–Weierstrass theorem). The
next theorem uses the point-wise maximum Max(f, g)(x) := max{f (x), g(x)}
and point-wise minimum Min(f, g)(x) := min{f (x), g(x)} of two functions. The
following condition (i) describes that F is closed under these mappings. Condition
(ii) characterises the approximability at two points (‘separation of points’).
The latter inequality yields in particular that g(y) − ε < h(y). By continuity of h
and g, this inequality holds in a whole neighbourhood U (y) of y:
belongs again to F and satisfies g + ε > f . Since each h(· ; xi , ε) satisfies the
inequality g − ε < h(· ; xi , ε) from Part (a), also g − ε < f follows. Together, one
obtains kf − gk∞ < ε; i.e., f ∈ F is the desired approximant. t u
Remark 3.30. Instead of the lattice operations Max and Min, one can equivalently
require that F be closed with respect to the absolute value:
    f ∈ F ⇒ |f| ∈ F.
In the case of (d) in the previous example one has to show that, e.g., the product
sin(nx) cos(mx) (n, m ∈ N0 ) is again a trigonometric function. This follows from
2 sin(nx) cos(mx) = sin ((n + m)x) + sin ((n − m)x).
If A ⊂ C(Q) is an algebra, the closure Ā (with respect to the maximum norm
k·k∞ ) is called the closed hull of the algebra A.
Proof. (i) A simple scaling argument shows that it suffices to show the assertion for f ∈ A with ‖f‖∞ ≤ 1.
(ii) Let ε > 0 be given. The function T(ζ) := √(ζ + ε²) is holomorphic in the complex half-plane ℜe ζ > −ε². The Taylor series Σ_ν α_ν (x − 1/2)^ν of T(x) has the convergence radius 1/2 + ε² and converges uniformly in the interval [0, 1]. Hence there is a finite Taylor polynomial P_n of degree n, so that

    |√(x + ε²) − P_n(x)| ≤ ε   for all 0 ≤ x ≤ 1.
The particular case x = 0 shows |ε − P_n(0)| ≤ ε and, therefore, |P_n(0)| ≤ 2ε. The polynomial Q_{2n}(x) := P_n(x²) − P_n(0) of degree 2n has a vanishing absolute term and satisfies

    |√(x² + ε²) − Q_{2n}(x)| ≤ |√(x² + ε²) − P_n(x²)| + |P_n(0)| ≤ 3ε.
(iv) For all f with ‖f‖∞ ≤ 1, the values f(ξ) (ξ ∈ Q) satisfy the inequality −1 ≤ f(ξ) ≤ 1, so that x = f(ξ) can be inserted into the last inequality; together with |√(x² + ε²) − |x|| ≤ ε this yields

    | |f(ξ)| − Q_{2n}(f(ξ)) | ≤ 4ε   for all ξ ∈ Q.
Because of⁶

    Q_{2n}(f(ξ)) = Σ_{ν=1}^{n} q_ν (f(ξ))^{2ν} = (Σ_{ν=1}^{n} q_ν f^{2ν})(ξ),

Q_{2n}(f) belongs again to A and satisfies the estimate ‖ |f| − Q_{2n}(f) ‖∞ ≤ 4ε. As ε > 0 is arbitrary, |f| belongs to the closure of A. ⊓⊔
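The construction in steps (ii) to (iv) can be carried out in code. The sketch below (our own illustration; the choices ε = 0.3 and Taylor degree 60 are ours) builds Q_{2n}(x) = P_n(x²) − P_n(0) from the Taylor polynomial of √(x + ε²) about x = 1/2 and checks the resulting uniform approximation of |x| without constant term:

```python
import math

eps = 0.3                   # illustrative epsilon
c = 0.5 + eps ** 2          # sqrt(x + eps^2) = sqrt(c + t) with t = x - 1/2

def P(x, n=60):
    # degree-n Taylor polynomial of sqrt(x + eps^2) about x = 1/2,
    # via the binomial series sqrt(c + t) = sqrt(c) * sum_k binom(1/2, k) (t/c)^k
    t = x - 0.5
    s, coef, tk = 0.0, math.sqrt(c), 1.0
    for k in range(n + 1):
        s += coef * tk
        tk *= t
        coef *= (0.5 - k) / ((k + 1) * c)   # binomial-coefficient recursion
    return s

def Q2n(x):
    # even polynomial with vanishing absolute term, as in the proof
    return P(x * x) - P(0.0)

worst = max(abs(abs(x) - Q2n(x)) for x in [i / 100 - 1 for i in range(201)])
```

On the grid of [−1, 1] the deviation stays within the bound 4ε of the proof, and |P(0)| ≤ 2ε as in step (iii).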
Proof. (a) By Lemma 3.34 and Remark 3.30, F = Ā satisfies the requirement (i) of
Lemma 3.29. As soon as (ii) from Lemma 3.29 is shown, F̄ = C(Q) follows. Since
F = Ā is already closed, the first case Ā = C(Q) follows.
(b) For the proof of (ii) from Lemma 3.29, we consider the following alternative: either for any x ∈ Q there is an f ∈ A with f(x) ≠ 0, or there exists an x₀ ∈ Q with f(x₀) = 0 for all f ∈ A. The first alternative will be investigated in (c), the second one in (d).
(c) Assume the first alternative. First we prove the following:
Assertion: For points x′, x″ ∈ Q with x′ ≠ x″ from Assumption (ii) in Lemma 3.29 there exists an f ∈ A with 0 ≠ f(x′) ≠ f(x″) ≠ 0.
For its proof we use the separability property (iii): f(x′) ≠ f(x″) holds for a suitable f. Assume f(x′) = 0, which implies f(x″) ≠ 0 (the case f(x″) = 0 and
[Footnote 6: Here we use that q₀ = 0, since f⁰ = 1 may not necessarily belong to the algebra A.]
f(x′) ≠ 0 is completely analogous). The first alternative from Part (b) guarantees the existence of an f₀ ∈ A with f₀(x′) ≠ 0. The function

    f_λ := f − λf₀

satisfies, for a suitable λ ≠ 0,

    0 ≠ f_λ(x′) ≠ f_λ(x″) ≠ 0,   f_λ ∈ A,

and f_λ (renamed f) has the required properties. This proves the assertion.
Let g ∈ C(Q) be the function from assumption (ii) of Lemma 3.29. Concerning the required ϕ, we make the ansatz ϕ = αf + βf² with f from the assertion. f ∈ A implies that also f², ϕ ∈ A. The conditions ϕ(x′) = g(x′) and ϕ(x″) = g(x″) lead to a 2×2 system of linear equations for α and β. Since its determinant f(x′) f(x″) (f(x″) − f(x′)) does not vanish, solvability is ensured. Therefore, assumption (ii) of Lemma 3.29 is satisfied even for ε = 0, and Part (a) shows Ā = C(Q).
(d) Assume the second alternative: there is an x0 ∈ Q with f (x0 ) = 0 for all
f ∈ A. Let 1I ∈ C(Q) be the function with constant value 1. Denote the algebra
generated from A and {1I} by A∗ ; i.e., A∗ = {f = g + λ1I: g ∈ A, λ ∈ K}.
Obviously, for any x ∈ Q there is an f ∈ A∗ with f(x) ≠ 0 (namely f = 1I). Hence the first alternative applies to A∗. The previous proof shows Ā∗ = C(Q).
Let g ∈ C(Q) be an arbitrary function with g(x₀) = 0, i.e., belonging to the right-hand side of (3.21). Because of g ∈ C(Q) = Ā∗, for all ε > 0 there is an f∗ ∈ A∗ with ‖g − f∗‖∞ < ε. By definition of A∗ one may write f∗ as
f ∗ = f + λ1I with f ∈ A. This shows that
kg − f − λ1Ik∞ < ε.
For the proof of Theorem 3.28 choose Q = [0, 1] (compact subset of R1 ) and
A as the algebra of all polynomials. For this algebra, assumption (iii) of Theorem
3.35 is satisfied with f (x) = x. Hence, one of the two alternatives Ā = C(Q) or
(3.21) holds. Since the constant function 1I belongs to A, (3.21) is excluded and
Ā = C(Q) is shown.
Proof. Let ε > 0 be given. We have to show that for all f ∈ C([0, 1]) there is an n₀ such that

    |Q_n(f) − ∫₀¹ f(x) dx| ≤ ε   for n ≥ n₀.

According to Theorem 3.28 there is a polynomial P with ‖f − P‖∞ ≤ ε/(1 + C_stab). In particular,

    |∫₀¹ P(x) dx − ∫₀¹ f(x) dx| ≤ ‖f − P‖∞ ≤ ε/(1 + C_stab).
X is called a normed (linear) space (the norm may be expressed by the explicit
notation (X, k·k)), if a norm k·k is defined on the vector space X. If necessary, we
write k·kX for the norm on X.
X is called a Banach space, if X is normed and complete (‘complete’ means that
all Cauchy sequences converge in X).
By L(X, Y ) we denote the set of linear and continuous mappings from X to Y .
Remark 3.37. (a) If X, Y are normed, also L(X, Y) is normed. The associated 'operator norm' of T ∈ L(X, Y) equals⁷

    ‖T‖ := ‖T‖_{Y←X} := sup_{x∈X\{0}} ‖Tx‖_Y / ‖x‖_X.   (3.23)
3.4.7.2 Theorem
    ‖T‖_{Y←X} = sup_{x∈K} ‖Tx‖_Y.

The statement of the theorem becomes sup_{T∈T} sup_{x∈K} ‖Tx‖ < ∞. Since suprema commute, one may also write sup_{x∈K} sup_{T∈T} ‖Tx‖ < ∞. Assumption (d) of the theorem reads C(x) := sup_{T∈T} ‖Tx‖ < ∞; i.e., the function C(x) is point-wise bounded. The astonishing⁸ property is that C(x) is even uniformly bounded on K.
[Footnote 7: For the trivial case of X = {0}, we define the supremum over the empty set by zero.]
In the later applications we often apply a particular variant of the theorem.
Corollary 3.39. Let X and Y be as in Theorem 3.38. Furthermore, assume that the operators T_n ∈ L(X, Y) (n ∈ ℕ) satisfy either
(a) {T_n x} is a Cauchy sequence for all x ∈ X, or
(b) there exists an operator T ∈ L(X, Y) with T_n x → Tx for all x ∈ X.
Then sup_{n∈ℕ} ‖T_n‖ < ∞ holds.
3.4.7.3 Proof
The proof of Theorem 3.38 is based on two additional theorems. The first is called
Baire’s category theorem or the Baire–Hausdorff theorem.
Then there exists at least one k₀ ∈ ℕ such that Å_{k₀} ≠ ∅ (Å_{k₀} denotes the interior of A_{k₀}).
Proof. (a) For an indirect proof assume Å_k = ∅ for all k. We choose a non-empty, open set U ⊂ X and some k ∈ ℕ. Since A_k is closed, U\A_k is again open and non-empty (otherwise, A_k would contain the open set U; i.e., Å_k ⊃ U ≠ ∅). Since U\A_k is open, it contains a closed sphere K_ε(x) with radius ε > 0 and midpoint x. Without loss of generality, ε ≤ 1/k can be chosen.
(b) Starting with ε₀ := 1 and x₀ := 0, according to (a), we choose by induction
[Footnote 8: Only in the case of a finite-dimensional vector space X is there a simple proof, using sup_{T∈T} ‖T b_i‖_Y < ∞ for all basis vectors b_i of X.]
Theorem 3.41. Let X be a complete metric space and Y a normed space. For some subset F ⊂ C⁰(X, Y) of the continuous mappings, assume that sup_{f∈F} ‖f(x)‖_Y < ∞ for all x ∈ X. Then there exist x₀ ∈ X and ε₀ > 0 such that

    sup_{x∈K_{ε₀}(x₀)} sup_{f∈F} ‖f(x)‖_Y < ∞.   (3.24)
Proof. Set A_k := ∩_{f∈F} {x ∈ X : ‖f(x)‖_Y ≤ k} for k ∈ ℕ and check that A_k is closed. According to the assumption, each x ∈ X must belong to some A_k; i.e., X = ∪_{k∈ℕ} A_k. Hence the assumptions of Theorem 3.40 are satisfied. Correspondingly, Å_{k₀} ≠ ∅ holds for at least one k₀ ∈ ℕ. By the definition of A_k, we have sup_{x∈A_{k₀}} sup_{f∈F} ‖f(x)‖_Y ≤ k₀. Choose a sphere with K_{ε₀}(x₀) ⊂ A_{k₀}. This yields the desired inequality (3.24) with the bound k₀. ⊓⊔
For the proof of Theorem 3.38 note that a Banach space is also a complete metric space and L(X, Y) ⊂ C⁰(X, Y), so that we may set F := T. The assumption sup_{f∈F} ‖f(x)‖_Y < ∞ is equivalent to sup_{T∈T} ‖Tx‖_Y < ∞. The result (3.24) becomes sup_{x∈K_{ε₀}(x₀)} sup_{T∈T} ‖Tx‖_Y < ∞. For an arbitrary ξ ∈ X\{0}, the element x_ξ := x₀ + (ε₀/‖ξ‖_X) ξ belongs to K_{ε₀}(x₀), so that

    ‖Tξ‖_Y / ‖ξ‖_X = (1/ε₀) ‖T(x_ξ − x₀)‖_Y ≤ (1/ε₀) ‖T x_ξ‖_Y + (1/ε₀) ‖T x₀‖_Y

is uniformly bounded for all T ∈ T and all ξ ∈ X\{0}. Hence the assertion of Theorem 3.38 follows: sup_{T∈T} sup_{ξ∈X\{0}} ‖Tξ‖_Y/‖ξ‖_X = sup_{T∈T} ‖T‖ < ∞.
X = C([0, 1]) together with the maximum norm ‖·‖∞ is a Banach space, and Y := ℝ is normed (its norm is the absolute value). The mappings

    f ∈ C([0, 1]) ↦ I(f) := ∫₀¹ f(x) dx ∈ ℝ   and   f ∈ C([0, 1]) ↦ Q_n(f) ∈ ℝ
are linear and continuous; hence they belong to L(X, Y ). By Remark 3.37b, conti-
nuity is equivalent to boundedness, which is quantified in the following lemma.
Lemma 3.42. The operator norms of I and Q_n ∈ L(X, Y) are

    ‖I‖ = 1,   ‖Q_n‖ = C_n := Σ_{i=0}^{n} |a_{i,n}|.   (3.25)
Proof. The estimates ‖I‖ ≤ 1 and ‖Q_n‖ ≤ C_n are equivalent to the estimates (3.12) and (3.13a) from Remark 3.15. According to Remark 3.17b, C_n is the best possible constant in (3.13a); hence ‖Q_n‖ = C_n. ⊓⊔
The terms ‘consistency’ and ‘convergence’ can be even better separated, without
weakening the previous statements.
The previous definition of convergence contains not only the statement that the sequence Q_n(f) is convergent, but also that it has the desired integral ∫₀¹ f(x) dx as the limit. The latter part can be omitted:

    {Q_n} is consistent if there is a dense subset X₀ ⊂ C([0, 1]) such that   (3.27)
    Q_n(f) → ∫₀¹ f(x) dx for all f ∈ X₀.
Note that, simultaneously, we have replaced the exactness Q_n(f) = ∫₀¹ f(x) dx for n ≥ n₀ by the more general convergence definition (3.26). The stability property remains unchanged.
remains unchanged.
Then the previous theorem can be reformulated as follows.
Theorem 3.45. (a) Let {Qn } be consistent in the more general sense of (3.27)
and stable. Then {Qn } is convergent in the sense of (3.26) and, furthermore,
lim_{n→∞} Q_n(f) = ∫₀¹ f(x) dx is the desired value of the integral.
(b) Let {Qn } be convergent in the sense of (3.26). Then {Qn } is also stable.
(c) Under the assumption of consistency in the sense of (3.27), stability and conver-
gence (3.26) are equivalent.
Proof. (i) It suffices to show that lim_{n→∞} Q_n(f) = ∫₀¹ f(x) dx, since this implies (3.26). Let f ∈ C([0, 1]) and ε > 0 be given. Because X₀ from (3.27) is dense, there is a g ∈ X₀ with

    ‖f − g‖∞ ≤ ε / (2 (1 + C_stab))   (C_stab: stability constant).

According to (3.27), there is an n₀ such that |Q_n(g) − ∫₀¹ g(x) dx| ≤ ε/2 for all n ≥ n₀. The triangle inequality yields the desired estimate

    |Q_n(f) − ∫₀¹ f(x) dx|
    ≤ |Q_n(f) − Q_n(g)| + |Q_n(g) − ∫₀¹ g(x) dx| + |∫₀¹ g(x) dx − ∫₀¹ f(x) dx|
    ≤ C_stab ‖f − g‖∞ + ε/2 + ‖f − g‖∞ ≤ ε,

where the final step uses ‖f − g‖∞ ≤ ε/[2 (1 + C_stab)].
(ii) Convergence in the sense of (3.26) guarantees that {Q_n(f)} has a limit for all f ∈ C([0, 1]). Alternative (a) of Corollary 3.39 applies and yields sup_{n∈ℕ} ‖Q_n‖ < ∞, proving stability.
(iii) Part (c) follows from Parts (a) and (b). ⊓⊔
Finally, we give a possible application of generalised consistency. To avoid
the difficulties arising from the instability of the Newton–Cotes formulae, one
often uses compound Newton–Cotes formulae. The best known example is the
compound trapezoidal rule, which uses the trapezoidal rule on each subinterval
[i/n, (i + 1) /n]. The compound trapezoidal rule defines again a family {Qn }. It
is not consistent in the sense of Definition 3.5, since except for constant and linear
functions, no further polynomials are integrated exactly. Instead, we return to the
formulation (3.8) of the quadrature error. The well-known estimate states that

    |Q_n(f) − ∫₀¹ f(x) dx| ≤ ‖f″‖∞ / (12n²) → 0   (3.28)

for all f ∈ C²([0, 1]) (cf. [9, §3.1]). The subset C²([0, 1]) is dense in C([0, 1]) (simplest proof: C²([0, 1]) ⊃ {polynomials} and the latter set is already dense
according to Theorem 3.28). Hence the compound trapezoidal rule {Qn } satisfies
the consistency condition (3.27) with X₀ = C²([0, 1]). The stability of {Q_n} follows from Conclusion 3.19a, since all weights are positive and Q_n(1) = 1. From Theorem 3.45a we conclude that Q_n(f) → ∫₀¹ f(x) dx for all continuous f.
The trapezoidal rule is the Newton–Cotes method for n = 1. We may fix any n_NC ∈ ℕ and use the corresponding Newton–Cotes formula in each subinterval [i/n, (i + 1)/n]. Again, this compound formula is stable, where the stability constant is given by C_{n_NC} from (3.25).
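The O(n^{−2}) behaviour (3.28) of the compound trapezoidal rule is easy to observe; a short Python check (our own illustration, with the test integrand f = exp):

```python
import math

def Q(f, n):
    # compound trapezoidal rule with n subintervals of [0, 1]
    h = 1.0 / n
    return h * (0.5 * (f(0.0) + f(1.0)) + sum(f(k * h) for k in range(1, n)))

exact = math.e - 1.0
e10 = abs(Q(math.exp, 10) - exact)
e20 = abs(Q(math.exp, 20) - exact)
ratio = e10 / e20   # about 4, reflecting the n^{-2} decay in (3.28)
```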
Also the error estimate can be transferred from [0, 1] to a general interval [a, b]. Assume an error estimate (3.8) for f ∈ C^{k_n}([0, 1]) by

    |∫₀¹ f(x) dx − Q_n(f)| ≤ c_n ‖f^{(k_n)}‖∞.

The stability constant C_n is the minimal c_n for k_n = 0. One sees that C_n in [0, 1] becomes C_n^{[a,b]} := L·C_n in [a, b]. This fact can be interpreted in the way that the relative quadrature error (1/L) |∫_a^b g(t) dt − Q_n^{[a,b]}(g)| possesses an unchanged stability constant. Anyway, the stability properties of {Q_n} and {Q_n^{[a,b]}} are the same.
In applications it happens that the integrand is a product f (x)g(x), where one
factor—say g—is not well-suited for quadrature (it may be insufficiently smooth,
e.g., containing a weak singularity, or it may be highly oscillatory). Interpolation of f by I_n(f) = Σ_{i=0}^{n} f(x_{i,n}) Φ_{i,n}(x) (cf. §3.1.2) induces a quadrature of fg by

    ∫₀¹ f(x) g(x) dx ≈ Σ_{i=0}^{n} a_{i,n} f(x_{i,n})   with a_{i,n} := ∫₀¹ Φ_{i,n}(x) g(x) dx,

which requires that we have precomputed the (exact) integrals ∫₀¹ Φ_{i,n}(x) g(x) dx.
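A sketch of such a product quadrature (our own illustration: the weight g(x) = √x and the equidistant nodes i/n are hypothetical choices, picked because the moments ∫₀¹ x^k √x dx = 1/(k + 3/2) are known in closed form):

```python
from fractions import Fraction

def lagrange_coeffs(nodes, i):
    # polynomial coefficients (lowest degree first) of the Lagrange basis Phi_i
    coef = [Fraction(1)]
    for j, xj in enumerate(nodes):
        if j == i:
            continue
        d = nodes[i] - xj
        new = [Fraction(0)] * (len(coef) + 1)
        for k, ck in enumerate(coef):
            new[k + 1] += ck / d
            new[k] -= xj * ck / d
        coef = new
    return coef

def product_weights(n):
    # a_i = int_0^1 Phi_i(x) g(x) dx for g(x) = sqrt(x), via the moments 1/(k + 3/2)
    nodes = [Fraction(i, n) for i in range(n + 1)]
    return [sum(float(ck) / (k + 1.5) for k, ck in enumerate(lagrange_coeffs(nodes, i)))
            for i in range(n + 1)]

w = product_weights(2)
# the rule is exact for polynomials f of degree <= n; here f(x) = x^2,
# and int_0^1 x^2 * sqrt(x) dx = 2/7
approx = sum(wi * (i / 2) ** 2 for i, wi in enumerate(w))
```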
3.5.3 Perturbations
if we consider, say f ∈ C 1 ([0, 1]), instead of f ∈ C([0, 1]). Note that f ∈ C 1 ([0, 1])
comes with the norm kf kC 1 ([0,1]) = max{|f (x)| , |f 0 (x)| : 0 ≤ x ≤ 1}.
The next result is prepared by the following lemma. We remark that a subset B of
a Banach space is precompact if and only if the closure B̄ is compact, which means
that all sequences {fk } ⊂ B possess a convergent subsequence: limn→∞ fkn ∈ B̄.
The term ‘precompact’ is synonymous with ‘relatively compact’.
Lemma 3.49. Let M ⊂ X be a precompact subset of the Banach space X. Let the
operators An ∈ L(X, Y ) be point-wise convergent to A ∈ L(X, Y ) (i.e., An x → Ax
for all x ∈ X). Then the sequences {A_n x} converge uniformly for all x ∈ M; i.e.,

    sup_{x∈M} ‖A_n x − Ax‖_Y → 0   for n → ∞.   (3.29)
and equicontinuous; i.e., for any ε > 0 and x ∈ D, there is some δ such that
Then M is precompact.
{xi,n ∈ [0, 1] : 0 ≤ i ≤ n}
Φ(xi,n ) = yi (0 ≤ i ≤ n) (4.1)
has to be determined.
Exercise 4.1. (a) The interpolation problem is solvable for all tuples {y_i : 0 ≤ i ≤ n} if and only if the linear space

    V_n := { (Φ(x_{i,n}))_{i=0}^{n} ∈ ℝ^{n+1} : Φ ∈ V_n }

has dimension n + 1.
(b) If dim V_n = n + 1, the interpolation problem is uniquely solvable.
[Footnote 1: The term 'linear' refers to the underlying linear space V_n, not to linear functions.]
[Footnote 2: In the case of the more general Hermite interpolation, a p-fold interpolation point ξ corresponds to prescribed values of the derivatives f^{(m)}(ξ) for 0 ≤ m ≤ p − 1.]
(a) either the interpolation problem is uniquely solvable for arbitrary values yi or
(b) the interpolant either does not exist for certain yi or is not unique.
The polynomial interpolation is characterised by
Vn = {polynomials of degree ≤ n}
and is always solvable. In the case of general vector spaces Vn , we always assume
that the interpolation problem is uniquely solvable.
For the special values yi = δij (j fixed, δij Kronecker symbol), one obtains an
interpolant Φj,n ∈ Vn , which we call the j-th Lagrange function (analogous to the
Lagrange polynomials in the special case of polynomial interpolation).
Exercise 4.3. (a) The interpolation In : X = C([0, 1]) → C([0, 1]) is continuous
and linear; i.e., In ∈ L(X, X).
(b) In is a projection; i.e., In In = In .
4.3 Stability
    … ≤ ‖f‖∞ C_n.

Since this estimate holds for all x ∈ [0, 1], it follows that ‖I_n(f)‖∞ ≤ C_n ‖f‖∞. Because f is arbitrary, ‖I_n‖ ≤ C_n is proved.
(ii) Let the function Σ_{i=0}^{n} |Φ_{i,n}(·)| attain its maximum at x₀: Σ_{i=0}^{n} |Φ_{i,n}(x₀)| = C_n. Choose f ∈ C([0, 1]) with ‖f‖∞ = 1 and f(x_{i,n}) = sign(Φ_{i,n}(x₀)). Then

    |I_n(f)(x₀)| = |Σ_{i=0}^{n} f(x_{i,n}) Φ_{i,n}(x₀)| = Σ_{i=0}^{n} |Φ_{i,n}(x₀)| = C_n = C_n ‖f‖∞

holds; i.e., ‖I_n(f)‖∞ = C_n ‖f‖∞ for this f. Hence the operator norm is bounded from below by ‖I_n‖ ≥ C_n. Together with (i), the equality ‖I_n‖ = C_n is proved. ⊓⊔
Remark 4.9. Given f ∈ C([0, 1]), let p∗n be the best approximation to f by a poly-
nomial 3 of degree ≤ n, while pn is its interpolant. Then the following estimate
holds:
kf − pn k ≤ (1 + Cn ) kf − p∗n k with Cn = kIn k . (4.6)
    ‖f − p*_n‖ → 0

holds. An obvious conclusion from (4.6) is the following: if stability held (i.e., C_n ≤ C_stab), then also ‖f − p_n‖ → 0 would follow. Instead, we shall show instability; the asymptotic behaviour of the right-hand side in (4.6) then depends on which process is faster: ‖f − p*_n‖ → 0 or C_n → ∞.
[Footnote 3: The space of polynomials can be replaced by any other interpolation subspace V_n.]
4.5 Instability of Polynomial Interpolation
Proof. Let f ∈ C([0, 1]) and ε > 0 be given. There is some g ∈ X₀ with

    ‖f − g‖∞ ≤ ε / (2 (1 + C_stab)),

where C_stab is the stability constant. According to Definition 4.5, there is an n₀ such that ‖I_n(g) − g‖∞ ≤ ε/2 for all n ≥ n₀. The triangle inequality yields the desired estimate:
Proof. Since {I_n(f)} converges, the I_n are uniformly bounded. Apply Corollary 3.39 with X = Y = C([0, 1]) and T_n := I_n ∈ L(X, Y). ⊓⊔
Theorem 4.10 and Lemma 4.11 yield the following equivalence theorem.
We choose the equidistant interpolation points x_{i,n} = i/n and restrict ourselves to even n. The Lagrange polynomial L_{n/2,n} is particularly large in the subinterval (0, 1/n). In its midpoint we observe the value

    |L_{n/2,n}(1/(2n))| = ∏_{j=0, j≠n/2}^{n} | (1/(2n) − j/n) / (1/2 − j/n) | = ∏_{j=0, j≠n/2}^{n} |1/2 − j| / |n/2 − j|

    = [ (1/2) · (1/2) · (3/2) ⋯ (n/2 − 3/2) · (n/2 + 1/2) ⋯ (n − 1/2) ] / ((n/2)!)².
Exercise 4.13. Show that the expression from above diverges exponentially.
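The divergence is easy to observe in exact arithmetic; the following sketch (our own illustration) evaluates |L_{n/2,n}(1/(2n))| for equidistant nodes:

```python
from fractions import Fraction

def mid_lagrange_value(n):
    # |L_{n/2,n}(1/(2n))| for the equidistant nodes j/n (n even), exactly
    x = Fraction(1, 2 * n)
    v = Fraction(1)
    for j in range(n + 1):
        if j != n // 2:
            # factor (x - x_j)/(x_{n/2} - x_j) with x_{n/2} = 1/2
            v *= (x - Fraction(j, n)) / (Fraction(1, 2) - Fraction(j, n))
    return abs(v)

v10, v20 = mid_lagrange_value(10), mid_lagrange_value(20)
```

Already v10 ≈ 4.9, and the value grows rapidly with n, in line with the exponential divergence of Exercise 4.13.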
Because of C_n = ‖Σ_{i=0}^{n} |L_{i,n}(·)|‖∞ ≥ ‖L_{n/2,n}‖∞ ≥ |L_{n/2,n}(1/(2n))|, interpolation (at equidistant interpolation points) cannot be stable. The true behaviour of C_n has first⁴ been described by Turetskii [21]:⁵

    C_n ≈ 2^{n+1} / (e · n · (log n + γ)),

where γ is Euler's constant.⁶ Even more asymptotic terms are determined in [11].
One may ask whether the situation improves for another choice of interpolation points. In fact, an asymptotically optimal choice are the so-called Chebyshev points

    x_{i,n} = (1/2) (1 + cos((i + 1/2) π / (n + 1)))

(these are the zeros of the Chebyshev polynomial⁷ T_{n+1} ∘ φ, where φ(ξ) = 2ξ − 1 is the affine transformation from [0, 1] onto [−1, 1]). In this case, one can prove that⁸

    ‖I_n‖ ≤ 1 + (2/π) log(n + 1)   (4.7)
(cf. Rivlin [17, Theorem 1.2]), which is asymptotically the best bound, as the next
result shows.
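Both choices can be compared numerically. The sketch below (our own illustration; grid search over 2001 points, n = 12) evaluates the Lebesgue function Σ|L_i(·)| for equidistant and for Chebyshev points and checks the bound (4.7):

```python
import math

def lebesgue_const(nodes, m=2000):
    # maximum over a grid of the Lebesgue function sum_i |L_i(x)| on [0, 1]
    best = 0.0
    for s in range(m + 1):
        x = s / m
        total = 0.0
        for i, xi in enumerate(nodes):
            li = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    li *= (x - xj) / (xi - xj)
            total += abs(li)
        best = max(best, total)
    return best

n = 12
cheb = [0.5 * (1 + math.cos((i + 0.5) * math.pi / (n + 1))) for i in range(n + 1)]
equi = [i / n for i in range(n + 1)]
c_cheb, c_equi = lebesgue_const(cheb), lebesgue_const(equi)
bound = 1 + (2 / math.pi) * math.log(n + 1)   # right-hand side of (4.7)
```

Already for n = 12 the equidistant constant exceeds the Chebyshev constant by more than an order of magnitude, while the latter stays below the logarithmic bound (4.7).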
[Footnote 4: For historical comments see [20].]
[Footnote 5: The function ϕ = Σ_{i=0}^{n} |L_{i,n}(·)| attains its maximum C_n in the first and the last interval. As pointed out by Schönhage [18, §4], ϕ is of similar size as in (4.7) for the middle interval.]
[Footnote 6: The value γ = 0.5772… is already given in Euler's first article [5]. Later, Euler computed 15 exact decimal places of γ.]
[Footnote 7: The Chebyshev polynomial T_n(x) := cos(n arccos(x)), n ∈ ℕ₀, satisfies the three-term recursion T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x) (n ≥ 1), starting from T₀(x) = 1 and T₁(x) = x.]
[Footnote 8: A lower bound is ‖I_n‖ > (2/π) log(n + 1) + (2/π)(γ + log(8/π)), where (2/π)(γ + log(8/π)) = 0.96252… = lim_{n→∞} (‖I_n‖ − (2/π) log(n + 1)).]
4.6 Is Stability Important for Practical Computations?
    ‖I_n‖ > ((2 − ε)/π) log(n + 1)   for all ε > 0.
The estimate of Theorem 4.14 originates from Erdös [4]. The bound

    ‖I_n‖ > log(n + 1) / (8√π)
Does the instability of polynomial interpolation mean that one should avoid polyno-
mial interpolation altogether? Practically, one may be interested in an interpolation
In∗ for a fixed n∗ . In this case, the theoretically correct answer is: the property of
In∗ has nothing to do with convergence and stability of {In }n∈N . The reason is that
convergence and stability are asymptotic properties of the sequence {In }n∈N and
are in no way related to the properties of a particular member In∗ of the sequence.
One can construct two different sequences {In0 }n∈N and {In00 }n∈N —one stable, the
other unstable—such that In0 ∗ = In00∗ belongs to both sequences. This argument also
holds for the quadrature discussed in the previous chapter.
On the other hand, we may expect that instability expressed by Cn → ∞ may
lead to large values of Cn , unless n is very small. We return to this aspect later.
The convergence statement from Definition 4.4 is, in practice, of no help. The
reason is that the convergence from Definition 4.4 can be arbitrarily slow, so that
for a fixed n, it yields no hint concerning the error In (f ) − f . Reasonable error
estimates can only be given if f has a certain smoothness, e.g., f ∈ C n+1 ([0, 1]).
Then the standard error estimate of polynomial interpolation states that

    ‖f − I_n(f)‖∞ ≤ (1/(n + 1)!) · C_ω(I_n) · ‖f^{(n+1)}‖∞,   (4.8)

where

    C_ω(I_n) := ‖ω‖∞   for ω(x) := ∏_{i=0}^{n} (x − x_{i,n})
(cf. [14, §1.5], [19], [15, §8.1.1], [8, §B.3]). The quantity Cω (In ) depends on the
location of the interpolation points. It is minimal for the Chebyshev points, where
Cω (In ) = 4−(n+1) .
    ‖f − I_n(f) − δI_n‖∞ ≤ ε_n^{int} + ε_n^{per}

with

    ε_n^{int} = (1/(n + 1)!) C_ω(I_n) ‖f^{(n+1)}‖∞   and   ε_n^{per} = η ‖I_n‖ ‖f‖∞.

Since η is small (maybe of the size of the machine precision), the contribution ε_n^{per} is not seen in the beginning. However, with increasing n, the part ε_n^{int} is assumed to tend to zero, while ε_n^{per} increases to infinity because of the instability of I_n.
We illustrate this situation in two different scenarios. In both cases we assume that the analytic function f is such that the exact interpolation error (4.8) decays like ε_n^{int} = e^{−n}.
(1) Assume a perturbation error ε_n^{per} = η eⁿ due to an exponential increase of ‖I_n‖. The resulting error is

    e^{−n} + η eⁿ.

Regarding n as a real variable, we find a minimum at n = (1/2) log(1/η) with the value 2√η. Hence, we cannot achieve better accuracy than half the mantissa length.
(2) According to (4.7), we assume that ε_n^{per} = η (1 + (2/π) log(n + 1)), so that the sum

    e^{−n} + η (1 + (2/π) log(n + 1))

is the total error. Here, the minimising n is the solution of the fixed-point equation n = log(n + 1) − log(2η/π). For η = 10^{−16} the minimal value 3.4η of the total error is attained at the integer value n = 41. The precision corresponds to almost the full mantissa length. Hence, in this case the instability ‖I_n‖ → ∞ is completely harmless.¹⁰
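The arithmetic of both scenarios can be verified directly; a small Python check (η = 10^{−16} as in the text; the function names are ours):

```python
import math

eta = 1e-16

# scenario (1): total error e^{-n} + eta * e^n
n1 = 0.5 * math.log(1 / eta)             # minimiser as a real variable
v1 = math.exp(-n1) + eta * math.exp(n1)  # minimal value, equal to 2*sqrt(eta)

# scenario (2): total error e^{-n} + eta * (1 + (2/pi) * log(n + 1))
def total(n):
    return math.exp(-n) + eta * (1 + (2 / math.pi) * math.log(n + 1))

n2 = 1.0
for _ in range(50):                      # fixed-point iteration for the minimiser
    n2 = math.log(n2 + 1) - math.log(2 * eta / math.pi)
```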
[Footnote 9: There are further rounding errors, which we ignore to simplify the analysis.]
[Footnote 10: To construct an example where even for (4.7) the instability becomes obvious, one has to assume that the interpolation error decreases very slowly like ε_n^{int} = 1/log(n).]
4.7 Tensor Product Interpolation
Finally, we give an example where the norm ‖I_n‖ is required for the analysis of the interpolation error, even if we ignore input errors and rounding errors. Consider the function f(x, y) in two variables (x, y) ∈ [0, 1] × [0, 1]. The two-dimensional polynomial interpolation can easily be constructed from the previous I_n. The tensor product¹¹ I_n² := I_n ⊗ I_n can be applied as follows. First, we apply the interpolation with respect to x. For any y ∈ [0, 1] we have

    F(x, y) := I_n(f(·, y))(x) = Σ_{i=0}^{n} f(x_{i,n}, y) Φ_{i,n}(x).
Again,

    |f(x_{i,n}, y) − I_n(f(x_{i,n}, ·))(y)| ≤ (1/(n + 1)!) C_ω(I_n) ‖∂^{n+1}f/∂y^{n+1}‖∞.
The previous estimates and the triangle inequality yield the final estimate¹²

    ‖f − I_n²(f)‖∞ ≤ (C_ω(I_n)/(n + 1)!) ( ‖I_n‖ ‖∂^{n+1}f/∂y^{n+1}‖∞ + ‖∂^{n+1}f/∂x^{n+1}‖∞ ).

Note that the divergence of ‖I_n‖ can be compensated by the factor 1/(n + 1)!.
[Footnote 11: Concerning the tensor notation see [9].]
[Footnote 12: Here, ‖·‖∞ is the maximum norm over [0, 1]².]
The simplest example is the linear interpolation where I_n(f)(x_k) = f(x_k) and I_n(f)|_{J_k} (i.e., I_n(f) restricted to J_k) is a linear polynomial. The corresponding Lagrange function Φ_{j,n} is called the hat function and has the support¹⁴ supp(Φ_{j,n}) = J_j ∪ J_{j+1}.
We may fix another polynomial degree d and fix points 0 = ξ₀ < ξ₁ < … < ξ_d = 1. In each subinterval J_k = [x_{k−1}, x_k] we define interpolation nodes ζ_ℓ := x_{k−1} + (x_k − x_{k−1}) ξ_ℓ. Interpolating f by a polynomial of degree d at these nodes, we obtain I_n(f)|_{J_k}. Altogether, I_n(f) is a continuous¹⁵ and piecewise polynomial function on J. Again, supp(Φ_{j,n}) = J_j ∪ J_{j+1} holds.
A larger but still local support occurs in the following construction of piecewise
cubic functions. Define In (f )|Jk by cubic interpolation at the nodes14 xk−2 , xk−1 ,
xk , xk+1 . Then the support supp(Φj,n ) = Jj−1 ∪ Jj ∪ Jj+1 ∪ Jj+2 is larger than
before.
[Footnote 13: The support of a function f defined on I is the closed set supp(f) := closure of {x ∈ I : f(x) ≠ 0}.]
[Footnote 14: The expression has to be modified for the indices 1 and n at the end points.]
[Footnote 15: If I_n(f) ∈ C¹(I) is desired, one may use Hermite interpolation; i.e., also dI_n(f)/dx = f′ at x = x_{k−1} and x = x_k. This requires a degree d ≥ 3.]
4.8 Stability of Piecewise Polynomial Interpolation
The error estimates can be performed for each subinterval separately. Transformation of inequality (4.8) to J_k yields ‖I_n(f) − f‖_{∞,J_k} ≤ C h_k^{d+1} ‖f^{(d+1)}‖∞, where d is the (fixed) degree of the local interpolation polynomial. The overall estimate is

    ‖I_n(f) − f‖∞ ≤ C δ_n^{d+1} ‖f^{(d+1)}‖∞ → 0,   (4.10)

where we use the condition δ_n → 0.
Stability is controlled by the maximum norm of Φ_n := Σ_{i=1}^{n} |Φ_{i,n}(·)|. For the examples from above it is easy to verify that ‖Φ_{i,n}‖ ≤ K independently of i and n. Fix an argument x ∈ I. The local support property (4.9) implies that Φ_{i,n}(x) ≠ 0 holds for at most α + β + 1 indices i. Hence Σ_{i=1}^{n} |Φ_{i,n}(x)| ≤ C_stab := (α + β + 1) K holds and implies sup_n ‖I_n‖ ≤ C_stab (cf. (4.5)).
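For the piecewise linear case the bound is sharp with C_stab = 1, since the hat functions form a partition of unity; a quick check (our own illustration):

```python
def hat(i, n, x):
    # piecewise linear Lagrange ("hat") function for the nodes k/n
    h = 1.0 / n
    return max(0.0, 1.0 - abs(x - i * h) / h)

n = 8
# Lebesgue function sum_i |hat_i(x)|, sampled on a fine grid of [0, 1]
worst = max(sum(hat(i, n, s / 1000) for i in range(n + 1)) for s in range(1001))
```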
In this case the support of a Lagrange function Φ_{j,n}, which now is called a cardinal spline, is global:¹⁶ supp(Φ_{j,n}) = J. Interestingly, there is another basis of V_n consisting of so-called B-splines B_j, whose support is local:¹⁴ supp(B_j) = J_{j−1} ∪ J_j ∪ J_{j+1} ∪ J_{j+2}. Furthermore, they are non-negative and sum up to

    Σ_{j=0}^{n} B_j = 1.   (4.11)
We choose an equidistant¹⁷ grid; i.e., J_i = [(i − 1)h, ih] with h := 1/n. The stability estimate ‖I_n‖ = ‖Σ_{i=0}^{n} |Φ_{i,n}(·)|‖∞ ≤ C_stab (cf. (4.5)) is equivalent to

    ‖S‖∞ ≤ C_stab ‖y‖∞,   where S = Σ_{i=0}^{n} y_i Φ_{i,n} ∈ V_n

is the spline function interpolating y_i = S(x_i). In the following, we make use of the
[Footnote 16: Φ_{j,n} is non-negative in J_j ∪ J_{j+1} and has oscillating signs in the neighbouring intervals. One can prove that the maxima of Φ_{j,n} in J_k are exponentially decreasing with |j − k|. This fact can already be used for a stability proof.]
[Footnote 17: For the general case compare [14, §2], [15, §8.7], [19, §2.4].]
B-splines, which can easily be described for the equidistant case.¹⁸ The evaluation at the grid points yields, for all x ∈ J, the bound from which the stability estimate ‖S‖∞ ≤ C_stab ‖y‖∞ follows with C_stab := 3.
[Footnote 18: The explicit polynomials are, for 2 ≤ j ≤ n − 2,
    B_j(x) = (1/(6h³)) ·
        ξ³                         with ξ = x − x_{j−2}   for x ∈ J_{j−1},
        h³ + 3h²ξ + 3hξ² − 3ξ³     with ξ = x − x_{j−1}   for x ∈ J_j,
        h³ + 3h²ξ + 3hξ² − 3ξ³     with ξ = x_{j+1} − x   for x ∈ J_{j+1},
        ξ³                         with ξ = x_{j+2} − x   for x ∈ J_{j+2};
    B₁(x) = (1/(6h³)) ·
        6h²x − 2x³                                        for x ∈ J₁,
        h³ + 3h²ξ + 3hξ² − 3ξ³     with ξ = 2h − x        for x ∈ J₂,
        ξ³                         with ξ = 3h − x        for x ∈ J₃,
    and B_{n−1}(x) = B₁(1 − x);
    B₀(x) = (1/(6h³)) ·
        h³ + 3h²ξ + 3hξ² − ξ³      with ξ = h − x         for x ∈ J₁,
        ξ³                         with ξ = 2h − x        for x ∈ J₂,
    and B_n(x) = B₀(1 − x).]
4.9 From point-wise Convergence to Operator-Norm Convergence
Remark 4.15. The previous results show that consistency is in conflict with stability.
Polynomial interpolation has an increasing order of consistency, but suffers from in-
stability (cf. Theorem 4.14). On the other hand, piecewise polynomial interpolation
of bounded order is stable.
4.10 Approximation
The second part of (4.14) describes n + 1 = dim(Vn ) equations, which are used
by the Remez algorithm to determine Φn ∈ Vn (cf. Remez [16]).
From (4.14) one concludes that there are n zeros ξ1 < . . . < ξn of ε = f − Φn ;
i.e., Φn can be regarded as an interpolation polynomial with these interpolation
points. However note that the ξµ depend on the function f.
The mapping f 7→ Φn is in general nonlinear. Below, when we consider Hilbert
spaces, it will become a linear projection.
Since the set of polynomials is dense in C([a, b]) (cf. Theorem 3.28), the condition

    V₀ ⊂ V₁ ⊂ … ⊂ V_n ⊂ V_{n+1} ⊂ …   and   the closure of ∪_{n∈ℕ₀} V_n equals C([a, b])   (4.15)
Stability issues do not appear in this setting. One may consider the sequence
{kΦn k : n ∈ N0 } , but (4.16) proves convergence kΦn k → kf k; i.e., the sequence
must be uniformly bounded.
The approximation is simpler if B is a Hilbert space with scalar product h·, ·i .
Then the best approximation from (4.13) is obtained by means of the orthogonal
[Footnote 19: B is strictly convex if ‖f‖ = ‖g‖ = 1 and f ≠ g imply ‖f + g‖ < 2.]
Φn = Πn f.

Given any orthonormal basis {φµ : 0 ≤ µ ≤ n} of Vn, the solution has the explicit representation

Φn = Σ_{µ=0}^{n} ⟨f, φµ⟩ φµ .   (4.17)
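Formula (4.17) can be tried out numerically. The following sketch (my own illustration; the helper names, the test function f(x) = |x|, and the midpoint quadrature are choices not taken from the book) computes the L²(−π, π)-orthogonal projection onto trigonometric polynomials and observes the error decrease:

```python
import math

def inner(u, v, m=2000):
    # scalar product <u, v> of L2(-pi, pi), composite midpoint rule
    h = 2 * math.pi / m
    return sum(u(-math.pi + (k + 0.5) * h) * v(-math.pi + (k + 0.5) * h)
               for k in range(m)) * h

def fourier_basis(n):
    # orthonormal basis {phi_mu} of the trigonometric polynomials of degree <= n
    basis = [lambda x: 1.0 / math.sqrt(2.0 * math.pi)]
    for k in range(1, n + 1):
        basis.append(lambda x, k=k: math.cos(k * x) / math.sqrt(math.pi))
        basis.append(lambda x, k=k: math.sin(k * x) / math.sqrt(math.pi))
    return basis

def project(f, n):
    # Phi_n = sum_mu <f, phi_mu> phi_mu, cf. (4.17)
    basis = fourier_basis(n)
    coeff = [inner(f, phi) for phi in basis]
    return lambda x: sum(c * phi(x) for c, phi in zip(coeff, basis))

f = lambda x: abs(x)                     # continuous 2*pi-periodic test function
pts = [0.1 * j for j in range(-31, 32)]  # sample points in (-pi, pi)
err = [max(abs(f(x) - project(f, n)(x)) for x in pts) for n in (2, 8)]
# the maximum error decreases as n grows
```

Since f is even, the sine coefficients come out (numerically) zero; the projection reduces to the truncated cosine series of |x|.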
At first glance there is no stability problem to be discussed, since the operator norm of orthogonal projections equals one: ‖Πn‖_{L²←L²} = 1. However, if we consider the operator norm ‖Πn‖_{B←B} for another Banach space, (in)stability comes into play.
Let Πn be the Fourier projection from above and choose the Banach space B = C2π := {f ∈ C([−π, π]) : f(−π) = f(π)} equipped with the maximum norm ‖·‖∞. We ask for the behaviour of ‖Πn‖∞, where now ‖·‖∞ = ‖·‖_{C2π←C2π} denotes the operator norm. The mapping (4.17) can be reformulated by means of the Dirichlet kernel:

(Πn f)(x) = (1/π) ∫_0^π [sin((2n + 1)y) / sin(y)] · [f(x + 2y) + f(x − 2y)] dy.

The corresponding operator norm is

‖Πn‖∞ = (1/π) ∫_0^π | sin((2n + 1)y) / sin(y) | dy.
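The growth of these norms can be observed numerically. A sketch (my own illustration; the quadrature resolution m is an arbitrary choice) evaluating the integral for ‖Πn‖∞:

```python
import math

def lebesgue_const(n, m=50000):
    # (1/pi) * integral_0^pi |sin((2n+1)y) / sin(y)| dy, composite midpoint rule
    h = math.pi / m
    s = 0.0
    for k in range(m):
        y = (k + 0.5) * h
        s += abs(math.sin((2 * n + 1) * y) / math.sin(y))
    return s * h / math.pi

vals = [lebesgue_const(n) for n in (1, 4, 16, 64)]
# vals increases without bound (like (4/pi^2) * log n):
# the Fourier projection is unstable with respect to the maximum norm
```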
20 That means (i) Πn Πn = Πn (projection property) and (ii) Πn is selfadjoint: ⟨Πn f, g⟩ = ⟨f, Πn g⟩ for all f, g ∈ B.
This, however, can only lead to larger norms ‖Pn‖∞ due to the following result of Cheney et al. [3].
References
1. Bernstein, S.N.: Sur la limitation des valeurs d’un polynôme p(x) de degré n sur tout un
segment par ses valeurs en (n + 1) points du segment. Izv. Akad. Nauk SSSR 8, 1025–1050
(1931)
2. Chebyshev, P.L.: Sur les questions de minima qui se rattachent à la représentation approxima-
tive des fonctions. In: Oeuvres, Vol. I, pp. 641–644. St. Petersburg (1899)
3. Cheney, E.W., Hobby, C.R., Morris, P.D., Schurer, F., Wulbert, D.E.: On the minimal property
of the Fourier projection. Trans. Am. Math. Soc. 143, 249–258 (1969)
4. Erdös, P.: Problems and results on the theory of interpolation. Acta Math. Acad. Sci. Hungar.
12, 235–244 (1961)
5. Euler, L.: De progressionibus harmonicis observationes. Commentarii academiae scientiarum
imperialis Petropolitanae 7, 150–161 (1740)
6. Faber, G.: Über die interpolatorische Darstellung stetiger Funktionen. Jber. d. Dt. Math.-
Verein. 23, 190–210 (1914)
7. Haar, A.: Die Minkowskische Geometrie und die Annäherung an stetige Funktionen. Math.
Ann. 78, 294–311 (1918)
8. Hackbusch, W.: Hierarchische Matrizen - Algorithmen und Analysis. Springer, Berlin (2009)
9. Hackbusch, W.: Tensor spaces and numerical tensor calculus, Springer Series in Computa-
tional Mathematics, Vol. 42. Springer, Berlin (2012)
10. Meinardus, G.: Approximation of functions: theory and numerical methods. Springer, New
York (1967)
11. Mills, T.M., Smith, S.J.: The Lebesgue constant for Lagrange interpolation on equidistant
nodes. Numer. Math. pp. 111–115 (1992)
12. Natanson, I.P.: Konstruktive Funktionentheorie. Akademie-Verlag, Berlin (1955)
13. Natanson, I.P.: Constructive function theory, Vol. III. Frederick Ungar Publ., New York (1965)
14. Plato, R.: Concise Numerical Mathematics. AMS, Providence (2003)
15. Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics, 2nd ed. Springer, Berlin (2007)
16. Remez, E.J.: Sur un procédé convergent d’approximations successives pour déterminer les
polynômes d’approximation. Compt. Rend. Acad. Sc. 198, 2063–2065 (1934)
17. Rivlin, T.J.: Chebyshev Polynomials. Wiley, New York (1990)
18. Schönhage, A.: Fehlerfortpflanzung bei Interpolation. Numer. Math. 3, 62–71 (1961)
19. Stoer, J., Bulirsch, R.: Introduction to Numerical Analysis. North-Holland, Amsterdam (1980)
20. Trefethen, L.N., Weideman, J.A.C.: Two results on polynomial interpolation in equally spaced
points. J. Approx. Theory 65, 247–260 (1991)
21. Turetskii, A.H.: The bounding of polynomials prescribed at equally distributed points (Rus-
sian). Proc. Pedag. Inst. Vitebsk 3, 117–127 (1940)
Chapter 5
Ordinary Differential Equations
1 In the case of a system of differential equations, f is defined in R × R^n and the solution y ∈ C¹(R, R^n) is vector-valued. For our considerations it is sufficient to study the scalar case n = 1.
2 However, it may happen that the solution exists only on a smaller interval [x0, xS) ⊂ [x0, xE].
The Lipschitz constant L will appear in later analysis. Unique solvability will be
stated in Corollary 5.10.
We choose a fixed step size3 h > 0. The corresponding grid points are

xi := x0 + ih   (i ∈ N0, xi ∈ I).

The starting value is taken from the initial condition:

η0 := y0 .   (5.3)

The prototype of the one-step methods is the Euler method, which starts with (5.3) and defines recursively

ηi+1 := ηi + h f(xi, ηi)   (i ∈ N0).   (5.4)
Exercise 5.1. Consider the differential equation y′ = ay (i.e., f(x, y) = ay) with the initial value y0 = 1 at x0 = 0 (its exact solution is y(x) = e^{ax}). Determine the solution of (5.4). Does y(x) − η(x; h) → 0 hold for h := x/n as n → ∞, i.e., h → 0 with nh = x fixed?
For other one-step methods, one replaces the right-hand side in (5.4) by a more
general expression hφ(xi , ηi , h; f ). Here, the last argument f means that inside of φ
the function f can be used for arbitrary evaluations. The Euler method corresponds
to φ(xi , ηi , h; f ) = f (xi , ηi ).
Often, the evaluation of φ is performed in several partial steps. For instance, the
Heun method [10] uses the intermediate step ηi+1/2 :
ηi+1/2 := ηi + (h/2) f(xi, ηi),    ηi+1 := ηi + h f(xi + h/2, ηi+1/2).

These equations yield φ(xi, ηi, h; f) := f(xi + h/2, ηi + (h/2) f(xi, ηi)). The classical Runge–Kutta method uses four intermediate steps (cf. Runge [19], Kutta [15]; history in [1]; for a modern description see, e.g., [8, §II]).
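The Heun step can be sketched as follows (my own illustration, not the book's code; the observed error ratio confirms consistency order 2):

```python
import math

def heun_step(f, x, eta, h):
    # eta_{i+1/2} := eta_i + (h/2) f(x_i, eta_i)
    # eta_{i+1}   := eta_i + h f(x_i + h/2, eta_{i+1/2})
    return eta + h * f(x + h / 2, eta + (h / 2) * f(x, eta))

def integrate(f, x0, y0, h, n):
    x, eta = x0, y0
    for _ in range(n):
        eta = heun_step(f, x, eta, h)
        x += h
    return eta

f = lambda x, y: y                                      # exact solution e^x
e1 = abs(math.e - integrate(f, 0.0, 1.0, 1.0 / 100, 100))
e2 = abs(math.e - integrate(f, 0.0, 1.0, 1.0 / 200, 200))
ratio = e1 / e2                                         # about 4 = 2^p with p = 2
```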
The term ‘one-step method’ refers to the fact that (xi+1 , ηi+1 ) is determined only
by (xi , ηi ). The past values ηj , j < i, do not enter into the algorithm.
On the other hand, since besides ηi the values ηi−r, ηi−r+1, . . . , ηi−1 are available (r is a fixed natural number), one may ask whether one can use these data. Indeed, having more free parameters, one can try to increase the order of the method.
This leads to the r-step method, which is of the form
ηj+r := − Σ_{ν=0}^{r−1} αν ηj+ν + h φ(xj, ηj+r−1, . . . , ηj, h; f)   (5.6)
(more precisely, this is the explicit form of a multistep method) with the additional parameters α0, . . . , αr−1 ∈ R; the recursion (5.6) is applied for j = 0, 1, 2, . . . As we shall see,

Σ_{ν=0}^{r−1} αν = −1.   (5.7)
Remark 5.3. Because of (5.7), a multistep method (5.6) with r = 1 coincides with
the one-step method (5.5).
Remark 5.4. In the case of r ≥ 2, the multistep methods (5.6) can only be used
for the computation of ηi for i ≥ r. The computation of η1 , η2 , . . . , ηr−1 must be
defined in another way (e.g., by a one-step method).
Then the fixed-point equation x = Ψ (x) has exactly one solution, and for all starting
values x0 ∈ X the fixed-point iteration xn+1 = Ψ (xn ) converges to this solution.
Furthermore, the error of the n-th iterate can be estimated by

‖x − xn‖X ≤ (LΨ)^n / (1 − LΨ) · ‖x1 − x0‖X   for all n ∈ N0.   (5.10b)
Proof. (i) Let x′, x″ be two solutions of the fixed-point equation; i.e., x′ = Ψ(x′) and x″ = Ψ(x″). Exploiting the contraction property with L = LΨ, we get
From L < 1 we conclude that ‖x′ − x″‖X = 0; i.e., uniqueness x′ = x″ is proved.
(ii) The iterates xn of the fixed-point iteration satisfy the inequality
i.e., {xn} is a Cauchy sequence. Since X is a Banach space and therefore complete, a limit x∗ = lim xn exists. Because of the continuity of the function Ψ, the limit in xn+1 = Ψ(xn) yields x∗ = Ψ(x∗); i.e., x∗ is a fixed point.
Inequality (5.10b) follows from (5.10c) for n → ∞. □
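The theorem and the a-priori bound (5.10b) can be tried out on the classical contraction Ψ = cos on [0, 1] (my own illustration; there LΨ = sin 1 < 1):

```python
import math

L = math.sin(1.0)            # contraction constant of cos on [0, 1]
xs = [0.5]                   # starting value x_0
for _ in range(60):          # fixed-point iteration x_{n+1} = cos(x_n)
    xs.append(math.cos(xs[-1]))
x_star = xs[-1]              # approximately the unique fixed point

# a-priori estimate (5.10b): |x* - x_n| <= L^n / (1 - L) * |x_1 - x_0|
bound_ok = all(abs(x_star - xs[n]) <= L ** n / (1 - L) * abs(xs[1] - xs[0]) + 1e-12
               for n in range(20))
```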
In the following analysis, recursive inequalities of the following form will appear:

aν+1 ≤ (1 + hL) aν + h^k B   (ν ∈ N0; L, B ≥ 0).   (5.11)

The meaning of the parameters is: L Lipschitz constant, h step size, and k local consistency order.

Lemma 5.6. Any solution of the inequalities (5.11) satisfies the estimate

aν ≤ e^{νhL} a0 + h^{k−1} B · { νh if L = 0;  (e^{νhL} − 1)/L if L > 0 }   (ν ∈ N0).
Proof. We pose the following induction hypothesis:

aν ≤ Aν := Σ_{µ=0}^{ν−1} (1 + hL)^µ h^k B + (1 + hL)^ν a0 .   (5.12)

Exercise 3.24a shows that (1 + hL)^ν ≤ (e^{hL})^ν = e^{hLν}. For L > 0, the geometric sum yields the value

Σ_{µ=0}^{ν−1} h^k B (1 + hL)^µ = h^k B · ((1 + hL)^ν − 1) / ((1 + hL) − 1) = h^{k−1} (B/L) [(1 + hL)^ν − 1] ≤ h^{k−1} (B/L) (e^{hLν} − 1).

Therefore, Aν from (5.12) can be estimated by Aν ≤ h^{k−1} (B/L)(e^{hLν} − 1) + e^{hLν} a0. The particular case L = 0 can be treated separately or obtained from L > 0 by performing the limit L → 0. □
Exercise 5.7. Prove that any solution ϕ of the inequality ϕ(x) ≤ ϕ0 + L ∫_{x0}^{x} ϕ(t) dt is bounded by Φ; i.e., ϕ(x) ≤ Φ(x), where Φ is the solution of the integral equation

Φ(x) = ϕ0 + L ∫_{x0}^{x} Φ(t) dt.

Hint: (a) Define Ψ(Φ) by Ψ(Φ)(x) := ϕ0 + L ∫_{x0}^{x} Φ(t) dt. The integral equation is the fixed-point equation Φ = Ψ(Φ) for Φ ∈ C(I), I = [x0, xE]. Show that Ψ is a contraction with respect to the norm
Lemma 5.8. Let X and Y be Banach spaces, and let T ∈ L(X, Y) be invertible. Suppose that S ∈ L(X, Y) satisfies ‖S − T‖_{Y←X} · ‖T^{-1}‖_{X←Y} < 1. Then S is invertible with

‖S^{-1}‖_{X←Y} ≤ ‖T^{-1}‖_{X←Y} / (1 − ‖S − T‖_{Y←X} ‖T^{-1}‖_{X←Y}).
Before we start with the numerical solution, we should check whether the initial-
value problem, i.e., the mapping (y0 , f ) 7→ y, is well-conditioned. According to
§2.4.1.2, the amplification of a perturbation of the input data is to be investigated. In
the present case, one can perturb the initial value y0 as well as the function f . The
first case is analysed below, the second one in Theorem 5.12.
Assume that f ∈ C(I × R) satisfies (5.2). Then the following estimate4 holds in I:
|y1 (x) − y2 (x)| ≤ |y0,1 − y0,2 | eL(x−x0 ) with L from (5.2). (5.13)
4 By definition of I in §5.1, x ≥ x0 holds. Otherwise, e^{L(x−x0)} is to be replaced by e^{L|x−x0|}.
Proof. yi(x) = y0,i + ∫_{x0}^{x} f(t, yi(t)) dt holds for i = 1, 2, so that

|y1(x) − y2(x)| = | y0,1 − y0,2 + ∫_{x0}^{x} [f(t, y1(t)) − f(t, y2(t))] dt |
  ≤ |y0,1 − y0,2| + ∫_{x0}^{x} |f(t, y1(t)) − f(t, y2(t))| dt
  ≤ |y0,1 − y0,2| + L ∫_{x0}^{x} |y1(t) − y2(t)| dt   (using (5.2)).

Hence Exercise 5.7 proves |y1(x) − y2(x)| ≤ Φ(x); i.e., (5.13). This ends the proof of Theorem 5.9. □
Corollary 5.10. Assumption (5.2) ensures uniqueness of the solution of the initial-
value problem (5.1a,b).
hence y1 = y2 on I. □
One may denote the solution of the initial-value problem (5.1a,b) by y(x; y0 )
with the initial value y0 as second argument. Then (5.13) states that y(·; ·) as well as
f (·, ·) are Lipschitz continuous with respect to the second argument. This statement
can be generalised.
Theorem 5.12. Let y and ỹ be solutions of y′ = f(x, y) and ỹ′ = f̃(x, ỹ), respectively, with coinciding initial values y(x0) = ỹ(x0) = y0. Only f (not f̃) has to fulfil the Lipschitz condition (5.2), while

|f(x, y) − f̃(x, y)| ≤ ε   for all x ∈ I, y ∈ R.

Then

|y(x) − ỹ(x)| ≤ { (ε/L)(e^{L(x−x0)} − 1) if L > 0;  ε(x − x0) if L = 0 }   (L from (5.2)).
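For f(x, y) = ay and f̃ = f + ε the solutions are known in closed form, and the estimate of Theorem 5.12 is attained with equality. A worked check (parameter values are my own choices):

```python
import math

a, eps, y0 = 1.0, 1e-3, 1.0
y  = lambda x: y0 * math.exp(a * x)                        # solves y'  = a*y,        y(0) = y0
yt = lambda x: (y0 + eps / a) * math.exp(a * x) - eps / a  # solves yt' = a*yt + eps, yt(0) = y0
bound = lambda x: (eps / a) * (math.exp(a * x) - 1.0)      # Theorem 5.12 with L = a
checks = [(abs(y(x) - yt(x)), bound(x)) for x in (0.5, 1.0, 2.0)]
# here |f - f~| = eps everywhere, and |y - yt| equals the bound exactly
```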
Exercise 5.14. Let φ from (5.14) be Lipschitz continuous with respect to ηi+1. Show that the fixed-point equation (5.14) is uniquely solvable if h < 1/L.
Since φ is defined implicitly via f , the Lipschitz property of φ is inherited from the
Lipschitz continuity of f . Therefore, we have always to assume that f satisfies (5.2).
Exercise 5.15. (a) Prove for the Euler and Heun methods that (5.2) implies (5.16).
(b) According to §5.4.1, the implicit Euler method (5.15) leads to an explicit method
with φ̂(xi , ηi , h; f ). For sufficiently small h, prove Lipschitz continuity of φ̂.
5.4.3 Consistency
For the explicit definition of τ(ξ, η; h) fix ξ ∈ I and η ∈ R, and let Y(·; ξ, η) be the solution of (5.1a) with the initial-value condition

Y(ξ; ξ, η) = η   at x = ξ (not at x = x0).

Then

τ(ξ, η; h) := [Y(ξ + h; ξ, η) − η] / h − φ(ξ, η, h; f)   (5.17)

defines the local discretisation error at (ξ, η; h).
Obviously, we may expect the one-step method (5.5) to perform better the smaller τ is. Note that τ = 0 leads to the ideal result ηi = y(xi).
Here y is the solution of (5.1a,b). The argument f ∈ C(I × R) of φ must satisfy (5.2).
(b) Furthermore, φ is called consistent of order p if τ(x, y(x); h) = O(h^p) holds uniformly in x ∈ I for h → 0, for all sufficiently smooth f.
Assuming f to be sufficiently smooth,5 one performs the Taylor expansion of (1/h)[Y(x + h; x, η) − η] and uses
Hence (5.18) implies the condition φ(x, η, h; f ) → f (x, η). One easily checks that
this condition is satisfied for the methods of Euler and Heun.
However, the trivial one-step method ηi+1 := ηi (i.e., φ = 0) leads, in general,
to τ (x, η; h) = O(1) and is not consistent.
5.4.4 Convergence
We recall the notation ηi = η(xi , h). The desired property is η(x, h) ≈ y(x).
Concerning the limit h → 0, we restrict ourselves tacitly to (a subsequence of)
hn := (x − x0 ) /n, since then x = nhn belongs to the grid on which η(·, hn ) is
defined.
Definition 5.17 (convergence). A one-step method is called convergent if for all Lipschitz continuous f and all x ∈ I

η(x, h) → y(x)   as h → 0

holds. A one-step method has convergence order p if η(x, h) = y(x) + O(h^p) for sufficiently smooth f.
5.4.5 Stability
Consistency controls the error generated in the i-th step from xi to xi+1 under the
assumption that ηi is the exact starting value. At the start, η0 = y0 is indeed exact,
so that according to condition (5.18) the error ε1 := η1 − y1 is o(h) or O(hp+1 ),
respectively.
During the steps for i ≥ 1 the consistency error, e.g., arising at x1 , is transported
into η2 , η3 , . . . Since the computation proceeds up to x = xn , one has to perform
5 Without p-fold continuous differentiability of f one cannot verify τ(x, η; h) = O(h^p).
n = O(1/h) steps. If the error were amplified in each step by a factor c > 1 (c independent of h), ηn would have an error O(c^n) = O(c^{1/h}). Obviously, such an error would explode exponentially as h → 0. In addition, not only can the consistency error ε1 be amplified, but so can all consistency errors εi at the later grid points xi.
Next we state that—thanks to Lipschitz condition (5.16)—the errors are under
control.6
Lemma 5.18 (stability of one-step methods). Assume that the Lipschitz condition (5.16) holds with constant Lφ and the local discretisation error is bounded by |τ(xi, y(xi); h)| ≤ Th (cf. (5.17)). Then the global discretisation error is bounded by

|η(x, h) − y(x)| ≤ Th · (e^{(x−x0)Lφ} − 1) / Lφ .   (5.19)
Proof. δi := |ηi − y(xi )| is the global error. The local discretisation error is de-
noted by τi = τ (xi , y(xi ); h). Starting with δ0 = 0, we obtain the recursion formula
Theorem 5.19. Let the one-step method (5.5) fulfil the Lipschitz condition (5.16),
and assume consistency. Then (5.5) is also convergent:
lim η(x, h) = y(x).
h→0
If, in addition, the consistency order is p, then also the convergence is of order p.
We remark that, in general, the Lipschitz condition (5.16) holds only locally. Then one argues as follows. G := {(x, y) : x ∈ [x0, xE], |y − y(x)| ≤ 1} is compact. It suffices7 to require (5.16) on G. For sufficiently small h, the term Th (e^{(x−x0)Lφ} − 1)/Lφ in (5.19) is bounded by 1 and therefore (x, η(x, h)) ∈ G. A view of the proof of Lemma 5.18 shows that all intermediate arguments belong to G and therefore (5.16) is applicable.
According to Theorem 5.19, one may be optimistic that any consistent one-step method applied to an ordinary differential equation works well. However, the statement concerns only the asymptotic behaviour as h → 0. A problem arises if, e.g., the asymptotic behaviour is only observed for h ≤ 10⁻⁹, while we want to apply the method with a step size h ≥ 0.001. This gives rise to stronger stability requirements (cf. §5.5.8).
ψ(1) = 0. (5.22)
7 Locally Lipschitz continuous functions are uniformly Lipschitz continuous on a compact set.
τ(x, y; h) :=   (5.23)

(1/h) [ Σ_{ν=0}^{r} αν Y(xj+ν; xj, y) − h φ(xj, Y(xj+r−1; xj, y), . . . , Y(xj; xj, y), h; f) ] ,

where Y(xj; xj, y) = y.
for all f ∈ C(I × R) with Lipschitz property (5.2). Here y(x) is the solution of
(5.1a,b). Furthermore, the multi-step method (5.20a) is called consistent of order p
if |τ (x, y(x); h)| = O(hp ) holds for sufficiently smooth f .
For f = 0 and the initial value y0 = 1, the solution is y(x) = 1 and, in this case, τ(x, y(x); h) → 0 simplifies to (Σ_{ν=0}^{r} αν − hφ)/h → 0, implying Σ_{ν=0}^{r} αν = 0, which is condition (5.7).
5.5.2 Convergence
Differently from the case of a one-step method, we cannot assume exact starting
values η1 , . . . , ηr−1 . Therefore, we assume that all starting values are perturbed:
In this case, ε = (εj)j≥0 is a tuple with as many entries as grid points (note that the quantities εj for j < r and j ≥ r have quite different meanings!). Again, η(x; ε, h) → y(x) can be required for h → 0 and ‖ε‖∞ → 0.
5.5.3 Stability
Definition 5.23. The multistep method (5.20a) is called stable if all roots ζ of the
characteristic polynomial ψ have the following property: either |ζ| < 1 or ζ is a
simple zero with |ζ| = 1.
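Definition 5.23 is easy to check numerically once the roots of ψ are known. A sketch for two-step methods (my own helper names; quadratic ψ, so the roots are available in closed form; the second example anticipates §5.5.6):

```python
import cmath

def quadratic_roots(a2, a1, a0):
    # roots of a2*z^2 + a1*z + a0 = 0
    d = cmath.sqrt(a1 * a1 - 4.0 * a2 * a0)
    return [(-a1 + d) / (2.0 * a2), (-a1 - d) / (2.0 * a2)]

def root_condition(roots, tol=1e-12):
    # Definition 5.23: |zeta| < 1, or |zeta| = 1 and zeta is a simple zero
    for i, z in enumerate(roots):
        if abs(z) > 1.0 + tol:
            return False
        if abs(abs(z) - 1.0) <= tol:   # on the unit circle: must be simple
            if any(abs(z - w) <= tol for j, w in enumerate(roots) if j != i):
                return False
    return True

# Adams-Bashforth, r = 2:  psi(z) = z^2 - z,      roots {1, 0}  -> stable
# order-maximised 2-step:  psi(z) = z^2 + 4z - 5, roots {1, -5} -> unstable
ab_stable  = root_condition(quadratic_roots(1.0, -1.0, 0.0))
opt_stable = root_condition(quadratic_roots(1.0, 4.0, -5.0))
```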
Remark 5.24. One-step methods are always stable in the sense of Definition 5.23.
The relation between the stability condition from Definition 5.23 and the multistep
methods (5.20a) is not quite obvious. The connection will be given in the study of
difference equations. As preparation we first discuss power bounded matrices.
Definition 5.25. Let ‖·‖ be a matrix norm. A square matrix A is power bounded8 if sup{‖A^n‖ : n ∈ N} < ∞.

8 ‖·‖ is an associated matrix norm in C^{d×d} if there is a vector norm |||·||| in C^d such that ‖A‖ = sup{|||Ax||| / |||x||| : x ≠ 0} for all A ∈ C^{d×d}.

The norm ‖·‖∞ has two meanings. For vectors x ∈ C^d, it is the maximum norm ‖x‖∞ = max_i |xi|, while for matrices it is the associated matrix norm. Because of the property ‖M‖∞ = max_i Σ_j |Mij|, it is also called the row-sum (matrix) norm.
We denote the spectrum of a square matrix M by
σ(M ) := {λ ∈ C : λ eigenvalue of M }.
Exercise 5.26. Suppose that ‖·‖ is an associated matrix norm. Prove |λ| ≤ ‖M‖ for all λ ∈ σ(M).
Theorem 5.27. Equivalent characterisations of the power boundedness of A are (5.26a) as well as (5.26b):

All eigenvalues of A satisfy either (a) |λ| < 1 or (b) |λ| = 1, and λ has coinciding algebraic and geometric multiplicities.   (5.26a)

There is an associated matrix norm such that ‖A‖ ≤ 1.   (5.26b)
where the entries ∗ are either zero or one. Since the algebraic and geometric multiplicities of λd−m+1, . . . , λd coincide, D is a diagonal m × m matrix, while all eigenvalues λi (i = 1, . . . , d − m) have absolute value < 1. Set ∆ε := diag{1, ε, ε², . . . , ε^{d−1}} with ε ∈ (0, 1 − |λd−m|]. One verifies that ∆ε^{-1} J ∆ε has the row-sum norm ‖∆ε^{-1} J ∆ε‖∞ ≤ 1. Therefore, a transformation by S := T ∆ε yields the norm ‖S^{-1}AS‖∞ ≤ 1. ‖A‖ := ‖S^{-1}AS‖∞ is the associated matrix norm corresponding to the vector norm ‖x‖ := ‖Sx‖∞. This proves (5.26b).
(iii) Assume (5.26b). Associated matrix norms are submultiplicative; i.e., ‖A^n‖ ≤ ‖A‖^n, so that ‖A‖ ≤ 1 implies Cstab = 1 < ∞. □
Lemma 5.28. Let A ⊂ C^{n×n} be bounded, and suppose that there is some γ < 1 such that the eigenvalues satisfy |λ1(A)| ≤ |λ2(A)| ≤ . . . ≤ |λn(A)| ≤ 1 and |λn−1(A)| ≤ γ < 1 for all A ∈ A. Then sup{‖A^n‖ : n ∈ N, A ∈ A} < ∞.
Proof. We may choose the spectral norm ‖·‖ = ‖·‖₂. The Schur normal form RA is defined by A = Q RA Q^{-1}, Q unitary, RA upper triangular, with (RA)ii = λi(A). By boundedness of A, there is some M > 0 such that RA is bounded entry-wise by |(RA)ij| ≤ Rij, where the matrix R is defined by Rij := 0 for i > j, Rii := γ for 1 ≤ i ≤ n − 1, Rnn := 1, Rij := M for i < j; i.e.,

R =
  γ M . . . M M
  0 γ . . . M M
  . . .
  0 0 . . . γ M
  0 0 . . . 0 1 .

It is easy to verify that ‖A^n‖₂ = ‖RA^n‖₂ ≤ ‖R^n‖₂. Since R is power bounded, we have proved a uniform bound. □
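Power boundedness itself is cheap to observe experimentally. A sketch contrasting a matrix satisfying condition (5.26a) with one violating it (2 × 2 examples of my own):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def row_sum_norm(A):
    # associated matrix norm for the maximum vector norm
    return max(sum(abs(v) for v in row) for row in A)

def power_norms(A, nmax):
    P, norms = A, []
    for _ in range(nmax):
        norms.append(row_sum_norm(P))   # ||A^1||, ..., ||A^nmax||
        P = matmul(P, A)
    return norms

A = [[1.0, 0.0], [0.0, 0.5]]  # eigenvalue 1 non-defective: (5.26a) holds, power bounded
B = [[1.0, 1.0], [0.0, 1.0]]  # eigenvalue 1 with a 2x2 Jordan block: (5.26a) violated
na = power_norms(A, 50)       # stays at 1
nb = power_norms(B, 50)       # ||B^n|| = n + 1 grows without bound
```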
The set F = C^{N0} consists of sequences x = (xj)j∈N0 of complex numbers. We are looking for sequences x ∈ F satisfying the following difference equation:

Σ_{ν=0}^{r} αν xj+ν = 0   for all j ≥ 0, where αr = 1.   (5.27)
Proof. (i) The vector space properties of F and F0 are trivial. It remains to prove dim F0 = r. We define x^{(i)} ∈ F for i = 0, 1, . . . , r − 1 by the initial values xj^{(i)} = δij for j ∈ {0, . . . , r − 1}. For j ≥ r we use (5.27) to define

xj^{(i)} := − Σ_{ν=0}^{r−1} αν x^{(i)}_{j−r+ν}   for all j ≥ r;   (5.28)

i.e., x^{(i)} ∈ F0.
(ii) By Part (i), x^{(i)} ∈ F0 holds. One easily verifies that the solutions x^{(i)} are linearly independent. Because of dim F0 = r, {x^{(i)} : 0 ≤ i ≤ r − 1} forms a basis. □
In the case of Remark 5.30, the zeros are simple. It remains to discuss the case
of multiple zeros.
We recall that the polynomial ψ has an (at least) k-fold zero ζ0 if and only if ψ(ζ0) = ψ′(ζ0) = . . . = ψ^{(k−1)}(ζ0) = 0. The Leibniz rule yields

(d/dζ)^ℓ [ζ^j ψ(ζ)] = 0   at ζ = ζ0 for 0 ≤ ℓ ≤ k − 1.

The explicit representation of (d/dζ)^ℓ [ζ^j ψ(ζ)] reads

0 = [ζ^j ψ(ζ)]^{(ℓ)} |_{ζ=ζ0} = Σ_{ν=0}^{r} αν ζ0^{j+ν−ℓ} (j + ν)(j + ν − 1) · . . . · (j + ν − ℓ + 1).   (5.29)
Define x^{(ℓ)} for ℓ ∈ {0, 1, . . . , k − 1} via xj^{(ℓ)} = ζ0^j · j (j − 1) · . . . · (j − ℓ + 1). Insertion into the difference equation (5.27) yields

Σ_{ν=0}^{r} αν x^{(ℓ)}_{j+ν} = Σ_{ν=0}^{r} αν ζ0^{j+ν} (j + ν)(j + ν − 1) · . . . · (j + ν − ℓ + 1).

This is ζ0^ℓ times the expression in (5.29); hence Σ_{ν=0}^{r} αν x^{(ℓ)}_{j+ν} = 0; i.e., x^{(ℓ)} ∈ F0.
The case ζ0 = 0 is excluded in Remark 5.31, since the previous definition leads to xj^{(ℓ)} = 0 for j ≥ min{1 − ℓ, 0} and therefore does not yield linearly independent solutions.

Remark 5.32. Let ζ0 = 0 be a k-fold zero of ψ. Then the sequences x^{(i)} = (δij)j∈N0 (i = 0, . . . , k − 1) are k linearly independent solutions of (5.27).
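These basis solutions can be verified directly against (5.27). A sketch for a double zero (my own example ψ(ζ) = (ζ − 1/2)²):

```python
# psi(zeta) = (zeta - 1/2)^2 = zeta^2 - zeta + 1/4:
# coefficients alpha = (1/4, -1, 1), double zero zeta0 = 1/2
alpha = (0.25, -1.0, 1.0)
zeta0 = 0.5

x_l0 = [zeta0 ** j for j in range(30)]       # x_j = zeta0^j      (l = 0)
x_l1 = [j * zeta0 ** j for j in range(30)]   # x_j = j * zeta0^j  (l = 1)

def residuals(x):
    # sum_{nu=0}^{r} alpha_nu * x_{j+nu}, cf. (5.27); vanishes for solutions
    return [sum(a * x[j + nu] for nu, a in enumerate(alpha))
            for j in range(len(x) - 2)]

solves = all(abs(rv) < 1e-14 for x in (x_l0, x_l1) for rv in residuals(x))
```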
5.5.4.4 Stability
Definition 5.34. The difference equation (5.27) is called stable if any solution of
(5.27) is bounded with respect to the supremum norm:
Proof. (i) Eq. (5.30) implies ‖x‖∞ < ∞, since max_{j=0,...,r−1} |xj| is always finite.
(ii) We choose the basis x^{(i)} ∈ F0 as in (ii) of the proof of Lemma 5.29. x has a representation x = Σ_{i=0}^{r−1} xi x^{(i)}. Assuming stability in the sense of Definition 5.34, we have Ci := ‖x^{(i)}‖∞ < ∞ and therefore also C := Σ_{i=0}^{r−1} Ci < ∞. The estimate

‖x‖∞ = ‖Σ_{i=0}^{r−1} xi x^{(i)}‖∞ ≤ Σ_{i=0}^{r−1} |xi| ‖x^{(i)}‖∞ = Σ_{i=0}^{r−1} Ci |xi| ≤ C max_{j=0,...,r−1} |xj|

proves (5.30). □
Obviously, all solutions of (5.27) are bounded if and only if all basis solutions
given in Theorem 5.33 are bounded. The following complete list of disjoint cases
refers to the zeros ζi of ψ and their multiplicities ki .
1. |ζi| < 1: all sequences (ζi^j j^ℓ)j∈N0 with 0 ≤ ℓ < ki are null sequences and therefore bounded.
2. |ζi| > 1: all sequences satisfy lim |ζi^j j^ℓ| = ∞; i.e., they are unbounded.
3. |ζi| = 1 and ki = 1 (simple zero): (ζi^j)j∈N0 is bounded by 1 in absolute value.
4. |ζi| = 1 and ki > 1 (multiple zero): lim |ζi^j j^ℓ| = ∞ holds for 1 ≤ ℓ ≤ ki − 1; i.e., the sequences are unbounded.
Therefore, the first and third cases characterise the stable situations, while the
second and fourth cases lead to instability. This proves the next theorem.
Theorem 5.36. The difference equation (5.27) is stable if and only if ψ satisfies the
stability condition from Definition 5.23.
Uniform boundedness of |xj| and ‖Xj‖ are equivalent. If (5.27) is stable, ‖A^n X0‖ is uniformly bounded for all X0 with ‖X0‖ ≤ 1 and all n ∈ N; i.e., A is a power bounded matrix. On the other hand, Cstab := sup{‖A^n‖ : n ∈ N} < ∞ yields the estimate ‖Xn‖ ≤ Cstab ‖X0‖ and therefore stability. □
Lemma 5.40. The difference equation (5.27) is stable if and only if there is an associated matrix norm, so that ‖A‖ ≤ 1 holds for the companion matrix A from (5.31).
Theorem 5.41. Suppose that the difference equation is stable and that the initial values fulfil |xj| ≤ α for 0 ≤ j ≤ r − 1. If the sequence (xj)j∈N0 satisfies the inhomogeneous difference equation Σ_{ν=0}^{r} αν xj+ν = βj+r with

|βj+r| ≤ β + γ max{|xµ| : 0 ≤ µ ≤ j + r − 1}
Proof. (i) Let A be the companion matrix. ‖·‖ denotes the vector norm in R^r as well as the associated matrix norm ‖·‖ from Lemma 5.40. Because of the norm equivalence,

≤ ‖Xj‖ + β + γ ξj+r−1 .

Define ηj by

ηj+1 ≤ (1 + γk) ηj + β.

Apply Lemma 5.6 to this inequality. The corresponding quantities in (5.11) are ν ≡ j, aν ≡ ηj, h ≡ 1, L ≡ kγ, B ≡ β. Lemma 5.6 yields the inequality

ηj ≤ η0 e^{jkγ} + β · { j if γ = 0;  (e^{jkγ} − 1)/(kγ) if γ > 0 }   (j ∈ N0).
We shall show that convergence and stability of multistep methods are almost
equivalent. For exact statements one needs a further assumption concerning the
connection of φ(xj , ηj+r−1 , . . . , ηj , h; f ) and f . A very weak assumption is
This assumption is satisfied, in particular, for the important class of linear r-step
methods:
φ(xj, ηj+r, . . . , ηj, h; f) = Σ_{µ=0}^{r} bµ fj+µ   with fk = f(xk, ηk).   (5.35)
Theorem 5.42 (stability theorem). Suppose (5.34). Then the convergence from
Definition 5.22 implies stability.
(ii) For the indirect proof, assume instability. Then an unbounded solution x ∈ F0 exists. The divergence

follows from J(h) → ∞ for h → 0 and ‖x‖∞ = ∞. Choose the initial perturbation ε = (εj)j=0,...,r−1 := (xj/C(h))j=0,...,r−1. Obviously, ‖ε‖∞ → 0 holds for h → 0. For this initial perturbation the multistep method produces the solution (ηj)0≤j≤J(h) with ηj = xj/C(h). Since

sup_{x∈I} |η(x; ε, h) − y(x)| = (1/C(h)) max{|xj| : 0 ≤ j ≤ J(h)} = 1,
Remark 5.43. Condition (5.36) is satisfied for linear r-step methods (5.35).
Theorem 5.44 (convergence theorem). Let (5.36) be valid. Furthermore, the multistep method is supposed to be consistent and stable. Then it is convergent (even in the stronger sense as discussed below Definition 5.22).
Proof. The initial error is defined by ηj = y(xj) + εj for j = 0, . . . , r − 1. The multistep formula with additional errors hεj+r is described in (5.24). The norm of the error is ‖ε‖∞ := max{|εj| : 0 ≤ j ≤ J(h)} with J(h) as in the previous proof (the usual convergence leads to εj = 0 for r ≤ j ≤ J(h); only in the case described in brackets can εj ≠ 0 appear for r ≤ j ≤ J(h)). We have to show that η(x; ε, h) → y(x) for h → 0, ‖ε‖∞ → 0.
The error is denoted by ej := ηj − y(xj ). The initial values for 0 ≤ j ≤ r − 1
are ej = εj . The equation
Σ_{ν=0}^{r} αν y(xj+ν) − h φ(xj, y(xj+r−1), . . . , y(xj), h; f) = h τ(xj, y(xj); h) =: h τj+r
containing the local discretisation error τj+r is equivalent to (5.23). We form the
difference between the latter equation and (5.24) for j ≥ r and obtain
Σ_{ν=0}^{r} αν ej+ν = βj+r := h (εj+r − τj+r)
  + h [φ(xj, ηj+r−1, . . . , ηj, h; f) − φ(xj, y(xj+r−1), . . . , y(xj), h; f)].
|ej| ≤ k k′ ‖ε‖∞ e^{jhLφk} + ((‖ε‖∞ + ‖τ‖∞)/Lφ) (e^{jhLφk} − 1)   (5.37)

in the case of hLφ > 0 (the case hLφ = 0 is analogous). The product jh in the exponent is to be interpreted as xj − x0 and is therefore bounded by xE − x0 (or it is constant and equal to x in the limit process j = n → ∞, h := (x − x0)/n). As part of the definition of convergence, ‖τ‖∞ → 0 (consistency) and ‖ε‖∞ → 0 hold for h → 0. According to (5.37), ej converges uniformly to zero; i.e., sup_{x∈I} |η(x; ε, h) − y(x)| → 0. □
Corollary 5.45. In addition to the assumptions in Theorem 5.44 assume consistency
of order p. Then also the convergence order is p, provided that the initial errors are
sufficiently small:
|η(x; ε, h) − y(x)| ≤ C ( h^p + max_{0≤j≤r−1} |εj| ).
5.5.6.1 Examples
The Adams–Bashforth methods are explicit linear r-step methods of the form
ηj+r = ηj+r−1 + h Σ_{µ=0}^{r−1} bµ fj+µ   (5.38)
Exercise 5.46. (a) Euler’s method is the optimal Adams–Bashforth method for r = 1.
(b) What are the optimal coefficients b0 , b1 in (5.38) for r = 2 ?
contains four free parameters. Because of the side condition (5.7) (i.e., α0 + α1 = −1), there remain three degrees of freedom. Thus, the optimal choice α0 = −5, α1 = 4, b0 = 2, b1 = 4 of the coefficients can reach consistency of order p = 3. The resulting method is
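With the coefficients just derived, the recursion reads ηj+2 = −4ηj+1 + 5ηj + h(4fj+1 + 2fj) (its explicit display was lost in the extraction; this form is implied by the coefficients above). A sketch (my own) showing how the parasitic root −5 of ψ(ζ) = ζ² + 4ζ − 5 destroys convergence for y′ = −y, despite consistency order 3:

```python
import math

def two_step(h, n):
    # eta_{j+2} = -4*eta_{j+1} + 5*eta_j + h*(4*f_{j+1} + 2*f_j) applied to
    # y' = -y, y(0) = 1 (so f depends on y only)
    f = lambda y: -y
    eta = [1.0, math.exp(-h)]              # exact starting values
    for j in range(n - 1):
        eta.append(-4.0 * eta[-1] + 5.0 * eta[-2]
                   + h * (4.0 * f(eta[-1]) + 2.0 * f(eta[-2])))
    return eta

eta = two_step(0.01, 100)                  # integrate up to x = 1
early_err = abs(eta[2] - math.exp(-0.02))  # order-3 accuracy in the first step
final_err = abs(eta[-1] - math.exp(-1.0))  # explodes like 5^j: instability
```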
The previous example shows that one cannot use all coefficients αν, bµ from (5.6) and (5.35) to maximise the consistency order. Instead, stability is a side condition when we optimise αν, bµ. The characterisation of optimal stable multistep methods is due to Dahlquist [2].
Theorem 5.47. (a) If r ≥ 1 is odd, the highest consistency order of a stable linear
r-step method is p = r + 1.
(b) If r ≥ 2 is even, the highest consistency order is p = r + 2. In this case, all roots
of the characteristic polynomial ψ have absolute value 1.
9 Note the following advantage of multistep methods compared with one-step methods: in spite of the increased consistency order r, only one function value fj+r−1 = f(xj+r−1, ηj+r−1) needs to be evaluated per grid point xj; the others are known from the previous steps.
5.5.6.3 Proof
Since ζ0 = ∞ is excluded, z0 = 1 is not a root and the bracket [. . .] does not vanish at z = z0. This proves Part (b). □
Because of stability and ψ(1) = 0 (cf. (5.22)), ζ = 1 is a simple root. By
Remark 5.48c, p(z) has a simple root at z = 0. Hence, p is of the form
α1 > 0. (5.41b)
(otherwise scale the equation of the multistep method by −1 changing ψ(ζ) into
−ψ(ζ)). We shall prove that
ϕ(ζ) := ψ(ζ)/log ζ − σ(ζ).   (5.42)
Theorem 5.49. Assume ψ(1) = 0 (cf. (5.22)). Then the linear multistep method (5.20b) has the (local) consistency order p if and only if ζ = 1 is a p-fold root of ϕ.

Set δ := e^h − 1 = h·e^{θh} (0 < θ < h, from the mean value theorem). Hence, δ can be estimated from both sides by const · h. The Taylor expansion of ϕ(e^h) = ϕ(1 + δ) around δ = 0 exists, since ϕ is holomorphic at ζ = 1:
z p(z) / log((1 + z)/(1 − z)) = β0 + β1 z + β2 z² + . . .   (5.44a)

s(z) = β0 + β1 z + . . . + βr z^r,   (5.44b)

βµ = 0 for r + 1 ≤ µ ≤ p − 1   (5.44c)

must hold (the βµ for 0 ≤ µ ≤ r are matched by the choice of s in (5.44b)). The function z / log((1 + z)/(1 − z)) is an even function of z, so that the power series becomes

z / log((1 + z)/(1 − z)) = c0 + c2 z² + c4 z⁴ + . . .
We shall prove that c2ν < 0 for all ν ≥ 1. Hence, for odd r it follows that
For the proof of c2ν < 0 we need the following lemma of Kaluza [12].
Lemma 5.50. Let f(t) = Σ_{ν=0}^{∞} Aν t^ν and g(t) = Σ_{ν=0}^{∞} Bν t^ν be power series with the properties

f(t)g(t) ≡ 1,  Aν > 0 (ν ≥ 0),  Aν+1 Aν−1 > Aν² (ν ≥ 1).

Then Bν < 0 holds for all ν ≥ 1.
Proof. (i) The assumption Aν+1 Aν−1 > Aν² can be written as Aν+1/Aν > Aν/Aν−1 for ν ≥ 1. From this we conclude that An+1/An > An−ν+1/An−ν for 1 ≤ ν ≤ n. The latter inequality can be rewritten as

An+1 An−ν − An An−ν+1 > 0.   (5.45)
(ii) Without loss of generality assume A0 = 1. This implies B0 = 1. Comparison of the coefficients in f(t)g(t) ≡ 1 proves that

0 = An + Σ_{ν=1}^{n} Bν An−ν   (n ≥ 1),   −Bn+1 = An+1 + Σ_{ν=1}^{n} Bν An−ν+1   (n ≥ 0).
We apply this lemma to f(z²) = (1/z) log((1 + z)/(1 − z)) = 2 + (2/3) z² + (2/5) z⁴ + . . . The coefficients Aν = 2/(2ν + 1) > 0 satisfy

Aν+1 Aν−1 = (2/(2ν + 3)) · (2/(2ν − 1)) = 4/((2ν + 1)² − 4) > 4/(2ν + 1)² = Aν²

and therefore the supposition of the lemma. Since Bν = c2ν, the assertion of Step 2 is proved.
Step 3: For even r, stable methods are characterised as in Theorem 5.47b.
For even r, the sum corresponding to (5.44d) becomes
decays very strongly. As in the one-dimensional example from above, one would like to use a step size h < 1. However, the second component enforces the inequality h < 1/1000. This effect appears for general linear systems if A possesses one eigenvalue of moderate size and another one with strongly negative real part, or for nonlinear systems y′ = f(x, y), where A(x) := ∂f/∂y has similar properties. This leads to the definition of A-stability (or absolute stability; cf. [20], [9], [3]). Good candidates for A-stable methods are implicit ones (cf. §5.4.1). Implicit methods have to solve a linear system of the form Az = b. Here, another problem arises. For instance, the spatial discretisation of a parabolic initial-value problem (see Chap. 6) yields a large stiff system of ordinary differential equations. The solution of the large linear system requires further numerical techniques (cf. [6]).
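The effect can be seen by comparing the explicit and the implicit Euler method for y′ = λy with λ = −1000 (a sketch of my own; the step size is chosen slightly beyond the explicit stability limit 2/|λ|):

```python
lam = -1000.0
h, n = 0.0021, 500          # h > 2/|lam| = 0.002: explicit Euler is unstable
y_exp, y_imp = 1.0, 1.0
for _ in range(n):
    y_exp = (1.0 + h * lam) * y_exp   # explicit Euler: amplification 1 + h*lam = -1.1
    y_imp = y_imp / (1.0 - h * lam)   # implicit Euler: amplification 1/3.1, decays for every h > 0
# the exact solution exp(lam * x) is practically zero at x = n*h;
# only the implicit (A-stable) method reflects this
```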
Next, we consider the formulation By′ = Cy of a linear system or, in the general nonlinear case, F(x, y, y′) = 0. If B is regular (or F solvable with respect to y′), we regain the previous system with A := B^{-1}C. If, however, B ∈ R^{n×n} is singular with rank k, the system By′ = Cy consists of a mixture of n − k differential equations and k algebraic side conditions. In this case, the system is called a differential-algebraic equation (DAE; cf. [14]). In between there are singularly perturbed systems, where B = B0 + εB1 is regular for ε > 0, but ε is small and B0 is singular.
92 5 Ordinary Differential Equations
The analysis of numerical schemes for ordinary differential equations has created
many further variants of stability definitions. Besides this, there are stability con-
ditions (e.g., Lyapunov stability) which are not connected with discretisations, but
with the (undiscretised) differential equation and its dynamical behaviour (cf. [8,
§I.13], [11, §X], [7], [21]).
References
1. Butcher, J.C.: A history of Runge-Kutta methods. Appl. Numer. Math. 20, 247–260 (1996)
2. Dahlquist, G.: Convergence and stability in the numerical integration of ordinary differential
equations. Math. Scand. 4, 33–53 (1956)
3. Deuflhard, P., Bornemann, F.: Scientific computing with ordinary differential equations.
Springer, New York (2002)
4. Deuflhard, P., Bornemann, F.: Numerische Mathematik II. Gewöhnliche Differentialgleichun-
gen, 3rd ed. Walter de Gruyter, Berlin (2008)
5. Gautschi, W.: Numerical Analysis. An Introduction. Birkhäuser, Boston (1997)
6. Hackbusch, W.: Iterative solution of large sparse systems of equations. Springer, New York
(1994)
7. Hahn, W.: Stability of Motion. Springer, Berlin (1967)
8. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I, 2nd ed.
Springer, Berlin (1993)
9. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II. Springer, Berlin (1991)
10. Heun, K.: Neue Methoden zur approximativen Integration der Differentialgleichungen einer
unabhängigen Veränderlichen. Z. für Math. und Phys. 45, 23–38 (1900)
11. Heuser, H.: Gewöhnliche Differentialgleichungen. Teubner, Stuttgart (1991)
12. Kaluza, T.: Über Koeffizienten reziproker Potenzreihen. Math. Z. 28, 161–170 (1928)
13. Kreiss, H.O.: Über die Stabilitätsdefinition für Differenzengleichungen die partielle Differen-
tialgleichungen approximieren. BIT 2, 153–181 (1962)
14. Kunkel, P., Mehrmann, V.: Differential-Algebraic Equations: Analysis and Numerical Solution.
EMS, Zürich (2006)
15. Kutta, W.M.: Beitrag zur näherungsweisen Integration totaler Differentialgleichungen. Z. für
Math. und Phys. 46, 435–453 (1901)
16. Lipschitz, R.O.S.: Lehrbuch der Analysis, Vol. 2. Cohen, Bonn (1880)
17. Morton, K.W.: On a matrix theorem due to H. O. Kreiss. Comm. Pure Appl. Math. 17, 375–
379 (1964)
18. Richtmyer, R.D., Morton, K.W.: Difference Methods for Initial-value Problems, 2nd ed. John
Wiley & Sons, New York (1967). Reprint by Krieger Publ. Comp., Malabar, Florida, 1994
19. Runge, C.D.T.: Ueber die numerische Auflösung von Differentialgleichungen. Math. Ann. 46,
167–178 (1895)
20. Stetter, H.J.: Analysis of Discretization Methods for Ordinary Differential Equations. North-
Holland, Amsterdam (1973)
21. Terrell, W.J.: Stability and Stabilization: An Introduction. Princeton University Press, Prince-
ton and Oxford (2009)
22. Toh, K.C., Trefethen, L.N.: The Kreiss matrix theorem on a general complex domain. SIAM
J. Matrix Anal. Appl. 21, 145–165 (1999)
23. Werner, H., Arndt, H.: Gewöhnliche Differentialgleichungen. Eine Einführung in Theorie und
Praxis. Springer, Berlin (1986)
Chapter 6
Instationary Partial Differential Equations
The analysis presented in this chapter evolved soon after 1950, when discretisations
of hyperbolic and parabolic differential equations had to be developed. Most of the
material can be found in Richtmyer–Morton [21], see also Lax–Richtmyer [16].
All results concern linear differential equations. In the case of hyperbolic equa-
tions there is a crucial difference between linear and nonlinear problems, since in
the nonlinear case many unpleasant features may occur that are unknown in the
linear case. Concerning general hyperbolic conservation laws, we refer, e.g., to
Kröner [14] and LeVeque [19]. However, even in the nonlinear case, the linearised
problems should satisfy the stability conditions described here.
Notation 6.1. The desired solution is denoted by u (instead of y). The independent
variables are t and x. The classical notation for u depending on t and x is u(t, x).
Let B be a space of functions in the variable x. Then u(t) denotes the function
u(t, ·) ∈ B (partially evaluated at t). Therefore, u(t, x) and u(t)(x) are equivalent
notations. The differential equation takes the form

    ∂/∂t u(t) = A u(t)  for all t ∈ I.    (6.1a)
The initial-value condition is given by
u(0) = u0 for some u0 ∈ DA ⊂ B. (6.1b)
Concerning the differential operator A, we discuss two model cases:
    A := a ∂/∂x  (a ≠ 0)    and    A := a ∂²/∂x²  (a > 0).    (6.2)
In what follows, the domain of u(·, ·) is the set Σ := I × R.
Here the time t varies in I = [0, T ], while the spatial variable x varies in R.
I corresponds to the interval I = [x0 , xE ] from §5.1. The spatial domain R is
chosen as the unbounded domain to avoid boundary conditions.2
We restrict our considerations to two Banach spaces, generally denoted by B:
• B = C(R), space of the complex-valued, uniformly3 continuous functions with
finite supremum norm kvkB = kvk∞ = sup{|v(x)| : x ∈ R}.
• B = L2 (R), space of the complex-valued, measurable, and square-integrable
functions. This means that the L² norm

    ‖v‖_B = ‖v‖₂ = ( ∫_R |v(x)|² dx )^{1/2}

is finite. This Banach space is also a Hilbert space with the scalar product

    (u, v) := ∫_R u(x) \overline{v(x)} dx.
For both cases of (6.2) we shall show that the initial-value problem is solvable.
¹ The domain of a differential operator A is D_A = {v ∈ B : Av ∈ B is defined}. Often, it
suffices to choose a smaller, dense set B₀ ⊂ D_A and to extend the results continuously onto D_A.
² A similar situation arises if the solutions are assumed to be 2π-periodic in x. This corresponds
to the bounded domain Σ = I × [0, 2π] with periodic boundary condition u(t, 0) = u(t, 2π). In
the 2π-periodic case, the spaces are
    C_per(R) := {v ∈ C(R) : v(x) = v(x + 2π) for all x ∈ R},
    L²_per(R) := {v ∈ L²(R) : v(x) = v(x + 2π) for almost all x ∈ R}.
³ Note that limits of uniformly continuous functions are again uniformly continuous.
6.1 Introduction and Examples 95
First we choose B = C(R). The domain of A = a ∂/∂x is the subspace B₀ = C¹(R),
which is dense in B.⁴ The partial differential equation

    ∂u/∂t = a ∂u/∂x

is of hyperbolic type.⁵ The solution of problem (6.1a,b) can be described directly.
Lemma 6.2. For any u0 ∈ B0 = C 1 (R), the unique solution of the initial-value
problem (6.1a,b) is given by u(t, x) := u0 (x + at).
Proof. (i) By ∂/∂t u = ∂/∂t u₀(x + at) = a u₀′(x + at) and a ∂/∂x u₀(x + at) = a u₀′(x + at),
the differential equation (6.1a) is satisfied. The initial value is u(0, x) = u₀(x).
(ii) Concerning uniqueness, we transform to the (characteristic) direction:
Proof. Since u(t) is a shifted version of u₀ and a shift does not change the norm
‖·‖_B, the assertions follow. ⊓⊔
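The solution formula of Lemma 6.2 can be checked numerically; in the following sketch (the parameters a = 2, u₀ = sin, and the sample point are our own choices), central difference quotients confirm that u(t, x) = u₀(x + at) satisfies u_t = a u_x.

```python
# Check (our example) that u(t,x) = u0(x + a*t) solves u_t = a*u_x,
# using central difference quotients as a numerical derivative.

import math

a = 2.0
u0 = math.sin

def u(t, x):
    return u0(x + a * t)

h = 1e-5
t, x = 0.3, 1.1
u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)   # approximates du/dt = a*u0'(x+at)
u_x = (u(t, x + h) - u(t, x - h)) / (2 * h)   # approximates du/dx = u0'(x+at)
assert abs(u_t - a * u_x) < 1e-8
```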
⁴ The case of a = 0 is exceptional, since then u_t = 0 can be considered as a family of ordinary
differential equations for each x ∈ R. Hence, the theory of §5 applies again.
⁵ Concerning the definition of types of partial differential equations, see Hackbusch [8, §1].
If we can prove that the last term tends to zero as t ↓ 0, the desired statement
lim_{t↓0} u(t, x) = u₀(x) follows.
Let x and ε > 0 be fixed. Because of continuity of u0 , there is a δ > 0 such
that |u0 (ξ) − u0 (x)| ≤ ε/2 for all |ξ − x| ≤ δ. We split the integral into the sum of
three terms:
    I₁(t, x) := (1/√(4πt)) ∫_{x−δ}^{x+δ} [u₀(ξ) − u₀(x)] exp(−(x − ξ)²/(4t)) dξ,

    I₂(t, x) := (1/√(4πt)) ∫_{−∞}^{x−δ} [u₀(ξ) − u₀(x)] exp(−(x − ξ)²/(4t)) dξ,

    I₃(t, x) := (1/√(4πt)) ∫_{x+δ}^{∞} [u₀(ξ) − u₀(x)] exp(−(x − ξ)²/(4t)) dξ.

The first term is estimated by

    |I₁(t, x)| ≤ (1/√(4πt)) ∫_{x−δ}^{x+δ} |u₀(ξ) − u₀(x)| exp(−(x − ξ)²/(4t)) dξ
              ≤ (ε/2) (1/√(4πt)) ∫_{x−δ}^{x+δ} exp(−(x − ξ)²/(4t)) dξ
              ≤ (ε/2) (1/√(4πt)) ∫_{−∞}^{∞} exp(−(x − ξ)²/(4t)) dξ = ε/2,

where the last equality uses (6.6).
The representation (6.5) shows that the solution u(t) at t > 0 is infinitely often
differentiable, although the initial value u0 is only continuous. However, the solution
exists only for t > 0, not for t < 0. Note the different property in the hyperbolic
case, where the representation of the solution from Lemma 6.2 holds for all t ∈ R.
In the hyperbolic case, the norm ku(t)kB is independent of t, whereas in the
parabolic case, only a monotonicity statement holds.
Lemma 6.6. Let u(t) ∈ B = C(R) be a solution of (6.4). Then the inequality
ku(t)k∞ ≤ ku(0)k∞ holds for all t ≥ 0.
Proof. Set C := ‖u(0)‖∞ and let t > 0. From (6.5) we infer that

    |u(t, x)| ≤ (1/√(4πt)) ∫_{−∞}^{∞} |u₀(ξ)| exp(−(x − ξ)²/(4t)) dξ
             ≤ C (1/√(4πt)) ∫_{−∞}^{∞} exp(−(x − ξ)²/(4t)) dξ = C,

where the last equality uses (6.6); hence, ‖u(t)‖∞ ≤ C. ⊓⊔
Lemma 6.7. Let u(t) ∈ B = L2 (R) be a solution of (6.4). Then ku(t)k2 ≤ ku(0)k2
holds for all t ≥ 0.
Proof. It is sufficient to restrict to u(0) ∈ D_A. One concludes either from (6.5) or
from general considerations (cf. (6.7e)) that ∂²u/∂x² ∈ L²(R) for t ≥ 0, so that the
following integrals exist. Let t″ ≥ t′ ≥ 0. Because of

    ∫_R u(t″, x)² dx − ∫_R u(t′, x)² dx = ∫_R ∫_{t′}^{t″} ∂/∂t [u(t, x)²] dt dx
    = 2 ∫_R ∫_{t′}^{t″} u(t, x) ∂u(t, x)/∂t dt dx = 2 ∫_R ∫_{t′}^{t″} u(t, x) ∂²u(t, x)/∂x² dt dx
    = 2 ∫_{t′}^{t″} ∫_R u(t, x) ∂²u(t, x)/∂x² dx dt = −2 ∫_{t′}^{t″} ∫_R (∂u(t, x)/∂x)² dx dt ≤ 0,

‖u(t)‖₂² is weakly decreasing. ⊓⊔
Because of
T (t) ∈ L(B, B), (6.7b)
any initial value u0 ∈ B leads to a function u(t) := T (t)u0 ∈ B. If u0 ∈ B\B0 ,
the resulting function is called a ‘generalised’ or ‘weak’ solution in contrast to the
strong solution mentioned above. Note that the descriptions u(t, x) := u0 (x + at)
from Lemma 6.2 and of u(t, x) by (6.5) make sense for any u0 ∈ B.
Next, we show the semigroup property

    T(t + s) = T(t) T(s)  for s, t ≥ 0.    (6.7c)
For this purpose consider the strong solution u(τ ) = T (τ )u0 for an initial value
u0 ∈ B0 and fix some s ≥ 0. Then û0 := u(s) equals T (s)u0 . Set û(t) := u(t + s).
Since ût (t) = ut (t + s) = Au(t + s) = Aû(t) and û(0) = u(s) = û0 , we conclude
that û(t) = T (t)û0 ; i.e., T (t + s)u0 = u(t + s) = û(t) = T (t)û0 = T (t)T (s)u0
holds for all u0 ∈ B0 . Since B0 is dense in B, the identity (6.7c) follows.
The operators A and T (t) commute:
(see Footnote 1 for the domain DA of A). For a proof, consider the strong solution
u(t) = T (t)u0 for any u0 ∈ B0 . The third line in
uses the continuity of T (t) (cf. (6.7b)). Since T (t)Au0 ∈ B is defined for all
u0 ∈ DA , also AT (t) has this property. This proves6
⁶ The semigroups of hyperbolic and parabolic problems have different properties. In the parabolic
case, T(t) : D_A → B holds for positive t (even T(t) : D_A → C∞(R) ∩ B), as can be seen
from (6.5), whereas in the hyperbolic case the smoothness does not improve with increasing t.
Exercise 6.9. Suppose that K_τ := sup_{0≤t≤τ} ‖T(t)‖_{B←B} < ∞ for some τ > 0.
Prove ‖T(t)‖_{B←B} ≤ K_τ^{⌈t/τ⌉} for any t ≥ 0, where ⌈x⌉ := min{n ∈ Z : x ≤ n}.
In the next subsections, we shall refer to the following inequality (6.8), which
holds with K_T = 1 for the model examples A = a ∂/∂x and A = ∂²/∂x².
So far, only the homogeneous equation ut (t) = Au(t) has been studied.
Remark 6.11. The solution of the inhomogeneous equation ∂/∂t u(t) = A u(t) + f(t)
can be represented by

    u(t) = T(t) u₀ + ∫₀^t T(t − s) f(s) ds.
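For the scalar case A = a, where T(t) = e^{at}, the representation of Remark 6.11 can be tested directly. The following sketch (test problem and names are our own) approximates the integral term by the trapezoidal rule for a = −1, f ≡ 1, u₀ = 0, whose exact solution is u(t) = 1 − e^{−t}.

```python
# Scalar sanity check (our example) of the variation-of-constants formula
# u(t) = e^{a t} u0 + int_0^t e^{a (t-s)} f(s) ds  for u' = a*u + f.

import math

def duhamel(t, a, u0, f, n=100000):
    # trapezoidal rule for the integral term
    h = t / n
    integral = 0.0
    for k in range(n):
        s = k * h
        integral += 0.5 * h * (math.exp(a * (t - s)) * f(s)
                               + math.exp(a * (t - s - h)) * f(s + h))
    return math.exp(a * t) * u0 + integral

u_val = duhamel(1.0, -1.0, 0.0, lambda s: 1.0)
assert abs(u_val - (1 - math.exp(-1))) < 1e-8
```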
6.3.1 Notations
We replace the real axis R of the x variable by an infinite grid of step size ∆x > 0:

    G_Δx := {ν Δx : ν ∈ Z}
(cf. (6.3)). As we shall see, the step sizes ∆x, ∆t are, in general, not chosen inde-
pendently, but are connected by a parameter λ (the power of ∆x corresponds to the
order of the differential operator A):
    λ = Δt/Δx   in the hyperbolic case A = a ∂/∂x,
    λ = Δt/Δx²  in the parabolic case A = ∂²/∂x².    (6.11)

For a grid function U : Σ_Δx^Δt → C we use the notation U^μ := (U^μ_ν)_{ν∈Z}.
The continuous Banach space B and the discrete space `p of grid functions are
connected via
r = r∆x : B → `p . (6.12a)
The letter r means ‘restriction’. The index ∆x will be omitted, when the underlying
step size is known.
In the case of B = C(R), an obvious choice of r is the evaluation at the grid
points of G_Δx:

    (r u)_ν := u(ν Δx)  for ν ∈ Z.    (6.12b)
Lemma 6.12. The restrictions (6.12b,c) satisfy condition (6.13) with Cr = 1 with
respect to the respective norms k·k`∞ ←C(R) and k·k`2 ←L2 (R) .
Proof. For (6.12c): ‖ru‖²_{ℓ²} ≤ Σ_j ∫_{(j−1/2)Δx}^{(j+1/2)Δx} |u(x)|² dx = ∫_R |u(x)|² dx = ‖u‖²_{L²(R)}. ⊓⊔
Exercise 6.13. Verify that in the cases (6.12b,c) the following choices of p satisfy
condition (6.14) with C_p = 1:

p is piecewise linear interpolation:    (6.15a)
    v ∈ ℓ∞ ↦ pv ∈ C(R)  with (pv)(x) = (1 − ϑ) v_j + ϑ v_{j+1},
    where x = (j + ϑ) Δx, j ∈ Z, ϑ ∈ [0, 1),

or

p is piecewise constant interpolation:    (6.15b)
    v ∈ ℓ² ↦ pv ∈ L²(R)  with (pv)(x) = v_j,
    where x ∈ [ (j − 1/2) Δx, (j + 1/2) Δx ), j ∈ Z.
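The piecewise linear prolongation of (6.15a) can be sketched in a few lines (grid size, sample data, and all names are ours). The check confirms the two properties used later: evaluating the interpolant at the grid points reproduces the grid function, and the prolongation does not increase the sup norm (C_p = 1).

```python
# Sketch (our code) of the piecewise linear prolongation p of (6.15a)
# on the grid x_j = j*dx, for a grid function v given as a dict j -> v_j.

import math

dx = 0.5

def p(v, x):
    j = math.floor(x / dx)          # x = (j + theta)*dx with theta in [0,1)
    theta = x / dx - j
    return (1 - theta) * v[j] + theta * v[j + 1]

v = {j: math.sin(j) for j in range(-10, 11)}

# r(p v) = v: evaluation at the grid points returns the grid values
assert all(abs(p(v, j * dx) - v[j]) < 1e-14 for j in range(-9, 10))

# ||p v||_inf <= ||v||_inf (a convex combination cannot exceed the sup norm)
sup_v = max(abs(val) for val in v.values())
xs = [(-9 + 18 * k / 1000) * dx for k in range(1001)]
assert max(abs(p(v, x)) for x in xs) <= sup_v + 1e-14
```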
Initially, U⁰ is prescribed via U⁰ = r u₀ (r from (6.12a), u₀ from (6.1b)). The
explicit difference scheme

    U^{μ+1}_ν = Σ_{j∈Z} a_j U^μ_{ν+j}    (6.16)
6.3 Discretisation of the Partial Differential Equation 103
    u(t + Δt, x) = u(t, x) + (a Δt/Δx) [u(t, x + Δx) − u(t, x)];

    i.e.,  U^{μ+1}_ν = (1 − aλ) U^μ_ν + aλ U^μ_{ν+1}.    (6.17a)

(b) Using instead the symmetric difference quotient [u(t, x + Δx) − u(t, x − Δx)]/(2Δx) for ∂u/∂x yields

    U^{μ+1}_ν = −(aλ/2) U^μ_{ν−1} + U^μ_ν + (aλ/2) U^μ_{ν+1}.    (6.17b)

(c) A replacement of [u(t + Δt, x) − u(t, x)]/Δt by [u(t + Δt, x) − (u(t, x + Δx) + u(t, x − Δx))/2]/Δt yields

    U^{μ+1}_ν = ((1 − aλ)/2) U^μ_{ν−1} + ((1 + aλ)/2) U^μ_{ν+1}.    (6.17c)

(d) In the parabolic case of A = ∂²/∂x², the difference quotient [u(t + Δt, x) − u(t, x)]/Δt
for ∂u/∂t and the second difference quotient [u(t, x − Δx) − 2u(t, x) + u(t, x + Δx)]/Δx²
for ∂²u/∂x² are obvious choices and lead together with λ = Δt/Δx² from (6.11) to

    U^{μ+1}_ν = λ U^μ_{ν−1} + (1 − 2λ) U^μ_ν + λ U^μ_{ν+1}.    (6.18)
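Scheme (6.18) can be sketched directly (our own illustration; the periodic grid and all names are our choices, not from the text). Applying it to the worst-case oscillatory grid function shows the conditional stability discussed later: the sup norm stays bounded for λ = 1/2 but grows like 1.4^μ for λ = 0.6.

```python
# Our sketch of the explicit heat-equation scheme (6.18)
#   U_nu^{mu+1} = lam*U_{nu-1} + (1-2*lam)*U_nu + lam*U_{nu+1}
# on a periodic grid of 50 points.

def heat_step(U, lam):
    n = len(U)
    return [lam * U[(i - 1) % n] + (1 - 2 * lam) * U[i] + lam * U[(i + 1) % n]
            for i in range(n)]

def sup_norm(U):
    return max(abs(x) for x in U)

U0 = [(-1) ** i for i in range(50)]   # highly oscillatory initial value

U = U0
for _ in range(20):
    U = heat_step(U, 0.5)             # lam = 1/2: amplification factor -1
stable_norm = sup_norm(U)

U = U0
for _ in range(20):
    U = heat_step(U, 0.6)             # lam = 0.6: amplification factor 1-4*lam = -1.4
unstable_norm = sup_norm(U)

print(stable_norm, unstable_norm)
```

The growth factor of the oscillatory mode is exactly 1 − 4λ, which is the value G(π) of the characteristic function computed in Example 6.46d.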
U µ+1 = CU µ . (6.20)
Notation 6.15. The difference operator C (and therefore also the coefficients aj )
may depend on the parameter λ and the step size ∆t:
C = C(λ, ∆t).
a00 := a0 + ∆t · b.
    U^μ = C^μ U⁰   and   U(μΔt, νΔx) = (C^μ U⁰)_ν .
Remark 6.18. So far, the coefficients a_j are assumed to be scalars. If the scalar
equation ∂u/∂t = Au for u : I × R → R is replaced by a vector-valued equa-
tion for u : I × R → R^N, the coefficients a_j in (6.19) become N × N matrices.
The vector-valued case will be discussed in §6.5.5.
(Ej U )ν := Uj+ν .
Δt = 1, 1/10, 1/100 are shown. The required number of time steps is 1, 10, or 100,
respectively. The last three rows of the table list the values of U^{1/Δt}_ν = U(1, νΔx)
for −2 ≤ ν ≤ 2.
6.4.1 Definitions
Again, the restriction r from §6.3.2 is used. Consistency will depend on the under-
lying norm `p , where p = 2, ∞ are the discussed examples corresponding to the
Banach spaces B = L2 (R) and C(R). We say that ‘`p is suited to B’, if the require-
ments (6.13) and (6.14) are satisfied.
    sup_{0≤t≤T−Δt} ‖[r T(Δt) − C(λ, Δt) r] T(t) u₀‖_{ℓp} = o(Δt)  for all u₀ ∈ B₀.
Note that the following definition of convergence refers to the whole Banach
space B, not to a dense subspace B0 .
Definition 6.20 (convergence). For all u0 ∈ B, let u(t) = T (t)u0 denote the
generalised solution. The difference scheme C(λ, ∆t) is called convergent (with
respect to `p ), if
Definition 6.21 (stability). The difference scheme C(λ, ∆t) is called stable (with
respect to `p ) if
sup{kC(λ, ∆t)µ k`p ←`p : ∆t ≥ 0, µ ∈ N0 , 0 ≤ µ∆t ≤ T } < ∞. (6.23)
If (6.23) holds only for certain values of λ, the scheme is called conditionally stable;
otherwise, the scheme is called unconditionally stable.
In the case of (6.23), the stability constant is defined by
K = K(λ) := sup{kC(λ, ∆t)µ k`p ←`p : ∆t ≥ 0, µ ∈ N0 , 0 ≤ µ∆t ≤ T }.
Instead of ‘stable with respect to `p ’ we say for short ‘`p stable’. Similarly, the
terms ‘`p stability’, ‘`p consistent’, ‘`p consistency’ etc. are used.
First, we show that consistency and stability—together with some mild technical
assumptions—imply convergence.
Theorem 6.22 (convergence theorem). Suppose (a) to (d):
(a) r bounded with respect to `p (cf. (6.13)),
(b) T (t) satisfies Assumption 6.10,
(c) `p stability of the difference scheme C(λ, ∆t),
(d) `p consistency.
Then the difference scheme is convergent with respect to `p .
Proof. (i) Given an initial value u0 ∈ B0 ⊂ DA (B0 dense subset of B), define
u(t) = T (t)u0 . We split the discretisation error as follows:
    r u(t) − C(λ, Δt)^μ r u₀ = r [u(t) − u(μΔt)] + [r T(μΔt) − C(λ, Δt)^μ r] u₀
                             = r [u(t) − u(μΔt)] + [r T(Δt)^μ − C(λ, Δt)^μ r] u₀.

    ≤ K_r ‖u(t) − u(μΔt)‖_B + K(λ) Δt Σ_{ν=0}^{μ−1} ‖τ((μ − ν − 1) Δt)‖_{ℓp}
Together with kru∗ (t) − C(λ, ∆t)µ ru∗0 k`p ≤ ε/3 from (i) for sufficiently small ∆t
and t − µ∆t, it follows that kru(t) − C(λ, ∆t)µ ru0 k`p ≤ ε, so that also for general
initial values u0 ∈ B convergence is shown. t u
Next, we show that stability is also necessary for convergence.
Theorem 6.23 (stability theorem). Choose B and `p suitably, so that (6.13),
(6.14), and (6.8) hold. Then `p convergence implies `p stability.
Proof. For an indirect proof assume that the difference scheme is unstable. Then
there are sequences ∆tν > 0, µν ∈ N0 with 0 ≤ µν ∆tν ≤ T , so that
and therefore,
with sufficiently large ν0 = ν0 (u0 ). One concludes that Cν := C(λ, ∆tν )µν r
is a point-wise bounded sequence of operators. Corollary 3.39 yields that Cν is
uniformly bounded: there is a constant K with
    ‖C(λ, Δt_ν)^{μ_ν}‖_{ℓp←ℓp} = ‖C(λ, Δt_ν)^{μ_ν} r p‖_{ℓp←ℓp}
    ≤ ‖C(λ, Δt_ν)^{μ_ν} r‖_{ℓp←B} ‖p‖_{B←ℓp} ≤ K K_p  (by (6.14))
Theorem 6.24 (equivalence theorem). Suppose (6.8), (6.13), (6.14), and `p con-
sistency. Then `p convergence and `p stability are equivalent.
In the following, we restrict the analysis to the ℓp and Lp norms for p = 2 and p = ∞. For
1 ≤ p < ∞, the Lp norm is defined in Exercise 6.4, while ℓp is defined analogously.
The reason for the restriction to p ∈ {2, ∞} is twofold. If different properties hold
for `2 and `∞ , a more involved analysis is necessary to describe the properties for
`p with 2 < p < ∞. It might be that the separation is between p = 2 and p > 2,
because in the latter case the Hilbert structure is lost. However, it might also happen
that properties change between the cases p < ∞ and p = ∞, since in the first case
`p is reflexive, but not in the latter case.
If stability estimates hold for both cases p = 2 and p = ∞, we are in a very
pleasant situation, since these bounds imply corresponding estimates for the `p and
Lp setting for 2 < p < ∞. This result is based on the interpolation estimate by
Riesz–Thorin. It is proved by Marcel7 Riesz [23] in8 1926/27, while Thorin [25]
(1939) simplified the proof. In the following lemma, let k·kp←p be the operator
norm of L(`p , `p ) or L(Lp , Lp ).
    ‖·‖_{q←q} ≤ ‖·‖_{p1←p1}^α · ‖·‖_{p2←p2}^β   with α = p₁(p₂ − q)/(q(p₂ − p₁)),  β = p₂(q − p₁)/(q(p₂ − p₁)).
⁷ Marcel Riesz is the younger brother of Frigyes Riesz, who is the author of, e.g., [22].
⁸ The article belongs to Volume 49, which is associated with the year 1926. However, it is the last
article of that volume and carries the footnote ‘Imprimé le 11 janvier 1927’. Therefore, one also
finds 1927 as the publication date.
6.5 Sufficient and Necessary Conditions for Stability 109
Note that α + β = 1. This implies that p₁ stability (i.e., ‖·‖_{p1←p1} ≤ M₁) and
p₂ stability (i.e., ‖·‖_{p2←p2} ≤ M₂) imply q stability in the form ‖·‖_{q←q} ≤ M₁^α M₂^β.
Hence, if a criterion yields both ℓ² and ℓ∞ stability, then ℓp stability holds for all
2 ≤ p ≤ ∞.
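A classical special case of the interpolation inequality is p₁ = 1, p₂ = ∞, q = 2 (with p₂ = ∞ understood as a limit), where α = β = 1/2 and hence ‖A‖₂ ≤ √(‖A‖₁ ‖A‖∞) for matrices. The following sketch (all function names are ours) checks this numerically, with the spectral norm approximated from below by power iteration on AᵀA, so the assertion is safe even before convergence.

```python
# Numerical illustration (ours) of ||A||_2 <= sqrt(||A||_1 * ||A||_inf).

import math
import random

def norm1(A):        # maximum absolute column sum
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def norm_inf(A):     # maximum absolute row sum
    return max(sum(abs(x) for x in row) for row in A)

def norm2_lower(A, iters=500):
    # power iteration on A^T A; returns ||A v|| <= ||A||_2 for a unit vector v
    n = len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(len(A))]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        s = math.sqrt(sum(x * x for x in w))
        v = [x / s for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(len(A))]
    return math.sqrt(sum(x * x for x in Av))

random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
assert norm2_lower(A) <= math.sqrt(norm1(A) * norm_inf(A)) + 1e-9
```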
The following results belong to the classical stability theory of Lax–Richtmyer [16].
Criterion 6.26. If ‖C(λ, Δt)‖_{ℓp←ℓp} ≤ 1 + K_λ Δt, then the difference scheme is ℓp stable.

Proof. Use

    ‖C(λ, Δt)^μ‖_{ℓp←ℓp} ≤ ‖C(λ, Δt)‖_{ℓp←ℓp}^μ ≤ (1 + K_λ Δt)^μ

and apply Exercise 3.24a: (1 + K_λ Δt)^μ ≤ e^{K_λ μΔt} ≤ e^{K_λ T} for μΔt ≤ T. ⊓⊔
The coefficients aj of C(λ, ∆t) can be used to estimate kC(λ, ∆t)k`p ←`p .
P
Remark 6.27. The difference scheme (6.19) satisfies kC(λ, ∆t)k`p ←`p ≤ |aj | .
P
P P
Proof. kC(λ, ∆t)k`p ←`p =
j aj Ej
≤ j |aj | kEj k`p ←`p = j |aj |
`p ←`p
follows from (6.22) and (6.21). tu
Proof. (i) In the case of ℓ∞, choose the constant grid function U⁰ ∈ ℓ∞, i.e.,
U⁰_j = 1 for all j ∈ Z. Then U¹ = C(λ, Δt)U⁰ equals ζU⁰ with ζ := Σ_j a_j
and, correspondingly, U^μ = C(λ, Δt)^μ U⁰ = ζ^μ U⁰. Hence, ‖C(λ, Δt)^μ‖_{ℓp←ℓp} ≥
|ζ|^μ ≥ [1 + Δt c(Δt)]^μ. Exercise 6.32a proves the assertion.
(ii) In the case of ℓ², the previous proof cannot be repeated, since U⁰ ∉ ℓ².
Instead, the proof will be given after Theorem 6.44 on page 118. ⊓⊔
Lemma 6.34 (perturbation lemma). Let C(λ, Δt) be ℓp stable with stability con-
stant K(λ). Suppose that a perturbation D(λ, Δt) is bounded by ‖D(λ, Δt)‖_{ℓp←ℓp} ≤ C_D Δt.
Then

    C′(λ, Δt) := C(λ, Δt) + D(λ, Δt)

is also ℓp stable.
where the second sum runs over all α_j ∈ N₀ with α₁ + … + α_{m+1} = μ − m. Each
term C^{α₁} D C^{α₂} D ⋯ C^{α_m} D C^{α_{m+1}} contains m factors D and μ − m factors C.
For a fixed m ∈ [0, μ] the number of these terms is (μ choose m). Together with the estimate

    ‖C^{α₁} D C^{α₂} D ⋯ C^{α_m} D C^{α_{m+1}}‖
    ≤ ‖C^{α₁}‖ ‖D‖ ‖C^{α₂}‖ ‖D‖ ⋯ ‖C^{α_m}‖ ‖D‖ ‖C^{α_{m+1}}‖ ≤ K(λ)^{m+1} (C_D Δt)^m

the stability bound follows with [1 + K(λ) C_D Δt]^μ ≤ e^{K(λ) C_D μΔt} ≤ e^{K(λ) C_D T}
for μΔt ≤ T (cf. Exercise 3.24). ⊓⊔
Remark 6.35. (a) A simple application of Lemma 6.34 is the following one. Let
the differential operator A in ∂u/∂t = Au (cf. (6.1a)) be A = A₁ + A₀, where
A₁ contains derivatives of at least first order, while A₀u = a₀u is the term of order
zero. The discretisation yields correspondingly C(λ, Δt) = C₁(λ, Δt) + C₀(λ, Δt).
A consistent discretisation of C₀ satisfies the estimate ‖C₀(λ, Δt)‖_{ℓp←ℓp} = O(Δt).
By Lemma 6.34, the stability of C₁(λ, Δt) implies the stability of C(λ, Δt). There-
fore, it suffices to investigate differential operators A without terms of order zero.
(b) Let A = A₁ be the principal part from above. The property A 1 = 0 (1 ∈ ℓ∞:
the constant function with value one) shows that u = 1 is a solution. This implies the
special consistency condition

    Σ_{j∈Z} a_j = 1  (= I in the matrix-valued case).    (6.24)
Proof. ρ(A^μ) = ρ(A)^μ holds for μ ∈ N. On the other hand, ρ(A) ≤ ‖A‖ is valid
for any associated norm. Therefore, stability yields

    ρ(C(λ, Δt))^μ = ρ(C(λ, Δt)^μ) ≤ ‖C(λ, Δt)^μ‖_{ℓp←ℓp} ≤ K(λ)

for all μ, Δt with μΔt ≤ T. Exercise 6.32b proves ρ(C(λ, Δt)) ≤ 1 + C_ρ Δt. ⊓⊔
Remark 6.37. Suppose that the operator C(λ, ∆t) ∈ L(`2 , `2 ) is normal; i.e.,
C(λ, ∆t) commutes with the adjoint operator C(λ, ∆t)∗ . Then ρ(C(λ, ∆t)) =
kC(λ, ∆t)k`2 ←`2 holds and `2 stability is equivalent to ρ(C(λ, ∆t)) ≤ 1 + O(∆t).
Proof. (i) Let C be a normal operator. First we prove ‖C²‖_{ℓ²←ℓ²} = ‖C*C‖_{ℓ²←ℓ²}.
From the normality C*C = CC* it follows that

    ‖C²‖²_{ℓ²←ℓ²} = sup_{‖u‖_{ℓ²}=1} ⟨CCu, CCu⟩_{ℓ²} = sup_{‖u‖_{ℓ²}=1} ⟨C*Cu, C*Cu⟩_{ℓ²} = ‖C*C‖²_{ℓ²←ℓ²}.

Since also ‖C*C‖_{ℓ²←ℓ²} ≥ sup_{‖u‖_{ℓ²}=1} ⟨C*Cu, u⟩_{ℓ²} = ‖C‖²_{ℓ²←ℓ²},
the inequality ‖C²‖_{ℓ²←ℓ²} ≥ ‖C‖²_{ℓ²←ℓ²} is shown. Because of ‖C²‖_{ℓ²←ℓ²} ≤ ‖C‖²_{ℓ²←ℓ²} (submulti-
plicativity of the operator norm), the equality ‖C²‖_{ℓ²←ℓ²} = ‖C*C‖_{ℓ²←ℓ²} = ‖C‖²_{ℓ²←ℓ²} is proved.
Analogously, ‖C‖ⁿ_{ℓ²←ℓ²} = ‖Cⁿ‖_{ℓ²←ℓ²} follows for all n = 2^k (k ∈ N).
(ii) The characterisation ρ(C) = lim_{n→∞} ‖Cⁿ‖_{ℓ²←ℓ²}^{1/n} together with (i) proves
ρ(C) = ‖C‖_{ℓ²←ℓ²}.
(iii) According to Criterion 6.36, ρ(C(λ, Δt)) ≤ 1 + O(Δt) is necessary,
while ‖C(λ, Δt)‖_{ℓ²←ℓ²} ≤ 1 + O(Δt) is sufficient (cf. Criterion 6.26). Since
ρ(C(λ, Δt)) = ‖C(λ, Δt)‖_{ℓ²←ℓ²}, both inequalities are identical. ⊓⊔
¹¹ λ is a regular value of A if λI − A is bijective and the inverse (λI − A)⁻¹ ∈ L(B, B)
exists. Otherwise, λ is a singular value of A. In the case of finite-dimensional vector spaces (i.e.,
for matrices), the terms ‘singular value’ and ‘eigenvalue’ coincide.
Since ρ(C(λ, ∆t)) ≤ 1 + O(∆t) from Remark 6.37 is relatively easy to check,
one may ask whether a similar criterion holds for more general operators. For this
purpose we introduce the ‘almost normal’ operators: C(λ, Δt) is almost normal if

    ‖C(λ, Δt) C(λ, Δt)* − C(λ, Δt)* C(λ, Δt)‖_{ℓ²←ℓ²} ≤ M (Δt)² ‖C(λ, Δt)‖²_{ℓ²←ℓ²}
and μ²/2 + μ ≤ (μ + 1)²/2 prove the hypothesis.
(iii) Each interchange perturbs the operator norm at most by M (Δt)² ‖C‖^{2μ}_{ℓ²←ℓ²}.
In the following formula the factors C_j, 1 ≤ j ≤ 2μ, are either C or C*: the left-hand
side becomes ‖C^μ‖²_{ℓ²←ℓ²}, while the right-hand side yields the lower
bound [1 − M (μΔt)²] ‖C‖^{2μ}_{ℓ²←ℓ²} because of

    sup_{‖u‖_{ℓ²}=1} ⟨u, (C*C)^μ u⟩_{ℓ²} = ‖(C*C)^μ‖_{ℓ²←ℓ²} = ‖C*C‖^μ_{ℓ²←ℓ²} = ‖C‖^{2μ}_{ℓ²←ℓ²},

where the middle equality uses that C*C is normal.
These inequalities are the most convenient conditions proving stability. However,
even if ‖C(λ, Δt)‖_{ℓ²←ℓ²} ≥ c > 1, it may happen that the powers stay bounded
(‖C(λ, Δt)^μ‖_{ℓ²←ℓ²} ≤ const). In that case one may try to find an equivalent norm
which behaves more conveniently. Since the square ‖U‖²_{ℓ²} = Σ_{i∈Z} |U_i|² is a quadratic form,
one may introduce another quadratic form Q(U) such that

    (1/K₁) ‖U‖²_{ℓ²} ≤ Q(U) ≤ K₁ ‖U‖²_{ℓ²}  for all U ∈ ℓ².    (6.27)

This describes the equivalence of the norms ‖·‖_{ℓ²} and √(Q(·)). The next lemma is a
slight generalisation of the ‘energy method’ stated in [21, pages 139–140].
Lemma 6.39. Let (6.27) be valid. Suppose that the growth of Q(·) for one time step
U^{μ+1} = C(λ, Δt)U^μ is limited by

    0 ≤ Q(U^{μ+1}) − Q(U^μ) ≤ K₂ Δt [‖U^μ‖²_{ℓ²} + ‖U^{μ+1}‖²_{ℓ²}] + K₃ Δt.    (6.28a)

Then, for Δt ≤ 1/(2K₁K₂), the norms Q(U^μ) and ‖U^μ‖_{ℓ²} stay bounded. More
precisely,

    ‖U^μ‖²_{ℓ²} ≤ [(2K₁² − 1) ‖U⁰‖²_{ℓ²} + K₃/(2K₂)] ((1 + KΔt)/(1 − KΔt))^μ − K₃/(2K₂)  for μ ∈ N.    (6.28b)

Proof. Abbreviate

    q_μ := Q(U^μ),  s_μ := ‖U^μ‖²_{ℓ²},  K := K₁K₂,  L := K₁K₃,  c := (1 + KΔt)/(1 − KΔt).
Since s_μ ≤ K₁ q_μ = K₁ [q₀ + Σ_{ℓ=0}^{μ−1} (q_{ℓ+1} − q_ℓ)] ≤ K₁ [K₁ s₀ + Σ_{ℓ=0}^{μ−1} (q_{ℓ+1} − q_ℓ)],
we infer from (6.28a) that

    s_μ ≤ (K₁² − K₁K₂Δt) s₀ + 2K₁K₂Δt Σ_{ℓ=0}^{μ−1} s_ℓ + K₁K₂Δt s_μ + K₁K₃ μΔt,

i.e.,

    (1 − KΔt) s_μ ≤ (K₁² − KΔt) s₀ + 2KΔt Σ_{ℓ=0}^{μ−1} s_ℓ + L μΔt  for all μ ∈ N.    (6.29a)

Note that 1 − KΔt ≥ 1/2 because of Δt ≤ 1/(2K₁K₂). The induction hypothesis

    s_μ ≤ S_μ := A c^μ − B  with A := (2K₁² − 1) s₀ + B,  B := K₃/(2K₂)    (6.29b)

holds for μ = 0 because of

    S₀ = A c⁰ − B = (2K₁² − 1) s₀ + B − B ≥ s₀.

The induction step leads, using L = 2KB, to

    … = (1 − KΔt) {A c^μ − B} + B + [(K₁² − KΔt)/(1 − KΔt)] s₀ − A,  where A c^μ − B = S_μ.

The expression (K₁² − KΔt)/(1 − KΔt) is monotone in KΔt. From KΔt ≤ 1/2 we infer that

    B + [(K₁² − KΔt)/(1 − KΔt)] s₀ − A ≤ B + (2K₁² − 1) s₀ − A = 0,
The transfer from the function f ∈ L2 (0, 2π) to its Fourier coefficients ϕ ∈ `2 is
the Fourier analysis, which will be denoted by F:
Ff = ϕ.
with

    Ĉ(λ, Δt) := F⁻¹ C(λ, Δt) F.    (6.31)

While C(λ, Δt) ∈ L(ℓ², ℓ²), the transformed operator Ĉ(λ, Δt) belongs to the
space L(L²(0, 2π), L²(0, 2π)).
The decisive quantity for stability is the operator norm kC(λ, ∆t)µ k`2 ←`2 . The
previous exercise shows that
    F⁻¹ C_j U = (1/√(2π)) Σ_{α∈Z} (a_j U_{α+j}) e^{iαξ} = a_j (1/√(2π)) Σ_{α∈Z} U_{α+j} e^{iαξ}
              = a_j (1/√(2π)) Σ_{β∈Z} U_β e^{i(β−j)ξ} = a_j e^{−ijξ} (1/√(2π)) Σ_{β∈Z} U_β e^{iβξ} = a_j e^{−ijξ} Û

(substituting β = α + j).
A comparison with F −1 Cj U = Ĉj Û shows that Ĉj = aj e−ijξ ; i.e., the linear
mapping Ĉj : L2 (0, 2π) → L2 (0, 2π) is the multiplication by the function aj e−ijξ .
Since Ĉ(λ, Δt) = F⁻¹ C(λ, Δt) F = F⁻¹ (Σ_j C_j) F = Σ_j F⁻¹ C_j F = Σ_j Ĉ_j,
we obtain the following remark.
Remark 6.42. Consider the difference operator C(λ, Δt) = Σ_{j∈Z} a_j E_j. The
Fourier transformed operator is Ĉ(λ, Δt) = Σ_{j∈Z} a_j e^{−ijξ}. Application of Ĉ(λ, Δt)
is the multiplication by this function.
Note that (6.36) is only valid for scalar coefficients aj . Together with Remark 6.41,
(6.36) leads to the following theorem.
Proof. (i) (6.37) implies ‖C(λ, Δt)‖_{ℓ²←ℓ²} = ‖Ĉ(λ, Δt)‖_{L²(0,2π)←L²(0,2π)} =
‖G‖∞ ≤ 1 + K_λ Δt, so that Corollary 6.28 proves stability.
(ii) Set c(Δt) := (‖G‖∞ − 1)/Δt. If there is no constant K_λ with (6.37),
c(Δt) → ∞ follows for Δt → 0. Exercise 6.32a and (6.36) imply instability. ⊓⊔
Lemma 6.45. Let C(λ, ∆t) be of the form (6.19) with constant coefficients aj . Then
`∞ stability implies `2 stability:
kC(λ, ∆t)µ k`∞ ←`∞ ≥ kGµ k∞ = kC(λ, ∆t)µ k`2 ←`2 for all µ ∈ N0 .
The negation of this statement is: if the difference scheme (6.19) is `2 unstable, it is
also `∞ unstable.
Proof. Choose the special initial value U⁰ with U⁰_ν = e^{iνξ}, where ξ ∈ R is charac-
terised by |G(ξ)| = ‖G‖∞. Note that U⁰ ∈ ℓ∞ and ‖U⁰‖∞ = 1. Application of
C(λ, Δt) yields U¹ = C(λ, Δt)U⁰ with

    U¹_ν = Σ_{j∈Z} a_j U⁰_{ν+j} = Σ_{j∈Z} a_j e^{i(ν+j)ξ} = e^{iνξ} Σ_{j∈Z} a_j e^{ijξ} = G(ξ) U⁰_ν,
Below, the examples from §6.3.3 are again analysed with regard to (in)stability.
Example 6.46. (a) Scheme (6.17a) with a₀ = 1 − aλ, a₁ = aλ. For λ satisfy-
ing 0 ≤ aλ ≤ 1, stability is already confirmed in Example 6.33a. The associated
characteristic function

    G(ξ) := 1 − aλ + aλ e^{−iξ}

has modulus |G(π)| = |1 − 2aλ| at ξ = π. Since |G(π)| > 1 for all aλ ∉ [0, 1],
Theorem 6.44 and Lemma 6.45 prove instability with respect to ℓ² and ℓ∞ for aλ
outside of [0, 1]. Hence, the scheme (6.17a) is conditionally stable (with respect to
ℓ² and ℓ∞) under the restriction aλ ∈ [0, 1], and unstable otherwise.
(b) Scheme (6.17b) with a₋₁ = −aλ/2, a₀ = 1, a₁ = aλ/2. The associated character-
istic function

    G(ξ) := −(aλ/2) e^{iξ} + 1 + (aλ/2) e^{−iξ} = 1 − i aλ sin ξ

has maximum norm ‖G‖∞ = √(1 + |aλ|²) and, therefore, except in the trivial case
a = 0, the scheme is always unstable (with respect to ℓ² and ℓ∞).
(c) Scheme (6.17c) with a₋₁ = (1 − aλ)/2, a₁ = (1 + aλ)/2. If |aλ| ≤ 1, Example 6.33c
shows stability. Because of

    G(ξ) := ((1 − aλ)/2) e^{iξ} + ((1 + aλ)/2) e^{−iξ}   and   |G(ξ)|² = cos²ξ + |aλ|² sin²ξ,

the bound ‖G‖∞ = max{1, |aλ|} follows. Hence, the scheme is conditionally
stable for |aλ| ≤ 1, but unstable for |aλ| > 1 (with respect to ℓ² and ℓ∞).
(d) Scheme (6.18) with a₋₁ = λ, a₀ = 1 − 2λ, a₁ = λ. For λ ∈ (0, 1/2], stability
is shown in Example 6.33d. Because of

    G(ξ) = λ e^{iξ} + 1 − 2λ + λ e^{−iξ} = 1 − 2λ(1 − cos ξ) = 1 − 4λ sin²(ξ/2)

and ‖G‖∞ = max{1, |1 − 4λ|} the scheme is conditionally stable for λ ∈ (0, 1/2],
but unstable for λ > 1/2 (with respect to ℓ² and ℓ∞).
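The stability verdicts of Example 6.46 can be reproduced by sampling the characteristic function numerically. In the following sketch (the helper `g_sup_norm` and the sample sizes are our own), a scheme with Δt-independent coefficients is ℓ² stable precisely when the computed ‖G‖∞ does not exceed 1.

```python
# Our helper approximating ||G||_inf for G(xi) = sum_j a_j * exp(-1j*j*xi),
# sampling xi over [-pi, pi] (cf. Theorem 6.44 and Example 6.46).

import cmath

def g_sup_norm(coeffs, samples=20001):
    # coeffs: dict j -> a_j
    norm = 0.0
    for k in range(samples):
        xi = -cmath.pi + 2 * cmath.pi * k / (samples - 1)
        g = sum(a * cmath.exp(-1j * j * xi) for j, a in coeffs.items())
        norm = max(norm, abs(g))
    return norm

alam = 0.5   # the value of a*lambda
print(g_sup_norm({0: 1 - alam, 1: alam}))                   # (6.17a): ~1, stable
print(g_sup_norm({-1: -alam / 2, 0: 1, 1: alam / 2}))       # (6.17b): ~sqrt(1.25), unstable
print(g_sup_norm({-1: (1 - alam) / 2, 1: (1 + alam) / 2}))  # (6.17c): ~1, stable
print(g_sup_norm({-1: 0.6, 0: 1 - 2 * 0.6, 1: 0.6}))        # (6.18), lam=0.6: ~1.4, unstable
```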
Lemma 6.45 yields a direct relation between G and `2 stability. Because of the
inequality kC(λ, ∆t)µ k`∞ ←`∞ ≥ kC(λ, ∆t)µ k`2 ←`2 , boundedness of powers of G
is necessary for `∞ stability. As we shall see, this condition is not sufficient for `∞
stability. A complete characterisation of `∞ stability is given by Thomée [24].
Theorem 6.47. Assume a difference scheme (6.16) with G(ξ) = Σ_{j∈Z} a_j e^{−ijξ} (cf.
(6.34)). Then the scheme is ℓ∞ stable if and only if either condition (a) or (b) is
fulfilled:
(a) G(ξ) = c e^{−ijξ} with |c| = 1,
(b) the set {ξ ∈ [−π, π] : |G(ξ)| = 1} = {ξ₁, …, ξ_N} has finite cardinality, and
there are numbers α_k ∈ R, β_k ∈ N, and γ_k ∈ C with Re γ_k > 0, such that
    J₁ = min{j ∈ Z : a_j ≠ 0},   J₂ = max{j ∈ Z : a_j ≠ 0},

so that the sum Σ_{j∈Z} a_j U^μ_{ν+j} can be reduced to Σ_{J₁≤j≤J₂} a_j U^μ_{ν+j}. If

    aλ ∉ [J₁, J₂],

the scheme is not convergent (with respect to any norm). Therefore, aλ ∈ [J₁, J₂] is
a necessary convergence condition.
If aλ ∉ [J₁, J₂], then x + at ∉ [x + tJ₁/λ, x + tJ₂/λ]. Therefore, the computation
of U^μ_ν does not use those data on which the solution u(t, x) depends. To be more
precise, choose an initial value u₀ ∈ C₀∞(R) with
¹⁴ The original paper is written in German. An English translation can be found in [3, Appendix C].
All schemes examined so far are, at best, conditionally stable. In order to obtain
unconditionally stable schemes, one must admit implicit difference schemes.15
In the parabolic case ∂u/∂t = ∂²u/∂x², the second x-difference can be formed at time
level t + Δt, i.e.,
Instead of the explicit form U µ+1 = C(λ, ∆t)U µ , one now obtains an implicit
scheme of the form
C1 (λ, ∆t)U µ+1 = C2 (λ, ∆t)U µ , (6.39)
¹⁵ The CFL criterion does not apply to implicit schemes, since formally implicit schemes can be
viewed as explicit ones with an infinite sum (i.e., J₁ = −∞, J₂ = ∞). Then aλ ∈ [J₁, J₂] = R
is valid, and the CFL condition is always satisfied.
    (C₁(λ, Δt)U)_ν = Σ_{−1≤j≤1} a_{1,j} U_{ν+j}  with a_{1,−1} = a_{1,1} = −λ, a_{1,0} = 1 + 2λ,

    (C₂(λ, Δt)U)_ν = Σ_{j∈Z} a_{2,j} U_{ν+j}  with a_{2,0} = 1 and all other coefficients = 0;
    i.e., C₂ = identity.
Lemma 6.49. (a) If the coefficients a_{1,j} of C₁(λ, Δt) satisfy the inequality

    ( Σ_{j∈Z\{0}} |a_{1,j}| ) / |a_{1,0}| ≤ 1 − ε < 1  for some ε > 0,

then the inverse exists and satisfies ‖[C₁(λ, Δt)]⁻¹‖_{ℓp←ℓp} ≤ 1/(ε |a_{1,0}|).
(b) The inverse [C₁(λ, Δt)]⁻¹ exists in L(ℓ², ℓ²) if and only if the characteristic
function G₁(ξ) of C₁ satisfies an inequality |G₁(ξ)| ≥ η > 0. The norm equals

    ‖[C₁(λ, Δt)]⁻¹‖_{ℓ²←ℓ²} = 1 / inf_{ξ∈R} |G₁(ξ)|.
On the other hand, one verifies that ‖Ĉ₁⁻¹‖ cannot be finite if inf |G₁| equals
zero. Since the Fourier transform does not change the ℓ² norm, the equality
‖C₁⁻¹‖_{ℓ²←ℓ²} = ‖Ĉ₁⁻¹‖ = ‖1/G₁‖∞ = 1 / inf_{ξ∈R} |G₁(ξ)| follows. ⊓⊔
Example 6.50. The scheme (6.38) is ℓ² stable for all λ = ∆t/∆x²; i.e., it is unconditionally stable.

Proof. The characteristic function of C₁ is G₁(ξ) = −λe^{iξ} + 1 + 2λ − λe^{−iξ} = 1 + 2λ(1 − cos ξ) ≥ 1, so that inf_{ξ∈ℝ} |G₁(ξ)| = 1. Since C₂ = I, we obtain
$$C(\lambda,\Delta t) := [C_1(\lambda,\Delta t)]^{-1}\,C_2(\lambda,\Delta t) = [C_1(\lambda,\Delta t)]^{-1}$$
and ‖C₁⁻¹‖_{ℓ²←ℓ²} = 1/inf_{ξ∈ℝ} |G₁(ξ)| = 1. Stability follows by Criterion 6.26. ⊔⊓
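This can be cross-checked numerically (a sketch, not part of the original text): sampling the characteristic function G₁ on a fine ξ-grid confirms inf |G₁| = 1 independently of λ, so that ‖C₁⁻¹‖_{ℓ²←ℓ²} = 1. The sample values of λ are arbitrary.

```python
import numpy as np

def G1(xi, lam):
    # G1(xi) = -lam*e^{i xi} + 1 + 2*lam - lam*e^{-i xi} = 1 + 2*lam*(1 - cos xi)
    return -lam * np.exp(1j * xi) + 1 + 2 * lam - lam * np.exp(-1j * xi)

xi = np.linspace(0.0, 2 * np.pi, 100001)
for lam in (0.1, 1.0, 10.0, 1000.0):              # arbitrary sample values of lambda
    vals = G1(xi, lam)
    assert np.max(np.abs(vals.imag)) < 1e-9       # G1 is real-valued
    assert abs(np.abs(vals).min() - 1.0) < 1e-6   # inf |G1| = 1, attained at xi = 0
print("inf |G1| = 1 for every sampled lambda -> unconditional stability")
```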
Remark 6.51. [C₁(λ, ∆t)]⁻¹ from above is equal to the infinite operator
$$[C_1(\lambda,\Delta t)]^{-1} = \sum_{j\in\mathbb{Z}} a_j(\lambda)\,E_j \quad\text{with}\quad a_j = \frac{1}{\sqrt{1+4\lambda}}\left(\frac{2\lambda}{2\lambda+1+\sqrt{1+4\lambda}}\right)^{|j|};$$
i.e., the implicit scheme (6.38) is identical to $U_\nu^{\mu+1} = \sum_{j\in\mathbb{Z}} a_j U_{\nu+j}^{\mu}$.

Proof. Check that $C_1(\lambda,\Delta t)\sum_{j\in\mathbb{Z}} a_j E_j = \text{identity}$. ⊔⊓
The general case of an implicit scheme (6.39) is treated in the next criterion (its
proof is identical to that of Theorem 6.44).
Criterion 6.52. The scheme (6.39) is ℓ² stable if and only if the characteristic function
$$G(\xi) := G_2(\xi)/G_1(\xi)$$
satisfies condition (6.37).
for Θ ∈ [0, 1]. For Θ = 0 and Θ = 1 we regain (6.18) and (6.38), respectively.
For Θ = 1/2, (6.41) is called¹⁶ the Crank–Nicolson scheme. The scheme (6.41) is
unconditionally ℓ² stable for Θ ∈ [1/2, 1], whereas in the case of Θ ∈ [0, 1/2) it is
conditionally ℓ² stable for
$$\lambda \le 1/(2(1-2\Theta)).$$
Proof. Let (DU)_ν = −U_{ν−1} + 2U_ν − U_{ν+1} be the negative second difference
operator. The characteristic function of D is G_D(ξ) = 2 − 2 cos ξ = 4 sin²(ξ/2).
The operators C₁, C₂ from (6.39) in the case of (6.41) are C₁ = I + λΘD and
C₂ = I − λ(1−Θ)D, so that
$$G(\xi) = \frac{1 - \lambda(1-\Theta)\,G_D(\xi)}{1 + \lambda\Theta\,G_D(\xi)}.$$
The function (1 − λ(1−Θ)X)/(1 + λΘX) is monotonically decreasing with respect to X, so that the
maximum of |G(ξ)| is attained at X = 0 = G_D(0) or X = 4 = G_D(π):
$$\|G\|_\infty = \max\{G(0), -G(\pi)\} = \max\Big\{1,\ \frac{4\lambda(1-\Theta)-1}{1+4\lambda\Theta}\Big\}.$$
If Θ ∈ [1/2, 1], then
$$-G(\pi) = \frac{4\lambda(1-\Theta)-1}{1+4\lambda\Theta} = \frac{4\lambda}{1+4\lambda\Theta} - 1 \underset{\Theta\ge 1/2}{\le} \frac{4\lambda}{1+2\lambda} - 1 = \frac{2\lambda-1}{2\lambda+1} \le 1$$
proves ‖G‖_∞ = 1.
In the case of Θ ∈ [0, 1/2), the choice of λ must ensure the estimate −G(π) ≤ 1.
Equivalent statements are
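The stability bounds of the Θ-scheme can be confirmed numerically (a sketch using only the amplification factor G(ξ) derived in the proof above): for Θ ≥ 1/2 one finds ‖G‖_∞ = 1 for every λ, while for Θ < 1/2 the bound λ ≤ 1/(2(1 − 2Θ)) is sharp.

```python
import numpy as np

def norm_G(lam, theta, m=100001):
    # amplification factor of the theta scheme on [0, pi]
    xi = np.linspace(0.0, np.pi, m)
    GD = 4 * np.sin(xi / 2) ** 2
    G = (1 - lam * (1 - theta) * GD) / (1 + lam * theta * GD)
    return np.abs(G).max()

# Theta in [1/2, 1]: ||G||_inf = 1 for every lambda (unconditional stability)
for theta in (0.5, 0.75, 1.0):
    for lam in (0.1, 1.0, 100.0):
        assert norm_G(lam, theta) <= 1 + 1e-12

# Theta in [0, 1/2): stable exactly up to lam = 1/(2*(1 - 2*theta))
theta = 0.25
lam_crit = 1 / (2 * (1 - 2 * theta))
assert norm_G(lam_crit, theta) <= 1 + 1e-12
assert norm_G(2 * lam_crit, theta) > 1
print("theta-scheme stability bounds confirmed")
```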
So far, ℓ^p has been the set of complex-valued sequences (U_ν)_{ν∈Z}, U_ν ∈ ℂ. If the
equation ∂u(t)/∂t = Au(t) from (6.1a) is vector-valued (values in ℂᴺ), the grid
functions (U_ν)_{ν∈Z} must also be vector-valued:
Now, the coefficients aj in the difference scheme (6.19) are N ×N matrices. The
statements concerning consistency, convergence, and stability remain unchanged
(only the norms are to be interpreted differently). However, the criteria (see §6.5
and later) are to be generalised to the case N > 1.
Criterion 6.26 remains valid without any change.
The estimate in Remark 6.27 becomes
$$\|C(\lambda,\Delta t)\|_{\ell^p\leftarrow\ell^p} \le \sum_j \|a_j\|_p,$$
where ‖·‖₂ is the spectral norm and ‖·‖_∞ is the row-sum norm for N × N matrices.
In Corollary 6.28 one has to replace Σⱼ |aⱼ| by Σⱼ ‖aⱼ‖_p.
When we form G(ξ) = Σ_{j∈Z} aⱼ e^{−ijξ} by the Fourier transform, G(ξ) is now an
N × N matrix. Instead of the relatively abstract operator C(λ, ∆t), one now has to investigate the
boundedness of the N × N matrices G(ξ)^µ.
Criterion 6.54 (von Neumann condition). (a) A necessary condition for stability
(with respect to `2 and `∞ ) is
(b) If all matrices G(ξ) are normal, this condition is even sufficient for `2 stability.
Proof. The numerical radius of general square matrices A possesses the properties
holds for all n with n∆t ≤ T and all ξ ∈ [0, 2π). Because of (6.42), the assertion is
proved. t u
Definition 6.29 carries over to the matrix case when we replace non-negative
reals by positive semidefinite matrices; i.e., a positive difference scheme is charac-
terised by ‘aj positive semidefinite’.
Exercise 6.56. Suppose that the positive semidefinite coefficients aj are either
(i) all diagonal or
(ii) simultaneously diagonalisable; i.e., there is a transformation S such that the
matrices dj = Saj S −1 are diagonal for all j.
Show that, analogous to Criterion 6.30, the difference scheme (6.19) is ℓ² and
ℓ^∞ stable if Σⱼ aⱼ = I (cf. (6.24)).
Criterion 6.57 (Friedrichs [5]). Suppose that the difference scheme (6.19) has
positive semidefinite coefficients aⱼ satisfying the consistency condition Σ_{j∈Z} aⱼ = I
(cf. (6.24)). The coefficients aⱼ must be either constant or the following three
conditions must hold:
(i) the hyperbolic case with λ = ∆t/∆x is given (cf. (6.11)),
(ii) aⱼ(·) are globally Lipschitz continuous in ℝ with Lipschitz constant Lⱼ,
(iii) B := Σ_{j∈Z} j·Lⱼ < ∞.
Then
$$\|C(\lambda,\Delta t)\|_{\ell^2\leftarrow\ell^2} \le 1 + C_L\,\Delta t$$
holds with C_L = B/(2λ), implying ℓ² stability.
The first term is
$$\frac{\Delta x}{2}\sum_{\nu\in\mathbb{Z}}\Big\langle \sum_{j\in\mathbb{Z}} a_j(\nu\Delta x)\,U_\nu,\ U_\nu\Big\rangle \underset{(6.24)}{=} \frac{\Delta x}{2}\sum_{\nu\in\mathbb{Z}}\|U_\nu\|_2^2 = \frac{1}{2}\|U\|_{\ell^2}^2.$$
$$\frac{\partial}{\partial t}u + A(x)\frac{\partial}{\partial x}u = 0 \qquad (A(x)\ \text{symmetric } N\times N\ \text{matrix},\ u\in\mathbb{C}^N). \tag{6.44}$$
Replacing ∂u/∂x by [u(t, x+∆x) − u(t, x−∆x)]/(2∆x) and ∂u/∂t by
[u(t+∆t, x) − ū(t, x)]/∆t with the average ū(t, x) = ½[u(t, x+∆x) + u(t, x−∆x)],
we are led to the difference scheme (Friedrichs' scheme)
$$(C(\lambda,\Delta t)U)_\nu = \tfrac{1}{2}\big\{[I - \lambda A(\nu\Delta x)]\,U_{\nu-1} + [I + \lambda A(\nu\Delta x)]\,U_{\nu+1}\big\}. \tag{6.45}$$
Choosing 0 < λ ≤ 1/ supx∈R kA(x)k2 , we obtain a positive scheme C(λ, ∆t).
Hence `2 stability follows, provided that A(x) is Lipschitz continuous.
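A small sketch (with an arbitrarily chosen symmetric matrix A, not from the text) confirms that the choice λ ≤ 1/‖A‖₂ makes both coefficient matrices of (6.45) positive semidefinite and that they sum to the identity, i.e., (6.45) is a positive scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                      # an arbitrary symmetric 3x3 matrix (illustrative)
lam = 1.0 / np.linalg.norm(A, 2)       # largest admissible lambda: lam*||A||_2 = 1

C_minus = (np.eye(3) - lam * A) / 2    # coefficient of U_{nu-1} in (6.45)
C_plus = (np.eye(3) + lam * A) / 2     # coefficient of U_{nu+1} in (6.45)
for C in (C_minus, C_plus):
    assert np.linalg.eigvalsh(C).min() >= -1e-12   # positive semidefinite
assert np.allclose(C_minus + C_plus, np.eye(3))    # coefficients sum to I (cf. (6.24))
print("Friedrichs' scheme is a positive scheme for lam <= 1/||A||_2")
```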
Criterion 6.36 as well as Remark 6.37 and Criterion 6.38 remain valid in the
vector-valued case.
Also in the vector-valued case, the Fourier transformed difference operator has
the form Ĉ(λ, ∆t)(ξ) = Σ_{j∈Z} aⱼ e^{−ijξ}, where now Ĉ is a 2π-periodic function
whose values are N × N matrices. As in the scalar case, the stability estimate
‖C(λ, ∆t)^µ‖_{ℓ²←ℓ²} ≤ K(λ) holds for all µ∆t ≤ T if and only if
$$\frac{\partial}{\partial t}u + A(t,x)\frac{\partial}{\partial x}u = 0 \qquad (A\ N\times N\ \text{matrix},\ u\in\mathbb{C}^N,\ x\in\mathbb{R},\ 0\le t\le T) \tag{6.46}$$
is called regularly hyperbolic if the eigenvalues dᵢ(t, x) (1 ≤ i ≤ N) of A(t, x)
are real and distinct; i.e., there is some δ > 0 such that
Theorem 6.58. Suppose that (6.46) is a regularly hyperbolic system satisfying the
smoothness condition
is `∞ stable.
6.5.6 Generalisations
where
$$\nu\xi = \langle\nu,\xi\rangle = \sum_{j=1}^{d} \nu_j\,\xi_j$$
denotes the Euclidean scalar product in ℝᵈ.
The stability analysis of system (6.47) with N × N matrices Aₖ leads to the
following complication. In the univariate case d = 1, all coefficient matrices aⱼ
of C(λ, ∆t) are derived from only one matrix A₁, so that in the standard case the
matrices aⱼ commute with each other (and are thereby simultaneously diagonalisable).
For d > 1 with non-commuting matrices Aₖ in (6.47), the matrices aⱼ cannot be
expected to be simultaneously diagonalisable.
¹⁷ In principle, different step widths ∆xⱼ make sense. However, after a transformation xⱼ ↦ (∆x₁/∆xⱼ)·xⱼ of the spatial variables we regain a common step size ∆x.
with 0 ≤ t0 ≤ t0 + µ∆t ≤ T .
Criteria 6.26, 6.36 and Lemma 6.34 hold also in the time-dependent case.
The differential equation may contain spatially dependent coefficients as, e.g., in
the differential equation (6.44). Correspondingly, all coefficients aj = aj (x) of
C(λ, ∆t) may depend on x. Criterion 6.57 explicitly admits variable coefficients in
the case of positive difference schemes.
There are criteria using again the function
$$G(x,\xi) := \sum_{j\in\mathbb{Z}} a_j(x)\,e^{-ij\xi}, \tag{6.48}$$
Remark 6.59 (Kreiss [9, §7]). Stability of C(x0 ; λ, ∆t) at all x0 ∈ R is, in general,
neither sufficient nor necessary for the stability of C(λ, ∆t).
We point out a connection between condition (6.49) and Definition 5.23 of the
stability of ordinary differential equations, which requires that zeros λν of ψ with
|λν| = 1 be simple zeros, while otherwise |λν| < 1 must hold. In the case of (6.49),
the powers kG(x, ξ)n k must be uniformly bounded, while in the second case the
companion matrix must satisfy kAn k ≤ const (cf. Remark 5.39). However, the
difference is that in the second case only finitely many eigenvalues exist, so that
max |λν | < 1 holds for all eigenvalues with |λν | 6= 1. In the case of |λν (x, ∆t, ξ)|,
the eigenvalues λν are continuous functions of ξ and their absolute value may tend
to 1. Condition (6.49) describes quantitatively how fast λν (x, ∆t, ξ) approaches 1.
The stability result of Kreiss [10] takes the following form (see also [21, §5.4]).
If we replace the real axis in Σ = [0, T ]×R (cf. (6.3)) by an interval or the half-axis
[0, ∞), the new computational domain Σ = [0, T ] × [0, ∞) has a non-empty spatial
boundary [0, T ] × {0}. Then the parabolic initial-value problem has to be completed
by a boundary condition at x = 0; e.g.,
u(t, 0) = uB (t)
As usual, we need starting values for µ = 0 and µ = 1 to proceed with the leap-frog
scheme. The name originates from the fact that the computation of U_ν^{µ+1} involves
only values U_n^m with n + m ≡ ν + µ + 1 (mod 2).
The grid Σ^{∆t}_{∆x} defined in (6.10) splits into the two colours of the chequer board:
Σ_even ∪̇ Σ_odd, where
$$\Sigma_{\mathrm{even}} = \{(n\Delta x, m\Delta t) \in \Sigma_{\Delta x}^{\Delta t} : n + m \text{ even integer}\}.$$
This observation allows us to reduce the computation to Σeven , which halves the
cost of the computation.
The stability analysis follows the idea of Remark 5.38b: we formulate the two-step
method as a one-step method for $V^\mu := \binom{U^\mu}{U^{\mu-1}}$:
$$\begin{pmatrix} U^{\mu+1} \\ U^{\mu} \end{pmatrix} = \begin{pmatrix} \lambda a\,(E_1 - E_{-1}) & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} U^{\mu} \\ U^{\mu-1} \end{pmatrix};$$
Here Eⱼ is the shift operator from (6.22). The corresponding characteristic function
G(ξ) from (6.34) is now matrix-valued:
$$G(\xi) = \begin{pmatrix} \lambda a\,(e^{-i\xi} - e^{i\xi}) & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -2i\lambda a\sin\xi & 1 \\ 1 & 0 \end{pmatrix} \qquad (0\le\xi\le 2\pi).$$
One easily checks that |λa| > 1 together with ξ = π/2 (i.e., sin ξ = 1) leads to
an eigenvalue λ(ξ) with |λ(ξ)| > 1. Von Neumann’s condition in Criterion 6.54a
implies that |λa| ≤ 1 is necessary for `2 stability.
Proposition 6.61. The leap-frog scheme (6.50) is `2 stable if and only if |λa| < 1.
Proof. It remains to show that |λa| < 1 implies stability. For this purpose we give
an explicit description of the powers G(ξ)ⁿ. Abbreviate x := −λa sin ξ, so that
$$G(\xi) = \begin{pmatrix} 2ix & 1 \\ 1 & 0 \end{pmatrix}.$$
We claim that
$$G(\xi)^n = \begin{pmatrix} i^n\,U_n(x) & i^{n-1}\,U_{n-1}(x) \\ i^{n-1}\,U_{n-1}(x) & i^{n-2}\,U_{n-2}(x) \end{pmatrix} \quad\text{for } n\ge 1, \tag{6.51}$$
where Uₙ are the Chebyshev polynomials of the second kind. These are defined on
[−1, 1] by
$$U_n(x) := \frac{\sin((n+1)\arccos x)}{\sqrt{1-x^2}} \qquad (n = -1, 0, 1, \dots)$$
and satisfy the same three-term recursion
Un+1 (x) = 2xUn (x) − Un−1 (x)
as the Chebyshev polynomials (of the first kind) mentioned in Footnote 7 on
page 52. Statement (6.51) holds for n = 1. The recursion formula together with
G(ξ)n+1 = G(ξ)n G(ξ) proves the induction step.
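Representation (6.51) is easy to confirm numerically (a sketch; the sample values of λa and ξ are arbitrary), comparing matrix powers of G(ξ) with the Chebyshev formula:

```python
import numpy as np

def U(n, x):
    # Chebyshev polynomials of the second kind via the three-term recursion
    # U_{-1} = 0, U_0 = 1, U_{n+1} = 2x U_n - U_{n-1}
    if n == -1:
        return 0.0
    p_prev, p = 0.0, 1.0
    for _ in range(n):
        p_prev, p = p, 2 * x * p - p_prev
    return p

lam_a, xi = 0.8, 1.3                      # arbitrary sample with |lam*a| < 1
x = -lam_a * np.sin(xi)
G = np.array([[2j * x, 1.0], [1.0, 0.0]])
for n in range(1, 12):
    claim = np.array([[1j**n * U(n, x), 1j**(n - 1) * U(n - 1, x)],
                      [1j**(n - 1) * U(n - 1, x), 1j**(n - 2) * U(n - 2, x)]])
    assert np.allclose(np.linalg.matrix_power(G, n), claim)
print("representation (6.51) confirmed for n = 1, ..., 11")
```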
If |λa| ≤ A < 1, the inequalities |x| ≤ A < 1 and |ϕ| ≤ arccos(A) < 1 follow,
where ϕ := arccos(x). One verifies that
¹⁸ If the amplification factor is some fixed ζ > 1, the instability effects of ‖C(λ, ∆t)ⁿ‖ > ζⁿ are easily observed and may lead to an overflow in the very end. On the other hand, we conclude
$$\frac{\partial}{\partial t}u \approx \frac{1}{2\Delta t}\,[u(t+\Delta t, x) - u(t-\Delta t, x)] \quad\text{and}\quad \frac{\partial^2}{\partial x^2}u \approx \frac{1}{\Delta x^2}\,[u(t, x-\Delta x) - 2u(t, x) + u(t, x+\Delta x)].$$
In order to obtain the leap-frog pattern, we replace 2u(t, x) by the average
u(t + ∆t, x) + u(t − ∆t, x). Together with λ = ∆t/∆x2 (cf. (6.11)), we obtain the
Du Fort–Frankel scheme
$$(1+2\lambda)\,U_\nu^{\mu+1} = (1-2\lambda)\,U_\nu^{\mu-1} + 2\lambda\,\big[U_{\nu+1}^{\mu} + U_{\nu-1}^{\mu}\big]. \tag{6.52}$$
The stability analysis can again be based on the representation of G(ξ)ⁿ. For
λ ≤ 1/2, a simpler approach makes use of the fact that the coefficients (1−2λ)/(1+2λ) and
2λ/(1+2λ) of U_ν^{µ−1}, U_{ν+1}^µ, U_{ν−1}^µ are positive. As in Criterion 6.30, one concludes that
the scheme (6.52) is ℓ^∞ stable, which implies ℓ² stability.
For λ > 1/2 we return to the characteristic function
$$G(\xi) = \begin{pmatrix} \frac{2\lambda}{1+2\lambda}\,(e^{-i\xi}+e^{i\xi}) & \frac{1-2\lambda}{1+2\lambda} \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} \frac{4\lambda}{1+2\lambda}\cos\xi & \frac{1-2\lambda}{1+2\lambda} \\ 1 & 0 \end{pmatrix} \qquad (0\le\xi\le 2\pi).$$
Proposition 6.62. For any fixed λ = ∆t/∆x², the Du Fort–Frankel scheme (6.52) is
ℓ² stable.
ut = uxx − µutt ;
from ‖C(λ, ∆t)ⁿ‖ ≈ n that the result at some fixed T = n∆t > 0 contains errors amplified by T/∆t. If the initial values are such that the consistency error is of the second order O(∆t²), we still have a discretisation error O(∆t) at t = T. As in §4.6, we have to take into consideration floating point errors, which are also amplified by T/∆t. Together, we obtain an error O(∆t² + eps/∆t) (eps: machine precision), so that we cannot obtain better results than O(eps^{2/3}).
6.6 Consistency Versus Stability 135
Finally, we discuss the dissipativity (6.49) for discretisations of the heat equation
(6.4). Since the solution operator strongly damps high-frequency components, a
smoothing effect occurs: initial values u0 , which are only supposed to be contin-
uous (or to belong to L∞ ), lead to solutions u(t) of (6.5) which are infinitely often
differentiable for all t > 0. A corresponding condition for the discrete schemes is
condition (6.49), which in this case takes the form
$$|G(\xi)| \le 1 - \delta\,|\xi|^{2r} \qquad\text{for all } |\xi| \le \pi, \tag{6.53}$$
since G does not depend on x, and 1 × 1 matrices coincide with the eigenvalue.
Exercise 6.63. (a) The simplest scheme (6.18) satisfies (6.53) with the parameters
r = 1 and δ = min{4λ, 2(1 − 2λ)}/π² for 0 < λ ≤ 1/2.
Dissipativity holds for 0 < λ < 1/2 because of δ > 0, but it fails for λ = 1/2.
(b) The Crank–Nicolson scheme (which is (6.41) with Θ = 1/2) is dissipative for
all λ > 0 with r = 1. For λ → ∞, the `2 stability is uniform (i.e., the stability
constant remains bounded for λ → ∞), but dissipativity vanishes (i.e., δ → 0).
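Part (a) of the exercise can be verified numerically (a sketch, assuming the characteristic function G(ξ) = 1 − 4λ sin²(ξ/2) of scheme (6.18)): the bound (6.53) with r = 1 holds for λ < 1/2, and at λ = 1/2 dissipativity is lost since |G(π)| = 1.

```python
import numpy as np

xi = np.linspace(-np.pi, np.pi, 200001)
for lam in (0.1, 0.25, 0.4, 0.49):
    G = 1 - 4 * lam * np.sin(xi / 2) ** 2              # characteristic function of (6.18)
    delta = min(4 * lam, 2 * (1 - 2 * lam)) / np.pi**2
    assert delta > 0
    assert np.all(np.abs(G) <= 1 - delta * xi**2 + 1e-12)   # dissipativity (6.53), r = 1

# lam = 1/2: |G(pi)| = 1, so no positive delta can satisfy (6.53)
G = 1 - 2 * np.sin(xi / 2) ** 2
assert abs(np.abs(G[0]) - 1.0) < 1e-12                 # xi = -pi gives |G| = 1
print("dissipativity holds for lam < 1/2 and fails at lam = 1/2")
```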
Here the functions are to be evaluated at (t, x) and ‘. . .’ are higher-order terms. The
first bracket vanishes, since u is a solution of ut = aux . The second bracket is, in
general, different from zero, so that the discretisation (6.17a) has consistency order
p = 1.
Remark 6.64. There is an exceptional case in the derivation from above. Applying
the differential equation twice, we get
$$\begin{aligned} u(t+\Delta t, x) &= u + \Delta t\,u_t + \tfrac{\Delta t^2}{2}\,u_{tt} + O(\Delta t^3) = u + a\lambda\Delta x\,u_x + \tfrac{(a\lambda\Delta x)^2}{2}\,u_{xx} + O(\Delta t^3) \\ &= u(t, x) + \tfrac{a\lambda}{2}\,[u(t, x+\Delta x) - u(t, x-\Delta x)] \\ &\quad + \tfrac{(a\lambda)^2}{2}\,[u(t, x+\Delta x) - 2u(t, x) + u(t, x-\Delta x)] + O(\Delta t^3). \end{aligned}$$
Lemma 6.65. The Lax–Wendroff scheme (6.54) is ℓ² stable if and only if |aλ| ≤ 1.

Proof. Verify that G(ξ) = 1 − iaλ sin ξ − (aλ)²(1 − cos ξ) is the characteristic
function. For
$$|G(\xi)|^2 = [1 - \tau(1-\cos\xi)]^2 + \tau\sin^2\xi \qquad\text{with } \tau := (a\lambda)^2$$
we use sin²ξ = 1 − cos²ξ and substitute x := cos ξ. Then we have to prove that the
polynomial
$$p(x) = [1 - \tau(1-x)]^2 + \tau\,(1-x^2) = 1 + \tau^2 - \tau + 2(\tau-\tau^2)\,x + (\tau^2-\tau)\,x^2$$
remains bounded by 1 for all values x = cos ξ ∈ [−1, 1]. Inserting x = −1 yields
(2τ − 1)² and proves that τ ≤ 1 is necessary (τ ≥ 0 holds by definition).
Since by definition |G(ξ)|² = p(x) ≥ 0, the maximum is given by
$$\max\{p(1), p(-1)\} = \max\{1, (2\tau-1)^2\} = 1,$$
where for the last step we used |aλ| ≤ 1. Now ℓ² stability follows from Theorem 6.44. ⊔⊓
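The two-sided nature of the criterion can be confirmed numerically (a sketch; the sample values of aλ are arbitrary): ‖G‖_∞ = 1 for |aλ| ≤ 1 and ‖G‖_∞ > 1 otherwise.

```python
import numpy as np

xi = np.linspace(0.0, 2 * np.pi, 100001)

def norm_G(alam):
    # characteristic function of the Lax-Wendroff scheme (6.54)
    G = 1 - 1j * alam * np.sin(xi) - alam**2 * (1 - np.cos(xi))
    return np.abs(G).max()

for alam in (0.2, 0.5, 1.0):      # |a*lam| <= 1: ||G||_inf = 1
    assert norm_G(alam) <= 1 + 1e-12
assert norm_G(1.1) > 1            # |a*lam| > 1: von Neumann condition fails
print("Lax-Wendroff is l2 stable exactly for |a*lam| <= 1")
```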
As in §5.5.6, we may ask whether there is a possible conflict between consistency
and stability. While in Theorem 5.47 the consistency order p is limited, it is now the
parity of p which needs to be restricted in certain cases.
We consider the hyperbolic problem ∂u/∂t = a ∂u/∂x and a general explicit difference
scheme (6.16) with coefficients aⱼ. Again, we introduce G(ξ) := Σⱼ aⱼ e^{−ijξ}
(cf. (6.34)). Furthermore, we choose the Banach space ℓ^∞ and ask for ℓ^∞ stable
schemes. Using Theorem 6.47, Thomée [24] proves the following implication for
the consistency order.
Theorem 6.66. Under the assumption from above, ℓ^∞ stability implies that the consistency order is odd.¹⁹ Furthermore, there are ℓ^∞ stable schemes for any odd order.
Since the Lax–Wendroff scheme has even consistency order (p = 2), it cannot
be `∞ stable. However, the instability is rather weak. Thomée [24] also proves the
following two-sided inequality for the Lax–Wendroff scheme:
References
1. Courant, R., Friedrichs, K.O., Lewy, H.: Über die partiellen Differentialgleichungen der
mathematischen Physik. Math. Ann. 100, 32–74 (1928)
2. Crank, J., Nicolson, P.: A practical method for numerical evaluation of partial differential
equations of the heat-conduction type. Proc. Cambridge Phil. Soc. 43, 50–67 (1947). (reprint
in Adv. Comput. Math. 6, 207–226, 1996)
3. de Moura, C.A., Kubrusly, C.S. (eds.): The Courant–Friedrichs–Lewy (CFL) condition.
Springer, New York (2013)
4. Du Fort, E.C., Frankel, S.P.: Stability conditions in the numerical treatment of parabolic differential equations. Math. Tables and Other Aids to Computation 7, 135–152 (1953)
5. Friedrichs, K.O.: Symmetric hyperbolic linear differential equations. Comm. Pure Appl. Math.
7, 345–392 (1954)
6. Hackbusch, W.: Iterative Lösung großer schwachbesetzter Gleichungssysteme, 2nd ed.
Teubner, Stuttgart (1993)
7. Hackbusch, W.: Iterative solution of large sparse systems of equations. Springer, New York
(1994)
8. Hackbusch, W.: Elliptic differential equations. Theory and numerical treatment, Springer
Series in Computational Mathematics, vol. 18, 2nd ed. Springer, Berlin (2003)
9. Kreiss, H.O.: Über die Stabilitätsdefinition für Differenzengleichungen die partielle Differen-
tialgleichungen approximieren. BIT 2, 153–181 (1962)
10. Kreiss, H.O.: On difference approximations of the dissipative type for hyperbolic differential
equations. Comm. Pure Appl. Math. 17, 335–353 (1964)
11. Kreiss, H.O.: Difference approximations for the initial-boundary value problem for hyperbolic
differential equations. In: D. Greenspan (ed.) Numerical solution of nonlinear differential
equations, pp. 141–166. Wiley, New York (1966)
12. Kreiss, H.O.: Initial boundary value problem for hyperbolic systems. Comm. Pure Appl. Math.
23, 277–298 (1970)
13. Kreiss, H.O., Wu, L.: On the stability definition of difference approximations for the initial
boundary value problem. Appl. Numer. Math. 12, 213–227 (1993)
14. Kröner, D.: Numerical Schemes for Conservation Laws. J. Wiley and Teubner, Stuttgart (1997)
¹⁹ This statement does not extend to the parabolic case.
15. Lax, P.D.: Difference approximations of linear differential equations – an operator theoretical
approach. In: N. Aronszajn, C.B. Morrey Jr (eds.) Lecture series of the symposium on partial
differential equations, pp. 33–66. Dept. Math., Univ. of Kansas (1957)
16. Lax, P.D., Richtmyer, R.D.: Survey of the stability of linear finite difference equations. Comm.
Pure Appl. Math. 9, 267–293 (1965)
17. Lax, P.D., Wendroff, B.: Systems of conservation laws. Comm. Pure Appl. Math. 13, 217–237
(1960)
18. Lax, P.D., Wendroff, B.: Difference schemes for hyperbolic equations with high order of accuracy. Comm. Pure Appl. Math. 17, 381–398 (1964)
19. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press,
Cambridge (2002)
20. Mizohata, S.: The theory of partial differential equations. University Press, Cambridge (1973)
21. Richtmyer, R.D., Morton, K.W.: Difference Methods for Initial-value Problems, 2nd ed. John
Wiley & Sons, New York (1967). Reprint by Krieger Publ. Comp., Malabar, Florida, 1994
22. Riesz, F., Sz.-Nagy, B.: Functional Analysis. Dover Publ. Inc, New York (1990)
23. Riesz, M.: Sur les maxima des formes bilinéaires et sur les fonctionelles linéaires. Acta Math.
49, 465–497 (1926)
24. Thomée, V.: Stability of difference schemes in the maximum-norm. J. Differential Equations
1, 273–292 (1965)
25. Thorin, O.: An extension of the convexity theorem of M. Riesz. Kungl. Fysiogr. Sällsk. i Lund
Förh. 8(14) (1939)
26. Tomoeda, K.: Stability of Friedrichs’s scheme in the maximum norm for hyperbolic systems
in one space dimension. Appl. Math. Comput. 7, 313–320 (1980)
27. Zeidler, E. (ed.): Oxford Users’ Guide to Mathematics. Oxford University Press, Oxford
(2004)
Chapter 7
Stability for Discretisations of Elliptic Problems
u = 0 on Γ.
$$\sum_{i,j=1}^{d} a_{ij}(x)\,\xi_i\xi_j \ge \delta\,\|\xi\|_2^2 \qquad\text{for all } x\in\Omega \text{ and all } \xi = (\xi_1,\dots,\xi_d)\in\mathbb{R}^d$$
7.2 Discretisation
N0 ⊂ N,
Instead of u(x, y) from (7.1), we are looking for approximations of u at the nodal
points (νh, µh) ∈ Ωn :
uν,µ ≈ u(νh, µh).
In each nodal point (νh, µh), the second x derivative u_xx from (7.1) is replaced by
the second divided difference (1/h²)[u_{ν−1,µ} − 2u_{ν,µ} + u_{ν+1,µ}]. Correspondingly, u_yy
becomes (1/h²)[u_{ν,µ−1} − 2u_{ν,µ} + u_{ν,µ+1}]. Together, we obtain the so-called five-point
scheme:
$$\frac{1}{h^2}\,[-4u_{\nu,\mu} + u_{\nu-1,\mu} + u_{\nu+1,\mu} + u_{\nu,\mu-1} + u_{\nu,\mu+1}] = f_{\nu,\mu} \qquad\text{for all } 1\le\nu,\mu\le n, \tag{7.3}$$
where fν,µ := f (νh, µh) is the evaluation1 of the right-hand side f of (7.1). If
ν = 1, equation (7.3) contains also the value uν−1,µ = u0,µ . Note that the point
(0, µh) lies on the boundary Γn := {(x, y) ∈ Γ : x/h, y/h ∈ Z}, and does not
¹ A finite element discretisation using piecewise linear functions in a regular triangle grid yields the same matrix; only the right-hand side f_h consists of integral mean values f_{ν,µ} instead of point evaluations.
Ln un = fn , (7.4)
Remark 7.1. The matrix Ln from (7.4) and (7.3) has the following properties:
(a) Ln is sparse, in particular, it possesses at most five non-zero entries per row.
(b) Ln is symmetric.
(c) −Lₙ has positive diagonal elements 4/h², while all off-diagonal entries are ≤ 0.
(d) The sum of entries in each row of −Lₙ is ≥ 0. More precisely: if 2 ≤ ν, µ ≤ n − 1,
the sum is 0; at the corner points (ν, µ) ∈ {(1, 1), (1, n), (n, 1), (n, n)} the sum
equals 2/h²; for the other points with ν or µ in {1, n} the sum is 1/h².
(e) For a concrete representation of the matrix Ln , one must order the com-
ponents of the vectors un , fn . A possible ordering is the lexicographical one:
(1, 1), (2, 1), . . . , (n, 1), (1, 2), . . . , (n, 2), . . . , (1, n), . . . , (n, n). In this case,
Ln has the block form
T I −4 1
I T I 1 −4 1
1 .. .. ..
. . .
Ln = 2 . . .
with T =
.. .. .. ,
h
I T I 1 −4 1
I T 1 −4
where all blocks T, I are of size n × n. Empty blocks and matrix entries are zero.
In the case of the matrix Lₙ from (7.3) (but not for every discretisation of elliptic
differential equations), Lₙ is in particular an M-matrix (cf. [7, §6.4], [8, §4.3]).
Remark 7.1c shows the properties (a) and (b) of A = −Lₙ. Remark 7.1d, together
with the fact that Lₙ is irreducible, shows that A = −Lₙ is irreducibly diagonally
dominant. Irreducibly diagonally dominant matrices with the properties (a)
and (b) already possess property (c); i.e., −Lₙ is an M-matrix (cf. Hackbusch [7,
Theorem 6.4.10b]).
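The properties listed in Remark 7.1, including the M-matrix property of −Lₙ, can be checked on a concrete instance (a sketch; the value of n and the relation h = 1/(n + 1) are illustrative choices that do not affect the checks):

```python
import numpy as np

def five_point(n, h):
    # block-tridiagonal matrix from Remark 7.1e: diagonal blocks T, coupling blocks I
    T = np.diag([-4.0] * n) + np.diag([1.0] * (n - 1), 1) + np.diag([1.0] * (n - 1), -1)
    L = np.zeros((n * n, n * n))
    for b in range(n):
        L[b*n:(b+1)*n, b*n:(b+1)*n] = T
        if b + 1 < n:
            L[b*n:(b+1)*n, (b+1)*n:(b+2)*n] = np.eye(n)
            L[(b+1)*n:(b+2)*n, b*n:(b+1)*n] = np.eye(n)
    return L / h**2

n = 8
L = five_point(n, 1.0 / (n + 1))
A = -L
assert np.allclose(L, L.T)                        # (b) symmetric
assert np.count_nonzero(L, axis=1).max() <= 5     # (a) at most 5 entries per row
assert np.all(np.diag(A) > 0)                     # (c) positive diagonal 4/h^2
assert np.all(A - np.diag(np.diag(A)) <= 0)       # (c) off-diagonal entries <= 0
assert A.sum(axis=1).min() >= -1e-8               # (d) row sums >= 0
assert np.all(np.linalg.inv(A) >= -1e-12)         # M-matrix: A^{-1} >= 0
print("Remark 7.1 and the M-matrix property confirmed for n =", n)
```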
7.3.1 Consistency
The consistency condition must ensure that Lₙ is a discretisation of the differential
operator L. For this purpose we introduce Banach spaces X, Y, Xₙ, Yₙ for the
domains and ranges of L and Lₙ, so that
$$L : X \to Y, \qquad L_n : X_n \to Y_n \qquad (n\in\mathbb{N}_0), \tag{7.5a}$$
together with the restrictions R_X^n : X → Xₙ, R_Y^n : Y → Yₙ and
$$P_Y^n : Y_n \to Y \qquad (n\in\mathbb{N}_0), \tag{7.5c}$$
as depicted in the diagram
$$\begin{array}{ccc} X & \xrightarrow{\ R_X^n\ } & X_n \\ {\scriptstyle L}\,\downarrow & & \downarrow\,{\scriptstyle L_n} \\ Y & \substack{\xrightarrow{\ R_Y^n\ } \\[2pt] \xleftarrow{\ P_Y^n\ }} & Y_n \end{array}$$
The consistency condition has to relate L and Lₙ. Since both mappings act in
different spaces, the auxiliary mappings R_X^n, R_Y^n, P_Y^n are needed. They permit us to
formulate consistency in two versions. Condition (7.6a) measures the consistency
error by ‖·‖_{Yₙ}, while (7.6b) uses ‖·‖_Y:
$$\lim_{\mathbb{N}_0\ni n\to\infty} \|(L_n R_X^n - R_Y^n L)\,u\|_{Y_n} = 0 \qquad\text{for all } u\in X, \tag{7.6a}$$
$$\lim_{\mathbb{N}_0\ni n\to\infty} \|(P_Y^n L_n R_X^n - L)\,u\|_{Y} = 0 \qquad\text{for all } u\in X. \tag{7.6b}$$
The conditions (7.6a,b) are almost equivalent. For this purpose, some of the
following technical assumptions are of interest:
Under the conditions (7.7a-c), the consistency formulations (7.6a,b) are equiva-
lent; more precisely:
7.3 General Concept 143
proves (7.6a). t u
7.3.2 Convergence
In the sequel, a further norm ‖·‖_{X̂ₙ} on Xₙ can be fixed, which may be weaker than
‖·‖_{Xₙ} (or equal), but stronger than ‖·‖_{Yₙ} (or equal):²
$$\|\cdot\|_{X_n} \gtrsim \|\cdot\|_{\hat X_n} \gtrsim \|\cdot\|_{Y_n}.$$
7.3.3 Stability
$$\sup_{n\in\mathbb{N}_0} \|L_n^{-1}\|_{\hat X_n\leftarrow Y_n} \le C_{\mathrm{stab}}. \tag{7.11}$$
$$\|(R_X^n - L_n^{-1}R_Y^n L)\,u\|_{\hat X_n} \le \|L_n^{-1}\|_{\hat X_n\leftarrow Y_n}\,\|(L_n R_X^n - R_Y^n L)\,u\|_{Y_n} \le C_{\mathrm{stab}}\,\|(L_n R_X^n - R_Y^n L)\,u\|_{Y_n} \to 0,$$
Although the setting is similar to the previous chapters, a stability theorem stating
that ‘convergence implies stability’ is, in general, not valid, as we shall prove in
§7.4.2. However, the stability theorem can be based on the convergence definition
in (7.10). Again, the following technical requirement comes into play:
$$\sup_{f\in Y:\ f_n = R_Y^n f}\ \frac{\|f_n\|_{Y_n}}{\|f\|_{Y}} \ge 1/C_R' \qquad\text{for all } 0\ne f_n\in Y_n,\ n\in\mathbb{N}_0. \tag{7.12}$$

Lemma 7.9. The conditions (7.7b,d) imply (7.12) with C_R′ = C_P.

Proof. f := P_Y^n fₙ yields sup{...} ≥ ‖fₙ‖_{Yₙ}/‖P_Y^n fₙ‖_Y ≥ ‖fₙ‖_{Yₙ}/(C_P ‖fₙ‖_{Yₙ}) = 1/C_P. ⊔⊓
Theorem 7.10 (stability theorem). Suppose (7.8), ‖L⁻¹‖_{X̂←Y} ≤ C_{L⁻¹}, and
(7.12). Then convergence in the sense of (7.10) implies stability (7.11).
$$\|L_n^{-1}R_Y^n\|_{\hat X_n\leftarrow Y} \le C + \|R_X^n L^{-1}\|_{\hat X_n\leftarrow Y} \le C + \hat C_R\, C_{L^{-1}}.$$
Inequality (7.12) implies ‖R_Y^n f‖_{Yₙ} ≤ C_R′ ‖f‖_Y. The definition of the operator
norm together with (7.12) yields
$$\|L_n^{-1}R_Y^n\|_{\hat X_n\leftarrow Y} = \sup_{0\ne f\in Y}\frac{\|L_n^{-1}R_Y^n f\|_{\hat X_n}}{\|f\|_Y} = \sup_{0\ne f\in Y}\frac{\|L_n^{-1}R_Y^n f\|_{\hat X_n}}{\|R_Y^n f\|_{Y_n}}\cdot\frac{\|R_Y^n f\|_{Y_n}}{\|f\|_Y}$$
$$= \sup_{0\ne f_n\in Y_n}\ \sup_{f\in Y:\ f_n = R_Y^n f}\ \frac{\|L_n^{-1} f_n\|_{\hat X_n}}{\|f_n\|_{Y_n}}\cdot\frac{\|f_n\|_{Y_n}}{\|f\|_Y} \ \ge\ \frac{\|L_n^{-1}\|_{\hat X_n\leftarrow Y_n}}{C_R'},$$
Theorem 7.11 (equivalence theorem). Suppose that the spaces X̂, X, and Y are
chosen such that L : X → Y is bijective, ‖L⁻¹‖_{X̂←Y} ≤ C_{L⁻¹}, and (7.8) and
(7.12) hold. Then convergence (7.9) and stability (7.11) are equivalent.
Remark 7.12. Consistency and convergence can be regarded in two different ways.
The previous setting asked for the respective conditions for all u ∈ X. On the other
hand, we can consider one particular solution u = L⁻¹f. Because of extra smoothness,
the consistency error εₙ := ‖(Lₙ R_X^n − R_Y^n L)u‖_{Yₙ} may behave as O(n^{−α})
for some α > 0. Then the proof of Theorem 7.8 shows that also the discretisation
error ‖(R_X^n − Lₙ⁻¹R_Y^n L)u‖_{X̂ₙ} ≤ C_stab εₙ is of the same kind.
The concrete choice of the Banach spaces depends on the kind of discretisation.
The following spaces correspond to difference methods and classical solutions of
boundary value problems:
$$\begin{aligned} &X = C^2(\Omega)\cap C_0(\Omega), \qquad Y = C(\Omega), \\ &X_n = \mathbb{R}^{n^2}, \qquad X_n\ni u_n = (u_{\nu,\mu})_{(\nu h,\mu h)\in\Omega_n}, \\ &Y_n = \mathbb{R}^{n^2}, \qquad Y_n\ni f_n = (f_{\nu,\mu})_{(\nu h,\mu h)\in\Omega_n}, \end{aligned} \tag{7.13a}$$
with the norms³
$$\begin{aligned} &\|u\|_X = \max\{\|u\|_\infty, \|u_x\|_\infty, \|u_y\|_\infty, \|u_{xx}\|_\infty, \|u_{yy}\|_\infty\}, \\ &\|u\|_Y = \|u\|_\infty := \max\{|u(x,y)| : (x,y)\in\Omega\}, \\ &\|u_n\|_{X_n} = \max\{\|u_n\|_\infty, \|\partial_x u_n\|_\infty, \|\partial_y u_n\|_\infty, \|\partial_{xx} u_n\|_\infty, \|\partial_{yy} u_n\|_\infty\}, \\ &\|f_n\|_{Y_n} = \|f_n\|_\infty := \max\{|f_{\nu,\mu}| : (\nu h,\mu h)\in\Omega_n\}, \end{aligned} \tag{7.13b}$$
where ∂x , ∂y , ∂xx , ∂yy are the first and second divided difference quotients with
step size h := 1/n. The associated restrictions are the point-wise restrictions:
$$(R_X^n u)_{\nu,\mu} := u(\nu h, \mu h), \qquad (R_Y^n f)_{\nu,\mu} := f(\nu h, \mu h).$$
The components of (Lₙ R_X^n − R_Y^n L)u are
$$[\partial_{xx} + \partial_{yy}]\,u(x,y) - [u_{xx}(x,y) + u_{yy}(x,y)] \qquad\text{for } (x,y)\in\Omega_n.$$
Since, for u ∈ X₀ ⊂ C²(Ω) ∩ C₀(Ω), the second differences ∂ₓₓu converge uniformly
to the second derivative u_xx, it follows that ‖(Lₙ R_X^n − R_Y^n L)u‖_{Yₙ} → 0
for n → ∞. Therefore, the consistency condition (7.6a) is verified.
Consistency follows immediately for X₀ = X = C²(Ω) ∩ C₀(Ω), since the
difference quotients in Lₙ R_X^n u tend uniformly to the derivatives in R_Y^n L u.
The additional space X̂ = C0 (Ω) is equipped with the maximum norm
Therefore, ‖Lₙ⁻¹‖_{X̂ₙ←Yₙ} becomes the row-sum norm ‖Lₙ⁻¹‖_∞. For the estimate of
‖Lₙ⁻¹‖_∞ in the model example (7.3), we use that −Lₙ is an M-matrix (cf. Definition
7.2). For M-matrices there is a constructive way to determine the row-sum norm of
the inverse. In the following lemma, 𝟙 is the vector with all entries of value one.

Lemma 7.13. Let A be an M-matrix and w a vector such that the inequality
Aw ≥ 𝟙 holds component-wise. Then ‖A⁻¹‖_∞ ≤ ‖w‖_∞ holds.
Proof. For u ∈ ℝⁿ, the vector (|uᵢ|)ᵢ₌₁ⁿ is denoted by |u|. The following inequalities
are to be understood component-wise. We have |u| ≤ ‖u‖_∞ 𝟙 ≤ ‖u‖_∞ Aw.
Because of the M-matrix property (c), A⁻¹ ≥ 0 holds, so that
$$|A^{-1}u| \le A^{-1}|u| \le A^{-1}\,\|u\|_\infty\,Aw = \|u\|_\infty\,w$$
and ‖A⁻¹u‖_∞/‖u‖_∞ ≤ ‖w‖_∞ can be obtained. Therefore, the desired estimate
follows: ‖A⁻¹‖_∞ = sup_{u≠0} ‖A⁻¹u‖_∞/‖u‖_∞ ≤ ‖w‖_∞. ⊔⊓
³ Only the norm of Yₙ appears in (7.6a) and (7.11).
7.4 Application to Difference Schemes 147
In the case of A = −Lₙ one has to look for a function w(x, y) with −Lw(x, y) ≥ 1.
A possible solution is w(x, y) = ½x(1 − x) with the maximum norm ‖w‖_∞ = 1/8.
Then the point-wise restriction yields the vector w_h = R_X^n w on the grid Ωₙ. Since
second differences and second derivatives are identical in the case of a quadratic
function, it follows that (−Lₙ)w_h ≥ 𝟙 and ‖w_h‖_∞ ≤ 1/8, proving the stability
property
$$\|L_n^{-1}\|_\infty \le 1/8 \qquad\text{for all } n\in\mathbb{N} \tag{7.14}$$
with C_stab = 1/8.
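This argument can be reproduced numerically (a sketch; the grid size is arbitrary): build A = −Lₙ via Kronecker products, verify A·w_h ≥ 𝟙 for w(x, y) = ½x(1 − x), and confirm ‖Lₙ⁻¹‖_∞ ≤ 1/8.

```python
import numpy as np

n = 16
h = 1.0 / (n + 1)                         # interior grid points (nu*h, mu*h) (assumption)
K = (np.diag([2.0] * n) - np.diag([1.0] * (n - 1), 1) - np.diag([1.0] * (n - 1), -1)) / h**2
A = np.kron(np.eye(n), K) + np.kron(K, np.eye(n))    # A = -Ln, an M-matrix

x = h * np.arange(1, n + 1)
w = 0.5 * x * (1 - x)                     # w(x,y) = x(1-x)/2, ||w||_inf = 1/8
wh = np.kron(np.ones(n), w)               # point-wise restriction, lexicographic order
assert np.all(A @ wh >= 1 - 1e-9)         # (-Ln) wh >= 1 component-wise
# Lemma 7.13: ||A^{-1}||_inf <= ||wh||_inf <= 1/8
invnorm = np.abs(np.linalg.inv(A)).sum(axis=1).max()
assert invnorm <= 1 / 8 + 1e-9
print("||Ln^{-1}||_inf =", round(invnorm, 5), "<= 1/8")
```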
Using the consistency, which is checked above, and the stability result (7.14), we
obtain convergence by Theorem 7.8. As discussed in §7.4.2, the given theory does
not fully correspond to the previous setting, because convergence does not imply
stability.
Another aspect is the order of consistency. So far, only the convergence order
o(1) is shown. In general, one likes to show that ‖R_X^n u − uₙ‖_{Xₙ} = O(h^κ) for
some κ > 0. For this purpose, the solution u ∈ X must have additional smoothness
properties: u ∈ Z for some Z ⊂ X = C²(Ω) ∩ C₀(Ω). Choosing
$$Z := X \cap C^4(\Omega),$$
we conclude that ‖(Lₙ R_X^n − R_Y^n L)u‖_{Yₙ} = O(h²) for all u ∈ Z. Correspondingly,
convergence follows with the same order:
$$\|R_X^n u - u_n\|_{\hat X_n} = O(h^2).$$
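The O(h²) convergence is visible in a small experiment (a sketch; the smooth test solution u = sin(πx)·sin(πy) is an illustrative choice, picked so that f = ∆u is known explicitly):

```python
import numpy as np

def disc_error(n):
    h = 1.0 / (n + 1)
    x = h * np.arange(1, n + 1)
    K = (np.diag([2.0] * n) - np.diag([1.0] * (n - 1), 1) - np.diag([1.0] * (n - 1), -1)) / h**2
    A = np.kron(np.eye(n), K) + np.kron(K, np.eye(n))          # A = -Ln
    u = np.outer(np.sin(np.pi * x), np.sin(np.pi * x)).ravel() # smooth exact solution
    f = -2 * np.pi**2 * u                                      # f = Laplace(u)
    un = np.linalg.solve(A, -f)                                # five-point scheme: Ln un = fn
    return np.abs(un - u).max()

e1, e2 = disc_error(15), disc_error(31)     # h = 1/16 and h = 1/32
assert 3.0 < e1 / e2 < 5.0                  # halving h divides the error by ~4
print("error ratio ~", round(e1 / e2, 2), "-> O(h^2) convergence")
```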
7.4.2 Bijectivity of L
For the proof of the second statement, we have to introduce the generalised
solutions u = L−1 f , which do not belong to C 2 (Ω) ∩ C0 (Ω). For this purpose,
one can use Green's representation
$$u(x) = \int_\Omega G(x,y)\,f(y)\,dy. \tag{7.15}$$
The Green function G(·, ·) satisfies LG(·, y) = 0 for all y ∈ Ω and G(x, ·) = 0 for
x ∈ Γ. In the case of Poisson's equation (7.1) in the circle Ω = {‖x‖ < 1} ⊂ ℝ²,
$$G(x,y) = -\frac{1}{2\pi}\left[\log\|x-y\| - \log\Big(\|x\|\,\Big\|y - \frac{x}{\|x\|^2}\Big\|\Big)\right]$$
holds (cf. §2.2 in [8, 10]). Since G as well as ∂G(x, y)/∂xᵢ has an integrable
singularity, the integral in (7.15) exists and defines a function in X̂ = C¹(Ω) ∩ C₀(Ω). In
particular, X̂ = C₀(Ω) leads to the (finite) operator norm
$$\|L^{-1}\|_{C(\Omega)\leftarrow C(\Omega)} = \max_{x\in\Omega}\int_\Omega |G(x,y)|\,dy.$$
Note that s = 0 yields the trivial statement L : H₀¹(Ω) → H⁻¹(Ω). By Theorem
7.15, smooth functions f produce solutions in the space H₀^{3/2−ε}(Ω) for all ε > 0.

⁴ The Sobolev spaces Hᵗ(Ω) and H₀ᵗ(Ω) are introduced in [8, 10].
7.5 Finite Element Discretisation 149
The variational formulation of the boundary value problem (7.1) and (7.2) is based on
a bilinear form:
$$a(u,v) = -\int_\Omega f(x)\,v(x)\,dx \qquad\text{for all } v\in H_0^1(\Omega), \quad\text{where} \tag{7.16a}$$
$$a(u,v) := \int_\Omega \sum_{j=1}^{d} \frac{\partial u(x)}{\partial x_j}\,\frac{\partial v(x)}{\partial x_j}\,dx. \tag{7.16b}$$
The Sobolev space H₀¹(Ω) can be understood as the completion of C¹(Ω) ∩ C₀(Ω)
with respect to the norm
$$\|u\|_{H^1} := \sqrt{\int_\Omega \Big(|u(x)|^2 + \sum_{j=1}^{d}\Big|\frac{\partial}{\partial x_j}u(x)\Big|^2\Big)\,dx}$$
(cf. §6.2 in [8, 10]). The dual space H⁻¹(Ω) consists of all functionals with finite
norm
$$\|f\|_{H^{-1}} = \sup\{|f(v)| : v\in H_0^1(\Omega),\ \|v\|_{H^1} = 1\}.$$
The second embedding in H₀¹(Ω) ⊂ L²(Ω) ⊂ H⁻¹(Ω) is based on the identification
of functions f ∈ L²(Ω) with the functional f(v) := ∫_Ω f(x)v(x) dx.
If the bilinear form a(·, ·) is bounded, i.e.,
$$C_a := \sup\{a(u,v) : u,v\in H_0^1(\Omega),\ \|u\|_{H^1} = \|v\|_{H^1} = 1\} < \infty, \tag{7.17}$$
and problem (7.16a) is identical to the abstract equation
The norm ‖A‖_{H⁻¹←H¹} coincides with C_a from above. The particular example
(7.16b) satisfies C_a ≤ 1.
If A−1 ∈ L(H −1 (Ω), H01 (Ω)) exists, u := A−1 f is the desired solution (it is
called a ‘weak solution’). The existence of A−1 can be expressed by the so-called
inf-sup conditions for the bilinear form a (more precisely, the inf-sup expression is
an equivalent formulation of 1/kA−1 kH 1 ←H −1 ; cf. §6.5 in [8, 10]).
A very convenient sufficient condition is H₀¹(Ω)-coercivity⁵:
$$a(u,u) \ge \varepsilon_{\mathrm{co}}\,\|u\|_{H^1}^2 \quad\text{with } \varepsilon_{\mathrm{co}} > 0, \qquad\text{for all } u\in H_0^1(\Omega). \tag{7.19}$$
Above, we used the spaces X, Y together with the differential operator L. Now
L will be replaced by A, involving the spaces
As (7.19) is valid for (7.16b), A ∈ L(X, Y ) is bijective; i.e., A−1 ∈ L(Y, X) exists.
Exercise 7.16. The dual norm of Xn∗ is not the restriction of the dual norm of X ∗ to
Un . Prove that kfn kXn∗ ≤ kfn kX ∗ for fn ∈ Xn∗ .
$$\|A_n^{-1}\|_{X_n\leftarrow X_n^*} \le \|A^{-1}\|_{X\leftarrow X^*}$$
holds.
7.5.3 Consistency
JΠn = Πn∗ J;
i.e., the dual projection has the representation Πn∗ = JΠn J ∗ = JΠn J −1 .
(d) Property (7.21) is equivalent to
Proof. For Part (b) write yₙ ∈ Yₙ and yₙ^⊥ ∈ Yₙ^⊥ as yₙ = Jxₙ and yₙ^⊥ = Jxₙ^⊥
for suitable xₙ ∈ Xₙ and xₙ^⊥ ∈ Xₙ^⊥. Then
For Part (c) verify that Πₙ* maps y = yₙ + yₙ^⊥ into yₙ (with notations yₙ ∈ Yₙ
and yₙ^⊥ ∈ Yₙ^⊥ as in Part (b)).
For Part (d) with v = Ju use ‖v − Πₙ*v‖_{X*} = ‖(I − Πₙ*)Ju‖_{X*} =_{Part (c)}
‖J(I − Πₙ)u‖_{X*} = ‖(I − Πₙ)u‖_X. ⊔⊓
We recall the operators A from (7.18) and A_n from (7.20).
Remark 7.19. The relation between A ∈ L(X, X^*) and A_n ∈ L(X_n, X_n^*) is given by

    A_n = Π_n^* A Π_n.

Proof. This follows from ⟨A_n u_n, v_n⟩_{X_n^*×X_n} = a(u_n, v_n) = a(Π_n u_n, Π_n v_n) = ⟨A Π_n u_n, Π_n v_n⟩_{X^*×X} = ⟨Π_n^* A Π_n u_n, v_n⟩_{X_n^*×X_n} for all u_n, v_n ∈ X_n. □
The canonical choice of the space Y_n is X_n^*. The mappings between X, Y, X_n, Y_n are as follows:

    Y := X^*,  Y_n := X_n^*,                                                         (7.23a)
    R_X^n = Π_n : X → X_n    X-orthogonal projection onto X_n,                       (7.23b)
    R_Y^n : Y → Y_n    restriction to X_n ⊂ X; i.e., f_n = R_Y^n f = f|_{X_n} ∈ X_n^*,  (7.23c)
    P_Y^n := (R_X^n)^* = Π_n^* = J Π_n J^* : Y_n → Y.                                (7.23d)

We consider R_X^n as a mapping onto X_n (not into X). Concerning R_Y^n, note that the functional f ∈ Y = X^* can be restricted to the subspace X_n ⊂ X. For P_Y^n = Π_n^* compare Remark 7.19a.
Lemma 7.20. Assumption (7.21) implies the consistency statement (7.6a).
Proof. Application of the functional (A_n R_X^n − R_Y^n A) u to v_n ∈ X_n yields

    [(A_n R_X^n − R_Y^n A) u](v_n) = a_n(R_X^n u, v_n) − a(u, v_n) = a(Π_n u, v_n) − a(u, v_n)
                                   = a(Π_n u − u, v_n),

proving ‖(A_n R_X^n − R_Y^n A) u‖_{Y_n} ≤ C_a ‖u − Π_n u‖_X → 0 with C_a from (7.17) because of (7.21). □
Next, we verify that all conditions (7.7a–e) are valid. The constants are C_P = C_P' = C_R = 1. Concerning the proof of (7.7a), we note that
Hence, the consistency statement (7.6a) is equivalent to (7.6b) (cf. Proposition 7.4a), which reads (P_Y^n A_n R_X^n − A) u = (Π_n^* A_n Π_n − A) u → 0 in Y.
where u and u_n are the exact and discrete solutions, respectively. This is not the standard Galerkin convergence statement

    u − u_n → 0 in X for N_0 ∋ n → ∞,
Finite element discretisation uses the subspace U_n ⊂ X = H_0^1(Ω) of, e.g., piecewise linear functions on a triangulation. Let h = h_n be the largest diameter of the involved triangles (or other elements, e.g., tetrahedra in the three-dimensional case). In the standard case, h_n = O(n^{-1/d}) is to be expected (d: spatial dimension, Ω ⊂ R^d), provided that n is related to the dimension: n = dim(U_n).
The consistency error is bounded by C_a ‖u − Π_n u‖_X as shown in the proof of Lemma 7.20. According to Remark 7.12, the discretisation error ‖u − u_n‖_X can be estimated by⁷ (1 + C_stab C_a) ‖u − Π_n u‖_X. While u ∈ X = H_0^1(Ω) allows only point-wise convergence, a quantitative error bound can be expected for smoother functions u. Assume, e.g., u ∈ H²(Ω) ∩ H_0^1(Ω). Then the finite element error can be proved to be ‖u − u_n‖_{H_0^1(Ω)} = O(h). More generally, finite elements of degree p lead to an error⁸ of O(h^p), provided that u ∈ H^{p+1}(Ω) ∩ H_0^1(Ω).
to A^{-1} ∈ L(H^{-1}(Ω), H_0^1(Ω)) (cf. §7.5.1). It expresses that the solution process increases the degree of smoothness by 2. One may ask whether a similar statement holds for certain t > 1. In this case, the problem is called t-regular. Such regularity statements depend on the smoothness of the coefficients of the differential operator (7.2) and on the smoothness of the boundary. Theorem 7.15 yields t-regularity for 1 < t < 3/2 under the condition that Ω is a Lipschitz domain. For convex domains (or domains that are smooth images of convex domains) and sufficiently smooth coefficients, the problem is 2-regular: A^{-1} ∈ L(L²(Ω), H²(Ω) ∩ H_0^1(Ω)).
7.5.6 L2 Error
The adjoint operator A^* ∈ L(X, X^*) belongs to the adjoint bilinear form a^* defined by a^*(u,v) := a(v,u). In this section we assume that A^* is 2-regular.
The representation u − u_n = A^{-1} f − A_n^{-1} Π_n f = (A^{-1} − A_n^{-1} Π_n) f shows that u − u_n = E_n f with

    E_n := A^{-1} − A_n^{-1} Π_n = A^{-1} − Π_n A_n^{-1} Π_n.

Under 2-regularity, the finite element error estimate takes the form ‖E_n f‖_{H_0^1(Ω)} ≤ C₂ h_n ‖f‖_{L²(Ω)},
where C2 depends on details of the finite element triangulation (cf. [8, 10]).
Theorem 7.22. Assume that A^* is 2-regular and ‖E_n^* g‖_{H_0^1(Ω)} ≤ C₂ h_n ‖g‖_{L²(Ω)}. Then

    ‖u − u_n‖_{L²(Ω)} = ‖E_n f‖_{L²(Ω)} ≤ C_a C₂ h_n ‖E_n f‖_{H_0^1(Ω)}.
    ‖E_n f‖_{L²(Ω)} = max{ |(E_n f, g)_{L²(Ω)}| : g ∈ L²(Ω), ‖g‖_{L²(Ω)} = 1 },
Theorem 7.22 states that the L² error is better by one factor h_n than the H¹ error ‖E_n f‖_{H_0^1(Ω)} = ‖u − u_n‖_{H_0^1(Ω)}. This result, which goes back to Aubin [1] and Nitsche [16], is usually proved differently, making indirect use of Lemma 7.21.
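The two convergence orders can be observed in a one-dimensional sketch (our illustration, with the manufactured solution u = sin(πx); the P1 discretisation of −u'' = f coincides with the three-point difference scheme): halving h halves the H¹ error but quarters the L² error, as the Aubin–Nitsche argument predicts.

```python
import numpy as np

def solve_poisson_p1(n):
    # P1-FEM for -u'' = pi^2 sin(pi x) on (0,1), u(0) = u(1) = 0
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)
    A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    b = h * np.pi**2 * np.sin(np.pi * x)          # lumped load vector
    return h, np.concatenate(([0.0], np.linalg.solve(A, b), [0.0]))

def fem_errors(n):
    h, u = solve_poisson_p1(n)
    nodes = np.linspace(0.0, 1.0, n + 2)
    err_l2 = np.sqrt(h * np.sum((u - np.sin(np.pi * nodes)) ** 2))
    du = np.diff(u) / h                           # piecewise constant slope
    a, b = nodes[:-1], nodes[1:]
    g = lambda t: (du - np.pi * np.cos(np.pi * t)) ** 2
    # H1-seminorm error via element-wise Simpson quadrature
    err_h1 = np.sqrt(np.sum(h / 6 * (g(a) + 4 * g((a + b) / 2) + g(b))))
    return err_l2, err_h1

l2a, h1a = fem_errors(31)    # h = 1/32
l2b, h1b = fem_errors(63)    # h = 1/64
print(h1a / h1b, l2a / l2b)  # roughly 2 (H1 error O(h)) and 4 (L2 error O(h^2))
```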
If coercivity (7.19) holds and if a is symmetric, a(u,v) = a(v,u), the variational formulation (7.16a) is equivalent to the minimisation of J(u) := (1/2) a(u,u) − f(u).
Quite another type is the following saddle point problem. We are looking for
functions v ∈ V and w ∈ W (V, W Hilbert spaces) satisfying
Define

    J(v,w) := (1/2) a(v,v) + b(w,v) − f_1(v) − f_2(w).
The following saddle point properties are proved in [8, 10, Theorem 12.2.4].
The bilinear forms a and b in (7.25) give rise to operators A ∈ L(V, V^*) and B ∈ L(W, V^*). The variational setting (7.25) is equivalent to the operator equation

    ( A   B ) ( v )   ( f_1 )
    ( B*  0 ) ( w ) = ( f_2 ).
where

    V_{0,n} := ker(B_n^*) = { v ∈ V_n : b(y, v) = 0 for all y ∈ W_n } ⊂ V_n.
The uniform boundedness of C_n^{-1} is expressed by the fact that the positive numbers α, β in (7.27a,c) are independent of n (cf. §12.3.2 in [8, 10]).
The solvability of the undiscretised saddle-point problem is characterised by the same conditions (7.27a–c), but with V_{0,n} and V_n replaced by V_0 = ker(B^*) and V. In the case of Stokes' problem (7.26), a(·,·) is coercive on the whole space V, but for elasticity problems one must exploit the fact that coercivity is only needed for the smaller subspace V_0 ⊂ V (cf. Braess [2, §III.4 and §VI]).
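The structure of such systems can be illustrated numerically (our sketch; here A is taken symmetric positive definite on all of V, as for Stokes, and B has full column rank): the saddle-point matrix is indefinite, with dim V positive and dim W negative eigenvalues, yet uniquely solvable.

```python
import numpy as np

rng = np.random.default_rng(0)
nv, nw = 6, 3
G = rng.standard_normal((nv, nv))
A = G @ G.T + nv * np.eye(nv)                 # coercive on all of V
B = rng.standard_normal((nv, nw))             # full column rank (generic)
C = np.block([[A, B], [B.T, np.zeros((nw, nw))]])
eigs = np.linalg.eigvalsh(C)
print((eigs > 0).sum(), (eigs < 0).sum())     # nv positive, nw negative
f = rng.standard_normal(nv + nw)
vw = np.linalg.solve(C, f)                    # unique solution (v, w)
print(np.abs(C @ vw - f).max())
```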
The following statement has a similar flavour to Theorems 3.46 and 3.47: provided that the perturbations satisfy (7.28), the perturbed discretisation {A_n + δA_n, f_n + δf_n}_{n∈N_0} is also stable. The resulting solution ũ_n = u_n + δu_n (u_n: unperturbed discrete solution) satisfies the asymptotic inequality
Define the subset N_0' := { n ∈ N_0 : ‖δA_n‖_{X_n^*←X_n} ≤ 1/(2 C_stab) }. Then, for n ∈ N_0', we have

    ‖A_n^{-1} δA_n‖_{X_n←X_n} ≤ 1/2   and   ‖(I + A_n^{-1} δA_n)^{-1}‖_{X_n←X_n} ≤ 2

(cf. Lemma 5.8), so that (A_n + δA_n)^{-1} = (I + A_n^{-1} δA_n)^{-1} A_n^{-1} exists, while ‖(A_n + δA_n)^{-1}‖_{X_n←X_n^*} ≤ 2 C_stab. Therefore, we conclude that

    δu_n = ũ_n − u_n = (A_n + δA_n)^{-1} (f_n + δf_n) − A_n^{-1} f_n
         = [(A_n + δA_n)^{-1} − A_n^{-1}] f_n + (A_n + δA_n)^{-1} δf_n
Another error treatment follows the idea of backward analysis. Let a(u,v) = ∫_Ω ⟨∇v, c(x)∇u⟩ dx be the bilinear form with a (piecewise) smooth coefficient c. For piecewise linear finite elements b_i, the integrals ∫_Δ ⟨∇b_i, c(x)∇b_j⟩ dx are const · ∫_Δ c(x) dx (Δ: triangle). Let c_Δ be the quadrature result for ∫_Δ c(x) dx and define a new boundary value problem with the piecewise constant coefficient c̃(x) := c_Δ / ∫_Δ dx for x ∈ Δ. Then the finite element matrix A_n + δA_n is the exact matrix for the new bilinear form ã(·,·).
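In one dimension this identity can be checked directly (our sketch; c(x) = eˣ is a hypothetical coefficient, the elements Δ are intervals, and the P1 gradients are constant on each element): the matrix assembled with midpoint quadrature equals the exact matrix of the averaged coefficient c̃, and the perturbation δA against the exact matrix of c shrinks like O(h).

```python
import numpy as np

def p1_stiffness(c_mean, h):
    # stiffness for a(u,v) = int c u'v' dx; c_mean[e] = mean of c on element e.
    # P1 gradients are constant per element, so only the element mean enters.
    A = np.diag((c_mean[:-1] + c_mean[1:]) / h)
    A -= np.diag(c_mean[1:-1] / h, 1) + np.diag(c_mean[1:-1] / h, -1)
    return A

def perturbation(n):
    h = 1.0 / (n + 1)
    e = np.linspace(0.0, 1.0, n + 2)          # element edges
    mid = (e[:-1] + e[1:]) / 2
    exact_mean = (np.exp(e[1:]) - np.exp(e[:-1])) / h   # (1/h) int_T e^x dx
    A_exact = p1_stiffness(exact_mean, h)     # matrix of the true form a(.,.)
    A_quad = p1_stiffness(np.exp(mid), h)     # midpoint quadrature: A + dA
    A_tilde = p1_stiffness(np.exp(mid), h)    # exact matrix of c~(x) := c(mid_T)
    assert np.allclose(A_quad, A_tilde)       # backward analysis identity
    return np.abs(A_quad - A_exact).max()     # size of dA

d1, d2 = perturbation(9), perturbation(19)    # h = 1/10 and h = 1/20
print(d1, d2, d1 / d2)                        # dA entries shrink like O(h)
```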
Another formulation of the total error including (7.29) is given by the first Strang lemma (cf. Braess [2, Part III, §1]): for coercive bilinear forms a(·,·) and a_n(·,·) and right-hand sides f and f_n, let u and u_n be the respective solutions of

    a(u,v) = f(v) for all v ∈ X,    a_n(u_n, v_n) = f_n(v_n) for all v_n ∈ X_n ⊂ X.

Then

    ‖u − u_n‖_X ≤ C ( inf_{v_n∈X_n} ‖u − v_n‖_X
                      + sup_{w_n∈X_n} |a(u_n, w_n) − a_n(u_n, w_n)| / ‖w_n‖_X
                      + sup_{w_n∈X_n} |(f − f_n)(w_n)| / ‖w_n‖_X ).
The standard conforming finite element methods are characterised by the inclusion X_n ⊂ X, which allows us to evaluate a(u_n, v_n) for u_n, v_n ∈ X_n. This is also called internal approximation. An external approximation uses a sequence of spaces X_n containing also elements outside of X; i.e.,

    X_n ⊄ X.
where the subscript of a_n indicates that the bilinear form depends on n. The associated operator A_n : X_n → X_n' is defined by ⟨A_n u_n, v_n⟩ := a_n(u_n, v_n) for all v_n ∈ X_n, and the discrete problem becomes

    A_n u_n = f_n.                                                                   (7.31)
Above we defined a vector space Xn , but we have not yet defined a norm. In
general, this norm depends on n:
    X ⊂ U = U' ⊂ X'

(e.g., with U = L²(Ω), cf. §6.3.3 in [8, 10]). If f ∈ U' and X_n ⊂ U, the functional f is well defined on X_n and the variational formulation (7.33) makes sense (i.e., we may choose f_n := f).
The restriction of f to a smaller subset U' ⊊ X' implies that also the solution u belongs to a subspace strictly smaller than X. We denote this space by X_*. In the case of f ∈ U', this is

    X_* := { A^{-1} f : f ∈ U' } ⊊ X.
with a constant C̄_*. Without loss of generality, we may scale |||·|||_{*,n} such that C̄_* = 1.
The next requirement is that the bilinear form a_n(·,·) can be continuously extended from X_n × X_n onto X_{*,n} × X_n such that

    |a_n(u, v_n)| ≤ C_a |||u|||_{*,n} |||v_n|||_n   for all u ∈ X_{*,n}, v_n ∈ X_n.
Remark 7.25. Assume that (V, ‖·‖_V) and (W, ‖·‖_W) are Banach spaces with an intersection V ∩ W possibly larger than the zero space {0}. Then a canonical norm of the smallest common superspace U := span(V, W) is defined by ‖u‖_U := inf{ ‖v‖_V + ‖w‖_W : u = v + w, v ∈ V, w ∈ W }.
Corollary 7.26. Assume that ‖·‖_V and ‖·‖_W are equivalent norms on V ∩ W. Prove that ‖v‖_V ≤ ‖v‖_U ≤ C ‖v‖_V for all v ∈ V and ‖w‖_W ≤ ‖w‖_U ≤ C ‖w‖_W for all w ∈ W, where ‖·‖_U is the norm from Remark 7.25.
Consistency is expressed by
where u is the solution of a(u,v) = f(v) for v ∈ X (here, we assume that f belongs to X_n'). Then the error estimate (Strang's second lemma, cf. [18, §1.3])

    |||u − u_n|||_{*,n} ≤ (C̄_* + C_a C_stab) inf_{w_n∈X_n} |||u − w_n|||_{*,n} + C_a C_stab |||f_n − f|||_n'

holds, where |||·|||_n' is the norm dual to |||·|||_n.
The remarkable fact is that no conditions are required concerning the connection of u|_{T'} and u|_{T''} for neighbouring elements T', T'' ∈ T_n; i.e., in general, functions from X_{*,n} are discontinuous across the internal boundaries of T_n. Obviously, X_{*,n} ⊃ X_* = H_0^1(Ω) holds.
The DG finite element space is the subspace X_n of X_{*,n} where all u|_T are, e.g., affine (piecewise linear). The advantage of the discontinuous Galerkin method becomes obvious when we want to choose different polynomial degrees for different elements. Hence, the computational overhead for an hp-method is much lower.
Starting from a strong solution of −Δu = f in Ω with u = 0 on Γ = ∂Ω, multiplication by a test function v_n ∈ X_n and partial integration in each T ∈ T_n yield the variational form (7.36) with

    ∫_Ω (−Δu) v_n dx = Σ_{τ∈T_n} ∫_τ ⟨∇u, ∇v_n⟩ dx − Σ_{E∈E_n} ∫_E {∂u/∂n_E} [v_n] ds

and f_n(v) = ∫_Ω f v dx. Here, E_n is the set of all edges of τ ∈ T_n. Each edge E ∈ E_n is associated with a normal n_E (the sign of the direction does not matter). The curly bracket {·} is the average of the expression in the two elements containing E, while [·] is the difference (its sign depends on the direction of n_E). The right-hand side in the latter expression is no longer symmetric. Since [u] = 0 for u ∈ X_*, we may add further terms without changing the consistency property:

    a_n(u_n, v_n) := Σ_{τ∈T_n} ∫_τ ⟨∇u_n, ∇v_n⟩ dx                                   (7.37)
                     − Σ_{E∈E_n} ∫_E ( {∂u_n/∂n_E} [v_n] − {∂v_n/∂n_E} [u_n] ) ds
                     + η Σ_{E∈E_n} ∫_E h_E^{-1} [u_n] [v_n] ds.
The identity a_n(u_n, u_n) = Σ_{τ∈T_n} ∫_τ |∇u_n|² dx + η Σ_{E∈E_n} ∫_E h_E^{-1} [u_n]² ds (the edge terms in the middle line of (7.37) cancel for u_n = v_n) proves coercivity of a_n.
Other variants of the discontinuous Galerkin method use a symmetric bilinear form with the term

    Σ_{E∈E_n} ∫_E ( {∂u_n/∂n_E} [v_n] + {∂v_n/∂n_E} [u_n] ) ds

in the second line of (7.37) (cf. [18, pp. 124ff]).
    −1/h² + c_1/(2h)

and fails to be non-positive unless h ≤ 2/c_1. Because of the assumption ‖c‖ ≫ 1, the requirement h ≤ 2/c_1 limits the use of the second-order scheme to rather small step sizes.
This conflict between consistency and stability also occurs for finite element
discretisation (the corresponding modifications are, e.g., streamline diffusion
methods, cf. John–Maubach–Tobiska [12]).
A possible remedy is the use of the defect correction scheme. It is based on two different discretisations:

    L_h u_h = f_h                                                                    (7.39a)

is a stable scheme, possibly of a low order of consistency, whereas the second scheme

    L_h' u_h' = f_h'                                                                 (7.39b)

has a higher order of consistency. Scheme (7.39b) is not required to be stable. It may even be singular (i.e., a solution u_h' may not exist).
The method consists of two steps. First, the basic scheme (7.39a) is applied to obtain the starting value u_h^{(0)}:

    u_h^{(0)} := L_h^{-1} f_h.                                                       (7.40a)

Next, the defect of u_h^{(0)} with respect to the second scheme is computed and used for the correction:

    d_h := L_h' u_h^{(0)} − f_h',                                                    (7.40b)
    u_h^{(1)} := u_h^{(0)} − L_h^{-1} d_h.                                           (7.40c)
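The algorithm can be tried on −u″ + cu′ = f (our sketch, with manufactured solution u = sin(πx); here L_h is the stable first-order upwind scheme and the accurate scheme is the second-order central scheme):

```python
import numpy as np

c, n = 5.0, 63
h = 1.0 / (n + 1)
I, E1, Em1 = np.eye(n), np.eye(n, k=1), np.eye(n, k=-1)
D2 = (2 * I - E1 - Em1) / h**2
Lh = D2 + c * (I - Em1) / h          # upwind convection: stable, O(h)
Lh2 = D2 + c * (E1 - Em1) / (2 * h)  # central convection: O(h^2)
x = np.linspace(h, 1 - h, n)
ue = np.sin(np.pi * x)               # manufactured exact solution
f = np.pi**2 * np.sin(np.pi * x) + c * np.pi * np.cos(np.pi * x)
u0 = np.linalg.solve(Lh, f)          # (7.40a): stable low-order solve
d = Lh2 @ u0 - f                     # (7.40b): defect w.r.t. accurate scheme
u1 = u0 - np.linalg.solve(Lh, d)     # correction step
err0 = np.abs(u0 - ue).max()
err1 = np.abs(u1 - ue).max()
print(err0, err1)                    # the corrected iterate is far closer
```

Note that only the stable operator L_h is ever inverted; the accurate scheme enters solely through the defect.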
Here L_h^2, H_h^1, H_h^2 are the discrete analogues of L^2(Ω), H^1(Ω), H^2(Ω) (with derivatives replaced by divided differences). H_h^{-1} is the dual space of H_h^1.
The last estimate of L_h − L_h' in (7.41) can be obtained as follows. Split L_h − L_h' into

    L_h' (I − R_h^X P_h^X) − (R_h'^Y L − L_h' R_h^X) P_h^X + (R_h'^Y − R_h^Y) L P_h^X

and assume

    ‖I − R_h^X P_h^X‖_{H_h^1←H_h^2} ≤ Ch,    ‖R_h'^Y L − L_h' R_h^X‖_{H_h^{-1}←H^2} ≤ Ch,
    ‖R_h'^Y − R_h^Y‖_{H_h^{-1}←L^2} ≤ Ch,    ‖R_h^Y L − L_h R_h^X‖_{H_h^{-1}←H^2} ≤ Ch,
    ‖L_h‖_{H_h^{-1}←H_h^1}, ‖L_h'‖_{H_h^{-1}←H_h^1}, ‖L‖_{L^2←H^2}, ‖P_h^X‖_{H^2←H_h^2} ≤ C.
Lemma 7.27. Under the suppositions (7.41), the result u_h^{(1)} of the defect correction method satisfies the estimate ‖u_h^{(1)} − u_h^*‖_{H_h^1} ≤ Ch².
Proof. We may rewrite u_h^{(1)} − u_h^* as

    u_h^{(1)} − u_h^* = u_h^{(0)} − u_h^* − L_h^{-1} [L_h' u_h^{(0)} − f_h']
                      = [I − L_h^{-1} L_h'] (u_h^{(0)} − u_h^*) − L_h^{-1} [L_h' u_h^* − f_h']
                      = L_h^{-1} [L_h − L_h'] (u_h^{(0)} − u_h^*) − L_h^{-1} [L_h' u_h^* − f_h'].
Similarly,

    ‖u_h^{(0)} − u_h^*‖_{H_h^2} = ‖L_h^{-1} f_h − u_h^*‖_{H_h^2} ≤ ‖L_h^{-1}‖_{H_h^2←L_h^2} ‖L_h u_h^* − f_h‖_{L_h^2} ≤ Ch

for the first term. Together, the assertion of the lemma follows. □
The defect correction method was introduced by Stetter [23]. A further description can be found in Hackbusch [9, §14.2]. The particular case of diffusion with
dominant convection is analysed by Hemker [11].
The application of the defect correction method is not restricted to elliptic
problems. An application to a hyperbolic initial-value problem can be found in [5].
References
1. Aubin, J.P.: Behaviour of the error of the approximate solution of boundary value problems
for linear operators by Galerkin’s and finite difference methods. Ann. Scuola Norm. Sup. Pisa
21, 599–637 (1967)
2. Braess, D.: Finite Elements: Theory, Fast Solvers, and Applications in Solid Mechanics,
3rd ed. Cambridge University Press, Cambridge (2007)
3. Cockburn, B., Karniadakis, G.E., Shu, C.W. (eds.): Discontinuous Galerkin Methods.
Theory, Computation and Applications, Lect. Notes Comput. Sci. Eng., Vol. 11. Springer,
Berlin (2000).
4. Fréchet, M.: Sur les ensembles de fonctions et les opérations linéaires. C. R. Acad. Sci. Paris
144, 1414–1416 (1907)
5. Hackbusch, W.: Bemerkungen zur iterativen Defektkorrektur und zu ihrer Kombination mit
Mehrgitterverfahren. Rev. Roumaine Math. Pures Appl. 26, 1319–1329 (1981)
6. Hackbusch, W.: On the regularity of difference schemes. Ark. Mat. 19, 71–95 (1981)
7. Hackbusch, W.: Iterative Solution of Large Sparse Systems of Equations. Springer, New York
(1994)
8. Hackbusch, W.: Elliptic differential equations. Theory and numerical treatment, Springer
Series in Computational Mathematics, Vol. 18, 2nd ed. Springer, Berlin (2003)
9. Hackbusch, W.: Multi-grid methods and applications, Springer Series in Computational
Mathematics, Vol. 4. Springer, Berlin (2003)
10. Hackbusch, W.: Theorie und Numerik elliptischer Differentialgleichungen, 3rd ed. (2005).
www.mis.mpg.de/scicomp/Fulltext/EllDgl.ps
11. Hemker, P.W.: Mixed defect correction iteration for the solution of a singular perturbation
problem. Comput. Suppl. 5, 123–145 (1984)
12. John, V., Maubach, J.M., Tobiska, L.: Nonconforming streamline-diffusion-finite-element-
methods for convection diffusion problems. Numer. Math. 78, 165–188 (1997)
13. Jovanović, B., Süli, E.: Analysis of finite difference schemes for linear partial differential
equations with generalized solutions, Springer Series in Computational Mathematics, Vol. 45.
Springer, London (2013)
14. Kanschat, G.: Discontinuous Galerkin methods for viscous incompressible flows. Advances
in Numerical Mathematics. Deutscher Universitätsverlag, Wiesbaden (2007)
15. Nečas, J.: Sur la coercivité des formes sesquilinéares elliptiques. Rev. Roumaine Math. Pures
Appl. 9, 47–69 (1964)
16. Nitsche, J.: Ein Kriterium für die Quasi-Optimalität des Ritzschen Verfahrens. Numer. Math.
11, 346–348 (1968)
17. Ostrowski, A.M.: Über die Determinanten mit überwiegender Hauptdiagonale. Comment.
Math. Helv. 10, 69–96 (1937)
18. Di Pietro, D.A., Ern, A.: Mathematical aspects of discontinuous Galerkin methods. Springer,
Berlin (2011)
19. Riesz, F.: Sur une espèce de géométrie analytique des systèmes de fonctions sommables.
C. R. Acad. Sci. Paris 144, 1409–1411 (1907)
20. Riesz, F., Sz.-Nagy, B.: Functional Analysis. Dover Publ. Inc, New York (1990)
21. Roos, H.G., Stynes, M., Tobiska, L.: Numerical methods for singularly perturbed differential
equations: Convection-diffusion and flow problems, Springer Series in Computational Math-
ematics, Vol. 24. Springer, Berlin (1996)
22. Shi, Z.C.: A convergence condition for the quadrilateral Wilson element. Numer. Math. 44,
349–361 (1984)
23. Stetter, H.J.: The defect correction principle and discretization methods. Numer. Math. 29,
425–443 (1978)
24. Stummel, F.: The generalized patch test. SIAM J. Numer. Anal. 16, 449–471 (1979)
25. Stummel, F.: The limitations of the patch test. Int. J. Num. Meth. Engng. 15, 177–188 (1980)
Chapter 8
Stability for Discretisations of Integral
Equations
Since the paper of Nyström [7], integral equations have been used to solve certain boundary
value problems. Concerning the integral equation method and its discretisation, we
refer to Sauter–Schwab [8] and Hsiao–Wendland [5]. The following considerations
of stability hold as long as the integral kernel is weakly singular.
λf = g + Kf, (8.1)
where λ ∈ C\{0} and the function g are given together with an integral operator K defined by

    (Kf)(x) = ∫_D k(x,y) f(y) dy   for all x ∈ D.                                   (8.2)
If λ = 0, Eq. (8.1) is called Fredholm’s integral equation of the first kind (cf. Fred-
holm [3]). The integration domain D is an (often compact) subset of Rn . Other
interesting applications lead to surface integrals; i.e., D = ∂Ω of some Ω ⊂ Rn .
The function g belongs to a Banach space X with norm ‖·‖. The solution f of (8.1) is also sought in X: g, f ∈ X. The Banach space X must be chosen such that K ∈ L(X, X).
    |g(ξ) − g(x)| = | ∫_D [k(ξ,y) − k(x,y)] f(y) dy | ≤ ∫_D |k(ξ,y) − k(x,y)| |f(y)| dy
                  ≤ Φ(ξ,x) ≤ ε   for |ξ − x| ≤ δ

(using |f(y)| ≤ ‖f‖ ≤ 1),
8.1.2 Discretisations
λfn = g + Kn fn (8.4a)
or λfn = gn + Kn fn , (8.4b)
where in the latter case
gn → g (8.4c)
is assumed.
Because the right-hand sides in (8.4a,b) are contained in an at most (n+1)-dimensional subspace, λ ≠ 0 implies that f_n belongs to this subspace. This leads to a system of finitely many linear equations.
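For illustration (our sketch), take Nyström's quadrature discretisation of (8.1) for the kernel k(x,y) = xy on [0,1] with the manufactured solution f(x) = x, so that g(x) = (5/3)x for λ = 2. Gauss quadrature with five nodes integrates the relevant integrand exactly, and the finite linear system reproduces f at the nodes:

```python
import numpy as np

lam = 2.0
nodes, weights = np.polynomial.legendre.leggauss(5)
x = (nodes + 1.0) / 2.0               # Gauss points mapped to [0, 1]
w = weights / 2.0
Kn = np.outer(x, x) * w               # (Kn)_{ij} = k(x_i, x_j) * w_j
g = (5.0 / 3.0) * x                   # g = lam*f - Kf for f(x) = x
fn = np.linalg.solve(lam * np.eye(5) - Kn, g)
print(np.abs(fn - x).max())           # essentially machine precision
```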
leading to

    K_n f = ∫_D k_n(·, y) f(y) dy = Σ_{j=1}^{n} a_j(·) ∫_D b_j(y) f(y) dy.
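A concrete instance of such a degenerate kernel (our example) is the truncated Taylor sum for k(x,y) = e^{xy} on [0,1]², with a_j(x) = x^j/j! and b_j(y) = y^j; the sup-norm error, and hence ‖K − K_n‖, tends to zero:

```python
import math
import numpy as np

x = np.linspace(0.0, 1.0, 201)
X, Y = np.meshgrid(x, x, indexing="ij")

def sup_error(n):
    # degenerate kernel k_n(x,y) = sum_{j<n} (x^j/j!) y^j approximating e^{xy}
    kn = sum((X * Y) ** j / math.factorial(j) for j in range(n))
    return np.abs(np.exp(X * Y) - kn).max()

# ||K - K_n|| <= sup_x int_D |k - k_n| dy <= max_{x,y} |k - k_n| -> 0
print(sup_error(4), sup_error(8))
```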
Remark 8.4. Suppose that K_n has a finite-dimensional range. Then ‖K − K_n‖ → 0 implies that K, K_n : X → X are compact operators.
Proof. We recall two properties of compact operators: (i) a finite-dimensional range implies compactness, (ii) a limit of compact operators is compact. From (i) we infer that K_n is compact, while (ii) proves that K = lim K_n is compact. □
Proof of Lemma 8.1. It is sufficient to verify the supposition of Exercise 8.3. For Part (b) involving X = C(D), use Weierstrass' approximation theorem. Since D × D is compact, there is a sequence of polynomials P_n(x,y) of degree ≤ n − 1 such that ‖P_n − k‖_{C(D×D)} → 0. Since k_n := P_n has a representation (8.5) with a_j(x) = x^j and a polynomial b_j(y) of degree ≤ n − 1, Exercise 8.3 can be applied.
Proof. In the Hilbert case of X = L²(D), K possesses an infinite singular value decomposition²

    K = Σ_{j=1}^{∞} σ_j a_j b_j^*   with orthonormal {a_j}, {b_j} ⊂ X,

and satisfies ‖K − K_n‖² ≤ Σ_{j=n+1}^{∞} σ_j² → 0. □
Since we need point evaluations, the Banach space X = C(D) must be used.
8.2.1 Consistency
Exercise 8.6. (a) Prove that {Kn } is consistent with respect to K if and only if
(i) Kn ϕ → Kϕ for all ϕ ∈ M ⊂ X and some M dense in X, and
(ii) supn kKn k < ∞.
(b) Assuming that Kn ϕ is a Cauchy sequence and supn kKn k < ∞, define K by
Kϕ := lim Kn ϕ and prove that K ∈ L(X, X); i.e., {Kn } is consistent with respect
to K.
(c) Operator norm convergence kK − Kn k → 0 is sufficient for consistency.
8.2.2 Stability
Stability refers to the value λ ≠ 0 from problem (8.1). This λ is assumed to be fixed. Otherwise, one has to use the term 'stable with respect to λ'.
Definition 8.7 (stability). {K_n} is called stable if there exist some n_0 ∈ N and C_stab ∈ R such that⁴

    ‖(λI − K_n)^{-1}‖ ≤ C_stab   for all n ≥ n_0.
If (λI − K)^{-1} ∈ L(X, X), the inverse exists also for perturbations of K.
Remark 8.8. Suppose (λI − K)^{-1} ∈ L(X, X) and

    ‖K − K_n‖ < 1 / ‖(λI − K)^{-1}‖.

Then

    ‖(λI − K_n)^{-1}‖ ≤ ‖(λI − K)^{-1}‖ / (1 − ‖(λI − K)^{-1}‖ ‖K − K_n‖).

Proof. Apply Lemma 5.8 with T := λI − K and S := λI − K_n. □
³ In Definition 4.5, consistency involves K_n ϕ → Kϕ only for ϕ from a dense subset X_0 ⊂ X. Then the full statement (8.9) could be obtained from stability: sup_n ‖K_n‖ < ∞. Here, stability will be defined differently. This is the reason to define consistency as in (8.9).
⁴ ‖(λI − K)^{-1}‖ ≤ C_stab is the short notation for 'λI − K ∈ L(X, X) is bijective and the inverse satisfies ‖(λI − K)^{-1}‖ ≤ C_stab'.
Exercise 8.10. If ‖K − K_n‖ < 1/‖(λI − K_n)^{-1}‖ holds for all n ≥ n_0, then the inverse (λI − K)^{-1} ∈ L(X, X) exists. The assumption is, in particular, valid if ‖K − K_n‖ → 0 and {K_n} is stable.
8.2.3 Convergence
The next remark shows that f := lim_n f_n satisfies the continuous problem. The existence of (λI − K)^{-1} is left open, but will follow later from Theorem 8.16.
Proof. (i) By Exercise 8.6a, C := sup_n ‖K_n‖ < ∞ holds. The solutions f_n exist for n ≥ n_0 and define some f := lim_n f_n. Consistency shows that
A particular result of the Riesz–Schauder theory (cf. Yosida [10, §X.5]) is the following.
Exercise 8.14. Let K be compact and {K_n} be consistent and convergent. Prove that (λI − K)^{-1} ∈ L(X, X).
Lemma 8.15. If {Kn } is stable and consistent, then the operator λI−K is injective.
Proof. Injectivity follows from ‖(λI − K)ϕ‖ ≥ η‖ϕ‖ for some η > 0 and all ϕ ∈ X. For an indirect proof assume that there is a sequence ϕ_n ∈ X with
8.2.4 Equivalence
dn := λf − Kn f − gn ,
Theorem 8.18 (equivalence theorem). Suppose consistency and one of the condi-
tions (i) or (ii) from Theorem 8.17. Then stability and convergence are equivalent.
Furthermore, gn → g implies
    f_n = (λI − K_n)^{-1} g_n → f = (λI − K)^{-1} g.
Remark 8.19. The suppositions (λI − K)^{-1} ∈ L(X, X) and ‖K − K_n‖ → 0 imply consistency, stability, and convergence.
Proof. Exercise 8.6c shows consistency. Remark 8.9 yields stability. Finally, convergence is ensured by Theorem 8.17a. □
We recall that only the Banach space X = C(D) (or a Banach space with even
stronger topology) makes sense. Therefore, throughout this section X = C(D) is
chosen.
Lemma 8.24. Assume that D is compact and k ∈ C(D × D). Then ‖K − K_n‖ ≥ ‖K‖.
Proof. The operator norm ‖K‖ can be shown to equal sup_{x∈D} ∫_D |k(x,y)| dy. Since D is compact, the supremum is attained; i.e., there is some ξ ∈ D with ‖K‖ = ∫_D |k(ξ,y)| dy. For any ε > 0, one can construct a function ϕ_ε ∈ X = C(D) such that ‖ϕ_ε‖ = 1, ‖Kϕ_ε‖ ≥ ∫_D |k(ξ,y)| dy − ε, and, in addition, ϕ_ε(ξ_{k,n}) = 0 for 1 ≤ k ≤ n. The latter property implies K_n ϕ_ε = 0, so that ‖K − K_n‖ ≥ ‖(K − K_n)ϕ_ε‖ = ‖Kϕ_ε‖ ≥ ‖K‖ − ε. As ε > 0 is arbitrary, the assertion follows. □
The statement of the lemma shows that we cannot use the argument that the operator norm convergence ‖K − K_n‖ → 0 proves the desired properties.
It will turn out that instead of K − Kn , the products (K − Kn )K and
(K −Kn )Kn may still converge to zero. The next theorem of Brakhage [2] proves
the main step. Here we use the operators S, T ∈ L(X, X) which replace K and Kn
(for a fixed n). In this theorem, X may be any Banach space.
Theorem 8.25. Let X be a Banach space and λ ≠ 0. Suppose that the operators S, T, (λI − S)^{-1} belong to L(X, X), and that T is compact. Under the condition

    ‖(T − S) T‖ < |λ| / ‖(λI − S)^{-1}‖,                                            (8.12a)

also (λI − T)^{-1} ∈ L(X, X) exists and satisfies

    ‖(λI − T)^{-1}‖ ≤ (1 + ‖(λI − S)^{-1}‖ ‖T‖) / (|λ| − ‖(λI − S)^{-1}‖ ‖(T − S) T‖).   (8.12b)

    ‖f_S − f_T‖ ≤ ‖(λI − T)^{-1}‖ ‖(T − S) f_S‖.                                    (8.12d)
Proof. (i) If (λI − T)^{-1} ∈ L(X, X), the identity I = (1/λ)[(λI − T) + T] leads to (λI − T)^{-1} = (1/λ)[I + (λI − T)^{-1} T]. We replace (λI − T)^{-1} on the right-hand side by the inverse (λI − S)^{-1}, whose existence is assumed, and define

    A := (1/λ)[I + (λI − S)^{-1} T].

B := A(λI − T) should approximate the identity:

    B = (1/λ)[I + (λI − S)^{-1} T](λI − T) = I − (1/λ)[T − (λI − S)^{-1} T (λI − T)]
      = I − (1/λ)(λI − S)^{-1}[(λI − S) T − T (λI − T)]
      = I − (1/λ)(λI − S)^{-1}(T − S) T.

By assumption,

    ‖(1/λ)(λI − S)^{-1}(T − S) T‖ ≤ ‖(λI − S)^{-1}‖ ‖(T − S) T‖ / |λ| < 1

(cf. Lemma 5.8). Together with ‖I + (λI − S)^{-1} T‖ ≤ 1 + ‖(λI − S)^{-1}‖ ‖T‖, we obtain inequality (8.12b).
(iii) Subtracting (λI − T) f_T = g from (λI − S) f_S = g, we obtain the expression λ(f_S − f_T) = S f_S − T f_T = T(f_S − f_T) + (S − T) f_S and

    f_S − f_T = (λI − T)^{-1} (S − T) f_S.
    f_T − f_S = (1/λ)(λI − S)^{-1}(T − S)(g + T f_T)
              = (1/λ)(λI − S)^{-1}(T − S)(g + T f_S) + (1/λ)(λI − S)^{-1}(T − S) T (f_T − f_S).

This proves

    f_T − f_S = [λI − (λI − S)^{-1}(T − S) T]^{-1} (λI − S)^{-1}(T − S)(g + T f_S).
The key assumption of Theorem 8.25 is that k(T − S) T k is small enough. Set
S = K and replace the fixed operator T by a sequence {Kn }. Then we have to take
care that (K − Kn )Kn → 0. This will be achieved by the following definition of
collectively compact operators (cf. Anselone [1]).
Proof. (i) Let M be the set defined in (8.13). Part (a) follows immediately, since {K_n ϕ : ϕ ∈ X, ‖ϕ‖ ≤ 1} ⊂ M for any n.
(ii) As Kϕ = lim K_n ϕ, it belongs to the closure M̄, which is compact. Therefore, K is compact.
(iii) {K_n} is uniformly bounded because of Corollary 3.39.
(iv) The image of E := {ϕ ∈ X : ‖ϕ‖ ≤ 1} under K or K_n is contained in the compact set M̄. Hence, Lemma 3.49 and the definition of the operator norm prove that ‖(K − K_n)K‖ = sup_{ϕ∈E} ‖(K − K_n)Kϕ‖ ≤ sup_{ψ∈M̄} ‖(K − K_n)ψ‖ → 0, and similarly for (K − K_n)K_n. □
Lemma 8.31. Suppose that {A_n} is consistent, while {B_n} is collectively compact. Then {A_n B_n} is collectively compact.

    g_k = B_{n_k} ϕ_k → g.

Next, two cases must be distinguished: (A) sup n_k < ∞ and (B) sup n_k = ∞.
Combining Theorem 8.25 and Lemma 8.29, we obtain the following main result.
Theorem 8.32. Let λ ≠ 0 satisfy⁵ (λI − K)^{-1} ∈ L(X, X). Suppose that {K_n} is consistent with respect to K and collectively compact. Then {K_n} is stable and convergent. Furthermore, there is some n_0 ∈ N such that for all n ≥ n_0 the following statements hold:

    ‖(λI − K_n)^{-1}‖ ≤ (1 + ‖(λI − K)^{-1}‖ ‖K_n‖) / (|λ| − ‖(λI − K)^{-1}‖ ‖(K − K_n) K_n‖)   (n ≥ n_0);

the solutions f = (λI − K)^{-1} g and f_n = (λI − K_n)^{-1} g satisfy
5
In Lemma 8.40 we shall show that this assumption is necessary.
So far, we have required that (λI − K)^{-1} and (λI − K_n)^{-1} exist, at least for n ≥ n_0. Then λ ∈ C is called a regular value of K and K_n. The singular values of K [K_n] are those λ for which λI − K [λI − K_n] is not bijective. The Riesz–Schauder theory states that for compact K, all singular values are either λ = 0 or eigenvalues, which means that there is an eigenfunction f such that

    λ f = K f   (0 ≠ f ∈ X).                                                        (8.14a)
    λ_n f_n = K_n f_n   (0 ≠ f_n ∈ X, n ∈ N).                                       (8.14b)

The sets of all singular values are the spectra σ = σ(K) and σ_n = σ(K_n), respectively. The Riesz–Schauder theory states the following.
Theorem 8.37. Suppose that {Kn } is consistent with respect to K and collectively
compact. Let {λn } be a sequence of eigenvalues of (8.14b) with corresponding
eigenfunction fn ∈ X normalised by kfn k = 1. Then, there exists a subsequence
{nk } such that either λnk → 0 or λnk → λ ∈ σ. In the latter case, the subsequence
can be chosen such that fnk → f with f being an eigenfunction of (8.14a).
The statement of Theorem 8.37 about the eigenvalues can be abbreviated by 'the set of accumulation points of ⋃_{n∈N} σ_n is contained in σ'. Now we prove the reverse direction: 'σ is contained in the set of accumulation points of ⋃_{n∈N} σ_n'.
Theorem 8.38. Suppose that {K_n} is consistent with respect to K and collectively compact. Then, for any 0 ≠ λ ∈ σ there is a sequence {λ_n} of eigenvalues from (8.14b) with λ = lim_{n∈N} λ_n. Again, a subsequence can be chosen so that the eigenfunctions f_{n_k} from (8.14b) converge to an eigenfunction of (8.14a).
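This two-sided relation can be watched numerically (our sketch): for the kernel k(x,y) = min(x,y) on [0,1] the eigenpairs are known, λ_j = 1/((j − 1/2)²π²) with eigenfunctions sin((j − 1/2)πx), and the eigenvalues of the Nyström matrices (midpoint rule) converge to them.

```python
import numpy as np

def nystroem_eigenvalues(n):
    x = (np.arange(n) + 0.5) / n             # midpoint rule, weights 1/n
    Kn = np.minimum.outer(x, x) / n          # symmetric for equal weights
    return np.sort(np.linalg.eigvalsh(Kn))[::-1]

exact = lambda j: 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)
ev = nystroem_eigenvalues(200)
print(ev[:3])
print(exact(1), exact(2), exact(3))          # 0.4053, 0.0450, 0.0162
```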
Exercise 8.39. Prove that κn is not only continuous, but also satisfies the Lipschitz
property |κn (λ) − κn (µ)| ≤ |λ − µ| for all λ, µ ∈ C.
    (λI − K_n) f_n = g_n,   ‖f_n‖ = 1,   g_n → 0.
The inequality

    |F(z)| = |Φ((zI − K_n)^{-1} ϕ)| ≤ ‖Φ‖_{X^*} ‖(zI − K_n)^{-1} ϕ‖ ≤ ‖(zI − K_n)^{-1}‖
           ≤ 1 / min{ κ_n(z) : z ∈ ∂Ω }

follows, which is equivalent to min{κ_n(z) : z ∈ ∂Ω} ≤ κ_n(λ); i.e., the minimum is taken on ∂Ω. □
Now, we give the proof of Theorem 8.38.
(i) K is compact (cf. Remark 8.29b), and each eigenvalue 0 ≠ λ ∈ σ is isolated (Remark 8.36). Choose a complex neighbourhood Ω of λ such that

    0 ∉ Ω ⊂ C,   Ω ∩ σ = {λ},   λ ∈ Ω\∂Ω,   Ω compact.                              (8.15)
References