Lecture Notes For Math-CSE 451: Introduction To Numerical Computation
Wen Shen
2011
NB! These notes are used by myself in the course. They are provided to students as a
supplement to the textbook. They cannot substitute for the textbook.
Textbook:
Numerical Mathematics and Computing,
by Ward Cheney and David Kincaid, published by Brooks/Cole publishing Company,
2000. ISBN 0-534-35184-0
Contents

1 Computer arithmetic
  1.1 Introduction
  1.2 Representation of numbers in different bases
  1.3 Floating point representation
  1.4 Loss of significance
  1.5 Review of Taylor series

2 Polynomial interpolation
  2.1 Introduction
  2.2 Lagrange interpolation
  2.3 Newton's divided differences
  2.4 Errors in polynomial interpolation
  2.5 Numerical differentiation

3 Splines
  3.1 Introduction
  3.2 First degree and second degree splines
  3.3 Natural cubic splines

4 Numerical integration
  4.1 Introduction
  4.2 Trapezoid rule
  4.3 Simpson's rule
  4.4 Recursive trapezoid rule
  4.5 Romberg algorithm
  4.6 Adaptive Simpson's quadrature scheme

8 Least squares
  8.1 Problem description
  8.2 Linear regression and basic derivation
  8.3 LSM with parabola
  8.4 LSM with non-polynomial
  8.5 General linear LSM
  8.6 Non-linear LSM

9 ODEs
  9.1 Introduction
Chapter 1
Computer arithmetic
1.1 Introduction
What are numerical methods? They are algorithms that compute approximate solutions
to equations and related problems.
Such algorithms should be implemented (programmed) on a computer.
[Diagram: the modeling cycle — physical model → mathematical model → solve the model
with numerical methods → presentation of results / visualization → physical explanation
of the results → verification against the physical model.]
• development of algorithms;
• implementation;
• 8: octal;
• etc...
Observe
(1)_8 = (1)_2
(2)_8 = (10)_2
(3)_8 = (11)_2
(4)_8 = (100)_2
(5)_8 = (101)_2
(6)_8 = (110)_2
(7)_8 = (111)_2
(10)_8 = (1000)_2
Then,
(5034)_8 = ( 101 000 011 100 )_2 ,   grouping the octal digits 5, 0, 3, 4,
and
(110 010 111 001)_2 = ( 6 2 7 1 )_8 ,   since 110 = 6, 010 = 2, 111 = 7, 001 = 1.
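A quick way to check such conversions is MATLAB's built-in base2dec and dec2base (a small sketch; the numbers are just the examples above):

d = base2dec('5034', 8);    % octal string -> decimal integer
b = dec2base(d, 2)          % decimal -> binary string, gives '101000011100'

o = dec2base(base2dec('110010111001', 2), 8)   % binary -> octal, gives '6271'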
fl(x) = x · (1 + δ),     relative error:  (fl(x) − x)/x = δ
1.4. LOSS OF SIGNIFICANCE 13
Here |δ| ≤ ε, where ε is called the machine epsilon: the smallest positive number
such that fl(1 + ε) > 1.
In single precision (32-bit): ε = 2^(−23).
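One can check this directly in MATLAB (a small sketch; eps gives the double-precision machine epsilon, eps('single') the single-precision one):

eps             % 2^(-52) ~ 2.2204e-16, double precision
eps('single')   % 2^(-23) ~ 1.1921e-07, single precision
1 + eps   > 1   % true
1 + eps/2 > 1   % false: eps/2 is absorbed by rounding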
Example 1. Addition, z = x + y.
Let
fl(x) = x(1 + δx ), fl(y) = y(1 + δy )
Then
x = 0.d1 d2 d3 · · · d8 × 10^(−a),   y = 0.b1 b2 b3 · · · b8 × 10^(−a).
If b1 = d1, b2 = d2, b3 = d3, then
x − y = 0.000 c4 c5 c6 c7 c8 × 10^(−a),
so leading significant digits are lost in the subtraction.
r_{1,2} = ( −b ± √(b² − 4ac) ) / (2a)
In our case, we have
x_{1,2} = 20 ± √398 ≈ 20 ± 19.95
so
x1 ≈ 20 + 19.95 = 39.95,   (OK)
x2 = c/(a·x1) = 2/(1 · 39.95) ≈ 0.05006
f(x) = f(c) + f'(c)(x − c) + (1/2!) f''(c)(x − c)² + (1/3!) f'''(c)(x − c)³ + · · ·
or, using the summation sign,
f(x) = Σ_{k=0}^{∞} (1/k!) f^(k)(c) (x − c)^k .
[Figure: graph of f(x) on [a, b]; at some point ξ between a and b the tangent to f is
parallel to the secant line through (a, f(a)) and (b, f(b)).]
This implies
f'(ξ) = ( f(b) − f(a) ) / (b − a).
So, if a, b are close to each other, this can be used as an approximation for f'.
Given h > 0 very small, we have
f'(x) ≈ ( f(x + h) − f(x) ) / h
f'(x) ≈ ( f(x) − f(x − h) ) / h
f'(x) ≈ ( f(x + h) − f(x − h) ) / (2h)
where
E_{n+1} = Σ_{k=n+1}^{∞} (1/k!) f^(k)(x) h^k = (1/(n+1)!) f^(n+1)(ξ) h^(n+1).
Chapter 2

Polynomial interpolation
2.1 Introduction
Problem description:
Given (n + 1) points (xi, yi), i = 0, 1, 2, · · · , n, with distinct xi, find a polynomial of degree ≤ n,
Pn(x) = a0 + a1 x + a2 x² + · · · + an x^n ,
such that
Pn(xi) = yi,   i = 0, 1, 2, · · · , n.
Then
x = 0,   y = 1  :   P2(0) = a0 = 1
x = 1,   y = 0  :   P2(1) = a0 + a1 + a2 = 0
x = 2/3, y = 0.5 :  P2(2/3) = a0 + (2/3) a1 + (4/9) a2 = 0.5
In matrix-vector form
[ 1   0    0  ] [a0]   [ 1 ]
[ 1   1    1  ] [a1] = [ 0 ]
[ 1  2/3  4/9 ] [a2]   [0.5]
Easy to solve in Matlab (homework 1):
a0 = 1,   a1 = −1/4,   a2 = −3/4.
Then
P2(x) = 1 − (1/4) x − (3/4) x².
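For instance, the small system above can be set up and solved with the backslash operator (a sketch; the matrix rows are just the three interpolation conditions written out):

X = [1 0 0; 1 1 1; 1 2/3 4/9];   % Vandermonde-type matrix for the nodes 0, 1, 2/3
y = [1; 0; 0.5];                 % interpolation values
a = X \ y                        % coefficients a0, a1, a2 = 1, -0.25, -0.75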
Pn (xi ) = yi , i = 0, 1, 2, · · · , n
In matrix-vector form
[ 1  x0  x0²  · · ·  x0^n ] [a0]   [y0]
[ 1  x1  x1²  · · ·  x1^n ] [a1]   [y1]
[ ..  ..  ..    ..    ..  ] [..] = [..]
[ 1  xn  xn²  · · ·  xn^n ] [an]   [yn]
or
X ~a = ~y
where
X : (n + 1) × (n + 1) matrix, given (the Vandermonde matrix),
~a : unknown vector, (n + 1),
~y : given vector, (n + 1).
• Lagrange polynomial
One can easily check that li(xi) = 1 and li(xj) = 0 for i ≠ j, i.e., li(xj) = δij.
xi 0 1 2/3
yi 1 0 0.5
Answer. We have
l0(x) = (x − 2/3)/(0 − 2/3) · (x − 1)/(0 − 1) = (3/2) (x − 2/3)(x − 1)
l1(x) = (x − 0)/(2/3 − 0) · (x − 1)/(2/3 − 1) = −(9/2) x (x − 1)
l2(x) = (x − 0)/(1 − 0) · (x − 2/3)/(1 − 2/3) = 3 x (x − 2/3)
so P2(x) is the combination of these li's weighted by the corresponding yi values.
• Not flexible: if one changes a point xj, or adds an additional point x_{n+1}, one
must re-compute all the li's. (-)
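A short sketch of evaluating the Lagrange form in MATLAB (the function name lagrange_eval and the loop structure are illustrative, not from the notes):

function p = lagrange_eval(xi, yi, x)
% Evaluate the Lagrange interpolating polynomial through (xi, yi) at the points x
n = length(xi);
p = zeros(size(x));
for i = 1:n
    li = ones(size(x));                   % build the cardinal function l_i(x)
    for j = [1:i-1, i+1:n]
        li = li .* (x - xi(j)) / (xi(i) - xi(j));
    end
    p = p + yi(i) * li;
end
end

For the data above, lagrange_eval([0 1 2/3], [1 0 0.5], 0.5) agrees with P2(0.5).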
n=0 : P0 (x) = y0
Assume that Pn−1 (x) interpolates (xi , yi ) for i = 0, 1, · · · , n − 1. We will find Pn (x) that
interpolates (xi , yi ) for i = 0, 1, · · · , n, in the form
where
an = ( yn − P_{n−1}(xn) ) / ( (xn − x0)(xn − x1) · · · (xn − x_{n−1}) ).
Check by yourself that such a polynomial does the interpolating job!
Newton's form:
pn(x) = a0 + a1 (x − x0) + a2 (x − x0)(x − x1) + · · ·
        + an (x − x0)(x − x1) · · · (x − x_{n−1})
Example : Use Newton’s divided difference to write the polynomial that interpolates
the data
xi 0 1 2/3 1/3
yi 1 0 1/2 0.866
So
P3(x) = 1 − x − 0.75 x(x − 1) + 0.4413 x(x − 1)(x − 2/3).
Nested form (Horner-type evaluation):
• p = an
• for k = n − 1, n − 2, · · · , 0
   – p = p · (x − xk) + ak
• end
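A MATLAB sketch of the divided-difference table and the nested evaluation (the names newton_coeffs and newton_eval are illustrative):

function a = newton_coeffs(xi, yi)
% Divided-difference coefficients a0, a1, ..., an of Newton's form
n = length(xi);
a = yi(:);
for j = 2:n
    for i = n:-1:j
        a(i) = (a(i) - a(i-1)) / (xi(i) - xi(i-j+1));
    end
end
end

function p = newton_eval(a, xi, x)
% Nested (Horner-type) evaluation of the Newton form at points x
n = length(a);
p = a(n) * ones(size(x));
for k = n-1:-1:1
    p = p .* (x - xi(k)) + a(k);
end
end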
p(xi ) = yi , q(xi ) = yi , i = 0, 1, · · · , n
Now, let g(x) = p(x) − q(x), which will be a polynomial of degree ≤ n. Furthermore,
we have
g(xi ) = p(xi ) − q(xi ) = yi − yi = 0, i = 0, 1, · · · , n
So g(x) has (n + 1) distinct zeros, but g is a polynomial of degree ≤ n. We must have g(x) ≡ 0, therefore p(x) ≡ q(x).
Pn (xi ) = f (xi ), i = 0, 1, · · · , n
and a constant
c = ( f(a) − Pn(a) ) / W(a),
and another function
ϕ(x) = f (x) − Pn (x) − cW (x).
Now we find all the zeros for this function ϕ:
and
ϕ(a) = f (a) − Pn (a) − cW (a) = 0
So, ϕ has at least (n + 2) zeros.
Here goes our deduction:
ϕ(x) has at least n+2 zeros.
ϕ′ (x) has at least n+1 zeros.
ϕ′′ (x) has at least n zeros.
..
.
ϕ(n+1) (x) has at least 1 zero. Call it ξ.
So we have
ϕ(n+1) (ξ) = f (n+1) (ξ) − 0 − cW (n+1) (ξ) = 0.
Using
W^(n+1)(x) = (n + 1)!,
we get
f^(n+1)(ξ) = c W^(n+1)(ξ) = ( (f(a) − Pn(a)) / W(a) ) · (n + 1)!.
Changing a into x, we get
e(x) = f(x) − Pn(x) = (1/(n+1)!) f^(n+1)(ξ) W(x) = (1/(n+1)!) f^(n+1)(ξ) Π_{i=0}^{n} (x − xi).
Example n = 1, x0 = a, x1 = b, b > a.
We have an upper bound for the error, for x ∈ [a, b]:
|e(x)| = (1/2) |f''(ξ)| · |(x − a)(x − b)| ≤ (1/2) ‖f''‖_∞ · ( (b − a)²/4 ) = (1/8) ‖f''‖_∞ (b − a)².
Uniform nodes: equally distribute the space. Consider an interval [a, b], and we
distribute n + 1 nodes uniformly as
xi = a + i h,   h = (b − a)/n,   i = 0, 1, · · · , n.
One can show that
Π_{i=0}^{n} |x − xi| ≤ (1/4) h^(n+1) n!
(Try to prove it!)
This gives the error estimate
|e(x)| ≤ (1/(4(n+1))) |f^(n+1)(x)| h^(n+1) ≤ (M_{n+1}/(4(n+1))) h^(n+1)
where
M_{n+1} = max_{x∈[a,b]} |f^(n+1)(x)|.
Example Consider interpolating f(x) = sin(πx) with a polynomial on the interval [−1, 1]
with uniform nodes. Give an upper bound for the error, and show how it is related to the
total number of nodes with some numerical simulations.
Answer. We have
|f^(n+1)(x)| ≤ π^(n+1)
so the upper bound for the error is
|e(x)| = |f(x) − Pn(x)| ≤ ( π^(n+1)/(4(n+1)) ) (2/n)^(n+1).
Problem with uniform nodes: peak of errors near the boundaries. See plots.
Example Consider the same example with uniform nodes, f(x) = sin πx. With Chebyshev
nodes, we have
|e(x)| ≤ (1/(n+1)!) π^(n+1) 2^(−n).
The corresponding table for errors:
Type II: Chebyshev nodes can be chosen strictly inside the interval [a, b]:
x̄i = (1/2)(a + b) + (1/2)(b − a) cos( ((2i + 1)/(2n + 2)) π ),   i = 0, 1, · · · , n
As a consequence, we have:
f[x0, x1, · · · , xn] = (1/n!) f^(n)(ξ),   ξ ∈ [a, b].
Proof. Let P_{n−1}(x) interpolate f(x) at x0, · · · , x_{n−1}. The error formula gives
f(xn) − P_{n−1}(xn) = (1/n!) f^(n)(ξ) Π_{i=0}^{n−1} (xn − xi),   ξ ∈ (a, b).
f[x0, x1, x2] = (1/(2h²)) [ f(x + h) − 2f(x) + f(x − h) ] = (1/2) f''(ξ),   ξ ∈ [x − h, x + h].
Finite differences:
(1)  f'(x) ≈ (1/h) ( f(x + h) − f(x) )
(2)  f'(x) ≈ (1/h) ( f(x) − f(x − h) )
(3)  f'(x) ≈ (1/(2h)) ( f(x + h) − f(x − h) )   (central difference)
f''(x) ≈ (1/h²) ( f(x + h) − 2f(x) + f(x − h) )
[Figure: geometric meaning of (1), (2), (3) as slopes of secant lines of f through the
points x − h, x, x + h, compared with the tangent slope f'(x).]
f(x + h) = f(x) + h f'(x) + (1/2) h² f''(x) + (1/6) h³ f'''(x) + O(h⁴)
f(x − h) = f(x) − h f'(x) + (1/2) h² f''(x) − (1/6) h³ f'''(x) + O(h⁴)
Then,
( f(x + h) − f(x) )/h = f'(x) + (1/2) h f''(x) + O(h²) = f'(x) + O(h),   (1st order)
similarly
( f(x) − f(x − h) )/h = f'(x) − (1/2) h f''(x) + O(h²) = f'(x) + O(h),   (1st order)
and
( f(x + h) − f(x − h) )/(2h) = f'(x) + (1/6) h² f'''(x) + O(h⁴) = f'(x) + O(h²),   (2nd order)
finally
( f(x + h) − 2f(x) + f(x − h) )/h² = f''(x) + (1/12) h² f^(4)(x) + O(h⁴) = f''(x) + O(h²),   (2nd order)
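These orders are easy to verify numerically; a small MATLAB sketch (the test function f(x) = sin x at x = 1 is just an illustration):

f = @(x) sin(x);  df = @(x) cos(x);  x = 1;
for h = 10.^(-1:-1:-6)
    e_fwd  = abs((f(x+h) - f(x))/h       - df(x));   % O(h)
    e_cent = abs((f(x+h) - f(x-h))/(2*h) - df(x));   % O(h^2)
    fprintf('h = %8.1e   forward %8.1e   central %8.1e\n', h, e_fwd, e_cent);
end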
Chapter 3

Piecewise polynomial interpolation: splines
3.1 Introduction
Usage:
Requirement:
• interpolation
• no convergence result;
x t0 t1 · · · tn
y y0 y1 · · · yn
Requirements:
S0 (t0 ) = y0
Si−1 (ti ) = Si (ti ) = yi , i = 1, 2, · · · , n − 1
Sn−1 (tn ) = yn .
Easy to find: write the equation for the line through the two points (ti, yi) and (t_{i+1}, y_{i+1}):
Si(x) = yi + ( (y_{i+1} − yi)/(t_{i+1} − ti) ) (x − ti),   i = 0, 1, · · · , n − 1.
[Figure: a first-degree (linear) spline S(x): a broken line connecting the data points
(t0, y0), (t1, y1), (t2, y2), (t3, y3).]
Accuracy Theorem for linear splines: Assume t0 < t1 < t2 < · · · < tn, and let
h = max_i (t_{i+1} − ti).
Let f(x) be a given function, and let S(x) be a linear spline that interpolates f(x), i.e.,
S(ti) = f(ti),   i = 0, 1, · · · , n.
Then
|f(x) − S(x)| ≤ (1/2) h max_x |f'(x)| ,
and, if f'' exists,
|f(x) − S(x)| ≤ (1/8) h² max_x |f''(x)| .
Write
Si (x) = ai x3 + bi x2 + ci x + di , i = 0, 1, · · · , n − 1
Total number of unknowns= 4 · n.
Equations we have:
(1) Si(ti) = yi,                            i = 0, 1, · · · , n − 1   :   n equations
(2) Si(t_{i+1}) = y_{i+1},                  i = 0, 1, · · · , n − 1   :   n equations
(3) Si'(t_{i+1}) = S_{i+1}'(t_{i+1}),       i = 0, 1, · · · , n − 2   :   n − 1 equations
(4) Si''(t_{i+1}) = S_{i+1}''(t_{i+1}),     i = 0, 1, · · · , n − 2   :   n − 1 equations
(5) S0''(t0) = 0                                                      :   1 equation
(6) S_{n−1}''(tn) = 0                                                 :   1 equation
Total: 4n equations, matching the 4n unknowns.
• Start with Si''(x): they are all linear, so one can use the Lagrange form.
• Integrate Si''(x) twice to get Si(x); you will get two integration constants.
• Determine these constants from (2) and (1). Various tricks on the way...
Details: Define zi as
zi = S''(ti),   i = 1, 2, · · · , n − 1,   z0 = zn = 0.
Interpolating properties:
(1). Si(ti) = yi gives
yi = −( zi/(6hi) ) (−hi)³ − Di (−hi) = (1/6) zi hi² + Di hi   ⇒   Di = yi/hi − (hi/6) zi
6hi 6 hi 6
We see that, once zi ’s are known, then (Ci , Di )’s are known, and so Si , Si′ are known.
zi+1 3 zi 3 yi+1 hi
Si (x) = (x − ti ) − (x − ti+1 ) + − zi+1 (x − ti )
6hi 6hi hi 6
yi hi
− − zi (x − ti+1 ).
hi 6
z i+1 zi yi+1 − yi zi+1 − zi
Si′ (x) = (x − ti )2 − (x − ti+1 )2 − hi .
2hi 2hi hi 6
How to compute the zi's? The last condition not used yet is continuity of S'(x), i.e.,
S_{i−1}'(ti) = Si'(ti),   i = 1, 2, · · · , n − 1.
Writing bi = (y_{i+1} − yi)/hi, we have
Si'(ti) = −( zi/(2hi) ) (−hi)² + bi − ( (z_{i+1} − zi)/6 ) hi
        = −(1/6) hi z_{i+1} − (1/3) hi zi + bi ,
S_{i−1}'(ti) = (1/6) z_{i−1} h_{i−1} + (1/3) zi h_{i−1} + b_{i−1} .
Set them equal to each other, we get
hi−1 zi−1 + 2(hi−1 + hi )zi + hi zi+1 = 6(bi − bi−1 ), i = 1, 2, · · · , n − 1
z0 = zn = 0.
In matrix-vector form:
H ~z = ~b
where H is the (n − 1) × (n − 1) tridiagonal matrix
     [ 2(h0 + h1)   h1                                                        ]
     [ h1           2(h1 + h2)   h2                                           ]
H =  [              h2           2(h2 + h3)   h3                             ]
     [                           ...          ...                ...         ]
     [                           h_{n−3}      2(h_{n−3}+h_{n−2})  h_{n−2}    ]
     [                                        h_{n−2}   2(h_{n−2}+h_{n−1})   ]
and
~z = ( z1, z2, z3, · · · , z_{n−2}, z_{n−1} )^t ,
~b = ( 6(b1 − b0), 6(b2 − b1), 6(b3 − b2), · · · , 6(b_{n−2} − b_{n−3}), 6(b_{n−1} − b_{n−2}) )^t .
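A sketch of computing the zi's in MATLAB for given knots t and values y, building the tridiagonal system above and solving it with backslash (the function name natural_spline_z is illustrative):

function z = natural_spline_z(t, y)
% Second derivatives z(i) = S''(t_i) of the natural cubic spline
n = length(t) - 1;                    % number of sub-intervals
h = diff(t(:));  b = diff(y(:))./h;   % h_i and b_i = (y_{i+1}-y_i)/h_i
H = diag(2*(h(1:n-1) + h(2:n))) ...
  + diag(h(2:n-1), 1) + diag(h(2:n-1), -1);
rhs = 6*diff(b);
z = [0; H \ rhs; 0];                  % z_0 = z_n = 0 (natural end conditions)
end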
Smoothness property: if f is any twice continuously differentiable function interpolating
the same data, and S is the natural cubic spline, then
∫_a^b S''(x)² dx ≤ ∫_a^b f''(x)² dx.
Note that ∫ (f'')² dx is related to the curvature of f.
The natural cubic spline gives the least curvature in this sense ⇒ the smoothest
interpolant, so a good choice.
Proof. Let
g(x) = f(x) − S(x).
Then
g(ti ) = 0, i = 0, 1, · · · , n
and f'' = S'' + g'', so
(f'')² = (S'')² + (g'')² + 2 S'' g''
⇒  ∫_a^b (f'')² dx = ∫_a^b (S'')² dx + ∫_a^b (g'')² dx + 2 ∫_a^b S'' g'' dx.
Claim that
∫_a^b S'' g'' dx = 0;
then this would imply
∫_a^b (f'')² dx ≥ ∫_a^b (S'')² dx.
Integration by parts gives
∫_a^b S'' g'' dx = [S'' g']_a^b − ∫_a^b S''' g' dx.
The boundary term vanishes because S''(a) = S''(b) = 0 for the natural spline. For the
remaining term, S''' is piecewise constant; call
ci = S'''(x),   for x ∈ [ti, t_{i+1}].
Then
∫_a^b S''' g' dx = Σ_{i=0}^{n−1} ci ∫_{ti}^{t_{i+1}} g'(x) dx = Σ_{i=0}^{n−1} ci [ g(t_{i+1}) − g(ti) ] = 0,
since g(ti) = 0 for all i. This proves the claim.
Chapter 4

Numerical integration
4.1 Introduction
Problem: Given a function f(x) on an interval [a, b], find an approximation to the integral
I(f) = ∫_a^b f(x) dx.
Main idea:
x0 = a, xi < xi+1 , xn = b
[Figure: on each sub-interval [xi, x_{i+1}] of the partition x0 = a < x1 < · · · < xn = b,
f(x) is replaced by a simple polynomial pi(x).]
Uniform grid: h = (b − a)/n, x_{i+1} − xi = h, and
∫_a^b f(x) dx ≈ Σ_{i=0}^{n−1} (h/2) ( f(x_{i+1}) + f(xi) )
             = h [ (1/2) f(x0) + Σ_{i=1}^{n−1} f(xi) + (1/2) f(xn) ]  =:  T(f; h),
so we can write
∫_a^b f(x) dx ≈ T(f; h).
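A minimal MATLAB sketch of T(f; h) (the function name trap is illustrative; MATLAB's own trapz does the same job given sampled values):

function T = trap(f, a, b, n)
% Composite trapezoid rule with n sub-intervals
h = (b - a)/n;
x = a + (0:n)*h;
y = f(x);                          % assumes f is vectorized
T = h*(0.5*y(1) + sum(y(2:end-1)) + 0.5*y(end));
end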
Error estimates:
E_T(f; h) = I(f) − T(f; h) = Σ_{i=0}^{n−1} ∫_{xi}^{x_{i+1}} ( f(x) − pi(x) ) dx.
Total error:
E_T(f; h) = Σ_{i=0}^{n−1} E_{T,i}(f; h) = Σ_{i=0}^{n−1} ( −(1/12) h³ f''(ξi) ) = −(1/12) h³ · n · f''(ξ),   n = (b − a)/h,
so the total error is
E_T(f; h) = −( (b − a)/12 ) h² f''(ξ),   ξ ∈ (a, b),
with the error bound
|E_T(f; h)| ≤ ( (b − a)/12 ) h² max_{x∈(a,b)} |f''(x)| .
Example (f(x) = e^x on [0, 2]; the same integral is used again in the Simpson example below).
Require error ≤ 0.5 × 10^(−4). How many points should be used in the Trapezoid rule?
Answer. We have f''(x) = e^x, so
max_{x∈(0,2)} |f''(x)| = e².
x0 = a,   x_{2n} = b,   h = (b − a)/(2n),   x_{i+1} − xi = h.
Consider the interval [x_{2i}, x_{2i+2}]. Find a 2nd order polynomial that interpolates f(x) at
the points x_{2i}, x_{2i+1}, x_{2i+2}.
[Figure 4.2: Simpson's rule — quadratic polynomial approximation pi(x) (thick line) of f(x)
in each sub-interval.]
Then
∫_{x_{2i}}^{x_{2i+2}} f(x) dx ≈ ∫_{x_{2i}}^{x_{2i+2}} pi(x) dx
  = (1/(2h²)) f(x_{2i})   ∫_{x_{2i}}^{x_{2i+2}} (x − x_{2i+1})(x − x_{2i+2}) dx     [ = (2/3) h³ ]
  − (1/h²)    f(x_{2i+1}) ∫_{x_{2i}}^{x_{2i+2}} (x − x_{2i})(x − x_{2i+2}) dx        [ = −(4/3) h³ ]
  + (1/(2h²)) f(x_{2i+2}) ∫_{x_{2i}}^{x_{2i+2}} (x − x_{2i})(x − x_{2i+1}) dx        [ = (2/3) h³ ]
  = (h/3) [ f(x_{2i}) + 4 f(x_{2i+1}) + f(x_{2i+2}) ].
Putting together,
∫_a^b f(x) dx ≈ S(f; h) = Σ_{i=0}^{n−1} ∫_{x_{2i}}^{x_{2i+2}} pi(x) dx
             = (h/3) Σ_{i=0}^{n−1} [ f(x_{2i}) + 4 f(x_{2i+1}) + f(x_{2i+2}) ].
(Adjacent sub-intervals share an endpoint, where two weights 1 add up to 2, so the overall
weight pattern is 1, 4, 2, 4, 2, · · · , 2, 4, 1.)
Error bound:
|E_S(f; h)| ≤ ( (b − a)/180 ) h⁴ max_{x∈(a,b)} |f^(4)(x)| .
Example With f(x) = e^x on [0, 2], now use Simpson's rule. To achieve an error ≤ 0.5 × 10^(−4),
how many points must one take?
Answer. We have
|E_S(f; h)| ≤ (2/180) h⁴ e² ≤ 0.5 × 10^(−4)
⇒  h⁴ ≤ 0.5 × 10^(−4) · 90 / e² ≈ 6.09 × 10^(−4)
⇒  h ≤ 0.157
⇒  n = (b − a)/(2h) ≈ 6.4, so take n = 7.
We need at least 2n + 1 = 15 points.
Note: This is far fewer points than the Trapezoid rule requires.
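A MATLAB sketch of the composite Simpson rule S(f; h) (the function name simp is illustrative):

function S = simp(f, a, b, n)
% Composite Simpson rule with 2n+1 points, h = (b-a)/(2n)
h = (b - a)/(2*n);
x = a + (0:2*n)*h;
y = f(x);                                       % assumes f is vectorized
S = h/3*( y(1) + y(end) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)) );
end

% e.g. abs(simp(@exp, 0, 2, 7) - (exp(2)-1)) is below 0.5e-4, as predicted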
hn = (b − a)/2^n,   h_{n+1} = (1/2) hn .
So
T(f; hn) = hn [ (1/2) f(a) + (1/2) f(b) + Σ_{i=1}^{2^n − 1} f(a + i hn) ]
T(f; h_{n+1}) = h_{n+1} [ (1/2) f(a) + (1/2) f(b) + Σ_{i=1}^{2^(n+1) − 1} f(a + i h_{n+1}) ] ,
and comparing the two sums gives the recursion
T(f; h_{n+1}) = (1/2) T(f; hn) + h_{n+1} Σ_{k=1}^{2^n} f(a + (2k − 1) h_{n+1}),
so each refinement reuses all previously computed function values.
[Figure: nested grids for levels n = 1, 2, · · · , 7; each level halves the step size and only
the new midpoints need to be evaluated.]
Advantage: One can keep the computation from level n. If this turns out to be not
accurate enough, one adds one more level to get a better approximation. ⇒ flexibility.
E(f; h) = I(f) − T(f; h) = a2 h² + a4 h⁴ + a6 h⁶ + · · ·
E(f; h/2) = I(f) − T(f; h/2) = a2 (h/2)² + a4 (h/2)⁴ + a6 (h/2)⁶ + · · ·
Here the coefficients a_k depend on the derivatives f^(k).
We have
(1)  I(f) = T(f; h) + a2 h² + a4 h⁴ + a6 h⁶ + · · ·
(2)  I(f) = T(f; h/2) + a2 (h/2)² + a4 (h/2)⁴ + a6 (h/2)⁶ + · · ·
The goal is to use the two approximations T(f; h) and T(f; h/2) to get one that is more
accurate, i.e., we wish to cancel the leading error term, the one with h².
Multiplying (2) by 4 and subtracting (1) gives
3 I(f) = 4 T(f; h/2) − T(f; h) + ã4 h⁴ + · · · ,   i.e.   I(f) = U(h) + ã4 h⁴ + · · ·
with U(h) = ( 4 T(f; h/2) − T(f; h) ) / 3.
Let
V(h) = ( 2⁴ U(h/2) − U(h) ) / (2⁴ − 1) = U(h/2) + ( U(h/2) − U(h) ) / (2⁴ − 1).
Then
I(f) = V(h) + ã6 h⁶ + · · ·
So V(h) is even better than U(h).
One can keep doing this for several layers, until the desired accuracy is reached.
This gives the Romberg Algorithm. Set H = b − a and define:
R(0, 0) = T(f; H) = (H/2) ( f(a) + f(b) )
R(1, 0) = T(f; H/2)
R(2, 0) = T(f; H/2²)
· · ·
R(n, 0) = T(f; H/2^n)
function R = romberg(f, a, b, n)
% Romberg integration: returns the n x n triangular table R
R = zeros(n, n);
h = b - a;
R(1,1) = (f(a) + f(b))*h/2;
for i = 1:n-1                     % 1st column: recursive trapezoid
    h = h/2;
    s = 0;
    for k = 1:2^(i-1)
        s = s + f(a + (2*k - 1)*h);
    end
    R(i+1,1) = R(i,1)/2 + h*s;
end
for j = 2:n                       % columns 2 to n: extrapolation
    for i = j:n
        R(i,j) = R(i,j-1) + (R(i,j-1) - R(i-1,j-1))/(4^(j-1) - 1);
    end
end
end
Error form:
E1[a, b] = −(1/90) h⁵ f^(4)(ξ),   ξ ∈ (a, b)
Then
I(f ) = S1 [a, b] + E1 [a, b]
Divide [a, b] in the middle, at c = (a + b)/2.
where S2 = S1[a, c] + S1[c, b] and E2[a, b] = E1[a, c] + E1[c, b].
Assume f^(4) does NOT change much; then E1[a, c] ≈ E1[c, b], and since each half uses h/2,
E2[a, b] ≈ 2 E1[a, c] ≈ 2 · (1/2⁵) E1[a, b] = (1/2⁴) E1[a, b].
This gives the acceptance test
|S2 − S1| / (2⁴ − 1) ≤ ε.
This gives the idea of an adaptive recursive formula:
(A)  I = S1 + E1
(B)  I = S2 + E2 = S2 + (1/2⁴) E1
(B) · 2⁴ − (A) gives
(2⁴ − 1) I = 2⁴ S2 − S1
⇒  I = (2⁴ S2 − S1) / (2⁴ − 1) = S2 + (S2 − S1)/15.
Note that this gives the best approximation when f^(4) ≈ const.
Pseudocode: f : function, [a, b] interval, ε: tolerance for error
function answer = simpson(f, a, b, eps)
% Adaptive Simpson: one coarse estimate S1 and one refined estimate S2
c  = (a + b)/2;
S1 = (b - a)/6  * (f(a) + 4*f(c) + f(b));
S2 = (b - a)/12 * (f(a) + 4*f((a+c)/2) + 2*f(c) + 4*f((c+b)/2) + f(b));
if abs(S2 - S1) < 15*eps
    answer = S2 + (S2 - S1)/15;
else
    Lans = simpson(f, a, c, eps/2);
    Rans = simpson(f, c, b, eps/2);
    answer = Lans + Rans;
end
end
In Matlab, one can use quad to compute numerical integration. Try help quad, it will
give you info on it. One can call the program by using:
a=quad(’fun’,a,b,tol)
It uses adaptive Simpson’s formula.
See also quad8, higher order method.
Nodes xi: the roots of the Legendre polynomial q_{n+1}(x). These polynomials satisfy
∫_a^b x^k q_{n+1}(x) dx = 0,   (0 ≤ k ≤ n).
Examples for n ≤ 3, for the interval [−1, 1]:
q0(x) = 1
q1(x) = x
q2(x) = (3/2) x² − 1/2
q3(x) = (5/2) x³ − (3/2) x
Their roots:
q1 :  0
q2 :  ±1/√3
q3 :  0, ±√(3/5)
For a general interval [a, b], use the variable change
t = ( 2x − (a + b) ) / (b − a),   x = (1/2)(b − a) t + (1/2)(a + b),
so for −1 ≤ t ≤ 1 we have a ≤ x ≤ b.
Then the weights are
Ai = ∫_a^b li(x) dx.
There are tables of such nodes and weights (Table 5.1 in textbook).
We skip the proof. See textbook if interested.
Advantage: Since all nodes are in the interior of the interval, these formulas can han-
dle integrals of function that tends to infinite value at one end of the interval (provided
that the integral is defined). Examples:
∫_0^1 x^(−1/2) dx,   ∫_0^1 (x² − 1)^(1/3) sin(e^x − 1) dx.
Chapter 5

Numerical solution of nonlinear equations

5.1 Introduction
Problem: f (x) given function, real-valued, possibly non-linear. Find a root r of f (x)
such that f (r) = 0.
• Bisection (briefly)
• Fixed point iteration (main focus): general iteration, and Newton’s method
• Secant method
• Pick that interval [a, c] or [c, b], and repeat the procedure until stop criteria satis-
fied.
Stop Criteria:
2) |f (cn )| almost 0
• |xk − xk−1 | ≤ ε,
• |xk − g(xk )| ≤ ε,
• max # of iteration reached,
• any combination.
x1 = cos x0 = 0.5403
x2 = cos x1 = 0.8576
x3 = cos x2 = 0.6543
..
.
x23 = cos x22 = 0.7390
x24 = cos x23 = 0.7391
x25 = cos x24 = 0.7391 stop here
x1 = g(x0) = 0.9886
x2 = g(x1) = 0.9870
x3 = g(x2) = 0.9852
· · ·
x27 = g(x26) = 0.1655
x28 = g(x27) = −0.4338
x29 = g(x28) = −3.8477    Diverges. It does not work.
Observation:
• If |g'(ξ)| < 1, the error decreases and the iteration converges (linear convergence).
In Example 2, we have
g(x) = e^(−2x) (x − 1) + x,
g'(x) = −2 e^(−2x) (x − 1) + e^(−2x) + 1.
With r = 1, we have
g'(r) = e^(−2) + 1 > 1.
Divergence.
Divergence.
Pseudo code:
r=fixedpoint(’g’, x,tol,nmax}
r=g(r);
nit=1;
while (abs(r-g(r))>tol and nit < nmax) do
r=g(r);
nit=nit+1;
end
We also have
|e0| ≤ |x1 − x0| + |e1| ≤ |x1 − x0| + m |e0| ,
so
|e0| ≤ (1/(1 − m)) |x1 − x0| ,   (can be computed).
Put together:
|ek| ≤ ( m^k/(1 − m) ) |x1 − x0| .
If the error tolerance is ε, then
Example cos x − x = 0, so
And x1 = cos x0 = cos 1 = 0.5403. Then, to achieve an error ≤ ε = 10−5 , the maximum
# iterations needed is
Of course that is the worst situation. Give it a try and you will find that k = 25 is
enough.
[Figure: Newton's method — the tangent line to f at xk, with slope f'(xk), crosses the
x-axis at x_{k+1}.]
Convergence analysis. Let r be the root, so f(r) = 0 and r = g(r). Define the error
e_{k+1} = |x_{k+1} − r| = |g(xk) − g(r)|.
Taylor expansion of g(xk) at r:
g(xk) = g(r) + (xk − r) g'(r) + (1/2) (xk − r)² g''(ξ),   ξ ∈ (xk, r).
Since g'(r) = 0, we have
g(xk) = g(r) + (1/2) (xk − r)² g''(ξ).
Back to the error, we now have
e_{k+1} = (1/2) (xk − r)² |g''(ξ)| = (1/2) e_k² |g''(ξ)|.
Writing again m = max_x |g''(x)|, we have
e_{k+1} ≤ m e_k².
This is called quadratic convergence. Convergence is guaranteed if e0 is small enough! (m
can be big; it would affect the convergence!)
Proof of convergence (can drop this): We have
e1 ≤ m e0 · e0 = (m e0) e0 .
If e0 is small enough that m e0 < 1, then e1 < e0.
This in turn means m e1 < m e0 < 1, and so
e2 ≤ m e1 · e1 < e1,   ⇒   m e2 < m e1 < 1.
Continuing like this, we conclude that e_{k+1} < ek for all k, i.e., the error is strictly decreasing
after each iteration. ⇒ convergence.
Example Find a numerical method to compute √a using only +, −, ∗, / arithmetic
operations. Test it for a = 3.
Answer. It is easy to see that √a is a root of f(x) = x² − a.
Newton's method gives
x_{k+1} = xk − f(xk)/f'(xk) = xk − (xk² − a)/(2 xk) = xk/2 + a/(2 xk).
Test it on a = 3: Choose x0 = 1.7.
error
x0 = 1.7 7.2 × 10−2
x1 = 1.7324 3.0 × 10−4
x2 = 1.7321 2.6 × 10−8
x3 = 1.7321 4.4 × 10−16
Note the extremely fast convergence. If the initial guess is good (i.e., close to r), usually
a couple of iterations are enough to get a very accurate approximation.
• |xk − xk−1 | ≤ ε
• |f (xk )| ≤ ε
Sample Code:
function r = newton(f, df, x, nmax, tol)
n = 0;
dx = f(x)/df(x);
while (abs(dx) > tol) && (abs(f(x)) > tol) && (n < nmax)
    n = n + 1;
    x = x - dx;
    dx = f(x)/df(x);
end
r = x - dx;
end
Advantages include
• No computation of f ′ ;
√
Example Use secant method for computing a.
Answer. Applying the secant formula to f(x) = x² − a, the iteration becomes
x_{k+1} = xk − f(xk) (xk − x_{k−1}) / ( f(xk) − f(x_{k−1}) ) = ( xk x_{k−1} + a ) / ( xk + x_{k−1} ).
error
x1 = 1.7 7.2 × 10−2
x2 = 1.7328 7.9 × 10−4
x3 = 1.7320 7.3 × 10−6
x4 = 1.7321 1.7 × 10−9
x5 = 1.7321 3.6 × 10−15
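A general secant-method sketch in MATLAB (two starting guesses x0, x1; the stopping tests mirror those used for Newton's method above; the function name secant is illustrative):

function r = secant(f, x0, x1, nmax, tol)
n = 0;
while (abs(x1 - x0) > tol) && (abs(f(x1)) > tol) && (n < nmax)
    x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0));
    x0 = x1;  x1 = x2;
    n = n + 1;
end
r = x1;
end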
For a system of n equations in n unknowns, F(~x) = 0, write it in detail:
f1 (x1 , x2 , · · · , xn ) = 0
f2 (x1 , x2 , · · · , xn ) = 0
..
.
fn (x1 , x2 , · · · , xn ) = 0
Fixed-point form: ~x = G(~x).
Newton's method:
~x_{k+1} = ~xk − DF(~xk)^(−1) · F(~xk),
where DF is the Jacobian matrix of F.
Chapter 6

Direct methods for linear systems

6.1 Introduction
The problem:
a11 x1 + a12 x2 + · · · + a1n xn = b1 (1)
a21 x1 + a22 x2 + · · · + a2n xn = b2 (2)
(A) : ..
.
an1 x1 + an2 x2 + · · · + ann xn = bn (n)
Or in matrix-vector form:
A ~x = ~b,
where A ∈ IR^(n×n), ~x ∈ IR^n, ~b ∈ IR^n,
            [ a11  a12  · · ·  a1n ]
A = {aij} = [ a21  a22  · · ·  a2n ]
            [  ..   ..    ..    .. ]
            [ an1  an2  · · ·  ann ]
~x = ( x1, x2, · · · , xn )^t ,   ~b = ( b1, b2, · · · , bn )^t .
1. Full matrix
2. Large sparse system
3. Tri-diagonal or banded systems
4. regularity and condition number
• Methods
for k = 1, 2, 3, · · · , n − 1
   (j) ← (j) − ( ajk / akk ) × (k),   j = k + 1, k + 2, · · · , n
Note, here the aij and bi are different from those in (A).
Step 2: Backward substitution – you get the solutions.
xn = bn / ann
xi = (1/aii) ( bi − Σ_{j=i+1}^{n} aij xj ),   i = n − 1, n − 2, · · · , 1.
Potential problem: In step 1, if some akk is very close to or equal to 0, then you are in
trouble.
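A naive (no pivoting) MATLAB sketch of the two steps above; the function name naive_gauss is illustrative, and it breaks down exactly when some pivot akk is (near) zero:

function x = naive_gauss(A, b)
n = length(b);
for k = 1:n-1                        % forward elimination
    for j = k+1:n
        m = A(j,k)/A(k,k);           % multiplier; trouble if A(k,k) ~ 0
        A(j,k:n) = A(j,k:n) - m*A(k,k:n);
        b(j)     = b(j)     - m*b(k);
    end
end
x = zeros(n,1);                      % backward substitution
x(n) = b(n)/A(n,n);
for i = n-1:-1:1
    x(i) = (b(i) - A(i,i+1:n)*x(i+1:n))/A(i,i);
end
end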
Example 1. Solve
x1 +  x2 +  x3 = 1      (1)
2x1 + 4x2 + 4x3 = 2      (2)
3x1 + 11x2 + 14x3 = 6    (3)
Forward elimination:
(1) ∗ (−2) + (2) : 2x2 + 2x3 = 0 (2′ )
(1) ∗ (−3) + (3) : 8x2 + 11x3 = 3 (3′ )
(2′ ) ∗ (−4) + (3′ ) : 3x3 = 3 (3′′ )
Backward substitution:
x3 = 1
x2 = (1/2)(0 − 2 x3) = −1
x1 = 1 − x2 − x3 = 1
Now run the whole procedure again: (1) ∗ (−0.001) + (2) will give us x2 = 1.00.
Set it back in (1):
x1 = 3.00 − 2.00x2 = 1.00
Solution now is correct for 3 digits.
Conclusion: The order of equations can be important!
Consider
a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2
Assume that we have computed x̃2 = x2 + ε2 where ε2 is the error (machine error, round
off error etc).
We then compute x1 with this x̃2:
x̃1 = (1/a11)( b1 − a12 x̃2 )
    = (1/a11)( b1 − a12 x2 − a12 ε2 )
    = (1/a11)( b1 − a12 x2 ) − (a12/a11) ε2
    = x1 − ε1 .
Note that ε1 = (a12/a11) ε2: the error in x2 propagates with a factor of a12/a11.
For best results, we wish to have |a11 | as big as possible.
1. Get ~s.
~s = [2, 4, 10]
2. We have
|a11|/s1 = 1/2,   |a21|/s2 = 3/4,   |a31|/s3 = 2/10,   ⇒   k = 2.
Exchange eq (1) and (2), and do one step of elimination:
3x1 + 4x2 + 0x3 = 3      (2)
(2/3) x2 + x3 = 2        (1') = (1) + (2) · (−1/3)
(22/3) x2 + 4x3 = 8      (3') = (3) + (2) · (−2/3)
x3 = 2, x2 = 0, x1 = 1.
> x= A\b;
6.4 LU-Factorization
A = L U, where L is unit lower triangular and U is upper triangular:
    [ 1                      ]        [ u11  u12  · · ·  u1,n−1    u1n    ]
    [ l21  1                 ]        [      u22  · · ·  u2,n−1    u2n    ]
L = [ l31  l32  1            ]    U = [             ..     ..       ..    ]
    [  ..   ..   ..   ..     ]        [                  u_{n−1,n−1} u_{n−1,n} ]
    [ ln1  ln2  ln3  · · · 1 ]        [                             unn   ]
This gives two triangular systems: A~x = L U ~x = ~b, so first solve L~y = ~b for ~y (by forward
substitution), then solve U~x = ~y for ~x (by backward substitution).
With pivoting:
LU = P A
> [L,U]=lu(A);
> y = L \ b;
> x = U \ y;
Work amount for direct solvers for A ∈ IR^(n×n) (operation count):
flop: one floating-point operation (+, −, ∗, /)
Elimination: (1/3)(n³ − n) flops
Backward substitution: (1/2)(n² − n) flops
Total work amount is about (1/3) n³ for large n.
This is very slow for large n. We will need something more efficient.
then A is called strictly diagonally dominant, and A has the following properties:
2. ‖x‖2 = ( Σ_{i=1}^{n} xi² )^(1/2) ,   the l2-norm.
The induced matrix norm is
‖A‖ = max_{x ≠ 0} ‖Ax‖ / ‖x‖ .
Obviously we have
‖A‖ ≥ ‖Ax‖/‖x‖   ⇒   ‖Ax‖ ≤ ‖A‖ · ‖x‖ .
In addition we have
kIk = 1, kABk ≤ kAk · kBk .
Eigenvalues λi for A:
or we can write
A = tridiag(ai , di , ci ).
This can be solved very efficiently:
• Backward substitution
xn ← bn/dn
for i = n − 1, n − 2, · · · , 1
   xi ← (1/di)( bi − ci x_{i+1} )
end
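A sketch of a complete tridiagonal solver in MATLAB: a forward elimination of the sub-diagonal followed by the backward substitution above. Here a, d, c hold the sub-, main and super-diagonals; the elimination step is supplied as an assumption, since the notes only show the back-substitution:

function x = trisolve(a, d, c, b)
% Solve tridiag(a, d, c) * x = b in O(n) operations
n = length(d);
for i = 2:n                          % forward elimination
    m = a(i)/d(i-1);
    d(i) = d(i) - m*c(i-1);
    b(i) = b(i) - m*b(i-1);
end
x = zeros(n,1);                      % backward substitution
x(n) = b(n)/d(n);
for i = n-1:-1:1
    x(i) = (b(i) - c(i)*x(i+1))/d(i);
end
end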
• diagonal matrix: k = 0,
• tridiagonal matrix: k = 1,
• pentadiagonal matrix: k = 2.
Chapter 7

Iterative solvers for linear systems

• Iterative methods: finding only approximations to the solution. Useful for large sparse
systems, usually coming from discretization of differential equations.
1. Fixed-point iterations
• Jacobi
• Gauss-Seidel
• SOR
or in a compact form:
xi = (1/aii) ( bi − Σ_{j=1, j≠i}^{n} aij xj ),   i = 1, 2, · · · , n.
• Choose a start point, x⁰ = (x⁰1, x⁰2, · · · , x⁰n)^t. For example, one may choose x⁰i = 1
for all i, or x⁰i = bi/aii.
• For k = 0, 1, 2, · · · (until a stopping criterion is met):
   for i = 1, 2, · · · , n
      xi^(k+1) = (1/aii) ( bi − Σ_{j=1, j≠i}^{n} aij xj^k )
   end
end
• or others...
Observation: when computing xi^(k+1), the values xj^(k+1) for j < i (the first summation term)
have already been computed at step k + 1. We can therefore replace those xj^k by xj^(k+1).
This gives:
for i = 1, 2, · · · , n
   xi^(k+1) = (1/aii) ( bi − Σ_{j=1}^{i−1} aij xj^(k+1) − Σ_{j=i+1}^{n} aij xj^k )
end
end
• Need only one vector for both xk and xk+1 , saves memory space.
Example 2. Try it on the same Example 1, with x⁰ = (0, 0.5, 1)^t. The iteration now is:
x1^(k+1) = (1/2) x2^k
x2^(k+1) = (1/2) ( 1 + x1^(k+1) + x3^k )
x3^(k+1) = (1/2) ( 2 + x2^(k+1) )
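A MATLAB sketch of this iteration (the right-hand side b = (0, 1, 2)^t is inferred from the formulas above, so treat it as an assumption; overwriting x in place makes the sweep Gauss-Seidel, while keeping a copy of the old vector throughout the inner loop would give Jacobi):

A = [2 -1 0; -1 2 -1; 0 -1 2];
b = [0; 1; 2];                        % assumed from the iteration formulas
x = [0; 0.5; 1];                      % starting vector
n = length(b);
for k = 1:20                          % Gauss-Seidel sweeps
    for i = 1:n
        x(i) = (b(i) - A(i,[1:i-1,i+1:n])*x([1:i-1,i+1:n]))/A(i,i);
    end
end
x                                     % converges to A\b = [1; 2; 2]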
7.4 SOR
SOR (Successive Over-Relaxation) is a more general iterative method. It is based on
Gauss-Seidel.
xi^(k+1) = (1 − w) xi^k + (w/aii) ( bi − Σ_{j=1}^{i−1} aij xj^(k+1) − Σ_{j=i+1}^{n} aij xj^k )
• w = 1: Gauss-Seidel
Example Try this on the same example with w = 1.2. The general iteration is now:
x1^(k+1) = −0.2 x1^k + 0.6 x2^k
x2^(k+1) = −0.2 x2^k + 0.6 ( 1 + x1^(k+1) + x3^k )
x3^(k+1) = −0.2 x3^k + 0.6 ( 2 + x2^(k+1) )
Write A = L + D + U, where L is the strictly lower triangular part, D the diagonal, and
U the strictly upper triangular part of A.
Now we have
Ax = (L + D + U )x = Lx + Dx + U x = b
Jacobi iterations:
Dxk+1 = b − Lxk − U xk
so
xk+1 = D −1 b − D −1 (L + U )xk = yJ + MJ xk
where
yJ = D −1 b, MJ = −D −1 (L + U ).
Gauss-Seidel:
Dxk+1 + Lxk+1 = b − U xk
so
xk+1 = (D + L)−1 b − (D + L)−1 U xk = yGS + MGS xk
where
yGS = (D + L)−1 b, MGS = −(D + L)−1 U.
SOR:
x^(k+1) = (1 − w) x^k + w D^(−1) ( b − L x^(k+1) − U x^k )
so
x^(k+1) = ySOR + MSOR x^k,
where
ySOR = w (D + wL)^(−1) b,   MSOR = (D + wL)^(−1) [ (1 − w) D − w U ].
x^(k+1) = y + M x^k
e^(k+1) = M e^k.
This implies:
‖e^k‖ ≤ ‖M‖^k ‖e^0‖ ,   e^0 = x^0 − s.
Theorem If kM k < 1 for some norm k·k, then the iterations converge.
• Jacobi: M = −D −1 (L + U ), given by A;
Example Let's check the same example we have been using. We have
A = [ 2 −1 0 ; −1 2 −1 ; 0 −1 2 ],
so
L = [ 0 0 0 ; −1 0 0 ; 0 −1 0 ],   D = [ 2 0 0 ; 0 2 0 ; 0 0 2 ],   U = [ 0 −1 0 ; 0 0 −1 ; 0 0 0 ].
The iteration matrix for each method:
MJ   = [ 0 0.5 0 ; 0.5 0 0.5 ; 0 0.5 0 ],
MGS  = [ 0 0.5 0 ; 0 0.25 0.5 ; 0 0.125 0.25 ],
MSOR = [ −0.2 0.6 0 ; −0.12 0.16 0.6 ; −0.072 0.096 0.16 ]   (w = 1.2).
The l2 norm is the most significant one. We see now why SOR converges fastest.
If A is strictly diagonally dominant, then all three iteration methods converge for any
initial choice of x⁰.
NB! If A is not diagonally dominant, the iterations might still converge, but there is no guarantee.
Chapter 8

Least Squares

8.1 Problem description

Given a data set
x :  x0  x1  x2  · · ·  xm
y :  y0  y1  y2  · · ·  ym
Data come from observation (measured) or experiments.
These yi ’s can have error (called “noise”) in measuring or experimenting.
y has a relation with x from physical model: y = y(x).
Then, our data is
yk = y(xk ) + ek
where ek is error.
Example 1. If y = ax + b, this is called linear regression. Your lab data would not lie
exactly on a straight line (or your lab instructor will be very suspicious!). Our job now is
to find a straight line that "best" fits our data. See Figure 8.1 for an illustration.
A more specific way of saying the same thing: Find a, b, such that when we use y = ax+b,
the “error” becomes smallest possible.
How to measure error?
[Figure 8.1: data points (xk, yk) and the best-fitting straight line y = ax + b.]
3. Σ_{k=0}^{m} [ y(xk) − yk ]²  —  the l2 norm, used in the Least Squares Method (LSM).
We seek (a, b) such that the error
ψ(a, b) = Σ_{k=0}^{m} ( a xk + b − yk )²
is minimized.
In detail:
∂ψ/∂a = 0 :   Σ_{k=0}^{m} 2 ( a xk + b − yk ) xk = 0,   (I)
∂ψ/∂b = 0 :   Σ_{k=0}^{m} 2 ( a xk + b − yk ) = 0,   (II)
Tk :  0     10    20    30    40    80    90    95
Sk :  68.0  67.1  66.4  65.6  64.6  61.8  61.0  60.0
where T is the temperature, and we fit a straight line
S = a T + b.
Σ_{k=0}^{7} Tk² = 0 + 10² + 20² + · · · + 90² + 95² = 26525
Σ_{k=0}^{7} Tk = · · · = 365
Σ_{k=0}^{7} Tk Sk = · · · = 22685
Σ_{k=0}^{7} Sk = · · · = 514.5
Solve it
a = −0.079930, b = 67.9593.
So
S(T ) = −0.079930 T + 67.9593.
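In MATLAB the same fit can be obtained directly (a sketch; polyfit solves the same least-squares problem):

T = [0 10 20 30 40 80 90 95];
S = [68.0 67.1 66.4 65.6 64.6 61.8 61.0 60.0];
p = polyfit(T, S, 1)    % p(1) ~ a = -0.0799,  p(2) ~ b = 67.96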
At the minimum, we have
∂ψ/∂a = ∂ψ/∂b = ∂ψ/∂c = 0.
In detail:
∂ψ/∂a = 0 :   Σ_{k=0}^{m} 2 ( a xk² + b xk + c − yk ) · xk² = 0
∂ψ/∂b = 0 :   Σ_{k=0}^{m} 2 ( a xk² + b xk + c − yk ) · xk = 0
∂ψ/∂c = 0 :   Σ_{k=0}^{m} 2 ( a xk² + b xk + c − yk ) = 0
y(x) = a · f(x) + b · g(x) + c · h(x), which best fits the data. This means we need to find (a, b, c).
Here f (x), g(x), h(x) are given functions, for example
f (x) = ex , g(x) = ln(x), h(x) = cos x,
but not restricted to these.
Define the error function:
ψ(a, b, c) = Σ_{k=0}^{m} ( y(xk) − yk )² = Σ_{k=0}^{m} ( a f(xk) + b g(xk) + c h(xk) − yk )².
At the minimum, we have
∂ψ/∂a = 0 :   Σ_{k=0}^{m} 2 [ a f(xk) + b g(xk) + c h(xk) − yk ] · f(xk) = 0
∂ψ/∂b = 0 :   Σ_{k=0}^{m} 2 [ a f(xk) + b g(xk) + c h(xk) − yk ] · g(xk) = 0
∂ψ/∂c = 0 :   Σ_{k=0}^{m} 2 [ a f(xk) + b g(xk) + c h(xk) − yk ] · h(xk) = 0
We note that the system of normal equations is always symmetric. We only need to
compute half of the entries.
How to choose the basis functions? They are chosen such that the system of the
normal equations is regular (invertible) and well-conditioned.
The basis functions g0, g1, · · · , gn must be linearly independent:
Σ_{i=0}^{n} ci gi(x) = 0,   if and only if   c0 = c1 = c2 = · · · = cn = 0.
ψ(c0, c1, · · · , cn) = Σ_{k=0}^{m} [ y(xk) − yk ]² = Σ_{k=0}^{m} [ Σ_{i=0}^{n} ci gi(xk) − yk ]²
At the minimum, we have
∂ψ/∂cj = 0,   j = 0, 1, · · · , n.
This gives:
Σ_{k=0}^{m} 2 [ Σ_{i=0}^{n} ci gi(xk) − yk ] gj(xk) = 0
i.e.,
Σ_{i=0}^{n} ( Σ_{k=0}^{m} gi(xk) gj(xk) ) ci = Σ_{k=0}^{m} gj(xk) yk,   j = 0, 1, · · · , n.
A~c = ~b
where
A = {aij},   aij = Σ_{k=0}^{m} gi(xk) gj(xk),
~b = {bj},   bj = Σ_{k=0}^{m} gj(xk) yk.
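A sketch of the general linear least-squares fit in MATLAB: build the matrix G whose columns are gi(xk), then either form the normal equations or let backslash solve the over-determined system directly. The data and basis functions below are only illustrative assumptions:

x = (0:0.5:5)';  y = exp(x) + 0.1*randn(size(x));   % illustrative data
G = [exp(x), log(x+1), cos(x)];    % columns g_i(x_k); assumed basis functions
c_normal = (G'*G) \ (G'*y);        % solve the normal equations A c = b
c_qr     = G \ y;                  % equivalent, but numerically preferable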
y(x) = a · bx
This means, we need to find (a, b) such that this y(x) best fit the data.
Do a variable change:
ln y = ln a + x · ln b .
Let
S = ln y, ā = ln a, b̄ = ln b.
Given the data set (xk, yk), compute Sk = ln yk for all k.
We can now find (ā, b̄) such that the line ā + b̄ x best fits the data (xk, Sk).
Then transform back to the original variables:
a = exp{ā},   b = exp{b̄}.
y(x) = a x · sin(bx).
We cannot find a variable change that turns this problem into a linear one, so we deal
with it as a genuinely non-linear problem.
Define the error
ψ(a, b) = Σ_{k=0}^{m} [ y(xk) − yk ]² = Σ_{k=0}^{m} [ a xk sin(b xk) − yk ]².
At the minimum:
∂ψ/∂a = 0 :   Σ_{k=0}^{m} 2 [ a xk sin(b xk) − yk ] · [ xk sin(b xk) ] = 0
∂ψ/∂b = 0 :   Σ_{k=0}^{m} 2 [ a xk sin(b xk) − yk ] · [ a xk² cos(b xk) ] = 0
Chapter 9

ODEs

9.1 Introduction
Definition of ODE: an equation which contains one or more ordinary derivatives of an
unknown function.
Some examples:
x′ = x + 1, x(0) = 0. solution: x(t) = et − 1
x′ = 2, x(0) = 0. solution: x(t) = 2t.
Overview:
Taylor series method of order m: take the first (m + 1) terms of the Taylor expansion,
x(t0 + h) = x(t0) + h x'(t0) + (1/2) h² x''(t0) + · · · + (1/m!) h^m x^(m)(t0).
Error in each step:
x(t0 + h) − x1 = Σ_{k=m+1}^{∞} (1/k!) h^k x^(k)(t0) = (1/(m+1)!) h^(m+1) x^(m+1)(ξ)
For m = 1 this is Euler's method:
x_{k+1} = xk + h · f(tk, xk),   k = 0, 1, 2, · · ·
For m = 2, we have
x1 = x0 + h x'(t0) + (1/2) h² x''(t0)
Using
x''(t0) = (d/dt) f(t0, x(t0)) = ft(t0, x0) + fx(t0, x0) · x'(t0) = ft(t0, x0) + fx(t0, x0) · f(t0, x0),
we get
x1 = x0 + h f(t0, x0) + (1/2) h² [ ft(t0, x0) + fx(t0, x0) · f(t0, x0) ].
For a general step k, we have
x_{k+1} = xk + h f(tk, xk) + (1/2) h² [ ft(tk, xk) + fx(tk, xk) · f(tk, xk) ].
Example: x' = −x + e^(−t),   x(0) = 0.
For m = 2, we have
x'' = −x' − e^(−t) = x − 2 e^(−t),
so
x_{k+1} = xk + h x'_k + (1/2) h² x''_k
        = xk + h ( −xk + e^(−tk) ) + (1/2) h² ( xk − 2 e^(−tk) )
        = ( 1 − h + (1/2) h² ) xk + ( h − h² ) e^(−tk).
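A small MATLAB sketch of this second-order Taylor scheme for the example; the exact solution x(t) = t e^(−t) (easy to verify by substitution) is used only as a check:

h = 0.1;  t = 0:h:2;  x = zeros(size(t));    % x(1) corresponds to x(0) = 0
for k = 1:length(t)-1
    x(k+1) = (1 - h + h^2/2)*x(k) + (h - h^2)*exp(-t(k));
end
max(abs(x - t.*exp(-t)))     % global error, O(h^2)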
x′ = x, x(0) = 1.
We have
Local truncation error (error in each time step) for the Taylor series method of order m:
e_L^(k) = |x_{k+1} − x(tk + h)| = ( h^(m+1)/(m+1)! ) |x^(m+1)(ξ)| = ( h^(m+1)/(m+1)! ) |d^m f/dt^m (ξ)| ,   ξ ∈ (tk, t_{k+1}).
Here we use the fact that
x^(m+1) = d^m f / dt^m .
Assume now that
|d^m f / dt^m| ≤ M.
We have
e_L^(k) ≤ ( M/(m+1)! ) h^(m+1) = O(h^(m+1)).
Total error: sum over all local errors.
Detail: We want to compute x(T) for some time t = T. Choose an h. Then the total
number of steps is
N = T/h,   i.e.,   T = N h.
Then
E = Σ_{k=1}^{N} e_L^(k) ≤ Σ_{k=1}^{N} ( M/(m+1)! ) h^(m+1)
  = N ( M/(m+1)! ) h^(m+1) = (N h) ( M/(m+1)! ) h^m = ( M T/(m+1)! ) h^m = O(h^m).
In general: If the local truncation error is of order O(ha+1 ), then the total error is of
O(ha ), i.e., one order less.
A better method: should only use f (t, x), not its derivatives.
1st order method: The same as Euler’s method.
2nd order method: Let h = tk+1 − tk . Given xk , the next value xk+1 is computed as
x_{k+1} = xk + (1/2)(K1 + K2)
where
K1 = h · f(tk, xk)
K2 = h · f(t_{k+1}, xk + K1)
This is called Heun's method.
Proof that this is a second order method: Taylor expansion in two variables gives
x(tk + h) = x(tk) + h x'(tk) + (1/2) h² x''(tk) + O(h³)
          = x(tk) + h f(tk, xk) + (1/2) h² [ ft + fx x' ] + O(h³)
          = x(tk) + h f + (1/2) h² [ ft + fx f ] + O(h³).
Expanding K2 in the same way and comparing, we see the first 3 terms are identical; this
gives the local truncation error
eL = O(h³)
A general explicit Runge-Kutta method has the form
x_{k+1} = xk + w1 K1 + w2 K2 + · · · + wm Km
where
K1 = h · f(tk, xk)
K2 = h · f(tk + a2 h, xk + b2 K1)
K3 = h · f(tk + a3 h, xk + b3 K1 + c3 K2)
· · ·
Km = h · f(tk + am h, xk + Σ_{i=1}^{m−1} φi Ki)
The classical 4th order Runge-Kutta method uses
K1 = h · f(tk, xk)
K2 = h · f(tk + (1/2) h, xk + (1/2) K1)
K3 = h · f(tk + (1/2) h, xk + (1/2) K2)
K4 = h · f(tk + h, xk + K3)
with the update x_{k+1} = xk + (1/6)(K1 + 2 K2 + 2 K3 + K4).
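A compact MATLAB sketch of one RK4 solve (the function name rk4 is illustrative; the commented test problem reuses the earlier example x' = −x + e^(−t)):

function [t, x] = rk4(f, t0, T, x0, h)
t = t0:h:T;  x = zeros(size(t));  x(1) = x0;
for k = 1:length(t)-1
    K1 = h*f(t(k),       x(k));
    K2 = h*f(t(k) + h/2, x(k) + K1/2);
    K3 = h*f(t(k) + h/2, x(k) + K2/2);
    K4 = h*f(t(k) + h,   x(k) + K3);
    x(k+1) = x(k) + (K1 + 2*K2 + 2*K3 + K4)/6;
end
end

% e.g. [t, x] = rk4(@(t,x) -x + exp(-t), 0, 2, 0, 0.1);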
Optimal situation: h varies each step to get uniform error at each step.
• Compute x(t + 21 h) from x(t) with step 21 h, then compute x(t + h) from x(t + 12 h)
with step 12 h; Call this x̄(t + h);
But this is rather wasteful of computing time. Although the idea is good.
A better method, by Fehlberg, building upon R-K methods. He has a 4th order method:
x(t + h) = x(t) + (25/216) K1 + (1408/2565) K3 + (2197/4104) K4 − (1/5) K5,
where
K1 = h · f(t, x)
K2 = h · f(t + (1/4) h, x + (1/4) K1)
K3 = h · f(t + (3/8) h, x + (3/32) K1 + (9/32) K2)
K4 = h · f(t + (12/13) h, x + (1932/2197) K1 − (7200/2197) K2 + (7296/2197) K3)
K5 = h · f(t + h, x + (439/216) K1 − 8 K2 + (3680/513) K3 − (845/4104) K4)
Main interests here: The difference |x(t + h) − x̄(t + h)| gives an estimate for the error.
Pseudo code for adaptive RK45, with time step controller
set h = h0 , x = x0 , k = 0,
end (while)
Assume we know xn, x_{n−1}, x_{n−2}, · · · , x_{n−k}; one can approximate the integrand x'(s) by
an interpolating polynomial. With fi = f(ti, xi), linear interpolation through (t_{n−1}, f_{n−1})
and (tn, fn) gives
x'(s) ≈ P1(s) = f_{n−1} + ( (fn − f_{n−1})/h ) (s − t_{n−1}).
Then
x_{n+1} = xn + ∫_{tn}^{t_{n+1}} P1(s) ds = xn + (h/2) ( 3 fn − f_{n−1} ).
Here step (P) is called the predictor, and step (C) the corrector.
This is called a predictor-corrector method.
Remark: All methods for scalar equations can also be used for systems!
Example Consider
x′1 = x1 − x2 + 2t − t2 − t3
x′2 = x1 + x2 − 4t2 + t3
We will need the higher order derivatives:
x1'' = x1' − x2' + 2 − 2t − 3t²
x2'' = x1' + x2' − 8t + 3t²
and
x1''' = x1'' − x2'' − 2 − 6t
x2''' = x1'' + x2'' − 8 + 6t
and so on...
For systems, e.g. classical RK4 reads componentwise
~K1 = h · F(tk, ~xk)
~K2 = h · F(tk + (1/2) h, ~xk + (1/2) ~K1)
~K3 = h · F(tk + (1/2) h, ~xk + (1/2) ~K2)
~K4 = h · F(tk + h, ~xk + ~K3)
We then have
x1' = x' = x2
x2' = x'' = x3
x3' = x''' = x4
· · ·
x_{n−1}' = x^(n−1) = xn
xn' = x^(n) = f(t, x1, x2, · · · , xn)
This is a system of 1st order ODEs.
Systems of high-order equations are treated in the same way.
Example Consider
x' = −a x,   x(0) = 1,   a > 0;
the exact solution is
x(t) = e^(−at).
We see that
(P1) x → 0 as t → +∞.
Euler's method gives x_{n+1} = (1 − a h) xn, so xn = (1 − a h)^n. For the numerical solution
to also satisfy
xn → 0 as n → +∞,
we must have
|1 − a h| < 1   ⇒   h < 2/a,
which gives a restriction on the time step size: it has to be sufficiently small.
Observations:
• They decay at a very different rate. The term e−39t tends to 0 much faster than
the term e−t ;
xn → 0, yn → 0 as n → +∞.
which implies
(1) : h < 2/39   and   (2) : h < 2.
We see that condition (1) is much stronger than condition (2), therefore it must be
satisfied.
Condition (1) corresponds to the term e−39t , which is the transient term and it tends
to 0 very quickly as t grows. Unfortunately, time step size is restricted by this transient
term.
Let
A = [ −20 −19 ; −19 −20 ],   ~x = (x, y)^t,   ~xn = (xn, yn)^t.
We can write
We have
λ1 (A) = −1, λ2 (A) = −39
They are both negative; therefore 1 − h λi > 1, implying
‖(I − hA)^(−1)‖ < 1.
Advantage: Can choose large h, always stable. Suitable for stiff systems.
Disadvantage: at every step one must solve the linear system
(I − hA) ~x_{n+1} = ~xn,
so each step takes longer computing time. Not recommended if the system is not stiff.