Iowa Numeric Computation Methods
Iowa Numeric Computation Methods
html
CALCULATION OF FUNCTIONS
p(x) = 1 + x
y
y = ex
2
y = p (x)
1
x
-1 1
Continue in this manner looking next for a quadratic
polynomial
p(x) = 1 + x + 12 x2
y
y = ex 2
y = p (x)
1
y = p (x)
2
x
-1 1
We continue this pattern, looking for a polynomial
p(x) = 1 + x + 12 x2 + + n!
1 xn
.25
x
-1 1
TAYLORS APPROXIMATION FORMULA
x
1 2
y = log x
y = p (x)
1
-1 y = p (x)
2
y = p (x)
3
THE TAYLOR POLYNOMIAL ERROR FORMULA
1 2 n xn+1
= 1 + x + x + + x +
1x 1x
x2 x4 x2m
cos x = 1 + + (1)m
2! 4! (2m)!
m x2m+2
+(1) cos cx
(2m + 2)!
x3 x5 x2m1
sin x = x + + (1)m1
3! 5! (2m 1)!
x2m+1
+(1)m cos cx
(2m + 1)!
with cx between 0 and x.
OBTAINING TAYLOR FORMULAS
x2 2 1 4 1 6 (1)n 2n
e = 1 x + x x + + x
2! 3! n!
(1)n+1 2n+2
+ x e x
(n + 1)!
Because ct must be between 0 and x2, we have it
must be negative. Thus we let ct = x in the error
term, with 0 x x2.
EVALUATING A POLYNOMIAL
xj = x xj1
n o
Then to compute x2, x3, ..., xn
will cost n 1 mul-
tiplications. Our algorithm becomes
poly = a0 + a1x
power = x
f or j = 2 : n
power = x power
poly = poly + aj power
end
The total operations cost is
additions : n
multiplications : n + n 1 = 2n 1
When n is evenly moderately large, this is much less
than for the first method of evaluating p(x). For ex-
ample, with n = 20, the first method has 210 multi-
plications, whereas the second has 39 multiplications.
We now considered nested multiplication. As exam-
ples of particular degrees, write
n = 2 : p(x) = a0 + x(a1 + a2x)
n = 3 : p(x) = a0 + x (a1 + x (a2 + a3x))
n = 4 : p(x) = a0 + x (a1 + x (a2 + x (a3 + a4x)))
These contain, respectively, 2, 3, and 4 multiplica-
tions. This is less than the preceding method, which
would have need 3, 5, and 7 multiplications, respec-
tively.
p(x) = b0 + (x z)q(x) ()
Proof: Simply expand
b0 + (x z) b1 + b2x + b3x2 + + bnxn1
and use the fact that
p(x) = (x z)q(x)
For the remaining roots of p(x), we can concentrate
on finding those of q(x). In rootfinding for polynomi-
als, this process of reducing the size of the problem is
called deflation.
Define
Z
1 x sin t
SF (x) = dt, x 6= 0
x 0 t
We use Taylor polynomials to approximate this func-
tion, to obtain a way to compute it with accuracy and
simplicity.
y
1.0
0.5
x
-8 -4 4 8
As an example, begin with the degree 3 Taylor ap-
proximation to sin t, expanded about t = 0:
1 1 5
sin t = t t3 + t cos ct
6 120
with ct between 0 and t. Then
sin t 1 1 4
= 1 t2 + t cos ct
Z x
t Z x
6 120
sin t 1 2 1 4
dt = 1 t + t cos ct dt
0 t 0 6 120
Z x
1 3 1
= x x + t4 cos ctdt
Z x
18 120 0
1 sin t 1 2
dt = 1 x + R2(x)
x 0 t 18
Zx
1 1
R2(x) = t4 cos ct dt
120 x
0
Z
1 x n t2n
R2n2(x) = (1) cos(ct) dt
x 0 (2n + 1)!
Z
1 x n t2n
R2n2(x) = (1) cos(ct) dt
x 0 (2n + 1)!
To simplify matters, let x > 0. Since |cos(ct)| 1,
Z
1 x t2n x2n
|R2n2(x)| dt =
x 0 (2n + 1)! (2n + 1)!(2n + 1)
It is easy to see that this bound is also valid for x < 0.
As required, choose the degree so that
|R2n2(x)| 5 109
From the error bound,
1
max |R2n2(x)|
|x|1 (2n + 1)!(2n + 1)
Choose n so that this upper bound is itself bounded
by 5 109. This is true if 2n + 1 11, i.e. n 5.
The polynomial is
x2 x4 x6 x8
p(x) = 1 + + , 1 x 1
3!3 5!5 7!7 9!9
and
x1 = x2 = x3 = 98765
There are keys on the calculator for the mean
n
1 X
= xj
n j=1
and the standard deviation s where
n 2
1 X
s2 = xj
n 1 j=1
In our case, what should these equal? In fact, the
calculator gives
.
= 98765 s = 1.58
Why?
2. A Fortran program example: Consider two pro-
grams run on a now extinct computer.
Program A:
Program B:
A = 1.0 + 2.0 (23)
SILLY = 0.0
B = A 1.0
P RIN T , A, B
EN D
Output: 1.0 0.0
x = 10e
with e an integer, 1 < 10, and = +1 or 1.
Thus
50
= (1.66666 )10 101, with = +1
3
x = 2e
with
1 < (10)2 = 2
and e an integer. For example,
(1111111)2 e (1111111)2
127 e 127
In actuality, the limits are
126 e 127
for reasons related to the storage of 0 and other num-
bers such as .
What is the connection of the 24 bits in the significand
to the number of decimal digits in the storage of
a number x into floating point form. One way of
answering this is to find the integer M for which
1. 0 < x M and x an integer implies f l(x) = x;
and
2. f l(M + 1) 6= M + 1
This integer M is at least as big as
1.11 1 223 = 223 + + 20
| {z }
23 10s 2
This sums to 224 1. In addition, 224 = (1.0 0)2
224 also stores exactly. What about 224 + 1? It does
not store exactly, as
1.0 01 224
| {z }
23 00s 2
Storing this would require 25 bits, one more than al-
lowed. Thus
M = 224 = 16777216
This means that all 7 digit decimal integers store ex-
actly, along with a few 8 digit integers.
THE MACHINE EPSILON
1+ >1
in the arithmetic of the machine.
1 = (1.00 0)2 20
What is the smallest number which can be added to
this without disappearing? Certainly we can write
Let
x = (1.a2 an )2 2e
with all ai equal to 0 or 1. Then for a chopped floating
point representation, we have
f l(x) < x
and the error x f l(x) is always positive. This later
has major consequences in extended numerical com-
putations. With x 6= f l(x) and rounding, the error
x f l(x) is negative for half the values of x, and it is
positive for the other half of possible values of x.
We often write the relative error as
x f l(x)
=
x
This can be expanded to obtain
f l(x) = (1 + )x
Thus f l(x) can be considered as a perturbed value
of x. This is used in many analyses of the eects of
chopping and rounding errors in numerical computa-
tions.
M = 224 = 16777216
PI=3.14159265358979D0
SOME DEFINITIONS
The error in xA is
error(xA) = xT xA
The relative error in xA is
error(xA) xT xA
rel(xA) = =
xT xT
Example: xT = e, xA = 19
7 . Then
19 .
error(xA) = e = .003996
7
rel(xA) = .00147
We also speak of significant digits. We say xA has m
significant digits with respect to xT if the magnitude
of error(xA) is 5 units in the (m + 1)st digit, be-
ginning with the first nonzero digit in xT . Above, xA
has 3 significant digits with respect to xT .
SOURCES OF ERROR
Example. Define
What happened?
Example. Define
1 cos x
f (x) = 2
, x 6= 0
x
Values for a sequence of decreasing positive values
of x is given in Section 2.2 of the text, using a past
model of a popular calculator. The calculator carried
10 decimal digits, and it used rounded arithmetic.
x Computed f (x) True f (x)
0.1 0.4995834700 0.4995834722
0.01 0.4999960000 0.4999958333
0.001 0.5000000000 0.4999999583
0.0001 0.5000000000 0.4999999996
0.00001 0.0 0.5000000000
Consider one case, that of x = .001. Then on the
calculator:
cos (.001) = .9999994999
1 cos (.001) = 5.001 107
1 cos (.001)
2 = .5001000000
(.001)
The true answer is f (.001) = .4999999583. The rel-
ative error in our answer is
.4999999583 .5001 .0001000417 .
= = .0002
.4999999583 .4999999583
There 3 significant digits in the answer. How can such
a straightforward and short calculation lead to such a
large error (relative to the accuracy of the calculator)?
When two numbers are nearly equal and we subtract
them, then we suer a loss of significance error in
the calculation. In some cases, these can be quite
subtle and dicult to detect. And even after they are
detected, they may be dicult to fix.
(5)3 125
= 20.83
3! 6
in the 4 digit decimal calculation, with an error of
magnitude 0.00333 . . . Note that this error in an in-
termediate step is of same magnitude as the true an-
swer 0.006738 being sought. Other similar errors
are present in calculating other coecients, and thus
they cause a major error in the final answer being cal-
culated.
0 x
-2
-4
-6
-8
0.99998 1.00000 1.00002
UNDERFLOW ERRORS
Consider evaluating
f (x) = x10
for x near 0. When using IEEE single precision arith-
metic, the smallest nonzero positive number express-
ible in normalized floating-point format is
.
m = 2126 = 1.18 1038;
see the table on IEEE single precision arithmetic with
E = 1 and (a1a2 . . . a23)2 = (00 . . . 0)2.Thus f (x)
will be set to zero if
x10 < m
1 .
|x| < m 10 = 1.61 104
0.000161 < x < 0.000161
OVERFLOW ERRORS
xA yA = f l (xAyA)
implies
x2 26x + 1 = 0
Using the quadratic formula, we have the true answers
(1) (2)
rT = 13 + sqrt(168), rT = 13 sqrt(168)
From a table of square roots, we take
.
sqrt(168) = 12.961
Since this is correctly rounded to 5 digits, we have
S = a1 + a2 + + an
with a sequence of machine numbers {a1, ..., an}. Should
we add from largest to small, should we add from
smallest to largest, or should we just add the numbers
based on their original given order? In other words,
does it matter how we calculate the sum?
f l(x) = (1 + ) x
For bounds on for a binary floating point represen-
tation with N binary digits in the mantissa, we have
2N 2N , rounding
2N +1 0, chopping
We use these results as tools for analyzing the error
in computing the sum S.
We create the sum S by a sequence simple additions.
Define
S2 = f l (a1 + a2) = (1 + 2) (a1 + a2)
S3 = f l(S2 + a3) = (1 + 3) (S2 + a3)
S4 = f l(S3 + a4) = (1 + 4) (S3 + a4)
..
Sn = f l(Sn1 + an) = (1 + n) (Sn1 + an)
This says each simple addition is performed exactly,
following which it is rounded or chopped back to the
precision of the machine. All of the numbers j satisfy
the inequalities given earlier.
For example,
S2 = (a1 + a2) + 2 (a1 + a2)
S3 = (1 + 3) [a3 + (1 + 2) (a1 + a2)]
= (a1 + a2 + a3) + (a1 + a2) (2 + 3)
+a33 + 23 (a1 + a2)
.
= (a1 + a2 + a3) + (a1 + a2) (2 + 3) + a33
Continue in this manner to get
.
Sn = (a1 + a2 + + an)
+ (a1 + a2) (2 + 3 + + n)
+a3 (3 + + n)
+a4 (4 + + n)
+ + ann
Using this yields the formula given earlier for S Sn.
Consider now the formula
S Sn = a1 (2 + + n)
a2 (2 + + n)
a3 (3 + + n)
a4 (4 + + n)
..
ann
and what it suggests as a method for adding the num-
bers a1, ..., an.
(n 1) (n 1) 2N
Thus (n 1) will grow in a manner proportional
to n. By more advanced arguments, it can be shown
that for the case of rounded arithmetic, (n 1) will
grow in a manner proportional to sqrt(n), which grows
much slower than n.
EXTENDED PRECISION ARITHMETIC
e
| |
We further assume the function f (x) changes sign on
[a, b], with
f (a) f (b) < 0
Algorithm Bisect(f, a, b, ). Step 1 : Define
1
c= (a + b)
2
Step 2 : If b c , accept c as our root, and then
stop.
Step 3 : If b c > , then check compare the sign of
f (c) to that of f (a) and f (b). If
sign(f (b)) sign(f (c)) 0
then replace a with c; and otherwise, replace b with
c. Return to Step 1.
f (x) x6 x 1 = 0
accurate to within = 0.001. With a graph, it is easy
to check that 1 < < 2. We choose a = 1, b =
2; then f (a) = 1, f (b) = 61, and the requirement
f (a) f (b) < 0 is satisfied. The results from Bisect are
shown in the table. The entry n indicates the iteration
number n.
n a b c bc f (c)
1 1.0000 2.0000 1.5000 0.5000 8.8906
2 1.0000 1.5000 1.2500 0.2500 1.5647
3 1.0000 1.2500 1.1250 0.1250 0.0977
4 1.1250 1.2500 1.1875 0.0625 0.6167
5 1.1250 1.1875 1.1562 0.0312 0.2333
6 1.1250 1.1562 1.1406 0.0156 0.0616
7 1.1250 1.1406 1.1328 0.0078 0.0196
8 1.1328 1.1406 1.1367 0.0039 0.0206
9 1.1328 1.1367 1.1348 0.0020 0.0004
10 1.1328 1.1348 1.1338 0.00098 0.0096
Recall the original example with the function.
h i h i
Nin
f (r) = Pin (1 + r) 1 Pout 1 (1 + r)Nout
Checking, we see that f (0) = 0. Therefore, with a
graph of y = f (r) on [0, 1], we see that f (x) < 0 if
we choose x very small, say x = .001. Also f (1) > 0.
Thus we choose [a, b] = [.001, 1]. Using = .000001
yields the answer
e = .02918243
with an error bound of
| cn|
Recall that
n
1
| cn| (b a)
2
Then ensure the error bound is true by requiring and
solving
n
1
(b a)
2
n
1
(4.5 1.6) .00005
2
Dividing and solving for n, we have
2.9
n log = 15.82
.00005
Therefore, we need to take n = 16 iterates.
ADVANTAGES AND DISADVANTAGES
(x ,f(x ))
0 0
x
x x
1 0
y = b - 1/x
x
1
x
x 1/b 2/b
0
(x ,f(x ))
0 0
Let x0 be an estimate of the root = 1/b. Then the
line tangent to the graph of y = f (x) at (x0, f (x0))
is given by
f (x0)
x1 = x0 0
f (x0)
For our particular case, this yields
1
b
x0
x1 = x0 = x0 bx20 + x0
1
x20
x1 = x0 (2 bx0)
x1 = x0 (2 bx0)
Note that no division is used in our final formula.
|r0| < 1
This is equivalent to saying
1 < 1 bx0 < 1
2
0 < x0 <
b
A look at a graph of f (x) b x1 will show the
reason for this condition. If x0 is chosen greater than
2 , then x will be negative, which is unacceptable.
b 1
The interval
2
0 < x0 <
b
is called the interval of convergence. With most
equations, we cannot find this exactly, but rather only
some smaller subinterval which guarantees convergence.
2 and r = be , we have
Using rn+1 = rn n n
ben+1 = (ben)2
en+1 = be2n
xn+1 = b ( xn)2
Methods with this type of error behaviour are said to
be quadratically convergent; and this is an especially
desirable behaviour.
To see why, consider the relative errors in the above.
Assume the initial guess x0 has been so chosen that
r0 = .1. Then
The iteration
f (x) x6 x 1 = 0
for its positive root . An initial guess x0 can be
generated from a graph of y = f (x). The iteration is
given by
x6n xn 1
xn+1 = xn , n0
6x5n 1
We use an initial guess of x0 = 1.5.
The column xn xn1 is an estimate of the error
xn1; justification is given later.
M ( xn+1) [M ( xn)]2
M ( xn+1) [M ( xn)]2
Then we want these quantities to decrease; and this
suggests choosing x0 so that
|M ( x0)| < 1
1 0
2f ()
| x0| < = 00
|M| f ()
If |M| is very large, then we may need to have a very
good initial guess in order to have the iterates xn
converge to .
ADVANTAGES & DISADVANTAGES
(x ,f(x ))
0 0
x x
1 2
x x
0
(x ,f(x ))
1 1
y
y=f(x)
(x0,f(x0))
(x ,f(x ))
1 1
x
x x x
2 1 0
(x1 x) f (x0) + (x x0) f (x1)
q(x) =
x1 x0
This is linear in x; and by direction evaluation, it satis-
fies the interpolation conditions of (*). We now solve
the equation q(x) = 0, denoting the root by x2. This
yields
f (x1) f (x0)
x2 = x1 f (x1)
x1 x0
We can now repeat the process. Use x1 and x2 to
produce another secant line, and then uses its root
to approximate . This yields the general iteration
formula
f (xn) f (xn1)
xn+1 = xnf (xn) , n = 1, 2, 3...
xn xn1
This is called the secant method for solving f (x) = 0.
Example We solve the equation
f (x) x6 x 1 = 0
which was used previously as an example for both the
bisection and Newton methods. The quantity xn
xn1 is used as an estimate of xn1. The iterate
x8 equals rounded to nine significant digits. As with
Newtons method for this equation, the initial iterates
do not converge rapidly. But as the iterates become
closer to , the speed of convergence increases.
0 f (xn) f (xn1)
f (xn)
xn xn1
CONVERGENCE ANALYSIS
| xn+1| c | xn|r
This looks very much like the Newton result
f 00()
xn+1 M ( xn)2 , M=
2f 0()
and c = |M |r1. Both the secant and Newton meth-
ods converge at faster than a linear rate, and they are
called superlinear methods.
The secant method converge slower than Newtons
method; but it is still quite rapid. It is rapid enough
that we can prove
|xn+1 xn|
lim =1
n | xn|
and therefore,
1. It is guaranteed to converge.
2. It has an error bound which will converge to zero
in practice.
3. For most problems f (x) = 0, with f (x) dieren-
tiable about the root , the method behaves like the
secant method.
4. In the worst case, it is not too much worse in its
convergence than the bisection method.
y = 1 + .5sin x
y=x
y = 3 + 2sin x
y=x
x
E1: x = 1 + .5 sin x
E2: x = 3 + 2 sin x
E1 E2
n xn xn
0 0.00000000000000 3.00000000000000
1 1.00000000000000 3.28224001611973
2 1.42073549240395 2.71963177181556
3 1.49438099256432 3.81910025488514
4 1.49854088439917 1.74629389651652
5 1.49869535552190 4.96927957214762
6 1.49870092540704 1.06563065299216
7 1.49870112602244 4.75018861639465
8 1.49870113324789 1.00142864236516
9 1.49870113350813 4.68448404916097
10 1.49870113351750 1.00077863465869
The above iterations can be written symbolically as
E1: xn+1 = 1 + .5 sin xn
E2: xn+1 = 3 + 2 sin xn
for n = 0, 1, 2, ... Why does one of these iterations
converge, but not the other? The graphs show similar
behaviour, so why the dierence.
axb a g(x) b
0
max g (x) < 1
axb
Then:
S1. The equation x = g(x) has a unique solution
in [a, b].
S2. For any initial guess x0 in [a, b], the iteration
x2 = g(x1)
belongs to [a, b]. This can be continued by induction
to show that every xn belongs to [a, b].
| xn| n | xn| , n0
With some extra manipulation, we can obtain the error
bound in S3.
The statements
xn+1 = 1 + .5 sin xn
will converge for E1.
xn+1 = 3 + 2 sin xn
will diverge for E2.
Corollary: Assume x = g(x) has a solution , and
further assume that both g(x) and g 0(x) are contin-
uous for all x in some interval about . In addition,
assume
0
g () < 1 (**)
Then any suciently small number > 0, the interval
[a, b] = [ , + ] will satisfy the hypotheses of
the preceding theorem.
xn = g(xn1), n = 1, 2, ...
Thus
xn ( xn1) (***)
with = g 0() and || < 1.
From
xn
lim = g 0()
n x
n1
we have
xn
xn1
Unfortunately this also involves the unknown root
which we seek; and we must find some other way of
estimating .
Step 1: Select x0
Step 2: Calculate
x1 = g(x0), x2 = g(x1)
Step3: Calculate
2 x2 x1
x3 = x2 + [x2 x1] , 2 =
1 2 x1 x0
Step 4: Calculate
x4 = g(x3), x5 = g(x4)
and calculate x6 as the extrapolate of {x3, x4, x5}.
Continue this procedure, ad infinatum.
x6 = 1.23 102
GENERAL COMMENTS
xn+1 = g(xn)
This shows the power of understanding the behaviour
of the error in a numerical process. From that un-
derstanding, we can often improve the accuracy, thru
extrapolation or some other procedure.
f (x) = (x )m h(x)
m times to obtain an equivalent formulation of what
it means for the root to have multiplicity m.
For an example, consider the case
f (x) = (x )3 h(x)
Then
f 0(x)= 3 (x )2 h(x) + (x )3 h0(x)
(x )2 h2(x)
h2(x) = 3h(x) + (x ) h0(x)
h2() = 3h() 6= 0
This shows is a root of f 0(x) of multiplicity 2.
f 00(x) = (x ) h3(x)
for a suitably defined h3(x) with h3() 6= 0, and is
a simple root of f 00(x).
Dierentiating a third time, we have
f 000() = h3() 6= 0
We can use this as part of a proof of the following:
is a root of f (x) of multiplicity m = 3 if and only if
f () = = f (m1)() = 0, f (m)() 6= 0
DIFFICULTIES OF MULTIPLE ROOTS
f (x) = (x )m h(x)
to obtain
(x )m h(x)
g(x) = x
m (x )m1 h(x) + (x )m h0(x)
(x ) h(x)
= x
mh(x) + (x ) h0(x)
Then we can use this to show
1 m1
g 0() = 1 =
m m
For m > 1, this is nonzero, and therefore Newtons
method is only linearly convergent:
m1
xn+1 ( xn) , =
m
Similar results hold for the secant method.
There are ways of improving the speed of convergence
of Newtons method, creating a modified method that
is again quadratically convergent. In particular, con-
sider the fixed point iteration formula
f (x)
xn+1 = g(xn), g(x) = x m 0
f (x)
in which we assume to know the multiplicity m of
the root being sought. Then modifying the above
argument on the convergence of Newtons method,
we obtain
1
g 0() = 1 m =0
m
and the iteration method will be quadratically conver-
gent.
y y
x x
f (m1)(x) = 0
since is a simple root of this equation.
STABILITY
F(()) = 0
for all small values of . Dierentiate this as a function
of and using the chain rule. Then we obtain
F0 (()) = f 0(())0()
+g(()) + g 0(())0() = 0
for all small . Substitute = 0, recall (0) = 0,
and solve for 0(0) to obtain
f 0(0)0(0) + g(0) = 0
g( )
0(0) = 0 0
f (0)
This then leads to
() (0) + 0(0)
g( ) (*)
= 0 0 0
f (0)
Example: In our earlier polynomial example, consider
the simple root 0 = 3. Then
36 .
() 3 = 3 15.2
48
With = .002, we obtain
.
(.002) 3 15.2(.002) = 3.0304
This is close to the actual root of 3.0331253.
y = cos(x)
y = p (x)
2
x
/4 /2
PURPOSES OF INTERPOLATION
y=tan(x)
1 1.3 x
y = tan(x)
y = p (x)
1
x
1.1 1.2
QUADRATIC INTERPOLATION
P2(xi) = yi, i = 0, 1, 2
for given data points (x0, y0) , (x1, y1) , (x2, y2). One
formula for such a polynomial follows:
The functions
(xx )(xx ) (xx )(xx )
L0(x) = (x x1)(x x2 ) , L1(x) = (x x0)(x x2 )
0 1 0 2 1 0 1 2
(xx )(xx )
L2(x) = (x x0)(x x1 )
2 0 2 1
are called Lagrange basis functions for quadratic in-
terpolation. They have the properties
(
1, i = j
Li(xj ) =
0, i =
6 j
for i, j = 0, 1, 2. Also, they all have degree 2. Their
graphs are on an accompanying page.
Introduce
R(x) = P2(x) Q(x)
From the properties of P2 and Q, we have deg(R)
2. Moreover,
R(xi) = P2(xi) Q(xi) = yi yi = 0
for all three node points x0, x1, and x2. How many
polynomials R(x) are there of degree at most 2 and
having three distinct zeros? The answer is that only
the zero polynomial satisfies these properties, and there-
fore
R(x) = 0 for all x
x1 x0 = x2 x1
EXAMPLE
We can show
yi = f (xi), i = 0, 1, ..., n
Using the divided dierences
Let
d1 = f [x0, x1]
d2 = f [x0, x1, x2]
..
dn = f [x0, ..., xn]
Then the formula
Pn(x) = f (x0) + f [x0, x1] (x x0)
+f [x0, x1, x2] (x x0) (x x1)
+f [x0, x1, x2, x3] (x x0) (x x1) (x x2)
+
+f [x0, ..., xn] (x x0) (x xn1)
can be written as
Pn(x) = f (x0) + (x x0) (d1 + (x x1) (d2 +
+(x xn2) (dn1 + (x xn1) dn) )
Thus we have a nested polynomial evaluation, and
this is quite ecient in computational cost.
ERROR IN LINEAR INTERPOLATION
(x x0) (x1 x) 0, x0 cx x1
(x x0) (x1 x) 0, x0 cx x1
and therefore
" #
log10 e
(x x0) (x1 x) 2 log10 x P1(x)
2x1
" #
log10 e
(x x0) (x1 x)
2x20
Pn(xi) = f (xi), i = 0, 1, , n
with distinct node points {x0, ..., xn} and a given
function f (x). Let [a, b] be a given interval on which
f (x) is (n + 1)-times continuously dierentiable; and
assume the points x0, ..., xn, and x are contained in
[a, b]. Then
(x x0) (x x1) (x xn) (n+1)
f (x)Pn(x) = f (cx)
(n + 1)!
with cx some point between the minimum and maxi-
mum of the points in {x, x0, ..., xn}.
(x x0) (x x1) (x xn) (n+1)
f (x)Pn(x) = f (cx)
(n + 1)!
As shorthand, introduce
For n = 2, we have
(x x0) (x x1) (x x2) (3)
f (x) P2(x) = f (cx)
3!
(*)
with cx some point between the minimum and maxi-
mum of the points in {x, x0, x1, x2}.
x1 = x0 + h, x2 = x1 + h
Further suppose we have x0 x x2, as we would
usually have when interpolating in a table of given
function values (e.g. log10 x). The quantity
2(x) = (x + h) x (x h)
using (x0, x1, x2) = (h, 0, h):
h
x
-h
In the formula (), however,
we do not know cx, and
therefore we replace f (3) (cx) with a maximum of
(3)
f (x) as x varies over x0 x x2. This yields
|2(x)|
(3)
|f (x) P2(x)| max f (x) (**)
3! x0xx2
If we want a uniform bound for x0 x x2, we must
compute
n(x) = x (x h) (x 2h) (x 1)
Our graphs are the cases of n = 2, ..., 9.
y n=2 y n=3
1
x
1
x
y n=4 y n=5
1
x
1
x
1
x
1
x
y n=8 y n=9
1
x
1
x
x
x x x x x x x
0 1 2 3 4 5 6
Using the following table
,
n Mn n Mn
1 1.25E1 6 4.76E7
2 2.41E2 7 2.20E8
3 2.06E3 8 9.11E10
4 1.48E4 9 3.39E11
5 9.01E6 10 1.15E12
y=P (x)
10
2
y=1/(1+x )
x
OTHER CHOICES OF NODES
f (xn+1) = Pn(xn+1)
+f [x0, ..., xn, xn+1] (xn+1 x0) (xn+1 xn)
In this formula, the number xn+1 is completely ar-
bitrary, other than being distinct from the points in
{x0, ..., xn}. To emphasize this fact, replace xn+1 by
x throughout the formula, obtaining
f (x) = Pn(x) + f [x0, ..., xn, x] (x x0) (x xn)
= Pn(x) + n(x) f [x0, ..., xn, x]
provided x 6= x0, ..., xn.
The formula
f (x) = Pn(x) + f [x0, ..., xn, x] (x x0) (x xn)
= Pn(x) + n(x) f [x0, ..., xn, x]
is easily true for x a node point. Provided f (x) is
dierentiable, the formula is also true for x a node
point.
This shows
n(x) (n+1)
f (x) Pn(x) = f (c)
(n + 1)!
Then
n(x) (n+1)
n(x) f [x0, ..., xn, x] = f (c)
(n + 1)!
f (n+1) (c)
f [x0, ..., xn, x] =
(n + 1)!
for some c between the smallest and largest of the
numbers in {x0, ..., xn, x}.
f (m) (c)
f [x0, ..., xm1, xm] =
m!
with c an unknown number between the smallest and
largest of the numbers in {x0, ..., xm}. This was given
in an earlier lecture where divided dierences were in-
troduced.
PIECEWISE POLYNOMIAL INTERPOLATION
x
1 2 3 4
x
1 2 3 4
x
1 2 3 4
Polynomial Interpolation
x
1 2 3 4
Let data points (x1, y1), ..., (xn, yn) be given, as let
Define
(
(x )3 , x
(x )3+ =
0, x
This is a cubic spline function on (, ) with the
single breakpoint x1 = .
Define
n
X 3
s(x) = p3(x) + aj x xj
+
j=1
with p3(x) some cubic polynomial. Then s(x) is a
cubic spline function on (, ) with breakpoints
{x1, ..., xn}.
Return to the earlier problem of choosing an interpo-
lating function s(x) to minimize the integral
Z xn 2
00
s (x) dx
x1
There is a unique solution to problem. The solution
s(x) is a cubic interpolating spline function, and more-
over, it satisfies
s00(x1) = s00(xn) = 0
Spline functions satisfying these boundary conditions
are called natural cubic spline functions, and the so-
lution to our minimization problem is a natural cubic
interpolatory spline function. We will show a method
to construct this function from the interpolation data.
Mi = s00(xi), i = 1, 2, 3, 4
Then on [x1, x2],
(x2 x) M1 + (x x1) M2
s00(x) = , x1 x x2
x2 x1
We can find s(x) by integrating twice:
For x2 x x3,
x2 = x1 + h, x3 = x1 + 2h, x4 = x1 + 3h
Then our earlier formulas simplify to
h 2h h
M1 + M2 + M3
6 3 6
y y2 y2 y1
= 3
h h
h 2h h
M2 + M3 + M4
6 3 6
y y3 y3 y2
= 4
h h
This gives us two equations in four unknowns. The
earlier boundary conditions on s00(x) gives us immedi-
ately
M1 = M4 = 0
Then we can solve the linear system for M2 and M3.
EXAMPLE
Similarly, for 2 x 3,
1 1 1 1
s(x) = (x 2)3 + (x 2)2 (x 1) +
12 4 3 2
and for 3 x 4,
1 1
s(x) = (x 4) +
12 4
x 1 2 3 4
y 1 1 1 1
2 3 4
y = 1/x
1 y = s(x)
0.8
0.6
0.4
0.2
0 x
0 0.5 1 1.5 2 2.5 3 3.5 4
x
1 2 3 4
xj = x1 + (j 1) h, j = 1, ..., n
We have that the interpolating spline s(x) on
xj x xj+1 is given by
3 3
xj+1 x Mj + x xj Mj+1
s(x) =
6h
xj+1 x yj + x xj yj+1
+
h
h h i
xj+1 x Mj + x xj Mj+1
6
for j = 1, ..., n 1.
To enforce continuity of s0(x) at the interior
n o node
points x2, ..., xn1, the second derivatives Mj must
satisfy the linear equations
h 2h h yj1 2yj + yj+1
Mj1 + Mj + Mj+1 =
6 3 6 h
for j = 2, ..., n 1. Writing them out,
h 2h h y 2y2 + y3
M1 + M2 + M3 = 1
6 3 6 h
h 2h h y 2y3 + y4
M2 + M3 + M4 = 2
6 3 6 h
..
h 2h h y 2yn1 + yn
Mn2 + Mn1 + Mn = n2
6 3 6 h
This is a system of n 2 equations in the n unknowns
{M1, ..., Mn}. Two more conditions must be imposed
on s(x) in order to have the number of equations equal
the number of unknowns, namely n. With the added
boundary conditions, this form of linear system can be
solved very eciently.
BOUNDARY CONDITIONS
with y10 , yn
0 given slopes for the endpoints of s(x) on
[x1, xn]. This has many quite good properties when
compared with the natural cubic interpolating spline;
but it does require knowing the derivatives at the end-
points.
x
1 2 3 4
yi = f (xi), j = 1, ..., n
Let sn(x) denote the cubic spline interpolating this
data and satisfying the not a knot boundary con-
ditions. Then it can be shown that for a suitable
constant c,
n En E 1 n/En
2
7 7.09E3
13 3.24E4 21.9
25 3.06E5 10.6
49 1.48E6 20.7
97 9.04E8 16.4
BEST APPROXIMATION
x
y=e
y=t (x)
1
y=m (x) 1
1
x
-1 1
0.0516
x
-1 1
y
0.00553
x
-1 1
-0.00553
[(b a)/2]n+1
(n+1)
n(f ) n
max f (x)
(n + 1)!2 axb
This error bound does not always become smaller with
increasing n, but it will give a fairly accurate bound
for many common functions f (x).
x
-1 1
T (x)
0
T (x)
1
T (x)
2
-1
T (x)
3
T (x)
4
y
x
-1 1
-1
The triple recursion relation. Recall the trigonomet-
ric addition formulas,
T0(x) = 1, T1(x) = x
Let n = 2. Then
|Tn(x)| 1, 1 x 1 (5)
for all n 0. Also, note that
max |xn| = 1
1x1
Thus xn is a monic polynomial whose size does not
change with increasing n.
e T4(x) 1 4 8x2 + 1)
(x) = T4(x) = 3
= (8x (4)
2 8
and the smallest value of (3) is 1/23 in this case. The
equation (4) defines implicitly the nodes {x0, x1, x2, x3}:
they are the roots of T4(x).
3 5 7
4 = , , , , . . .
2 2 2 2
3 5 7
= , , , ,...
8 8 8
8
3 5
x = cos , cos , cos ,... (5)
8 8 8
using cos() = cos().
3 5 7
x = cos , cos , cos , cos ,...
8 8 8 8
The first four values are distinct; the following ones
are repetitive. For example,
9 7
cos = cos
8 8
The first four values are
x
-1 1
-0.00624
. .
For comparison, E(t3) = 0.0142 and 3(ex) = 0.00553.
THE GENERAL CASE
e e1 .
0 = = 1.1752
2
.
1 = 3e1 = 1.1036
Using these values for 0 and 1, we denote the re-
sulting linear approximation by
1(x) = 0 + 1x
It is called the best linear approximation to ex in the
sense of least squares. For the error,
.
max |ex 1(x)| = 0.439
1x1
Errors in linear approximations of ex:
x
y=e
y=l (x) 1
1
x
-1 1
p(x) = 0 + 1x + + nxn
Z1 " #2
f (x) 0 1x
g(0, 1, . . . , n) dx
nxn
1
Find coecients 0, 1, . . . , n to minimize this in-
tegral. The integral g(0, 1, . . . , n) is a quadratic
polynomial in the n + 1 variables 0, 1, . . . , n.
P0(x) = 1
1 dn h 2 ni
Pn(x) = n
n x 1 , n = 1, 2, . . .
n!2 dx
For example,
P1(x) = x
1 2
P2(x) = 3x 1
2
1 3
P3(x) = 5x 3x
2
1 4 2
P4(x) = 35x 30x + 3
8
The Legendre polynomials have many special proper-
ties, and they are widely used in numerical analysis
and applied mathematics.
y
x
-1 1
P (x)
1
P (x)
2
P (x)
3
P (x)
4
-1
deg Pn = n, Pn(1) = 1, n0
(f p, f p) (3)
We begin by writing p(x) in the form
n
X
p(x) = j Pj (x)
j=0
n
X
p(x) = j Pj (x)
j=0
Substitute into (3), obtaining
ge ( 0
, 1, . . . , n) (f p, f p)
n
X n
X
= f j Pj , f iPi
j=0 i=0
Expand this into the following:
n (f, P )2
X j
ge = (f, f )
j=0 (Pj , Pj )
n " #2
X (f, Pj )
+ Pj , Pj j
j=0 (Pj , Pj )
Looking at this carefully, we see that it is smallest
when
(f, Pj )
j = , j = 0, 1, . . . , n
(Pj , Pj )
Looking at this carefully, we see that it is smallest
when
(f, Pj )
j = , j = 0, 1, . . . , n
(Pj , Pj )
The minimum for this choice of coecients is
n (f, P )2
X j
ge = (f, f )
j=0 (Pj , Pj )
We call
n (f, P )
X j
n(x) = Pj (x) (4)
j=0 (Pj , Pj )
the least squares approximation of degree n to f (x)
on [1, 1].
j 0 1 2 3
j 2.35040 0.73576 0.14313 0.02013
0.0112
x
-1 1
-0.00460
Consider evaluating
Z 1
x2
I= e dx
0
Use
1 t2 + + 1 tn +
et = 1 + t + 2! 1 tn+1ect
n! (n+1)!
2 1 x4 + + 1 x2n + 1 x2n+2edx
ex = 1 + x2 + 2! n! (n+1)!
y=f(x)
y=p1(x)
x
a b
Illustrating I T1(f )
Example.
Z /2 h i
sin x dx sin 0 + sin
0 4 2
=.
= 4 .785398
Error = .215
HOW TO OBTAIN GREATER ACCURACY?
y=f(x)
x
a=x x x b=x
0 1 2 3
Illustrating I T3(f )
THE TRAPEZOIDAL RULE
Rb
We want to approximate I = a f (x) dx using quadratic
interpolation of f (x). Interpolate f (x) at the points
{a, c, b}, with c = 12 (a + b). Also let h = 12 (b a).
The quadratic interpolating polynomial is given by
(x c) (x b) (x a) (x b)
P2(x) = 2
f (a) + 2
f (c)
2h h
(x a) (x c)
+ f (b)
2h2
Replacing f (x) by P2(x), we obtain the approximation
Z b Z b
f (x) dx P2(x) dx
a a
= h3 [f (a) + 4f (c) + f (b)] S2(f )
This is called Simpsons rule.
y
y=f(x)
x
a (a+b)/2 b
Illustration of I S2(f )
Example.
Z /2 h i
/2
sin x dx 3 sin 0 + 4 sin 4 + sin 2
0
.
= 1.00227987749221
Error = 0.00228
SIMPSONS RULE
Z /2
Approximate sin x dx. The Simpson rule results
0
are as follows.
n Sn(f ) Error Ratio
2 1.00227987749221 2.28E3
4 1.00013458497419 1.35E4 16.94
8 1.00000829552397 8.30E6 16.22
16 1.00000051668471 5.17E7 16.06
32 1.00000003226500 3.23E8 16.01
64 1.00000000201613 2.02E9 16.00
128 1.00000000012600 1.26E10 16.00
256 1.00000000000788 7.88E12 16.00
512 1.00000000000049 4.92E13 15.99
0 2x 00 2 + 6x2
f (x) = 2 , f (x) = 3
1 + x2 1 + x2
From a graph of f 00(x),
00
max f (x) = 2
0x2
Recall that b a = 2. Therefore,
h2 (b a)
EnT (f ) = f 00 (cn)
12
h2 (2) h2
T
En (f ) 2=
12 3
T h2 (b a) 00
En (f ) = f (cn)
12
h22 h2
T
En (f ) 2=
12 3
00
We bound f (cn) since we do not know cn, and
therefore we must assume the worst possible case, that
which makes the error formula largest. That is what
has been done above.
When do we have
T
En (f ) 5 106 (1)
To ensure this, we choose h so small that
h2
5 106
3
This is equivalent to choosing h and n to satisfy
h .003873
2
n= 516.4
h
Thus n 517 will imply (1).
DERIVING THE ERROR FORMULA
T h3 00 h3 00
En (f ) = f ( 1) f ( n)
12 12
This formula can be further simplified, and we will do
so in two ways.
Rewrite this error as
" #
3 00 00
h n f ( 1) + + f ( n)
EnT (f ) =
12 n
Denote the quantity inside the brackets by n. This
number satisfies
f 00(cn) = n
Recall also that hn = b a. Then
" #
3 00 00
h n f ( 1) + + f ( n)
EnT (f ) =
12 n
h2 (b a) 00
= f (cn)
12
This is the error formula given on the first slide.
AN ERROR ESTIMATE
T h2 h 00 00
i
En (f ) = f ( 1)h + + f ( n)h
12
The quantity
T h2 h 00 00
i
En (f ) = f ( 1)h + + f ( n)h
12
we have
T h2 h 0 0
i
e T (f )
En (f ) f (b) f (a) En
12
This is a computable estimate of the error in the nu-
merical integration. It is called an asymptotic error
estimate.
Example. Consider evaluating
Z
x e + 1 .
I(f ) = e cos x dx = = 12.070346
0 2
In this case,
f 0(x) = ex [cos x sin x]
f 00(x) = 2ex sin x
max f 00(x) = f 00 (.75) = 14. 921
0x
Then
h 2 (b a)
EnT (f ) = f 00 (cn)
12
h2
T
En (f ) 14.921 = 3.906h2
12
Also
h2
e T
En (f ) = 0 0
f () f (0)
12
h2 .
= [e + 1] = 2.012h2
12
In looking at the table (in a separate file on website)
for evaluating the integral I by the trapezoidal rule,
we see that the error EnT (f ) and the error estimate
Ee T (f ) are quite close. Therefore
n
h2
I(f ) Tn(f ) [e + 1]
12
h2
I(f ) Tn(f ) + [e + 1]
12
This last formula is called the corrected trapezoidal
rule, and it is illustrated in the second table (on the
separate page). We see it gives a much smaller er-
ror for essentially the same amount of work; and it
converges much more rapidly.
In general,
h2 0
I(f ) Tn(f ) f (b) f 0(a)
12
h2 0
I(f ) Tn(f ) f (b) f 0(a)
12
This is the corrected trapezoidal rule. It is easy to
obtain from the trapezoidal rule, and in most cases,
it converges more rapidly than the trapezoidal rule.
SIMPSONS RULE ERROR FORMULA
T h2 h 0 0
i
e T (f )
En (f ) f (b) f (a) En
12
EXAMPLE
Consider evaluating
Z 2
dx
I=
0 1 + x2
using Simpsons rule Sn(f ). How large should n be
chosen in order to ensure that
S
En (f ) 5 106
4h4
5 106
15
h .0658
n 30.39
Therefore, choosing n 32 will give the desired er-
ror bound. Compare this with the earlier trapezoidal
example in which n 517 was needed.
000 x2 1
f (x) = 24x 4
1+x 2
h 4
e S
En (f ) 000 000
f (2) f (0)
180
h4 144 4 4
= = h
180 625 3125
INTEGRATING sqrt(x)
To estimate p, we use
I2n In
2p
I4n I2n
To see this, write
I2n In (I In) (I I2n)
=
I4n I2n (I I2n) (I I4n)
Then substitute from the following and simplify:
c
I In p
n
c
I I2n p p
2 n
c
I I4n p p
4 n
Example. Consider the following table of numerical
integrals. What is its order of convergence?
n In In I 1 n Ratio
2
2 .28451779686
4 .28559254576 1.075E 3
8 .28570248748 1.099E 4 9.78
16 .28571317731 1.069E 5 10.28
32 .28571418363 1.006E 6 10.62
64 .28571427643 9.280E 8 10.84
It appears
. .
2p = 10.84, p = log2 10.84 = 3.44
We could now combine this with Richardsons error
formula to estimate the error:
1
I In p In I 1 n
2 1 2
For example,
1
I I64 [9.280E 8] = 9.43E 9
9.84
PERIODIC FUNCTIONS
w1 = w2 = 1, 1 ,
x1 = sqrt(3) 1
x2 = sqrt(3)
This yields the formula
Z 1
1
f (x) dx f sqrt(3) 1
+ f sqrt(3) (1)
1
We say it has degree of precision equal to 3 since it
integrates exactly all polynomials of degree 3. We
can verify directly that it does not integrate exactly
f (x) = x4.
Z 1
x4 dx = 25
1
1
f sqrt(3) 1
+ f sqrt(3) = 29
EXAMPLE Integrate
Z 1
dx .
= log 2 = 0.69314718
1 3 + x
The formula (1) yields
1 1
+ = 0.69230769
3 + x1 3 + x2
Error = .000839
THE GENERAL CASE
wi > 0
for all n > 0. This is considered a very desirable
property from a practical point of view. Moreover, it
permits us to develop a useful error formula.
CHANGE OF INTERVAL
OF INTEGRATION
x = (1 + t) 2
Then
Z Z 1
F (x) dx = 2
F (1 + t) 2 dt
0 1
EXAMPLE Consider again the integrals used as ex-
amples in Section 5.1:
Z 1
2 .
I (1) = ex dx = .74682413281234
0
Z 4
dx
I (2) = 2
= arctan 4
01+x
Z 2
dx 2
I (3) = =
0 2 + cos x sqrt(3)
f (2n)(cn)
En(f ) = en
(2n)!
22n+1 (n!)4
en = 2 n
(2n + 1) [(2n)!] 4
for some a cn b.
m m(f ) m m(f )
1 5.30E 2 6 7.82E 6
2 1.79E 2 7 4.62E 7
3 6.63E 4 8 9.64E 8
4 4.63E 4 9 8.05E 9
5 1.62E 5 10 9.16E 10
n I In Ratio
2 7.22E 3
4 1.16E 3 6.2
8 1.69E 4 6.9
16 2.30E 5 7.4
32 3.00E 6 7.6
64 3.84E 7 7.8
Numerically,
x1 = .2654117024, x2 = .8115113746
w1 = .3297238792, w2 = .4202761208
The formula
Z 1
1
x 3 f (x) dx w1f (x1) + w2f (x2) (4)
0
has degree of precision 3.
EXAMPLE Consider evaluating the integral
Z 1
1
x3 cos x dx (5)
0
In applying (4), we take f (x) = cos x. Then
0 f (x + h) f (x)
f (x) Dhf (x)
h
is called a forward dierence formula for approximat-
ing f 0(x). In contrast, the approximation
f (x) f (x h)
f 0(x) , h>0 (4)
h
is called a backward dierence formula for approxi-
mating f 0(x). A similar derivation leads to
f (x) f (x h) h
f 0(x) = f 00(c) (5)
h 2
for some c between x and x h. The accuracy of
the backward dierence formula (4) is essentially the
same as that of the forward dierence formula (1).
0 0 f (x1 + h) f (x1 h)
f (x1) P2(x1) = Dhf (x1)
2h
(7)
For the error,
f (x + h) f (x h) h2
1 1
f 0(x1) = f 000(c2) (8)
2h 6
with x1 h c2 x1 + h.
A + B + C = 0: coecient of f (t)
h(A C) = 0: coecient of f 0(t)
h2
(A + C) = 1: coecient of f 00(t)
2
Solution:
1 2
A = C = 2, B= 2 (13)
h h
This determines
(2) f (t + h) 2f (t) + f (t h)
Dh f (t) = (14)
h2
(2) 00 h2 (4)
Dh f (t) f (t) + f (t)
12
Thus
00 f (t + h) 2f (t) + f (t h) h2 (4)
f (t) 2
f (t)
h 12
(15)
Example. Let f (x) = cos(x), t = 16 ; use (14) to
calculate f 00(t) = cos 16 .
(2)
h Dh f Error Ratio
0.5 0.84813289 1.789E 2
0.25 0.86152424 4.501E 3 3.97
0.125 0.86489835 1.127E 3 3.99
0.0625 0.86574353 2.819E 4 4.00
0.03125 0.86595493 7.048E 5 4.00
EFFECTS OF ERROR IN FUNCTION VALUES
Recall
(2) f (x2) 2f (x1) + f (x0) 00(x )
Dh f (x1) = f 1
h2
with x2 = x1 + h, x0 = x1 h. Assume the ac-
tual function values used in the computation contain
data error, and denote these values by fb0, fb1, and fb2.
Introduce the data errors:
b
i = f (xi) fi, i = 0, 1, 2 (16)
The actual quantity calculated is
fb 2fb + fb
(2)
c f (x ) = 2 1 2
D h 1 2
(17)
h
For the error in this quantity, replace fbj by f (xj ) j ,
j = 0, 1, 2, to obtain the following:
(2)
c f (x ) = f 00(x )
f 00(x1) Dh 1 1
[f (x2) 2] 2[f (x1) 1] + [f (x0) 0]
h2
" #
f (x2) 2f (x1) + f (x0)
= f 00(x1)
h2
2 1+ 0
+ 2
h2
121 h2f (4)(x ) + 2 2 1 + 0
1
h2
(18)
The last line uses (15).
(2)
c (x ) for f (x) = cos(x) at
Example. Calculate D h 1
x1 = 16 . To show the eect of rounding errors, the
values fbi are obtained by rounding f (xi) to six signif-
icant digits; and the errors satisfy
| i| 5.0 107 = , i = 0, 1, 2
(2)
c f (x )
Other than these rounding errors, the formula D h 1
is calculated exactly. In this example, the bound (19)
becomes
00 (2)
c f (x ) h cos 1
f (x ) D 1 2
1 h 1 12 6
+ h42 (5 107)
. 2 2106
= 0.0722h + h2 E(h)
.
For h = 0.125, the bound E(h) = 0.00126, which is
not too far o from the actual error given in the table.
(2)
c f (x )
h Dh 1 Error
0.5 0.848128 0.017897
0.25 0.861504 0.004521
0.125 0.864832 0.001193
0.0625 0.865536 0.000489
0.03125 0.865280 0.000745
0.015625 0.860160 0.005865
0.0078125 0.851968 0.014057
0.00390625 0.786432 0.079593
x1 = 1, x2 = 0, x3 = 2
In general we want to solve n equations in n un-
knowns. For this, we need some simplifying nota-
tion. In particular we introduce arrays. We can think
of these as means for storing information about the
linear system in a computer. In the above case, we
introduce
1 2 3 5 1
A = 1 0 1 , b = 3 , x= 0
3 1 3 3 2
These arrays completely specify the linear system and
its solution. We also know that we can give mean-
ing to multiplication and addition of these quantities,
calling them matrices and vectors. The linear system
is then written as
Ax = b
with Ax denoting a matrix-vector multiplication.
x1 = = xn = 1
This has the associated arrays
3 1 0 0 2 1
1 3 1 0 1 1
..
A= ... , b = , x = ..
.. 1
3 1 1 1
0 1 3 2 1
SOLVING LINEAR SYSTEMS
b = ones(3, 1);
Consider setting up the matrices for the system
Ax = b with
b = ones(n, 1);
MATRICES
h i h i
Let A = ai,j and B = bi,j be matrices of order
m n. Then
C =A+B
is another matrix of order m n, with
EXAMPLE.
1 2 1 1 2 1
3 4 + 1 1 = 2 5
5 6 1 1 6 5
MULTIPLICATION BY A CONSTANT
a1,1 a1,n ca1,1 ca1,n
..
c .. ... =
.. ... ..
am,1 am,n cam,1 cam,n
EXAMPLE.
1 2 5 10
5 3 4 = 15 20
5 6 25 30
" # " #
a b a b
(1) =
c d c d
THE ZERO MATRIX 0
A+0=0+A=A
The zero matrix 0mn acts in the same role as does
the number zero when doing arithmetic with real and
complex numbers.
EXAMPLE.
" # " # " #
1 2 0 0 1 2
+ =
3 4 0 0 3 4
We denote by A the solution of the equation
A+B =0
It is the matrix obtained by taking the negative of all
of the entries in A. For example,
" # " # " #
a b a b 0 0
+ =
c d c d 0 0
" # " # " #
a b a b a b
= = (1)
c d c d c d
" # " #
a1,1 a1,2 a1,1 a1,2
=
a2,1 a2,2 a2,1 a2,2
MATRIX MULTIPLICATION
h i h i
Let A = ai,j have order m n and B = bi,j have
order n p. Then
C = AB
is a matrix of order m p and
ci,j = Ai,B,j
= ai,1b1,j + ai,2b2,j + + ai,nbn,j
or equivalently
b1,j
h i b2,j
ci,j = ai,1 ai,2 ai,n ..
bn,j
= ai,1b1,j + ai,2b2,j + + ai,nbn,j
EXAMPLES
" # 1 2 " #
1 2 3 22 28
3 4 =
4 5 6 49 64
5 6
1 2 " # 9 12 15
1 2 3
3 4 = 19 26 33
4 5 6
5 6 29 40 51
a1,1 a1,n x1 a1,1x1 + + a1,nxn
. ... .. . ..
. . =
an,1 an,n xn an,1x1 + + an,nxn
Thus we write the linear system
a1,1x1 + + a1,nxn = b1
..
an,1x1 + + an,nxn = bn
as
Ax = b
THE IDENTITY MATRIX I
AIn = A, ImA = A
The identity matrix I acts in the same role as does
the number 1 when doing arithmetic with real and
complex numbers.
THE MATRIX INVERSE
AB = BA = I
It can be shown that if an inverse exists for A, then
it is unique.
EXAMPLES. If ad bc 6= 0, then
" #1 " #
a b 1 d b
=
c d ad bc c a
" #1 " #
1 2 1 1
=
2 2 1 12
1
1 1 1
2 3 9 36 30
1 1 1
2 3 4 = 36 192 180
1 1 1 30 180 180
3 4 5
Recall the earlier theorem on the solution of linear
systems Ax = b with A a square matrix.
4. det (A) 6= 0.
5. A1 exists.
EXAMPLE
1 2 3
det 4 5 6 = 0
7 8 9
Therefore, the linear system
1 2 3 x1 b1
4 5 6 x2 = b2
7 8 9 x3 b3
is not always solvable, the coecient matrix does not
have an inverse, and the homogeneous system Ax = 0
has a solution other than the zero vector, namely
1 2 3 1 0
4 5 6 2 = 0
7 8 9 1 0
The arithmetic properties of matrix addition and mul-
tiplication are listed on page 248, and some of them
require some work to show. For example, consider
showing the distributive law for matrix multiplication,
(AB) C = A (BC)
with A, B, C matrices of respective orders mn, np,
and p q, respectively. Writing this out, we want to
show
p
X n
X
(AB)i,k Ck,l = Ai,j (BC)j,l
k=1 j=1
for 1 i m, 1 l q.
Having obtained
(1) (1) (1) (1)
a a1,2 a1,n b
1,1 1
(2) (2) (2)
(2) (2) 0 a2,2 a2,n b
[A | b ] =
.
2
.
. .. ... .. .
(2)
(2) (2) b
0 an,2 an,n n
(2)
what if a2,2 = 0? Then we proceed as before.
Among the elements
(2) (2) (2)
a2,2, a3,2, ..., an,2
pick the one of largest size:
(2) (2)
a = max a
k,2
j=2,...,n j,2
n(n 1)
Divisions
2
n(n 1)(2n 1)
Additions
6
n(n 1)(2n 1)
Multiplications
6
2. b g:
n(n 1)
Additions
2
n(n 1)
Multiplications
2
3. Solving U x = g:
Divisions n
n(n 1)
Additions
2
n(n 1)
Multiplications
2
[A | I]
a1,1 a1,2 a1,3 1 0 0
a2,1 a2,2 a2,3 0 1 0
a3,1 a3,2 a3,3 0 0 1
Then we proceed as before with a single linear system,
only now we have three right hand sides. We first
introduce zeros into positions 2 and 3 of column 1;
and then we introduce zero into position 3 of column
2. Following that, we will need to solve three upper
triangular linear systems by back substitution. See the
numerical example on pages 264-266.
MATRIX INVERSE EXAMPLE
1 1 2
A= 1 1 1
1 1 0
1
1 2 1 0 0
1 1 1 0 1 0
1 1 0 0 0 1
m2,1 = 1 m3,1 = 1
1
1 2 1 0 0
0 0 3 1 1 0
0 2 2 1 0 1
1
1 2 1 0 0
0 2 2 1 0 1
0 0 3 1 1 0
1
1 2 1 0 0
0 2 2 1 0 1
0 0 3 1 1 0
Then by using back substitution to solve for each col-
umn of the inverse, we obtain
1 1 1
6 3 2
1 1 1
A = 6 3 12
13 1
3 0
COST OF MATRIX INVERSION
In 1
h calculating A , we i are solving for the matrix X =
X,1, X,2, . . . , X,n where
h i
A X,1, X,2, . . . , X,n = [e1, e2, . . . , en]
and ej is column j of the identity matrix. Thus we
are solving n linear systems
AX,1 = e1, AX,2 = e2, . . . , AX,n = en (1)
all with the same coecient matrix. Returning to
the earlier operation counts for solving a single linear
system, we have the following.
Cost of triangulating A: approx. 23 n3 operations
Cost of solving Ax = b: 2n2 operations
Thus solving the n linear systems in (1) costs approx-
imately
2 n3 + n 2n2 = 8 n3 operations, approximately
3 3
It costs approximately four times as many operations
to invert A as to solve a single system. With attention
to the form of the right-hand sides in (1) this can be
reduced to 2n3 operations.
MATLAB MATRIX OPERATIONS
x = A\b
In Matlab, the command
inv (A)
will calculate the inverse of A.
m4,3 = 1
This reduces the augmented matrix to
2 1 1 2 5
0 3 1 2 1
0 0 1 4 11
0 0 0 2 6
Return this to the familiar linear system
2x1 + x2 x3 + 2x4 = 5
3x2 x3 + 2x4 = 1
x3 + 4x4 = 11
2x4 = 6
Solving by back substitution, we obtain
x4 = 3, x3 = 1, x2 = 2, x1 = 1
There is a surprising result involving matrices asso-
ciated with this elimination process. Introduce the
upper triangular matrix
2 1 1 2
0 3 1 2
U =
0 0 1 4
0 0 0 2
which resulted from the elimination process. Then
introduce the lower triangular matrix
1 0 0 0 1 0 0 0
m2,1 1 0 0 2 1 0 0
L= =
m3,1 m3,2 1 0 1 2 1 0
m4,1 m4,2 m4,3 1 2 3 1 1
This uses the multipliers introduced in the elimination
process. Then
A = LU
2 1 1 2 1 0 0 0 2 1 1 2
4 5 3 6 2 1 0 0 0 3 1 2
=
2 5 2 6 1 2 1 0 0 0 1 4
4 11 4 8 2 3 1 1 0 0 0 2
In general, when the process of Gaussian elimination
without pivoting is applied to solving a linear system
Ax = b, we obtain A = LU with L and U constructed
as above.
LU = P A
where L and U are constructed as before and P is a
permutation matrix. For example, consider
0 0 1 0
1 0 0 0
P =
0 0 0 1
0 1 0 0
Then
0 0 1 0 a1,1 a1,2 a1,3 a1,4 A3,
1 0 0 0 a2,1 a2,2 a2,3 a2,4 A1,
PA = =
0 0 0 1 a3,1 a3,2 a3,3 a3,4 A4,
0 1 0 0 a4,1 a4,2 a4,3 a4,4 A2,
0 0 1 0 a1,1 a1,2 a1,3 a1,4
1 0 0 0 a a2,2 a2,3 a2,4
2,1
PA =
0 0 0 1 a3,1 a3,2 a3,3 a3,4
0 1 0 0 a4,1 a4,2 a4,3 a4,4
A3,
A1,
=
A4,
A2,
The matrix P A is obtained from A by switching around
rows of A. The result LU = P A means that the LU -
factorization is valid for the matrix A with its rows
suitably permuted.
Consequences: If we have a factorization
A = LU
with L lower triangular and U upper triangular, then
we can solve the linear system Ax = b in a relatively
straightforward way.
LU x = b
Write this as a two stage process:
Lg = b, Ux = g
The system Lg = b is a lower triangular system
g1 = b1
2,1g1 + g2 = b2
3,1g1 + 3,2g2 + g3 = b3
..
n,1g1 + n,n1gn1 + gn = bn
We solve it by forward substitution. Then we solve
the upper triangular system U x = g by back substi-
tution.
VARIANTS OF GAUSSIAN ELIMINATION
u1,j = a1,j , j = 1, 2, 3, 4
Then multiplying rows 2, 3, 4 times the first column
of U yields
i,1u1,1 = ai,1, i = 2, 3, 4
n o
and we can solve for 2,1, 3,1, 4,1 . We can con-
tinue this process, finding the second row of U and
then the second column of L, and so on. For example,
to solve for 4,3, we need to solve for it in
b1 c1 0 0 0
a2 b2 c2 0
..
0 a3 b3 c3
A=
...
..
an1 bn1 cn1
0 an bn
These occur very commonly in the numerical solution
of partial dierential equations, as well as in other ap-
plications (e.g. computing interpolating cubic spline
functions).
Ax = f
or
LU x = f
instead solve the two triangular systems
Lg = f, Ux = g
Solving Lg = f :
g1 = f1
gj = fj j gj1, j = 2, ..., n
Solving Ux = g:
gn
xn =
n
gj cj xj+1
xj = , j = n 1, ..., 1
j
See the numerical example on page 278.
OPERATIONS COUNT
Factoring A = LU .
Additions: n1
Multiplications: n 1
Divisions: n1
Solving Lz = f and U x = z:
Additions: 2n 2
Multiplications: 2n 2
Divisions: n
Thus the total number of arithmetic operations is ap-
proximately 3n to factor A; and it takes about 5n to
solve the linear system using the factorization of A.
[L, U, P ] = lu(X)
returns the lower triangular matrix L, upper triangular
matrix U , and permutation matrix P so that
P X = LU
ESTIMATION OF ERROR
b
r = b Ax
b Then
a quantity called the residual for x.
r = b
b Ax
= Ax Axb
= b
A (x x)
b =
xx A1r
b is the exact solution of
or the error e = x x
Ae = r
Thus we can solve this to obtain an estimate eb of our
error e.
EXAMPLE. Recall the linear system
.729x1 + .81x2 + .9x3 = .6867
x1 + x2 + x3 = .8338
1.331x1 + 1.21x2 + 1.1x3 = 1.000
The true solution, rounded to four significant digits,
is
x = [.2245, .2814, .3279]T
Using Gaussian elimination without pivoting and four
digit decimal floating point arithmetic with rounding,
the resulting solution and error are
b = [.2251, .2790, .3295]T
x
e = [.0006, .0024, .0016]T
Then
x = 0, y = .1
The perturbed system
b + 10yb = 1.01
7x
b +
5x 7yb = .69
has the solution
b = .17,
x yb = .22
Why is there such a dierence?
Consider the following Hilbert matrix example.
1 1
1 2 3 1.000 .5000 .3333
1
H3 = 1 1 , f =
H .5000 .3333 .2500
2 3 4 3
1 1 1 .3333 .2500 .2000
3 4 5
9 36 30
H31 = 36 192 180
30 180 180
9.062 36.32 30.30
f1 =
H 36.32 193.7 181.6
3
30.30 181.6 181.5
We have changed H3 in the fifth decimal place (by
rounding the fractions to four decimal digits). But we
have ended with a change in H31 in the third decimal
place.
VECTOR NORMS
Let
h iT
x= 1 2 3
Then
kxk1 = 6
.
kxk2 = sqrt(14) = 3.74
kxk = 3
PROPERTIES
EXAMPLE. Let
" # " # " #
1 2 1 1
A= , z= , Az =
5 7 1 2
Then
r (B) = max ||
(B)
The set (B) is called the spectrum of B, and it
contains all the eigenvalues of B. The number r (B)
is called the spectral radius of B. There are easily
computable bounds for kAk, but the norm itself is
dicult to compute.
ERROR BOUNDS
Let Ax = b and Ax b = b
b, and we are interested in
cases with b bb. Then
b
b
kx xkv 1 b b
kAk A v
kxkv kbkv
where kkv is some vector norm and kk is an associ-
ated matrix norm.
Proof :
b = bb
Ax Ax b
b = bb
A (x x) b
b =
xx A 1 bbb
1 b
bkv =
kx x A bb
v
1
A b bb
v
1
b v
kx xk A b b b
v
kxkv kxkv
Rewrite this as
b
b v
kx xk 1 b b
kAk A v
kxkv kAk kxkv
Since Ax = b, we have
The quantity
1
cond(A) = kAk A
is called a condition number for the matrix A.
EXAMPLE. Recall the earlier example:
" # " #
7x1 + 10x2 = 1 x1 0
=
5x1 + 7x2 = .7 x2 .1
" # " #
b1 + 10x
7x b2 = 1.01 b1
x .17
=
b1 +
5x b2 =
7x .69 b2
x .22
Then
b
kbk = 1, b b = .01
kxk = .1, b = .17
kx xk
" # " #
7 10 7 10
A= , A1 =
5 7 5 7
1
kAk = 17, A = 17, cond(A) = 289
b
b b b
kx xk
= 1.7 = 170 cond(A)
kxk kbk .01
b
b
kx xk b b
cond(A)
kxk kbk
The result
b
b
kx xkv b b
cond(A) v
kxkv kbkv
has another aspect which we do not prove here. Given
any matrix A, then there is a vector b and a nearby
perturbation bb for which the above inequality can be
replaced by equality. Moreover, there is no simple way
to know of these b and bb in advance. For such b and
b
b, we have
b
b
kx xkv b b
cond(A) = v
kxkv kbkv
Thus if cond(A) is very large, say 108, then there are
b and bb for which
b v
kx xk b b
b
= 10 8 v
kxkv kbkv
We call such systems ill-conditioned.
Recall an earlier discussion of error in Gaussian elim-
ination. Let x b denote an approximate solution for
Ax = b; perhaps x b is obtained by Gaussian elimina-
tion. Let x denote the exact solution. Then introduce
the residual
b
r = b Ax
We then obtained x x b = A1r. But we could also
have discussed this as a special case of our present
results. Write
Ax = b and b =br b
Ax b
Then
b
b
kx xkv b b
cond(A) v
kxkv kbkv
becomes
b v
kx xk krkv
cond(A)
kxkv kbkv
ILL-CONDITIONED EXAMPLE
Rewrite Ax = b as
Nx = b + P x (1)
with A = N P a splitting of A. Choose N to be
nonsingular. Usually we want N z = f to be easily
solvable for arbitray f . The iteration method is
0 19 19
2 3
M = 10 0 10
3 4
11 0
11
7 .
kM k = = 0.636
11
This is consistent with the earlier table of values, al-
though the actual convergence rate was better than
predicted by (3).
EXAMPLE. For the earlier example with the Gauss-
Seidel method,
(k+1) (k) (k)
x1 = 19 b1 x2 x3
(k+1) 1 b 2x (k+1) (k)
x2 = 10 2 1 3x3
(k+1) 1 b 3x(k+1) 4x(k+1)
x3 = 11 3 1 2
1
9 0 0 0 1 1
M = 2 10 0 0 0 3
3 4 11 0 0 0
0 19 19
1 5
= 0 45 18
1
0 45 13
99
kMk = 0.3
This too is consistent with the earlier numerical re-
sults.
DIAGONALLY DOMINANT MATRICES
N e(k+1) = P e(k)
and write it in component form for the Gauss-Seidel
method:
i
X Xn
(k+1) (k)
ai,j ej = ai,j ej , 1in
j=1 j=i+1
i1
X ai,j (k+1) Xn a
(k+1) i,j (k)
ei = ej ej (5)
a
j=1 i,i a
j=i+1 i,i
Introduce
i1
n a
X ai,j X i,j
i = , i = , 1in
ai,i ai,i
j=1 j=i+1
with 1 = n = 0. Taking bounds in (5),
(k+1)
e e(k+1) + e(k)
, i = 1, ..., n (6)
i i i
(k+1) (k)
e e
1
Define
i
= max
i 1 i
Then
(k+1) (k)
e e
For A diagonally dominant, it can be shown that
kMk (7)
where kM k is for the definition of M for the Jacobi
method, given earlier in (4) as
n
X ai,j
kM k = max = max (i + i) < 1
1in ai,i 1in
j=1
j6=i
Consequently, for A diagonally dominant, the Gauss-
Seidel method also converges and it does so more
rapidly than the Jacobi method in most cases.
Since
kMk = kN 1P k kN 1k kP k,
kMk < 1 is satisfied if N satisfies
kN 1k kP k < 1
Using P = N A, this can be rewritten as
1
kA N k <
kN 1k
We also want to choose N so that systems N z = f
is easily solvable.
r(0) = b Ax(0)
Since Ax = b for the true solution x,
r(0) = Ax Ax(0)
= A(x x(0)) = Ae(0)
with e(0) = x x(0). Let e(0) be the solution of
N e(0) = r(0)
and then define
For k = 0, 1, . . . , define
r(k) = b Ax(k)
N e(k) = r(k)
x(k+1) = x(k) + e(k)
This is the general residual correction method.
xi yi xi yi
1 :0 1:945 3:2 0:764
1 :2 1:253 3:4 0:532
1 :4 1:140 3:6 1:073
1 :6 1:087 3:8 1:286
1 :8 0:760 4:0 1:502
2 :0 0:682 4:2 1:582
2 :2 0:424 4:4 1:993
2 :4 0:012 4:6 2:473
2 :6 0:190 4:8 2:503
2 :8 0:452 5:0 2:322
3 :0 0:337
i = f ( xi ) yi
denote the unknown measurement errors. We want to
use the points (x1; y1); : : : ; (xn; yn) to determine the
analytic relationship (1) as accurately as possible.
Often we suspect that the unknown function f (x) lies
within some known class of functions, for example, poly-
nomials. Then we want to choose the member of that
class of functions that will best approximate the unknown
function f (x), taking into account the experimental er-
rors f ig.
max jf (xi) yi j
1 i n
which is the maximum error of approximation.
All of these can be used, but #2 is the favorite, and we
now comment on why. To do so we need to understand
more about the nature of the unknown errors f ig.
x 1 = x2 = = xn = constant
and this is false for our case.
For our example in Table 1,
n
X n
X
xi = 63:0 x2i = 219:8
i=1 i=1
Xn Xn
yi = 9:326 xiyi = 60:7302
i=1 i=1
Using this in (7), the linear system becomes
21b + 63:0m = 9:326
63:0b + 219:8m = 60:7302
The solution is
: :
b= 2:74605 m = 1:06338
xi yi xi yi
0:00 0:486 0:55 1:102
0:05 0:866 0:60 1:099
0:10 0:944 0:65 1:017
0:15 1:144 0:70 1:111
0:20 1:103 0:75 1:117
0:25 1:202 0:80 1:152
0:30 1:166 0:85 1:265
0:35 1:191 0:90 1:380
0:40 1:124 0:95 1:575
0:45 1:095 1:00 1:857
0:50 1:122
A better basis.
'1(x) = T0(2x 1) 1
'2(x) = T1(2x 1) = 2x 1
'3(x) = T2(2x 1) = 8x2 8x + 1
'4(x) = T3(2x 1) = 32x3 48x2 + 18x 1
The values fa1; a2; a3; a4g are completely dierent than
in the representation (17).
The linear system (13) is again denoted by La = b:
2 3
21 0 5 :6 0
6 0 7:7 0 2:8336 7
6 7
L=6 7
4 5:6 0 10:4664 0 5
0 2:8336 0 11:01056
b = [24:118; 2:351; 6:01108; 1:523576]T
The solution is
Av = v
then we say is an eigenvalue of A and v is an associated
eigenvector. Note that if v is an eigenvector, then any
nonzero multiple of v is also an eigenvector for the same
eigenvalue .
Example: Let
" #
1:25 0:75
A= (1)
0:75 1:25
The eigenvalue-eigenvector pairs for A are
" #
1
1 = 2; v (1) =
1
" # (2)
1
2 = 0 :5 ; v (2) =
1
Eigenvalues and eigenvectors are often used to give addi-
tional intuition to the function
F (x) = Ax; x 2 Rn or Cn
Example. The eigenvectors in the preceding example (2)
form a basis for R2. For x = [x1; x2]T ,
F (x) = Ax; x 2 R2
can be written as
Figure 2: Decomposition of Ax
How to calculate and v ? Rewrite Av = v as
( I A)v = 0; v 6= 0 (3)
a homogeneous system of linear equations with the coef-
cient matrix I A and the nonzero solution v . This
can be true if and only if
f( ) det( I A) = 0
The function f ( ) is called the characteristic polynomial
of A, and its roots are the eigenvalues of A.
Assuming A has order n,
f( ) = n + n 1 n 1 + + 1 + 0 (4)
For the case n = 2,
" #
a11 a12
f ( ) = det
a21 a22
=( a11)( a22) a21a12
= 2 (a11 + a22) + a11a22 a21a12
The formula (4) shows that a matrix A of order n can
have at most n distinct eigenvalues.
Example. Let
2 3
7 13 16
6 7
A=4 13 10 13 5 (5)
16 13 7
Then
2 3
+7 13 16
6 7
f ( ) = det 4 13 + 10 13 5
16 13 +7
= 3 + 24 2 405 + 972
is the characteristic polynomial of A.
1 = 36; 2 = 9; 3 =3 (6)
Finding an eigenvector. For = 36, nd an
associated eigenvector v by solving ( I A)v = 0, which
becomes
( 36I A)v = 0
2 32 3 2 3
29 13 16 v1 0
6 76 7 6 7
4 13 26 13 5 4 v2 5 = 4 0 5
16 13 29 v3 0
If v1 = 0, then the only solution is v = 0. Thus v1 6= 0,
and we arbitrarily choose v1 = 1. This leads to the
system
13v2 + 16v3 = 29
26v2 13v3 = 13
13v2 29v3 = 16
The solution is v2 = 1, v3 = 1. Thus the
eigenvector v for = 36 is
v (i)Tv (j) = 0; 1 i; j n; i 6= j
v (i)Tv (i) = 1; 1 i n
(iii) For each column vector x = [x1; x2; : : : ; xn]T, there
is a unique choice of constants c1; : : : ; cn for which
x = c1v (1) + + cnv (n)
The constants are given by
n
X (i)
ci = xj vj = xTv (i); 1 i n
j=1
(i) (i)
where v (i) = [v1 ; : : : ; vn ]T.
Easily, U^T U = I. Also,
    U^T A U = [ 1/√2    1/√2 ] [ 1.25  0.75 ] [ 1/√2    1/√2 ]
              [ 1/√2   −1/√2 ] [ 0.75  1.25 ] [ 1/√2   −1/√2 ]
            = [ 2    0   ]
              [ 0    0.5 ]
as specified in (9).
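This diagonalization is easy to check numerically. A minimal sketch (Python with NumPy), with the columns of U taken as the normalized eigenvectors of the matrix A in (1):

    import numpy as np

    A = np.array([[1.25, 0.75],
                  [0.75, 1.25]])
    U = np.array([[1.0,  1.0],
                  [1.0, -1.0]]) / np.sqrt(2.0)   # columns v(1)/||v(1)||, v(2)/||v(2)||

    print(U.T @ U)        # identity matrix, to rounding error
    print(U.T @ A @ U)    # diag(2, 0.5)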
NONSYMMETRIC MATRICES
    v = c [1, 0, 0]^T
for some c ≠ 0.
    v = [1, 0, 0]^T
for the three equal eigenvalues λ_1 = λ_2 = λ_3 = a.
THE POWER METHOD
Assume the eigenvalues satisfy
    |λ_1| > |λ_2| ≥ ⋯ ≥ |λ_n|      (10)
Denote the eigenvector for λ_1 by v^(1). We define an iteration method for computing improved estimates of λ_1 and v^(1).
Choose an initial guess z^(0) and let
    w^(1) = A z^(0)
Let α_1 be the maximum component of w^(1), in size. If there is more than one such component, choose the first such component as α_1. Then define
    z^(1) = (1/α_1) w^(1)
Repeat the process iteratively. Define
    w^(m) = A z^(m−1)      (11)
Let α_m be the maximum component of w^(m), in size. Define
    z^(m) = (1/α_m) w^(m)      (12)
for m = 1, 2, …. Then, roughly speaking, the vectors z^(m) will converge to some multiple of v^(1).
It can be shown that λ_1^(m) converges to λ_1 as m → ∞.
" #
1:25 0:75
Example. Recall the earlier example A = .
0:75 1:25
Double precision was used in the computation, with rounded
values shown in the table given here.
Amz (0)
z (m) = m ; m 1 (14)
kAmz (0)k
where m = 1.
Thus
z (0) = c1v (1) + + cnv (n) (15)
for some choice of constants fc1; : : : ; cng.
v (1)
z (m) ! v^(1) (17)
v (1)
with a xed sign independent of m. Our earlier normal-
ization of sign, dividing by m, will usually accomplish
this, but not always.
(m)
A similar convergence analysis can be given for 1 ,
with the same kind of error bound.
2= 1 = 0:5=2 = :25
which is the ratio observed in Table 1.
Example. Consider the symmetric matrix
    A = [ −7    13   −16 ]
        [ 13   −10    13 ]
        [ −16   13    −7 ]
From earlier, the eigenvalues are
    λ_1 = −36,    λ_2 = 9,    λ_3 = 3
The eigenvector v^(1) associated with λ_1 is
    v^(1) = [1, −1, 1]^T
The results of using the power method are shown in the following Table 2.
Note that the ratios of the successive differences of λ_1^(m) are approaching
    λ_2 / λ_1 = −0.25
Also note that the location of the maximal component of z^(m) changes from one iteration to the next.
    m    λ_1^(m)       λ_1^(m) − λ_1^(m−1)    Ratio
    1    −31.80000
    2    −36.82075     −5.03E+0
    3    −35.82936      9.91E−1               −0.197
    4    −36.04035     −2.11E−1               −0.213
    5    −35.99013      5.02E−2               −0.238
    6    −36.00245     −1.23E−2               −0.245
    7    −35.99939      3.06E−3               −0.249
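A minimal sketch of the iteration (11)-(12) applied to this matrix is given below. The starting vector z^(0) is my own choice, and the eigenvalue estimate is taken as the ratio of corresponding components of w^(m) and z^(m−1), which is one common choice; the exact definition of λ_1^(m) used in the notes is not visible in this excerpt. The computed values should approach λ_1 = −36 and a multiple of v^(1).

    import numpy as np

    def power_method(A, z, iters):
        lam = None
        for _ in range(iters):
            w = A @ z                          # w^(m) = A z^(m-1)
            j = np.argmax(np.abs(z))           # component where z^(m-1) is largest in size
            lam = w[j] / z[j]                  # eigenvalue estimate (ratio of components)
            alpha = w[np.argmax(np.abs(w))]    # scaling constant alpha_m
            z = w / alpha                      # z^(m) = (1/alpha_m) w^(m)
        return lam, z

    A = np.array([[-7.0, 13.0, -16.0],
                  [13.0, -10.0, 13.0],
                  [-16.0, 13.0, -7.0]])
    lam, z = power_method(A, np.array([1.0, 1.0, 1.0]), 30)
    print(lam, z)      # approximately -36 and a (signed) multiple of [1, -1, 1]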
AITKEN EXTRAPOLATION
From (20),
    λ_1 − λ_1^(m+1) ≈ r (λ_1 − λ_1^(m)),   r = λ_2/λ_1      (21)
for large m. Choose r using
    r ≈ (λ_1^(m+1) − λ_1^(m)) / (λ_1^(m) − λ_1^(m−1))      (22)
as with Aitken extrapolation in §3.4 on linear iteration methods.
Also the Aitken formula (23) will give the exact answer for λ_1, to seven significant digits.
SYSTEMS OF NONLINEAR EQUATIONS
    f(x, y) = 0
    g(x, y) = 0      (1)
Example: Consider solving the system
    f(x, y) ≡ x^2 + 4y^2 − 9 = 0
    g(x, y) ≡ 18y − 14x^2 + 45 = 0      (2)
A graph of z = f(x, y) is given in Figure 1, along with the curve for f(x, y) = 0.
    p(x, y) ≡ f(x_0, y_0) + (x − x_0) f_x(x_0, y_0) + (y − y_0) f_y(x_0, y_0)      (3)
where
    f_x(x, y) = ∂f(x, y)/∂x,    f_y(x, y) = ∂f(x, y)/∂y,
the partial derivatives of f with respect to x and y, respectively. For (x, y) near (x_0, y_0),
    f(x, y) ≈ p(x, y)
For functions of two variables, p(x, y) is the linear Taylor polynomial approximation to f(x, y).
NEWTON'S METHOD
    f(x, y) = 0
    g(x, y) = 0
Let (x_0, y_0) ≈ (α, β) be an initial guess at the solution. For our example, with (x_0, y_0) = (1, −1),
    f(x_0, y_0) = −4,    f_x(x_0, y_0) = 2,    f_y(x_0, y_0) = −8
At (1, −1, −4) the tangent plane to the surface z = f(x, y) has the equation
    z = p(x, y) = −4 + 2(x − 1) − 8(y + 1)
The solution of
    f(x, y) = 0
    g(x, y) = 0
is the intersection of the zero curves of z = f(x, y) and z = g(x, y). We approximate these by the zero curves of the tangent planes, solving
    p(x, y) = 0
    q(x, y) = 0
and denote the resulting approximate solution by (x_1, y_1).
Example. Return to the equations
    f(x, y) ≡ x^2 + 4y^2 − 9 = 0
    g(x, y) ≡ 18y − 14x^2 + 45 = 0
with (x_0, y_0) = (1, −1).
Figure 4: f = g = 0 and p = q = 0
Calculating (x_1, y_1). To find the intersection of the zero curves of the tangent planes, we must solve the linear system
    p(x, y) = 0
    q(x, y) = 0
Many numerical methods for solving nonlinear systems are variations on Newton's method.
Example. Consider again the system
    f(x, y) ≡ x^2 + 4y^2 − 9 = 0
    g(x, y) ≡ 18y − 14x^2 + 45 = 0
Newton's method (4) becomes
    [ 2x_k     8y_k ] [ δ_{x,k} ]   = − [ x_k^2 + 4y_k^2 − 9      ]
    [ −28x_k   18   ] [ δ_{y,k} ]       [ 18y_k − 14x_k^2 + 45    ]
    [ x_{k+1} ]   [ x_k ]   [ δ_{x,k} ]
    [ y_{k+1} ] = [ y_k ] + [ δ_{y,k} ]      (5)
Choose (x_0, y_0) = (1, −1). The resulting Newton iterates are given in Table 1, along with
    Error = ‖(α − x_k, β − y_k)‖_∞ ≡ max{ |α − x_k|, |β − y_k| }
Table 1: Newton iterates for the system (5)
    k    x_k                     y_k                      Error
    0    1.0                     −1.0                     3.74E−1
    1    1.170212765957447       −1.457446808510638       8.34E−2
    2    1.202158829506705       −1.376760321923060       2.68E−3
    3    1.203165807091535       −1.374083486949713       2.95E−6
    4    1.203166963346410       −1.374080534243534       3.59E−12
    5    1.203166963347774       −1.374080534239942       2.22E−16
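The iteration (5) is easy to program. The following sketch (Python with NumPy) starts from (x_0, y_0) = (1, −1) and should reproduce the iterates of Table 1 to rounding error.

    import numpy as np

    def F(v):
        x, y = v
        return np.array([x**2 + 4*y**2 - 9.0,
                         18.0*y - 14.0*x**2 + 45.0])

    def J(v):                                  # Jacobian of F
        x, y = v
        return np.array([[ 2.0*x,   8.0*y],
                         [-28.0*x, 18.0  ]])

    v = np.array([1.0, -1.0])
    for k in range(6):
        print(k, v)
        delta = np.linalg.solve(J(v), -F(v))   # J(v) delta = -F(v)
        v = v + delta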
    F(x) = 0      (7)
A solution of this equation will be denoted by α. Newton's method becomes
    F′(x^(k)) δ^(k) = −F(x^(k))
    x^(k+1) = x^(k) + δ^(k)      (8)
for k = 0, 1, ….
For a system of n equations in n unknowns,
    F(x) = 0
Its solution is denoted by α ∈ R^n. Newton's method is
    F′(x^(k)) δ^(k) = −F(x^(k))
    x^(k+1) = x^(k) + δ^(k),   k = 0, 1, …      (11)
Alternatively, as before,
    x^(k+1) = x^(k) − [F′(x^(k))]^{−1} F(x^(k)),   k = 0, 1, …      (12)
This formula is often used in theoretical discussions of Newton's method for nonlinear systems. But (11) is used for practical computations, since it is usually less expensive to solve a linear system than to find the inverse of the coefficient matrix.
    ‖α − x^(k)‖_∞ ≡ max_{1 ≤ i ≤ n} |α_i − x_i^(k)|
Newton's method is quadratically convergent.
DIFFERENTIAL EQUATIONS
The equation:
    y′ = f(x, y)
The initial value:
    y(x_0) = Y_0
Find the solution Y(x) on some interval x_0 ≤ x ≤ b. Together these two conditions constitute an initial value problem.
As an example of the general equation y′ = f(x, y), consider
    y′ = −y + 2 cos x
We can draw direction fields by hand using the method described above, by using the Matlab program given in the book, or by using the Matlab program provided in the class account.
Direction field for y′ = −y + 2 cos x. Also shown are example solution curves.
SOLVABILITY THEORY
Assume f(x, y) satisfies the Lipschitz condition
    |f(x, y) − f(x, z)| ≤ K |y − z|
for all (x, y), (x, z) satisfying
    x_0 ≤ x ≤ b,   −∞ < y, z < ∞
Then the initial value problem
    y′ = f(x, y),   x_0 ≤ x ≤ b,   y(x_0) = Y_0
has a solution Y(x) on the entire interval [x_0, b].
The problem
    y′ = λ y + g(x),   y(x_0) = Y_0
has a unique continuous solution Y(x) for −∞ < x < ∞.
STABILITY
Consider solving (2). For its perturbed solution,
    Y_ε(x) − Y(x) = ε e^{100x}
Thus Y_ε(x) − Y(x) increases very rapidly as x increases, and we say (2) is an "unstable" or "ill-conditioned" problem.
NUMERICAL METHODS FOR ODEs
    y′ = f(x, y),   x_0 ≤ x ≤ b,   y(x_0) = Y_0
Euler's method is the simplest example of most such approaches. Moreover, the error analysis for Euler's method is an introduction to the error analysis of most more rapidly convergent (and more practical) numerical methods.
A GEOMETRIC PERSPECTIVE
    Y(x_0 + h) ≈ Y(x_0) + h Y′(x_0) + (h^2/2!) Y″(x_0) + ⋯ + (h^p/p!) Y^(p)(x_0)
Euler's method is the case p = 1:
    Y(x_0 + h) ≈ Y(x_0) + h Y′(x_0)
              = y_0 + h f(x_0, y_0) ≡ y_1
We have an error formula for Taylor polynomial approximations; and in this case,
    Y(x_1) − y_1 = (h^2/2) Y″(ξ_0)
with some x_0 ≤ ξ_0 ≤ x_1.
GENERAL ERROR FORMULA
In general,
    Y(x_{n+1}) = Y(x_n) + h Y′(x_n) + (h^2/2) Y″(ξ_n)
              = Y(x_n) + h f(x_n, Y(x_n)) + (h^2/2) Y″(ξ_n)
with some x_n ≤ ξ_n ≤ x_{n+1}.
The true solution is
    Y(x) = x^2 + 2x + 2 − 2(x + 1) log(x + 1)
We give selected results for three values of h. Note the behaviour of the error as h is halved.
    h      x     y_h(x)    Error      Relative Error
    0.2    1.0   2.1592    6.82E−2    0.0306
           3.0   5.4332    4.76E−1    0.0805
           5.0   14.406    1.09       0.0703
    0.1    1.0   2.1912    3.63E−2    0.0163
           3.0   5.6636    2.46E−1    0.0416
           5.0   14.939    5.60E−1    0.0361
    0.05   1.0   2.2087    1.87E−2    0.00840
           3.0   5.7845    1.25E−1    0.0212
           5.0   15.214    2.84E−1    0.0183
Example. Consider
    y′ = 2x,   y(0) = 0
This has the solution Y(x) = x^2. Euler's method becomes
    y_{n+1} = y_n + 2 x_n h,   y_0 = 0
Then
    y_1 = y_0 + 2 x_0 h = x_1 x_0
    y_2 = y_1 + 2 x_1 h = x_1 x_0 + 2 x_1 h = x_2 x_1
    y_3 = y_2 + 2 x_2 h = x_2 x_1 + 2 x_2 h = x_3 x_2
By an induction argument,
    y_n = x_n x_{n−1},   n ≥ 1
For the error,
    Y(x_n) − y_n = x_n^2 − x_n x_{n−1} = x_n h
Note that Y(x_n) − y_n ∝ h at each fixed x_n.
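A short sketch of Euler's method for this problem confirms that the error at a fixed point is proportional to h. The choice of the point x = 1 and the stepsizes below are my own.

    def euler(f, x0, y0, h, steps):
        x, y = x0, y0
        for _ in range(steps):
            y = y + h * f(x, y)       # y_{n+1} = y_n + h f(x_n, y_n)
            x = x + h
        return x, y

    f = lambda x, y: 2.0 * x
    for h in [0.1, 0.05, 0.025]:
        steps = int(round(1.0 / h))   # integrate to x = 1
        x, y = euler(f, 0.0, 0.0, h, steps)
        print(h, x**2 - y)            # error = Y(x_n) - y_n = x_n * h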
Return to our error equation
    Y(x_{n+1}) − y_{n+1} = Y(x_n) − y_n + h [ f(x_n, Y(x_n)) − f(x_n, y_n) ] + (h^2/2) Y″(ξ_n)      (1)
With the mean value theorem,
    f(x_n, Y(x_n)) − f(x_n, y_n) = (∂f(x_n, ζ_n)/∂y) [ Y(x_n) − y_n ]
for some ζ_n between Y(x_n) and y_n. As shorthand, use e_h(x) = Y(x) − y_h(x). Then we can write
    e_h(x_{n+1}) = [ 1 + h ∂f(x_n, ζ_n)/∂y ] e_h(x_n) + (h^2/2) Y″(ξ_n)      (2)
with e_h(x_0) = 0. We also will assume henceforth that
    K ≡ max_{x_0 ≤ x ≤ b, −∞ < y < ∞} |∂f(x, y)/∂y| < ∞
Consider those differential equations with
    ∂f(x, y)/∂y ≤ 0,   x_0 ≤ x ≤ b,   −∞ < y < ∞
Then
    −1 ≤ 1 + h ∂f(x_n, ζ_n)/∂y ≤ 1
provided h is chosen sufficiently small, e.g. if
    h < 1/K
Using this in our error formula (2),
    |e_h(x_{n+1})| ≤ |e_h(x_n)| + (h^2/2) ‖Y″‖_∞,   n ≥ 0      (3)
in which
    ‖Y″‖_∞ = max_{x_0 ≤ t ≤ b} |Y″(t)|
Using induction with (3), we can prove
    |e_h(x_n)| ≤ (x_n − x_0) (h/2) ‖Y″‖_∞
Again the error is bounded by something of the form c(x_n) h.
EXAMPLE
    Y(x) = e^{−x} + cos x
Then
    Y″(x) = e^{−x} − cos x
    ‖Y″‖_∞ = max_{0 ≤ x ≤ 5} |Y″(x)| ≈ 1.0442
We solve the problem with Euler's method on [0, 5].
Return to
    e_h(x_{n+1}) = [ 1 + h ∂f(x_n, ζ_n)/∂y ] e_h(x_n) + (h^2/2) Y″(ξ_n)
in which
    e_h(x) = Y(x) − y_h(x)
As before, we assume
    K ≡ max_{x_0 ≤ x ≤ b, −∞ < y < ∞} |∂f(x, y)/∂y| < ∞
This is much more restrictive than is needed, but it simplifies the statement and analysis of the error in solving differential equations. From this we can show
    |e_h(x_n)| ≤ e^{(x_n − x_0)K} |e_h(x_0)| + h [ (e^{(x_n − x_0)K} − 1) / (2K) ] ‖Y″‖_∞
for all points x_0 ≤ x_n ≤ b. In theory, we would have e_h(x_0) = 0. But we can also examine what would happen if we allowed for errors in the choice of y_0.
EXAMPLE
exn 1 xn
ASYMPTOTIC ERROR FORMULA
    |Y(x_n) − y_n| ≤ c(x_n) h,   x_0 ≤ x_n ≤ b
for some number c(x_n). We can improve upon this. Namely, it can be shown that
    Y(x_n) − y_h(x_n) = D(x_n) h + O(h^2)
    Y(x_n) − y_h(x_n) ≈ D(x_n) h      (4)
for smaller values of h. The function D(x) satisfies a particular differential equation, but it can seldom be found in practice since it depends on the solution Y(x). Instead we use (4) to justify using Richardson extrapolation to estimate the error.
RICHARDSON EXTRAPOLATION
Suppose we solve
    y′ = f(x, y),   x_0 ≤ x ≤ b,   y(x_0) = Y_0
with Euler's method, and suppose we do it twice, using stepsizes of h and 2h. Denote the respective numerical solutions by y_h(x_n) and y_{2h}(x_n). Then from (4) above, and at a generic node point x,
    Y(x) − y_h(x) ≈ D(x) h
    Y(x) − y_{2h}(x) ≈ D(x) (2h)
Multiply the first equation by 2 and then subtract the second equation. This yields
    Y(x) − 2y_h(x) + y_{2h}(x) ≈ 0
    Y(x) ≈ 2y_h(x) − y_{2h}(x)
This last formula is called "Richardson's extrapolation formula" for Euler's method. We can also use it to estimate the error:
    Y(x) − y_h(x) ≈ [ 2y_h(x) − y_{2h}(x) ] − y_h(x) = y_h(x) − y_{2h}(x)
The formula
    Y(x) − y_h(x) ≈ y_h(x) − y_{2h}(x)
is called "Richardson's error estimation formula" for Euler's method.
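A sketch of Richardson extrapolation for Euler's method follows. The test problem y′ = −y + 2 cos x, y(0) = 1, with true solution Y(x) = cos x + sin x, and the stepsize are my own choices for illustration.

    import math

    def euler(f, x0, y0, h, steps):
        x, y = x0, y0
        for _ in range(steps):
            y = y + h * f(x, y)
            x = x + h
        return y

    f = lambda x, y: -y + 2.0 * math.cos(x)
    h = 0.05
    y_h  = euler(f, 0.0, 1.0, h, 100)        # y_h(5)
    y_2h = euler(f, 0.0, 1.0, 2*h, 50)       # y_2h(5)
    Y = math.cos(5.0) + math.sin(5.0)        # true value at x = 5
    print("true error   :", Y - y_h)
    print("estimate     :", y_h - y_2h)                   # Richardson error estimate
    print("extrapolated :", 2*y_h - y_2h, Y - (2*y_h - y_2h))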
max "
jyn yn j cb j"j
x0 x n b
for some constant cb > 0 and for all su ciently small
values of the stepsize h.
This implies that Euler's method is stable, and in the
same manner as was true for the original di erential
equation problem.
The general idea of stability for a numerical method
is essentially that given above for Eulers's method.
Consider the model problem
    Y′(x) = λ Y(x),   x ≥ 0,   Y(0) = 1
The analysis for this problem is generally applicable to the more general differential equation problem. Euler's method gives
    y_{n+1} = y_n + h λ y_n = (1 + hλ) y_n,   n = 0, 1, …
with y_0 = 1. By induction,
    y_n = (1 + hλ)^n,   n ≥ 0
The numerical solution y_n → 0 as n → ∞ if and only if
    |1 + hλ| < 1
In the case λ is real and negative, this is equivalent to
    −2 < hλ < 0
If |λ| is quite large, then h must be correspondingly small. Usually the truncation error does not require such a small value of h; it is needed only for stability.
NUMERICAL EXAMPLE: Consider the problem discussed above. The requirement on z = hλ is
    |1 + z| < 1
or equivalently,
    |z − (−1)| < 1
This is a circle of radius one in the complex plane, centered at the complex number −1 + 0i.
Apply the backward Euler method to the model problem
    Y′(x) = λ Y(x),   x ≥ 0,   Y(0) = 1
This yields
    y_{n+1} = y_n + h λ y_{n+1}
    y_{n+1} = y_n / (1 − hλ),   n = 0, 1, …
with y_0 = 1. By induction,
    y_n = [ 1 / (1 − hλ) ]^n,   n = 0, 1, …
We want to know when y_n → 0 as n → ∞. This will be true if
    | 1 / (1 − hλ) | < 1
The hypothesis that λ < 0 or Real(λ) < 0 is sufficient to show this is true, regardless of the size of the stepsize h. Thus the backward Euler method is an A-stable numerical method.
NUMERICAL EXAMPLE
With a general right-hand side f, each step of the backward Euler method requires solving
    z = y_n + h f(x_{n+1}, z)      (2)
with the root z = y_{n+1}. Such numerical methods (1) for solving differential equations are called implicit methods.
For an initial guess, we use y_{n+1}^(0) = y_n or something even better, and then iterate
    y_{n+1}^(k+1) = y_n + h f(x_{n+1}, y_{n+1}^(k)),   k = 0, 1, …
The iteration error satisfies
    y_{n+1} − y_{n+1}^(k+1) ≈ h (∂f(x_{n+1}, y_{n+1})/∂y) ( y_{n+1} − y_{n+1}^(k) )      (4)
We have convergence if
    | h ∂f(x_{n+1}, y_{n+1})/∂y | < 1      (5)
which is true if h is sufficiently small. If this is too restrictive on h, then another rootfinding method must be used to solve (2).
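A sketch of the backward Euler method using this fixed-point iteration on (2) is given below. The test equation and the fixed number of inner iterations are my own choices; as noted above, for larger h one would replace the inner loop by Newton's method.

    import math

    def backward_euler(f, x0, y0, h, steps, inner_iters=10):
        x, y = x0, y0
        for _ in range(steps):
            x_next = x + h
            z = y                            # initial guess y_{n+1}^(0) = y_n
            for _ in range(inner_iters):     # fixed-point iteration on (2)
                z = y + h * f(x_next, z)
            x, y = x_next, z
        return x, y

    # Hypothetical test problem: y' = -50*(y - cos(x)), y(0) = 0.
    # Here |h * df/dy| = 50*h, so the inner iteration needs h < 1/50.
    f = lambda x, y: -50.0 * (y - math.cos(x))
    print(backward_euler(f, 0.0, 0.0, 0.01, 200))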
THE TRAPEZOIDAL METHOD
Apply the trapezoidal method to the model problem
    Y′(x) = λ Y(x),   x ≥ 0,   Y(0) = 1
Then
    y_{n+1} = y_n + (h/2) [ λ y_n + λ y_{n+1} ]
with y_0 = 1. Solving for y_{n+1},
    y_{n+1} = [ (1 + hλ/2) / (1 − hλ/2) ] y_n,   n ≥ 0
By induction,
    y_n = [ (1 + hλ/2) / (1 − hλ/2) ]^n,   n ≥ 0
Then for λ real and negative, and also for λ complex with Real(λ) < 0, we can use this formula to show y_n → 0 as n → ∞. This shows the trapezoidal method is absolutely stable.
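The comparison between the three methods can be seen directly from their per-step amplification factors for y′ = λy. A small sketch; the values of λ and h are arbitrary:

    lam, h = -100.0, 0.1        # a stiff value of lambda with a moderate step
    z = h * lam
    print("Euler          :", abs(1.0 + z))                    # 9.0   -> unstable
    print("backward Euler :", abs(1.0 / (1.0 - z)))            # ~0.09 -> stable
    print("trapezoidal    :", abs((1.0 + z/2) / (1.0 - z/2)))  # ~0.67 -> stable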
NUMERICAL EXAMPLE: We apply the trapezoidal method to the problem
    Y′(x) = λ Y(x) + g(x)
Problems of this type with Real(λ) < 0 and |Real(λ)| large are called stiff differential equations. For general equations, the role of λ is replaced by
    ∂f(x, Y(x)) / ∂y
Consider solving
    y′ = y cos x,   y(0) = 1
Imagine writing a Taylor series for the solution Y(x), say initially about x = 0. Then
    Y(h) = Y(0) + h Y′(0) + (h^2/2) Y″(0) + (h^3/6) Y‴(0) + ⋯
We can calculate Y′(0) = Y(0) cos(0) = 1. How do we calculate Y″(0) and higher order derivatives?
    Y(h) = 1 + h + h^2/2 − h^4/8 + ⋯
We can truncate the series after a particular order. Then continue with the same process to generate approximations to Y(2h), Y(3h), …. Letting x_n = nh, and using the order 2 Taylor approximation, we have
    Y(x_{n+1}) = Y(x_n) + h Y′(x_n) + (h^2/2) Y″(x_n) + (h^3/6) Y‴(ξ_n)
with x_n ≤ ξ_n ≤ x_{n+1}. Drop the truncation error term, and then define
    y_{n+1} = y_n + h y′_n + (h^2/2) y″_n,   n ≥ 0
with
    y′_n = y_n cos(x_n)
    y″_n = −y_n sin(x_n) + y′_n cos(x_n)
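A sketch of this order-2 Taylor method for y′ = y cos x, y(0) = 1, whose true solution is Y(x) = e^{sin x}; the stepsize and endpoint are my own choices:

    import math

    def taylor2(h, steps):
        x, y = 0.0, 1.0
        for _ in range(steps):
            yp  = y * math.cos(x)                       # y'_n
            ypp = -y * math.sin(x) + yp * math.cos(x)   # y''_n
            y = y + h * yp + 0.5 * h * h * ypp
            x = x + h
        return x, y

    x, y = taylor2(0.1, 50)                 # integrate to x = 5
    print(y, math.exp(math.sin(x)), y - math.exp(math.sin(x)))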
Consider solving
    y′ = −y,   y(0) = 1
whose true solution is Y(x) = e^{−x}. Differentiating the equation
    Y′(x) = −Y(x)
we obtain
    Y″ = −Y′ = Y
    Y‴ = Y′ = −Y,    Y^(4) = Y
Then expanding Y(x_n + h) in a Taylor series,
    Y(x_{n+1}) = Y_n + h Y′_n + (h^2/2) Y″_n + (h^3/6) Y‴_n + (h^4/24) Y^(4)_n + (h^5/120) Y^(5)(ξ_n)
Dropping the truncation error, we have the numerical method
    y_{n+1} = y_n + h y′_n + (h^2/2) y″_n + (h^3/6) y‴_n + (h^4/24) y^(4)_n
            = [ 1 − h + h^2/2 − h^3/6 + h^4/24 ] y_n
with y_0 = 1. By induction,
    y_n = [ 1 − h + h^2/2 − h^3/6 + h^4/24 ]^n,   n ≥ 0
Recall that
    e^{−h} = 1 − h + h^2/2 − h^3/6 + h^4/24 − (h^5/120) e^{−δ}
with 0 < δ < h. Then
    y_n = [ e^{−h} + (h^5/120) e^{−δ} ]^n
        = e^{−nh} [ 1 + (h^5/120) e^{h−δ} ]^n
        ≈ e^{−x_n} [ 1 + (x_n h^4/120) e^{h−δ} ]
Thus
    Y(x_n) − y_n = O(h^4)
We can introduce the Taylor series method for the general problem
    y′ = f(x, y),   y(x_0) = Y_0
Simply imitate what was done above for the particular problem y′ = y cos x. In general,
    Y′(x) = f(x, Y(x))
    Y″(x) = f_x(x, Y(x)) + f_y(x, Y(x)) Y′(x)
          = f_x(x, Y(x)) + f_y(x, Y(x)) f(x, Y(x))
    Y‴(x) = f_xx + 2 f_xy f + f_yy f^2 + f_y f_x + f_y^2 f
and we can continue in this manner. Thus we can calculate derivatives of any order for Y(x), and then we can define Taylor series of any desired order.
This used to be considered much too arduous a task for practical problems, because everything had to be done by hand. But with symbolic programs such as Mathematica and Maple, Taylor series can be considered a serious framework for numerical methods. Programs that implement this in an automatic way, with varying order and stepsize, are available.
RUNGE-KUTTA METHODS
By choosing
    α = β = 1/(2γ_2),    γ_1 = 1 − γ_2
we can show
    T_n(Y) = O(h^3)
EXAMPLES
    y_{n+1} = y_n + h F(x_n, y_n, h; f)
    F(x, y, h; f) = γ_1 f(x, y) + γ_2 f(x + αh, y + βh f(x, y))
A four-stage formula has the form
    V_1 = f(x, y)
    V_2 = f(x + α_2 h, y + β_{2,1} h V_1)
    V_3 = f(x + α_3 h, y + β_{3,1} h V_1 + β_{3,2} h V_2)
    V_4 = f(x + α_4 h, y + h(β_{4,1} V_1 + β_{4,2} V_2 + β_{4,3} V_3))
Again it can be analyzed by expanding the truncation error.
If the truncation error satisfies
    T_n(Y) = O(h^p)
with p ≥ 2, then it can be shown that
    |Y(x_n) − y_n| ≤ c h^{p−1},   x_0 ≤ x_n ≤ b
when solving the initial value problem
    y′ = f(x, y),   x_0 ≤ x ≤ b,   y(x_0) = Y_0
We can also go further and show that
    Y(x_n) − y_h(x_n) = D(x_n) h^{p−1} + O(h^p),   x_0 ≤ x_n ≤ b
This can then be used to justify Richardson's extrapolation.
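As a concrete instance of the four-stage formula, the sketch below uses the classical fourth-order choice of the coefficients (α_2 = α_3 = 1/2, α_4 = 1, and weights 1/6, 1/3, 1/3, 1/6); the derivation above leaves the coefficients general, and the test problem here is my own choice.

    import math

    def rk4_step(f, x, y, h):
        V1 = f(x, y)
        V2 = f(x + h/2, y + h/2 * V1)
        V3 = f(x + h/2, y + h/2 * V2)
        V4 = f(x + h,   y + h * V3)
        return y + (h/6.0) * (V1 + 2*V2 + 2*V3 + V4)

    f = lambda x, y: -y + 2.0*math.cos(x)      # hypothetical test problem
    x, y, h = 0.0, 1.0, 0.1
    for _ in range(50):
        y = rk4_step(f, x, y, h)
        x += h
    print(y, math.cos(x) + math.sin(x))        # numerical vs. true solution at x = 5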
    Y(x_n) − y_h(x_n) = O(h^2)
with a constant of proportionality that depends on x_n. If
    Y(x_0) − y_0 = O(h^2)   [usually zero]
    Y(x_1) − y_1 = O(h^2)
with standard assumptions on the differentiability of f and Y, then we have
    max_{x_0 ≤ x_n ≤ b} |Y(x_n) − y_h(x_n)| ≤ c h^2
for some constant c ≥ 0.
Moreover, if
    Y(x_0) − y_0 ≈ c_0 h^2
    Y(x_1) − y_1 ≈ c_1 h^2
for constants c_0, c_1, then
    Y(x_n) − y_h(x_n) = D(x_n) h^2 + O(h^3)
with D(x) a continuous function for x_0 ≤ x ≤ b.
From this, we also have the Richardson error estimate
    Y(x_n) − y_h(x_n) ≈ (1/3) [ y_h(x_n) − y_{2h}(x_n) ]
NUMERICAL EXAMPLE
(continuation)
Consider solving the initial value problem with a method of the form
    y_{n+1} = y_n + h Σ_{j=0}^{q} w_{j,q} f(x_{n−j}, y_{n−j})
with initial values
    y_k = Y(x_k),   k = 0, 1, …, q
Then perturb these initial values and solve
    z_{n+1} = z_n + h Σ_{j=0}^{q} w_{j,q} f(x_{n−j}, z_{n−j})
with
    |z_k − y_k| ≤ ε,   k = 0, 1, …, q
for all sufficiently small values of h, say 0 < h ≤ h_0. Then there is a constant c > 0 with
    max_{x_q ≤ x_n ≤ b} |z_n − y_n| ≤ c ε,   0 < h ≤ h_0      (8)
ADAMS-MOULTON METHODS
q = 0: It is the backward Euler method,
    y_{n+1} = y_n + h y′_{n+1}
    E_n = −(1/2) h^2 Y″(ξ_n)
q = 1: It is the trapezoidal method,
    y_{n+1} = y_n + (h/2) [ y′_{n+1} + y′_n ]
    E_n = −(1/12) h^3 Y^(3)(ξ_n)
q = 2:
    y_{n+1} = y_n + (h/12) [ 5 y′_{n+1} + 8 y′_n − y′_{n−1} ]
    E_n = −(1/24) h^4 Y^(4)(ξ_n)
q = 3:
    y_{n+1} = y_n + (h/24) [ 9 y′_{n+1} + 19 y′_n − 5 y′_{n−1} + y′_{n−2} ]
    E_n = −(19/720) h^5 Y^(5)(ξ_n)
CONVERGENCE and STABILITY. The convergence and stability properties of Adams-Bashforth methods extend to Adams-Moulton methods. In particular, the results given in (7) and (8) remain valid.
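One common way to use an Adams-Moulton formula is as a corrector paired with a simpler predictor. The sketch below pairs the q = 1 (trapezoidal) formula with an Euler predictor and a single correction per step; the pairing and the test problem are my own choices for illustration.

    import math

    def am1_pc(f, x0, y0, h, steps):
        x, y = x0, y0
        for _ in range(steps):
            fn = f(x, y)
            y_pred = y + h * fn                          # Euler predictor
            y = y + (h/2.0) * (fn + f(x + h, y_pred))    # Adams-Moulton (trapezoidal) corrector
            x = x + h
        return x, y

    f = lambda x, y: -y + 2.0*math.cos(x)
    print(am1_pc(f, 0.0, 1.0, 0.1, 50))     # compare with cos(5) + sin(5)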
Again write
    Y(x) = [ Y_1(x) ]
           [ Y_2(x) ]
and define
    f(x, z) = [ A z_1 (1 − B z_2) ],    z = [ z_1 ]
              [ C z_2 (D z_1 − 1) ]          [ z_2 ]
although there is no explicit dependence on x. Then system (2) can be written as Y′(x) = f(x, Y(x)).
    f(x, z) = Az + G(x),   z ∈ R^m
This equation is the analogue, for studying systems of ODEs, of what the model equation
    y′ = λ y + g(x)
is for studying a single differential equation.
EULER'S METHOD FOR SYSTEMS
Consider
    Y′(x) = f(x, Y(x)),   Y(0) = Y_0
to be a system of two equations
    Y_1′(x) = f_1(x, Y_1(x), Y_2(x)),   Y_1(0) = Y_{1,0}
    Y_2′(x) = f_2(x, Y_1(x), Y_2(x)),   Y_2(0) = Y_{2,0}      (5)
Denote its solution by [Y_1(x), Y_2(x)].
Following the earlier derivations for Euler's method, we can use Taylor's theorem to obtain
    Y_1(x_{n+1}) = Y_1(x_n) + h f_1(x_n, Y_1(x_n), Y_2(x_n)) + (h^2/2) Y_1″(ξ_n)
    Y_2(x_{n+1}) = Y_2(x_n) + h f_2(x_n, Y_1(x_n), Y_2(x_n)) + (h^2/2) Y_2″(ξ_n)      (6)
Dropping the remainder terms, we obtain Euler's method for problem (5),
    y_{1,n+1} = y_{1,n} + h f_1(x_n, y_{1,n}, y_{2,n}),   y_{1,0} = Y_{1,0}
    y_{2,n+1} = y_{2,n} + h f_2(x_n, y_{1,n}, y_{2,n}),   y_{2,0} = Y_{2,0}
for n = 0, 1, 2, …
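A sketch of Euler's method for a system of two equations, using the predator-prey right-hand side f(x, z) = [A z_1 (1 − B z_2), C z_2 (D z_1 − 1)] defined earlier; the parameter values A, B, C, D, the initial condition, and the stepsize are hypothetical.

    import numpy as np

    def f(x, z, A=1.2, B=0.6, C=0.8, D=0.3):
        z1, z2 = z
        return np.array([A*z1*(1.0 - B*z2), C*z2*(D*z1 - 1.0)])

    h, steps = 0.001, 10000                 # integrate to x = 10
    x = 0.0
    z = np.array([2.0, 1.0])                # [Y1(0), Y2(0)]
    for _ in range(steps):
        z = z + h * f(x, z)                 # y_{i,n+1} = y_{i,n} + h f_i(x_n, y_n)
        x = x + h
    print(x, z)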
ERROR ANALYSIS
    x     y(x)       y(x) − y_h(x)    y_h(x) − y_{2h}(x)
    2     0.49315    4.25E−2          4.53E−2
    4     1.41045    6.86E−2          7.05E−2
    6     0.68075    2.49E−2          2.70E−2
    8     0.84386    7.56E−2          7.99E−2
    10    1.38309    4.14E−2          4.25E−2
OTHER METHODS
    Ay = b
where the unknown numerical solution vector is
    y = [y_1, …, y_{N−1}]^T,
the right-hand side vector is
    b = [ h^2 r_1 + (1 + (h/2) p_1) g_1,  h^2 r_2,  …,  h^2 r_{N−2},  h^2 r_{N−1} + (1 − (h/2) p_{N−1}) g_2 ]^T,
and the coefficient matrix A is tridiagonal.
THEORETICAL RESULTS
    max_{0 ≤ i ≤ N} |Y(x_i) − y_i| = O(h^2)
    Y(x_i) − y_h(x_i) = h^2 D(x_i) + O(h^4)
for some function D(x) independent of h.
Define the Richardson extrapolation
    ỹ_h(x_i) = [ 4 y_h(x_i) − y_{2h}(x_i) ] / 3
Then
    Y(x_i) − ỹ_h(x_i) = O(h^4)
i.e., without much additional effort, we obtain a fourth-order approximate solution.
    x     h = 1/40     h = 1/80     Ratio
    0.1   9.23E−09     5.76E−10     16.01
    0.2   1.04E−08     6.53E−10     15.99
    0.3   6.60E−09     4.14E−10     15.96
    0.4   1.18E−09     7.57E−11     15.64
    0.5   3.31E−09     2.05E−10     16.14
    0.6   5.76E−09     3.59E−10     16.07
    0.7   6.12E−09     3.81E−10     16.04
    0.8   4.88E−09     3.04E−10     16.03
    0.9   2.67E−09     1.67E−10     16.02
DIFFERENCE SCHEME FOR GENERAL EQUATION
    Y″ = f(x, Y, Y′)
At an interior node point x_i, the differential equation can be approximated by the difference equation
    (y_{i+1} − 2y_i + y_{i−1}) / h^2 = f( x_i, y_i, (y_{i+1} − y_{i−1}) / (2h) )
TREATMENT OF OTHER BOUNDARY CONDITIONS
    Y′(b) + k Y(b) = g_2
If we use the discrete boundary condition
    (y_N − y_{N−1}) / h + k y_N = g_2
then the difference solution will have first-order accuracy only, even though the difference equations at the interior nodes are second-order.
To maintain second-order accuracy, we need a second-order treatment of the derivative term Y′(b), e.g., since
    Y′(b) = (3 Y_N − 4 Y_{N−1} + Y_{N−2}) / (2h) + O(h^2)
we can approximate the boundary condition by
    (3 y_N − 4 y_{N−1} + y_{N−2}) / (2h) + k y_N = g_2
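To make the tridiagonal setup concrete, the following sketch handles only the simplified special case Y″ = r(x) on [0, 1] with Dirichlet boundary conditions, which is enough to see the O(h^2) behaviour. It is not the general scheme of the notes, and the test problem is my own.

    import numpy as np

    def solve_bvp(r, g1, g2, N):
        h = 1.0 / N
        x = np.linspace(0.0, 1.0, N + 1)
        # Tridiagonal system A y = b for the interior unknowns y_1, ..., y_{N-1}:
        #   y_{i+1} - 2 y_i + y_{i-1} = h^2 r(x_i)
        main = -2.0 * np.ones(N - 1)
        off  = np.ones(N - 2)
        A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
        b = h*h * r(x[1:N])
        b[0]  -= g1                  # move the known boundary values
        b[-1] -= g2                  # to the right-hand side
        y = np.zeros(N + 1)
        y[0], y[N] = g1, g2
        y[1:N] = np.linalg.solve(A, b)
        return x, y

    # Test with true solution Y(x) = sin(pi*x): r(x) = -pi^2 sin(pi*x), g1 = g2 = 0.
    x, y = solve_bvp(lambda t: -np.pi**2 * np.sin(np.pi*t), 0.0, 0.0, 40)
    print(np.max(np.abs(y - np.sin(np.pi*x))))     # O(h^2) error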
Figure: the true solution (surface plot over the x-axis and y-axis).
Figure: the numerical solution and the error (surface plots over the x-axis and y-axis).
    u_t = a u_xx + f,      0 < x < L
    u(0, t) = g_1(t),   u(L, t) = g_2(t)
    u(x, 0) = u_0(x)
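As an illustration of how such a problem can be discretized, here is a hedged sketch of the simplest explicit scheme (forward difference in time, centered difference in space). The method actually used in the notes is not visible in this excerpt, and the test problem below is hypothetical. Stability of this scheme requires a·h_t/h_x^2 ≤ 1/2.

    import numpy as np

    def explicit_heat(a, f, g1, g2, u0, L, T, nx, nt):
        hx, ht = L / nx, T / nt
        x = np.linspace(0.0, L, nx + 1)
        u = u0(x)
        lam = a * ht / hx**2             # must be <= 1/2 for stability
        for n in range(nt):
            t = n * ht
            u_new = u.copy()
            u_new[1:-1] = (u[1:-1] + lam * (u[2:] - 2*u[1:-1] + u[:-2])
                           + ht * f(x[1:-1], t))
            u_new[0], u_new[-1] = g1(t + ht), g2(t + ht)
            u = u_new
        return x, u

    # Hypothetical test: a = 1, f = 0, u0(x) = sin(pi*x), zero boundary data;
    # the true solution is exp(-pi^2 t) sin(pi*x).
    x, u = explicit_heat(1.0, lambda x, t: 0.0*x, lambda t: 0.0, lambda t: 0.0,
                         lambda x: np.sin(np.pi*x), 1.0, 0.1, 20, 200)
    print(np.max(np.abs(u - np.exp(-np.pi**2*0.1)*np.sin(np.pi*x))))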
Figure: the true solution (surface plot over the x-axis and t-axis).
Figure: the numerical solution and the error for h_t = 0.0021, h_x = 0.0833 (surface plots over the x-axis and t-axis).
Figure: the numerical solution and the error for h_t = 0.0200, h_x = 0.1000 (surface plots over the x-axis and t-axis).
Figure: the numerical solution and the error for h_t = 0.0100, h_x = 0.0500 (surface plots over the x-axis and t-axis).
Figure: the numerical solution and the error (surface plots over the x-axis and t-axis).
Figure: the true solution (surface plot over the x-axis and t-axis).
x 10
1 2
0
The numerical solution
0.8
2
0.6
The error
4
0.4
6
0.2 8
0 10
1 1
4 4
0.5 0.5
2 2
taxis 0 0 xaxis taxis 0 0 xaxis
"
*
*
5 ' 9
+ 9
+ '8+:; 6<+:6 ;8< ''<:6 6+
6 66=:; ''<:; ;85 +=':6 6'
7 <<+:; +':; ;86 5+:6 6'
8 '';:+ +=6:; ;8; <;6:6 6'
' '6=:; ;8=:; ;8; =7=:6 6'