Lecture Notes

on
Numerical Analysis
Song Wang
School of Mathematics & Statistics
The University of Western Australia
swang@maths.uwa.edu.au
Contents

1 Computer Arithmetic
  1.1 Floating point numbers
  1.2 Overflow and underflow
  1.3 Floating-point arithmetic
  1.4 Computing sums
  1.5 Perturbation analysis
  1.6 Cancellation
  1.7 Algorithms and convergence

2 Nonlinear Equations of One Variable
  2.1 Bisection method
  2.2 Fixed point method
  2.3 Newton's method
  2.4 The secant method
  2.5 Quasi-Newton method
  2.6 Muller's method

3 Interpolation & Polynomial Approximation
  3.1 Lagrange polynomial
  3.2 Divided difference & Newton's polynomial
  3.3 Hermite interpolation
  3.4 Cubic spline interpolation

4 Numerical Integration & Numerical Differentiation
  4.1 Trapezoidal rule
  4.2 Simpson's rule
  4.3 Newton-Cotes formulas
  4.4 Composite rules
    4.4.1 Composite Simpson's rule
    4.4.2 Composite trapezoidal and mid-point rules
  4.5 Gauss quadrature
  4.6 First derivatives
  4.7 Second derivatives
  4.8 Computational errors

5 Numerical Solution of Ordinary Differential Equations (ODEs)
  5.1 Euler's method
  5.2 Higher-order Taylor methods
  5.3 Runge-Kutta and Other Methods
  5.4 Multi-step Methods
  5.5 Implicit Methods
  5.6 Stability of One-Step Methods
  5.7 Stability of Multi-step Methods

6 Least Squares Approximation
  6.1 Discrete case
  6.2 Fitting by exponential functions
  6.3 Orthogonal polynomials & least-squares approximation

7 Solution of Nonlinear Systems of Equations
  7.1 Fixed point iterations
  7.2 Newton's method
  7.3 Quasi-Newton methods
  7.4 The steepest descent method
Chapter 1
Computer Arithmetic
1.1 Floating point numbers
Base 10
Computer memory can only store a finite number of digits. Therefore, a question becomes apparent: given a fixed number of digits, how can we define a representation so that it gives the largest coverage of real numbers? An obvious method is scientific notation, i.e., a number of very large or very small magnitude is represented as a truncated number multiplied by an appropriate power of 10. For example, 2.597E03 represents $2.597 \times 10^{3}$.

Example. Given 4 digits, the minimum and maximum positive decimal numbers that can be expressed by the conventional method are .0001 and 9999 respectively (assuming that the decimal point does not take a space). However, if we are ready to sacrifice some accuracy, we can use two digits to represent the power of 10 and its sign, and the rest to represent the first two non-zero digits and the sign of the number. Then the minimum and maximum positive numbers are respectively $.01 \times 10^{-9}$ and $99 \times 10^{+9}$. Note that we may lose some accuracy; for example, 2.345 can only be expressed as $.23 \times 10^{+1}$.

In general we have
$$x = s \times 0.m \times 10^{e},$$
where $s$ denotes the sign, $m$ is called the mantissa (or M bits) and $e$ is the exponent (or E bits).
Base $\beta$

A base-$\beta$ floating point number consists of a fraction $f$ containing the significant figures of the number and an exponent $e$. The value of the number is $f \times \beta^{e}$. A floating-point number $a = f \times \beta^{e}$ is said to be normalised if
$$\beta^{-1} \le |f| < 1$$
(for base 10, $\beta^{-1} = 0.1$). In other words, $a$ is normalised if the base-$\beta$ representation of its fraction has the form
$$f = 0.x_1 x_2 \ldots, \quad \text{with } x_1 \ne 0.$$
Obviously, $2.597 \times 10^{3}$ is not normalized, while $0.2597 \times 10^{4}$ is.

Commonly used bases:
  binary: base 2, used by most computer systems.
  decimal: base 10, used in most hand calculators.
  hex: base 16, used by IBM mainframes and clones.
IEEE Standard (32 binary digits, or bits)

In this standard, the first bit is for the sign, bits 2-9 are for the exponent and bits 10-32 are for the fraction.

[Bit layout: sign (1 bit), exponent (8 bits), fraction (23 bits).]

Numerical range: approximately $10^{-38}$ to $10^{38}$, representing about 7 significant decimal digits. This is the so-called single precision. We may extend it to double precision, i.e., 64-bit storage, which has roughly a numerical range from $10^{-307}$ to $10^{307}$.
1.2 Overflow and underflow

The real number set is infinite, but the representation set is finite. This limitation (that $e$ is finite) leads to overflow or underflow; collectively, we speak of an exponent exception. There are a few cases, as listed below.
  When a number is too large to be represented by the finite exponent range, the result is said to have overflowed. For example, $10^{60}$ cannot be expressed in a 32-bit memory block.
  Similarly, we have underflow, e.g. $10^{-60}$.
  Overflow is a fatal error as it normally causes the system to stop. The system normally displays NaN (Not a Number!).
  Underflow is normally not harmful as it can be replaced by zero.
  Overflow can sometimes be avoided by scaling.
Example: Consider $c = \sqrt{a^2 + b^2}$ with $a = 10^{60}$ and $b = 1$. On a 5-digit decimal computer, assume
  Digit 1: sign;
  Digits 2 and 3: exponent;
  Digits 4 and 5: magnitude.
The computation will overflow when computing
$$(10^{60})^2 = 10^{120},$$
as 120 has three digits, exceeding the length of the exponent. However, we can scale the problem by a parameter $l$, i.e.,
$$c = l\sqrt{\left(\frac{a}{l}\right)^2 + \left(\frac{b}{l}\right)^2}.$$
If we choose $l = \max\{|a|, |b|\} = 10^{60}$, then
$$c = 10^{60}\sqrt{1^2 + \left(\frac{1}{10^{60}}\right)^2}.$$
This should work, though $\frac{1}{10^{60}}$ will cause underflow. This is harmless because it can be set to zero, and so $c \approx 10^{60}\sqrt{1^2} = 10^{60}$.
Errors

  Rounding: round off, e.g. 2.6457513 -> 2.6458.
  Chopping/truncation: 2.6457513 -> 2.6457.
  Absolute and relative errors: if $p^*$ is an approximation to $p$, then $|p - p^*|$ and $\frac{|p - p^*|}{|p|}$ are called the absolute and relative errors respectively, provided that $p \ne 0$.

Bounds on the relative error for rounding

Consider $a = x.xxxxy$ rounded to $b = x.xxxz$. If $y \ge 5$, round up; otherwise, round down. Therefore,
$$|b - a| \le 5 \times 10^{-5}.$$
On the other hand, we assume that $x \ne 0$. Then $|a| \ge 1$ and so
$$\frac{|b - a|}{|a|} \le 5 \times 10^{-5} = \frac{1}{2} \times 10^{-4}.$$
From this we have the following general result: rounding $a$ to $t$ decimal digits gives a number $b$ satisfying
$$\frac{|b - a|}{|a|} \le \frac{1}{2} \times 10^{-t+1}.$$
Bounds on the relative error for chopping

Similarly, if $b$ is obtained by chopping $a$ to $t$ digits, then
$$\frac{|b - a|}{|a|} \le 10^{-t+1}.$$

Bounds for t-digit binary numbers
$$\frac{|b - a|}{|a|} \le \begin{cases} 2^{-t}, & \text{rounding}, \\ 2^{-t+1}, & \text{chopping.} \end{cases}$$

Let $b = \mathrm{fl}(a)$ denote the result of rounding or chopping $a$ on a particular machine, and let $\epsilon_M > 0$ be the upper bound on the relative error. If we set
$$\delta = \frac{b - a}{a}, \quad a \ne 0,$$
then $b = a(1 + \delta)$ and $|\delta| \le \epsilon_M$. In other words,
$$\mathrm{fl}(a) = a(1 + \delta), \quad |\delta| \le \epsilon_M.$$
This $\epsilon_M$ is characteristic of the floating point arithmetic of the machine in question. It is called the rounding unit for the machine, or machine epsilon. We sometimes refer to it as the machine accuracy.
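As an illustration of the rounding unit, the following short Python sketch (added here, not part of the original notes) estimates the machine epsilon of double precision by halving until $1 + u$ is no longer distinguishable from 1, and compares it with the value reported by the standard library.

    import sys

    def machine_epsilon():
        """Estimate the rounding unit: the smallest power of 2 such that
        1.0 + u is still distinguishable from 1.0 in double precision."""
        u = 1.0
        while 1.0 + u / 2 > 1.0:
            u /= 2
        return u

    print(machine_epsilon())       # about 2.22e-16
    print(sys.float_info.epsilon)  # the same value, reported by Python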
1.3 Floating-point arithmetic

In general, a combination (i.e., $+,\, -,\, \times,\, /$) of floating-point numbers will not be representable as a floating-point number of the same size. For example, the product of two 5-digit numbers will generally require 10 digits for its representation. Thus, the result of a floating-point operation can be represented only approximately.

Let $\circ$ denote an operation ($+,\, -,\, \times,\, /$); then
$$\mathrm{fl}(a \circ b) = (a \circ b)(1 + \delta), \quad |\delta| \le \epsilon_M,$$
by the above theory. This is ideal. However, some systems can return a result with a larger relative error. Here is one example.

Example: Consider the computation of the difference $1 - 0.999999$ in 6-digit decimal arithmetic. If it is done on a 7-digit machine, we get $0.000001$, or $0.100000 \times 10^{-5}$. However, on a 6-digit machine,
$$0.999999 \to 0.99999 \quad \text{(chopping)},$$
so
$$1.00000 - 0.99999 = 0.1 \times 10^{-4}.$$
The relative error is
$$\text{rel. error} = \frac{0.1 \times 10^{-4} - 0.1 \times 10^{-5}}{0.1 \times 10^{-5}} = 9.$$
Note that in the above we first approximate $b$ by $\tilde{b}$ and then evaluate $a - \tilde{b}$. In reality we may need to approximate both $a$ and $b$, so that
$$\mathrm{fl}(a \circ b) = a(1 + \delta_a) \circ b(1 + \delta_b)$$
with $|\delta_a|, |\delta_b| \le \epsilon_M$.
1.4 Computing sums

Consider the computation of
$$s_n = \mathrm{fl}(x_1 + x_2 + \cdots + x_n).$$
When $n = 2$, we have
$$s_2 = \mathrm{fl}(x_1 + x_2) = (x_1 + x_2)(1 + \epsilon_1) = x_1(1 + \epsilon_1) + x_2(1 + \epsilon_1), \quad |\epsilon_1| \le \epsilon_M.$$
Similarly,
$$s_3 = \mathrm{fl}(x_1 + x_2 + x_3) = (s_2 + x_3)(1 + \epsilon_2) = [x_1(1 + \epsilon_1) + x_2(1 + \epsilon_1)](1 + \epsilon_2) + x_3(1 + \epsilon_2)$$
$$= x_1(1 + \epsilon_1)(1 + \epsilon_2) + x_2(1 + \epsilon_1)(1 + \epsilon_2) + x_3(1 + \epsilon_2),$$
where $|\epsilon_1|, |\epsilon_2| \le \epsilon_M$. Continuing with this process we have
$$s_n = \mathrm{fl}(s_{n-1} + x_n) = x_1(1 + \epsilon_1)(1 + \epsilon_2)\cdots(1 + \epsilon_{n-1}) + x_2(1 + \epsilon_1)(1 + \epsilon_2)\cdots(1 + \epsilon_{n-1})$$
$$\quad + x_3(1 + \epsilon_2)\cdots(1 + \epsilon_{n-1}) + \cdots + x_n(1 + \epsilon_{n-1})$$
$$=: x_1(1 + \eta_1) + x_2(1 + \eta_2) + \cdots + x_n(1 + \eta_n),$$
where
$$1 + \eta_i = (1 + \epsilon_{i-1})(1 + \epsilon_i)\cdots(1 + \epsilon_{n-1})$$
with $|\epsilon_i| \le \epsilon_M$ for all $i = 1, 2, \ldots, n-1$ and $\epsilon_0 = 0$. Let us consider
$$1 + \eta_{n-1} = (1 + \epsilon_{n-2})(1 + \epsilon_{n-1}) = 1 + (\epsilon_{n-2} + \epsilon_{n-1}) + \epsilon_{n-2}\epsilon_{n-1}.$$
Now, $|\epsilon_{n-2} + \epsilon_{n-1}| \le 2\epsilon_M$ and $|\epsilon_{n-2}\epsilon_{n-1}| \le \epsilon_M^2$. From these we have
$$\eta_{n-1} \approx \epsilon_{n-2} + \epsilon_{n-1}, \qquad |\eta_{n-1}| \lesssim 2\epsilon_M.$$
In general, we have that, approximately,
$$|\eta_1| \lesssim (n-1)\epsilon_M, \qquad |\eta_i| \lesssim (n-i+1)\epsilon_M.$$
More precisely, for all $n = 1, 2, \ldots$,
$$1 + \eta = (1 + \epsilon_1)(1 + \epsilon_2)\cdots(1 + \epsilon_n) = 1 + \sum_{i=1}^{n}\epsilon_i + \sum_{i \ne j}\epsilon_i\epsilon_j + \cdots + \epsilon_1\epsilon_2\cdots\epsilon_n$$
$$\le 1 + n\epsilon_M + \frac{n(n+1)}{2}\epsilon_M^2 + \cdots + \epsilon_M^n.$$
It can be shown that
$$|\eta| \le 1.06\,n\,\epsilon_M =: n\,\epsilon_M',$$
where $\epsilon_M' = 1.06\,\epsilon_M$ is called the adjusted rounding error. So the modified expressions are
$$|\eta_1| \le (n-1)\epsilon_M', \qquad |\eta_i| \le (n-i+1)\epsilon_M'.$$
Example: Let $\epsilon_M' = 10^{-15}$ and assume that we have a computer with an addition operation time of $1\,\mu\text{s} = 10^{-6}$ sec. Let us calculate how long it will take before the accumulated error reaches 0.1 on this computer.

From $|\eta_1| \le (n-1)\epsilon_M' = 0.1$ we get $n \approx 10^{14}$, i.e., it takes about $10^{14}$ additions before the accumulated error becomes 0.1. The time needed by this machine is
$$10^{14} \times 10^{-6} = 10^{8} \text{ secs} \approx 3.2 \text{ years}.$$
Therefore, it takes about 3.2 years before the error is accumulated to 0.1.
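The accumulation of rounding error in long sums can also be observed directly. The following Python sketch (an added illustration, not part of the original notes) sums the same value many times in single precision, where the rounding unit is much larger than $10^{-15}$, and compares the result with the exact value; the relative error grows roughly in proportion to the number of additions.

    import numpy as np

    n = 10**6
    x = np.float32(0.1)

    s = np.float32(0.0)
    for _ in range(n):      # naive left-to-right summation in single precision
        s += x

    exact = 0.1 * n
    print(s, exact, abs(float(s) - exact) / exact)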
Backward analysis

The expression
$$s_n = x_1(1 + \eta_1) + \cdots + x_n(1 + \eta_n),$$
along with the bounds on all the $\eta_i$, is called a backward error analysis, because the rounding errors made in the course of the computation are projected backward onto the original data. An algorithm that has such an analysis is called stable, or backward stable.
1.5 Perturbation analysis

Consider the evaluation of
$$\sigma = x_1 + x_2 + \cdots + x_n.$$
We suppose there is a perturbation in each $x_i$, i.e., $x_i \to \tilde{x}_i$ with
$$\tilde{x}_i = x_i(1 + \epsilon_i), \quad |\epsilon_i| \le \epsilon,$$
and look for a bound on the error in
$$\tilde{\sigma} = \tilde{x}_1 + \tilde{x}_2 + \cdots + \tilde{x}_n.$$
(This is called a perturbation analysis.) Clearly,
$$|\tilde{\sigma} - \sigma| = |x_1\epsilon_1 + x_2\epsilon_2 + \cdots + x_n\epsilon_n| \le \sum_{i=1}^{n}|x_i||\epsilon_i| \le \epsilon\sum_{i=1}^{n}|x_i|.$$
Dividing by $|\sigma|$, we have
$$\frac{|\tilde{\sigma} - \sigma|}{|\sigma|} \le \epsilon\,\frac{\sum_{i=1}^{n}|x_i|}{\left|\sum_{i=1}^{n}x_i\right|} =: \epsilon\kappa,$$
where $\kappa = \dfrac{\sum_{i=1}^{n}|x_i|}{\left|\sum_{i=1}^{n}x_i\right|} \ge 1$. This is a magnification factor and serves as a condition number for the problem.

Example: Consider the sum
$$s = 5.00 \times 10^{8} - 4.99 \times 10^{8} + 1.00,$$
where all the numbers are experimental data with 3 significant digits. In this case, $\epsilon = 10^{-2}$. So,
$$\kappa = \frac{5.00 \times 10^{8} + 4.99 \times 10^{8} + 1.00}{(5.00 - 4.99)\times 10^{8} + 1} \approx \frac{9.99 \times 10^{8}}{0.01 \times 10^{8}} = 9.99 \times 10^{2}.$$
Therefore, the relative error can be as large as $\epsilon\kappa = 9.99$, which is not small!

When all the $x_i$'s are positive (or all negative), $\kappa = 1$. In this case the problem is called perfectly conditioned, and the errors will not accumulate, or will accumulate slowly. However, if $\kappa \gg 1$, the problem is said to be ill-conditioned.
1.6 Cancellation

Let us calculate
$$37654 + 25.874 - 37679 = 0.874$$
on a 5-digit machine (i.e., in 5-digit floating-point). This gives
$$\mathrm{fl}(37654 + 25.874) = 37680$$
and
$$\mathrm{fl}(37680 - 37679) = 1.$$
This does not agree with 0.874: the result has only one significant digit. This is called cancellation. In fact, the cancellation itself does not cause any problem, as $\mathrm{fl}(37680 - 37679) = 1$ is exact. The trouble comes from the first step,
$$\mathrm{fl}(37654 + 25.874) = 37680,$$
where only two significant digits of the second number are retained. This is why the result is not accurate.

The quadratic equation

Let us consider solving $x^2 - bx + c = 0$, which has the roots
$$x = \frac{b \pm \sqrt{b^2 - 4c}}{2}.$$
If we take $b = 3.6778$ and $c = 0.0020798$, then
$$x_1 = 3.67723441190\ldots, \qquad x_2 = 0.00056558809\ldots$$
An attempt to calculate the smaller root in 5-digit arithmetic gives
1. $b^2 \approx 1.3526 \times 10^{1}$.
2. $4c \approx 8.3192 \times 10^{-3}$.
3. $b^2 - 4c \approx 1.3518 \times 10^{1}$.
4. $\sqrt{b^2 - 4c} \approx 3.6767 \times 10^{0}$.
5. $b - \sqrt{b^2 - 4c} \approx 1.1000 \times 10^{-3}$.
6. $(b - \sqrt{b^2 - 4c})/2 \approx 5.5000 \times 10^{-4}$.
The computed value differs from the true value $5.6558809 \times 10^{-4}$ in the second significant digit. The reason is the cancellation at step 5, where 3 significant digits were cancelled when computing $3.6778 - 3.6767$. Cancellation only reveals a loss of information that occurred earlier. The real trouble occurred at step 3:
$$b^2 - 4c = \mathrm{fl}(13.526 - 0.0083192) = 1.3518 \times 10^{1}.$$
This cancellation corresponds to replacing the number 0.0083192 by 0.008 and performing the difference exactly.

Can anything be done to save it? It all depends.
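One standard remedy, sketched below in Python (an added illustration; the notes do not prescribe a particular fix), is to compute the larger root first and then obtain the smaller root from the product relation $x_1 x_2 = c$, which avoids subtracting nearly equal numbers.

    import math

    def small_root(b, c):
        """Smaller root of x^2 - b*x + c = 0, avoiding cancellation.
        Assumes b > 0 and real roots."""
        x1 = (b + math.sqrt(b * b - 4 * c)) / 2   # larger root: no cancellation
        return c / x1                             # since x1 * x2 = c

    def small_root_naive(b, c):
        return (b - math.sqrt(b * b - 4 * c)) / 2  # suffers cancellation

    b, c = 3.6778, 0.0020798
    print(small_root(b, c), small_root_naive(b, c))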
1.7 Algorithms and convergence

Definition 1.1 (algorithm) A numerical algorithm is the combination of
1. input variables;
2. a sequence of steps which manipulates the input variables along with additional temporary variables;
3. output variables.

Definition 1.2 (Stability of an algorithm) An algorithm is stable if any small change in the initial data only results in a small change in the final data. Otherwise, it is unstable. Furthermore, if stability holds only for certain choices of initial data, then the algorithm is called conditionally stable.

Definition 1.3 Suppose $E_0 > 0$ denotes an initial error and $E_n$ represents the magnitude of the error after $n$ steps/operations. Then
  $E_n \approx C n E_0$: linear error growth ($C > 0$ a constant);
  $E_n \approx C^n E_0$: exponential error growth if $C > 1$.
Linear growth is unavoidable, but exponential growth is fatal.

Example: Consider the general recurrence relation
$$a x_{n+1} + b x_n + c x_{n-1} = 0, \quad n = 1, 2, \ldots, \tag{1.1.7.1}$$
with $x_0 = \alpha$ and $x_1 = \beta$. Let us find a solution of the form $x_n = p^n$, where $p$ is to be determined. Substituting $x_n$ into (1.1.7.1) gives
$$a p^{n+1} + b p^n + c p^{n-1} = 0,$$
or
$$a p^2 + b p + c = 0.$$
This has the solutions
$$p = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
In particular, if $a = 3$, $b = -13$ and $c = 4$, then we have two roots: $p_1 = 4$ and $p_2 = 1/3$. The general solution to the difference equation is
$$x_n = A\cdot 4^n + B\left(\frac{1}{3}\right)^n,$$
where $A$ and $B$ are two arbitrary constants. Using the initial conditions $\alpha$ and $\beta$ we have
$$x_n = \frac{3\beta - \alpha}{11}\,4^n + \frac{12\alpha - 3\beta}{11}\left(\frac{1}{3}\right)^n.$$
Now, if $\alpha = 1$ and $\beta = 1/3$, then $A = 0$ and $B = 1$, and the solution becomes
$$x_n = \frac{12 - 1}{11}\left(\frac{1}{3}\right)^n = \left(\frac{1}{3}\right)^n.$$
Let us consider the computation of this solution. If $A = 0$ is exact, but $B = 1 + \delta$, then
$$\tilde{x}_n = (1 + \delta)\left(\frac{1}{3}\right)^n \to 0$$
as $n \to \infty$. The absolute error is
$$E_n = |\tilde{x}_n - x_n| = \frac{|\delta|}{3^n} \to 0$$
as $n \to \infty$. This implies that the computation is stable with respect to $B$.

If there is also an initial error in $A$, i.e., $A \to 0 + \epsilon$ for some small $\epsilon \ne 0$, then
$$\tilde{x}_n = \epsilon\,4^n + (1 + \delta)\left(\frac{1}{3}\right)^n.$$
So the error becomes
$$E_n = |\tilde{x}_n - x_n| = \left|\epsilon\,4^n + \frac{\delta}{3^n}\right| \to \infty$$
as $n \to \infty$. Therefore, the computation is not stable with respect to $A$.

Consider $x_n = \dfrac{1}{3^n}$ again. If 3 becomes $3 + \delta$ due to truncation, for a very small $\delta$, then
$$\tilde{x}_n = \frac{1}{(3 + \delta)^n}.$$
The error is
$$E_n = |x_n - \tilde{x}_n| = \left|\frac{1}{3^n} - \frac{1}{(3+\delta)^n}\right| \le \frac{1}{3^n} + \frac{1}{(3+\delta)^n} \le
\begin{cases} \dfrac{2}{3^n}, & \delta > 0,\\[4pt] \dfrac{2}{(3+\delta)^n}, & \delta < 0, \end{cases}
\;\to 0 \quad\text{as } n \to \infty.$$
So it is stable. It is easy to show that $x_n = 4^n$ is not stable with respect to 4.
Chapter 2

Nonlinear Equations of One Variable

How far can a cannon ball travel?

The motion of the ball satisfies
$$y''(t) = -g, \qquad y(0) = 0, \qquad y'(0) = V_0\sin\theta,$$
where $y$ is the displacement along the vertical direction (height), $V_0$ is the initial speed, $g$ is the gravitational acceleration constant and $\theta$ is the angle (from the horizontal axis). The solution of this initial value problem is
$$y(t) = V_0 t\sin\theta - \frac{1}{2}g t^2.$$
When the ball touches the ground again at $t = T$, we have
$$y(T) = 0 = V_0 T\sin\theta - \frac{1}{2}g T^2.$$
This has two solutions,
$$T = 0 \qquad\text{and}\qquad T = \frac{2V_0\sin\theta}{g}.$$
The distance travelled is
$$d_{\max} = V_0\cos\theta\cdot T = \frac{2V_0^2\sin\theta\cos\theta}{g}.$$
Question: How long does the ball need to reach the height $h_0$?

Obviously, we need to solve
$$h_0 = V_0 t\sin\theta - \frac{1}{2}g t^2$$
for $t$. This is a nonlinear equation in $t$. In general we need to consider the solution of
$$f(x) = 0.$$
2.1 Bisection method

Let us quote the Intermediate Value Theorem from calculus.

Theorem 2.1 If $f$ is continuous on $[a, b]$ and $g$ lies between $f(a)$ and $f(b)$, then there exists a point $x \in [a, b]$ such that $g = f(x)$.

Now, for the solution of $f(x) = 0$ on $[a, b]$, if $f(a)\cdot f(b) < 0$, then we can find an approximation to $x$ in the following way.

Let $c = (a + b)/2$. There are three possibilities.
1. $f(c) = 0$: $c$ is a solution.
2. $f(c) \ne 0$ and $f(c)\cdot f(b) < 0$: $f(x) = 0$ has a solution in $[c, b]$.
3. $f(c) \ne 0$ and $f(a)\cdot f(c) < 0$: $f(x) = 0$ has a solution in $[a, c]$.
Clearly, we either have a solution or the solution is in an interval whose size (length) is half that of $[a, b]$. We then repeat the above process.

Algorithm (bisection)
INPUT a, b, the tolerance TOL and the maximum number of iterations N_0.
Step 1. Set i = 1; FA = f(a).
Step 2. While i <= N_0 do Steps 3-6.
Step 3.   Set c = a + (b - a)/2; FC = f(c).
Step 4.   If FC = 0 or (b - a)/2 < TOL, then OUTPUT(c); STOP.
Step 5.   Set i = i + 1.
Step 6.   If FA * FC > 0 then set a = c and FA = FC; else set b = c.
Step 7. OUTPUT('Method failed after N_0 iterations'); STOP.

Question: How many iterations are needed in order that the interval length is less than $\epsilon$?

Let $L_0 = b - a$. From the construction of the bisection method we see that after $k$ iterations, the length becomes
$$L_k = \frac{L_0}{2^k}.$$
We require
$$L_k \le \epsilon \iff \frac{L_0}{2^k} \le \epsilon \iff k \ge \log_2\frac{L_0}{\epsilon}.$$
We choose $k = \left\lceil \log_2\dfrac{L_0}{\epsilon}\right\rceil$, where $\lceil\cdot\rceil$ denotes the ceiling function.

Example: If $b - a = 1$ and $\epsilon = 10^{-6}$, then $k = 20$.
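A direct Python transcription of this algorithm is sketched below (an added illustration; variable names follow the pseudocode above, and the test function is just an example).

    def bisection(f, a, b, tol=1e-6, n0=100):
        """Bisection method for f(x) = 0 on [a, b], assuming f(a)*f(b) < 0."""
        fa = f(a)
        for _ in range(n0):
            c = a + (b - a) / 2
            fc = f(c)
            if fc == 0 or (b - a) / 2 < tol:
                return c
            if fa * fc > 0:
                a, fa = c, fc
            else:
                b = c
        raise RuntimeError("Method failed after %d iterations" % n0)

    # Example: root of x^3 + 4x^2 - 10 on [1, 2]
    print(bisection(lambda x: x**3 + 4*x**2 - 10, 1.0, 2.0))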
2.2 Fixed point method

Definition 2.1 (fixed point) A fixed point of a function $g(x)$ is a real number $p$ such that $p = g(p)$.

Example. Let $g(x) = x^2 - 2$. Then the fixed points of $g$ can be found from $g(x) = x^2 - 2 = x$, or by solving $x^2 - x - 2 = 0$. Solving this gives two fixed points $x_1 = -1$ and $x_2 = 2$.

Definition 2.2 The set of functions which are continuous on $[a, b]$ is denoted by $C[a, b]$.

Theorem 2.2 If $g \in C[a, b]$ and $g(x) \in [a, b]$ for all $x \in [a, b]$, then $g$ has a fixed point in $[a, b]$.

PROOF. Consider $h(x) = g(x) - x$. Since $g(x) \in [a, b]$ for all $x \in [a, b]$, we have
$$a \le g(a) \quad\text{and}\quad g(b) \le b.$$
Using these we have
$$h(a) = g(a) - a \ge 0 \quad\text{and}\quad h(b) = g(b) - b \le 0.$$
By the Intermediate Value Theorem there exists $c \in [a, b]$ such that $h(c) = 0$, or $g(c) = c$. So $c$ is a fixed point of $g$. 2

Theorem 2.3 In addition to the assumptions in the above theorem, if $g'(x)$ exists on $(a, b)$ and there exists a positive constant $k < 1$ such that
$$|g'(x)| \le k < 1, \quad x \in (a, b),$$
then the fixed point is unique.

PROOF. We prove it by contradiction. Assume that $g$ has two fixed points $p_1, p_2 \in (a, b)$ with $p_1 \ne p_2$. Without loss of generality we assume that $p_1 < p_2$. By the Mean Value Theorem, there exists $d \in (p_1, p_2)$ such that
$$g'(d) = \frac{g(p_2) - g(p_1)}{p_2 - p_1}.$$
But $g(p_i) = p_i$ for $i = 1, 2$. We have from the above that
$$g'(d) = \frac{p_2 - p_1}{p_2 - p_1} = 1.$$
This is a contradiction, as $|g'(x)| \le k < 1$ for all $x \in (a, b)$. Therefore, $p_1 = p_2$. 2

Example. Show that $g(x) = \cos x$ has a unique fixed point in $[0, 1]$.

Clearly, $g \in C[0, 1]$. Since $\cos x$ is decreasing on $[0, 1]$, $\cos 0 = 1$ and $\cos 1 > 0$, we have $g(x) \in [0, 1]$. Furthermore, $|g'(x)| = |\sin x| \le \sin 1 < 1$ for all $x \in (0, 1)$. Therefore, $g$ has a unique fixed point in $[0, 1]$.
Example. Consider the function $g(x) = x - x^3 - 4x^2 + 10$ on $[1, 2]$.

Since $g(1) = 6$ and $g(2) = -12$, $g(x) \notin [1, 2]$ for some $x \in [1, 2]$. So it is not known from Theorem 2.2 whether there is a fixed point in $[1, 2]$.

Example. Consider the function $g(x) = \left(\dfrac{10}{4 + x}\right)^{1/2}$ on $[1, 2]$.

From the expression we see that $g$ is decreasing on $[1, 2]$, and $g(1) = \sqrt{2}$, $g(2) = \sqrt{5/3}$. This implies that $g(x) \in [1, 2]$ for any $x \in [1, 2]$. Also,
$$|g'(x)| = \left|-\frac{\sqrt{10}}{2(x + 4)^{3/2}}\right| \le \frac{\sqrt{10}}{2\cdot 5^{3/2}} < 0.15.$$
So, by the above theorem, $g$ has a unique fixed point in $[1, 2]$.

Definition 2.3 (fixed point iteration) The iteration $p_n = g(p_{n-1})$ for $n = 1, 2, \ldots$ is called fixed-point iteration.
Theorem 2.4 (Fixed-point Theorem) Let $g \in C[a, b]$ be such that $g(x) \in [a, b]$ for all $x \in [a, b]$. Suppose, in addition, that $g'(x)$ exists on $(a, b)$ and there is a constant $K \in (0, 1)$ such that
$$|g'(x)| \le K, \quad x \in (a, b).$$
Then, for any $p_0 \in [a, b]$, the fixed point iteration
$$p_n = g(p_{n-1}), \quad n = 1, 2, \ldots$$
converges to the unique fixed point $p \in [a, b]$.

PROOF. For any positive integer $n$, using the Mean Value Theorem we have
$$|p_{n+1} - p_n| = |g(p_n) - g(p_{n-1})| \le |g'(\xi_n)||p_n - p_{n-1}| \le K|p_n - p_{n-1}| \le K^2|p_{n-1} - p_{n-2}| \le \cdots \le K^n|p_1 - p_0|.$$
Then, for any positive integers $m > n$,
$$|p_m - p_n| = |p_m - p_{m-1} + p_{m-1} - \cdots + p_{n+1} - p_n| \le |p_m - p_{m-1}| + \cdots + |p_{n+1} - p_n|$$
$$\le (K^{m-1} + K^{m-2} + \cdots + K^n)|p_1 - p_0| = K^n(1 + K + \cdots + K^{m-1-n})|p_1 - p_0| \le K^n\,\frac{1}{1 - K}\,|p_1 - p_0| \to 0$$
as $n \to \infty$. Therefore $\{p_n\}$ is a Cauchy sequence in $[a, b]$, because $a \le g(x) \le b$ for all $x \in [a, b]$. We thus have
$$p = \lim_{n\to\infty} p_n \in [a, b].$$
Since
$$\lim_{n\to\infty} p_n = \lim_{n\to\infty} g(p_{n-1}) \implies p = g(p),$$
we have that $p$ is the unique fixed point of $g$ in $[a, b]$. 2

Example. Consider again $g(x) = \left(\dfrac{10}{4 + x}\right)^{1/2}$ on $[1, 2]$. We have shown above that it has a unique fixed point. This fixed point can be approximated by the fixed point iteration
$$x_n = \left(\frac{10}{4 + x_{n-1}}\right)^{1/2}, \quad n = 1, 2, \ldots,$$
with any initial guess $x_0 \in (1, 2)$.
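A minimal Python sketch of this iteration (added for illustration; the stopping criterion based on successive iterates is an assumption, not part of the notes):

    def fixed_point(g, x0, tol=1e-10, max_iter=100):
        """Fixed-point iteration x_n = g(x_{n-1}), stopped when successive
        iterates differ by less than tol."""
        x = x0
        for _ in range(max_iter):
            x_new = g(x)
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    # g(x) = (10 / (4 + x))**0.5 on [1, 2]
    p = fixed_point(lambda x: (10.0 / (4.0 + x))**0.5, 1.5)
    print(p, p**3 + 4*p**2 - 10)   # the fixed point is also a root of x^3 + 4x^2 - 10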
2.3 Newton's method

Definition 2.4 The set of functions which, together with their derivatives up to and including the $k$th order, are continuous on $[a, b]$ is denoted by $C^k[a, b]$.

By this definition and that of $C[a, b]$ we have $C[a, b] \equiv C^0[a, b]$.

Construction of the method

Let $f \in C^2[a, b]$ and consider the solution of
$$f(x) = 0. \tag{2.2.3.1}$$
If $\bar{x} \in [a, b]$ is an approximation to a solution $p$ of (2.2.3.1), in the sense that $|\bar{x} - p|$ is sufficiently small, then $f(x)$ can be expanded at $\bar{x}$ as
$$f(x) = f(\bar{x}) + f'(\bar{x})(x - \bar{x}) + \frac{1}{2}f''(\xi(x))(x - \bar{x})^2,$$
where $\xi(x)$ is a point between $x$ and $\bar{x}$. Replacing $x$ in the above with $p$, and using $f(p) = 0$, we have
$$0 = f(\bar{x}) + f'(\bar{x})(p - \bar{x}) + \frac{1}{2}f''(\xi(p))(p - \bar{x})^2. \tag{2.2.3.2}$$
Now, if $|\bar{x} - p|$ is sufficiently small,
$$(\bar{x} - p)^2 \ll |\bar{x} - p|.$$
Then (2.2.3.2) can be approximated by
$$f(\bar{x}) + f'(\bar{x})(p - \bar{x}) \approx 0,$$
from which we get
$$p \approx \bar{x} - \frac{f(\bar{x})}{f'(\bar{x})}. \tag{2.2.3.3}$$
Obviously, we need to assume that $f'(p) \ne 0$. If $f(p) = 0 = f'(p)$, then $p$ is a stationary point of $f$ and the problem becomes more complicated.

Eq. (2.2.3.3) motivates us to define the following algorithm.

Algorithm (Newton). Given $x_0 \in (a, b)$ sufficiently close to a solution $p$ of (2.2.3.1), we define the sequence
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \quad k = 0, 1, 2, \ldots \tag{2.2.3.4}$$
Geometric explanation

[Figure: tangent-line construction of Newton's method on [a, b], showing iterates x_0, x_1, x_2.]

In the figure, the function
$$y = f(x_k) + f'(x_k)(x - x_k)$$
represents the tangent line of the curve $y = f(x)$ at $x = x_k$. The solution of
$$f(x_k) + f'(x_k)(x - x_k) = 0$$
gives an approximation to $p$ better than $x_k$.

Example. Let $f(x) = \dfrac{1}{x} - a$, $a \ne 0$. We solve $f(x) = 0$. Obviously, the exact solution is $p = 1/a$. Differentiating $f$ gives
$$f'(x) = -\frac{1}{x^2}.$$
So Newton's scheme becomes
$$x_{k+1} = x_k - \frac{1/x_k - a}{-1/x_k^2} = 2x_k - a x_k^2.$$
Choosing an $x_0 \in (0, 2/a)$, we can approximate $1/a$ by the above iterative scheme.

Example. Let $f(x) = x^2 - a$, $a > 0$. The exact positive solution is $p = \sqrt{a}$.

Applying Newton's method gives
$$x_{k+1} = x_k - \frac{x_k^2 - a}{2x_k} = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right).$$
This is known as the Babylonian approximation to $\sqrt{a}$.
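The two examples can be tried directly; the following Python sketch (an added illustration) implements the general Newton iteration and applies it to $f(x) = x^2 - a$.

    def newton(f, fprime, x0, tol=1e-12, max_iter=50):
        """Newton's method x_{k+1} = x_k - f(x_k)/f'(x_k)."""
        x = x0
        for _ in range(max_iter):
            step = f(x) / fprime(x)
            x -= step
            if abs(step) < tol:
                break
        return x

    a = 2.0
    sqrt_a = newton(lambda x: x*x - a, lambda x: 2*x, x0=1.0)
    print(sqrt_a)   # 1.41421356..., the Babylonian square root of 2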
Convergence of Newton's method

Theorem 2.5 Let $f \in C^2[a, b]$. If $p \in [a, b]$ is such that $f(p) = 0$ and $f'(p) \ne 0$, then there exists $\delta > 0$ such that Newton's method generates a sequence $\{x_k\}$ given by (2.2.3.4) converging to $p$ for any $x_0 \in [p - \delta, p + \delta]$.

PROOF. Let $g(x) = x - \dfrac{f(x)}{f'(x)}$. Clearly $p$ is a fixed point of $g$, as $f(p) = 0$ and $f'(p) \ne 0$. Consider
$$e_{k+1} = x_{k+1} - p = g(x_k) - g(p) = g'(\xi_k)(x_k - p) = g'(\xi_k)e_k,$$
where $\xi_k$ is a point between $x_k$ and $p$. Differentiating $g$ gives
$$g'(x) = 1 - \frac{(f'(x))^2 - f(x)f''(x)}{(f'(x))^2} = \frac{f(x)f''(x)}{(f'(x))^2}.$$
When $x = p$,
$$g'(p) = \frac{f(p)f''(p)}{(f'(p))^2} = 0.$$
So, if $x_0$ is close to $p$, say $x_0 \in [p - \delta, p + \delta]$ for a sufficiently small $\delta > 0$, from the continuity of $g'$ we have
$$|g'(x_0)| \le M < 1,$$
where $M$ is a positive constant. From this we have
$$|e_1| = |g'(\xi_0)||x_0 - p| \le M|e_0|.$$
Similarly,
$$|e_2| \le M|e_1| \le M^2|e_0|, \quad \ldots, \quad |e_k| \le M|e_{k-1}| \le M^k|e_0| \to 0$$
as $k \to \infty$. Therefore, $x_k \to p$ as $k \to \infty$. 2
Rate of convergence

Definition 2.5 Suppose $x_n \to p$ as $n \to \infty$, with $x_n \ne p$ for all $n$. If there are $\lambda > 0$ and $\alpha > 0$ such that
$$\lim_{n\to\infty}\frac{|x_{n+1} - p|}{|x_n - p|^{\alpha}} = \lambda,$$
then $x_n \to p$ at a rate of order $\alpha$ and with asymptotic error constant $\lambda$. In particular,
  if $\alpha = 1$, the sequence is linearly convergent, and
  if $\alpha = 2$, it is quadratically convergent.

Theorem 2.6 (Quadratic convergence of Newton's method) If $f \in C^3[a, b]$, then Newton's method is quadratically convergent.

PROOF. From Theorem 2.5 we have
$$e_{k+1} = g(x_k) - g(p).$$
Now, expanding $g(x_k)$ at $p$ gives
$$g(x_k) = g(p) + g'(p)(x_k - p) + \frac{1}{2}g''(\xi_k)(x_k - p)^2 = g(p) + \frac{1}{2}g''(\xi_k)e_k^2,$$
since $g'(p) = 0$. Therefore,
$$\frac{e_{k+1}}{e_k^2} = \frac{1}{2}g''(\xi_k).$$
Taking the limit,
$$\lim_{k\to\infty}\frac{e_{k+1}}{e_k^2} = \frac{1}{2}g''(p), \tag{2.2.3.5}$$
since $x_k \to p$ as $k \to \infty$. Note that
$$g' = \frac{ff''}{(f')^2}, \qquad g'' = \frac{(f'f'' + ff''')(f')^2 - ff''\cdot 2f'f''}{(f')^4},$$
and so
$$g''(p) = \frac{(f'(p))^3 f''(p)}{(f'(p))^4} = \frac{f''(p)}{f'(p)} =: \lambda.$$
This, together with (2.2.3.5), implies that the rate is quadratic. 2
Slow death

Although Newton's method is quadratically convergent when $x_0$ is close to $p$, in practice it may take a large number of iterations before the convergence becomes quadratic. Here is an example.

Example. Consider $x_{k+1} = 2x_k - a x_k^2$. The fixed points of $2x - ax^2$ are $x = 0$ and $x = 1/a$. We choose $a = 10^{-10}$ and $x_0 = a$. Then
$$x_1 = 2x_0 - a x_0^2 = 2\times 10^{-10} - 10^{-30} \approx 2\times 10^{-10},$$
$$x_2 = 2x_1 - a x_1^2 = 4\times 10^{-10} - 4\times 10^{-30} \approx 4\times 10^{-10}.$$
In every iteration the value of $x_{k+1}$ is about $2x_k$, i.e.,
$$x_k \approx 2^k\times 10^{-10}.$$
If we want $2^k\times 10^{-10} \approx 1/a = 10^{10}$, we need about 66 iterations. This is a lot of iterations in practice.
2.4 The secant method

Recall Newton's algorithm
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \quad k = 0, 1, 2, \ldots$$
Sometimes $f'$ is hard to derive, so we may simply use an approximation:
$$f'(x_k) \approx g_k := \frac{f(x_k + h_k) - f(x_k)}{h_k},$$
where $h_k > 0$ is a constant. There are two problems associated with this approximation:
1. The choice of $h_k$ is tricky. If $h_k$ is too large, $g_k$ is an inaccurate approximation to $f'(x_k)$. If $h_k$ is too small, $g_k$ is also inaccurate, because of rounding error.
2. The procedure requires one extra function evaluation ($f(x_k + h_k)$) per iteration. This is a serious problem in practice.

Now, starting from $k = 1$, we choose
$$g_k = \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}.$$
The secant method is then defined as
$$x_{k+1} = x_k - \frac{f(x_k)}{\dfrac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}} = \frac{x_{k-1}f(x_k) - x_k f(x_{k-1})}{f(x_k) - f(x_{k-1})}, \quad k = 1, 2, \ldots,$$
where $x_0$ and $x_1$ are given.

[Figure: secant-line construction of the method on [a, b], with starting points x_0, x_1 and the new iterate x_2.]

Example. Use the secant method to solve $f(x) = x - \cos x = 0$.

The secant algorithm gives
$$x_{k+1} = x_k - \frac{(x_k - \cos x_k)(x_k - x_{k-1})}{(x_k - x_{k-1}) - (\cos x_k - \cos x_{k-1})}, \quad k = 1, 2, \ldots,$$
where $x_0$ and $x_1$ are given.
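A Python sketch of the secant iteration applied to this example (an added illustration):

    import math

    def secant(f, x0, x1, tol=1e-12, max_iter=50):
        """Secant method: replaces f'(x_k) by the divided difference
        (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1})."""
        f0, f1 = f(x0), f(x1)
        for _ in range(max_iter):
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
            if abs(x2 - x1) < tol:
                return x2
            x0, f0, x1, f1 = x1, f1, x2, f(x2)
        return x1

    print(secant(lambda x: x - math.cos(x), 0.0, 1.0))   # about 0.739085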
2.5 Quasi-Newton method

In general a quasi-Newton method is of the form
$$x_{k+1} = x_k - \frac{f(x_k)}{g_k}, \quad k = 0, 1, 2, \ldots,$$
where $g_k$ is an approximation to $f'(x_k)$. The above secant method is a quasi-Newton method. Another choice is
$$g_k = f'(x_0)$$
(assuming $f'(x_0) \ne 0$). This is the constant slope method.

[Figure: constant slope method on [a, b]; every correction step uses the fixed slope f'(x_0).]

Convergence of the constant slope method

Assume that $p$ satisfies $f(p) = 0$. Let
$$e_{k+1} = x_{k+1} - p = g(x_k) - g(p),$$
where $g(x) = x - \dfrac{f(x)}{f'(x_0)}$. Using the Mean Value Theorem we have
$$e_{k+1} = g'(\xi_k)(x_k - p) = g'(\xi_k)e_k,$$
where $\xi_k$ is a point between $x_k$ and $p$. But
$$g'(x) = 1 - \frac{f'(x)}{f'(x_0)} = \frac{f'(x_0) - f'(x)}{f'(x_0)}.$$
So we need to assume that
$$\left|\frac{f'(x_0) - f'(p)}{f'(x_0)}\right| \le K < 1.$$
In this case, when $x_k$ is sufficiently close to $p$ (and so is $\xi_k$), $|g'(\xi_k)| \le K < 1$. Therefore,
$$|e_{k+1}| \le K|e_k| \le \cdots \le K^{k+1}|e_0| \to 0$$
as $k \to \infty$.
2.6 Muller's method

While Newton's method uses a local linear approximation to the function $f$, Muller's method is based on a quadratic approximation. This is done in the following two steps.
1. Given 3 points $x_{k-2}$, $x_{k-1}$ and $x_k$, find a quadratic function
$$g(x) = a + bx + cx^2$$
such that
$$g(x_i) = f(x_i), \quad i = k-2, k-1, k.$$
The coefficients $a$, $b$ and $c$ satisfy the system
$$\begin{pmatrix} 1 & x_{k-2} & x_{k-2}^2 \\ 1 & x_{k-1} & x_{k-1}^2 \\ 1 & x_k & x_k^2 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} f(x_{k-2}) \\ f(x_{k-1}) \\ f(x_k) \end{pmatrix}.$$
2. Solve $g(x) = 0$ for the root $x_{k+1}$ that lies nearest to $x_k$.

[Figure: quadratic fitted through the three points x_0, x_1, x_2 on [a, b], with its root x_3 taken as the next iterate.]

Quadratic fitting of a curve

Given 3 points $(x_i, f(x_i))$, $i = 0, 1, 2$, on the curve $y = f(x)$, we find a quadratic function of the form
$$g(x) = a(x - x_2)^2 + b(x - x_2) + c$$
which passes through the three points, i.e., $g$ satisfies
$$g(x_0) = a(x_0 - x_2)^2 + b(x_0 - x_2) + c = f_0,$$
$$g(x_1) = a(x_1 - x_2)^2 + b(x_1 - x_2) + c = f_1,$$
$$g(x_2) = c = f_2,$$
where $f_i = f(x_i)$ for $i = 0, 1, 2$. Solving this system gives
$$c = f_2,$$
$$b = \frac{(x_0 - x_2)^2(f_1 - f_2) - (x_1 - x_2)^2(f_0 - f_2)}{(x_0 - x_2)(x_1 - x_2)(x_0 - x_1)},$$
$$a = \frac{(x_1 - x_2)(f_0 - f_2) - (x_0 - x_2)(f_1 - f_2)}{(x_0 - x_2)(x_1 - x_2)(x_0 - x_1)}.$$
Chapter 3

Interpolation & Polynomial Approximation

Consider the following two questions:
  Given a set of points $(x_i, y_i)$, $i = 0, 1, \ldots, n$, in a plane satisfying $x_{i-1} < x_i$ for $i = 1, 2, \ldots, n$ (this determines a functional relationship between $x$ and $y$), can we find a systematic way to approximate the function value at any $x \in (x_0, x_n)$?
  If the answer to the above question is yes, how much error is involved in the approximation?
3.1 Lagrange Polynomial

Consider two points $(x_0, f_0)$ and $(x_1, f_1)$. We are to find a linear function $f(x) = a + bx$ passing through the points. This gives
$$f(x_0) = a + bx_0 = f_0, \qquad f(x_1) = a + bx_1 = f_1.$$
Solving this system we have
$$a = \frac{x_1 f_0 - x_0 f_1}{x_1 - x_0}, \qquad b = \frac{f_0 - f_1}{x_0 - x_1}.$$
Therefore,
$$f(x) = \frac{x_1 f_0 - x_0 f_1}{x_1 - x_0} + \frac{f_0 - f_1}{x_0 - x_1}\,x = \frac{x - x_1}{x_0 - x_1}\,f_0 + \frac{x - x_0}{x_1 - x_0}\,f_1 =: L_0(x)f_0 + L_1(x)f_1.$$
Therefore, the interpolant is a linear combination of $L_0(x)$ and $L_1(x)$. These $L_0$ and $L_1$ are called Lagrange interpolating polynomials of order 1. Graphically, $L_0$ is a line segment from $(x_0, 1)$ to $(x_1, 0)$, and $L_1$ is the one from $(x_0, 0)$ to $(x_1, 1)$ (cf. the figure).

[Figure: the two linear basis functions L_0 and L_1 on [x_0, x_1].]
In general, given $n + 1$ distinct nodes $x_k$, $k = 0, 1, \ldots, n$, we can construct a unique polynomial $L_{n,k}(x)$ of the form
$$L_{n,k}(x) = \frac{(x - x_0)\cdots(x - x_{k-1})(x - x_{k+1})\cdots(x - x_n)}{(x_k - x_0)\cdots(x_k - x_{k-1})(x_k - x_{k+1})\cdots(x_k - x_n)}. \tag{3.3.1.1}$$
It is easy to check that this polynomial satisfies
$$L_{n,k}(x_i) = \begin{cases} 0, & i \ne k, \\ 1, & i = k. \end{cases} \tag{3.3.1.2}$$
$L_{n,k}$ is called the $n$th Lagrange interpolating polynomial. Graphically it is demonstrated in the following figure.

[Figure: the basis function L_{n,k}, equal to 1 at x_k and 0 at all the other nodes x_0, ..., x_n.]

Theorem 3.1 If $x_0, x_1, \ldots, x_n$ are $n + 1$ distinct numbers and $f(x)$ is a function whose values are given at these points, then a unique polynomial $P_n(x)$ of degree at most $n$ exists with
$$f(x_k) = P_n(x_k), \quad k = 0, 1, \ldots, n, \tag{3.3.1.3}$$
and $P_n(x)$ is given by
$$P_n(x) = \sum_{k=0}^{n} f(x_k)L_{n,k}(x), \tag{3.3.1.4}$$
where $L_{n,k}$ is the polynomial defined in (3.3.1.1).

PROOF. The existence of such a polynomial is a consequence of (3.3.1.1), (3.3.1.2) and (3.3.1.4), because they are constructive.

To show the uniqueness of $P_n(x)$, we need to use the fundamental theorem of algebra, i.e., a non-zero polynomial $T(x)$ of degree $N$ has at most $N$ roots. In other words, if $T(x)$ is zero at $N + 1$ distinct nodes, it is identically zero.

Suppose there is another polynomial $Q_n(x)$ of degree at most $n$ satisfying (3.3.1.3). We let
$$T(x) = P_n(x) - Q_n(x).$$
Using (3.3.1.3) we have
$$T(x_k) = f(x_k) - f(x_k) = 0, \quad k = 0, 1, \ldots, n.$$
So $T(x) \equiv 0$ and thus $P_n(x) = Q_n(x)$. 2
Example. Consider $f(x) = \cos x$ over $[0, 1.2]$. Use the three nodes $x_0 = 0$, $x_1 = 0.6$ and $x_2 = 1.2$ to construct a quadratic interpolation polynomial $P_2(x)$.

The function values at the nodes are
$$f_0 = \cos 0 = 1, \qquad f_1 = \cos 0.6 = 0.825336, \qquad f_2 = \cos 1.2 = 0.362358.$$
So,
$$P_2(x) = f_0\,\frac{(x - 0.6)(x - 1.2)}{(0 - 0.6)(0 - 1.2)} + f_1\,\frac{(x - 0)(x - 1.2)}{(0.6 - 0)(0.6 - 1.2)} + f_2\,\frac{(x - 0)(x - 0.6)}{(1.2 - 0)(1.2 - 0.6)}$$
$$= 1.38889(x - 0.6)(x - 1.2) - 2.292599\,x(x - 1.2) + 0.503275\,x(x - 0.6).$$
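The construction can be checked numerically. The sketch below (an added illustration) evaluates the Lagrange form for arbitrary nodes and compares $P_2$ with $\cos x$ at a point inside $[0, 1.2]$.

    import math

    def lagrange_eval(xs, fs, x):
        """Evaluate the Lagrange interpolating polynomial through (xs[i], fs[i]) at x."""
        total = 0.0
        for k, (xk, fk) in enumerate(zip(xs, fs)):
            Lk = 1.0
            for i, xi in enumerate(xs):
                if i != k:
                    Lk *= (x - xi) / (xk - xi)
            total += fk * Lk
        return total

    xs = [0.0, 0.6, 1.2]
    fs = [math.cos(t) for t in xs]
    print(lagrange_eval(xs, fs, 0.45), math.cos(0.45))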
3.2 Divided difference & Newton's polynomial

In the Lagrange interpolation (3.3.1.4), the expression $(x_0 - x_k)\cdots(x_{k-1} - x_k)(x_{k+1} - x_k)\cdots(x_n - x_k)$ often causes overflow or underflow. A better form is
$$P_n(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0)\cdots(x - x_{n-1}). \tag{3.3.2.5}$$
But we need to determine the coefficients $a_i$, $i = 0, 1, \ldots, n$. Clearly,
$$a_0 = P_n(x_0) = f(x_0).$$
When $x = x_1$ in (3.3.2.5), we have
$$f(x_0) + a_1(x_1 - x_0) = P_n(x_1) = f(x_1),$$
which has the solution
$$a_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}.$$
Similarly, setting $x = x_2$ in (3.3.2.5) gives
$$a_2 = \frac{f(x_2) - (a_0 + a_1(x_2 - x_0))}{(x_2 - x_0)(x_2 - x_1)} = \frac{f(x_2) - \left(f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x_2 - x_0)\right)}{(x_2 - x_0)(x_2 - x_1)}$$
$$= \left(\frac{f(x_2) - f(x_0)}{x_2 - x_0} - \frac{f(x_1) - f(x_0)}{x_1 - x_0}\right)\Big/(x_2 - x_1).$$
For computational convenience, we rewrite the above as
$$a_2 = \left(\frac{f(x_2) - f(x_1)}{x_2 - x_1} - \frac{f(x_1) - f(x_0)}{x_1 - x_0}\right)\Big/(x_2 - x_0).$$
The above motivates us to define the following divided differences:
$$\text{0th order: } f[x_i] = f(x_i),$$
$$\text{1st order: } f[x_i, x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i},$$
$$\text{2nd order: } f[x_i, x_{i+1}, x_{i+2}] = \frac{f[x_{i+1}, x_{i+2}] - f[x_i, x_{i+1}]}{x_{i+2} - x_i},$$
$$\vdots$$
$$k\text{th order: } f[x_i, x_{i+1}, \ldots, x_{i+k}] = \frac{f[x_{i+1}, x_{i+2}, \ldots, x_{i+k}] - f[x_i, x_{i+1}, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}.$$
Using this notation, the coefficients in (3.3.2.5) are given by
$$a_k = f[x_0, x_1, \ldots, x_k], \quad k = 0, 1, \ldots, n,$$
and so (3.3.2.5) becomes
$$P_n(x) = f[x_0] + \sum_{k=1}^{n} f[x_0, x_1, \ldots, x_k](x - x_0)\cdots(x - x_{k-1}).$$
This is called the Newton form of the interpolant. The algorithmic description of the divided differences is

Algorithm (Divided differences):
INPUT $(x_0, f_0), (x_1, f_1), \ldots, (x_n, f_n)$ and let $F_{k,0} = f_k$ for $k = 0, 1, \ldots, n$.
Step 1. For $i = 1, 2, \ldots, n$:
          For $j = 1, 2, \ldots, i$:
            set $F_{i,j} = \dfrac{F_{i,j-1} - F_{i-1,j-1}}{x_i - x_{i-j}}$.
Step 2. OUTPUT $(F_{0,0}, F_{1,1}, \ldots, F_{n,n})$ and
$$P(x) = \sum_{i=0}^{n} F_{i,i}\prod_{j=0}^{i-1}(x - x_j).$$
The computation can also be arranged in the following table form:

x_0   f[x_0]
x_1   f[x_1]   f[x_0, x_1]
x_2   f[x_2]   f[x_1, x_2]   f[x_0, x_1, x_2]
x_3   f[x_3]   f[x_2, x_3]   f[x_1, x_2, x_3]   f[x_0, x_1, x_2, x_3]
...
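The algorithm and the Newton form can be written compactly in Python (an added illustration; the in-place update and the nested evaluation are standard choices, not prescribed by the notes).

    def divided_differences(xs, fs):
        """Return the coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]."""
        n = len(xs)
        coef = list(fs)
        for j in range(1, n):
            for i in range(n - 1, j - 1, -1):    # update in place, bottom-up
                coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
        return coef

    def newton_eval(xs, coef, x):
        """Evaluate the Newton form by nested multiplication."""
        result = coef[-1]
        for k in range(len(coef) - 2, -1, -1):
            result = result * (x - xs[k]) + coef[k]
        return result

    xs = [0.0, 0.6, 1.2]
    fs = [1.0, 0.825336, 0.362358]          # cos x at the nodes
    c = divided_differences(xs, fs)
    print(c, newton_eval(xs, c, 0.45))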
Theorem 3.2 Suppose that $f \in C^n[a, b]$ and $x_i$, $i = 0, 1, \ldots, n$, are $n + 1$ distinct points in $[a, b]$. Then there exists $\xi \in (a, b)$ such that
$$f[x_0, x_1, \ldots, x_n] = \frac{f^{(n)}(\xi)}{n!}.$$

PROOF. Let $g(x) = f(x) - P_n(x)$. Since $g(x_i) = 0$ for $i = 0, 1, \ldots, n$, using the generalised Rolle's Theorem there exists $\xi \in (a, b)$ such that
$$g^{(n)}(\xi) = 0, \quad\text{or}\quad f^{(n)}(\xi) - P_n^{(n)}(\xi) = 0. \tag{3.3.2.6}$$
Note that
$$P_n(x) = a_n x^n + \text{lower order terms} = f[x_0, x_1, \ldots, x_n]\,x^n + \text{lower order terms}.$$
Differentiating $P_n$ $n$ times gives
$$P_n^{(n)} = n!\,f[x_0, x_1, \ldots, x_n].$$
Therefore, from (3.3.2.6) we have
$$f[x_0, x_1, \ldots, x_n] = \frac{f^{(n)}(\xi)}{n!}. \qquad 2$$

In the case that the nodes are equally spaced, i.e.,
$$h = x_{i+1} - x_i, \quad i = 0, 1, \ldots, n-1,$$
we can simplify the Newton form as follows.

Let $x = x_0 + sh$ for $0 \le s \le n$. We have $x - x_i = (s - i)h$, since $x_i = x_0 + ih$. Thus
$$P_n(x) = P_n(x_0 + sh) = f[x_0] + sh\,f[x_0, x_1] + s(s-1)h^2 f[x_0, x_1, x_2] + \cdots + s(s-1)\cdots(s-n+1)h^n f[x_0, \ldots, x_n]$$
$$= \sum_{k=0}^{n} s(s-1)\cdots(s-k+1)\,h^k f[x_0, \ldots, x_k].$$
Using the binomial coefficient notation
$$\binom{s}{k} = \frac{s(s-1)\cdots(s-k+1)}{k!},$$
we have
$$P_n(x) = f[x_0] + \sum_{k=1}^{n}\binom{s}{k}k!\,h^k f[x_0, \ldots, x_k].$$
This is called the Newton forward divided-difference formula.
Error bound

We demonstrate it using the simplest case, i.e., the case $n = 1$. Consider
$$e = f(x) - P_1(x),$$
where
$$P_1(x) = f[x_0] + f[x_0, x_1](x - x_0) = f[x_0] + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0).$$
Now, expressing $f$ as a Taylor expansion at $x_0$,
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(\xi)}{2}(x - x_0)^2,$$
where $\xi$ is a point between $x_0$ and $x$, we have that
$$e = f(x) - P_1(x) = f'(x_0)(x - x_0) - \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0) + \frac{f''(\xi)}{2}(x - x_0)^2$$
$$= \left(f'(x_0) - \frac{f(x_1) - f(x_0)}{x_1 - x_0}\right)(x - x_0) + \frac{f''(\xi)}{2}(x - x_0)^2.$$
The Taylor expansion of $f(x_1)$ at $x_0$ is
$$f(x_1) = f(x_0) + f'(x_0)(x_1 - x_0) + \frac{f''(\eta)}{2}(x_1 - x_0)^2$$
with $\eta$ between $x_0$ and $x_1$. Therefore,
$$f'(x_0) - \frac{f(x_1) - f(x_0)}{x_1 - x_0} = -\frac{f''(\eta)}{2}(x_1 - x_0).$$
Substituting this into the expression for $e$, we get
$$|e| = \left|-\frac{f''(\eta)(x_1 - x_0)}{2}(x - x_0) + \frac{f''(\xi)}{2}(x - x_0)^2\right| \le M(x_1 - x_0)^2 = M h^2,$$
if $|f''(x)| \le M$ for all $x \in [x_0, x_1]$, where $M$ is a positive constant. This implies that the method is of 2nd order accuracy. Note that this error bound is not sharp. A better estimate is as follows.

Since $e(x_0) = 0 = e(x_1)$, we have
$$e'(\mu) = f'(\mu) - f[x_0, x_1] = 0$$
for some $\mu \in (x_0, x_1)$; at this $\mu$, $|e(x)|$ attains its maximum. The Taylor expansion of $f(x_0)$ at $\mu$ is
$$f(x_0) = f(\mu) + f'(\mu)(x_0 - \mu) + \frac{f''(\eta)}{2}(x_0 - \mu)^2.$$
So,
$$e(\mu) = f(\mu) - [f(x_0) + f[x_0, x_1](\mu - x_0)]$$
$$= f(\mu) - \left[f(\mu) + f'(\mu)(x_0 - \mu) + \frac{f''(\eta)}{2}(x_0 - \mu)^2 + f[x_0, x_1](\mu - x_0)\right]$$
$$= [f'(\mu) - f[x_0, x_1]](\mu - x_0) - \frac{f''(\eta)}{2}(x_0 - \mu)^2 = -\frac{f''(\eta)}{2}(x_0 - \mu)^2.$$
If $|f''(x)| \le M$ for $x \in (x_0, x_1)$, then from the above we have
$$|e(x)| \le |e(\mu)| \le \frac{M}{2}(x_1 - x_0)^2.$$
This improves the previous error bound by a factor of 2. In fact, we can show that
$$|e(x)| \le \frac{M}{8}(x_1 - x_0)^2.$$
Let $g$ be defined as
$$g(t) = f(t) - P_1(t) - [f(x) - P_1(x)]\frac{(t - x_0)(t - x_1)}{(x - x_0)(x - x_1)}.$$
Then it is easy to see that
$$g(x_i) = 0, \quad i = 0, 1, \qquad g(x) = 0.$$
Using the generalized Rolle's Theorem, there exists $\xi \in (x_0, x_1)$ such that $g''(\xi) = 0$. That is,
$$f''(\xi) - [f(x) - P_1(x)]\frac{2}{(x - x_0)(x - x_1)} = 0.$$
From this we have
$$|f(x) - P_1(x)| = |f''(\xi)(x - x_0)(x - x_1)/2| \le \frac{M}{8}(x_1 - x_0)^2,$$
since $|(x - x_0)(x - x_1)|$ attains its maximum $\frac{1}{4}(x_1 - x_0)^2$ at the midpoint $x = (x_0 + x_1)/2$.
3.3 Hermite interpolation

In the Lagrange polynomial approximation we find $P_n(x)$ such that
$$P_n(x_i) = f(x_i), \quad i = 0, 1, \ldots, n.$$
For many cases, it is important to preserve the derivatives as well; for example, we may need to keep the convexity of a curve unchanged.

Definition 3.1 Let $x_i$, $i = 0, 1, \ldots, n$, be $n + 1$ distinct points in $[a, b]$ and let $m_i$ be a non-negative integer associated with $x_i$. Suppose $f \in C^m[a, b]$, where $m = \max_{0\le i\le n} m_i$. The osculating polynomial approximating $f$ is the polynomial $P(x)$ of least degree such that
$$\frac{d^k P(x_i)}{dx^k} = \frac{d^k f(x_i)}{dx^k}, \quad i = 0, 1, \ldots, n, \quad k = 0, 1, \ldots, m_i.$$
There are two special cases:
  When all $m_i = 0$, we have the Lagrange polynomial.
  When all $m_i = 1$, the resulting polynomial is called the Hermite polynomial.

Let us consider the approximation of $f(x)$ on $[x_0, x_1]$ by the Hermite polynomial. Since there are 4 degrees of freedom/conditions, i.e., $(x_i, f(x_i))$ and $(x_i, f'(x_i))$ for $i = 0, 1$, we let
$$H(x) = a + bx + cx^2 + dx^3.$$
Using the four conditions we get
$$H(x_i) = a + bx_i + cx_i^2 + dx_i^3 = f(x_i),$$
$$H'(x_i) = b + 2cx_i + 3dx_i^2 = f'(x_i)$$
for $i = 0, 1$. This linear system determines the 4 constants $a$, $b$, $c$ and $d$, and thus the Hermite interpolant.
In general we have

Theorem 3.3 If $f \in C^1[a, b]$ and $\{x_i\}_0^n$ are distinct points in $[a, b]$, the unique Hermite polynomial approximating $f$ is
$$H_{2n+1}(x) = \sum_{j=0}^{n} f(x_j)H_{n,j}(x) + \sum_{j=0}^{n} f'(x_j)\hat{H}_{n,j}(x),$$
where
$$H_{n,j}(x) = [1 - 2(x - x_j)L'_{n,j}(x_j)]\,L_{n,j}^2(x), \qquad \hat{H}_{n,j}(x) = (x - x_j)\,L_{n,j}^2(x),$$
and $L_{n,j}(x)$ is the Lagrange polynomial defined in (3.3.1.1).
Example. Find the Hermite polynomial approximating the data

x_k    f(x_k)   f'(x_k)
1.3    0.620    -0.522
1.6    0.455    -0.570
1.9    0.282    -0.581

First, let's find the Lagrange polynomials and their derivatives:
$$L_{2,0}(x) = \frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}, \qquad L'_{2,0}(x) = \frac{100}{9}x - \frac{175}{9},$$
$$L_{2,1}(x) = -\frac{100}{9}x^2 + \frac{320}{9}x - \frac{247}{9}, \qquad L'_{2,1}(x) = -\frac{200}{9}x + \frac{320}{9},$$
$$L_{2,2}(x) = \frac{50}{9}x^2 - \frac{145}{9}x + \frac{104}{9}, \qquad L'_{2,2}(x) = \frac{100}{9}x - \frac{145}{9}.$$
Now, the Hermite basis functions are
$$H_{2,0}(x) = [1 - 2(x - 1.3)(-5)]\,L_{2,0}^2(x) = (10x - 12)\left(\frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}\right)^2,$$
$$H_{2,1}(x) = \left(-\frac{100}{9}x^2 + \frac{320}{9}x - \frac{247}{9}\right)^2,$$
$$H_{2,2}(x) = 10(2 - x)\left(\frac{50}{9}x^2 - \frac{145}{9}x + \frac{104}{9}\right)^2,$$
$$\hat{H}_{2,0}(x) = (x - 1.3)\left(\frac{50}{9}x^2 - \frac{175}{9}x + \frac{152}{9}\right)^2,$$
$$\hat{H}_{2,1}(x) = (x - 1.6)\left(-\frac{100}{9}x^2 + \frac{320}{9}x - \frac{247}{9}\right)^2,$$
$$\hat{H}_{2,2}(x) = (x - 1.9)\left(\frac{50}{9}x^2 - \frac{145}{9}x + \frac{104}{9}\right)^2.$$
Finally, the Hermite polynomial is
$$H_5(x) = 0.620\,H_{2,0}(x) + 0.455\,H_{2,1}(x) + 0.282\,H_{2,2}(x) - 0.522\,\hat{H}_{2,0}(x) - 0.570\,\hat{H}_{2,1}(x) - 0.581\,\hat{H}_{2,2}(x).$$
3.4 Cubic spline interpolation

The Lagrange interpolation is normally used as a piecewise polynomial approximation. This is because the error in the approximation is bounded by $C(b - a)^p$ with $p > 0$: only when $b - a < 1$ does the method converge. Therefore, the Lagrange interpolation is always used to approximate a function locally. This causes a problem of non-smoothness, as the approximate curve may not be smooth at some of the mesh points. For example, a curve obtained by piecewise linear interpolants is not smooth at the mesh points. Therefore, we need to find an interpolant which is a piecewise polynomial and smooth.

Let $(x_i, f(x_i))$, $i = 0, 1, \ldots, n$, be a set of $n + 1$ distinct points, and consider the interpolation of the segment around $(x_j, f(x_j))$ by $S(x)$ consisting of three parts $S_{j-1}(x)$, $S_j(x)$ and $S_{j+1}(x)$, defined on the three subintervals $(x_k, x_{k+1})$, $k = j-1, j, j+1$, respectively. We require that the polynomial $S(x)$ and its 1st and 2nd derivatives are continuous on $(x_{j-1}, x_{j+2})$. In particular, $S(x)$ and its derivatives should be continuous at the points $x_j$ and $x_{j+1}$, i.e.,
$$S_j^{(k)}(x_j) = S_{j-1}^{(k)}(x_j), \qquad S_j^{(k)}(x_{j+1}) = S_{j+1}^{(k)}(x_{j+1})$$
for $k = 0, 1, 2$, and
$$S_j(x_j) = f(x_j), \quad S_j(x_{j+1}) = f(x_{j+1}), \quad S_{j-1}(x_{j-1}) = f(x_{j-1}), \quad S_{j+1}(x_{j+2}) = f(x_{j+2}).$$
The number of equations in the above is 10. We may also impose either the free or natural boundary condition
$$S''_{j-1}(x_{j-1}) = 0 = S''_{j+1}(x_{j+2})$$
or the clamped boundary condition
$$S'_{j-1}(x_{j-1}) = f'(x_{j-1}), \qquad S'_{j+1}(x_{j+2}) = f'(x_{j+2}).$$
Therefore, altogether we have 12 conditions. This implies that $S(x)$ can have up to 12 unknown constants. We now define
$$S_k(x) = a_k + b_k(x - x_k) + c_k(x - x_k)^2 + d_k(x - x_k)^3$$
for $k = j-1, j, j+1$. There are 12 unknown constants, which can be determined by the above 12 equations. Clearly, setting $x = x_k$, we get
$$a_k = S_k(x_k) = f(x_k)$$
for $k = j-1, j, j+1$.
The above idea can be extended to the full set of $n + 1$ nodes $(x_j, f(x_j))$, $j = 0, 1, \ldots, n$, so that the constant $a_k$ is given by
$$a_k = S_k(x_k) = f(x_k), \quad k = 0, 1, \ldots, n-1. \tag{3.3.4.7}$$
From the condition $S_{j+1}(x_{j+1}) = S_j(x_{j+1}) = a_{j+1}$ we have
$$a_j + b_j h_j + c_j h_j^2 + d_j h_j^3 = a_{j+1}, \tag{3.3.4.8}$$
where $h_j = x_{j+1} - x_j$. Differentiating $S_j$ gives
$$S'_j(x_j) = \left[b_j + 2c_j(x - x_j) + 3d_j(x - x_j)^2\right]_{x=x_j} = b_j.$$
From $S'_j(x_{j+1}) = S'_{j+1}(x_{j+1})$ we have, defining
$$b_n = S'(x_n), \tag{3.3.4.9}$$
$$b_{j+1} = b_j + 2c_j h_j + 3d_j h_j^2 \tag{3.3.4.10}$$
for $j = 0, 1, \ldots, n-1$. Continuity of $S''(x)$ gives, with $c_n = S''(x_n)/2$ and $S''_j(x_j) = 2c_j$, and from $S''_j(x_{j+1}) = S''_{j+1}(x_{j+1})$,
$$c_{j+1} = c_j + 3d_j h_j.$$
From this we have
$$d_j = \frac{c_{j+1} - c_j}{3h_j}. \tag{3.3.4.11}$$
Substituting this into (3.3.4.8) and (3.3.4.10) we have
$$a_{j+1} = a_j + b_j h_j + \frac{h_j^2}{3}(2c_j + c_{j+1}), \tag{3.3.4.12}$$
$$b_{j+1} = b_j + h_j(c_j + c_{j+1}). \tag{3.3.4.13}$$
Solving (3.3.4.12) for $b_j$ gives
$$b_j = \frac{a_{j+1} - a_j}{h_j} - \frac{h_j}{3}(2c_j + c_{j+1}). \tag{3.3.4.14}$$
Substituting this into (3.3.4.13) and re-arranging, we get
$$h_{j-1}c_{j-1} + 2(h_{j-1} + h_j)c_j + h_j c_{j+1} = \frac{3}{h_j}(a_{j+1} - a_j) - \frac{3}{h_{j-1}}(a_j - a_{j-1}) \tag{3.3.4.15}$$
for $j = 1, 2, \ldots, n-1$. This is a tri-diagonal system if $c_0$ and $c_n$ are given. We thus have:
  Solution of (3.3.4.15) gives the $c_j$.
  Equation (3.3.4.14) gives the $b_j$.
  Equation (3.3.4.11) gives the $d_j$.
  The $a_j$ are given by (3.3.4.7).

Theorem 3.4 If $f$ is defined at $a = x_0 < x_1 < \cdots < x_n = b$, then $f$ has a unique natural spline interpolant $S(x)$ on the nodes $x_j$, $j = 0, 1, \ldots, n$, i.e., a spline $S(x)$ satisfying $S''(a) = S''(b) = 0$.

PROOF. From $S''(a) = S''(b) = 0$ we have $c_0 = c_n = 0$. The other coefficients are then uniquely determined by (3.3.4.15), (3.3.4.14), (3.3.4.11) and (3.3.4.7). 2

Similarly, the spline satisfying the clamped boundary condition is also uniquely defined.
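A compact Python sketch of the natural cubic spline construction described above (an added illustration; it solves the tridiagonal system (3.3.4.15) with c_0 = c_n = 0 by forward elimination and back substitution).

    import math

    def natural_cubic_spline(xs, fs):
        """Return lists (a, b, c, d) so that on [xs[j], xs[j+1]] the spline is
        S_j(x) = a[j] + b[j]*(x-xs[j]) + c[j]*(x-xs[j])**2 + d[j]*(x-xs[j])**3."""
        n = len(xs) - 1
        h = [xs[j+1] - xs[j] for j in range(n)]
        a = list(fs)
        # right-hand side of the tridiagonal system for c_1, ..., c_{n-1}
        alpha = [0.0] * (n + 1)
        for j in range(1, n):
            alpha[j] = 3.0/h[j]*(a[j+1] - a[j]) - 3.0/h[j-1]*(a[j] - a[j-1])
        # forward elimination; natural conditions give c_0 = c_n = 0
        l, mu, z = [1.0] + [0.0]*n, [0.0]*(n + 1), [0.0]*(n + 1)
        for j in range(1, n):
            l[j] = 2.0*(xs[j+1] - xs[j-1]) - h[j-1]*mu[j-1]
            mu[j] = h[j]/l[j]
            z[j] = (alpha[j] - h[j-1]*z[j-1])/l[j]
        c = [0.0]*(n + 1)
        b, d = [0.0]*n, [0.0]*n
        for j in range(n - 1, -1, -1):          # back substitution
            c[j] = z[j] - mu[j]*c[j+1]
            b[j] = (a[j+1] - a[j])/h[j] - h[j]*(2*c[j] + c[j+1])/3.0
            d[j] = (c[j+1] - c[j])/(3.0*h[j])
        return a[:n], b, c[:n], d

    # interpolate e^x at the nodes 0, 1, 2, 3 (a classic test case)
    xs = [0.0, 1.0, 2.0, 3.0]
    print(natural_cubic_spline(xs, [math.exp(t) for t in xs]))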
Chapter 4

Numerical Integration & Numerical Differentiation

Numerical Integration / Numerical Quadrature Rules

Many integrals in practice cannot be, or are very hard to be, evaluated exactly; for example, the integrals
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \qquad\text{and}\qquad \int_a^b \sin x^2\,dx.$$
The first is the error function, which cannot be evaluated exactly for any finite $x > 0$. In this case, only a numerical value of such an integral can be obtained. In fact, the definition of the definite integral provides the simplest numerical quadrature rule, as demonstrated below.

Consider the integral $\int_a^b f(x)\,dx$, and divide $[a, b]$ into $n$ equally spaced subintervals with nodes
$$x_i = a + \frac{i}{n}(b - a), \quad i = 0, 1, \ldots, n.$$
By definition, the integral is equal to
$$\int_a^b f(x)\,dx = \lim_{n\to\infty}\sum_{i=1}^{n} f(\xi_i)\,\frac{b - a}{n},$$
where $\xi_i \in (x_{i-1}, x_i)$ is arbitrary. Therefore, when $n$ is large, we have
$$\int_a^b f(x)\,dx \approx \frac{b - a}{n}\sum_{i=1}^{n} f(\xi_i).$$
More generally, we let $x_i$, $i = 0, 1, \ldots, n$, be $n + 1$ distinct points in $[a, b]$ with $x_0 = a$ and $x_n = b$, and approximate $f(x)$ on $[a, b]$ by the Lagrange interpolant
$$f(x) \approx P_n(x) = \sum_{i=0}^{n} f(x_i)L_{n,i}(x).$$
Then the integral can be approximated by
$$\int_a^b f(x)\,dx \approx \int_a^b P_n(x)\,dx = \sum_{i=0}^{n} f(x_i)\int_a^b L_{n,i}(x)\,dx. \tag{4.4.0.1}$$
The last integral can be evaluated exactly, as $L_{n,i}$ is a polynomial of degree $n$.
4.1 Trapezoidal rule

Consider the integral $\int_a^b f(x)\,dx$ and take two nodes ($n = 1$) in (4.4.0.1). Without loss of generality, we assume that $a = 0$ and $b = h$. Approximating $f(x)$ by the linear Lagrange interpolant gives
$$\int_0^h f(x)\,dx \approx \int_0^h\left[f(0) + \frac{f(h) - f(0)}{h}\,x\right]dx = \frac{f(0) + f(h)}{2}\,h =: T(f).$$
For the integral of $f$ on a general interval $[a, b]$, we let $h = b - a$; the transformation $y = x - a$ takes $[a, b]$ to $[0, h]$.

Error bound

We consider an upper bound for
$$\left|\int_0^h f(x)\,dx - T(f)\right| = \left|\int_0^h [f(x) - P_1(x)]\,dx\right|. \tag{4.4.1.2}$$
Let
$$g(t) = f(t) - P_1(t) - [f(x) - P_1(x)]\frac{(t - x_0)(t - x_1)}{(x - x_0)(x - x_1)}$$
for an $x \in (x_0, x_1)$. (Note that $x_0 = 0$ and $x_1 = h$ in the case here.) Then we have $g(x_0) = 0 = g(x_1)$. Furthermore,
$$g(x) = f(x) - P_1(x) - [f(x) - P_1(x)]\frac{(x - x_0)(x - x_1)}{(x - x_0)(x - x_1)} = 0.$$
Using the generalized Rolle's theorem we have
$$g''(\xi) = f''(\xi) - P''_1(\xi) - [f(x) - P_1(x)]\frac{2}{(x - x_0)(x - x_1)} = 0$$
for some $\xi \in (a, b)$. But $P''_1 = 0$. So,
$$f(x) - P_1(x) = \frac{1}{2}f''(\xi)(x - x_0)(x - x_1).$$
Applying this to (4.4.1.2) gives
$$\left|\int_0^h f(x)\,dx - T(f)\right| \le \frac{1}{2}\left|\int_0^h f''(\xi)\,x(x - h)\,dx\right| = \frac{1}{2}\left|f''(\eta)\int_0^h x(x - h)\,dx\right| \le \frac{M}{12}h^3,$$
where $M > 0$ is such that $|f''(x)| \le M$ for all $x \in [0, h]$. In the above we used the weighted mean value theorem, since $x(x - h)$ does not change sign in $(0, h)$. Therefore, the trapezoidal quadrature rule is of 3rd order accuracy on a single interval.
4.2 Simpson's rule

We approximate $f$ on $[a, b]$ by a quadratic function
$$P_2(x) = \frac{(x - x_1)(x - x_2)}{(h_1 + h_2)h_1}\,f(x_0) + \frac{(x - x_0)(x - x_2)}{-h_1 h_2}\,f(x_1) + \frac{(x - x_0)(x - x_1)}{(h_1 + h_2)h_2}\,f(x_2),$$
where $h_1 = x_1 - x_0$ and $h_2 = x_2 - x_1$. Then we have
$$\int_a^b f(x)\,dx \approx \int_a^b P_2(x)\,dx.$$
This is certainly more accurate than the trapezoidal rule. But we can also derive a quadrature rule in a different way, by the method of undetermined coefficients. This gives Simpson's rule.

For simplicity and without loss of generality, we let $x_0 = 0$, $x_1 = 1/2$ and $x_2 = 1$, so that $h = 1/2$. Assume that the quadrature rule is of the form
$$\int_0^1 f(x)\,dx \approx \int_0^1 L(x)\,dx = A_0 f(0) + A_1 f(\tfrac{1}{2}) + A_2 f(1),$$
where $A_0$, $A_1$ and $A_2$ are constants to be determined. To determine these constants, we require that
$$\int_0^1 f(x)\,dx = \int_0^1 L(x)\,dx$$
holds for $f(x) = 1$, $x$ and $x^2$.

Case 1: $f(x) = 1$:
$$A_0 + A_1 + A_2 = \int_0^1 dx = 1. \tag{4.4.2.3}$$
Case 2: $f(x) = x$:
$$A_0\cdot 0 + A_1\cdot\tfrac{1}{2} + A_2\cdot 1 = \int_0^1 x\,dx = \tfrac{1}{2}. \tag{4.4.2.4}$$
Case 3: $f(x) = x^2$:
$$A_0\cdot 0 + A_1\cdot\tfrac{1}{4} + A_2\cdot 1 = \int_0^1 x^2\,dx = \tfrac{1}{3}. \tag{4.4.2.5}$$
Solving this $3\times 3$ system gives
$$A_0 = A_2 = \frac{1}{6}, \qquad A_1 = \frac{2}{3}.$$
So,
$$\int_0^1 f(x)\,dx \approx \frac{1}{6}f(0) + \frac{2}{3}f(\tfrac{1}{2}) + \frac{1}{6}f(1) = \frac{h}{3}\left[f(0) + 4f(\tfrac{1}{2}) + f(1)\right].$$
In general we have
$$\int_{x_0}^{x_2} f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + f(x_2)\right].$$
This is Simpson's rule. It can be shown that the error is given by
$$\int_{x_0}^{x_2} f(x)\,dx - \frac{h}{3}\left[f(x_0) + 4f(x_1) + f(x_2)\right] = -\frac{h^5}{90}f^{(4)}(\xi),$$
where $\xi \in (x_0, x_2)$.
Definition 4.1 (Degree of accuracy or precision) The degree of accuracy of a quadrature formula is the largest positive integer $n$ such that the formula is exact for $x^k$, $k = 0, 1, \ldots, n$.

Clearly we have:
  The trapezoidal rule has degree of accuracy 1.
  Simpson's rule has degree of accuracy 3 (it is exact for cubics as well, as the error term involving $f^{(4)}$ shows).
4.3 Newton-Cotes formulas

(n + 1)-point closed Newton-Cotes rule

Let $x_k$, $k = 0, 1, \ldots, n$, be $n + 1$ distinct points in $[a, b]$ with $x_0 = a$ and $x_n = b$. We wish to find $a_k$, $k = 0, 1, \ldots, n$, such that
$$\int_a^b f(x)\,dx = \sum_{j=0}^{n} a_j f(x_j)$$
holds for every polynomial $f(x)$ of degree at most $n$. Let
$$L_{n,i}(x) = \prod_{j=0,\,j\ne i}^{n}\frac{x - x_j}{x_i - x_j}$$
be the Lagrange polynomial satisfying $L_{n,i}(x_j) = \delta_{ij}$, where $\delta_{ij}$ denotes the Kronecker delta. Then, replacing $f$ by $L_{n,i}$ in the above expression, we have
$$\int_a^b L_{n,i}(x)\,dx = \sum_{j=0}^{n} a_j L_{n,i}(x_j) = a_i$$
for $i = 0, 1, \ldots, n$. The (closed) Newton-Cotes formula is
$$\int_a^b f(x)\,dx \approx \sum_{j=0}^{n} f(x_j)\int_a^b L_{n,j}(x)\,dx.$$

Theorem 4.1 Suppose that $\sum_{j=0}^{n} a_j f(x_j)$ denotes the $(n+1)$-point closed Newton-Cotes formula with $a = x_0$, $b = x_n$ and $h = (b - a)/n$. Then there exists $\xi \in (a, b)$ such that
$$\int_a^b f(x)\,dx = \sum_{j=0}^{n} a_j f(x_j) + \frac{h^{n+3}f^{(n+2)}(\xi)}{(n + 2)!}\int_0^n t^2(t - 1)\cdots(t - n)\,dt$$
if $n$ is even and $f \in C^{n+2}[a, b]$, and
$$\int_a^b f(x)\,dx = \sum_{j=0}^{n} a_j f(x_j) + \frac{h^{n+2}f^{(n+1)}(\xi)}{(n + 1)!}\int_0^n t(t - 1)\cdots(t - n)\,dt$$
if $n$ is odd and $f \in C^{n+1}[a, b]$.

Examples are:
  n = 1: the trapezoidal rule.
  n = 2: Simpson's rule.
  n = 3: Simpson's three-eighths rule,
$$\int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}[f_0 + 3f_1 + 3f_2 + f_3] - \frac{3h^5}{80}f^{(4)}(\xi).$$
  n = 4:
$$\int_{x_0}^{x_4} f(x)\,dx = \frac{2h}{45}[7f_0 + 32f_1 + 12f_2 + 32f_3 + 7f_4] - \frac{8h^7}{945}f^{(6)}(\xi).$$
(n + 1)-point open Newton-Cotes rule

Let $x_i$, $i = -1, 0, 1, \ldots, n+1$, be points satisfying
$$x_{-1} = a < x_0 < x_1 < \cdots < x_n < x_{n+1} = b$$
and $h = (b - a)/(n + 2)$. Then there exists $\xi \in (a, b)$ such that
$$\int_a^b f(x)\,dx = \sum_{j=0}^{n} a_j f(x_j) + \frac{h^{n+3}f^{(n+2)}(\xi)}{(n + 2)!}\int_{-1}^{n+1} t^2(t - 1)\cdots(t - n)\,dt$$
if $n$ is even and $f \in C^{n+2}[a, b]$, and
$$\int_a^b f(x)\,dx = \sum_{j=0}^{n} a_j f(x_j) + \frac{h^{n+2}f^{(n+1)}(\xi)}{(n + 1)!}\int_{-1}^{n+1} t(t - 1)\cdots(t - n)\,dt$$
if $n$ is odd and $f \in C^{n+1}[a, b]$, where $a_j = \int_a^b L_{n,j}(x)\,dx$.

A typical example is the case $n = 0$: the mid-point rule
$$\int_{x_{-1}}^{x_1} f(x)\,dx = 2hf(x_0) + \frac{h^3}{3}f''(\xi), \qquad x_0 = \frac{a + b}{2} = \frac{x_{-1} + x_1}{2}.$$
4.4 Composite rules

From the above discussion we see that $h$ needs to be small in order that the approximation error is small (normally bounded by $Ch^p$ for a positive number $p$). Therefore, if $b - a$ is large, we have to divide $[a, b]$ into a number of subintervals and apply a quadrature rule within each of the subintervals. This gives a composite quadrature rule.
4.4.1 Composite Simpson's rule

Let $x_i = a + ih$ for $i = 0, 1, \ldots, n$ with $h = (b - a)/n$, and denote $f_i = f(x_i)$ for $i = 0, 1, \ldots, n$. On $[x_i, x_{i+2}]$, Simpson's rule is
$$\int_{x_i}^{x_{i+2}} f(x)\,dx \approx \frac{h}{3}(f_i + 4f_{i+1} + f_{i+2}).$$
Now, if $n$ is even, we group the intervals into
$$[x_0, x_2], [x_2, x_4], \ldots, [x_{n-2}, x_n],$$
and so
$$\frac{3}{h}\int_a^b f\,dx \approx (f_0 + 4f_1 + f_2) + (f_2 + 4f_3 + f_4) + \cdots + (f_{n-4} + 4f_{n-3} + f_{n-2}) + (f_{n-2} + 4f_{n-1} + f_n).$$
Therefore,
$$\int_a^b f\,dx \approx \frac{h}{3}\left[f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 2f_{n-2} + 4f_{n-1} + f_n\right].$$

Theorem 4.2 Let $f \in C^4[a, b]$, $n$ be even, $h = (b - a)/n$ and $x_j = a + jh$ for $j = 0, 1, \ldots, n$. Then there exists $\xi \in (a, b)$ such that
$$\int_a^b f\,dx = \frac{h}{3}\left[f(a) + 2\sum_{j=1}^{n/2 - 1}f(x_{2j}) + 4\sum_{j=1}^{n/2}f(x_{2j-1}) + f(b)\right] - \frac{b - a}{180}\,h^4 f^{(4)}(\xi).$$

The following is an algorithm realizing the composite Simpson's rule.

Algorithm:
INPUT a, b, even positive integer n.
Step 1. h = (b - a)/n.
Step 2. XI0 = f(a) + f(b); XI1 = 0; XI2 = 0.
Step 3. For i = 1, 2, ..., n - 1, do: X = a + ih;
          if i is even, then XI2 = XI2 + f(X), else XI1 = XI1 + f(X).
Step 4. XI = h(XI0 + 2 XI2 + 4 XI1)/3.
Step 5. OUTPUT(XI).
STOP.
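The algorithm translates directly into Python (an added illustration):

    import math

    def composite_simpson(f, a, b, n):
        """Composite Simpson's rule with an even number n of subintervals."""
        if n % 2:
            raise ValueError("n must be even")
        h = (b - a) / n
        xi0 = f(a) + f(b)
        xi1 = sum(f(a + i*h) for i in range(1, n, 2))   # odd indices, weight 4
        xi2 = sum(f(a + i*h) for i in range(2, n, 2))   # even indices, weight 2
        return h * (xi0 + 2*xi2 + 4*xi1) / 3

    print(composite_simpson(math.sin, 0.0, math.pi, 10))   # exact value is 2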
4.4.2 Composite trapezoidal and mid-point rules

Let $h = (b - a)/n$ and $x_j = a + jh$ for $j = 0, 1, \ldots, n$. Then
$$\frac{2}{h}\int_a^b f\,dx \approx (f_0 + f_1) + (f_1 + f_2) + \cdots + (f_{n-2} + f_{n-1}) + (f_{n-1} + f_n) = f_0 + 2f_1 + 2f_2 + \cdots + 2f_{n-1} + f_n.$$
From this we have
$$\int_a^b f\,dx \approx \frac{h}{2}\left[f(a) + 2\sum_{j=1}^{n-1}f_j + f(b)\right].$$
This is the composite trapezoidal rule. The error term is
$$E_n = -\frac{b - a}{12}\,h^2 f''(\xi)$$
for some $\xi \in (a, b)$.

Similarly, the composite mid-point rule is
$$\int_a^b f\,dx = 2h\sum_{j=0}^{n/2}f(x_{2j}) + \frac{b - a}{6}\,h^2 f''(\xi),$$
where $n$ is an even positive integer, $h = (b - a)/(n + 2)$ and $x_j = a + (j + 1)h$. The mid-point rule can also be expressed as
$$\int_a^b f\,dx \approx h\sum_{j=0}^{n-1}f\!\left(\frac{x_j + x_{j+1}}{2}\right).$$
4.5 Gauss quadrature

Consider
$$\int_{-1}^{1} f(x)\,dx \approx c_1 f(x_1) + c_2 f(x_2), \tag{4.4.5.6}$$
where $-1 \le x_1, x_2 \le 1$. Clearly, we can recover the trapezoidal rule by forcing the above formula to be exact for $f = 1$ and $f = x$, assuming $x_1$ and $x_2$ are fixed. This gives degree of precision 1. Since $x_1$ and $x_2$ are also free, we may determine up to 4 unknown constants. Therefore, let us consider the case that (4.4.5.6) is exact for $f = 1, x, x^2$ and $x^3$. This gives the following four equations:
$$c_1 + c_2 = \int_{-1}^{1} dx = 2,$$
$$c_1 x_1 + c_2 x_2 = \int_{-1}^{1} x\,dx = 0,$$
$$c_1 x_1^2 + c_2 x_2^2 = \int_{-1}^{1} x^2\,dx = \frac{2}{3},$$
$$c_1 x_1^3 + c_2 x_2^3 = \int_{-1}^{1} x^3\,dx = 0.$$
Solving this $4\times 4$ non-linear system gives
$$c_1 = c_2 = 1, \qquad x_1 = -\frac{\sqrt{3}}{3}, \qquad x_2 = \frac{\sqrt{3}}{3}.$$
Therefore,
$$\int_{-1}^{1} f(x)\,dx \approx f\!\left(-\frac{\sqrt{3}}{3}\right) + f\!\left(\frac{\sqrt{3}}{3}\right).$$
This is the 2-point Gauss quadrature rule.
Definition 4.2 (Legendre polynomials) Polynomials $\{P_0(x), P_1(x), \ldots, P_n(x), \ldots\}$ are said to be Legendre polynomials if
1. for each $n$, $P_n$ is a polynomial of degree $n$, and
2. $\int_{-1}^{1} P(x)P_n(x)\,dx = 0$ whenever $P(x)$ is a polynomial of degree $< n$.

The first few Legendre polynomials:
$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = x^2 - \frac{1}{3},$$
$$P_3(x) = x^3 - \frac{3}{5}x, \quad P_4(x) = x^4 - \frac{6}{7}x^2 + \frac{3}{35}.$$

Theorem 4.3 Suppose that $x_1, x_2, \ldots, x_n$ are the roots of the $n$th Legendre polynomial $P_n(x)$, and let
$$c_i = \int_{-1}^{1}\prod_{j=1,\,j\ne i}^{n}\frac{x - x_j}{x_i - x_j}\,dx = \int_{-1}^{1} L_{n-1,i}(x)\,dx,$$
where $L_{n-1,i}$ is the Lagrange polynomial constructed using $x_i$, $i = 1, 2, \ldots, n$. If $P(x)$ is any polynomial of degree $< 2n$, then
$$\int_{-1}^{1} P(x)\,dx = \sum_{i=1}^{n} c_i P(x_i).$$

PROOF. If the degree of $P(x)$ is less than $n$, we have
$$P(x) = \sum_{i=1}^{n} L_{n-1,i}(x)P(x_i).$$
So,
$$\int_{-1}^{1} P(x)\,dx = \int_{-1}^{1}\sum_{i=1}^{n} L_{n-1,i}(x)P(x_i)\,dx = \sum_{i=1}^{n} P(x_i)\int_{-1}^{1} L_{n-1,i}(x)\,dx = \sum_{i=1}^{n} c_i P(x_i).$$
When the degree of $P(x)$ is less than $2n$, we let
$$P(x) = Q(x)P_n(x) + R(x),$$
where $Q$ and $R$ are polynomials of degree $< n$. From the definition of Legendre polynomials we have
$$\int_{-1}^{1} Q(x)P_n(x)\,dx = 0.$$
Also,
$$P(x_i) = Q(x_i)P_n(x_i) + R(x_i) = R(x_i),$$
since $P_n(x_i) = 0$ for all $i = 1, 2, \ldots, n$. So,
$$\int_{-1}^{1} P(x)\,dx = \int_{-1}^{1}[Q(x)P_n(x) + R(x)]\,dx = \int_{-1}^{1} R(x)\,dx = \sum_{i=1}^{n} c_i R(x_i) = \sum_{i=1}^{n} c_i P(x_i). \qquad 2$$

The points and weights of the 2- and 3-point Gauss quadratures are

n    x_{n,i}           c_{n,i}
2     0.5773502692     1
     -0.5773502692     1
3     0.7745966692     0.5555555556
      0.0000000000     0.8888888889
     -0.7745966692     0.5555555556
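These nodes and weights can be applied on an arbitrary interval after the linear change of variables $x = \frac{a+b}{2} + \frac{b-a}{2}t$. A Python sketch (an added illustration):

    import math

    # 3-point Gauss-Legendre nodes and weights on [-1, 1]
    NODES = [-math.sqrt(3.0/5.0), 0.0, math.sqrt(3.0/5.0)]
    WEIGHTS = [5.0/9.0, 8.0/9.0, 5.0/9.0]

    def gauss3(f, a, b):
        """3-point Gauss quadrature on [a, b]; exact for polynomials of degree <= 5."""
        mid, half = (a + b)/2.0, (b - a)/2.0
        return half * sum(w * f(mid + half*t) for t, w in zip(NODES, WEIGHTS))

    print(gauss3(lambda x: x**5 + x**2, 0.0, 1.0))   # exact value is 1/6 + 1/3 = 0.5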
Numerical differentiation

From calculus we have
$$f'(x) = \frac{f(x + h) - f(x)}{h} + O(h) = \frac{f(x) - f(x - h)}{h} + O(h).$$
Let us consider a linear combination of $f(x - h)$, $f(x)$ and $f(x + h)$ of the form
$$c_1 f(x + h) + c_2 f(x) + c_3 f(x - h),$$
assuming $h$ is small. The Taylor expansions of these are, respectively,
$$f(x + h) = f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(x) + \frac{h^4}{24}f^{(4)}(x) + \cdots,$$
$$f(x) = f(x),$$
$$f(x - h) = f(x) - hf'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{6}f'''(x) + \frac{h^4}{24}f^{(4)}(x) - \cdots.$$
4.6 First derivatives

Forward difference: $c_1 = 1$, $c_2 = -1$, $c_3 = 0$:
$$f(x + h) - f(x) = hf'(x) + \frac{h^2}{2}f''(x) + O(h^3).$$
From this we have
$$f'(x) = \frac{f(x + h) - f(x)}{h} + O(h).$$
Central difference: $c_1 = 1$, $c_2 = 0$, $c_3 = -1$:
$$f(x + h) - f(x - h) = 2hf'(x) + \frac{h^3}{3}f'''(x) + O(h^5),$$
or
$$f'(x) = \frac{f(x + h) - f(x - h)}{2h} + O(h^2).$$
Backward difference: $c_1 = 0$, $c_2 = 1$, $c_3 = -1$:
$$f(x) - f(x - h) = hf'(x) + O(h^2),$$
or
$$f'(x) = \frac{f(x) - f(x - h)}{h} + O(h).$$
Clearly, the accuracy of the central difference is one order higher than that of the other two.

4.7 Second derivatives

Using the weights $c_1 = 1$, $c_2 = -2$ and $c_3 = 1$, we have
$$f(x + h) - 2f(x) + f(x - h) = h^2 f''(x) + O(h^4).$$
From this we have
$$f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} + O(h^2).$$
This is the central difference for $f''(x)$.
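These formulas are easy to check numerically; the sketch below (an added illustration) compares the forward, central and second-derivative differences for $f = \sin$ at $x = 1$.

    import math

    def forward_diff(f, x, h):
        return (f(x + h) - f(x)) / h                   # O(h) accurate

    def central_diff(f, x, h):
        return (f(x + h) - f(x - h)) / (2*h)           # O(h^2) accurate

    def second_central_diff(f, x, h):
        return (f(x + h) - 2*f(x) + f(x - h)) / h**2   # O(h^2) accurate

    x, h = 1.0, 1e-4
    print(forward_diff(math.sin, x, h) - math.cos(x))          # error ~ 4e-5
    print(central_diff(math.sin, x, h) - math.cos(x))          # error ~ 1e-9
    print(second_central_diff(math.sin, x, h) + math.sin(x))   # error ~ 1e-7 (rounding dominates)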
4.8 Computational errors

Let D_h(f) = (f(x+h) - f(x))/h and D(f) = D_h(f) + O(h). Then,

|D(f) - D_h(f)| ≤ (M/2) h,                                  (4.4.8.7)

where M = max_{a≤x≤b} |f''(x)|. In practice, some computational errors are involved when
evaluating f. We assume

f̃(t) = f(t) + e(t)

with |e(t)| ≤ ε for a given machine error ε. So, the bound for the computational error is

|D_h(f̃) - D_h(f)| ≤ |e(t+h) - e(t)|/h ≤ 2ε/h.

Combining this with the discretisation error (4.4.8.7) we have

|D_h(f̃) - D(f)| ≤ 2ε/h + (M/2) h.

When h is small, 2ε/h may be dominant. The optimal choice of h is such that

2ε/h = (M/2) h.

This gives

h^2 = 4ε/M,  i.e.  h = 2 √(ε/M).
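The trade-off between the discretisation error and the rounding error can be seen directly. In the Matlab sketch below, f(x) = e^x at x = 1, ε is taken to be Matlab's machine epsilon, and the sample step sizes are illustrative assumptions.

f = @(x) exp(x);  df = exp(1);  M = exp(1);   % |f''| is about e near x = 1
hopt = 2*sqrt(eps/M);                         % optimal h from the formula above
for h = [1e-4  1e-8  hopt  1e-12]
    err = abs((f(1+h) - f(1))/h - df);        % total error of the forward difference
    fprintf('h = %.2e   error = %.2e\n', h, err);
end

The error is smallest near h = hopt and grows again once 2ε/h dominates.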
Chapter 5

Numerical Solution of Ordinary Differential Equations (ODEs)
5.1 Euler's method

Consider the following initial value problem (IVP)

dy/dt = f(t, y),   a < t ≤ b,                               (5.5.1.1)
y(a) = α,                                                   (5.5.1.2)

where a, b ∈ R with a < b. Let [a, b] be divided into N equally spaced subintervals with
break points

t_i = a + ih,  i = 0, 1, ..., N,

and h = (b - a)/N. For each i, using Taylor's expansion we have

y(t_{i+1}) = y(t_i) + h y'(t_i) + (h^2/2) y''(ξ_i)

for some ξ_i ∈ (t_i, t_{i+1}). Since y satisfies (5.5.1.1), we have

y(t_{i+1}) = y(t_i) + h f(t_i, y_i) + (h^2/2) y''(ξ_i).

Omitting the 2nd order term and using the initial condition (5.5.1.2) we get the following
Euler's algorithm for the approximation of (5.5.1.1) and (5.5.1.2):

w_{i+1} = w_i + h f(t_i, w_i),  i = 0, 1, ..., N-1,         (5.5.1.3)
w_0 = α.                                                    (5.5.1.4)

A geometric explanation is given in the figure below.
[Figure: one Euler step, showing w_{i+1} on the tangent line at t_i versus the exact value y(t_{i+1}) at t_{i+1}.]
Error bounds for Euler's method

Lemma 5.1 For all x ≥ -1 and any positive m, we have

0 ≤ (1 + x)^m ≤ e^{mx}.

PROOF. From Taylor's theorem we have

e^x = 1 + x + (x^2/2) e^ξ

for a ξ between 0 and x. Since the last term in the above is non-negative, we have

e^x ≥ 1 + x ≥ 0

if x ≥ -1. From this we have the result.   □
Lemma 5.2 If s and t are positive real numbers, {a_i}_{i=0}^{k} is a sequence satisfying a_0 ≥ -t/s,
and

a_{i+1} ≤ (1 + s) a_i + t,  i = 0, 1, ..., k,

then

a_{i+1} ≤ e^{(i+1)s} (a_0 + t/s) - t/s.

The proof of this, which uses Lemma 5.1, is omitted here. For the error in the approx-
imation of (5.5.1.1) by (5.5.1.3) we have the following theorem.
Theorem 5.1 Suppose f is continuous and satisfies the Lipschitz condition on D =
{(t, y) : a ≤ t ≤ b, -∞ < y < ∞}, i.e.,

|f(t, y_1) - f(t, y_2)| ≤ L |y_1 - y_2|.                    (5.5.1.5)

If there exists M > 0 such that

|y''(t)| ≤ M,  t ∈ [a, b],                                  (5.5.1.6)

then, we have

|y(t_i) - w_i| ≤ (hM)/(2L) (e^{L(t_i - a)} - 1),

where {w_i} is the sequence from Euler's method.
PROOF. By Taylor's expansion we have

y(t_{i+1}) = y(t_i) + h f(t_i, y(t_i)) + (h^2/2) y''(ξ_i).

From this and (5.5.1.3) we have

|y_{i+1} - w_{i+1}| = |y_i - w_i + h[f(t_i, y_i) - f(t_i, w_i)] + (h^2/2) y''(ξ_i)|
                    ≤ |y_i - w_i| + h |f(t_i, y_i) - f(t_i, w_i)| + (h^2/2) |y''(ξ_i)|
                    ≤ (1 + hL) |y_i - w_i| + (h^2 M)/2.

In the above we have used (5.5.1.5) and (5.5.1.6). Let e_i = y_i - w_i. The above becomes

|e_{i+1}| ≤ (1 + hL) |e_i| + (h^2 M)/2.

Using Lemma 5.2 we have

|e_{i+1}| ≤ e^{(i+1)hL} (|e_0| + (h^2 M)/(2hL)) - (h^2 M)/(2hL).

But e_0 = y_0 - w_0 = 0 and (i + 1)h = t_{i+1} - a. Therefore, we get

|e_{i+1}| ≤ (hM)/(2L) (e^{(t_{i+1} - a)L} - 1).

This completes the proof.   □
Example. Consider y' = y - t^2 + 1, 0 ≤ t ≤ 2, with the initial condition y(0) = 0.5.
For this example, the exact solution is

y(t) = (1 + t)^2 - (1/2) e^t.

So,

|y''| ≤ 0.5 e^2 - 2 =: M.

Also, |f(t, y_1) - f(t, y_2)| = |y_1 - y_2|, implying L = 1. Thus,

|e_i| ≤ h(0.5 e^2 - 2)/2 · (e^{(t_i - 0)·1} - 1) = h(0.5 e^2 - 2)/2 · (e^{t_i} - 1).
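A minimal Matlab sketch of Euler's method for this IVP is given below; the step number N = 10 is an arbitrary illustrative choice, and the exact solution quoted above is used only to report the error.

f   = @(t,y) y - t^2 + 1;
yex = @(t) (1+t).^2 - 0.5*exp(t);       % exact solution of the IVP
a = 0; b = 2; N = 10; h = (b-a)/N;
w = 0.5;                                % w_0 = y(0)
for i = 0:N-1
    t = a + i*h;
    w = w + h*f(t,w);                   % Euler step (5.5.1.3)
end
fprintf('w_N = %.6f, y(2) = %.6f, error = %.2e\n', w, yex(2), abs(w - yex(2)));

The observed error decreases roughly linearly in h, in line with the O(h) bound above.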
5.2 Higher-order Taylor methods

Definition 5.1 The difference method

w_0 = α,
w_{i+1} = w_i + h φ(t_i, w_i),  i = 0, 1, ..., N-1

has a local truncation error

τ_{i+1}(h) = [y_{i+1} - (y_i + h φ(t_i, y_i))]/h = (y_{i+1} - y_i)/h - φ(t_i, y_i)

for each i = 0, 1, ..., N-1.
For Euler's method, φ(t_i, y_i) = f(t_i, y_i), and so

τ_{i+1}(h) = (y_{i+1} - y_i)/h - f(t_i, y_i)
           = [y_i + y'(t_i) h + (h^2/2) y''(ξ_i) - y_i]/h - f(t_i, y_i)
           = y'(t_i) - f(t_i, y_i) + (h/2) y''(ξ_i)
           = (h/2) y''(ξ_i).

In the case that y'' satisfies (5.5.1.6), we have

|τ_{i+1}(h)| ≤ (h/2) M.
Suppose (5.5.1.1) and (5.5.1.2) have a solution and y(t) has n + 1 continuous derivatives
for a positive integer n. At t = t_i, Taylor's expansion gives

y_{i+1} = y_i + h y'_i + (h^2/2) y''_i + ... + (h^n/n!) y_i^{(n)} + (h^{n+1}/(n+1)!) y^{(n+1)}(ξ_i)

for a ξ_i ∈ (a, b). Differentiating (5.5.1.1) k - 1 times gives

y^{(k)}(t) = f^{(k-1)}(t, y(t)),  k = 1, 2, ..., n.

Therefore,

y_{i+1} = y_i + h f(t_i, y_i) + (h^2/2) f'(t_i, y_i) + ... + (h^n/n!) f^{(n-1)}(t_i, y_i)
          + (h^{n+1}/(n+1)!) f^{(n)}(ξ_i, y(ξ_i))
        = y_i + h [f(t_i, y_i) + (h/2) f'(t_i, y_i) + ... + (h^{n-1}/n!) f^{(n-1)}(t_i, y_i)]
          + (h^{n+1}/(n+1)!) f^{(n)}(ξ_i, y(ξ_i))
       =: y_i + h T^{(n)}(t_i, y_i) + (h^{n+1}/(n+1)!) f^{(n)}(ξ_i, y(ξ_i)).        (5.5.2.7)
Therefore, we have the Taylor's method of order n as follows:

w_0 = α,
w_{i+1} = w_i + h T^{(n)}(t_i, w_i),  i = 0, 1, ..., N-1.

From (5.5.2.7) we have that the local truncation error bound is

τ_{i+1} = (y_{i+1} - y_i)/h - T^{(n)}(t_i, y_i) = (h^n/(n+1)!) f^{(n)}(ξ_i, y(ξ_i)) = O(h^n).
Example. Construct the Taylor's methods of orders 2 and 4 for the following IVP

y' = y - t^2 + 1,  0 ≤ t ≤ 2,
y(0) = 0.5.

Differentiating f repeatedly and using y' = y - t^2 + 1 we have

f'(t, y)   = y' - 2t = y - t^2 + 1 - 2t,
f''(t, y)  = y' - 2t - 2 = y - t^2 + 1 - 2t - 2 = y - t^2 - 2t - 1,
f'''(t, y) = y' - 2t - 2 = y - t^2 + 1 - 2t - 2 = y - t^2 - 2t - 1.

Therefore,

T^{(2)} = f(t_i, w_i) + (h/2) f'(t_i, w_i)
        = w_i - t_i^2 + 1 + (h/2)[w_i - t_i^2 + 1 - 2t_i]
        = (1 + h/2)(w_i - t_i^2 + 1) - h t_i.

The 2nd order Taylor's method is

w_0 = 0.5,
w_{i+1} = w_i + h [(1 + h/2)(w_i - t_i^2 + 1) - h t_i].

Similarly, we have

T^{(4)} = (1 + h/2 + h^2/6 + h^3/24)(w_i - t_i^2) - (1 + h/3 + h^2/12) h t_i
          + 1 + h/2 - h^2/6 - h^3/24.

Therefore, the 4th order method is

w_0 = 0.5,
w_{i+1} = w_i + h T^{(4)}(t_i, w_i),  i = 0, 1, ..., N-1.
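A short Matlab sketch of the 2nd order Taylor method for this IVP follows; the number of steps N = 10 is an illustrative assumption.

a = 0; b = 2; N = 10; h = (b-a)/N;
w = 0.5;                                         % w_0
for i = 0:N-1
    t = a + i*h;
    w = w + h*((1 + h/2)*(w - t^2 + 1) - h*t);   % uses T^(2) from above
end
w                                                % approximation of y(2)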
5.3 Runge-Kutta and Other Methods

Consider

w_0 = 0.5,
w_{i+1} = w_i + h T^{(2)}(t_i, w_i),  i = 0, 1, ..., N-1,

where

T^{(2)}(t, y) = f(t, y) + (h/2) f'(t, y).

This involves the 1st derivative of f. Using the chain rule we have

f'(t, y) = f_t(t, y) + f_y(t, y) y' = f_t(t, y) + f_y(t, y) f(t, y).

Substituting into T^{(2)} gives

T^{(2)}(t, y) = f(t, y) + (h/2) f_t(t, y) + (h/2) f_y(t, y) f(t, y).
We now consider the approximation of f_t and f_y by linear combinations of f at some
points. Using Taylor's expansion for two variables we have

f(t + α_1, y + β_1) = f(t, y) + α_1 f_t(t, y) + β_1 f_y(t, y) + O(α_1^2 + α_1 β_1 + β_1^2).

Matching the coefficients of the first order terms in T^{(2)} and f(t + α_1, y + β_1) gives

α_1 = h/2  and  β_1 = (h/2) f(t, y).

Therefore,

T^{(2)} = f(t + h/2, y + (h/2) f(t, y)) + O(h^2 + h^2 f^2).
We thus define the following 2nd order Runge-Kutta scheme:

w_0 = α,
w_{i+1} = w_i + h f(t_i + h/2, w_i + (h/2) f(t_i, w_i)),  i = 0, 1, ..., N-1.

Obviously, from the above analysis we see that

τ_{i+1}(h) = O(h^2).

But this method does not need f'.
Consider the mean value theorem

(y_{i+1} - y_i)/h = y'(ξ) = f(ξ, y(ξ))                      (5.5.3.8)

for a ξ ∈ (t_i, t_{i+1}). If we choose ξ = t_i + h/2, we have

(y_{i+1} - y_i)/h ≈ f(t_i + h/2, y(t_i + h/2)) = f(t_i + h/2, y(t_i) + y'(t_i) h/2 + O(h^2)).
Therefore, we define the following Mid-point Method:

w_0 = α,
w_{i+1} = w_i + h f(t_i + h/2, w_i + (h/2) f(t_i, w_i)),  i = 0, 1, ..., N-1.

This is the same as the 2nd order Runge-Kutta method.

Modified Euler's Method
Let us approximate y'(ξ) in (5.5.3.8) by

y'(ξ) ≈ [y'(t_i) + y'(t_{i+1})]/2 = [f(t_i, y_i) + f(t_{i+1}, y_{i+1})]/2.

But the resulting scheme is implicit:

w_{i+1} = w_i + (h/2)[f(t_i, w_i) + f(t_{i+1}, w_{i+1})].

To overcome this difficulty, we use the standard Euler's method to approximate y_{i+1} first
and then plug it into the right-hand side of the above. This is the modified Euler's method:

w_0 = α,
w_{i+1} = w_i + (h/2)[f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))],  i = 0, 1, ..., N-1.

The local truncation error is τ_i(h) = O(h^2).
Runge-Kutta Order 4 Method

w_0 = α,
k_1 = h f(t_i, w_i),
k_2 = h f(t_i + h/2, w_i + k_1/2),
k_3 = h f(t_i + h/2, w_i + k_2/2),
k_4 = h f(t_{i+1}, w_i + k_3),
w_{i+1} = w_i + (1/6)(k_1 + 2k_2 + 2k_3 + k_4),  i = 0, 1, ..., N-1.

The local truncation error is O(h^4).
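A direct Matlab transcription of this scheme is sketched below, again on the test problem y' = y - t^2 + 1, y(0) = 0.5; the test problem and N = 10 are illustrative choices.

f = @(t,y) y - t^2 + 1;
a = 0; b = 2; N = 10; h = (b-a)/N;
w = 0.5;
for i = 0:N-1
    t  = a + i*h;
    k1 = h*f(t, w);
    k2 = h*f(t + h/2, w + k1/2);
    k3 = h*f(t + h/2, w + k2/2);
    k4 = h*f(t + h,   w + k3);
    w  = w + (k1 + 2*k2 + 2*k3 + k4)/6;
end
w        % approximation of y(2); the exact value is 9 - 0.5*exp(2), about 5.30547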
5.4 Multi-step Methods

Definition 5.2 An m-step multi-step method for solving y' = f(t, y), a ≤ t ≤ b, y(a) = α,
has a difference equation for finding the approximation w_{i+1} at the mesh point t_{i+1}
represented by the following equation, where m is an integer > 1:

w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + ... + a_0 w_{i+1-m}
          + h [b_m f(t_{i+1}, w_{i+1}) + b_{m-1} f(t_i, w_i) + ... + b_0 f(t_{i+1-m}, w_{i+1-m})]

for i = m-1, m, ..., N-1, where h = (b - a)/N, the a_i, i = 0, 1, ..., m-1, and b_i, i = 0, 1, ..., m,
are constants, and the starting values

w_0 = α, w_1 = α_1, ..., w_{m-1} = α_{m-1}

are specified.
Integrating y' = f from t_i to t_{i+1} gives

y(t_{i+1}) - y(t_i) = ∫_{t_i}^{t_{i+1}} f(t, y) dt,

or

y(t_{i+1}) = y(t_i) + ∫_{t_i}^{t_{i+1}} f(t, y) dt.

Let P(t) be an interpolating polynomial obtained using some of the previous data (t_0, w_0),
(t_1, w_1), ..., (t_i, w_i). We approximate y_{i+1} by

y_{i+1} ≈ y_i + ∫_{t_i}^{t_{i+1}} P(t) dt.

For example, we let P_{m-1}(t) denote the polynomial from the Newton's backward difference
formula on (t_i, f(t_i, y_i)), ..., (t_{i+1-m}, f(t_{i+1-m}, y_{i+1-m})). Then, we have

f(t, y(t)) = P_{m-1}(t) + R_m(t),

where

R_m(t) = [f^{(m)}(ξ, y(ξ))/m!] (t - t_i) ... (t - t_{i+1-m}).

Introducing t = t_i + sh for s ∈ [0, 1], we have dt = h ds, and so

∫_{t_i}^{t_{i+1}} f(t, y(t)) dt
  = ∫_{t_i}^{t_{i+1}} Σ_{k=0}^{m-1} (-1)^k C(-s, k) ∇^k f(t_i, y(t_i)) dt + ∫_{t_i}^{t_{i+1}} R_m(t) dt
  = h Σ_{k=0}^{m-1} (-1)^k ∇^k f(t_i, y(t_i)) ∫_0^1 C(-s, k) ds + ∫_{t_i}^{t_{i+1}} R_m(t) dt,

where C(-s, k) = (-s)(-s-1)···(-s-k+1)/k! denotes the binomial coefficient.
When k = 3, we have

(-1)^3 ∫_0^1 C(-s, 3) ds = ∫_0^1 s(s+1)(s+2)/3! ds = 3/8.

Other values of the integral for different k's are given in the following table

k          0    1     2     3     4        5
integral   1    1/2   5/12  3/8   251/720  95/288
Therefore,

∫_{t_i}^{t_{i+1}} f(t, y) dt = h [f(t_i, y_i) + (1/2) ∇f(t_i, y_i) + (5/12) ∇^2 f(t_i, y_i) + ...]
                               + ∫_{t_i}^{t_{i+1}} R_m(t) dt.

Adams-Bashforth Method (m = 3)
When m = 3 we have

y_{i+1} = y_i + h [1 + (1/2)∇ + (5/12)∇^2] f(t_i, y_i)
        = y_i + h { f(t_i, y_i) + (1/2)[f(t_i, y_i) - f(t_{i-1}, y_{i-1})]
                    + (5/12)[f(t_i, y_i) - 2 f(t_{i-1}, y_{i-1}) + f(t_{i-2}, y_{i-2})] }
        = y_i + (h/12)[23 f(t_i, y_i) - 16 f(t_{i-1}, y_{i-1}) + 5 f(t_{i-2}, y_{i-2})].
From this we define the following algorithm:

w_0 = α, w_1 = α_1, w_2 = α_2,
w_{i+1} = w_i + (h/12)[23 f(t_i, w_i) - 16 f(t_{i-1}, w_{i-1}) + 5 f(t_{i-2}, w_{i-2})],

i = 2, 3, ..., N-1. The local truncation error is O(h^3).
Adams-Bashforth 2-step Method (m = 2)

w_0 = α, w_1 = α_1,
w_{i+1} = w_i + (h/2)[3 f(t_i, w_i) - f(t_{i-1}, w_{i-1})],

i = 1, 2, ..., N-1. The local truncation error is O(h^2).
5.5 Implicit Methods

Adams-Moulton 2-step Method

w_0 = α, w_1 = α_1,
w_{i+1} = w_i + (h/12)[5 f(t_{i+1}, w_{i+1}) + 8 f(t_i, w_i) - f(t_{i-1}, w_{i-1})],

i = 1, 2, ..., N-1.

Adams-Moulton 3-step Method

w_0 = α, w_1 = α_1, w_2 = α_2,
w_{i+1} = w_i + (h/24)[9 f(t_{i+1}, w_{i+1}) + 19 f(t_i, w_i) - 5 f(t_{i-1}, w_{i-1}) + f(t_{i-2}, w_{i-2})],

i = 2, 3, ..., N-1.

Predictor-Corrector Methods

Idea: Use an explicit method to obtain an intermediate approximation and then plug it
into an implicit method to get an improved approximation. For example, the modified
Euler's method discussed before.
Combining Adams-Bashforth and Adams-Moulton 2-step methods we have

w_0 = α, w_1 = α_1,
w*_{i+1} = w_i + (h/2)[3 f(t_i, w_i) - f(t_{i-1}, w_{i-1})],
w_{i+1}  = w_i + (h/12)[5 f(t_{i+1}, w*_{i+1}) + 8 f(t_i, w_i) - f(t_{i-1}, w_{i-1})],

i = 1, 2, ..., N-1.
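A Matlab sketch of this predictor-corrector pair is given below. The test problem, the choice N = 20, and the use of a single Euler step to generate the starting value w_1 (a more accurate starter such as RK4 would normally be preferred) are all assumptions made for illustration.

f = @(t,y) y - t^2 + 1;
a = 0; b = 2; N = 20; h = (b-a)/N;
t = a + (0:N)*h;  w = zeros(1, N+1);
w(1) = 0.5;                                   % w_0
w(2) = w(1) + h*f(t(1), w(1));                % starting value w_1 from one Euler step
for i = 2:N
    wstar  = w(i) + h/2*(3*f(t(i),w(i)) - f(t(i-1),w(i-1)));                        % predictor (AB2)
    w(i+1) = w(i) + h/12*(5*f(t(i+1),wstar) + 8*f(t(i),w(i)) - f(t(i-1),w(i-1)));   % corrector (AM2)
end
w(end)                                        % approximation of y(2)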
5.6 Stability of One-Step Methods

Consider a scheme of the form

w_0 = α,
w_{i+1} = w_i + h φ(t_i, w_i, h),  i = 0, 1, ..., N-1.

The local truncation error is denoted as τ_i(h). We now define the consistency, convergence
and stability of this scheme.

Definition 5.3 (consistency) The scheme is consistent if

lim_{h→0} max_{1≤i≤N} |τ_i(h)| = 0.

Definition 5.4 (convergence) The scheme is convergent if

lim_{h→0} max_{1≤i≤N} |w_i - y(t_i)| = 0.

Definition 5.5 (stability) The scheme is stable if a small change/perturbation in α
results in only a small change in {w_i}. In this case, we also say that the method depends
continuously on the initial data.
Theorem 5.2 Consider the above 1-step scheme. Suppose that there exists h_0 > 0 such that φ(t, w, h)
is continuous and satisfies a Lipschitz condition in w with Lipschitz constant L on

D = {(t, w, h) : a ≤ t ≤ b, -∞ < w < ∞, 0 ≤ h ≤ h_0}.

Then

1. The method is stable.

2. The method is convergent iff it is consistent, which is equivalent to

   φ(t, y, 0) = f(t, y),  t ∈ [a, b].

3. If there exists a function τ such that

   |τ_i(h)| ≤ τ(h),  i = 1, 2, ..., N and 0 ≤ h ≤ h_0,

   then

   |y(t_i) - w_i| ≤ (τ(h)/L) e^{L(t_i - a)}.

The proof is omitted here.
Example. Consider the following Modified Euler's Method when f satisfies the Lipschitz
condition:

w_0 = α,
w_{i+1} = w_i + (h/2)[f(t_i, w_i) + f(t_{i+1}, w_i + h f(t_i, w_i))],  i = 0, 1, ..., N-1.

In this case

φ(t, w, h) = (1/2) f(t, w) + (1/2) f(t + h, w + h f(t, w)).

So,

φ(t, w_1, h) - φ(t, w_2, h) = (1/2)[f(t, w_1) - f(t, w_2)]
    + (1/2)[f(t + h, w_1 + h f(t, w_1)) - f(t + h, w_2 + h f(t, w_2))].

Since f satisfies the Lipschitz condition, i.e.,

|f(t, y_1) - f(t, y_2)| ≤ L |y_1 - y_2|,

we have

|φ(t, w_1, h) - φ(t, w_2, h)| ≤ (L/2)|w_1 - w_2| + (L/2)|w_1 + h f(t, w_1) - (w_2 + h f(t, w_2))|
                              ≤ L |w_1 - w_2| + (Lh/2)|f(t, w_1) - f(t, w_2)|
                              ≤ L |w_1 - w_2| + (L^2 h/2)|w_1 - w_2|
                              = L (1 + Lh/2) |w_1 - w_2|
                             =: L* |w_1 - w_2|.

Therefore, φ also satisfies a Lipschitz condition. Clearly, φ is continuous. Now,

φ(t, w, 0) = (1/2) f(t, w) + (1/2) f(t, w) = f(t, w).

So, the method is stable and convergent. We can also show that

|y(t_i) - w_i| ≤ O(h^2) e^{L*(t_i - a)} / L*.
5.7 Stability of Multi-step Methods

Consider

w_i = α_i,  i = 0, 1, ..., m-1,  α_0 = α,
w_{i+1} = a_{m-1} w_i + a_{m-2} w_{i-1} + ... + a_0 w_{i+1-m}
          + h [b_m f(t_{i+1}, w_{i+1}) + b_{m-1} f(t_i, w_i) + ... + b_0 f(t_{i+1-m}, w_{i+1-m})]

for i = m-1, m, ..., N-1. The Characteristic Polynomial associated with this scheme is
defined as

P(λ) = λ^m - a_{m-1} λ^{m-1} - a_{m-2} λ^{m-2} - ... - a_1 λ - a_0.
In the case that f(t, y) ≡ 0, we have

w_{i+1} - (a_{m-1} w_i + ... + a_0 w_{i+1-m}) = 0.

This is a difference equation. If λ_k satisfies P(λ_k) = 0, then w_n = λ_k^n is a solution for each k, because

λ_k^{i+1} - (a_{m-1} λ_k^i + ... + a_0 λ_k^{i+1-m}) = λ_k^{i+1-m} [λ_k^m - (a_{m-1} λ_k^{m-1} + ... + a_0)] = 0.

So,

w_n = Σ_{k=1}^{m} c_k λ_k^n

for a set of constants {c_k}. For a consistent method one of the roots is λ_1 = 1 (so that the
constant solution of y' = 0 is reproduced), and using w_0 = α we have

w_n = α + Σ_{k=2}^{m} c_k λ_k^n.

If everything is exact, c_k = 0 for all k = 2, 3, ..., m. But, in practice, c_k ≠ 0 because of
the machine error. So, we need

|λ_k| ≤ 1

in order that the calculation error does not grow.
Definition 5.6 (root condition) Let λ_i, i = 1, 2, ..., m, be the roots of P(λ) = 0 associ-
ated with the multi-step scheme. If |λ_i| ≤ 1 for each i = 1, 2, ..., m and all the roots with
absolute value 1 are simple roots, then the multi-step scheme is said to satisfy the root
condition.

Theorem 5.3 The multi-step method is stable iff it satisfies the root condition. Moreover,
if it is consistent with the differential equation, then the multi-step scheme is stable iff it is
convergent.

The proof is omitted here.
Example 1. The 4th order Adams-Bashforth method

w_{i+1} = w_i + h F(t_i, h, w_{i+1}, w_i, ..., w_{i-3}),

where

F = (1/24)[55 f(t_i, w_i) - 59 f(t_{i-1}, w_{i-1}) + 37 f(t_{i-2}, w_{i-2}) - 9 f(t_{i-3}, w_{i-3})].

In this case we have m = 4, a_0 = 0 = a_1 = a_2, a_3 = 1. So,

P(λ) = λ^4 - λ^3 = λ^3 (λ - 1) = 0.

This gives λ = 0, 0, 0, 1, and so it is stable.
Example 2. The explicit multi-step method with

w_{i+1} = w_{i-3} + (4h/3)[2 f(t_i, w_i) - f(t_{i-1}, w_{i-1}) + 2 f(t_{i-2}, w_{i-2})].

In this case, m = 4, a_0 = 1, a_1 = a_2 = a_3 = 0. So,

P(λ) = λ^4 - 1 = 0.

The roots are λ = ±1, ±i. The method is also stable.

Definition 5.7 A multi-step method is

1. strongly stable if the root condition is satisfied and λ = 1 is the only root satisfying
|λ| = 1,

2. weakly stable if it is stable, but not strongly stable, and

3. unstable if it does not satisfy the root condition.
Example 3. Consider the method with

w_{i+1} = (1/2)(w_i + w_{i-1}) + h F(t_i, h, w_i, w_{i-1}).

We have m = 2, a_0 = 1/2 = a_1. So,

P(λ) = λ^2 - (1/2)λ - 1/2 = 0,

or

2λ^2 - λ - 1 = 0.

Solving this gives

λ = (1 ± √9)/4 = 1 or -1/2.

The method is strongly stable.
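The root condition in these examples can also be checked numerically; the short Matlab sketch below does it for Example 3 (the use of the built-in roots function is simply a convenience here).

lambda = roots([1 -1/2 -1/2])   % roots of P(lambda) = lambda^2 - lambda/2 - 1/2, namely 1 and -0.5
abs(lambda)                     % all moduli are <= 1 and lambda = 1 is a simple root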
Chapter 6
Least Squares Approximation
6.1 Discrete case

The approximation of a given data set (x_i, y_i), i = 1, 2, ..., m, by the Lagrange polynomial
often yields large errors at points other than the data points x_i, i = 1, 2, ..., m. This is
because of the use of high order polynomials. Also, the curve obtained by Lagrange type
of interpolation does not agree with the practical situation. For example, the data set may
show a linear, quadratic or exponential trend, but the Lagrange interpolant gives a curve
with a complicated convexity property. Therefore, we need an approximation method that is
better than, or an alternative to, the Lagrange interpolation method. A common practice
is to minimize the mean square error in the approximation, as demonstrated below.

Let us consider the approximation of (x_i, y_i), i = 1, 2, ..., m, by a polynomial

P_n(x) = Σ_{j=0}^{n} a_j x^j,                               (6.6.1.1)

where m and n are positive integers. Normally we assume that n << m. Thus, we look
for {a_j}_{j=0}^{n} such that

E_n = Σ_{i=1}^{m} (y_i - P_n(x_i))^2                        (6.6.1.2)
is minimized. Substituting (6.6.1.1) into (6.6.1.2) gives

E_n = Σ_{i=1}^{m} (y_i - Σ_{j=0}^{n} a_j x_i^j)^2
    = Σ_{i=1}^{m} [y_i^2 - 2 y_i Σ_{j=0}^{n} a_j x_i^j + (Σ_{j=0}^{n} a_j x_i^j)^2]
    = Σ_{i=1}^{m} y_i^2 - 2 Σ_{j=0}^{n} a_j (Σ_{i=1}^{m} y_i x_i^j) + Σ_{i=1}^{m} (Σ_{j=0}^{n} a_j x_i^j)^2
    = Σ_{i=1}^{m} y_i^2 - 2 Σ_{j=0}^{n} a_j (Σ_{i=1}^{m} y_i x_i^j) + Σ_{j,k=0}^{n} a_j a_k (Σ_{i=1}^{m} x_i^{j+k}).

Differentiating E_n with respect to a_j and setting the derivative to zero we have

∂E_n/∂a_j = -2 Σ_{i=1}^{m} y_i x_i^j + 2 Σ_{k=0}^{n} a_k (Σ_{i=1}^{m} x_i^{j+k}) = 0,

or

Σ_{k=0}^{n} a_k (Σ_{i=1}^{m} x_i^{j+k}) = Σ_{i=1}^{m} y_i x_i^j

for j = 0, 1, ..., n. In the matrix form we have
[ Σ x_i^0    Σ x_i^1      ...   Σ x_i^n     ] [ a_0 ]     [ Σ y_i x_i^0 ]
[ Σ x_i^1    Σ x_i^2      ...   Σ x_i^{n+1} ] [ a_1 ]  =  [ Σ y_i x_i^1 ]
[   ...        ...        ...     ...       ] [ ... ]     [     ...     ]
[ Σ x_i^n    Σ x_i^{n+1}  ...   Σ x_i^{2n}  ] [ a_n ]     [ Σ y_i x_i^n ]

where every sum is taken over i = 1, ..., m.
When n = 1, i.e., P_1(x) = a_0 + a_1 x, we have

[ m                 Σ_{i=1}^{m} x_i   ] [ a_0 ]     [ Σ_{i=1}^{m} y_i     ]
[ Σ_{i=1}^{m} x_i   Σ_{i=1}^{m} x_i^2 ] [ a_1 ]  =  [ Σ_{i=1}^{m} y_i x_i ].
Solving this 2 × 2 system exactly gives

a_0 = [ (Σ_{i=1}^{m} x_i^2)(Σ_{i=1}^{m} y_i) - (Σ_{i=1}^{m} y_i x_i)(Σ_{i=1}^{m} x_i) ] / [ m (Σ_{i=1}^{m} x_i^2) - (Σ_{i=1}^{m} x_i)^2 ],

a_1 = [ m (Σ_{i=1}^{m} y_i x_i) - (Σ_{i=1}^{m} x_i)(Σ_{i=1}^{m} y_i) ] / [ m (Σ_{i=1}^{m} x_i^2) - (Σ_{i=1}^{m} x_i)^2 ].
Example. Find the linear fit to the data set given in the table

x   1    2    3    4    5    6    7    8    9    10
y   1.3  3.5  4.2  5.0  7.0  8.8  10.1 12.5 13   15.6

This can be solved by the following Matlab code:
x=[1;2;3;4;5;6;7;8;9;10];
y=[1.3;3.5;4.2;5.0;7.0;8.8;10.1;12.5;13;15.6];
m=10;
X=sum(x);
Y=sum(y);
XX=dot(x,x);
XY=dot(x,y);
den=m*XX-X^2;
a0=(XX*Y-XY*X)/den;
a1=(m*XY-X*Y)/den;
Running this code gives a_0 = -0.36 and a_1 = 1.538.
6.2 Fitting by exponential functions

Consider fitting (x_i, y_i), i = 1, 2, ..., m, by y = f(x) = b exp(ax), where a and b are to be
determined. Taking ln yields

ln y = ln b + ax,  or  Y = B + ax.

Therefore, the problem becomes how to fit the data set (x_i, ln y_i), i = 1, 2, ..., m, by the
above linear fit. The condition required is that y_i > 0 for all i = 1, 2, ..., m.
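A small Matlab sketch of this log-transform fit follows; the data values are made up purely for illustration, and the coefficient formulas are the ones derived in Section 6.1 applied to (x_i, ln y_i).

x = (1:6)';
y = [2.1; 2.9; 4.3; 6.1; 8.9; 13.0];           % illustrative data, all positive
Y = log(y);                                    % Y_i = ln y_i
m = length(x);
den = m*dot(x,x) - sum(x)^2;
B = (dot(x,x)*sum(Y) - dot(x,Y)*sum(x))/den;   % B = ln b
a = (m*dot(x,Y) - sum(x)*sum(Y))/den;
b = exp(B);
fprintf('a = %.4f,  b = %.4f\n', a, b);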
6.3 Orthogonal polynomials & least-squares approximation

Suppose f ∈ C[a, b]. We look for P_n(x) of the form (6.6.1.1) so that

E = ∫_a^b [f(x) - P_n(x)]^2 dx

is minimized. Substituting (6.6.1.1) into the above expression we have

E = ∫_a^b [f - Σ_{k=0}^{n} a_k x^k]^2 dx
  = ∫_a^b f^2(x) dx - 2 ∫_a^b f(x) Σ_{k=0}^{n} a_k x^k dx + ∫_a^b (Σ_{k=0}^{n} a_k x^k)^2 dx
  = ∫_a^b f^2(x) dx - 2 ∫_a^b f(x) Σ_{k=0}^{n} a_k x^k dx + ∫_a^b Σ_{j,k=0}^{n} a_j a_k x^{j+k} dx.
Differentiating with respect to a_j we have

∂E/∂a_j = -2 ∫_a^b x^j f(x) dx + 2 Σ_{k=0}^{n} a_k ∫_a^b x^{j+k} dx = 0

for all j = 0, 1, ..., n. Therefore, we have the following (n+1) × (n+1) linear system deter-
mining a_j, j = 0, 1, ..., n:

Σ_{k=0}^{n} a_k ∫_a^b x^{j+k} dx = ∫_a^b x^j f(x) dx

for all j = 0, 1, ..., n. In matrix form this is

[ ∫_a^b dx       ∫_a^b x dx         ...   ∫_a^b x^n dx    ] [ a_0 ]     [ ∫_a^b f dx     ]
[    ...             ...            ...        ...        ] [ ... ]  =  [      ...       ]
[ ∫_a^b x^n dx   ∫_a^b x^{n+1} dx   ...   ∫_a^b x^{2n} dx ] [ a_n ]     [ ∫_a^b x^n f dx ].

The coefficient matrix is called a Hilbert matrix.
We may also use the weighted inner product to determine P_n(x). More generally, we
may find

φ(x) = Σ_{k=0}^{n} a_k φ_k(x),                              (6.6.3.3)

where {φ_0(x), φ_1(x), ..., φ_n(x)} is a set of linearly independent functions, so that the quadratic
function

E = ∫_a^b w(x)[f(x) - φ(x)]^2 dx                            (6.6.3.4)

is minimized. In (6.6.3.4), w(x) > 0 is a weighting function. Substituting (6.6.3.3) into
(6.6.3.4), differentiating E with respect to a_j and setting the result to zero, we have

Σ_{k=0}^{n} a_k ∫_a^b w(x) φ_k(x) φ_j(x) dx = ∫_a^b w(x) f(x) φ_j(x) dx      (6.6.3.5)

for j = 0, 1, ..., n. If we choose {φ_k}_{k=0}^{n} such that

∫_a^b w(x) φ_k(x) φ_j(x) dx = 0 for j ≠ k,  and  = α_j > 0 for j = k,        (6.6.3.6)

then (6.6.3.5) has the solution

a_j = (1/α_j) ∫_a^b w(x) f(x) φ_j(x) dx

for j = 0, 1, ..., n. Functions satisfying (6.6.3.6) are called orthogonal functions.
Example 1. The set {sin(nx), cos(nx)}, n = 0, 1, ..., contains orthogonal functions
on (-π, π).

Example 2. The Legendre polynomials {P_n(x)}, where P_0 = 1, P_1 = x, P_2 = x^2 - 1/3,
P_3 = x^3 - (3/5)x, P_4 = x^4 - (6/7)x^2 + 3/35, P_5 = x^5 - (10/9)x^3 + (5/21)x, ..., are a set of orthogonal
polynomials on [-1, 1].
Chapter 7

Solution of Nonlinear Systems of Equations
In this chapter we consider the solution of nonlinear algebraic systems of the form

f(x) = 0,                                                   (7.7.0.1)

where f = (f_1, f_2, ..., f_n)^T : R^n → R^n and x = (x_1, ..., x_n)^T ∈ R^n.
7.1 Fixed point iterations

Theorem 7.1 (fixed point) Let D ⊂ R^n be a closed region and G be a mapping from
D to itself satisfying the Lipschitz condition

||G(x) - G(y)|| ≤ γ ||x - y||,  x, y ∈ D,

where the Lipschitz constant γ < 1. Then, there exists a unique fixed point p ∈ D of G, and the
fixed point iteration

p_{n+1} = G(p_n),  n = 0, 1, ...

with p_0 ∈ D converges to p.
PROOF. Let p_0 ∈ D. Then {p_n} ⊂ D, because G maps D to D. The difference
between two consecutive iterates is

||p_{n+1} - p_n|| ≤ ||G(p_n) - G(p_{n-1})|| ≤ γ ||p_n - p_{n-1}|| ≤ ... ≤ γ^n ||p_1 - p_0||.

Therefore,

||p_n - p_0|| = || Σ_{i=0}^{n-1} (p_{i+1} - p_i) || ≤ Σ_{i=0}^{n-1} ||p_{i+1} - p_i||
             ≤ ||p_1 - p_0|| Σ_{i=0}^{n-1} γ^i ≤ ||p_1 - p_0|| / (1 - γ)

for all n = 1, 2, .... Now, for any integers n, k > 0,

||p_{n+k} - p_n|| = ||G(p_{n+k-1}) - G(p_{n-1})|| ≤ γ ||p_{n+k-1} - p_{n-1}|| ≤ ... ≤ γ^n ||p_k - p_0||
                 ≤ γ^n ||p_1 - p_0|| / (1 - γ).

Taking the limit we have

lim_{n,k→∞} ||p_{n+k} - p_n|| = 0

since 0 < γ < 1. So, {p_n} is a Cauchy sequence in D, and it has a limit in D since D is
closed.

Suppose {p_n} has two limits, say p and q, both in D. Then,

||p - q|| = ||G(p) - G(q)|| ≤ γ ||p - q||

with γ < 1. This implies that ||p - q|| = 0.   □
Corollary 7.1 Let G = (g_1, ..., g_n)^T. If G satisfies

|∂g_i(x)/∂x_j| ≤ γ/n,  x ∈ D,

for all i, j = 1, 2, ..., n, with γ < 1, then

p_{n+1} = G(p_n),  n = 0, 1, ...

with p_0 ∈ D converges to the unique fixed point of G in D.
PROOF. (Sketch only) For any x, y ∈ D, using a Taylor's expansion,

G(x) - G(y) = ( Σ_{j=1}^{n} ∂g_1(ξ)/∂x_j (x_j - y_j), ..., Σ_{j=1}^{n} ∂g_n(ξ)/∂x_j (x_j - y_j) )^T

            = [ ∂g_1/∂x_1  ...  ∂g_1/∂x_n ] [ x_1 - y_1 ]
              [    ...     ...     ...    ] [    ...    ]  = J(G)(x - y),
              [ ∂g_n/∂x_1  ...  ∂g_n/∂x_n ] [ x_n - y_n ]

where J(G) is called the Jacobian (Jacobi matrix) of G. Taking the norm,

||G(x) - G(y)|| ≤ max_{1≤j≤n} Σ_{i=1}^{n} |∂g_i/∂x_j| · ||x - y|| ≤ Σ_{i=1}^{n} (γ/n) ||x - y|| = γ ||x - y||.

Therefore, the Lipschitz condition is satisfied by G.   □
Example. Consider the problem

3x_1 - cos(x_2 x_3) - 1/2 = 0,
x_1^2 - 81(x_2 + 0.1)^2 + sin x_3 + 1.06 = 0,
e^{-x_1 x_2} + 20 x_3 + (10π - 3)/3 = 0.

We rewrite these into

x_1 = (1/3) cos(x_2 x_3) + 1/6 =: g_1,
x_2 = (1/9) √(x_1^2 + sin x_3 + 1.06) - 0.1 =: g_2,
x_3 = -(1/20) (e^{-x_1 x_2} + (10π - 3)/3) =: g_3.

It is easy to check that the conditions in the Corollary are satisfied. We check the first
few below. When x_1, x_2, x_3 ∈ [-1, 1], we have

|∂g_1/∂x_1| = 0,
|∂g_1/∂x_2| = (1/3) |sin(x_2 x_3)| |x_3| < (1/3) sin 1,
|∂g_1/∂x_3| = (1/3) |sin(x_2 x_3)| |x_2| < (1/3) sin 1,
|∂g_2/∂x_1| = (1/9) |x_1| / √(x_1^2 + sin x_3 + 1.06) ≤ 1/(9 √(1.06 - sin 1)) < 0.238.

Therefore, the fixed point iteration converges to the unique fixed point of G in [-1, 1]^3.
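The iteration itself is only a few lines of Matlab. In the sketch below the initial guess and the fixed number of 20 iterations are illustrative choices; in practice one would stop when ||p_{n+1} - p_n|| is below a tolerance.

G = @(x) [ cos(x(2)*x(3))/3 + 1/6;
           sqrt(x(1)^2 + sin(x(3)) + 1.06)/9 - 0.1;
          -(exp(-x(1)*x(2)) + (10*pi - 3)/3)/20 ];
p = [0.1; 0.1; -0.1];           % illustrative initial guess in [-1,1]^3
for k = 1:20
    p = G(p);                   % p_{n+1} = G(p_n)
end
p                               % converges to approximately (0.5, 0, -0.5236)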
7.2 Newton's method

Assume that x* is a solution to (7.7.0.1), i.e., f(x*) = 0. Then, when x is sufficiently close
to x*, a Taylor's expansion gives

f(x*) = f(x) + J(x)(x* - x) + O(||x - x*||^2),

where J(x) is the Jacobian of f defined by

J(x) = [ ∂f_1/∂x_1  ∂f_1/∂x_2  ...  ∂f_1/∂x_n ]
       [    ...        ...     ...     ...    ]
       [ ∂f_n/∂x_1  ∂f_n/∂x_2  ...  ∂f_n/∂x_n ].

Since f(x*) = 0, we have, omitting the second order term,

f(x) ≈ -J(x)(x* - x).

Therefore, we have

x* ≈ x - J^{-1}(x) f(x).

This motivates us to design the following algorithm:

Given x_0, for k = 0, 1, ... until convergence, do

x_{k+1} = x_k - J^{-1}(x_k) f(x_k).

This is Newton's method.
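A minimal Matlab sketch of this iteration on a small illustrative system is given below; the system, the Jacobian and the initial guess are assumptions, not taken from the notes. Rather than forming J^{-1} explicitly, the linear system J(x_k) d = f(x_k) is solved at each step, which is the usual implementation.

% Illustrative system: f_1 = x_1^2 + x_2^2 - 4, f_2 = x_1 x_2 - 1.
f = @(x) [x(1)^2 + x(2)^2 - 4;  x(1)*x(2) - 1];
J = @(x) [2*x(1), 2*x(2);  x(2), x(1)];
x = [2; 0.5];                   % initial guess
for k = 1:10
    x = x - J(x)\f(x);          % Newton step: solve J(x_k) d = f(x_k), then x_{k+1} = x_k - d
end
x, f(x)                         % the residual f(x) is now essentially zero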
7.3 Quasi-Newton methods

The idea is to approximate J by a finite difference. We have shown that in one dimension,
one choice is

f'(x_1) ≈ (f(x_1) - f(x_0))/(x_1 - x_0),  or  f'(x_1)(x_1 - x_0) ≈ f(x_1) - f(x_0).

In multi-dimensions, we have

J(x_1)(x_1 - x_0) ≈ f(x_1) - f(x_0).

Now, assume that x_0 is given and x_1 is computed by the Newton's method. We look for
an approximation A_1 to J(x_1) in the following way. Note

A_1 (x_1 - x_0) = f(x_1) - f(x_0).

This shows that A_1 is a mapping defined for vectors parallel to x_1 - x_0. Note that A_1 is
arbitrary for vectors perpendicular to x_1 - x_0. We define

A_1 z = J(x_0) z,  z ⊥ (x_1 - x_0).
The above two equalities define A_1 uniquely as given below:

A_1 = J(x_0) + [f(x_1) - f(x_0) - J(x_0)(x_1 - x_0)] (x_1 - x_0)^T / ||x_1 - x_0||^2.

Let us check this by right-multiplying the above by (x_1 - x_0):

A_1 (x_1 - x_0) = J(x_0)(x_1 - x_0) + [f(x_1) - f(x_0) - J(x_0)(x_1 - x_0)] ||x_1 - x_0||^2 / ||x_1 - x_0||^2
                = f(x_1) - f(x_0),

and

A_1 z = J(x_0) z,  z ⊥ (x_1 - x_0).

Therefore, we choose

x_2 = x_1 - A_1^{-1} f(x_1).

In general we have

A_i = A_{i-1} + (y_i - A_{i-1} s_i) s_i^T / ||s_i||^2,
x_{i+1} = x_i - A_i^{-1} f(x_i),

where y_i = f(x_i) - f(x_{i-1}) and s_i = x_i - x_{i-1}. This is called Broyden's method.
7.4 The steepest descent method

Consider the following minimization problem:

min g(x_1, x_2, ..., x_n) = Σ_{i=1}^{n} f_i^2(x_1, x_2, ..., x_n).

Obviously, if f(x) has a root, then g_min = 0 and it is attained where f_i(x) = 0, i = 1, 2, ..., n.

Starting from an initial guess x^0 = (x_1^0, ..., x_n^0)^T, we find α_k ∈ R such that α_k minimizes

h(α) = g(x^k - α ∇g(x^k)).

(Note that this is a one-dimensional problem.) We then let

x^{k+1} = x^k - α_k ∇g(x^k),  k = 0, 1, ...

This is the steepest descent method.