Num Math
1. Numerical analysis
Numerical analysis is the branch of mathematics that studies and develops algorithms which use numerical approximation to solve problems of mathematical analysis (continuous mathematics). Numerical techniques are widely used by scientists and engineers to solve their problems. A major advantage of numerical techniques is that a numerical answer can be obtained even when a problem has no analytical solution. However, a result from numerical analysis is in general an approximation, which can be made as accurate as desired; for example, they can be used to find the approximate value of √2, etc.
In this chapter, we introduce and discuss some basic concepts of scientific computing. We begin with a discussion of floating-point representation, and then we discuss the most fundamental source of imperfection in numerical computing, namely roundoff errors. We also discuss sources of error and the stability of numerical algorithms.
2. Numerical analysis and the art of scientific computing
Scientific computing is a discipline concerned with the development and study of numerical algorithms for solving mathematical problems that arise in various disciplines in science and engineering.
Typically, the starting point is a given mathematical model which has been formulated in an attempt
to explain and understand an observed phenomenon in biology, chemistry, physics, economics, or any
engineering or scientific discipline. We will concentrate on those mathematical models which are continuous (or piecewise continuous) and are difficult or impossible to solve analytically: this is usually
the case in practice. Relevant application areas within computer science include graphics, vision and
motion analysis, image and signal processing, search engines and data mining, machine learning, hybrid
and embedded systems, and many more. In order to solve such a model approximately on a computer,
the (continuous, or piecewise continuous) problem is approximated by a discrete one. Continuous functions are approximated by finite arrays of values. Algorithms are then sought which approximately
solve the mathematical problem efficiently, accurately and reliably.
3. Floating-point representation of numbers
Any real number is represented by an infinite sequence of digits. For example,
8/3 = 2.66666... = (2/10^1 + 6/10^2 + 6/10^3 + ...) × 10^1.
This is an infinite series, but computers use a finite amount of memory to represent numbers. Thus only a finite number of digits may be used to represent any number, no matter what representation method is used.
For example, we can chop the infinite decimal representation of 8/3 after 4 digits:
8/3 ≈ (2/10^1 + 6/10^2 + 6/10^3 + 6/10^4) × 10^1 = 0.2666 × 10^1.
Generalizing this, we say that the number has n decimal digits, and we call n the precision.
With each real number x we associate a floating-point representation, denoted fl(x), given by
fl(x) = (0.a_1 a_2 ... a_n)_β × β^e,
where the base-β fraction (0.a_1 a_2 ... a_n)_β is called the mantissa, with all a_i integers, and e is known as the exponent. This is called the base-β floating-point representation of x. The representation is normalized when a_1 ≠ 0.
For example,
42.965 = 4 × 10^1 + 2 × 10^0 + 9 × 10^{-1} + 6 × 10^{-2} + 5 × 10^{-3} = 0.42965 × 10^2.
(1) Chopping: We ignore the digits after a_n and write the number as
fl(x) = (0.a_1 a_2 ... a_n)_β × β^e.
(2) Rounding: Rounding is defined as follows:
fl(x) = (0.a_1 a_2 ... a_n)_β × β^e,  if 0 ≤ a_{n+1} < β/2  (rounding down),
fl(x) = [(0.a_1 a_2 ... a_n)_β + (0.00...01)_β] × β^e,  if β/2 ≤ a_{n+1} < β  (rounding up).
Example 1. For 6/7 = 0.857142... with n = 2 digits,
fl(6/7) = 0.86 × 10^0 (rounding),
fl(6/7) = 0.85 × 10^0 (chopping).
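These chopping and rounding rules can be sketched in code. The helper below is a hypothetical illustration (not from the notes) that uses Python's decimal module to keep n decimal digits of a normalized mantissa:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, n, mode="round"):
    """n-digit decimal floating-point representation (0.a1...an) x 10^e."""
    d = Decimal(str(x)).normalize()
    sign, digits, exp = d.as_tuple()
    e = exp + len(digits)              # exponent with mantissa in [0.1, 1)
    m = d.scaleb(-e)                   # the normalized mantissa
    q = Decimal(1).scaleb(-n)          # quantum 10^-n: keep n digits
    rounding = ROUND_DOWN if mode == "chop" else ROUND_HALF_UP
    return float(m.quantize(q, rounding=rounding).scaleb(e))

print(fl(8/3, 4, "chop"))    # 2.666  (i.e. 0.2666 x 10^1)
print(fl(6/7, 2, "round"))   # 0.86
print(fl(6/7, 2, "chop"))    # 0.85
```

Here ROUND_DOWN truncates the mantissa (chopping) while ROUND_HALF_UP reproduces symmetric rounding.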
Rules for rounding off numbers:
(1) If the digit to be dropped is greater than 5, the last retained digit is increased by one. For example,
12.6 is rounded to 13.
(2) If the digit to be dropped is less than 5, the last remaining digit is left as it is. For example,
12.4 is rounded to 12.
(3) If the digit to be dropped is 5, and if any digit following it is not zero, the last remaining digit is
increased by one. For example,
12.51 is rounded to 13.
(4) If the digit to be dropped is 5 and is followed only by zeros, the last remaining digit is increased
by one if it is odd, but left as it is if even. For example,
11.5 is rounded to 12, and 12.5 is rounded to 12.
Definition 3.2 (Absolute and relative error). If fl(x) is the approximation to the exact value x, then the absolute error is |x − fl(x)|, and the relative error is |x − fl(x)| / |x|.
Remark: As a measure of accuracy, the absolute error may be misleading and the relative error is more
meaningful.
Definition 3.3 (Overflow and underflow). An overflow occurs when a number is too large to fit into the floating-point system in use, i.e. e > M. An underflow occurs when a number is too small, i.e. e < m. When overflow occurs in the course of a calculation, it is generally fatal, but underflow is non-fatal: the system usually sets the number to 0 and continues. (Matlab does this, quietly.)
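Both effects are easy to observe with IEEE double-precision numbers, which Python floats use (a sketch, not part of the original notes):

```python
import math

# Overflow: result exceeds the largest representable double (~1.8e308)
x = 1e308
print(x * 10)              # inf: float arithmetic overflows to infinity

try:
    math.exp(1000)         # math-library functions raise instead
except OverflowError:
    print("math.exp(1000) overflows")

# Underflow: results below the smallest subnormal are quietly set to 0
print(1e-200 * 1e-200)     # 0.0
```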
4. Error bounds for chopping and rounding
Write
x = (0.a_1 a_2 ... a_n a_{n+1} ...)_β × β^e = ( Σ_{i=1}^∞ a_i β^{-i} ) β^e,  a_1 ≠ 0.

Case 1 (chopping). Here
fl(x) = (0.a_1 a_2 ... a_n)_β × β^e = ( Σ_{i=1}^n a_i β^{-i} ) β^e.
Therefore
|x − fl(x)| = ( Σ_{i=n+1}^∞ a_i β^{-i} ) β^e,
i.e.
β^{-e} |x − fl(x)| = Σ_{i=n+1}^∞ a_i β^{-i} ≤ (β − 1) Σ_{i=n+1}^∞ β^{-i} = (β − 1) β^{-(n+1)} / (1 − β^{-1}) = β^{-n}.
Now
|x| = (0.a_1 a_2 ...)_β × β^e ≥ β^{-1} β^e,
since a_1 ≥ 1. Therefore
|x − fl(x)| ≤ β^{-n} β^e  and  |x − fl(x)| / |x| ≤ β^{-n} β^e / (β^{-1} β^e) = β^{1-n}.

Case 2 (rounding). Here
fl(x) = (0.a_1 ... a_n)_β × β^e = ( Σ_{i=1}^n a_i β^{-i} ) β^e,  if a_{n+1} < β/2,
fl(x) = (0.a_1 ... a_{n-1} [a_n + 1])_β × β^e = ( β^{-n} + Σ_{i=1}^n a_i β^{-i} ) β^e,  if a_{n+1} ≥ β/2.
For a_{n+1} < β/2,
β^{-e} |x − fl(x)| = Σ_{i=n+1}^∞ a_i β^{-i} = a_{n+1} β^{-(n+1)} + Σ_{i=n+2}^∞ a_i β^{-i}
≤ (β/2 − 1) β^{-(n+1)} + (β − 1) Σ_{i=n+2}^∞ β^{-i}
= (β/2 − 1) β^{-(n+1)} + β^{-(n+1)} = (1/2) β^{-n}.
For a_{n+1} ≥ β/2,
β^{-e} |x − fl(x)| = | β^{-n} − Σ_{i=n+1}^∞ a_i β^{-i} | = β^{-n} − a_{n+1} β^{-(n+1)} − Σ_{i=n+2}^∞ a_i β^{-i} ≤ β^{-n} − (β/2) β^{-(n+1)} = (1/2) β^{-n}.
In both cases
|x − fl(x)| ≤ (1/2) β^{-n} β^e.
Now |x| ≥ β^{-1} β^e, therefore
|x − fl(x)| / |x| ≤ (1/2) β^{-n} β^e / (β^{-1} β^e) = (1/2) β^{1-n}.
5. Significant Figures
All measurements are approximations. No measuring device can give perfect measurements without experimental uncertainty. By convention, a mass measured to 13.2 g is said to have an absolute uncertainty of plus or minus 0.1 g and is said to have been measured to the nearest 0.1 g. In other words, we are somewhat uncertain about that last digit: it could be a 2; then again, it could be a 1 or a 3. A mass of 13.20 g indicates an absolute uncertainty of plus or minus 0.01 g.
The number of significant figures in a result is simply the number of figures that are known with some
degree of reliability.
The number 25.4 is said to have 3 significant figures; the number 25.40 is said to have 4 significant figures.
Rules for deciding the number of significant figures in a measured quantity:
(1) All nonzero digits are significant:
1.234 has 4 significant figures, 1.2 has 2 significant figures.
(2) Zeros between nonzero digits are significant: 1002 has 4 significant figures.
(3) Leading zeros to the left of the first nonzero digits are not significant; such zeros merely indicate
the position of the decimal point: 0.001 has only 1 significant figure.
(4) Trailing zeros that are also to the right of a decimal point in a number are significant: 0.0230 has
3 significant figures.
(5) When a number ends in zeros that are not to the right of a decimal point, the zeros are not necessarily significant: 190 may be 2 or 3 significant figures, 50600 may be 3, 4, or 5 significant figures.
The potential ambiguity in the last rule can be avoided by the use of standard exponential, or scientific, notation. For example, depending on whether the number of significant figures is 3, 4, or 5, we
would write 50600 calories as:
0.506 × 10^5 (3 significant figures),
0.5060 × 10^5 (4 significant figures), or
0.50600 × 10^5 (5 significant figures).
What is an exact number? Some numbers are exact because they are known with complete certainty.
Most exact numbers are integers: exactly 12 inches are in a foot, there might be exactly 23 students in
a class. Exact numbers are often found as conversion factors or as counts of objects. Exact numbers
can be considered to have an infinite number of significant figures. Thus, the number of apparent
significant figures in any exact number can be ignored as a limiting factor in determining the number
of significant figures in the result of a calculation.
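The counting rules above can be sketched as a small function (an illustration, not part of the notes); the treatment of trailing zeros without a decimal point, the ambiguous case of rule 5, is a choice we must make explicit:

```python
def sig_figs(s: str) -> int:
    """Count significant figures of a decimal numeral given as a string.

    Trailing zeros in a number without a decimal point are treated as
    NOT significant (the ambiguous case of rule 5).
    """
    s = s.lstrip("+-")
    digits = s.replace(".", "")
    digits = digits.lstrip("0")          # leading zeros never count (rule 3)
    if "." not in s:
        digits = digits.rstrip("0")      # ambiguous trailing zeros dropped
    return len(digits)

print(sig_figs("1.234"))   # 4
print(sig_figs("1002"))    # 4
print(sig_figs("0.001"))   # 1
print(sig_figs("0.0230"))  # 3
print(sig_figs("190"))     # 2 (ambiguous zeros not counted)
```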
Error in a product. Let X = x_1 x_2 ··· x_n and let δx_i denote the error in x_i. Then
(1/X) ∂X/∂x_n = (x_1 x_2 ... x_{n-1}) / (x_1 x_2 x_3 ... x_n) = 1/x_n,
and similarly for the other variables. Therefore
δX/X = δx_1/x_1 + δx_2/x_2 + ··· + δx_n/x_n.
For example, with X = x_1 x_2 we have δX = x_2 δx_1 + x_1 δx_2, so that
δX/X = δx_1/x_1 + δx_2/x_2.
The relative error is therefore bounded by
E_r = |δX/X| ≤ |δx_1/x_1| + |δx_2/x_2|,
and the corresponding absolute error is E_a = δX = E_r · X.
For example, suppose r = 200 is measured with absolute error δr = 0.5. Then the percentage error in r is
(δr/r) × 100 = (0.5/200) × 100 = 0.25.
For the area A = πr² we have δA/A = 2 δr/r, so the percentage error in A is 2 × 0.25 = 0.5.
Example 7. Find the relative error in the calculation of 7.342/0.241, where the numbers 7.342 and 0.241 are correct to three decimal places. Determine the smallest interval in which the true result lies.
Sol. Let x_1/x_2 = 7.342/0.241 = 30.4647.
Here the errors are δx_1 = δx_2 = (1/2) × 10^{-3} = 0.0005.
Therefore the relative error is
E_r ≤ 0.0005/7.342 + 0.0005/0.241 ≈ 0.0021.
The absolute error is
E_a ≈ 0.0021 × 30.4647 ≈ 0.0639.
Hence the true value of 7.342/0.241 lies between 30.4647 − 0.0639 = 30.4008 and 30.4647 + 0.0639 = 30.5286.
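A quick numerical check of this worked example (an illustration, not part of the original notes):

```python
x1, x2 = 7.342, 0.241
dx = 0.5e-3                  # both values correct to three decimal places

q = x1 / x2                  # ~30.4647
rel = dx / x1 + dx / x2      # relative-error bound for a quotient
abs_err = rel * q            # absolute-error bound

print(round(rel, 4))         # ~0.0021
print(round(abs_err, 4))     # ~0.0653 (the text's 0.0639 comes from
                             # multiplying by the already-rounded 0.0021)

# Exact worst-case interval, for comparison with the linearized bound
lo = (x1 - dx) / (x2 + dx)
hi = (x1 + dx) / (x2 - dx)
print(round(lo, 4), round(hi, 4))
```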
7. Loss of significance, stability and conditioning
Roundoff errors are inevitable and difficult to control. Other types of errors which occur in computation may be under our control. The subject of numerical analysis is largely preoccupied with
understanding and controlling errors of various kinds. Here we examine some of them.
7.1. Loss of significance. One of the most common error-producing calculations involves the cancellation of significant digits due to the subtraction of nearly equal numbers (or the addition of one very large number and one very small number). The phenomenon can be illustrated with the following example.
Example 8. If x = 0.3721478693 and y = 0.3720230572, what is the relative error in the computation of x − y using five decimal digits of accuracy?
Sol. We can compute x − y with ten decimal digits of accuracy and take it as exact:
x − y = 0.0001248121.
Both x and y will be rounded to five digits before subtraction. Thus
fl(x) = 0.37215,  fl(y) = 0.37202,
fl(x) − fl(y) = 0.13000 × 10^{-3}.
The relative error therefore is
|(x − y) − (fl(x) − fl(y))| / |x − y| ≈ 0.04 = 4%.
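This cancellation is easy to reproduce; the sketch below simulates the five-digit rounding with Python's round, which is close enough for illustration:

```python
x = 0.3721478693
y = 0.3720230572

exact = x - y                    # 0.0001248121 (ten-digit arithmetic)

# Simulate five-significant-digit arithmetic: round the operands first
fx = round(x, 5)                 # 0.37215
fy = round(y, 5)                 # 0.37202
approx = fx - fy                 # 0.00013

rel_err = abs(exact - approx) / abs(exact)
print(rel_err)                   # ~0.0416, i.e. about 4%
```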
Example 9. Consider the stability of √(x+1) − 1 when x is near 0. Rewrite the expression to rid it of subtractive cancellation.
Sol. Suppose that x = 1.2345678 × 10^{-5}. Then √(x+1) ≈ 1.000006173. If our computer (or calculator) can only keep 8 significant digits, this will be rounded to 1.0000062. When 1 is subtracted, the result is 6.2 × 10^{-6}.
Thus 6 significant digits have been lost from the original. To fix this, we rationalize the expression:
√(x+1) − 1 = (√(x+1) − 1) · (√(x+1) + 1)/(√(x+1) + 1) = x / (√(x+1) + 1).
This expression has no subtraction, and so is not subject to subtractive cancellation. When x = 1.2345678 × 10^{-5}, it evaluates approximately as
1.2345678 × 10^{-5} / 2.0000062 = 6.17281995 × 10^{-6}.
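A sketch comparing the two forms (assuming IEEE doubles rather than the 8-digit machine of the example, so the effect is milder but still present):

```python
import math

def naive(x):
    return math.sqrt(x + 1) - 1          # suffers cancellation near x = 0

def stable(x):
    return x / (math.sqrt(x + 1) + 1)    # rationalized, no subtraction

x = 1.2345678e-5
print(naive(x))    # last digits contaminated by the cancellation
print(stable(x))   # ~6.17281995e-06, accurate to full precision
```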
Example 10. Solve the quadratic equation x² − 1000x + 25 = 0 using 4-digit floating-point arithmetic.
Sol. Here
√(b² − 4ac) = √(10^6 − 10^2) = 0.1000e4,
so the computed roots are
x_1 = (0.1000e4 + 0.1000e4)/2 = 0.1000e4,
x_2 = (0.1000e4 − 0.1000e4)/2 = 0.0000e4.
One of the roots becomes zero due to the limited precision allowed in the computation. In this equation b² is much larger than 4ac, hence b and √(b² − 4ac) become two equal numbers, and the calculation of x_2 involves the subtraction of two nearly equal numbers, which causes serious loss of significant figures.
To obtain a more accurate 4-digit rounding approximation for x_2, we change the formulation by rationalizing the numerator; alternatively, since in the quadratic equation ax² + bx + c = 0 the product of the roots is c/a, the smaller root may be obtained by dividing c/a by the largest root. Therefore the first root is 0.1000e4 and the second root is
x_2 = 0.2500e2 / 0.1000e4 = 0.2500e−1.
Example 11. The quadratic formula is used for computing the roots of the equation ax² + bx + c = 0, a ≠ 0, and the roots are given by
x = (−b ± √(b² − 4ac)) / (2a).
Consider the equation x² + 62.10x + 1 = 0 and discuss the numerical results.
Sol. Using the quadratic formula and 8-digit rounding arithmetic, we obtain the two roots
x_1 = −0.01610723,  x_2 = −62.08390.
We use these values as exact values. Now we perform the calculations with 4-digit rounding arithmetic. We have
√(b² − 4ac) = √(62.10² − 4.000) = √(3856 − 4.000) = 62.06
and
fl(x_1) = (−62.10 + 62.06)/2.000 = −0.02000.
The relative error in computing x_1 is
|fl(x_1) − x_1| / |x_1| = |−0.02000 + 0.01610723| / |−0.01610723| = 0.2417.
In calculating x_2,
fl(x_2) = (−62.10 − 62.06)/2.000 = −62.10.
The relative error in computing x_2 is
|fl(x_2) − x_2| / |x_2| = |−62.10 + 62.08390| / |−62.08390| = 0.259 × 10^{-3}.
In this equation b² = 62.10² is much larger than 4ac = 4, hence b and √(b² − 4ac) become two nearly equal numbers. The calculation of x_1 involves the subtraction of two nearly equal numbers, but x_2 involves the addition of two nearly equal numbers, which does not cause serious loss of significant figures.
To obtain a more accurate 4-digit rounding approximation for x_1, we change the formulation by rationalizing the numerator, that is,
x_1 = 2c / (−b − √(b² − 4ac)).
Then
fl(x_1) = 2.000/(−62.10 − 62.06) = −2.000/124.2 = −0.01610.
The relative error in computing x_1 is now reduced to 0.62 × 10^{-3}. However, if we rationalize the numerator in x_2 to get
x_2 = 2c / (−b + √(b² − 4ac)),
the use of this formula not only involves the subtraction of two nearly equal numbers but also division by a small number. This degrades the accuracy:
fl(x_2) = 2.000/(−62.10 + 62.06) = −2.000/0.04000 = −50.00.
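The fix generalizes to a hypothetical helper that always adds quantities of like sign and recovers the other root from the product c/a:

```python
import math

def quadratic_roots(a, b, c):
    """Roots of ax^2 + bx + c = 0, avoiding subtractive cancellation.

    The root whose formula would subtract nearly equal numbers is
    obtained instead from the relation x1 * x2 = c/a.
    """
    d = math.sqrt(b * b - 4 * a * c)   # assumes real roots
    # Pick the sign so that -b and +/-d are ADDED, never cancelled
    if b >= 0:
        big = (-b - d) / (2 * a)
    else:
        big = (-b + d) / (2 * a)
    small = c / (a * big)              # product of the roots is c/a
    return big, small

r_big, r_small = quadratic_roots(1.0, 62.10, 1.0)
print(r_big, r_small)   # ~(-62.08390, -0.01610723)
```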
A similar cancellation occurs when computing x − sin x for small x, since sin x ≈ x. Using the series
sin x = x − x³/3! + x⁵/5! − x⁷/7! + ...,
we obtain
x − sin x = x³/3! − x⁵/5! + x⁷/7! − ... = x³/6 − x⁵/(6·20) + x⁷/(6·20·42) − ...
= (x³/6) (1 − (x²/20)(1 − (x²/42)(1 − (x²/72)(···)))),
which can be evaluated for small x without any subtraction of nearly equal quantities.
7.2. Conditioning. The words condition and conditioning are used to indicate how sensitive the solution of a problem may be to small changes in the input data. A problem is ill-conditioned if small changes in the data can produce large changes in the results. For certain types of problems, a condition number can be defined. If that number is large, it indicates an ill-conditioned problem. In contrast, if the number is modest, the problem is recognized as a well-conditioned problem.
The condition number can be calculated in the following manner: it is the ratio of the relative change in f(x) to the relative change in x, which for small changes is approximately
K(x) = | x f′(x) / f(x) |.
For example, if f(x) = 10/(1 − x²), then f′(x) = 20x/(1 − x²)², and the condition number is
K(x) = | x f′(x) / f(x) | = | 2x² / (1 − x²) |.
The condition number can be quite large for |x| ≈ 1. Therefore, the function is ill-conditioned near x = ±1.
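As a sketch, the condition number can be evaluated numerically (the function and its derivative below are the example above):

```python
def condition_number(f, fprime, x):
    """K(x) = |x f'(x) / f(x)|, the relative condition number of f at x."""
    return abs(x * fprime(x) / f(x))

f = lambda x: 10.0 / (1.0 - x * x)
fp = lambda x: 20.0 * x / (1.0 - x * x) ** 2

# K(x) = |2x^2 / (1 - x^2)|: modest away from +/-1, huge near +/-1
print(condition_number(f, fp, 0.5))     # ~0.667: well-conditioned
print(condition_number(f, fp, 0.999))   # ~998.5: ill-conditioned
```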
7.3. Stability of an algorithm. Another theme that occurs repeatedly in numerical analysis is the distinction between numerical algorithms that are stable and those that are not. Informally speaking, a numerical process is unstable if small errors made at one stage of the process are magnified and propagated in subsequent stages and seriously degrade the accuracy of the overall calculation.
An algorithm can be thought of as a sequence of problems, i.e. a sequence of function evaluations. In this case we consider the algorithm for evaluating f(x) to consist of the evaluation of the sequence x_1, x_2, ..., x_n. We are concerned with the condition of each of the functions f_1(x_1), f_2(x_2), ..., f_{n-1}(x_{n-1}), where f(x) = f_i(x_i) for all i. An algorithm is unstable if any f_i is ill-conditioned, i.e. if any f_i(x_i) has condition much worse than f(x). Consider the example
f(x) = √(x+1) − √x,
so that there is potential loss of significance when x is large. Taking x = 12345 as an example, one possible algorithm is
x_0 := x = 12345
x_1 := x_0 + 1
x_2 := √x_1
x_3 := √x_0
f(x) := x_4 := x_2 − x_3.
The loss of significance occurs with the final subtraction. We can rewrite the last step in the form f_3(x_3) = x_2 − x_3 to show how the final answer depends on x_3. As f_3′(x_3) = −1, we have the condition
K(x_3) = | x_3 f_3′(x_3) / f_3(x_3) | = | x_3 / (x_2 − x_3) |,
from which we find K(x_3) ≈ 2.2 × 10^4 when x = 12345. Note that this is the condition of a subproblem arrived at during the algorithm. To find an alternative algorithm we write
f(x) = (√(x+1) − √x) · (√(x+1) + √x)/(√(x+1) + √x) = 1 / (√(x+1) + √x).
This suggests the algorithm
x_0 := x = 12345
x_1 := x_0 + 1
x_2 := √x_1
x_3 := √x_0
x_4 := x_2 + x_3
f(x) := x_5 := 1/x_4.
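The two algorithms can be compared directly (a sketch; in IEEE doubles both still agree to many digits, but the first performs the ill-conditioned subtraction while the second avoids it):

```python
import math

def f_unstable(x):
    # direct subtraction of two nearly equal square roots
    return math.sqrt(x + 1) - math.sqrt(x)

def f_stable(x):
    # rationalized form: 1 / (sqrt(x+1) + sqrt(x))
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

x = 12345.0
print(f_unstable(x))   # ~4.5000e-03
print(f_stable(x))     # ~4.5000e-03, computed without cancellation
```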
(4) The following numbers are given in a decimal computer with a four-digit normalized mantissa:
A = 0.4523e−4, B = 0.2115e−3, and C = 0.2583e1.
Perform the following operations, and indicate the error in the result, assuming symmetric rounding:
(i) A + B + C (ii) A − B (iii) A/C (iv) AB/C.
(5) Assume a 3-digit mantissa with rounding.
(i) Evaluate y = x³ − 3x² + 4x + 0.21 for x = 2.73.
(ii) Evaluate y = [(x − 3)x + 4]x + 0.21 for x = 2.73.
Compare and discuss the errors obtained in parts (i) and (ii).
(6) Associativity does not necessarily hold for floating-point addition (or multiplication).
Let a = 0.8567 × 10^0, b = 0.1325 × 10^4, c = −0.1325 × 10^4; then a + (b + c) = 0.8567 × 10^0, while (a + b) + c = 0.1000 × 10^1. The two answers are NOT the same! Show the calculations.
(7) Calculate the sum of √3, √5, and √7 to four significant digits and find its absolute and relative errors.
(8) Rewrite e^x − cos x to be stable when x is near 0.
(9) Find the smaller root of the equation
x² − 400x + 1 = 0
using four-digit rounding arithmetic.
(10) Discuss the condition number of the polynomial function f(x) = 2x² + x − 1.
(11) Suppose that a function ln is available to compute the natural logarithm of its argument. Consider the calculation of ln(1 + x), for small x, by the following algorithm:
x_0 := x
x_1 := 1 + x_0
f(x) := x_2 := ln(x_1).
By considering the condition K(x_1) of the subproblem of evaluating ln(x_1), show that such a function ln is inadequate for calculating ln(1 + x) accurately.
CHAPTER 2 (6 LECTURES)
ROOTS OF NON-LINEAR EQUATIONS
1. Introduction
Finding one or more roots of the equation
f(x) = 0
is one of the most commonly occurring problems of applied mathematics. In most cases explicit solutions are not available and we must be satisfied with being able to find a root to any specified degree of accuracy. The numerical procedures for finding roots are called iterative methods.
Definition 1.1 (Simple and multiple root). A root having multiplicity one is called a simple root. For example, f(x) = (x − 1)(x − 2) has a simple root at x = 1 and at x = 2, but g(x) = (x − 1)² has a root of multiplicity 2 at x = 1, which is therefore not a simple root.
A root with multiplicity m ≥ 2 is called a multiple or repeated root. For example, in the equation (x − 1)² = 0, x = 1 is a multiple (double) root.
If a polynomial has a multiple root, its derivative also shares that root.
Let α be a root of the equation f(x) = 0, and imagine writing it in the factored form
f(x) = (x − α)^m φ(x)
with some integer m ≥ 1 and some continuous function φ(x) for which φ(α) ≠ 0. Then we say that α is a root of f(x) of multiplicity m.
Definition 1.2 (Convergence). A sequence {x_n} is said to converge to a point α with order p if there exists a constant c such that
lim_{n→∞} |x_{n+1} − α| / |x_n − α|^p = c,  n ≥ 0.
The constant c is known as the asymptotic error constant.
Two cases are given special attention.
(i) If p = 1 (and c < 1), the sequence is linearly convergent.
(ii) If p = 2, the sequence is quadratically convergent.
Definition 1.3. Let {α_n} be a sequence which converges to zero and let {x_n} be any sequence converging to α. If there exist a constant c > 0 and an integer N > 0 such that
|x_n − α| ≤ c|α_n|,  n ≥ N,
then we say {x_n} converges to α with rate of convergence O(α_n).
The bisection iterations for Example 1 below can be recorded in a table:

k | a_{k-1} | b_{k-1} | c_k     | f(a_{k-1}) f(c_k)
1 | 0       | 1       | 0.5     | < 0
2 | 0       | 0.5     | 0.25    | < 0
3 | 0       | 0.25    | 0.125   | > 0
4 | 0.125   | 0.25    | 0.1875  | > 0
5 | 0.1875  | 0.25    | 0.21875 | < 0

End if.
Until |a − b| ≤ ε (tolerance value).
Print root as c.
Example 1. Perform five iterations of the bisection method to obtain the smallest root of the equation x³ − 5x + 1 = 0.
Sol. We write f(x) = x³ − 5x + 1.
Since f(0) > 0 and f(1) < 0, the smallest positive root lies in the interval (0, 1).¹ Taking a_0 = 0 and b_0 = 1, we obtain c_1 = (a_0 + b_0)/2 = 0.5.
Now f(c_1) = −1.375, so f(a_0) f(c_1) < 0. This implies the root lies in the interval [0, 0.5].
Now we take a_1 = 0 and b_1 = 0.5; then c_2 = (a_1 + b_1)/2 = 0.25, f(c_2) = −0.2344, and f(a_1) f(c_2) < 0, which implies the root lies in the interval [0, 0.25].
Applying the same procedure, we obtain the remaining iterations given in the Table. The root lies in (0.1875, 0.21875), and we take the midpoint (0.1875 + 0.21875)/2 = 0.203125 as the root.
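The bisection procedure can be written as a short routine (a minimal illustration; the variable names are our own):

```python
def bisect(f, a, b, tol=1e-6, max_iter=100):
    """Bisection method: f must change sign on [a, b]."""
    assert f(a) * f(b) < 0, "f(a) and f(b) must have opposite signs"
    for _ in range(max_iter):
        c = a + (b - a) / 2          # safer than (a + b) / 2
        if f(a) * f(c) < 0:
            b = c                    # root in [a, c]
        else:
            a = c                    # root in [c, b]
        if b - a <= tol:
            break
    return a + (b - a) / 2

f = lambda x: x**3 - 5*x + 1
root = bisect(f, 0.0, 1.0)
print(root)   # ~0.201640, the smallest positive root of x^3 - 5x + 1
```

Five iterations of this loop reproduce the table above; iterating further refines 0.203125 to 0.201640.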
2.2. Convergence analysis. Now we analyze the convergence of the iterations.
Theorem 2.1. Suppose that f ∈ C[a, b] and f(a)f(b) < 0. The bisection method generates a sequence {c_k} approximating a zero α of f with linear convergence.
Proof. Let [a_0, b_0], [a_1, b_1], ... denote the successive intervals produced by the bisection algorithm. Thus
a = a_0 ≤ a_1 ≤ a_2 ≤ ... ≤ b_0 = b,
b = b_0 ≥ b_1 ≥ b_2 ≥ ... ≥ a_0 = a.
This implies {a_n} and {b_n} are monotonic and bounded, and hence convergent. Since
b_1 − a_1 = (1/2)(b_0 − a_0),
b_2 − a_2 = (1/2)(b_1 − a_1) = (1/2²)(b_0 − a_0),
........................
b_n − a_n = (1/2ⁿ)(b_0 − a_0),
we have
lim_{n→∞} (b_n − a_n) = 0.
Taking limits,
lim_{n→∞} a_n = lim_{n→∞} b_n = α (say).
Since f(a_n) f(b_n) ≤ 0, letting n → ∞ gives f(α)² ≤ 0, hence f(α) = 0, i.e. the common limit of {a_n} and {b_n} is a zero of f in [a, b].
Let c_{n+1} = (a_n + b_n)/2. Then
|α − c_{n+1}| = | lim_{m→∞} a_m − (a_n + b_n)/2 | ≤ | b_n − (a_n + b_n)/2 |  (since a_n ≤ α ≤ b_n for all n)
= (1/2)|b_n − a_n| = (1/2^{n+1}) |b_0 − a_0|.
By the definition of convergence, we can say that the bisection method converges linearly with rate 1/2.
¹Choice of initial approximations: Initial approximations to the root are often known from the physical significance of the problem. Graphical methods can be used to locate a zero of f(x) = 0, and any value in the neighborhood of the root can be taken as the initial approximation.
If the given equation f(x) = 0 can be written as f_1(x) = f_2(x), then the point of intersection of the graphs y = f_1(x) and y = f_2(x) gives the root of the equation. Any value in the neighborhood of this point can be taken as the initial approximation.
Note: 1. From the statement of the bisection algorithm, it is clear that the algorithm always converges; however, convergence can be very slow.
2. Computing c_k: It might happen that at a certain iteration k, computation of c_k = (a_k + b_k)/2 will give overflow. It is better to compute c_k as
c_k = a_k + (b_k − a_k)/2.
Stopping criteria: Since this is an iterative method, we must determine some stopping criteria that will allow the iteration to stop. The criterion "|f(c_k)| very small" can be misleading, since it is possible to have |f(c_k)| very small even if c_k is not close to the root.
Let us now find the minimum number of iterations N needed with the bisection method to achieve a certain desired accuracy ε. The interval length after N iterations is (b_0 − a_0)/2^N. So, to obtain an accuracy of ε, we must have
(b_0 − a_0)/2^N ≤ ε,
that is,
2^{-N} (b_0 − a_0) ≤ ε,
or
N ≥ [log(b_0 − a_0) − log ε] / log 2.
Note that the number N depends only on the initial interval [a_0, b_0] bracketing the root.
Example 2. Find the minimum number of iterations needed by the bisection algorithm to approximate the root in the interval [2.5, 4] of x³ − 6x² + 11x − 6 = 0 with error tolerance 10^{-3}.
Sol. The number of iterations satisfies
N ≥ [log(4 − 2.5) − log 10^{-3}] / log 2 ≈ 10.55.
Thus, a minimum of 11 iterations will be needed to obtain the desired accuracy using the bisection method.
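The iteration-count formula is a one-liner:

```python
import math

def min_bisection_iterations(a0, b0, eps):
    """Smallest N with (b0 - a0) / 2**N <= eps."""
    return math.ceil((math.log(b0 - a0) - math.log(eps)) / math.log(2))

print(min_bisection_iterations(2.5, 4.0, 1e-3))   # 11
```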
x_2 = x_1 − (x_1 − x_0)/(f_1 − f_0) · f_1.
This is called the secant or chord method, and successive iterations are given by
x_{k+1} = x_k − (x_k − x_{k-1})/(f_k − f_{k-1}) · f_k,  k = 1, 2, ....
Geometrically, in this method we replace the unknown function by a straight line or chord passing through (x_{k-1}, f_{k-1}) and (x_k, f_k), and we take the point of intersection of the straight line with the x-axis as the next approximation to the root.
Writing x_k = α + ε_k, the error satisfies
ε_{k+1} = ε_k − (ε_k − ε_{k-1}) / [f(α + ε_k) − f(α + ε_{k-1})] · f(α + ε_k) ≈ ε_k ε_{k-1} · f″(α)/(2 f′(α)).
This relation is called the error equation. Now, by the definition of the order of convergence, we expect a relation of the following type:
ε_{k+1} = C ε_k^p.
Substituting ε_{k-1} = (ε_k/C)^{1/p} into the error equation, with A = f″(α)/(2f′(α)), gives
C ε_k^p = A C^{-1/p} ε_k^{(1 + 1/p)},
so that p = 1 + 1/p, i.e. p = (1 + √5)/2 ≈ 1.618: the secant method converges superlinearly.
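A minimal secant-method sketch (our own naming, applied to the equation from the bisection example):

```python
def secant(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant method: successive chords through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1
        if abs(x2 - x1) < tol:
            return x2
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)
    return x1

f = lambda x: x**3 - 5*x + 1
root = secant(f, 0.0, 1.0)
print(root)   # converges to the root near 0.20164
```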
x_1 = x_0 − f(x_0)/f′(x_0).
This is called the Newton method, and successive iterations are given by
x_{k+1} = x_k − f(x_k)/f′(x_k),  k = 0, 1, ....
The method can be obtained directly from the secant method by taking the limit x_{k-1} → x_k. In the limiting case the chord joining the points (x_{k-1}, f_{k-1}) and (x_k, f_k) becomes the tangent at (x_k, f_k). In this case the problem of finding the root of the equation is equivalent to finding the point of intersection of the tangent to the curve y = f(x) at the point (x_k, f_k) with the x-axis.
1. x := x_0
2. for k = 0, 1, 2, ... do
   if |f(x_k)| is sufficiently small
     then x* = x_k; return x*
   end
3. x_{k+1} = x_k − f(x_k)/f′(x_k)
   if |x_{k+1} − x_k| is sufficiently small
     then x* = x_{k+1}; return x*
   end
4. end (for main loop)
Example 4. Use Newton's method to compute √2.
Sol. Applying Newton's method to f(x) = x² − 2 with a suitable starting value, the successive iterates reach
x_4 = 1.41421356, x_5 = 1.41421356.
Since the fourth and fifth iterates agree to eight decimal places, we assume that 1.41421356 is a correct solution of f(x) = 0, to at least eight decimal places.
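A minimal Newton sketch reproducing this computation (f(x) = x² − 2 is the natural choice for √2):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: follow the tangent line to the x-axis."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# sqrt(2) as the positive root of f(x) = x^2 - 2
root = newton(lambda x: x*x - 2, lambda x: 2*x, 1.0)
print(round(root, 8))   # 1.41421356
```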
Example 5. Perform four iterations of Newton's method to obtain the approximate value of 17^{1/3}, starting with x_0 = 2.0.
Sol. Let x = 17^{1/3}, which implies x³ = 17. Let f(x) = x³ − 17 = 0.
The Newton approximations are given by
x_{k+1} = x_k − (x_k³ − 17)/(3x_k²) = (2x_k³ + 17)/(3x_k²),  k = 0, 1, 2, ....
For f(x) = x − 2 sin x, the Newton iteration reads
x_{k+1} = x_k − f(x_k)/f′(x_k) = x_k − (x_k − 2 sin x_k)/(1 − 2 cos x_k) = 2(sin x_k − x_k cos x_k)/(1 − 2 cos x_k).
S′(x) = 2x + (2 ln x)/x = (2/x)(x² + ln x).
Our problem thus comes down to solving the equation S′(x) = 0. We could use the Newton method directly on S′(x), but the calculations are more pleasant if we observe that S′(x) = 0 is equivalent to
x² + ln x = 0.
Let f(x) = x² + ln x. Then f′(x) = 2x + 1/x, and we get the recurrence relation
x_{k+1} = x_k − (x_k² + ln x_k)/(2x_k + 1/x_k).
We need to find a suitable starting point x_0. Experimentation with a calculator suggests that we take x_0 = 0.65.
Then x_1 = 0.6529181 and x_2 = 0.65291864.
Since x_1 agrees with x_2 to 5 decimal places, we can perhaps decide that, to 5 places, the minimum distance occurs at x = 0.65292.
The Newton error satisfies
ε_{k+1} = ε_k − f(α + ε_k)/f′(α + ε_k),  k = 1, 2, ....
Theorem 3.3. Let f(x) be twice continuously differentiable on the closed finite interval [a, b] and let the following conditions be satisfied:
(i) f(a) f(b) < 0.
(ii) f′(x) ≠ 0, x ∈ [a, b].
(iii) Either f″(x) ≥ 0 or f″(x) ≤ 0 for all x ∈ [a, b].
(iv) At the end points a, b,
|f(a)| / |f′(a)| < b − a,  |f(b)| / |f′(b)| < b − a.
Then Newton's method converges to the unique solution α of f(x) = 0 in [a, b] for any choice of x_0 ∈ [a, b].
Some comments about these conditions: Conditions (i) and (ii) guarantee that there is one and
only one solution in [a, b]. Condition (iii) states that the graph of f (x) is either concave from above or
concave from below, and furthermore, together with condition (ii), implies that f′(x) is monotone on [a, b]. Added to these, condition (iv) states that the tangent to the curve at either endpoint intersects the x-axis within the interval [a, b]. The proof of the theorem is left as an exercise for interested readers.
Example 8. Find an interval containing the smallest positive zero of f(x) = e^{-x} − sin x which satisfies the conditions of the previous Theorem for convergence of Newton's method.
Sol. For f(x) = e^{-x} − sin x, we have f′(x) = −e^{-x} − cos x and f″(x) = e^{-x} + sin x.
We choose [a, b] = [0, 1]. Then since f(0) = 1 and f(1) = −0.47, we have f(a)f(b) < 0, so condition (i) is satisfied.
Since f′(x) < 0 for all x ∈ [0, 1], condition (ii) is satisfied, and since f″(x) > 0 for all x ∈ [0, 1], condition (iii) is satisfied.
Finally, since f(0) = 1 and f′(0) = −2, |f(0)|/|f′(0)| = 1/2 < b − a = 1; and since f(1) = −0.47 and f′(1) = −0.90, |f(1)|/|f′(1)| = 0.52 < 1. This verifies condition (iv).
Newton's iteration will therefore converge for any choice of x_0 in [0, 1].
Example 9. Find all the roots of cos x − x² − x = 0 to five decimal places.
Sol. f(x) = cos x − x² − x = 0 has two roots, one in the interval (−2, −1) and one in (0, 1). Applying Newton's method,
x_{n+1} = x_n − (cos x_n − x_n² − x_n)/(−sin x_n − 2x_n − 1).
Taking x_0 = −1.5 for the root in the interval (−2, −1), we obtain
x_1 = −1.27338985, x_2 = −1.25137907, x_3 = −1.25115186, x_4 = −1.25114184.
Starting with x_0 = 0.5, we can obtain the root in (0, 1), and the iterations are given by
x_1 = 0.55145650, x_2 = 0.55001049, x_3 = 0.55000935.
Hence the roots correct to five decimals are −1.25115 and 0.55001.
3.5. Newton method for multiple roots. Let α be a root of f(x) = 0 with multiplicity m. In this case we can write
f(x) = (x − α)^m φ(x).
In this case
f(α) = f′(α) = ... = f^{(m-1)}(α) = 0,  f^{(m)}(α) ≠ 0.
Now, writing x_k = α + ε_k,
f(x_k) = f(α + ε_k) = (ε_k^m/m!) f^{(m)}(α) + (ε_k^{m+1}/(m+1)!) f^{(m+1)}(α) + ...,
f′(x_k) = f′(α + ε_k) = (ε_k^{m-1}/(m-1)!) f^{(m)}(α) + (ε_k^m/m!) f^{(m+1)}(α) + ....
Therefore
ε_{k+1} = ε_k − f(α + ε_k)/f′(α + ε_k)
= ε_k − (ε_k/m) [1 + (ε_k/(m+1)) f^{(m+1)}(α)/f^{(m)}(α) + ...] [1 + (ε_k/m) f^{(m+1)}(α)/f^{(m)}(α) + ...]^{-1}
= ε_k − (ε_k/m) [1 − (ε_k/(m(m+1))) f^{(m+1)}(α)/f^{(m)}(α) + ...]
= ε_k (1 − 1/m) + O(ε_k²).
This implies the method has a linear rate of convergence for multiple roots (m ≥ 2).
However, when the multiplicity of the root is known in advance, we can modify the method to increase the order of convergence. We consider
x_{k+1} = x_k − e · f(x_k)/f′(x_k),
where e is a constant to be determined. If α is a multiple root with multiplicity m, then the same expansion gives the error equation
ε_{k+1} = ε_k (1 − e/m) + O(ε_k²).
Choosing e = m makes the coefficient of ε_k vanish, so the modified iteration
x_{k+1} = x_k − m · f(x_k)/f′(x_k)
converges quadratically.
Example 10. Let f(x) = e^x − x − 1. Show that f has a zero of multiplicity 2 at x = 0. Show that Newton's method with x_0 = 1 converges to this zero but not quadratically.
Sol. We have f(x) = e^x − x − 1, f′(x) = e^x − 1 and f″(x) = e^x.
Now f(0) = 1 − 0 − 1 = 0, f′(0) = 1 − 1 = 0 and f″(0) = 1. Therefore f has a zero of multiplicity 2 at x = 0.
Starting with x_0 = 1, the iterations x_{k+1} = x_k − f(x_k)/f′(x_k) give
x_1 = 0.58198, x_2 = 0.31906, x_3 = 0.16800, x_4 = 0.08635, x_5 = 0.04380, x_6 = 0.02206.
The error decreases by a factor of roughly 1/2 at each step, as expected for a double root: the convergence is only linear.
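The modified iteration x_{k+1} = x_k − m f(x_k)/f′(x_k) with m = 2 can be compared against plain Newton on this example (a sketch; the step counts are our own choice):

```python
import math

f = lambda x: math.exp(x) - x - 1     # double zero at x = 0
fp = lambda x: math.exp(x) - 1

def iterate(m, x0, steps):
    """Newton step scaled by the multiplicity m: x - m f(x)/f'(x)."""
    x = x0
    for _ in range(steps):
        x = x - m * f(x) / fp(x)
    return x

print(iterate(1, 1.0, 4))   # ~0.0864: plain Newton crawls linearly
print(iterate(2, 1.0, 4))   # ~1e-10: quadratic convergence with m = 2
```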
Example 11. The equation f(x) = x³ − 7x² + 16x − 12 = 0 has a double root at x = 2.0. Starting with x_0 = 1, find the root correct to three decimals.
Sol. First we apply the simple Newton method, with successive iterations given by
x_{k+1} = x_k − f(x_k)/f′(x_k),
and then, since the multiplicity is m = 2, the modified Newton method
x_{k+1} = x_k − 2 f(x_k)/f′(x_k),
which restores quadratic convergence.
[Iteration table lost in transcription: for x² = 3 rewritten as x = g(x), n ≥ 0, in three different ways, iterates starting from x_0 = 2.0 and 1.5 are tabulated; one choice oscillates between 2.0 and 1.5, one diverges, and the third converges through 1.75 and 1.732147 toward 1.73205.]
Now √3 = 1.73205, and it is clear that the third choice is correct, but why are the other two not working? Which of the approximations converge or not, we will answer after the convergence result (which requires |g′(α)| < 1 in a neighborhood of α).
Lemma 4.1. Let g(x) be a continuous function on [a, b] and assume that a ≤ g(x) ≤ b for all x ∈ [a, b], i.e. g([a, b]) ⊆ [a, b]. Then x = g(x) has at least one solution in [a, b].
Proof. Let g be a continuous function on [a, b] with a ≤ g(x) ≤ b for all x ∈ [a, b].
Consider φ(x) = g(x) − x.
If g(a) = a or g(b) = b, then the proof is trivial. Hence we assume that a ≠ g(a) and b ≠ g(b). Since a ≤ g(x) ≤ b, this gives g(a) > a and g(b) < b. Now
φ(a) = g(a) − a > 0
and
φ(b) = g(b) − b < 0.
Since φ is continuous and φ(a)φ(b) < 0, by the Intermediate Value Theorem φ has at least one zero in [a, b], i.e. there exists some α ∈ [a, b] such that g(α) = α.
Graphically, the roots are the intersection points of y = x & y = g(x) as shown in the Figure.
Theorem 4.2. Let g and g′ be continuous on [a, b], assume a ≤ g(x) ≤ b for all x ∈ [a, b], and suppose
λ = max_{a ≤ x ≤ b} |g′(x)| < 1.
Then:
1. x = g(x) has a unique solution α in the interval [a, b].
2. The iterates x_{n+1} = g(x_n), n ≥ 0, converge to α for any choice of x_0 ∈ [a, b].
3. |α − x_n| ≤ (λⁿ/(1 − λ)) |x_1 − x_0|, n ≥ 0.
4. lim_{n→∞} |α − x_{n+1}| / |α − x_n| = |g′(α)|.
Thus for x_n close to α, α − x_{n+1} ≈ g′(α)(α − x_n).
Proof. Let g and g′ be continuous on [a, b] with a ≤ g(x) ≤ b for all x ∈ [a, b].
By the previous Lemma, there exists at least one solution to x = g(x).
By the Mean Value Theorem, for any x, y ∈ [a, b] there exists c between them such that
g(x) − g(y) = g′(c)(x − y),
so that
|g(x) − g(y)| ≤ λ|x − y|,  0 < λ < 1,  x, y ∈ [a, b].
1. Suppose x = g(x) has two solutions, say α and β in [a, b]; then α = g(α) and β = g(β). Now
|α − β| = |g(α) − g(β)| ≤ λ|α − β| ⟹ (1 − λ)|α − β| ≤ 0.
Since 0 < λ < 1, we get α = β. Hence x = g(x) has a unique solution in [a, b], which we call α.
2. To check the convergence of the iterates {x_n}, we observe that they all remain in [a, b]: if x_n ∈ [a, b], then x_{n+1} = g(x_n) ∈ [a, b]. Now
|α − x_{n+1}| = |g(α) − g(x_n)| = |g′(c_n)| |α − x_n|
for some c_n between α and x_n, so
|α − x_{n+1}| ≤ λ|α − x_n| ≤ λ²|α − x_{n-1}| ≤ ... ≤ λ^{n+1}|α − x_0|.
As n → ∞, λⁿ → 0, which implies x_n → α.
3. To bound |α − x_0|, write
|α − x_0| ≤ |α − x_1| + |x_1 − x_0| ≤ λ|α − x_0| + |x_1 − x_0|,
so that
|α − x_0| ≤ |x_1 − x_0| / (1 − λ).
Therefore
|α − x_n| ≤ λⁿ |α − x_0| ≤ (λⁿ/(1 − λ)) |x_1 − x_0|.
4. Now
lim_{n→∞} |α − x_{n+1}| / |α − x_n| = lim_{n→∞} |g′(c_n)| = |g′(α)|,
since c_n lies between x_n and α, and x_n → α. This shows that the iterates are linearly convergent. If in addition g′(α) ≠ 0, the formula proves that the convergence is exactly linear, with no higher order of convergence being possible. In this case, the value of |g′(α)| is the linear rate of convergence.
In practice, we do not use the above result in the Theorem directly. The main reason is that it is difficult to find an interval [a, b] for which the condition a ≤ g(x) ≤ b is satisfied. Therefore, we use the Theorem in the following practical way.
Corollary 4.3. Let g and g′ be continuous on some interval c < x < d with the fixed point α contained in this interval. Moreover assume that
|g′(α)| < 1.
Then there is an interval [a, b] around α for which the hypotheses, and hence the conclusions, of the Theorem are true.
On the contrary, if |g′(α)| > 1, then the iteration method x_{n+1} = g(x_n) will not converge to α.
When |g′(α)| = 1, no conclusion can be drawn, and even if convergence occurs, the method would be far too slow for the iteration method to be practical.
Remark 4.2. The possible behavior of the fixed-point iterates {x_n} is shown in the figure for various values of g′(α). To see the convergence, consider the case of x_1 = g(x_0), the height of y = g(x) at x_0. We bring the number x_1 back to the x-axis by using the line y = x and the height y = x_1. We continue this with each iterate, obtaining a stair-step behavior when g′(α) > 0. When g′(α) < 0, the iterates oscillate around the fixed point α, as can be seen in the figure. In the first figure (on top) the iterates converge monotonically, in the second they are oscillatory convergent, in the third the iterates diverge, and in the last figure they are oscillatory divergent.
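The convergence behavior described above is easy to observe numerically. Below is a minimal Python sketch of the fixed-point iteration x_{n+1} = g(x_n); the function name and tolerance are illustrative, not from the notes. For g(x) = cos x the derivative at the fixed point is about −0.67, so the iterates converge while oscillating around α.

```python
import math

def fixed_point(g, x0, tol=1e-10, max_iter=200):
    """Iterate x_{n+1} = g(x_n) until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")

# g(x) = cos x has a fixed point near 0.739; g'(alpha) ~ -0.674, so
# |g'(alpha)| < 1 and the iterates converge, oscillating around alpha.
alpha = fixed_point(math.cos, 1.0)
print(alpha)  # about 0.7390851
```

Since |g′(α)| ≈ 0.674, the error shrinks by roughly that factor per step, i.e. linear convergence, exactly as the theorem predicts.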
Theorem (higher-order convergence). Suppose g(α) = α and
g′(α) = g″(α) = · · · = g^(p−1)(α) = 0, g^(p)(α) ≠ 0, (4.1)
for some p ≥ 2. Then, for x_0 chosen sufficiently close to α, the iterates x_{n+1} = g(x_n), n ≥ 0, converge to α with order p.
Proof. Let g(x) be p times continuously differentiable for all x near α and satisfying the conditions in equation (4.1) stated above.
Now expand g(x_n) about α:
x_{n+1} = g(x_n) = g(α + (x_n − α))
= g(α) + (x_n − α)g′(α) + · · · + ((x_n − α)^{p−1}/(p − 1)!) g^(p−1)(α) + ((x_n − α)^p/p!) g^(p)(ξ_n)
for some ξ_n between x_n and α. Using equation (4.1) and g(α) = α, we obtain
x_{n+1} = α + ((x_n − α)^p/p!) g^(p)(ξ_n)
⇒ (x_{n+1} − α)/(x_n − α)^p = g^(p)(ξ_n)/p!
⇒ (α − x_{n+1})/(α − x_n)^p = (−1)^{p−1} g^(p)(ξ_n)/p!,
so the iterates converge to α with order p.
In particular, for Newton's method g(x) = x − f(x)/f′(x) we have g′(α) = 0 and, in general,
g″(α) = f″(α)/f′(α) ≠ 0,
so Newton's method converges with order p = 2.
Consider the iteration
x_{n+1} = (a x_n + 1/x_n² + 1)/(a + 1).
A fixed point α satisfies α = (aα + 1/α² + 1)/(a + 1), i.e.
α³ − α² − 1 = 0.
Therefore, the above formula can be used to find the roots of the equation f(x) = x³ − x² − 1 = 0.
Now substitute x_n = α + ε_n and x_{n+1} = α + ε_{n+1}; we get
(a + 1)(α + ε_{n+1}) = a(α + ε_n) + 1/(α²(1 + ε_n/α)²) + 1,
which implies
(a + 1)ε_{n+1} = (a − 2/α³)ε_n + O(ε_n²).
Therefore, for fastest convergence, we take a = 2/α³. Here α is the root of the equation x³ − x² − 1 = 0
and can be computed by the Newton method.
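The conclusion a = 2/α³ can be checked numerically. The sketch below (variable names are ours) compares a few iterations with the optimal a against the plain choice a = 1.

```python
def iterate(a, x0, n):
    """Run x_{k+1} = (a*x_k + 1/x_k^2 + 1)/(a + 1); fixed points solve x^3 - x^2 - 1 = 0."""
    x = x0
    for _ in range(n):
        x = (a * x + 1.0 / x**2 + 1.0) / (a + 1.0)
    return x

alpha = 1.4655712318767682   # real root of x^3 - x^2 - 1 = 0 (e.g. from Newton's method)
a_best = 2.0 / alpha**3      # the choice a = 2/alpha^3 derived above

# With the optimal a the error term of order eps_n vanishes, so after five
# steps the error is far smaller than with a = 1 (which converges linearly).
err_best = abs(iterate(a_best, 1.0, 5) - alpha)
err_plain = abs(iterate(1.0, 1.0, 5) - alpha)
print(err_best, err_plain)
```

With a = 2/α³ the iteration is second order, so five steps already reach roughly machine precision, while a = 1 still carries an error of order 10⁻⁵.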
Example 17. To compute the root of the equation
e^{−x} = 3 log_e x,
using the formula
x_{n+1} = x_n − (3 log_e x_n − exp(−x_n))/p,
determine p so that the convergence is as fast as possible.
Sol. Substituting x_n = α + ε_n and x_{n+1} = α + ε_{n+1}, we get
α + ε_{n+1} = α + ε_n − (1/p)[3 log_e(α + ε_n) − exp(−α − ε_n)]
= α + ε_n − (1/p)[3 log_e α + 3 log_e(1 + ε_n/α) − exp(−α) exp(−ε_n)]
= α + ε_n − (1/p)[3 log_e α + 3ε_n/α − 3ε_n²/(2α²) + O(ε_n³) − exp(−α)(1 − ε_n + ε_n²/2 − · · ·)].
Since 3 log_e α = exp(−α), this gives
ε_{n+1} = ε_n − (1/p)(3/α + exp(−α))ε_n + O(ε_n²),
so the fastest (second-order) convergence is obtained for p = 3/α + exp(−α), which equals f′(α) for f(x) = 3 log_e x − exp(−x).
Exercises
(1) Given the following equations: (i) x⁴ − x − 10 = 0, (ii) x − e^{−x} = 0.
Find initial approximations for the smallest positive root of each. Use these to find the
root correct to three decimals with the Secant and Newton methods.
(2) Find all solutions of e^{2x} = x + 6, correct to 4 decimal places, using the Newton method.
(3) Use the bisection method to find the indicated root of the following equations. Use an error
tolerance of ε = 0.0001.
(a) The real root of x³ − x² − x − 1 = 0.
(b) The smallest positive root of cos x = 1/2 + sin x.
(c) The real roots of x³ − 2x − 1 = 0.
(4) Suppose that
f(x) = { e^{−1/x²}, x ≠ 0
       { 0,         x = 0.
The function f is continuous everywhere, in fact differentiable arbitrarily often everywhere, and
0 is the only solution of f (x) = 0. Show that if x0 = 0.0001, it takes more than one hundred
million iterations of the Newton Method to get below 0.00005.
(5) A calculator is defective: it can only add, subtract, and multiply. Use the equation 1/x = 1.37,
the Newton Method, and the defective calculator to find 1/1.37 correct to 8 decimal places.
(6) Use the Newton Method to find the smallest and the second smallest positive roots of the
equation tan x = 4x, correct to 4 decimal places.
(7) What is the order of convergence of the iteration
x_{n+1} = x_n(x_n² + 3a)/(3x_n² + a)
as it converges to the fixed point α = √a?
(8) What are the solutions α, if any, of the equation x = √(1 + x)? Does the iteration
x_{n+1} = √(1 + x_n) converge to any of these solutions (assuming x_0 is chosen sufficiently close to α)?
(9) (a) Apply Newton's method to the function
f(x) = { √x,      x ≥ 0
       { −√(−x),  x < 0
with the root α = 0. What is the behavior of the iterates? Do they converge, and if so, at what
rate?
(b) Do the same as in (a), but with
f(x) = { ∛(x²),   x ≥ 0
       { −∛(x²),  x < 0.
(10) Find all positive roots of the equation
10 ∫₀^x e^{−t²} dt = 1
using the Newton method.
(11) For the iteration of Exercise (7), evaluate
lim_{n→∞} (√a − x_{n+1})/(√a − x_n)³,
assuming x_0 has been chosen sufficiently close to the root.
(12) Show that the following two sequences have convergence of the second order with the same
limit √a:
(i) x_{n+1} = (x_n/2)(1 + a/x_n²),   (ii) x_{n+1} = (x_n/2)(3 − x_n²/a).
If x_n is a suitably close approximation to √a, show that the error in the first formula for
x_{n+1} is about one-third of that in the second formula, and deduce that the formula
x_{n+1} = (x_n/8)(6 + 3a/x_n² − x_n²/a)
gives a sequence with third-order convergence.
(13) Suppose α is a zero of multiplicity m of f, where f^(m) is continuous on an open interval
containing α. Show that the fixed-point method x = g(x) with the following g has second-order convergence:
g(x) = x − m f(x)/f′(x).
Bibliography
[Gerald]
Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analysis, 7th edition,
Pearson, 2003.
[Atkinson] K. Atkinson and W. Han. Elementary Numerical Analysis, 3rd edition, John Wiley
and Sons, 2004.
[Jain] M. K. Jain, S. R. K. Iyengar, and R. K. Jain. Numerical Methods for Scientific and
Engineering Computation, 6th edition, New Age International Publishers, New Delhi,
2012.
CHAPTER 3 (4 LECTURES)
NUMERICAL SOLUTION OF SYSTEM OF LINEAR EQUATIONS
1. Introduction
Systems of simultaneous linear equations are associated with many problems in engineering and
science, as well as with applications to the social sciences and the quantitative study of business and
economic problems. These problems occur in a wide variety of disciplines, directly in real-world problems
as well as in the solution process for other problems.
The principal objective of this Chapter is to discuss the numerical aspects of solving linear system of
equations having the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2   (1.1)
................................................
an1 x1 + an2 x2 + · · · + ann xn = bn.
In matrix notation this reads Ax = b:
[ a11 a12 · · · a1n ] [ x1 ]   [ b1 ]
[ a21 a22 · · · a2n ] [ x2 ] = [ b2 ]   (1.2)
[  ⋮   ⋮         ⋮  ] [  ⋮ ]   [  ⋮ ]
[ an1 an2 · · · ann ] [ xn ]   [ bn ]
This equation has a unique solution x = A⁻¹b when the coefficient matrix A is non-singular. Unless
otherwise stated, we shall assume that this is the case under discussion. If A⁻¹ is already available,
then x = A⁻¹b provides a good method of computing the solution x.
If A⁻¹ is not available, then in general A⁻¹ should not be computed solely for the purpose of obtaining
x. More efficient numerical procedures will be developed in this chapter. We study broadly two
categories, direct and iterative methods. We start with direct methods to solve the linear system.
2. Gaussian Elimination
Direct methods, which are techniques that give a solution in a fixed number of steps, subject only to
round-off errors, are considered in this chapter. Gaussian elimination is the principal tool in the direct
solution of system (1.2). The method is named after Carl Friedrich Gauss (1777-1855). To solve larger
systems of linear equations we extend the elimination method of introductory algebra.
For example, consider
x1 + 2x2 + x3 = 0
2x1 + 2x2 + 3x3 = 3
−x1 − 3x2 = 2.
Eliminating x1 from the second and third equations, we obtain
x1 + 2x2 + x3 = 0
−2x2 + x3 = 3
−x2 + x3 = 2.
Now eliminate x2 from the last equation with the help of the second equation:
x1 + 2x2 + x3 = 0
−2x2 + x3 = 3
(1/2)x3 = 1/2.
The last equation gives x3 = 1.
Therefore −2x2 + 1 = 3 ⇒ x2 = −1, and
x1 + 2(−1) + 1 = 0 ⇒ x1 = 1.
In general, for a 3 × 3 system
a11 x1 + a12 x2 + a13 x3 = b1   (R1)
a21 x1 + a22 x2 + a23 x3 = b2   (R2)
a31 x1 + a32 x2 + a33 x3 = b3   (R3)
we first eliminate x1 from (R2) and (R3) using the multipliers m_{i1} = a_{i1}/a11, i = 2, 3, replacing (Ri) by (Ri) − m_{i1}(R1):
a22^(2) x2 + a23^(2) x3 = b2^(2)   (R2^(2))
a32^(2) x2 + a33^(2) x3 = b3^(2)   (R3^(2))
Here
a_{ij}^(2) = a_{ij} − m_{i1} a_{1j} and b_i^(2) = b_i − m_{i1} b_1, i, j = 2, 3.
Next we eliminate x2 from (R3^(2)) using the multiplier m32 = a32^(2)/a22^(2), replacing (R3^(2)) by (R3^(2)) − m32(R2^(2)):
a33^(3) x3 = b3^(3)   (R3^(3))
Here
a33^(3) = a33^(2) − m32 a23^(2) and b3^(3) = b3^(2) − m32 b2^(2).
Back substitution now gives
x3 = b3^(3)/a33^(3),
x2 = (b2^(2) − a23^(2) x3)/a22^(2),
x1 = (b1 − a12 x2 − a13 x3)/a11.
Note that every step of the elimination procedure can be obtained through elementary row operations
on the augmented matrix (A | b).
(3) Input the coefficients of the linear equations with the right-hand side:
    Do for i = 1 to n
      Do for j = 1 to n + 1
        Read a[i][j]
      End for j
    End for i
(4) Do for k = 1 to n − 1
      Do for i = k + 1 to n
        Do for j = k + 1 to n + 1
          a[i][j] = a[i][j] − (a[i][k]/a[k][k]) · a[k][j]
        End for j
      End for i
    End for k
(5) Compute x[n] = a[n][n + 1]/a[n][n]
(6) Do for i = n − 1 to 1
      sum = 0
      Do for j = i + 1 to n
        sum = sum + a[i][j] · x[j]
      End for j
      x[i] = (a[i][n + 1] − sum)/a[i][i]
    End for i
(7) Display the result x[i]
(8) Stop
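The pseudocode above translates almost line by line into Python. The following sketch (no pivoting, so it assumes every pivot is nonzero; names are ours) solves a small 3 × 3 test system whose solution is (1, −1, 1).

```python
def gauss_solve(a, b):
    """Solve Ax = b by forward elimination and back substitution (no pivoting).

    a: list of n rows of n numbers, b: list of n numbers.
    A sketch of the algorithm above; it assumes all pivots a[k][k] are nonzero.
    """
    n = len(a)
    # build the augmented matrix [A | b]
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n - 1):                 # elimination step k
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]     # multiplier m_ik
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

print(gauss_solve([[1, 2, 1], [2, 2, 3], [-1, -3, 0]], [0, 3, 2]))
```

Partial pivoting (next paragraph) would add a row interchange before each elimination step.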
Partial Pivoting: In the elimination process, it is assumed that each pivot element a_{ii} ≠ 0, i = 1, 2, . . . , n.
If at any stage of elimination one of the pivots becomes small (or zero), then we bring another element into
the pivot position by interchanging rows.
Remark 2.1. Unique Solution, No Solution, or Infinitely Many Solutions.
Here are some tips that allow us to determine what type of solution we have, based on the
reduced echelon form.
1. If we have a leading one in every column, then the system has a unique solution.
2. If we have a row of zeros equal to a non-zero number on the right side, then the system has no solution.
3. If we don't have a leading one in every column in a homogeneous system (i.e. a system where all
the equations equal zero), or we have a row of zeros, then the system has infinitely many solutions.
Example 1. Solve the system of equations
6x1 + 2x2 + 2x3 = −2
2x1 + (2/3)x2 + (1/3)x3 = 1
x1 + 2x2 − x3 = 0.
This system has the solution x1 = 2.6, x2 = −3.8, x3 = −5.0.
Sol. Let us use four-digit floating-point decimal arithmetic (rounded). The augmented matrix is
[ 6.000  2.000   2.000  | −2.000 ]
[ 2.000  0.6667  0.3333 |  1.000 ]
[ 1.000  2.000  −1.000  |  0.0   ]
The multipliers are m21 = 2/6 = 0.3333 and m31 = 1/6 = 0.1667, and the new entries are
a21^(2) = a21 − m21·a11, a22^(2) = a22 − m21·a12, etc. The elimination gives
[ 6.000  2.000      2.000  | −2.000  ]
[ 0.0    0.0001000 −0.3333 |  1.667  ]
[ 0.0    1.667     −1.333  |  0.3334 ]
Without pivoting, the multiplier is
m32 = 1.667/0.0001 = 16670
and elimination yields
[ 6.000  2.000      2.000  | −2.000 ]
[ 0.0    0.0001000 −0.3333 |  1.667 ]
[ 0.0    0.0        5555   | −27790 ]
The small pivot 0.0001000 (which in exact arithmetic would not be small at all) causes a loss of significance that destroys the accuracy of the computed solution. With partial pivoting we instead interchange rows 2 and 3 before eliminating, which gives
[ 6.000  2.000   2.000  | −2.000  ]
[ 0.0    1.667  −1.333  |  0.3334 ]
[ 0.0    0.0    −0.3332 |  1.667  ]
Using back substitution, we obtain
x3 = −5.003
x2 = −3.801
x1 = 2.602.
We see that after partial pivoting, we get the desired solution.
Complete Pivoting: In the first stage of elimination, we search for the largest element in magnitude
in the entire matrix and bring it to the position of the first pivot. We repeat the same process at every
step of elimination. This process requires interchanges of both rows and columns.
Scaled Partial Pivoting: In this approach, the algorithm selects as the pivot the entry that is
largest relative to the other entries in its row.
At the beginning, a scale factor must be computed for each equation in the system. We define
s_i = max_{1≤j≤n} |a_{ij}|, 1 ≤ i ≤ n.
These numbers are recorded in the scale vector s = [s1, s2, …, sn]. Note that the scale vector does not
change throughout the procedure. In starting the forward elimination process, we do not arbitrarily
use the first equation as the pivot equation. Instead, we use the equation for which the ratio |a_{i1}|/s_i is
greatest. We repeat the process at later stages, always keeping the same scale factors.
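The pivot-selection rule can be isolated in a few lines of Python. This sketch (the helper name is ours) reproduces the first pivot choice for the matrix of Example 2 below.

```python
def choose_pivot_row(a, col, rows, scale):
    """Return the row index (among 'rows') maximizing |a[i][col]| / scale[i].

    This is the scaled-partial-pivoting rule described above; scale[i] is
    s_i = max_j |a[i][j]|, computed once at the start and never updated.
    """
    return max(rows, key=lambda i: abs(a[i][col]) / scale[i])

A = [[3, -13, 9, 3],
     [-6, 4, 1, -18],
     [6, -2, 2, 4],
     [12, -8, 6, 10]]
scale = [max(abs(v) for v in row) for row in A]   # s = [13, 18, 6, 12]
# Ratios for column 0 are (3/13, 6/18, 6/6, 12/12); rows 2 and 3 tie at 1,
# and max() returns the first maximal row, i.e. index 2 (the third row).
print(scale)
print(choose_pivot_row(A, 0, range(4), scale))
```

In a full elimination one would call this once per column, restricted to the rows not yet used as pivots, and swap the chosen row (and its scale factor) into place.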
Example 2. Solve the system
3x1 − 13x2 + 9x3 + 3x4 = −19
−6x1 + 4x2 + x3 − 18x4 = −34
6x1 − 2x2 + 2x3 + 4x4 = 16
12x1 − 8x2 + 6x3 + 10x4 = 26
by hand using scaled partial pivoting. Justify all row interchanges and write out the transformed matrix
after you finish working on each column.
Sol. The augmented matrix is
[  3 −13  9   3 | −19 ]
[ −6   4  1 −18 | −34 ]
[  6  −2  2   4 |  16 ]
[ 12  −8  6  10 |  26 ]
and the scale factors are s1 = 13, s2 = 18, s3 = 6, and s4 = 12. We need to pick the largest of the ratios
(3/13, 6/18, 6/6, 12/12), which is the third entry, so we interchange rows 1 and 3 and interchange s1 and s3 to get
[  6  −2  2   4 |  16 ]
[ −6   4  1 −18 | −34 ]
[  3 −13  9   3 | −19 ]
[ 12  −8  6  10 |  26 ]
with s1 = 6, s2 = 18, s3 = 13, s4 = 12. Performing R2 − (−6/6)R1 → R2, R3 − (3/6)R1 → R3, and
R4 − (12/6)R1 → R4, we obtain
[ 6  −2  2   4 |  16 ]
[ 0   2  3 −14 | −18 ]
[ 0 −12  8   1 | −27 ]
[ 0  −4  2   2 |  −6 ]
Comparing the ratios (|a22|/s2 = 2/18, |a32|/s3 = 12/13, |a42|/s4 = 4/12), the largest is 12/13, so we
interchange rows 2 and 3 and interchange s2 and s3 to get
[ 6  −2  2   4 |  16 ]
[ 0 −12  8   1 | −27 ]
[ 0   2  3 −14 | −18 ]
[ 0  −4  2   2 |  −6 ]
with s1 = 6, s2 = 13, s3 = 18, s4 = 12. Performing R3 + (2/12)R2 → R3 and R4 − (4/12)R2 → R4,
we get
[ 6  −2   2      4    |  16   ]
[ 0 −12   8      1    | −27   ]
[ 0   0  13/3 −83/6   | −45/2 ]
[ 0   0  −2/3   5/3   |   3   ]
Comparing the ratios (|a33|/s3 = (13/3)/18, |a43|/s4 = (2/3)/12), the largest is the first entry, so we do not
interchange rows. Performing R4 + (2/13)R3 → R4, we get the final reduced matrix
[ 6  −2   2      4    |  16   ]
[ 0 −12   8      1    | −27   ]
[ 0   0  13/3 −83/6   | −45/2 ]
[ 0   0   0    −6/13  | −6/13 ]
Backward substitution gives x4 = 1, x3 = −2, x2 = 1, x1 = 3.
Example 3. Solve this system of linear equations:
0.0001x + y = 1
x+y =2
using no pivoting, partial pivoting, and scaled partial pivoting. Carry at most five significant digits
of precision (rounding) to see how finite precision computations and roundoff errors can affect the
calculations.
Sol. By direct substitution, it is easy to verify that the true solution is x = 1.0001 and y = 0.99990 to
five significant digits.
For no pivoting, the first equation in the original system is the pivot equation, and the multiplier is
1/0.0001 = 10000. The new system of equations is
0.0001x + y = 1
−9999y = −9998
the correct value of x.
We repeat the solution process using partial pivoting for the original system. We see that the second
entry is larger, so the second equation is used as the pivot equation. We can interchange the two
equations, obtaining
x+y =2
0.0001x + y = 1
which gives y = 0.99980/0.99990 ≈ 0.99990 and x = 2 − y = 2 − 0.99990 = 1.0001.
Both computed values of x and y are correct to five significant digits.
We repeat the solution process using scaled partial pivoting for the original system. Since the scaling
constants are s = (1, 1) and the ratios for determining the pivot equation are (0.0001/1, 1/1), the
second equation is now the pivot equation. We do not actually interchange the equations and use
the second equation as the first pivot equation. The rest of the calculations are as above for partial
pivoting. The computed values of x and y are correct to five significant digits.
Operations count for Gauss elimination. We consider the number of floating point operations
(flops) required to solve the system Ax = b. Gaussian elimination first uses row operations to
transform the problem into an equivalent problem of the form Ux = b′, where U is upper triangular.
Then back substitution is used to solve for x. First we look at how many floating point operations are
required to reduce A to upper triangular form.
First a multiplier is computed for each of the (n − 1) rows below the pivot. Then in each such row the
algorithm performs n multiplies and n adds (on the entries to the right of the pivot column, including
the right-hand side). This gives a total of (n − 1) + (n − 1)n multiplies (counting the computation of
the multiplier in each of the (n − 1) rows) and (n − 1)n adds.
In total this is 2n² − n − 1 floating point operations to do a single pivot on the n by n system.
Then this has to be done recursively on the lower right subsystem, which is an (n − 1) by (n − 1) system.
This requires 2(n − 1)² − (n − 1) − 1 operations. Then this has to be done on the next subsystem,
requiring 2(n − 2)² − (n − 2) − 1 operations, and so on.
In total, then, forward elimination uses I_n floating point operations, with
I_n = Σ_{k=1}^{n} (2k² − k − 1) = n(n + 1)(4n − 1)/6 − n ≈ (2/3)n³.
Counts for back substitution: To find x_n we just require one division. Then to solve for x_{n−1} we
require 3 flops. Similarly, solving for x_{n−2} requires 5 flops. Thus in total back substitution requires
B_n floating point operations, with
B_n = Σ_{k=1}^{n} (2k − 1) = n(n + 1) − n = n².
The LU Factorization: When we use matrix multiplication, another meaning can be given to
Gauss elimination: the matrix A can be factored into the product of two triangular matrices.
Let Ax = b be the system to be solved, where A is the n × n coefficient matrix. The linear system can be reduced
to the upper triangular system Ux = g with
U = [ u11 u12 · · · u1n ]
    [  0  u22 · · · u2n ]
    [  ⋮       ⋱     ⋮  ]
    [  0   0  · · · unn ]
Here the u_{ij} are the final coefficients produced by the elimination. Introduce an auxiliary lower
triangular matrix L based on the multipliers m_{ij} as follows:
L = [ 1    0    0    · · ·      0 ]
    [ m21  1    0    · · ·      0 ]
    [ m31  m32  1    · · ·      0 ]
    [ ⋮    ⋮         ⋱          ⋮ ]
    [ mn1  mn2  · · ·  m_{n,n−1} 1 ]
Theorem 2.1. Let A be a non-singular matrix and let L and U be defined as above. If U is produced
without pivoting then
LU = A.
Example. Consider the system
4x1 + 3x2 + 2x3 + x4 = 1
3x1 + 4x2 + 3x3 + 2x4 = 1
2x1 + 3x2 + 4x3 + 3x4 = 1
x1 + 2x2 + 3x3 + 4x4 = 1.
The augmented matrix is
[ 4 3 2 1 | 1 ]
[ 3 4 3 2 | 1 ]
[ 2 3 4 3 | 1 ]
[ 1 2 3 4 | 1 ]
The multipliers are m21 = 3/4, m31 = 2/4 = 1/2, and m41 = 1/4.
Replace R2 with R2 − m21 R1, R3 with R3 − m31 R1 and R4 with R4 − m41 R1:
[ 4  3    2    1    | 1   ]
[ 0  7/4  3/2  5/4  | 1/4 ]
[ 0  3/2  3    5/2  | 1/2 ]
[ 0  5/4  5/2  15/4 | 3/4 ]
The multipliers are m32 = 6/7 and m42 = 5/7.
Replace R3 with R3 − m32 R2 and R4 with R4 − m42 R2; we obtain
[ 4  3    2     1     | 1   ]
[ 0  7/4  3/2   5/4   | 1/4 ]
[ 0  0    12/7  10/7  | 2/7 ]
[ 0  0    10/7  20/7  | 4/7 ]
The multiplier is m43 = 5/6, and we replace R4 with R4 − m43 R3:
[ 4  3    2     1     | 1   ]
[ 0  7/4  3/2   5/4   | 1/4 ]
[ 0  0    12/7  10/7  | 2/7 ]
[ 0  0    0     5/3   | 1/3 ]
Therefore
U = [ 4  3    2     1    ]
    [ 0  7/4  3/2   5/4  ]
    [ 0  0    12/7  10/7 ]
    [ 0  0    0     5/3  ]
and
L = [ 1    0    0    0 ]
    [ 3/4  1    0    0 ]
    [ 1/2  6/7  1    0 ]
    [ 1/4  5/7  5/6  1 ]
It can be verified that LU = A.
3. Iterative Method
The linear system Ax = b may have a large order. For such systems, Gauss elimination is often too
expensive in either computation time or computer memory requirements, or both.
In an iterative method, a sequence of progressively better iterates is produced to approximate the solution.
Jacobi and Gauss-Seidel Method: We start with an example and let us consider a system of
equations
9x1 + x2 + x3 = 10
2x1 + 10x2 + 3x3 = 19
3x1 + 4x2 + 11x3 = 0.
One class of iterative methods for solving this system works as follows.
We write
x1 = (1/9)(10 − x2 − x3)
x2 = (1/10)(19 − 2x1 − 3x3)
x3 = (1/11)(0 − 3x1 − 4x2).
Let x^(0) = [x1^(0), x2^(0), x3^(0)]^T be an initial approximation of the solution x. Then define the iteration
x1^(k+1) = (1/9)(10 − x2^(k) − x3^(k))
x2^(k+1) = (1/10)(19 − 2x1^(k) − 3x3^(k))
x3^(k+1) = (1/11)(0 − 3x1^(k) − 4x2^(k)), k = 0, 1, 2, . . . .
This is called the Jacobi method, or the method of simultaneous replacements. The method is named after the German
mathematician Carl Gustav Jacob Jacobi.
We start with x^(0) = [0, 0, 0]^T and obtain
x^(1) = (1.1111, 1.9000, 0.0000), x^(2) = (0.9000, 1.6778, −0.9939), . . . ,
and the iterates converge to the true solution (1, 2, −1).
A related method, the Gauss-Seidel method (method of successive replacements), uses each new component as soon as it has been computed. For the general system Ax = b it reads
a11 x1^(k+1) = −(a12 x2^(k) + · · · + a1n xn^(k)) + b1
a21 x1^(k+1) + a22 x2^(k+1) = −(a23 x3^(k) + · · · + a2n xn^(k)) + b2
..................................
an1 x1^(k+1) + an2 x2^(k+1) + · · · + ann xn^(k+1) = bn
or (D + L)x^(k+1) = −U x^(k) + b,
where D, L and U are the diagonal, strictly lower triangular and strictly upper triangular parts of A, respectively. Hence
x^(k+1) = −(D + L)⁻¹ U x^(k) + (D + L)⁻¹ b,
i.e. x^(k+1) = T x^(k) + B, k = 0, 1, 2, . . . .
Here T = −(D + L)⁻¹ U, and this matrix is called the iteration matrix.
Algorithm [Gauss-Seidel]
(1) Input matrix A = [a_{ij}], b, XO = x^(0), tolerance TOL, maximum number of iterations N
(2) Set k = 1
(3) While (k ≤ N) do steps 4–7
(4) For i = 1, 2, …, n set
      x_i = (1/a_{ii}) (− Σ_{j=1}^{i−1} a_{ij} x_j − Σ_{j=i+1}^{n} a_{ij} XO_j + b_i)
(5) If ||x − XO|| < TOL then OUTPUT x and STOP
(6) Set k = k + 1
(7) Set XO = x
(8) OUTPUT a message of failure and STOP
If the iteration x^(k+1) = T x^(k) + B converges, say x^(k) → x as k → ∞, then the limit satisfies x = Tx + B. Therefore
lim_{k→∞} x^(k) = (I − T)⁻¹ B,
and it can be shown that the iteration converges for every initial guess x^(0) if and only if the spectral radius of the iteration matrix satisfies ρ(T) < 1.
A convenient sufficient condition is strict diagonal dominance:
|a_{ii}| > Σ_{j=1, j≠i}^{n} |a_{ij}|, i = 1, 2, …, n.
Theorem. If A is strictly diagonally dominant, then the Gauss-Seidel iteration converges for any initial guess x^(0).
Proof. Let λ be an eigenvalue of T = −(D + L)⁻¹U with eigenvector x, scaled so that max_j |x_j| = 1.
From Tx = λx we get −Ux = λ(D + L)x, i.e. for each i = 1, 2, …, n,
λ a_{ii} x_i = −λ Σ_{j=1}^{i−1} a_{ij} x_j − Σ_{j=i+1}^{n} a_{ij} x_j.
Choose i so that |x_i| = 1 ≥ |x_j| for all j. Then
|λ| |a_{ii}| ≤ |λ| Σ_{j=1}^{i−1} |a_{ij}| + Σ_{j=i+1}^{n} |a_{ij}|
⇒ |λ| ≤ Σ_{j=i+1}^{n} |a_{ij}| / (|a_{ii}| − Σ_{j=1}^{i−1} |a_{ij}|).
By strict diagonal dominance, |a_{ii}| − Σ_{j<i} |a_{ij}| > Σ_{j>i} |a_{ij}|, so |λ| < 1. Hence ρ(T) < 1 and the iteration converges.
Example. Let
A = [ 1 2 −2 ]
    [ 1 1  1 ]
    [ 2 2  1 ]
Decide whether Gauss-Seidel converges to the solution of Ax = b.
Sol. The iteration matrix of the Gauss-Seidel method is
T = −(D + L)⁻¹ U
  = −[ 1 0 0 ]⁻¹ [ 0 2 −2 ]
     [ 1 1 0 ]   [ 0 0  1 ]
     [ 2 2 1 ]   [ 0 0  0 ]
  = −[ 1  0 0 ] [ 0 2 −2 ]
     [−1  1 0 ] [ 0 0  1 ]
     [ 0 −2 1 ] [ 0 0  0 ]
  = −[ 0  2 −2 ]
     [ 0 −2  3 ]
     [ 0  0 −2 ]
  = [ 0 −2  2 ]
    [ 0  2 −3 ]
    [ 0  0  2 ]
The eigenvalues of the iteration matrix T are λ = 0, 2, 2, and therefore the spectral radius ρ(T) = 2 > 1. The iteration
diverges.
4. Power method for approximating eigenvalues
The eigenvalues of an n × n matrix A are obtained by solving its characteristic equation
det(A − λI) = 0, i.e. λ^n + c_{n−1}λ^{n−1} + c_{n−2}λ^{n−2} + · · · + c_0 = 0.
For large values of n, polynomial equations like this one are difficult and time-consuming to solve,
and sensitive to rounding errors. In this section we look at an alternative method for approximating
eigenvalues. The method can be used only to find the eigenvalue of A that is largest in absolute value.
We call this eigenvalue the dominant eigenvalue of A.
Definition 4.1 (Dominant Eigenvalue and Dominant Eigenvector). Let λ1, λ2, …, λn be the
eigenvalues of an n × n matrix A. λ1 is called the dominant eigenvalue of A if
|λ1| > |λi|, i = 2, 3, . . . , n.
The eigenvectors corresponding to λ1 are called dominant eigenvectors of A.
Write x0 = c1x1 + c2x2 + · · · + cnxn, where x1, …, xn are linearly independent eigenvectors and c1 ≠ 0. Then
A^k x0 = c1 λ1^k x1 + c2 λ2^k x2 + · · · + cn λn^k xn
       = λ1^k [ c1 x1 + c2 (λ2/λ1)^k x2 + · · · + cn (λn/λ1)^k xn ].
Now, from our original assumption that λ1 is larger in absolute value than the other eigenvalues, it
follows that each of the fractions
|λ2/λ1|, |λ3/λ1|, …, |λn/λ1| < 1.
Therefore each of the factors
(λ2/λ1)^k, (λ3/λ1)^k, …, (λn/λ1)^k
must approach 0 as k approaches infinity. This implies that the approximation
A^k x0 ≈ λ1^k c1 x1, c1 ≠ 0,
improves as k increases. Since x1 is a dominant eigenvector, it follows that any scalar multiple of x1 is
also a dominant eigenvector. Thus we have shown that A^k x0 approaches a multiple of the dominant
eigenvector of A.
Algorithm
(1) Start
(2) Define matrix A and initial guess x
(3) Calculate y = Ax
(4) Find the largest element in magnitude of y and assign it to K
(5) Calculate the fresh value x = (1/K)y
(6) If |K^(n) − K^(n−1)| > error, go to step 3
(7) Stop
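A Python sketch of the algorithm (scaling by the largest entry in magnitude, as in steps 4–5; a fixed iteration count stands in for the error test). The matrix is the one used in Example 7 below, whose dominant eigenvalue is 3 with eigenvector (0.5, 0.5, 1).

```python
def power_method(a, x0, num_iter=50):
    """Power method with scaling by the largest component in magnitude."""
    n = len(a)
    x = list(x0)
    k = 0.0
    for _ in range(num_iter):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        k = max(y, key=abs)          # scaling factor -> dominant eigenvalue
        x = [v / k for v in y]
    return k, x

A = [[1, 2, 0], [-2, 1, 2], [1, 3, 1]]
lam, vec = power_method(A, [1, 1, 1])
print(lam, vec)
```

The scaling factors converge at the rate |λ2/λ1| per step, so a large gap between the dominant and subdominant eigenvalues means fast convergence.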
Example 7. Calculate seven iterations of the power method to approximate the dominant
eigenvalue and eigenvector of the matrix
A = [ 1 2 0 ]
    [−2 1 2 ]
    [ 1 3 1 ]
Sol. Using x0 = [1, 1, 1]^T as the initial approximation, we obtain
y1 = Ax0 = [ 1 2 0 ] [1]   [3]
           [−2 1 2 ] [1] = [1]
           [ 1 3 1 ] [1]   [5]
x1 = (1/5)[3; 1; 5] = [0.60; 0.20; 1.00].
Similarly we get
y2 = Ax1 = [1.00; 1.00; 2.20] = 2.20 [0.45; 0.45; 1.00] = 2.20 x2
y3 = Ax2 = [1.35; 1.55; 2.80] = 2.80 [0.48; 0.55; 1.00] = 2.80 x3
y4 = Ax3 = 3.1 [0.51; 0.51; 1.00]
etc.
After several iterations, we observe that the dominant eigenvector is
x = [0.50; 0.50; 1.00]
and the scaling factors approach the dominant eigenvalue λ = 3.
Remark 4.1. The power method is useful for computing an eigenvalue, but it gives only the dominant eigenvalue. To find the other eigenvalues we use properties of the matrix, such as: the sum of all eigenvalues is equal to the
trace of the matrix. Also, if λ is an eigenvalue of A then 1/λ is an eigenvalue of A⁻¹. Hence the reciprocal of the smallest
(in magnitude) eigenvalue of A is the dominant eigenvalue of A⁻¹.
Remark 4.2. Consider A − μI; its eigenvalues are (λ1 − μ, λ2 − μ, . . . ). The eigenvalues
of (A − μI)⁻¹ are (1/(λ1 − μ), 1/(λ2 − μ), . . . ).
The eigenvalue of the original matrix A that is closest to μ corresponds to the eigenvalue of largest
magnitude of the shifted and inverted matrix (A − μI)⁻¹.
To find the eigenvalue closest to μ, we apply the power method to obtain the dominant eigenvalue η of (A − μI)⁻¹.
Then we recover the eigenvalue of the original problem by λ = 1/η + μ. This method is called the shifted
inverse power method. At each step we solve y = (A − μI)⁻¹x, i.e. (A − μI)y = x, so we need not compute the inverse
of the matrix.
Example 8. Find the eigenvalue of the matrix
A = [ 2 1 0 ]
    [ 1 2 1 ]
    [ 0 1 2 ]
nearest to 3 using the power method.
Sol. The eigenvalue of matrix A which is nearest to 3 corresponds to the smallest eigenvalue in magnitude of A − 3I,
hence to the largest eigenvalue of (A − 3I)⁻¹ in magnitude. Now
A − 3I = [ −1  1  0 ]
         [  1 −1  1 ]
         [  0  1 −1 ]
B = (A − 3I)⁻¹ = [ 0 1 1 ]
                 [ 1 1 1 ]
                 [ 1 1 0 ]
Starting with
x0 = [1; −1; 1],
we obtain
y1 = Bx0 = [ 0 1 1 ] [ 1]   [0]
           [ 1 1 1 ] [−1] = [1] = 1·x1
           [ 1 1 0 ] [ 1]   [0]
y2 = Bx1 = [1; 1; 1] = 1·x2
y3 = Bx2 = [2; 3; 2] = 3 [0.6667; 1; 0.6667] = 3x3
y4 = Bx3 = [1.6667; 2.3334; 1.6667] = 2.3334 [0.7143; 1; 0.7143] = 2.3334 x4.
After six iterations, we obtain the dominant eigenvalue of matrix B, which is approximately 2.4, with dominant
eigenvector
[0.7143; 1; 0.7143].
Now the corresponding eigenvalue of matrix A is 3 ± 1/2.4 = 3 ± 0.42, i.e. 3.42 or 2.58. Since 2.58 does not satisfy
det(A − 2.58I) = 0, the correct eigenvalue of matrix A nearest to 3 is 3.42.
Although the power method worked well in these examples, we must say something about cases in
which the power method may fail. There are basically three such cases:
1. Using the power method when A is not diagonalizable. Recall that A has n linearly independent
eigenvectors if and only if A is diagonalizable. Of course, it is not easy to tell by just looking at A
whether it is diagonalizable.
2. Using the power method when A does not have a dominant eigenvalue, or when the dominant
eigenvalue is such that |λ1| = |λ2| with λ1 ≠ λ2.
3. If the entries of A contain significant error, powers A^k will have significant roundoff error in their
entries.
Exercises
(1) Using four-decimal-place (rounded) arithmetic, solve the following system of equations without and with
pivoting:
0.729x1 + 0.81x2 + 0.9x3 = 0.6867
x1 + x2 + x3 = 0.8338
1.331x1 + 1.21x2 + 1.1x3 = 1.000
This system has the exact solution, rounded to four places, x1 = 0.2245, x2 = 0.2814, x3 = 0.3279.
(2) Solve the following system of equations by Gaussian elimination with partial and scaled partial
pivoting
x1 + 2x2 + x3 = 3
3x1 + 4x2 = 3
2x1 + 10x2 + 4x3 = 10.
(3) Consider the linear system
x1 + 4x2 = 1
4x1 + x2 = 0.
The true solution is x1 = −1/15 and x2 = 4/15. Apply the Jacobi and Gauss-Seidel methods
with x^(0) = [0, 0]^T to the system and verify that both methods diverge rapidly. Next, interchange
the two equations to write the system as
4x1 + x2 = 0
x1 + 4x2 = 1
and apply both methods with x^(0) = [0, 0]^T.
Iterate until ||x − x^(k)|| ≤ 10⁻⁵. Which method converges faster?
(4) Solve the system of equations by the Jacobi and Gauss-Seidel methods:
8x1 + x2 + 2x3 = 1
x1 − 5x2 + x3 = 16
x1 + x2 − 4x3 = 7.
(5) Solve this system of equations by Gauss-Seidel, starting with the initial vector [0,0,0]:
4.63x1 1.21x2 + 3.22x3 = 2.22
3.07x1 + 5.48x2 + 2.11x3 = 3.17
1.26x1 + 3.11x2 + 4.57x3 = 5.11.
(6) Show that the Gauss-Seidel method does not converge for the following system of equations:
2x1 + 3x2 + x3 = 1
3x1 + 2x2 + 2x3 = 1
x1 + 2x2 + 2x3 = 1.
(7) Consider the iteration
x^(k+1) = b + α [ 2 1 ] x^(k),  k ≥ 0,
                [ 1 2 ]
where α is a real constant. For some values of α, the iteration method converges for any choice
of initial guess x^(0), and for some other values of α, the method diverges. Find the values of α
for which the method converges.
(8) Determine the largest eigenvalue and the corresponding eigenvector of the matrix
4 1 0
A = 1 20 1
0 1 4
correct to three decimals using the power method.
(9) Find the smallest eigenvalue in magnitude of the matrix
[ 2 1 0 ]
[ 1 2 1 ]
[ 0 1 2 ]
using four iterations of the inverse power method.
Bibliography
[Gerald]
Curtis F. Gerald and Patrick O. Wheatley, Applied Numerical Analysis, 7th edition,
Pearson, 2003.
[Atkinson] K. Atkinson and W. Han. Elementary Numerical Analysis, 3rd edition, John Wiley
and Sons, 2004.
CHAPTER 4 (6 LECTURES)
POLYNOMIAL INTERPOLATION AND APPROXIMATIONS
1. Introduction
Polynomials are used as the basic means of approximation in nearly all areas of numerical analysis.
They are used in the solution of equations and in the approximation of functions, of integrals and
derivatives, of solutions of integral and differential equations, etc. Polynomials have simple structure,
which makes it easy to construct effective approximations and then make use of them. For this reason,
the representation and evaluation of polynomials is a basic topic in numerical analysis. We discuss this
topic in the present chapter in the context of polynomial interpolation, the simplest and certainly the
most widely used technique for obtaining polynomial approximations.
Definition 1.1 (Polynomial). A polynomial Pn (x) of degree n is, by definition, a function of the
form
Pn(x) = a0 + a1x + a2x² + · · · + anx^n   (1.1)
with certain coefficients a0, a1, …, an. This polynomial has (exact) degree n in case its leading coefficient an is nonzero.
The power form (1.1) is the standard way to specify a polynomial in mathematical discussions. It is
a very convenient form for differentiating or integrating a polynomial. But, in various specific contexts,
other forms are more convenient. For example, the following shifted power form may be helpful.
P(x) = a0 + a1(x − c) + a2(x − c)² + · · · + an(x − c)^n.   (1.2)
It is good practice to employ the shifted power form with the center c chosen somewhere in the interval
[a, b] when interested in a polynomial on that interval.
Remark 1.1. The coefficients in the shifted power form provide derivative values, i.e.,
ai = P^(i)(c)/i!, i = 0, 1, 2, …, n.
In effect, the shifted power form provides the Taylor expansion for P(x) around the center c.
Definition 1.2 (Newton form). A further generalization of the shifted power form is the following
Newton form
P(x) = a0 + a1(x − c1) + a2(x − c1)(x − c2) + · · · + an(x − c1)(x − c2) · · · (x − cn).
This form plays a major role in the construction of an interpolating polynomial. It reduces to the
shifted power form if the centers c1, …, cn all equal c, and to the power form if the centers c1, …, cn
all equal zero. The following discussion on the evaluation of the Newton form therefore applies directly
to these simpler forms as well.
It is inefficient to evaluate each of the n + 1 terms in the Newton form separately and then sum; this
would take n + n(n + 1)/2 additions and n(n + 1)/2 multiplications. Instead, we notice that the factor
(x − c1) occurs in all terms but the first, the factor (x − c2) occurs in all the remaining factors, and so on
for (x − c3), etc. Finally we get
P(x) = a0 + (x − c1){a1 + (x − c2)[a2 + (x − c3)[a3 + · · · + (x − c_{n−1})[a_{n−1} + (x − cn)an] · · · ]]}.
Evaluating P(x) in this form for any particular value of x takes 2n additions and n multiplications.
Theorem 1.3 (Algorithm: Nested Multiplication). Let P(x) be the polynomial in Newton form having
coefficients a0, a1, …, an and centers c1, c2, …, cn. The following algorithm computes y = P(x) for a
given real number x:
y = an
for i = n − 1, n − 2, …, 0 do
  y = ai + (x − c_{i+1})y
end.
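The algorithm is a direct generalization of Horner's rule. A Python sketch (0-based lists, so `c[i]` stores the center c_{i+1}):

```python
def nested_eval(a, c, x):
    """Evaluate the Newton-form polynomial
    P(x) = a0 + a1(x - c1) + ... + an(x - c1)...(x - cn)
    with n multiplications, as in the algorithm above.
    a = [a0, ..., an], c = [c1, ..., cn].
    """
    y = a[-1]
    for i in range(len(a) - 2, -1, -1):
        y = a[i] + (x - c[i]) * y   # c[i] holds c_{i+1} in 0-based indexing
    return y

# With all centers zero this reduces to ordinary Horner's rule:
# 2 + 3x + x^2 at x = 2 gives 12.
print(nested_eval([2, 3, 1], [0, 0], 2.0))  # 12.0
```

With genuinely distinct centers the same call evaluates a Newton interpolating polynomial at no extra cost, which is what makes this form attractive in the next section.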
Example 1. Consider the interpolating polynomial
P3(x) = 3 − 7(x + 1) + 8(x + 1)x − 6(x + 1)x(x − 1).
We will use nested multiplication to write this polynomial in the power form
P3(x) = b0 + b1x + b2x² + b3x³.
This requires repeatedly applying nested multiplication to a polynomial of the form
P(x) = a0 + a1(x − c1) + a2(x − c1)(x − c2) + a3(x − c1)(x − c2)(x − c3),
and for each application it will perform the following steps:
b3 = a3
b2 = a2 + (z − c3)b3
b1 = a1 + (z − c2)b2
b0 = a0 + (z − c1)b1.
Initially, we have
P(x) = 3 − 7(x + 1) + 8(x + 1)x − 6(x + 1)x(x − 1),
so the coefficients of P(x) in this Newton form are
a0 = 3, a1 = −7, a2 = 8, a3 = −6
with the centers
c1 = −1, c2 = 0, c3 = 1.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
b3 = −6
b2 = 8 + (0 − 1)(−6) = 14
b1 = −7 + (0 − 0)(14) = −7
b0 = 3 + (0 − (−1))(−7) = −4.
It follows that
P(x) = −4 + (−7)(x − 0) + 14(x − 0)(x − (−1)) + (−6)(x − 0)(x − (−1))(x − 0)
     = −4 − 7x + 14x(x + 1) − 6x²(x + 1).
For the second application of nested multiplication, we have the centers
c1 = 0, c2 = −1, c3 = 0
with coefficients
a0 = −4, a1 = −7, a2 = 14, a3 = −6.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
b3 = −6
b2 = 14 + (0 − 0)(−6) = 14
b1 = −7 + (0 − (−1))(14) = 7
b0 = −4 + (0 − 0)(7) = −4.
It follows that
P(x) = −4 + 7(x − 0) + 14(x − 0)(x − 0) + (−6)(x − 0)(x − 0)(x − (−1))
     = −4 + 7x + 14x² − 6x²(x + 1).
For the third and final application of nested multiplication, we have the centers
c1 = 0, c2 = 0, c3 = −1
with coefficients
a0 = −4, a1 = 7, a2 = 14, a3 = −6.
Applying nested multiplication to these coefficients and centers, with z = 0, yields
b3 = −6
b2 = 14 + (0 − (−1))(−6) = 8
b1 = 7 + (0 − 0)(8) = 7
b0 = −4 + (0 − 0)(7) = −4.
It follows that
P(x) = −4 + 7(x − 0) + 8(x − 0)(x − 0) + (−6)(x − 0)(x − 0)(x − 0)
     = −4 + 7x + 8x² − 6x³.
Since all of the centers are now equal to zero, the polynomial is in the power form.
2. Interpolation
In this chapter, we consider the interpolation problem. Suppose we do not know the function f, but
only a few data values of f. We then try to compute a function g that approximates f.
2.1. Polynomial Interpolation. The polynomial interpolation problem, also called Lagrange interpolation, can be described as follows: given (n+1) data points (xi, yi), i = 0, 1, …, n, find a polynomial
P of lowest possible degree such that
yi = P(xi), i = 0, 1, …, n.
Such a polynomial is said to interpolate the data. Here yi may be the value of some unknown function
f at xi , i.e. yi = f (xi ).
One reason for considering the class of polynomials in the approximation of functions is that they uniformly
approximate continuous functions.
Theorem 2.1 (Weierstrass Approximation Theorem). Suppose that f is defined and continuous on
[a, b]. For any ε > 0, there exists a polynomial P(x) defined on [a, b] with the property that
|f(x) − P(x)| < ε, ∀ x ∈ [a, b].
Another reason for considering the class of polynomials in approximation of functions is that the
derivatives and indefinite integrals of a polynomial are easy to compute.
Theorem 2.2 (Existence and Uniqueness). Given a real-valued function f(x) and n + 1 distinct points
x0, x1, …, xn, there exists a unique polynomial Pn(x) of degree ≤ n which interpolates the unknown
f(x) at the points x0, x1, …, xn.
Proof. Existence: Let the data (xi, f(xi)), i = 0, 1, …, n, be given. We prove the result by mathematical induction.
The Theorem clearly holds for n = 0: only one data point is given and we can take the constant polynomial
P0(x) = f(x0), ∀ x.
Assume that the Theorem holds for n ≤ k, i.e. there is a polynomial Pk of degree ≤ k such that
Pk(xi) = f(xi) for 0 ≤ i ≤ k.
Now we construct a polynomial of degree at most k + 1 to interpolate (xi, f(xi)), 0 ≤ i ≤ k + 1.
Let
P_{k+1}(x) = Pk(x) + c(x − x0)(x − x1) · · · (x − xk).
For x = x_{k+1},
P_{k+1}(x_{k+1}) = f(x_{k+1}) = Pk(x_{k+1}) + c(x_{k+1} − x0)(x_{k+1} − x1) · · · (x_{k+1} − xk)
⇒ c = (f(x_{k+1}) − Pk(x_{k+1})) / ((x_{k+1} − x0)(x_{k+1} − x1) · · · (x_{k+1} − xk)).
Since the xi are distinct, c is well-defined and the degree of P_{k+1} is ≤ k + 1, which completes the induction.
(2.1)
where a and b are arbitrary constants satisfying the interpolating conditions f (x0 ) = P (x0 ) and
f (x1 ) = P (x1 ). We have
f(x0) = P(x0) = ax0 + b
f(x1) = P(x1) = ax1 + b.
Lagrange interpolation: Solving for a and b, we obtain
a = [f(x0) − f(x1)] / (x0 − x1),
b = [f(x0)x1 − f(x1)x0] / (x1 − x0).
Substituting these values in equation (2.1), we obtain
P(x) = [f(x0) − f(x1)]/(x0 − x1) · x + [f(x0)x1 − f(x1)x0]/(x1 − x0)
     = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1)
⟹ P(x) = l0(x)f(x0) + l1(x)f(x1),
where l0(x) = (x − x1)/(x0 − x1) and l1(x) = (x − x0)/(x1 − x0).
These functions l0 (x) and l1 (x) are called the Lagrange Fundamental Polynomials and they satisfy the
following conditions.
l0(x) + l1(x) = 1,
l0(x0) = 1, l0(x1) = 0,
l1(x0) = 0, l1(x1) = 1,
⟹ li(xj) = δij = 1 if i = j, 0 if i ≠ j.
Newton's divided difference interpolation: Again write P(x) in a different way, as follows:
P(x) = [(x − x1)/(x0 − x1)] f(x0) + [(x − x0)/(x1 − x0)] f(x1)
     = [f(x0)(x − x1) − f(x1)(x − x0)] / (x0 − x1)
     = f(x0) + (x − x0) [f(x1) − f(x0)] / (x1 − x0)
⟹ P(x) = f(x0) + (x − x0) f[x0, x1].
The ratio f[x0, x1] = [f(x1) − f(x0)] / (x1 − x0) is called the first divided difference of f(x).
Higher-order interpolation: In this section we take a different approach and assume that the interpolation polynomial is given as a linear combination of n + 1 polynomials of degree n. This time, we set the coefficients to be the interpolated values {f(xi), i = 0, …, n}, while the unknowns are the polynomials. We thus let
Pn(x) = Σ_{i=0}^{n} f(xi) li(x),
where the li(x) are n + 1 polynomials of degree n. Note that in this particular case, the polynomials li(x) are precisely of degree n (and not ≤ n). However, Pn(x), given by the above equation, may have a lower degree. In either case, the degree of Pn(x) is n at the most. We now require that Pn(x) satisfies the interpolation conditions
Pn(xj) = f(xj), 0 ≤ j ≤ n.
By substituting xj for x we have
Pn(xj) = Σ_{i=0}^{n} f(xi) li(xj),
so it suffices to require
li(xj) = δij = 1 if i = j, 0 if i ≠ j.
Each polynomial li(x) has n + 1 unknown coefficients. The conditions given above through the delta symbol provide exactly n + 1 equations that the polynomials li(x) must satisfy, and these equations can be solved to determine all the li(x). Fortunately there is a shortcut. An obvious way of constructing polynomials li(x) of degree n that satisfy the condition is the following:
li(x) = [(x − x0)(x − x1)⋯(x − x_{i−1})(x − x_{i+1})⋯(x − xn)] / [(xi − x0)(xi − x1)⋯(xi − x_{i−1})(xi − x_{i+1})⋯(xi − xn)].
The uniqueness of the interpolating polynomial of degree ≤ n given n + 1 distinct interpolation points implies that the polynomials li(x) given by the above relation are the only polynomials of degree n that satisfy the delta conditions.
Note that the denominator does not vanish since we assume that all interpolation points are distinct.
We can write the formula for li (x) in a compact form using the product notation.
li(x) = ∏_{j=0, j≠i}^{n} (x − xj)/(xi − xj) = W(x) / [(x − xi) W′(xi)],
where
W(x) = (x − x0)⋯(x − x_{i−1})(x − xi)(x − x_{i+1})⋯(x − xn),
W′(xi) = (xi − x0)⋯(xi − x_{i−1})(xi − x_{i+1})⋯(xi − xn).
We can write the Newton divided difference formula in the following fashion (and we will prove it in the next Theorem):
Pn(x) = f(x0) + (x − x0)f[x0, x1] + (x − x0)(x − x1)f[x0, x1, x2] + ⋯ + (x − x0)(x − x1)⋯(x − x_{n−1})f[x0, x1, …, xn]
      = f(x0) + Σ_{i=1}^{n} f[x0, x1, …, xi] ∏_{j=0}^{i−1} (x − xj).
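The divided-difference coefficients f[x0], f[x0, x1], …, f[x0, …, xn] can be built in place from the recurrence, and the Newton form can then be evaluated by nested multiplication. A Python sketch (the function names are illustrative, not from the text):

```python
def divided_differences(xs, ys):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] from the data."""
    n = len(xs)
    c = list(ys)
    for k in range(1, n):
        # overwrite from the bottom up so lower-order entries survive
        for i in range(n - 1, k - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - k])
    return c

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form by nested multiplication."""
    p = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        p = coeffs[i] + (x - xs[i]) * p
    return p
```

Applied to the data of Example 2 below (xs = [0, 1, 3, 5], ys = [1, 2, 6, 7]), the evaluated polynomial reproduces all four data values.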
The second divided difference can be expanded as
f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0)
             = f(x0)/[(x0 − x1)(x0 − x2)] + f(x1)/[(x1 − x0)(x1 − x2)] + f(x2)/[(x2 − x0)(x2 − x1)].
In general
f[x0, x1, …, xn] = (f[x1, x2, …, xn] − f[x0, x1, …, x_{n−1}]) / (xn − x0)
                = Σ_{i=0}^{n} f(xi) / ∏_{j=0, j≠i}^{n} (xi − xj).
Example 2. Given the following four data points, find a polynomial in Lagrange and Newton form to interpolate the data.
xi | 0 1 3 5
yi | 1 2 6 7
Sol. The Lagrange functions are given by
l0(x) = (x − 1)(x − 3)(x − 5)/[(0 − 1)(0 − 3)(0 − 5)] = −(1/15)(x − 1)(x − 3)(x − 5),
l1(x) = (x − 0)(x − 3)(x − 5)/[(1 − 0)(1 − 3)(1 − 5)] = (1/8) x(x − 3)(x − 5),
l2(x) = (x − 0)(x − 1)(x − 5)/[(3 − 0)(3 − 1)(3 − 5)] = −(1/12) x(x − 1)(x − 5),
l3(x) = (x − 0)(x − 1)(x − 3)/[(5 − 0)(5 − 1)(5 − 3)] = (1/40) x(x − 1)(x − 3).
The interpolating polynomial in the Lagrange form is
P3(x) = l0(x) · 1 + l1(x) · 2 + l2(x) · 6 + l3(x) · 7.
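The Lagrange form can be evaluated directly from its definition. A small Python sketch (the name `lagrange_eval` is illustrative):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial sum yi*li(x) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)   # build li(x) factor by factor
        total += yi * li
    return total
```

By construction li(xj) = δij, so the evaluation reproduces the data of Example 2 at the nodes exactly (up to rounding).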
Sol. If f(x) = x − x^2 then our nodes are [x0, x1, x2] = [0, x1, 1] and f(x0) = 0, f(x1) = x1 − x1^2, and f(x2) = 0. Therefore
l0(x) = (x − x1)(x − x2)/[(x0 − x1)(x0 − x2)] = (x − x1)(x − 1)/x1,
l1(x) = (x − x0)(x − x2)/[(x1 − x0)(x1 − x2)] = x(x − 1)/[x1(x1 − 1)],
l2(x) = (x − x0)(x − x1)/[(x2 − x0)(x2 − x1)] = x(x − x1)/(1 − x1),
and hence
P2(x) = x − x^2 + p · x(x − 1)/[x1(1 − x1)].
The requirement on x1 gives
x1^2 − x1 = −1/9, i.e. (x1 − 1/2)^2 = 5/36,
which gives x1 = 1/2 − √(5/36) or x1 = 1/2 + √(5/36). The largest of these is therefore
x1 = 1/2 + √(5/36) ≈ 0.8727.
Theorem 2.3. The unique polynomial of degree ≤ n that passes through (x0, y0), (x1, y1), …, (xn, yn) is given by
Pn(x) = f[x0] + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1) + ⋯ + f[x0, …, xn](x − x0)(x − x1)⋯(x − x_{n−1}).
Proof. We prove it by induction. The unique polynomial of degree 0 that passes through (x0 , y0 )
is obviously
P0 (x) = y0 = f [x0 ].
Suppose that the polynomial Pk(x) of degree ≤ k that passes through (x0, y0), (x1, y1), …, (xk, yk) is
Pk(x) = f[x0] + f[x0, x1](x − x0) + ⋯ + f[x0, …, xk](x − x0)(x − x1)⋯(x − x_{k−1}).
Write Pk+1(x), the unique polynomial of degree ≤ k + 1 that passes through (x0, y0), (x1, y1), …, (xk, yk), (xk+1, yk+1), as
Pk+1(x) = f[x0] + f[x0, x1](x − x0) + ⋯ + f[x0, …, xk](x − x0)⋯(x − x_{k−1}) + C(x − x0)(x − x1)⋯(x − x_{k−1})(x − xk).
We only need to show that
C = f [x0 , x1 , , xk , xk+1 ].
For this, let Qk(x) be the unique polynomial of degree ≤ k that passes through (x1, y1), …, (xk, yk), (xk+1, yk+1). Define
R(x) = Pk(x) + [(x − x0)/(xk+1 − x0)] [Qk(x) − Pk(x)].
Then:
R(x) is a polynomial of degree ≤ k + 1,
R(x0) = Pk(x0) = y0,
R(xi) = Pk(xi) + [(xi − x0)/(xk+1 − x0)] (Qk(xi) − Pk(xi)) = Pk(xi) = yi, i = 1, …, k,
R(xk+1) = Qk(xk+1) = yk+1.
By uniqueness, R(x) = Pk+1(x). The leading coefficient of Pk+1(x) is C, and the leading coefficient of R(x) is the leading coefficient of [(x − x0)/(xk+1 − x0)][Qk(x) − Pk(x)], which is
[1/(xk+1 − x0)] (leading coefficient of Qk(x) − leading coefficient of Pk(x)).
On the other hand, the leading coefficient of Qk(x) is f[x1, …, xk+1], and the leading coefficient of Pk(x) is f[x0, …, xk]. Therefore
C = (f[x1, …, xk+1] − f[x0, …, xk]) / (xk+1 − x0) = f[x0, x1, …, xk+1].
Since Pn(xi) = f(xi) for i = 0, 1, …, n, the function g has n + 1 distinct zeros in [a, b]. By the generalized Rolle's Theorem there exists ξ ∈ (a, b) such that
g^{(n)}(ξ) = f^{(n)}(ξ) − Pn^{(n)}(ξ) = 0.
Here
Pn^{(n)}(x) = n! f[x0, x1, …, xn].
Therefore
f[x0, x1, …, xn] = f^{(n)}(ξ)/n!.
Truncation error: The polynomial P(x) coincides with f(x) at all nodal points but may deviate from it at other points in the interval. This deviation is called the truncation error, and we write
En(f; x) = f(x) − P(x).
Theorem 3.2. Suppose that x0, x1, …, xn are distinct numbers in [a, b] and f ∈ C^{n+1}[a, b]. Let Pn(x) be the unique polynomial of degree ≤ n that passes through the n + 1 nodal points. Then for every x ∈ [a, b] there exists ξ ∈ (a, b) such that
En(f; x) = f(x) − Pn(x) = [(x − x0)⋯(x − xn)/(n + 1)!] f^{(n+1)}(ξ).
Proof. Let x0, x1, …, xn be distinct numbers in [a, b] and f ∈ C^{n+1}[a, b], and let Pn(x) be the unique polynomial of degree ≤ n that passes through the n + 1 nodal points. The truncation error in interpolation is given by
En(f; x) = f(x) − Pn(x), with En(f; xi) = 0, i = 0, 1, …, n.
Now for any t in the domain, define
g(t) = f(t) − P(t) − [f(x) − P(x)] [(t − x0)⋯(t − xn)] / [(x − x0)⋯(x − xn)].   (3.1)
Now g(t) = 0 at t = x, x0, x1, …, xn. Therefore g(t) satisfies the conditions of the generalized Rolle's Theorem, which states that between n + 2 zeros of a function there is at least one zero of the (n + 1)th derivative of the function. Hence there exists a point ξ such that
g^{(n+1)}(ξ) = 0,
where ξ is some point such that
min(x0, x1, …, xn, x) < ξ < max(x0, x1, …, xn, x).
Now differentiating (3.1) (n + 1) times with respect to t (note that P^{(n+1)} ≡ 0 since deg P ≤ n), we get
g^{(n+1)}(t) = f^{(n+1)}(t) − [f(x) − P(x)] (n + 1)! / [(x − x0)⋯(x − xn)].
Putting t = ξ and using g^{(n+1)}(ξ) = 0 gives
f(x) − P(x) = [(x − x0)⋯(x − xn)/(n + 1)!] f^{(n+1)}(ξ).
Since |x − xi| ≤ 1 for x ∈ [0, 1], we have |∏_{i=0}^{n} (x − xi)| ≤ 1. Hence
|f(x) − P(x)| = (1/10!) |f^{(10)}(ξ)| |∏_{i=0}^{n} (x − xi)| ≤ 1/10!.
Next, for Pn(x) = Σ_{k=0}^{n} lk(x) f(xk), we show that
Σ_{k=0}^{n} lk(0) xk^{n+1} = (−1)^n x0 x1 ⋯ xn.
Let f(x) = x^{n+1}. The error formula gives
x^{n+1} = Σ_{k=0}^{n} lk(x) xk^{n+1} + [(x − x0)⋯(x − xn)/(n + 1)!] f^{(n+1)}(ξ)
        = Σ_{k=0}^{n} lk(x) xk^{n+1} + (x − x0)⋯(x − xn),
since f^{(n+1)}(x) = (n + 1)!. Putting x = 0,
Σ_{k=0}^{n} lk(0) xk^{n+1} = −(−x0)(−x1)⋯(−xn) = (−1)^n x0 x1 ⋯ xn.
The next example illustrates how the error formula can be used to prepare a table of data that will
ensure a specified interpolation error within a specified bound.
Example 6. Suppose a table is to be prepared for the function f(x) = e^x, for x in [0, 1]. Assume the number of decimal places to be given per entry is d ≥ 8 and that the difference between adjacent x-values, the step size, is h. What step size h will ensure that linear interpolation gives an absolute error of at most 10^{−6} for all x in [0, 1]?
Sol. Let x0, x1, … be the numbers at which f is evaluated, let x be in [0, 1], and suppose i satisfies xi ≤ x ≤ xi+1. The error in linear interpolation is
f(x) − P(x) = (f″(ξ)/2)(x − xi)(x − xi+1), so |f(x) − P(x)| = (|f″(ξ)|/2) |x − xi| |x − xi+1|.
The step size is h, so xi = ih, xi+1 = (i + 1)h, and
|f(x) − P(x)| ≤ (1/2) |f″(ξ)| |(x − ih)(x − (i + 1)h)|.
Hence
|f(x) − P(x)| ≤ (1/2) max_{ξ∈[0,1]} e^ξ · max_{xi≤x≤xi+1} |(x − ih)(x − (i + 1)h)|
            ≤ (e/2) max_{xi≤x≤xi+1} |(x − ih)(x − (i + 1)h)|.
Consider the function g(x) = (x − ih)(x − (i + 1)h), for ih ≤ x ≤ (i + 1)h. Because
g′(x) = 2(x − ih − h/2),
the only critical point of g is at x = ih + h/2, with g(ih + h/2) = −(h/2)^2 = −h^2/4. Since g(ih) = 0 and g((i + 1)h) = 0, the maximum value of |g(x)| on [ih, (i + 1)h] must occur at the critical point, which implies that
|f(x) − P(x)| ≤ (e/2) max |g(x)| = (e/2)(h^2/4) = e h^2/8.
Consequently, to ensure that the error in linear interpolation is bounded by 10^{−6}, it is sufficient for h to be chosen so that
e h^2/8 ≤ 10^{−6}.
This implies that h < 1.72 × 10^{−3}. Because n = (1 − 0)/h must be an integer, a reasonable choice for the step size is h = 0.001.
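The bound and the choice h = 0.001 can be checked numerically; a Python sketch of this verification (variable names are illustrative):

```python
import math

# Bound from the example: e * h^2 / 8 <= 10^-6  =>  h <= sqrt(8e-6 / e)
h_max = math.sqrt(8e-6 / math.e)   # ≈ 0.0017155, consistent with 1.72e-3

# Empirical check with the chosen step h = 0.001: the worst linear-interpolation
# error for e^x on [0, 1], sampled at subinterval midpoints where it peaks
h = 0.001
worst = 0.0
for i in range(1000):
    a, b = i * h, (i + 1) * h
    x = a + h / 2
    p = ((b - x) * math.exp(a) + (x - a) * math.exp(b)) / h   # linear interpolant
    worst = max(worst, abs(p - math.exp(x)))
```

The measured worst-case error is roughly e·h²/8 ≈ 3.4 × 10⁻⁷, safely below the 10⁻⁶ target.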
Example 7. Determine the step size h that can be used in the tabulation of a function f(x), a ≤ x ≤ b, at equally spaced nodal points so that the truncation error of the quadratic interpolation is less than ε.
Sol. Let x0, x1, x2 be three equispaced points with spacing h. The truncation error of the quadratic interpolation satisfies
|E2(f; x)| ≤ (M/3!) max_{a≤x≤b} |(x − x0)(x − x1)(x − x2)|,
where M = max_{a≤x≤b} |f‴(x)|. For equispaced nodes, max |(x − x0)(x − x1)(x − x2)| = 2h^3/(3√3), so the requirement |E2| ≤ ε gives
M h^3/(9√3) ≤ ε, i.e. h ≤ [9√3 ε/M]^{1/3}.
Newton forward-difference formula: Let the nodes be equispaced, xk = x0 + kh, and write x = x0 + sh. Then
Pn(x) = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] (x − x0)⋯(x − x_{k−1})
      = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] (s − 0)h (s − 1)h ⋯ (s − k + 1)h
      = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] s(s − 1)⋯(s − k + 1) h^k
      = f(x0) + Σ_{k=1}^{n} f[x0, x1, …, xk] k! h^k C(s, k),
where C(s, k) = s(s − 1)⋯(s − k + 1)/k! is the binomial coefficient.
In terms of the forward-difference operator Δ,
f[x0, x1] = [f(x1) − f(x0)]/(x1 − x0) = (1/h) Δf(x0),
f[x0, x1, x2] = (f[x1, x2] − f[x0, x1])/(x2 − x0) = [h^{−1}Δf(x1) − h^{−1}Δf(x0)]/(2h) = (1/(2! h^2)) Δ^2 f(x0).
In general
f[x0, x1, …, xk] = (1/(k! h^k)) Δ^k f(x0).
Therefore
Pn(x) = f(x0) + Σ_{k=1}^{n} C(s, k) Δ^k f(x0).
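The forward-difference table and the formula Pn(x) = f(x0) + Σ C(s, k) Δ^k f(x0) can be sketched in Python as follows (function name illustrative):

```python
def newton_forward(xs, ys, x):
    """Newton forward-difference interpolation on equispaced nodes xs."""
    h = xs[1] - xs[0]
    s = (x - xs[0]) / h
    # leading entries of the forward-difference table: diffs[k] = Δ^k f(x0)
    diffs = [ys[0]]
    row = list(ys)
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        diffs.append(row[0])
    total, binom = diffs[0], 1.0
    for k in range(1, len(diffs)):
        binom *= (s - (k - 1)) / k      # running value of C(s, k)
        total += binom * diffs[k]
    return total
```

As a check, interpolating f(x) = x² on nodes 0, 1, 2 at x = 1.5 returns 2.25 exactly, since the quadratic is reproduced without error.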
Newton backward-difference formula: Let h = (xn − x0)/n, xi = xn − (n − i)h, i = n, n − 1, …, 0, and let x = xn + sh. Therefore
Pn(x) = f(xn) + Σ_{k=1}^{n} f[xn, x_{n−1}, …, x_{n−k}] (x − xn)(x − x_{n−1})⋯(x − x_{n−k+1}).
In terms of the backward-difference operator ∇,
f[xn, x_{n−1}] = (1/h) ∇f(xn),
f[xn, x_{n−1}, x_{n−2}] = (1/(2! h^2)) ∇^2 f(xn),
and in general
f[xn, x_{n−1}, …, x_{n−k}] = (1/(k! h^k)) ∇^k f(xn).
Therefore, by using the backward-difference operator, the Newton backward divided-difference formula can be written as
Pn(x) = f(xn) + Σ_{k=1}^{n} (−1)^k C(−s, k) ∇^k f(xn).
Now
f(0.25) ≈ P(0.25) = 1.655.
Differences and Derivatives:
Since
Δf(x) = f(x + h) − f(x)
      = f(x) + hf′(x) + (h^2/2) f″(x) + ⋯ − f(x)
      = hf′(x) + O(h^2)
      ≈ hf′(x).
Similarly
Δ^2 f(x) = f(x + 2h) − 2f(x + h) + f(x)
         = f(x) + 2hf′(x) + ((2h)^2/2) f″(x) + ⋯ − 2[f(x) + hf′(x) + (h^2/2) f″(x) + ⋯] + f(x)
         = h^2 f″(x) + h^3 f‴(x) + ⋯
⟹ f″(x) ≈ Δ^2 f(x)/h^2.
Similarly we can obtain higher-order derivatives.
5. Least Squares Approximation
The principle of least squares chooses the model parameters so that the sum of the squares of the residuals is a minimum. A residual is defined as the difference between the actual value of the dependent variable and the value predicted by the model. Thus
ei = yi − f(xi).
Least square fit of a straight line: Suppose that we are given a data set (x1 , y1 ), (x2 , y2 ), , (xn , yn )
of observations from an experiment. We are interested in fitting a straight line of the form y = a + bx,
to the given data. The residuals are then
ei = yi − (a + bxi).
Note that ei is a function of the parameters a and b. We need to find a and b such that
E = Σ_{i=1}^{n} ei^2
is minimized. Setting the partial derivatives of E to zero gives
∂E/∂a = Σ_{i=1}^{n} [yi − (a + bxi)](−2) = 0 ⟹ Σ_{i=1}^{n} yi = na + b Σ_{i=1}^{n} xi,   (5.1)
∂E/∂b = Σ_{i=1}^{n} [yi − (a + bxi)](−2xi) = 0 ⟹ Σ_{i=1}^{n} xi yi = a Σ_{i=1}^{n} xi + b Σ_{i=1}^{n} xi^2.   (5.2)
These equations (5.1-5.2) are called normal equations, which are to be solved to get desired values for
a and b.
Example 9. Obtain the least squares straight-line fit to the following data
x    | 0.2   0.4   0.6   0.8   1
f(x) | 0.447 0.632 0.775 0.894 1
Sol. The normal equations for fitting a straight line y = a + bx are
Σ_{i=1}^{5} f(xi) = 5a + b Σ_{i=1}^{5} xi
Σ_{i=1}^{5} xi f(xi) = a Σ_{i=1}^{5} xi + b Σ_{i=1}^{5} xi^2.
We have
Σ xi = 3, Σ xi^2 = 2.2, Σ f(xi) = 3.748, Σ xi f(xi) = 2.5224.
Therefore
5a + 3b = 3.748, 3a + 2.2b = 2.5224.
The solution of this system is a = 0.3392 and b = 0.684. The required approximation is y = 0.3392 + 0.684x.
Least square error = Σ_{i=1}^{5} [f(xi) − (0.3392 + 0.684xi)]^2 = 0.00245.
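Solving the 2×2 normal equations (5.1)-(5.2) directly gives the fitted line; a Python sketch, using Example 9's data as a check (function name illustrative):

```python
def fit_line(xs, ys):
    """Solve the normal equations (5.1)-(5.2) for y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # n*a + sx*b = sy ;  sx*a + sxx*b = sxy  (Cramer's rule)
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

a, b = fit_line([0.2, 0.4, 0.6, 0.8, 1.0], [0.447, 0.632, 0.775, 0.894, 1.0])
```

For well-conditioned small problems this direct solution is fine; for many points or higher-degree models, orthogonal-factorization solvers are numerically safer than the normal equations.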
Example 10. Find the least squares approximation of second degree for the discrete data
x    | −2 −1 0 1 2
f(x) | 15  1  1 3 19
Sol. We fit a second degree polynomial y = a + bx + cx2 .
By principle of least squares, we minimize the function
E = Σ_{i=1}^{5} [yi − (a + bxi + cxi^2)]^2.
Setting the partial derivatives with respect to a, b, c to zero gives the normal equations
Σ f(xi) = 5a + b Σ xi + c Σ xi^2
Σ xi f(xi) = a Σ xi + b Σ xi^2 + c Σ xi^3
Σ xi^2 f(xi) = a Σ xi^2 + b Σ xi^3 + c Σ xi^4.
We have
Σ xi = 0, Σ xi^2 = 10, Σ xi^3 = 0, Σ xi^4 = 34, Σ f(xi) = 39, Σ xi f(xi) = 10, Σ xi^2 f(xi) = 140.
From the given data
5a + 10c = 39
10b = 10
10a + 34c = 140.
The solution of this system is a = −37/35, b = 1, and c = 31/7.
The required approximation is y = (1/35)(−37 + 35x + 155x^2).
Example 11. Use the method of least squares to fit the curve f(x) = c0 x + c1/√x to the data below. Also find the least square error.
x    | 0.2 0.3 0.5 1 2
f(x) | 16  14  11  6 3
Sol. By the principle of least squares, we minimize the error
E(c0, c1) = Σ_{i=1}^{5} [f(xi) − c0 xi − c1/√xi]^2.
Setting ∂E/∂c0 = ∂E/∂c1 = 0 gives the normal equations
c0 Σ_{i=1}^{5} xi^2 + c1 Σ_{i=1}^{5} √xi = Σ_{i=1}^{5} xi f(xi)
c0 Σ_{i=1}^{5} √xi + c1 Σ_{i=1}^{5} 1/xi = Σ_{i=1}^{5} f(xi)/√xi.
We have
Σ √xi = 4.1163, Σ 1/xi = 11.8333, Σ xi^2 = 5.38, Σ xi f(xi) = 24.9, Σ f(xi)/√xi = 85.0151.
Solving, c0 = −1.1836 and c1 = 7.5961, so the least squares fit is
f(x) ≈ 7.5961/√x − 1.1836x.
Least square error = Σ_{i=1}^{5} [f(xi) − 7.5961/√xi + 1.1836xi]^2 = 1.6887.
Example 12. Obtain the least squares fit of the form y = ab^x to the following data
x    | 1   2   3   4   5   6   7   8
f(x) | 1.0 1.2 1.8 2.5 3.6 4.7 6.6 9.1
Sol. Taking logarithms, the curve y = ab^x takes the form Y = A + Bx, where Y = log y, A = log a and B = log b. Hence the normal equations are given by
Σ_{i=1}^{8} Yi = 8A + B Σ_{i=1}^{8} xi
Σ_{i=1}^{8} xi Yi = A Σ_{i=1}^{8} xi + B Σ_{i=1}^{8} xi^2
⟹ A = −0.1656, B = 0.1407
⟹ a = 0.68, b = 1.38.
The required curve is y = (0.68)(1.38)^x.
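The log-linearization used in Example 12 can be sketched in Python (function name illustrative; base-10 logarithms as in the example):

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on Y = log10(y) = A + B*x."""
    Ys = [math.log10(y) for y in ys]
    n = len(xs)
    sx, sY = sum(xs), sum(Ys)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * Y for x, Y in zip(xs, Ys))
    det = n * sxx - sx * sx
    A = (sY * sxx - sx * sxY) / det
    B = (n * sxY - sx * sY) / det
    return 10 ** A, 10 ** B     # a = 10^A, b = 10^B

a, b = fit_exponential(list(range(1, 9)),
                       [1.0, 1.2, 1.8, 2.5, 3.6, 4.7, 6.6, 9.1])
```

Note that this minimizes the squared error in log y, not in y itself, so the result can differ slightly from a direct nonlinear fit.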
Example 13. We are given the values of a function of the variable t. Obtain a least squares fit of the form f = a e^{−3t} + b e^{−2t}.
t    | 0.1  0.2  0.3  0.4
f(t) | 0.76 0.58 0.44 0.35
Sol. Using the method of least squares, we minimize the error
E = Σ_{i=1}^{4} (fi − a e^{−3ti} − b e^{−2ti})^2.
Setting the partial derivatives to zero,
∂E/∂a = −2 Σ_{i=1}^{4} (fi − a e^{−3ti} − b e^{−2ti}) e^{−3ti} = 0
∂E/∂b = −2 Σ_{i=1}^{4} (fi − a e^{−3ti} − b e^{−2ti}) e^{−2ti} = 0,
which give the normal equations
a Σ_{i=1}^{4} e^{−6ti} + b Σ_{i=1}^{4} e^{−5ti} = Σ_{i=1}^{4} fi e^{−3ti}
a Σ_{i=1}^{4} e^{−5ti} + b Σ_{i=1}^{4} e^{−4ti} = Σ_{i=1}^{4} fi e^{−2ti}.
The normal equations are
6a − 3b = 0
−3a + 19b = 61.
By solving, a = 1.7428 and b = 3.4857. Therefore the equation of the line is v = 1.7428 + 3.4857u. Changing to the original variables, we obtain
y − 20 = 1.7428 + 3.4857 (x − 15)/5
⟹ y = 11.2857 + 0.6971x.
Exercises
(1) Find the unique polynomial P (x) of degree 2 or less such that
P (1) = 1, P (3) = 27, P (4) = 64
using Lagrange and Newton interpolation. Evaluate P (1.05).
(2) Let P3 (x) be the Lagrange interpolating polynomial for the data (0, 0), (0.5, y), (1, 3) and (2, 2).
Find y if the coefficient of x3 in P3 (x) is 6.
(3) Calculate a quadratic interpolate in Newton form to e0.826 from the function values
e0.82 = 2.270500, e0.83 = 2.293319, e0.84 = 2.316367.
(4) Let f (x) = ln(1 + x), x0 = 1, x1 = 1.1. Use Lagrange linear interpolation to find the
approximate value of f (1.04) and obtain a bound on the truncation error.
(5) Use the following values and four-digit rounding arithmetic to construct a third degree Lagrange
polynomial approximation to f (1.09). The function being approximated is f (x) = log10 (tan x).
Use this knowledge to find a bound for the error in the approximation.
f (1.00) = 0.1924, f (1.05) = 0.2414, f (1.10) = 0.2933, f (1.15) = 0.3492.
(6) Determine the step size h that can be used in the tabulation of a function f(x), a ≤ x ≤ b, at equally spaced nodal points so that the truncation error of the cubic interpolation is less than ε.
(7) If linear interpolation is used to interpolate the error function
f(x) = (2/√π) ∫_0^x e^{−t^2} dt,
show that the error of linear interpolation using data (x0, f0) and (x1, f1) cannot exceed (x1 − x0)^2/(2√(2πe)).
(8) Suppose that f(x) = e^x cos x is to be approximated on [0, 1] by an interpolating polynomial on n + 1 equally spaced points. Determine n so that the truncation error will be less than 0.0001 in this interval.
(9) The following data represent the function f(x) = e^x.
x    | 1      1.5    2.0    2.5
f(x) | 2.7183 4.4817 7.3891 12.1825
Estimate the value of f(2.25) using the Newton forward and backward difference interpolation. Compare with the exact value. Also obtain the bound of the truncation error.
(10) Construct the interpolating polynomial that fits the following data using Newton forward and backward difference interpolation.
x    | 0   0.1  0.2  0.3  0.4  0.5
f(x) | 1.5 1.27 0.98 0.63 0.22 −0.25
Hence find the values of f(x) at x = 0.15 and 0.45.
(11) The error function erf(x) is defined by the integral
erf(x) = (2/√π) ∫_0^x e^{−t^2} dt.
(A) Approximate erf(0.08) by linear interpolation in the given table of correctly rounded values. Estimate the total error.
x      | 0.05    0.10    0.15    0.20
erf(x) | 0.05637 0.11246 0.16800 0.22270
(B) Suppose that the table were given with 7 correct decimals and with the step size 0.001. Find the maximum total error for linear interpolation in the interval 0 ≤ x ≤ 0.10 in this table.
(12) Determine the spacing h in a table of equally spaced values of the function f(x) = √x between 1 and 2, so that interpolation with a quadratic polynomial will yield an accuracy of 5 × 10^{−8}.
(13) The following data are part of a table for the function g(x) = sin x / x^2.
x    | 0.1    0.2    0.3    0.4    0.5
g(x) | 9.9833 4.9667 3.2836 2.4339 1.9177
Calculate g(0.25) as accurately as possible
(a) by interpolating directly in this table, (b) by first calculating x g(x) and then interpolating directly in that table, and (c) explain the difference between the results obtained in (a) and (b), respectively.
(14) By the method of least squares fit a curve of the form y = ax^b to the following data
x | 2    3    4   5
y | 27.8 62.1 110 161
(15) Determine the least squares approximation of the type ax^2 + bx + c to the function 2^x at the points xi = 0, 1, 2, 3, 4.
(16) Experiments with a periodic process gave the following data:
t | 0     50    100   150   200
y | 0.754 1.762 2.041 1.412 0.303
Estimate the parameters a and b in the model y = a + b sin t, using the least squares approximation.
Bibliography
[Gerald]   Curtis F. Gerald and Patrick O. Wheatley. Applied Numerical Analysis, 7th edition, Pearson, 2003.
[Atkinson] K. Atkinson and W. Han. Elementary Numerical Analysis, 3rd edition, John Wiley and Sons, 2004.
CHAPTER 5 (4 LECTURES)
NUMERICAL INTEGRATION
1. Introduction
The general problem is to find the approximate value of the integral of a given function f(x) over an interval [a, b]. Thus
I = ∫_a^b f(x) dx.   (1.1)
The problem can be solved by using the Fundamental Theorem of Calculus: find an antiderivative F of f, that is, F′(x) = f(x), and then
∫_a^b f(x)dx = F(b) − F(a).
But finding an antiderivative is not an easy task in general, so this is certainly not a good approach for numerical computations. In this chapter we'll study methods for constructing integration rules. We'll also consider composite versions of these rules and the errors associated with them.
2. Elements of numerical integration
The basic method of approximating the integral is called numerical quadrature and uses a sum of the type
∫_a^b f(x)dx ≈ Σ_i λi f(xi).   (2.1)
The method of quadrature is based on polynomial interpolation. We choose a set of distinct nodes {x0, x1, x2, …, xn} in [a, b] and approximate the function f(x) by an interpolating polynomial, say the Lagrange interpolating polynomial:
f(x) = Pn(x) + en(x) = Σ_{i=0}^{n} f(xi) li(x) + [f^{(n+1)}(ξ)/(n + 1)!] ∏_{i=0}^{n} (x − xi),
where
li(x) = ∏_{j=0, j≠i}^{n} (x − xj)/(xi − xj), 0 ≤ i ≤ n.
Therefore
∫_a^b f(x)dx = ∫_a^b Pn(x)dx + ∫_a^b en(x)dx
            = Σ_{i=0}^{n} f(xi) ∫_a^b li(x)dx + (1/(n + 1)!) ∫_a^b f^{(n+1)}(ξ) ∏_{i=0}^{n} (x − xi) dx
            = Σ_{i=0}^{n} λi f(xi) + En,
where
λi = ∫_a^b li(x)dx and En = (1/(n + 1)!) ∫_a^b f^{(n+1)}(ξ) ∏_{i=0}^{n} (x − xi) dx.
If |f^{(n+1)}| ≤ M on [a, b], then
|En| ≤ (M/(n + 1)!) ∫_a^b |∏_{i=0}^{n} (x − xi)| dx.
We can also use Newton divided-difference interpolation to approximate the function f(x). Before we proceed, we define an alternative method to analyze the error, based on the method of undetermined coefficients.
Definition 2.1. An integration method of the form
∫_a^b f(x)dx = Σ_{i=0}^{n} λi f(xi) + En
is said to be of order p if it provides exact results for all polynomials of degree less than or equal to p.
Now if the above method gives exact results for polynomials of degree ≤ n, then the error term will be zero for all polynomials of degree ≤ n.
If |f^{(n+1)}(ξ)| ≤ M, the error term can be bounded as |En| ≤ (C/(n + 1)!) M for some constant C. To determine C, apply the method to f(x) = x^{n+1}, for which f^{(n+1)}(x) = (n + 1)!:
∫_a^b x^{n+1} dx = Σ_{i=0}^{n} λi xi^{n+1} + (C/(n + 1)!)(n + 1)!
⟹ C = ∫_a^b x^{n+1} dx − Σ_{i=0}^{n} λi xi^{n+1}.
The number C is called the error constant. Using this notation, we can write the error term as
En = (C/(n + 1)!) f^{(n+1)}(ξ).
3. Newton-Cotes Formula
Let all nodes be equally spaced with spacing h = (b − a)/n. The number h is also called the step length. Let x0 = a and xn = b; then xi = a + ih, i = 0, 1, …, n. The general quadrature formula is given by
∫_a^b f(x)dx = Σ_{i=0}^{n} λi f(xi) + En.
Substituting x = a + ht,
li(x) = ∏_{j=0, j≠i}^{n} (x − xj)/(xi − xj) = ∏_{j=0, j≠i}^{n} [(a + ht) − (a + jh)]/[(a + ih) − (a + jh)] = ∏_{j=0, j≠i}^{n} (t − j)/(i − j).
Therefore
λi = ∫_a^b li(x)dx = h ∫_0^n ∏_{j=0, j≠i}^{n} (t − j)/(i − j) dt   (dx = h dt).
For n = 1: x0 = a, x1 = b, h = b − a, and we use linear interpolation. The values of the multipliers are given by
λ0 = h ∫_0^1 (t − 1)/(0 − 1) dt = h/2,
λ1 = h ∫_0^1 (t − 0)/(1 − 0) dt = h/2.
Hence
∫_a^b f(x)dx ≈ λ0 f(x0) + λ1 f(x1) = (h/2)[f(a) + f(b)].
This is called the Trapezoidal rule. Now the error is given by
E1 = (1/2) ∫_a^b f″(ξ(x))(x − a)(x − b)dx.
Since (x − a)(x − b) does not change its sign in [a, b], by the Weighted Mean-Value Theorem there exists ξ ∈ (a, b) such that
E1 = (1/2) f″(ξ) ∫_a^b (x − a)(x − b)dx = −(1/12) f″(ξ)(b − a)^3 = −(h^3/12) f″(ξ).
The Trapezoidal rule (with error) is therefore given by
∫_a^b f(x)dx = (h/2)[f(a) + f(b)] − (h^3/12) f″(ξ).
Geometrically, it is the area of Trapezium (Trapezoid) with width h and ordinates f (a) and f (b).
For n = 2: We take x0 = a, x1 = (a + b)/2, x2 = b, with h = (b − a)/2. The values of the multipliers are given by
λ0 = h ∫_0^2 (t − 1)(t − 2)/[(0 − 1)(0 − 2)] dt = h/3,
λ1 = h ∫_0^2 (t − 0)(t − 2)/[(1 − 0)(1 − 2)] dt = 4h/3,
λ2 = h ∫_0^2 (t − 0)(t − 1)/[(2 − 0)(2 − 1)] dt = h/3.
Hence
∫_a^b f(x)dx ≈ (h/3)[f(x0) + 4f(x1) + f(x2)].
This is Simpson's one-third rule; its error is
E3 = −(h^5/90) f^{(4)}(ξ).
For n = 3: The four nodal points are a = x0, x1, x2, x3 = b with h = (b − a)/3. We get Simpson's three-eighth rule
∫_a^b f(x)dx ≈ (3h/8)[f(x0) + 3f(x1) + 3f(x2) + f(x3)].
The error in Simpson's three-eighth rule is given by
E4 = −(3/80) h^5 f^{(4)}(ξ).
Example 1. Find the value of the integral
I = ∫_0^1 dx/(1 + x)
using the trapezoidal and Simpson's rules. Also obtain a bound on the errors. Compare with the exact value.
Sol. Here f(x) = 1/(1 + x). By the trapezoidal rule with a = 0, b = 1, h = b − a = 1,
IT = (h/2)[f(a) + f(b)] = (1/2)[1 + 1/2] = 0.75.
The exact value is
I = ln 2 = 0.693147,
so the error is |0.75 − 0.693147| = 0.056853. The error bound for the trapezoidal rule is given by
|E1| ≤ (h^3/12) max_{0≤x≤1} |f″(x)| = (1/12) max_{0≤x≤1} 2/(1 + x)^3 = 1/6.
Similarly, by using Simpson's rule with h = (b − a)/2 = 1/2, we obtain
IS = (h/3)[f(0) + 4f(1/2) + f(1)] = (1/6)(1 + 8/3 + 1/2) = 0.69444.
Error = |0.69444 − 0.693147| = 0.001297.
The error bound for Simpson's rule is given by
|E3| ≤ (h^5/90) max_{0≤x≤1} |f^{(4)}(x)| = (1/(2^5 · 90)) max_{0≤x≤1} 24/(1 + x)^5 = 1/120.
E3
x x3
0
and compare with the exact value.
Sol. We make the method exact for polynomials up to degree 2.
Z 1
dx
p
f (x) = 1 : I1 =
= 1 + 2 + 3
x(1 x)
0
Z 1
xdx
p
f (x) = x : I2 =
= 1/22 + 3
x(1 x)
0
Z 1
x2 dx
2
p
= 1/42 + 3
f (x) = x : I3 =
x(1 x)
0
Now
Z 1
Z 1
Z 1
dx
dx
dt
p
p
I1 =
=
=
= [sin1 t]11 =
2
2
1t
x(1 x)
1 (2x 1)
0
0
1
Similarly
I2 = /2
I3 = 3/8.
Therefore
1 + 2 + 3 =
1/22 + 3 = /2
1/42 + 3 = 3/8
By solving these equations, we obtain 1 = /4, 2 = /2, 3 = /4. Hence
Z 1
f (x)
p
dx = /4[f (0) + 2f (1/2) + f (1)].
x(1 x)
0
Now
I = ∫_0^1 dx/√(x − x^3) = ∫_0^1 dx/[√(1 + x) √(x(1 − x))] = ∫_0^1 f(x) dx/√(x(1 − x)),
where f(x) = 1/√(1 + x). By using the above formula, we obtain
I = (π/4)[1 + 2√2/√3 + √2/2] = 2.62331.
4. Gauss-Legendre Integration Methods
In the Gauss-Legendre methods the integration rule
∫_{−1}^{1} f(x)dx = Σ_{i=0}^{n} λi f(xi) + En
has both the weights λi and the nodes xi as unknowns.
One-point formula: The method has two unknowns, λ0 and x0. Making the method exact for f(x) = 1, x, we obtain
f(x) = 1: ∫_{−1}^{1} dx = 2 = λ0,
f(x) = x: ∫_{−1}^{1} x dx = 0 = λ0 x0 ⟹ x0 = 0.
Hence
∫_{−1}^{1} f(x)dx ≈ 2f(0).
The error is E1 = (C/2!) f″(ξ), where
C = ∫_{−1}^{1} x^2 dx − 2 · 0^2 = 2/3.
Hence
E1 = (1/3) f″(ξ), −1 < ξ < 1.
Two-point formula: By taking n = 1, we obtain
∫_{−1}^{1} f(x)dx = λ0 f(x0) + λ1 f(x1).
The method has four unknowns. Making the method exact for f(x) = 1, x, x^2, x^3, we obtain
f(x) = 1: ∫_{−1}^{1} dx = 2 = λ0 + λ1   (4.1)
f(x) = x: ∫_{−1}^{1} x dx = 0 = λ0 x0 + λ1 x1   (4.2)
f(x) = x^2: ∫_{−1}^{1} x^2 dx = 2/3 = λ0 x0^2 + λ1 x1^2   (4.3)
f(x) = x^3: ∫_{−1}^{1} x^3 dx = 0 = λ0 x0^3 + λ1 x1^3   (4.4)
Solving these equations gives λ0 = λ1 = 1 and x0 = −1/√3, x1 = 1/√3, so
∫_{−1}^{1} f(x)dx ≈ f(−1/√3) + f(1/√3).
The error in the two-point formula is given by
E3 = (C/4!) f^{(4)}(ξ),
where, taking f(x) = x^4,
C = ∫_{−1}^{1} x^4 dx − [(−1/√3)^4 + (1/√3)^4] = 2/5 − 2/9 = 8/45.
Hence
E3 = (1/135) f^{(4)}(ξ), −1 < ξ < 1.
Three-point formula: By taking n = 2, we obtain
Z 1
f (x)dx = 0 f (x0 ) + 1 f (x1 ) + 2 f (x2 ).
E3 =
The method has six unknowns. Make the method exact for f (x) = 1, x, x2 , x3 , x4 , x5 , we obtain
f (x) = 1
f (x) = x
:
:
2 = 0 + 1 + 2
0 = 0 x0 + 1 x1 + 2 x2
f (x) = x2
f (x) = x3
f (x) = x4
f (x) = x5
p
3/5, x1 = 0 and
By solving
these
equations,
we
obtain
=
5/9
and
=
8/9.
x
=
0
2
1
0
p
x2 = 3/5.
Therefore formula is given by
"
r !
r !#
Z 1
1
3
3
f (x)dx =
5f
+ 8f (0) + 5f
.
9
5
5
1
NUMERICAL INTEGRATION
C (6)
f ()
6!
where
Z
C=
1
x6 dx
1
5
9
3
5
!6
r
+80+5+
1
f (6) (),
15750
E5 =
3
5
!6
= 8 .
175
1 < < 1.
Example 3. Evaluate
I = ∫_1^2 2x/(1 + x^4) dx
using the Gauss-Legendre 1- and 2-point formulas. Also compare with the exact value.
Sol. Substituting x = (t + 3)/2, dx = dt/2, we obtain
I = ∫_{−1}^{1} 8(t + 3)/[16 + (t + 3)^4] dt.
Let
f(t) = 8(t + 3)/[16 + (t + 3)^4].
By the 1-point formula
I ≈ 2f(0) = 0.4948.
By the 2-point formula
I ≈ f(−1/√3) + f(1/√3) = 0.5434.
Now the exact value of the integral is given by
I = ∫_1^2 2x/(1 + x^4) dx = [tan^{−1}(x^2)]_1^2 = tan^{−1} 4 − π/4 = 0.5404.
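The 1-, 2-, and 3-point rules can be collected in one small routine; applying it to the transformed integrand of Example 3 reproduces the values above. A Python sketch (names illustrative):

```python
import math

def gauss_legendre(f, n):
    """Gauss-Legendre quadrature on [-1, 1] for n = 1, 2, or 3 points."""
    rules = {
        1: ([0.0], [2.0]),
        2: ([-1 / math.sqrt(3), 1 / math.sqrt(3)], [1.0, 1.0]),
        3: ([-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)], [5 / 9, 8 / 9, 5 / 9]),
    }
    nodes, weights = rules[n]
    return sum(w * f(x) for x, w in zip(nodes, weights))

# Example 3 after the substitution x = (t + 3)/2:
f = lambda t: 8 * (t + 3) / (16 + (t + 3) ** 4)
```

An n-point Gauss-Legendre rule is exact for polynomials of degree 2n − 1, which is why three evaluations already come very close to the exact value here.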
Example 4. Evaluate
I = ∫_{−1}^{1} (1 − x^2)^{3/2} cos x dx.
5. Composite Integration Methods
Composite Trapezoidal Method: Divide the interval [a, b] into N subintervals with step size h = (b − a)/N, taking nodal points a = x0 < x1 < ⋯ < xN = b where xi = x0 + ih, i = 1, 2, …, N − 1. Now
I = ∫_a^b f(x)dx = ∫_{x0}^{x1} f(x)dx + ∫_{x1}^{x2} f(x)dx + ⋯ + ∫_{x_{N−1}}^{xN} f(x)dx.
Now use the trapezoidal rule for each of the integrals on the right side; we obtain
I ≈ (h/2)[(f0 + f1) + (f1 + f2) + ⋯ + (f_{N−1} + fN)]
  = (h/2)[f0 + 2(f1 + f2 + ⋯ + f_{N−1}) + fN],
where fi = f(xi), i = 0, 1, …, N. This formula is the composite trapezoidal rule. The error in the composite integration is given by
E = −(h^3/12)[f″(ξ1) + f″(ξ2) + ⋯ + f″(ξN)],
where x_{i−1} ≤ ξi ≤ xi, i = 1, 2, …, N. The error in the numerical approximation decreases as N increases, since h = (b − a)/N.
Composite Simpson's Method: Simpson's rule requires three abscissas. We divide the interval [a, b] into 2N subintervals (to get an odd number of abscissas) with step size h = (b − a)/(2N), taking nodal points a = x0 < x1 < ⋯ < x_{2N} = b where xi = x0 + ih, i = 1, 2, …, 2N − 1. We write
I = ∫_a^b f(x)dx = ∫_{x0}^{x2} f(x)dx + ∫_{x2}^{x4} f(x)dx + ⋯ + ∫_{x_{2N−2}}^{x_{2N}} f(x)dx.
Now use Simpson's rule for each of the integrals on the right side to obtain
I ≈ (h/3)[(f0 + 4f1 + f2) + (f2 + 4f3 + f4) + ⋯ + (f_{2N−2} + 4f_{2N−1} + f_{2N})]
  = (h/3)[f0 + 4(f1 + f3 + ⋯ + f_{2N−1}) + 2(f2 + f4 + ⋯ + f_{2N−2}) + f_{2N}].
This formula is called the composite Simpson's rule. The error in the integration rule is given by
E = −(h^5/90)[f^{(4)}(ξ1) + f^{(4)}(ξ2) + ⋯ + f^{(4)}(ξN)],
where x_{2i−2} ≤ ξi ≤ x_{2i}, i = 1, 2, …, N.
Example 5. Evaluate
I = ∫_0^1 dx/(1 + x)
by using the composite trapezoidal and Simpson's rules with 2 and 4 subintervals.
Sol. Let IT and IS represent the values of the integral by the composite trapezoidal and composite Simpson's rules, respectively.
Case I: Number of subintervals N = 2; then h = (b − a)/N = 1/2. Therefore we have two subintervals for the trapezoidal rule and one interval for Simpson's rule. We have
IT = (1/4)[f(0) + 2f(1/2) + f(1)] = 0.70833,
IS = (1/6)[f(0) + 4f(1/2) + f(1)] = 0.69444.
Case II: Number of subintervals N = 4; then h = 1/4. We have four subintervals for the trapezoidal rule and two subintervals for Simpson's rule.
IT = (1/8)[f(0) + 2(f(1/4) + f(1/2) + f(3/4)) + f(1)] = 0.69702,
IS = (1/12)[f(0) + 4f(1/4) + 2f(1/2) + 4f(3/4) + f(1)] = 0.69325.
Example 6. Evaluate
I = ∫_0^1 dx/(1 + x)
by subdividing the interval [0, 1] into two equal parts and then by using the Gauss-Legendre three-point formula
∫_{−1}^{1} f(x)dx ≈ (1/9)[5f(−√(3/5)) + 8f(0) + 5f(√(3/5))].
Sol. Let
I = ∫_0^1 dx/(1 + x) = ∫_0^{1/2} dx/(1 + x) + ∫_{1/2}^1 dx/(1 + x) = I1 + I2.
Now substitute x = (t + 1)/4 and x = (z + 3)/4 in I1 and I2, respectively, to change the limits to [−1, 1]. We have dx = dt/4 and dx = dz/4 for the integrals I1 and I2, respectively. Therefore
I1 = ∫_{−1}^{1} dt/(t + 5) = (1/9)[5/(5 − √(3/5)) + 8/5 + 5/(5 + √(3/5))] = 0.405464,
I2 = ∫_{−1}^{1} dz/(z + 7) = (1/9)[5/(7 − √(3/5)) + 8/7 + 5/(7 + √(3/5))] = 0.287682.
Hence
I = I1 + I2 = 0.405464 + 0.287682 = 0.693146.
Example 7. The area A inside the closed curve y^2 + x^2 = cos x is given by
A = 4 ∫_0^α (cos x − x^2)^{1/2} dx,
where α is the positive root of the equation cos x = x^2. Newton's method,
x_{k+1} = xk + (cos xk − xk^2)/(sin xk + 2xk), k = 0, 1, 2, …,
starting with x0 = 0.5, gives α ≈ 0.8241.
Using the composite trapezoidal method with h = 0.824, 0.412, and 0.206 respectively, we obtain the following approximations of the area A:
A ≈ (4(0.824)/2)[1 + 0.017753] = 1.67725,
A ≈ (4(0.412)/2)[1 + 2(0.864047) + 0.017753] = 2.262578,
A ≈ (4(0.206)/2)[1 + 2(0.967688 + 0.864047 + 0.658115) + 0.017753] = 2.470951.
Algorithm (Composite Trapezoidal Method):
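A minimal Python sketch of the composite trapezoidal method, applied here to the area integral of Example 7 as a check (names illustrative):

```python
import math

def composite_trapezoidal(f, a, b, N):
    """Composite trapezoidal rule: (h/2)[f0 + 2(f1 + ... + f_{N-1}) + fN]."""
    h = (b - a) / N
    interior = sum(f(a + i * h) for i in range(1, N))
    return h / 2 * (f(a) + 2 * interior + f(b))

# Example 7 with g(x) = sqrt(cos x - x^2), h = 0.206 (i.e. N = 4 on [0, 0.824])
g = lambda x: math.sqrt(math.cos(x) - x * x)
area = 4 * composite_trapezoidal(g, 0, 0.824, 4)   # ≈ 2.470951
```

The per-step cost is one function evaluation, and halving h (doubling N) reduces the error roughly fourfold, matching the O(h²) accuracy of the composite rule.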
Exercises
(1) Given
I = ∫ x e^x dx.
Approximate the value of I using the trapezoidal and Simpson's one-third methods. Also obtain the error bounds and compare with the exact value of the integral.
(2) Evaluate
I = ∫_0^1 dx/(1 + x^2)
using the trapezoidal and Simpson's rules with 4 and 6 subintervals. Compare with the exact value of the integral.
(3) Compute
Ip = ∫_0^1 x^p/(x^3 + 10) dx
for p = 0, 1 using the trapezoidal and Simpson's rules with 3, 5 and 9 nodes.
(4) The length of the curve represented by a function y = f(x) on an interval [a, b] is given by the integral
I = ∫_a^b √(1 + [f′(x)]^2) dx.
Use the trapezoidal rule and Simpson's rule with 4 and 8 subintervals to compute the length of the curve y = tan^{−1}(1 + x^2), 0 ≤ x ≤ 2.
(5) Evaluate the integral
∫_{−1}^{1} e^{−x^2} cos x dx
by using the one- and two-point Gauss-Legendre formulas. Also obtain the bound for the error of the one-point formula.
(6) Evaluate
∫_2^3 cos 2x/(1 + sin x) dx
by using the two- and three-point Gauss-Legendre integration formulas.
(7) Determine the values of a, b, and c such that the formula
∫_0^h f(x)dx = h[af(0) + bf(h/3) + cf(h)]
is exact for polynomials of degree as high as possible. Also obtain the order of the truncation error.
(8) Determine constants a, b, c, and d that will produce a quadrature formula
∫_{−1}^{1} f(x)dx = af(−1) + bf(1) + cf′(−1) + df′(1).
(9) Evaluate
∫_0^1 ln(x + 1)/√(x(1 − x)) dx.
(10) Evaluate
I = ∫_0^1 sin x/(2 + x) dx
by subdividing the interval [0, 1] into two equal parts and then by using the Gauss-Legendre two-point formula.
(11) The equation
\[ \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2} \, dt = 0.45 \]
can be solved for x by applying Newton's method to the function
\[ f(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2} \, dt - 0.45, \qquad f'(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. \]
Note that Newton's method requires the evaluation of f(x_k) at various points x_k, which can be estimated using a quadrature formula. Find a solution of f(x) = 0 with error no more than 10^{-5} using Newton's method starting with x_0 = 0.5 and by means of the composite Simpson's rule.
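The combination described in (11) can be sketched as follows: Newton's method drives the iteration, while each evaluation of f uses the composite Simpson's rule (the function names and the choice of 100 Simpson subintervals are mine).

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return (h / 3) * s

phi = lambda t: math.exp(-t**2 / 2) / math.sqrt(2 * math.pi)

x = 0.5  # starting guess x0
for _ in range(20):
    fx = simpson(phi, 0.0, x, 100) - 0.45  # f(x) estimated by quadrature
    dfx = phi(x)                           # f'(x) is known in closed form
    step = fx / dfx
    x -= step
    if abs(step) < 1e-5:
        break
```

The iteration converges to x with \(\frac{1}{\sqrt{2\pi}}\int_0^x e^{-t^2/2}\,dt = 0.45\), i.e. the 95th percentile of the standard normal distribution.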
the coefficients A_{-1}, A_0, and A_1 are functions of the parameter a, x_1 is a constant, and the error E is of the form C f^{(k)}(\xi). Determine the values of all four parameters so that the error will be of the highest possible order. Also investigate whether the order of the error is influenced by different values of the parameter a.
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, Third edition, John Wiley & Sons, 2004.
[Jain] M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, Sixth edition, New Age International Publishers, New Delhi, 2012.
CHAPTER 6 (4 LECTURES)
NUMERICAL SOLUTIONS OF ORDINARY DIFFERENTIAL EQUATIONS
1. Introduction
In this chapter, we discuss numerical methods for solving ordinary differential equations in initial-value problems (IVP) of the form
\[ \frac{dy}{dx} = f(x, y), \quad x \in \mathbb{R}, \quad y(x_0) = y_0 \tag{1.1} \]
where y is a function of x, f is a function of x and y, and x_0 is called the initial value. The numerical values of y(x) on an interval containing x_0 are to be determined.
We divide the domain [a, b] into subintervals
\[ a = x_0 < x_1 < \cdots < x_N = b. \]
These points are called mesh points or grid points. Let the spacing be uniform with step size h; the uniform mesh points are then given by x_i = x_0 + ih, i = 0, 1, 2, .... The set of values y_0, y_1, \ldots, y_N is the numerical solution of the initial-value problem (IVP).
2. Existence and Uniqueness of Solutions
Theorem 2.1. If f(x, y) is continuous on a region
\[ R = \{(x, y) : |x - x_0| \le a, \ |y - y_0| \le b\}, \]
then the IVP (1.1) has a solution y(x) for |x - x_0| \le \min\{a, b/M\}, where M = \max_{(x,y) \in R} |f(x, y)|.
Theorem 2.2. If f and \partial f / \partial y are continuous on a region R, then the IVP (1.1) has a unique solution y(x) in the interval |x - x_0| \le \min\{a, b/M\}, where M = \max_{(x,y) \in R} |f(x, y)|.
3.1. Euler's Method. Approximating the derivative in (1.1) at x = x_i by a forward difference quotient gives
\[ \frac{1}{h}\left[ y(x_{i+1}) - y(x_i) \right] = f(x_i, y_i) \]
\[ \Rightarrow \ y(x_{i+1}) - y(x_i) = h f(x_i, y_i), \]
which gives
\[ y(x_{i+1}) = y(x_i) + h f(x_i, y_i). \]
We can write
\[ x_{i+1} = x_i + h, \qquad y_{i+1} = y_i + h f(x_i, y_i), \]
where y_i \approx y(x_i). This is called Euler's method.
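The update rule above can be sketched directly in code; the function name and the test equation in the usage note are illustrative.

```python
def euler(f, x0, y0, h, n):
    """Euler's method: advance y_{i+1} = y_i + h*f(x_i, y_i) for n steps."""
    x, y = x0, y0
    for _ in range(n):
        y = y + h * f(x, y)
        x = x + h
    return y
```

For instance, applying it to y' = y, y(0) = 1 with a small step size approximates e at x = 1, with error proportional to h since the method is first-order.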
3.2. The Improved or Modified Euler's Method. A better approximation of the slope is the average of the two slopes at the points (x_i, y_i) and (x_{i+1}, y_{i+1}^{(0)}), and we can write Euler's method in predictor-corrector form as
Predictor: \( y_{i+1}^{(0)} = y_i + h f(x_i, y_i) \)
Corrector: \( y_{i+1}^{(k+1)} = y_i + \dfrac{h}{2} \left[ f(x_i, y_i) + f(x_{i+1}, y_{i+1}^{(k)}) \right], \quad k = 0, 1, 2, \ldots \)
In the example, the corrector iterates converge after three iterations:
\[ y_1^{(1)} = y_1^{(2)} = y_1^{(3)} = y(0.2) = 1.2309. \]
Now y_1 = 1.2309, h = 0.2, x_1 = 0.2, x_2 = 0.4, and the iterates y_2^{(0)}, y_2^{(1)}, y_2^{(2)}, \ldots are computed in the same manner.
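The predictor-corrector scheme above can be sketched as follows; the fixed number of corrector iterations and the test equation y' = y in the usage note are assumptions, not taken from the notes.

```python
def modified_euler(f, x0, y0, h, n, iters=3):
    """Modified Euler: Euler predictor, then iterate the trapezoidal corrector."""
    x, y = x0, y0
    for _ in range(n):
        yp = y + h * f(x, y)                       # predictor y^(0)
        for _ in range(iters):                     # corrector iterations y^(k+1)
            yp = y + (h / 2) * (f(x, y) + f(x + h, yp))
        x, y = x + h, yp
    return y
```

On y' = y, y(0) = 1 with h = 0.1, ten steps approximate e at x = 1 with second-order accuracy, noticeably better than plain Euler at the same step size.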
Example 3. Given the IVP y' = x^2 y - 1, y(0) = 1, use the Taylor series method of order 4 with step size 0.1 to find y at x = 0.1 and x = 0.2.
Sol. By Taylor's theorem,
\[ y(x_i + h) = y(x_i) + h y'(x_i) + \frac{h^2}{2} y''(x_i) + \frac{h^3}{3!} y'''(x_i) + \frac{h^4}{4!} y^{(iv)}(x_i) + O(h^5). \]
Therefore
\[ y(0.1) = 1 + (0.1)(-1) + 0 + (0.1)^3 (2)/6 - (0.1)^4 (6)/24 = 0.900033. \]
Similarly
\[ y(0.2) = 0.80227. \]
4.1. Runge-Kutta Methods: These are among the most important methods for solving the IVP (1.1). If we apply Taylor's theorem directly, then we require that the function f have higher-order derivatives. The class of Runge-Kutta methods does not involve higher-order derivatives, which is the advantage of this class. Euler's method is an example of a first-order Runge-Kutta method.
Now we discuss the formulation of the second-order Runge-Kutta method, which is in fact the modified Euler's method.
By Taylor's theorem,
\[ y(x + h) = y(x) + h y'(x) + \frac{h^2}{2} y''(x) + O(h^3). \]
By differentiating y(x), we have
\[ y' = f(x, y) = f \]
\[ y'' = \frac{d}{dx} f = f_x + f_y y' = f_x + f_y f \]
\[ y''' = f_{xx} + f_{xy} f + f \frac{d}{dx} f_y + f_y \frac{d}{dx} f = f_{xx} + f_{xy} f + f (f_{yx} + f_{yy} f) + f_y (f_x + f_y f). \]
Therefore
\[ y(x + h) = y(x) + h y'(x) + \frac{h^2}{2} y''(x) + O(h^3) \]
\[ = y + h f + \frac{h^2}{2} (f_x + f_y f) + O(h^3) \]
\[ = y + \frac{h}{2} f + \frac{h}{2} f + \frac{h^2}{2} (f_x + f_y f) + O(h^3) \]
\[ = y + \frac{h}{2} f + \frac{h}{2} (f + h f_x + h f_y f) + O(h^3). \]
Now apply Taylor's theorem on f to evaluate
\[ f(x + h, y + h f) = f + h f_x + h f_y f + O(h^2), \]
so that
\[ y(x + h) = y + \frac{h}{2} f(x, y) + \frac{h}{2} f(x + h, y + h f) + O(h^3). \]
This gives the second-order Runge-Kutta method: at each step compute
\[ K_1 = h f(x_i, y_i), \qquad K_2 = h f(x_i + h, y_i + K_1), \]
\[ y_{i+1} = y_i + \frac{1}{2}(K_1 + K_2). \]
Third-order Runge-Kutta method:
\[ y_{i+1} = y_i + \frac{1}{6}(K_1 + 4K_2 + K_3) \]
where
\[ K_1 = h f(x_i, y_i) \]
\[ K_2 = h f(x_i + h/2, \ y_i + K_1/2) \]
\[ K_3 = h f(x_i + h, \ y_i - K_1 + 2K_2) \]
and x_i = x_0 + ih.
Fourth-order Runge-Kutta method:
\[ y_{i+1} = y_i + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) + O(h^5) \]
where
\[ K_1 = h f(x_i, y_i) \]
\[ K_2 = h f(x_i + h/2, \ y_i + K_1/2) \]
\[ K_3 = h f(x_i + h/2, \ y_i + K_2/2) \]
\[ K_4 = h f(x_i + h, \ y_i + K_3). \]
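The classical fourth-order formulas above translate directly into code; this is a minimal sketch with function names of my choosing.

```python
def rk4_step(f, x, y, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2, y + k1 / 2)
    k3 = h * f(x + h / 2, y + k2 / 2)
    k4 = h * f(x + h, y + k3)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def rk4(f, x0, y0, h, n):
    """Advance the IVP y' = f(x, y), y(x0) = y0 by n steps of size h."""
    x, y = x0, y0
    for _ in range(n):
        y = rk4_step(f, x, y, h)
        x += h
    return y
```

Applied to the worked example that follows, two steps of size h = 0.2 on y' = (y^2 - x^2)/(y^2 + x^2), y(0) = 1 reproduce the hand computation.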
Example. Apply the fourth-order Runge-Kutta method to solve
\[ \frac{dy}{dx} = \frac{y^2 - x^2}{y^2 + x^2} \]
with y_0 = 1 at x = 0.2 and 0.4.
Sol. Here
\[ f(x, y) = \frac{y^2 - x^2}{y^2 + x^2}, \quad x_0 = 0, \ y_0 = 1, \ h = 0.2 \]
\[ K_1 = h f(x_0, y_0) = 0.2 f(0, 1) = 0.200 \]
\[ K_2 = h f(x_0 + h/2, \ y_0 + K_1/2) = 0.2 f(0.1, 1.1) = 0.19672 \]
\[ K_3 = h f(x_0 + h/2, \ y_0 + K_2/2) = 0.2 f(0.1, 1.09836) = 0.1967 \]
\[ K_4 = h f(x_0 + h, \ y_0 + K_3) = 0.2 f(0.2, 1.1967) = 0.1891 \]
\[ y_1 = y_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 1 + 0.19599 = 1.196 \]
Therefore y(0.2) = 1.196.
Now x_1 = x_0 + h = 0.2 and
\[ K_1 = h f(x_1, y_1) = 0.1891 \]
\[ K_2 = h f(x_1 + h/2, \ y_1 + K_1/2) = 0.2 f(0.3, 1.2906) = 0.1795 \]
\[ K_3 = h f(x_1 + h/2, \ y_1 + K_2/2) = 0.2 f(0.3, 1.2858) = 0.1793 \]
\[ K_4 = h f(x_1 + h, \ y_1 + K_3) = 0.2 f(0.4, 1.3753) = 0.1688 \]
\[ y_2 = y(0.4) = y_1 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 1.196 + 0.1792 = 1.3752. \]
Example. Apply the fourth-order Runge-Kutta method with h = 0.3 to the system
\[ \frac{dy}{dx} = 1 + xz = f(x, y, z), \qquad \frac{dz}{dx} = -xy = g(x, y, z), \]
with x_0 = 0, y_0 = 0, z_0 = 1.
Sol.
\[ K_1 = h f(x_0, y_0, z_0) = 0.3 f(0, 0, 1) = 0.3 \]
\[ L_1 = h g(x_0, y_0, z_0) = 0.3 g(0, 0, 1) = 0 \]
\[ K_2 = h f(x_0 + h/2, \ y_0 + K_1/2, \ z_0 + L_1/2) = 0.3 f(0.15, 0.15, 1) = 0.345 \]
\[ L_2 = h g(x_0 + h/2, \ y_0 + K_1/2, \ z_0 + L_1/2) = -0.00675 \]
\[ K_3 = h f(x_0 + h/2, \ y_0 + K_2/2, \ z_0 + L_2/2) = 0.34485 \]
\[ L_3 = h g(x_0 + h/2, \ y_0 + K_2/2, \ z_0 + L_2/2) = -0.007762 \]
\[ K_4 = h f(x_0 + h, \ y_0 + K_3, \ z_0 + L_3) = 0.3893 \]
\[ L_4 = h g(x_0 + h, \ y_0 + K_3, \ z_0 + L_3) = -0.03104 \]
Hence
\[ y_1 = y(0.3) = y_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 0.34483 \]
\[ z_1 = z(0.3) = z_0 + \frac{1}{6}(L_1 + 2L_2 + 2L_3 + L_4) = 0.9899 \]
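The same fourth-order scheme carries over to coupled equations by advancing the K and L stages together; a minimal sketch reproducing the system above (the function name is mine):

```python
def rk4_system(f, g, x, y, z, h, n):
    """Classical RK4 for the coupled pair y' = f(x,y,z), z' = g(x,y,z)."""
    for _ in range(n):
        k1, l1 = h * f(x, y, z), h * g(x, y, z)
        k2 = h * f(x + h/2, y + k1/2, z + l1/2)
        l2 = h * g(x + h/2, y + k1/2, z + l1/2)
        k3 = h * f(x + h/2, y + k2/2, z + l2/2)
        l3 = h * g(x + h/2, y + k2/2, z + l2/2)
        k4 = h * f(x + h, y + k3, z + l3)
        l4 = h * g(x + h, y + k3, z + l3)
        y += (k1 + 2*k2 + 2*k3 + k4) / 6
        z += (l1 + 2*l2 + 2*l3 + l4) / 6
        x += h
    return y, z

# The system dy/dx = 1 + xz, dz/dx = -xy with y(0) = 0, z(0) = 1, one step h = 0.3:
y1, z1 = rk4_system(lambda x, y, z: 1 + x * z,
                    lambda x, y, z: -x * y, 0.0, 0.0, 1.0, 0.3, 1)
```

Note that both stages at a given level use the same intermediate point (x + h/2, y + k/2, z + l/2), exactly as in the hand computation.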
Example 7. Solve by using the fourth-order Runge-Kutta method for x = 0.2:
\[ \frac{d^2 y}{dx^2} = x \left( \frac{dy}{dx} \right)^2 - y^2, \quad y(0) = 1, \ y'(0) = 0. \]
Sol. Let
\[ \frac{dy}{dx} = z = f(x, y, z). \]
Therefore
\[ \frac{dz}{dx} = x z^2 - y^2 = g(x, y, z). \]
Now x_0 = 0, y_0 = 1, z_0 = 0, h = 0.2.
\[ K_1 = h f(x_0, y_0, z_0) = 0.0 \]
\[ L_1 = h g(x_0, y_0, z_0) = -0.2 \]
\[ K_2 = h f(x_0 + h/2, \ y_0 + K_1/2, \ z_0 + L_1/2) = -0.02 \]
\[ L_2 = h g(x_0 + h/2, \ y_0 + K_1/2, \ z_0 + L_1/2) = -0.1998 \]
\[ K_3 = h f(x_0 + h/2, \ y_0 + K_2/2, \ z_0 + L_2/2) = -0.02 \]
\[ L_3 = h g(x_0 + h/2, \ y_0 + K_2/2, \ z_0 + L_2/2) = -0.1958 \]
\[ K_4 = h f(x_0 + h, \ y_0 + K_3, \ z_0 + L_3) = -0.0392 \]
\[ L_4 = h g(x_0 + h, \ y_0 + K_3, \ z_0 + L_3) = -0.1905 \]
Hence
\[ y_1 = y(0.2) = y_0 + \frac{1}{6}(K_1 + 2K_2 + 2K_3 + K_4) = 0.9801 \]
\[ z_1 = y'(0.2) = z_0 + \frac{1}{6}(L_1 + 2L_2 + 2L_3 + L_4) = -0.1970 \]
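The reduction used in Example 7 — writing y' = z so that a second-order equation becomes a first-order system — can be sketched as follows; the helper name is mine.

```python
def rk4_second_order(g, x, y, z, h, n):
    """Solve y'' = g(x, y, y') by setting y' = z, z' = g(x, y, z), then RK4."""
    f = lambda x, y, z: z  # the first equation of the system is just y' = z
    for _ in range(n):
        k1, l1 = h * f(x, y, z), h * g(x, y, z)
        k2 = h * f(x + h/2, y + k1/2, z + l1/2)
        l2 = h * g(x + h/2, y + k1/2, z + l1/2)
        k3 = h * f(x + h/2, y + k2/2, z + l2/2)
        l3 = h * g(x + h/2, y + k2/2, z + l2/2)
        k4 = h * f(x + h, y + k3, z + l3)
        l4 = h * g(x + h, y + k3, z + l3)
        y += (k1 + 2*k2 + 2*k3 + k4) / 6
        z += (l1 + 2*l2 + 2*l3 + l4) / 6
        x += h
    return y, z

# Example 7: y'' = x*(y')**2 - y**2, y(0) = 1, y'(0) = 0, one step of h = 0.2.
y1, z1 = rk4_second_order(lambda x, y, z: x * z**2 - y**2, 0.0, 1.0, 0.0, 0.2, 1)
```

The returned pair approximates (y(0.2), y'(0.2)), matching the hand computation above.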
Exercises
(6) Compute solutions to the following problems with a second-order Taylor method. Use step size h = 0.2.
(A) \( y' = (\cos y)^2, \quad 0 \le x \le 1, \quad y(0) = 0. \)
(B) \( y' = \dfrac{20}{1 + 19 e^{-x/4}}, \quad 0 \le x \le 1, \quad y(0) = 1. \)
(7) Use the Runge-Kutta fourth-order method to solve the IVP at x = 0.8 for
\[ \frac{dy}{dx} = x + y, \quad y(0.4) = 0.41 \]
with step length h = 0.2.
(8) Use the Runge-Kutta fourth-order method to solve the following IVP
\[ y' = xz + 1, \quad y(0) = 0, \qquad z' = -xy, \quad z(0) = 1 \]
with h = 0.1 and 0 \le x \le 0.2.
(9) Apply Taylor's method of order three to obtain an approximate value of y at x = 0.2 for the differential equation
\[ y' = 2y + 3e^x, \quad y(0) = 0. \]
Compare the numerical solution with the exact solution.
(10) Use the Runge-Kutta method of order four to solve
\[ y'' = x (y')^2 - y^2, \quad y(0) = 1, \ y'(0) = 0 \]
for x = 0.2 with step size 0.2.
(11) Consider the Lotka-Volterra system
\[ \frac{du}{dt} = 2u - uv, \quad u(0) = 1.5 \]
\[ \frac{dv}{dt} = -9v + 3uv, \quad v(0) = 1.5. \]
Use Euler's method with step size 0.5 to approximate the solution at t = 2.
(12) The following system represents a much simplified model of nerve cells
\[ \frac{dx}{dt} = x + y - x^3, \quad x(0) = 0.5 \]
\[ \frac{dy}{dt} = -\frac{x}{2}, \quad y(0) = 0.1 \]
where x(t) represents the voltage across the boundary of a nerve cell and y(t) is the permeability of the cell wall at time t. Solve this system using the Runge-Kutta fourth-order method to generate the solution profile up to t = 0.2 with step size 0.1.
Bibliography
[Atkinson] K. Atkinson and W. Han, Elementary Numerical Analysis, Third edition, John Wiley & Sons, 2004.
[Jain] M. K. Jain, S. R. K. Iyengar, and R. K. Jain, Numerical Methods for Scientific and Engineering Computation, Sixth edition, New Age International Publishers, New Delhi, 2012.