Chapter 7
Approximation Theory
The primary aim of a general approximation is to represent non-arithmetic quantities by arithmetic quantities so that the accuracy can be ascertained to a desired degree. Secondly, we are also concerned with the amount of computation required to achieve this accuracy. These general notions are applicable to functions f(x) as well as to functionals F(f) (a functional is a mapping from the set of functions to the set of real or complex numbers). Typical examples of quantities to be approximated are transcendental functions, integrals and derivatives of functions, and solutions of differential or algebraic equations. Depending upon the nature of the quantity to be approximated, different techniques are used for different problems.
A complicated function f(x) usually is approximated by an easier function of the form \Phi(x; a_0, \ldots, a_n) where a_0, \ldots, a_n are parameters to be determined so as to characterize the best approximation of f. Depending on the sense in which the approximation is realized, there are three types of approaches:
\Phi(x_i; a_0, \ldots, a_n) = f(x_i), \quad i = 0, \ldots, n.   (7.1)
Sometimes, we even further require that, for each i, the first r_i derivatives of \Phi agree with those of f at x_i.
\Phi^{(j)}(x_i; a_0, \ldots, a_n) = f^{(j)}(x_i), \quad j = 0, \ldots, r_i.   (7.2)

In the least-squares sense, we minimize

\int_a^b \{ f(x) - \Phi(x; a_0, \ldots, a_n) \}^2 \, dx,   (7.3)

while in the uniform sense we minimize

\max_{x \in [a,b]} |f(x) - \Phi(x; a_0, \ldots, a_n)|.   (7.4)
Theorem 7.0.1 Let f(x) be a piecewise continuous function over the interval [a, b]. Then for any \epsilon > 0, there exist an integer n and numbers a_0, \ldots, a_n such that

\int_a^b \Big\{ f(x) - \sum_{i=0}^{n} a_i x^i \Big\}^2 \, dx < \epsilon.
Theorem 7.0.2 (Weierstrass Approximation Theorem) Let f(x) be a continuous function on [a, b]. For any \epsilon > 0, there exist an integer n and a polynomial p_n(x) of degree n such that \max_{x \in [a,b]} |f(x) - p_n(x)| < \epsilon. In fact, if [a, b] = [0, 1], the Bernstein polynomial

B_n(x) = \sum_{k=0}^{n} \binom{n}{k} x^k (1-x)^{n-k} f\Big(\frac{k}{n}\Big)   (7.5)

converges to f(x) as n \to \infty.
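The polynomial in (7.5) is the classical Bernstein polynomial, and its convergence is easy to observe numerically. A minimal sketch in Python (the helper names are mine, not from the text):

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the degree-n polynomial of (7.5) for f at x in [0, 1]."""
    return sum(comb(n, k) * x ** k * (1 - x) ** (n - k) * f(k / n)
               for k in range(n + 1))

# Convergence is uniform even for a merely continuous f, though slow.
f = lambda x: abs(x - 0.5)
grid = [i / 100 for i in range(101)]

def max_err(n):
    """Maximal deviation |f - B_n f| sampled on a fine grid."""
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)
```

For this non-smooth f the error decays only like n^{-1/2}, which is one reason Bernstein polynomials, despite their theoretical appeal, are seldom used for practical approximation.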
\Phi(x) = \frac{a_0 \phi_0(x) + \ldots + a_n \phi_n(x)}{b_0 \phi_0(x) + \ldots + b_m \phi_m(x)}.   (7.6)
7.1 Polynomial Interpolation
Given support data (x_i, f(x_i)), i = 1, \ldots, n, at distinct nodes, the interpolating polynomial can be written down explicitly as

p(x) = \sum_{i=1}^{n} f(x_i) \ell_i(x),   (7.7)

where

\ell_i(x) := \prod_{j=1, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}.   (7.8)
The interpolation error satisfies

f(x) - p(x) = \frac{f^{(n)}(\xi)}{n!} \prod_{j=1}^{n} (x - x_j),

where \min\{x_1, \ldots, x_n, x\} < \xi < \max\{x_1, \ldots, x_n, x\}.

Alternatively, writing

p(x) = \sum_{k=0}^{n-1} a_k x^k,   (7.9)

the interpolation conditions p(x_i) = f(x_i) lead to the linear system

\begin{pmatrix} 1 & x_1 & \ldots & x_1^{n-1} \\ \vdots & & & \vdots \\ 1 & x_n & \ldots & x_n^{n-1} \end{pmatrix}
\begin{pmatrix} a_0 \\ \vdots \\ a_{n-1} \end{pmatrix}
= \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix}.   (7.10)

The determinant of the coefficient matrix is the Vandermonde determinant \prod_{i>j} (x_i - x_j). Since all x_i's are distinct, we can uniquely solve (7.10) for the unknowns a_0, \ldots, a_{n-1}.
The error formula may be derived as follows. Fix x_0 distinct from the nodes and set

F(x) := f(x) - p(x) - (f(x_0) - p(x_0)) \frac{\prod_{i=1}^{n} (x - x_i)}{\prod_{i=1}^{n} (x_0 - x_i)}.

Since F(x) vanishes at the n + 1 points x_0, x_1, \ldots, x_n, repeated application of Rolle's theorem shows that F^{(n)} has a zero \xi between the extreme points. Because p(x) has degree n - 1,

f^{(n)}(\xi) - (f(x_0) - p(x_0)) \frac{n!}{\prod_{i=1}^{n} (x_0 - x_i)} = 0,

which gives the error formula at x = x_0.
Definition 7.1.1 The polynomial p(x) defined by (7.7) is called the Lagrange interpolation polynomial.
(7.11)
Example. Consider interpolating the function

f(x) = \frac{1}{1 + 25x^2}   (7.12)

in the interval [-1, 1] at equally spaced points. Runge (1901) discovered that as the degree n of the interpolating polynomial p_n(x) tends toward infinity, p_n(x) diverges in the intervals 0.726\ldots \le |x| < 1 while p_n(x) works pretty well in the central portion of the interval.
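This divergence is easy to reproduce; a short sketch in Python (helper names are mine) that interpolates (7.12) at n + 1 equally spaced nodes via the Lagrange form and measures the error near the ends and near the center:

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange form (7.7)-(7.8) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)
        total += yi * li
    return total

f = lambda x: 1.0 / (1.0 + 25.0 * x * x)

def max_error(n, lo, hi):
    """Max |f - p_n| over a fine grid, restricted to lo <= |x| <= hi."""
    xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]   # equally spaced nodes
    ys = [f(x) for x in xs]
    grid = [-1.0 + 2.0 * i / 1000 for i in range(1001)]
    return max(abs(f(x) - lagrange_eval(xs, ys, x))
               for x in grid if lo <= abs(x) <= hi)
```

Raising n from 10 to 20 shrinks the error in the central portion but makes it dramatically worse near |x| = 1, exactly as Runge observed.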
7.2 Divided Differences
Let P_{i_0 i_1 \ldots i_k}(x) denote the polynomial of degree k that interpolates the support data at the nodes x_{i_0}, \ldots, x_{i_k}, i.e.,

P_{i_0 i_1 \ldots i_k}(x_{i_j}) = f_{i_j}   (7.13)

for j = 0, \ldots, k. Moreover, the recursion

P_{i_0 i_1 \ldots i_k}(x) = \frac{(x - x_{i_0}) P_{i_1 \ldots i_k}(x) - (x - x_{i_k}) P_{i_0 \ldots i_{k-1}}(x)}{x_{i_k} - x_{i_0}}   (7.14)

holds.
(pf): Denote the right-hand side of (7.14) by R(x). Observe that R(x) is a polynomial of degree k. By definition, it is easy to see that R(x_{i_j}) = f(x_{i_j}) for all j = 0, \ldots, k. That is, R(x) interpolates the same set of data as does the polynomial P_{i_0 i_1 \ldots i_k}(x). By Theorem 7.1.1 the assertion is proved.
The difference P_{i_0 i_1 \ldots i_k}(x) - P_{i_0 i_1 \ldots i_{k-1}}(x) is a k-th degree polynomial which vanishes at x_{i_j} for j = 0, \ldots, k - 1. Thus we may write
P_{i_0 i_1 \ldots i_k}(x) = P_{i_0 i_1 \ldots i_{k-1}}(x) + f_{i_0 \ldots i_k} (x - x_{i_0})(x - x_{i_1}) \ldots (x - x_{i_{k-1}}).   (7.15)

The leading coefficients f_{i_0 \ldots i_k} can be determined recursively from the formula (7.14), i.e.,
f_{i_0 \ldots i_k} = \frac{f_{i_1 \ldots i_k} - f_{i_0 \ldots i_{k-1}}}{x_{i_k} - x_{i_0}},   (7.16)
where f_{i_1 \ldots i_k} and f_{i_0 \ldots i_{k-1}} are the leading coefficients of the polynomials P_{i_1 \ldots i_k}(x) and P_{i_0 \ldots i_{k-1}}(x), respectively.
Remark. Note that the formula (7.16) starts from f_{i_0} = f(x_{i_0}).
Remark. The polynomial P_{i_0 \ldots i_k}(x) is uniquely determined by the set of support data \{(x_{i_j}, f_{i_j})\}. The polynomial is invariant under any permutation of the indices i_0, \ldots, i_k. Therefore, the divided differences (7.16) are invariant under permutation of the indices.
(7.17)
(7.18)
(7.19)
It follows that the k-th degree polynomial that interpolates the set of support data \{(x_i, f_i) \mid i = 0, \ldots, k\} is given by

P_{x_0 \ldots x_k}(x) = f_{x_0} + f_{x_0 x_1}(x - x_0) + f_{x_0 x_1 x_2}(x - x_0)(x - x_1) + \ldots + f_{x_0 \ldots x_k}(x - x_0) \ldots (x - x_{k-1}).   (7.20)
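The recursion (7.16) and the Newton form translate directly into a short table computation; a sketch in Python (function names are mine):

```python
def divided_differences(xs, fs):
    """Return [f_{x0}, f_{x0 x1}, ..., f_{x0...xk}] via the recursion (7.16)."""
    coef = list(fs)
    n = len(xs)
    for level in range(1, n):
        # After this pass, coef[i] holds the divided difference on
        # the nodes x_{i-level}, ..., x_i.
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form P_{x0...xk}(x) by nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result
```

A useful consequence of the permutation-invariance remark above is that reordering the support points changes the intermediate table but not the resulting polynomial.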
7.3 Osculatory Interpolation
Given nodes \{x_i\}, i = 1, \ldots, k, and values a_i^{(0)}, \ldots, a_i^{(r_i)}, where the r_i are nonnegative integers, we want to construct a polynomial P(x) such that

P^{(j)}(x_i) = a_i^{(j)}   (7.21)

for j = 0, \ldots, r_i and i = 1, \ldots, k, with P of degree at most \sum_{i=1}^{k} (r_i + 1) - 1.
Such a polynomial exists and is unique. Write

q_i(x) = c_i^{(0)} + c_i^{(1)}(x - x_i) + \ldots + c_i^{(r_i)}(x - x_i)^{r_i}   (7.22)

and seek P in the form

P(x) = q_1(x) + (x - x_1)^{r_1+1} q_2(x) + \ldots + (x - x_1)^{r_1+1}(x - x_2)^{r_2+1} \ldots (x - x_{k-1})^{r_{k-1}+1} q_k(x).   (7.23)

Then P(x) is a polynomial of degree at most \sum_{i=1}^{k} (r_i + 1) - 1. Now P^{(j)}(x_1) = a_1^{(j)} for j = 0, \ldots, r_1 implies

c_1^{(j)} = \frac{a_1^{(j)}}{j!},

so q_1(x) is determined. Now we rewrite (7.23) as

R(x) := \frac{P(x) - q_1(x)}{(x - x_1)^{r_1+1}} = q_2(x) + (x - x_2)^{r_2+1} q_3(x) + \ldots   (7.24)

Note that R^{(j)}(x_2) are known for j = 0, \ldots, r_2 since P^{(j)}(x_2) are known. Thus all c_2^{(j)}, hence q_2(x), may be determined. This procedure can be continued to determine all q_i(x). For uniqueness, suppose Q(x) = P_1(x) - P_2(x) where P_1(x) and P_2(x) are two polynomials of the theorem. Then Q(x) is of degree at most \sum_{i=1}^{k} (r_i + 1) - 1, while Q has a zero of multiplicity r_i + 1 at each x_i, that is, \sum_{i=1}^{k} (r_i + 1) zeros counting multiplicities. Hence Q \equiv 0.
(7.25)
(7.26)
(7.27)
For the special case r_i = 1 for all i (Hermite interpolation), the answer can be written down explicitly. Define

h_i(x) := \big[ 1 - 2 \ell_i'(x_i)(x - x_i) \big] \ell_i^2(x),   (7.28)

g_i(x) := (x - x_i) \ell_i^2(x),

where \ell_i(x) is the Lagrange basis (7.8). Then the interpolating polynomial is

P(x) = \sum_{i=1}^{k} \big[ f(x_i) h_i(x) + f'(x_i) g_i(x) \big].   (7.29)
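For the first-derivative case the construction can be checked numerically. A sketch in Python, assuming the standard Hermite basis h_i(x) = [1 - 2 l_i'(x_i)(x - x_i)] l_i(x)^2 and g_i(x) = (x - x_i) l_i(x)^2 with l_i as in (7.8); the function name is mine:

```python
def hermite_eval(xs, fs, dfs, x):
    """Polynomial matching f and f' at each node, via the Hermite basis.

    Uses l_i'(x_i) = sum over j != i of 1/(x_i - x_j).
    """
    k = len(xs)
    total = 0.0
    for i in range(k):
        li = 1.0          # l_i(x)
        dli = 0.0         # l_i'(x_i)
        for j in range(k):
            if j != i:
                li *= (x - xs[j]) / (xs[i] - xs[j])
                dli += 1.0 / (xs[i] - xs[j])
        hi = (1.0 - 2.0 * dli * (x - xs[i])) * li * li
        gi = (x - xs[i]) * li * li
        total += fs[i] * hi + dfs[i] * gi
    return total
```

With k nodes the result has degree at most 2k - 1, so for two nodes it reproduces any cubic exactly.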
7.4 Spline Interpolation
Thus far, for a given function f on an interval [a, b], the interpolation has been to construct a polynomial over the entire interval [a, b]. There are at least two disadvantages to this global approximation:
1. For better accuracy, we need to supply more support data. But then the degree of the resultant polynomial gets higher, and such a polynomial is difficult to work with.
2. Suppose f is not smooth enough. Then the error estimate of a high-degree polynomial is difficult to establish. In fact, it is not clear whether the accuracy will increase with an increasing number of support data.
Definition 7.4.1 Let the interval [a, b] be partitioned into a = x_1 < x_2 < \ldots < x_n = b. A function p(x) is said to be a cubic spline of f on the partition if
Denote M_i := p''(x_i), i = 1, \ldots, n. Since p(x) is piecewise cubic and twice continuously differentiable, p''(x) is piecewise linear and continuous on [a, b]. In particular, over the interval [x_i, x_{i+1}], we have

p''(x) = M_i \frac{x - x_{i+1}}{x_i - x_{i+1}} + M_{i+1} \frac{x - x_i}{x_{i+1} - x_i}.   (7.30)
Integrating (7.30) twice and choosing the constants of integration so that p(x_i) = f(x_i) and p(x_{i+1}) = f(x_{i+1}), we obtain, with \Delta_i := x_{i+1} - x_i,

p(x) = M_i \frac{(x_{i+1} - x)^3}{6\Delta_i} + M_{i+1} \frac{(x - x_i)^3}{6\Delta_i} + c_i (x - x_i) + d_i (x_{i+1} - x),   (7.31)

where

c_i = \frac{f(x_{i+1})}{\Delta_i} - \frac{\Delta_i M_{i+1}}{6},   (7.32)

d_i = \frac{f(x_i)}{\Delta_i} - \frac{\Delta_i M_i}{6}.   (7.33)
Differentiating (7.31) and writing \Delta_i := x_{i+1} - x_i, we obtain

\lim_{x \to x_i^-} p'(x) = \frac{\Delta_{i-1}}{6} M_{i-1} + \frac{\Delta_{i-1}}{3} M_i + \frac{f(x_i) - f(x_{i-1})}{\Delta_{i-1}},   (7.34)

\lim_{x \to x_i^+} p'(x) = -\frac{\Delta_i}{3} M_i - \frac{\Delta_i}{6} M_{i+1} + \frac{f(x_{i+1}) - f(x_i)}{\Delta_i}.   (7.35)

Thus the continuity of p'(x) at the interior node x_i requires

\frac{\Delta_{i-1}}{6} M_{i-1} + \frac{\Delta_{i-1}}{3} M_i + \frac{f(x_i) - f(x_{i-1})}{\Delta_{i-1}} = -\frac{\Delta_i}{3} M_i - \frac{\Delta_i}{6} M_{i+1} + \frac{f(x_{i+1}) - f(x_i)}{\Delta_i},   (7.36)

or equivalently,

\frac{\Delta_{i-1}}{6} M_{i-1} + \frac{\Delta_{i-1} + \Delta_i}{3} M_i + \frac{\Delta_i}{6} M_{i+1} = \frac{f(x_{i+1}) - f(x_i)}{\Delta_i} - \frac{f(x_i) - f(x_{i-1})}{\Delta_{i-1}},   (7.37)

for i = 2, \ldots, n-1.
Suppose in addition we require

p'(x_1) = f'(x_1),   (7.38)

p'(x_n) = f'(x_n).   (7.39)

Then we have

\frac{\Delta_1}{3} M_1 + \frac{\Delta_1}{6} M_2 = \frac{f(x_2) - f(x_1)}{\Delta_1} - f'(x_1),

\frac{\Delta_{n-1}}{6} M_{n-1} + \frac{\Delta_{n-1}}{3} M_n = f'(x_n) - \frac{f(x_n) - f(x_{n-1})}{\Delta_{n-1}}.
Together with (7.37), the moments M_1, \ldots, M_n therefore satisfy the tridiagonal linear system

\begin{pmatrix}
\frac{\Delta_1}{3} & \frac{\Delta_1}{6} & & & \\
\frac{\Delta_1}{6} & \frac{\Delta_1 + \Delta_2}{3} & \frac{\Delta_2}{6} & & \\
& \ddots & \ddots & \ddots & \\
& & \frac{\Delta_{n-2}}{6} & \frac{\Delta_{n-2} + \Delta_{n-1}}{3} & \frac{\Delta_{n-1}}{6} \\
& & & \frac{\Delta_{n-1}}{6} & \frac{\Delta_{n-1}}{3}
\end{pmatrix}
\begin{pmatrix} M_1 \\ M_2 \\ \vdots \\ M_{n-1} \\ M_n \end{pmatrix}
=
\begin{pmatrix}
\frac{f(x_2) - f(x_1)}{\Delta_1} - f'(x_1) \\
\vdots \\
\frac{f(x_{i+1}) - f(x_i)}{\Delta_i} - \frac{f(x_i) - f(x_{i-1})}{\Delta_{i-1}} \\
\vdots \\
f'(x_n) - \frac{f(x_n) - f(x_{n-1})}{\Delta_{n-1}}
\end{pmatrix}.   (7.40)

The coefficient matrix is symmetric and strictly diagonally dominant, hence nonsingular, so the system (7.40) determines M_1, \ldots, M_n uniquely.
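Because the system is tridiagonal, the moments M_i can be computed in O(n) operations by forward elimination and back substitution. A sketch in Python, assuming the clamped end conditions p'(x_1) = f'(x_1) and p'(x_n) = f'(x_n); the function name is mine:

```python
def spline_moments(xs, fs, dfa, dfb):
    """Solve the tridiagonal moment system for M_1, ..., M_n.

    dfa and dfb are the prescribed end slopes f'(x_1) and f'(x_n).
    Uses the Thomas algorithm (forward elimination, back substitution).
    """
    n = len(xs)
    d = [xs[i + 1] - xs[i] for i in range(n - 1)]            # Delta_i
    slope = [(fs[i + 1] - fs[i]) / d[i] for i in range(n - 1)]
    diag = ([d[0] / 3] +
            [(d[i - 1] + d[i]) / 3 for i in range(1, n - 1)] +
            [d[-1] / 3])
    off = [d[i] / 6 for i in range(n - 1)]                   # sub- and super-diagonal
    rhs = ([slope[0] - dfa] +
           [slope[i] - slope[i - 1] for i in range(1, n - 1)] +
           [dfb - slope[-1]])
    for i in range(1, n):                                    # eliminate sub-diagonal
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    M = [0.0] * n
    M[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):                           # back substitution
        M[i] = (rhs[i] - off[i] * M[i + 1]) / diag[i]
    return M
```

A quick sanity check: for f(x) = x^2 with exact end slopes, the interpolating clamped spline is x^2 itself, so every moment must equal f'' = 2.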
7.5 Trigonometric Interpolation
Given values f_k at N equally spaced points, we seek an interpolating trigonometric polynomial of the form

\Psi(x) = \frac{a_0}{2} + \sum_{h=1}^{M} (a_h \cos hx + b_h \sin hx)   (7.42)

if N = 2M + 1 is odd, or

\Psi(x) = \frac{a_0}{2} + \sum_{h=1}^{M-1} (a_h \cos hx + b_h \sin hx) + \frac{a_M}{2} \cos Mx   (7.43)

if N = 2M is even.
We interpolate at the equally spaced support points

x_k = \frac{2\pi k}{N}, \quad k = 0, \ldots, N-1.
Observe that

e^{ihx_k} = e^{i2\pi hk/N} = e^{-i2\pi(N-h)k/N} = e^{-i(N-h)x_k}.
Thus, we may write

\cos hx_k = \frac{e^{ihx_k} + e^{i(N-h)x_k}}{2}, \qquad \sin hx_k = \frac{e^{ihx_k} - e^{i(N-h)x_k}}{2i}.   (7.44)

Substituting (7.44) into \Psi, the values \Psi(x_k) can be written as those of a phase polynomial:

\Psi(x_k) = \beta_0 + \beta_1 e^{ix_k} + \ldots + \beta_{2M} e^{i2Mx_k}.   (7.45)
Matching coefficients, the \beta_j are related to the real coefficients by

\beta_0 = \frac{a_0}{2}, \quad \beta_h = \frac{a_h - i b_h}{2}, \quad \beta_{N-h} = \frac{a_h + i b_h}{2}, \quad h = 1, \ldots, M.   (7.46)

More generally, consider a phase polynomial

p(x) := \beta_0 + \beta_1 e^{ix} + \ldots + \beta_{N-1} e^{i(N-1)x}   (7.47)

and impose the interpolation conditions

p(x_k) = f_k, \quad k = 0, 1, \ldots, N-1.   (7.48)

These conditions are uniquely solved by

\beta_j = \frac{1}{N} \sum_{k=0}^{N-1} f_k e^{-ijx_k}.   (7.49)
With \omega_k := e^{ix_k}, the solution may equivalently be written

\beta_j = \frac{1}{N} \sum_{k=0}^{N-1} f_k (\omega_k)^{-j} = \frac{1}{N} \sum_{k=0}^{N-1} f_k (e^{ix_k})^{-j}.   (7.50)
To see this, note that \omega_k = e^{i2\pi k/N}, so that for integers j \neq h with 0 \le j, h \le N-1 the geometric sum

\sum_{k=0}^{N-1} \omega_k^j \overline{\omega_k^h} = \sum_{k=0}^{N-1} e^{i2\pi k(j-h)/N} = \frac{e^{i2\pi(j-h)} - 1}{e^{i2\pi(j-h)/N} - 1} = 0,

while for j = h each term equals 1. We may therefore summarize that

\sum_{k=0}^{N-1} \omega_k^j \overline{\omega_k^h} =
\begin{cases} 0, & \text{if } j \neq h \\ N, & \text{if } j = h. \end{cases}   (7.51)

Introducing \omega^{(h)} := (1, \omega_1^h, \ldots, \omega_{N-1}^h)^T, (7.51) may be expressed in terms of the complex inner product as

\langle \omega^{(j)}, \omega^{(h)} \rangle =
\begin{cases} 0, & \text{if } j \neq h \\ N, & \text{if } j = h. \end{cases}   (7.52)
The interpolation conditions p(x_k) = f_k can be written as

\begin{pmatrix}
1 & 1 & \ldots & 1 \\
1 & \omega_1 & \ldots & \omega_1^{N-1} \\
\vdots & & & \vdots \\
1 & \omega_{N-1} & \ldots & \omega_{N-1}^{N-1}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{N-1} \end{pmatrix}
=
\begin{pmatrix} f_0 \\ f_1 \\ \vdots \\ f_{N-1} \end{pmatrix},   (7.53)

or simply

\beta_0 \omega^{(0)} + \beta_1 \omega^{(1)} + \ldots + \beta_{N-1} \omega^{(N-1)} = f.

By the orthogonality of the \omega^{(h)}, (7.50) follows.
In terms of the original coefficients, we have

a_h = \frac{2}{N} \sum_{k=0}^{N-1} f_k \cos hx_k,   (7.54)

b_h = \frac{2}{N} \sum_{k=0}^{N-1} f_k \sin hx_k.   (7.55)
Definition 7.5.1 Given the phase polynomial (7.47) and 0 \le s \le N, the s-segment p_s(x) is defined to be

p_s(x) := \beta_0 + \beta_1 e^{ix} + \ldots + \beta_s e^{isx}.   (7.56)
Theorem 7.5.2 Let p(x) be the phase polynomial that interpolates a given set of support data (x_k, f_k), k = 0, \ldots, N-1, with x_k = 2\pi k/N. Then, among all phase polynomials q(x) of degree s, the s-segment p_s(x) of p(x) minimizes the sum

S(q) = \sum_{k=0}^{N-1} |f_k - q(x_k)|^2.   (7.57)
(pf): In vector form the values of q correspond to \sum_{j=0}^{s} \gamma_j \omega^{(j)} and those of p_s to \sum_{j=0}^{s} \beta_j \omega^{(j)} with \beta_j = \frac{1}{N} \langle f, \omega^{(j)} \rangle. By orthogonality,

\frac{1}{N} \langle f - p_s, \omega^{(j)} \rangle = \beta_j - \beta_j = 0, \quad j = 0, \ldots, s,

and hence

\langle f - p_s, p_s - q \rangle = \sum_{j=0}^{s} \langle f - p_s, (\beta_j - \gamma_j) \omega^{(j)} \rangle = 0.

It follows that S(q) = \|f - q\|^2 = \|f - p_s\|^2 + \|p_s - q\|^2 \ge S(p_s).
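Both the interpolation property of the coefficients and the minimizing property of the s-segment can be verified numerically; a sketch in Python (function names are mine):

```python
import cmath

def phase_coefficients(fk):
    """beta_m = (1/N) sum_k f_k e^{-i m x_k},  x_k = 2 pi k / N."""
    N = len(fk)
    return [sum(fk[k] * cmath.exp(-1j * m * 2 * cmath.pi * k / N)
                for k in range(N)) / N
            for m in range(N)]

def phase_eval(beta, x, s=None):
    """Evaluate the s-segment p_s(x) = sum_{m<=s} beta_m e^{imx}."""
    if s is None:
        s = len(beta) - 1
    return sum(beta[m] * cmath.exp(1j * m * x) for m in range(s + 1))
```

Evaluating the full phase polynomial at the nodes recovers the data exactly, while perturbing any coefficient of a segment strictly increases the sum S of squared deviations.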
7.6 Fourier Series
Suppose that we have a series of sines and cosines which represents a given function on [-L, L], say,

f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \Big( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \Big).   (7.58)
Using the orthogonality relations

\int_{-L}^{L} \cos\frac{n\pi x}{L}\, dx = 0 \quad \text{and} \quad \int_{-L}^{L} \sin\frac{n\pi x}{L}\, dx = 0,   (7.59)

\int_{-L}^{L} \cos\frac{n\pi x}{L} \cos\frac{m\pi x}{L}\, dx = \int_{-L}^{L} \sin\frac{n\pi x}{L} \sin\frac{m\pi x}{L}\, dx =
\begin{cases} 0, & \text{if } n \neq m \\ L, & \text{if } n = m, \end{cases}   (7.60)

and

\int_{-L}^{L} \sin\frac{n\pi x}{L} \cos\frac{m\pi x}{L}\, dx = 0,   (7.61)

we find
a_n = \frac{1}{L} \int_{-L}^{L} f(x) \cos\frac{n\pi x}{L}\, dx, \quad n = 0, 1, \ldots   (7.62)

b_n = \frac{1}{L} \int_{-L}^{L} f(x) \sin\frac{n\pi x}{L}\, dx, \quad n = 1, 2, \ldots   (7.63)
Definition 7.6.1 The series (7.58), with coefficients defined by (7.62) and (7.63), is called the Fourier series of f(x) on [-L, L].
Given a function f (x), its Fourier series does not necessarily converge to
f (x) at every x. In fact,
Theorem 7.6.1 Suppose f(x) is piecewise continuous on [-L, L]. Then (1) If x_0 \in (-L, L) and f'(x_0^+) and f'(x_0^-) both exist (where f'(x_0^{\pm}) := \lim_{h \to 0^{\pm}} \frac{f(x_0 + h) - f(x_0)}{h}), then the Fourier series converges at x_0 to \frac{f(x_0^+) + f(x_0^-)}{2}. (2) At x = \pm L, the Fourier series converges to \frac{f(-L^+) + f(L^-)}{2}.
Remark. Suppose that f(x) is defined on [0, L]. We may extend f(x) to become an even function on [-L, L] (simply by defining f(x) := f(-x) for x \in [-L, 0]). In this way, the Fourier coefficients for the extended function become

a_n = \frac{2}{L} \int_0^L f(x) \cos\frac{n\pi x}{L}\, dx,   (7.64)

b_n = 0,

and we obtain a pure cosine series

f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos\frac{n\pi x}{L}.   (7.65)
Similarly, we may extend f(x) to become an odd function on [-L, L], in which case we obtain a pure sine series

f(x) = \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{L}   (7.66)

with

b_n = \frac{2}{L} \int_0^L f(x) \sin\frac{n\pi x}{L}\, dx.   (7.67)
Example.
1. If f(x) = |x|, -\pi \le x \le \pi, then

f(x) = \frac{\pi}{2} - \frac{4}{\pi} \Big( \cos x + \frac{\cos 3x}{3^2} + \frac{\cos 5x}{5^2} + \ldots \Big).

2. If g(x) = x, -\pi \le x \le \pi, then

g(x) = 2 \Big( \sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \frac{\sin 4x}{4} + \ldots \Big).

3. If

h(x) = \begin{cases} x(\pi - x), & \text{for } 0 \le x \le \pi \\ x(\pi + x), & \text{for } -\pi \le x \le 0, \end{cases}

then

h(x) = \frac{8}{\pi} \Big( \sin x + \frac{\sin 3x}{3^3} + \frac{\sin 5x}{5^3} + \ldots \Big).
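Since the coefficients of the last series decay like 1/n^3, a modest partial sum is already very accurate. A quick numerical check in Python:

```python
from math import pi, sin

def h_series(x, terms=50):
    """Partial sum of (8/pi) * sum over odd m of sin(mx)/m^3."""
    return (8.0 / pi) * sum(sin((2 * m + 1) * x) / (2 * m + 1) ** 3
                            for m in range(terms))

def h_exact(x):
    """The piecewise definition of h on [-pi, pi]."""
    return x * (pi - x) if x >= 0 else x * (pi + x)
```

Fifty terms suffice for roughly four correct digits everywhere on the interval, in sharp contrast to the series for g(x) = x, whose 1/n coefficients converge much more slowly near the jump at x = \pm\pi.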
Set g(y) := f(y + \pi). Approximating the Fourier coefficients of g by the trapezoidal rule at the points y_k = \frac{2\pi k}{N} - \pi gives

a_n \approx \frac{2}{N} \sum_{k=0}^{N-1} g(y_k) \cos(n y_k) = (-1)^n \frac{2}{N} \sum_{k=0}^{N-1} f\Big(\frac{2\pi k}{N}\Big) \cos\frac{2\pi kn}{N} =: \tilde{a}_n,   (7.69)

and similarly for \tilde{b}_n. Thus

f(x) = g(y) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos ny + b_n \sin ny) \approx \frac{\tilde{a}_0}{2} + \sum_{n=1}^{\infty} \tilde{a}_n \cos n(x - \pi) + \tilde{b}_n \sin n(x - \pi).
The Fourier series may also be written in complex form. Since

\int_{-L}^{L} e^{ij\pi x/L} e^{-ik\pi x/L}\, dx = \begin{cases} 0, & \text{if } j \neq k \\ 2L, & \text{if } j = k, \end{cases}   (7.70)

we may write

f(x) = \sum_{n=-\infty}^{\infty} f_n e^{i n\pi x/L}   (7.71)

and determine the coefficients from

\int_{-L}^{L} f(x) e^{-i k\pi x/L}\, dx = \sum_{n=-\infty}^{\infty} f_n \int_{-L}^{L} e^{i n\pi x/L} e^{-i k\pi x/L}\, dx = 2L f_k.   (7.72)
That is,

f_k = \frac{1}{2L} \int_{-L}^{L} f(x) e^{-i k\pi x/L}\, dx.   (7.73)

The complex form is related to the real form by

\sum_{n=-\infty}^{\infty} f_n e^{i n\pi x/L} = f_0 + \sum_{n \neq 0} f_n \Big( \cos\frac{n\pi x}{L} + i \sin\frac{n\pi x}{L} \Big) = f_0 + \sum_{n=1}^{\infty} (f_n + f_{-n}) \cos\frac{n\pi x}{L} + i (f_n - f_{-n}) \sin\frac{n\pi x}{L}.   (7.74)
In particular, for a 2\pi-periodic function f we have

f(x) = \sum_{n=-\infty}^{\infty} f_n e^{inx}   (7.75)

where

f_n = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\, dx.   (7.76)

Consider the function g(x) of the form

g(x) = \sum_{j=-\infty}^{\infty} d_j e^{ijx}.   (7.77)
Evaluating the discrete coefficients (7.49) of g at x_k = 2\pi k/N gives

\beta_n = \frac{1}{N} \sum_{k=0}^{N-1} g(x_k) e^{-inx_k}   (7.78)

= \sum_{j=-\infty}^{\infty} d_j \Big( \frac{1}{N} \sum_{k=0}^{N-1} e^{ijx_k} e^{-inx_k} \Big)   (7.79)
By (7.51), the inner sum in (7.79) equals 1 when j \equiv n \pmod{N} and 0 otherwise. On the other hand, approximating (7.76) by the trapezoidal rule gives

f_n \approx \frac{1}{N} \sum_{k=0}^{N-1} f(x_k) e^{-inx_k}   (7.80)

for n = 0, \pm 1, \ldots. At once, we see that (7.80) is a trapezoidal approximation of the integral (7.76). The match between (7.49) and (7.80) is conspicuous except that the ranges of validity do not coincide. Consider the case where N = 2\ell + 1 and d_j = 0 for |j| > \ell. Then obviously \beta_j = d_j for j = 0, 1, \ldots, \ell. But

\beta_{N+j} = \frac{1}{N} \sum_{k=0}^{N-1} f(x_k) e^{-i(N+j)x_k} = \frac{1}{N} \sum_{k=0}^{N-1} f(x_k) e^{-ijx_k} = d_j \quad \text{for } j = -1, \ldots, -\ell.
The central idea behind the fast Fourier transform (FFT) is that when N is the product of integers, the numbers d_j (or \beta_j) prove to be closely interdependent. This interdependence can be exploited to substantially reduce the amount of computation required to generate these numbers. We demonstrate the idea as follows:
Suppose N = t_1 t_2 where both t_1 and t_2 are integers. Let

j := j_1 + t_1 j_2,
n := n_2 + t_2 n_1,

for j_1, n_1 = 0, 1, \ldots, t_1 - 1 and j_2, n_2 = 0, 1, \ldots, t_2 - 1. Note that both j and n run their required ranges 0 to N - 1. Let \omega := e^{-i2\pi/N}. Then \omega^N = 1. Thus
\beta_j = \beta_{j_1 + t_1 j_2} = \frac{1}{N} \sum_{n=0}^{N-1} f(x_n) e^{-i2\pi jn/N} = \frac{1}{N} \sum_{n=0}^{N-1} f(x_n) \omega^{nj}

= \frac{1}{N} \sum_{n=0}^{N-1} f(x_n) \omega^{j_1 n_2 + j_1 t_2 n_1 + t_1 j_2 n_2}

= \frac{1}{N} \sum_{n_2=0}^{t_2-1} \Big( \sum_{n_1=0}^{t_1-1} f(x_{n_2 + t_2 n_1}) \omega^{j_1 t_2 n_1} \Big) \omega^{j_1 n_2 + t_1 j_2 n_2}.   (7.81)

Define

F_1(j_1, n_2) := \sum_{n_1=0}^{t_1-1} f(x_{n_2 + t_2 n_1}) \omega^{j_1 t_2 n_1};   (7.82)

\beta_j = F_2(j_1, j_2) := \frac{1}{N} \sum_{n_2=0}^{t_2-1} F_1(j_1, n_2) \omega^{j_1 n_2 + t_1 j_2 n_2}.   (7.83)
[Table omitted: a worked example tabulating the intermediate quantities F_1(j_1, n_2) and F_2(j_1, j_2) for a small value of N.]
More generally, if N = t_1 t_2 t_3, write j = j_1 + t_1 j_2 + t_1 t_2 j_3 and n = n_3 + t_3 n_2 + t_3 t_2 n_1, and define

F_1(j_1, n_2, n_3) := \sum_{n_1=0}^{t_1-1} f(x_n) \omega^{j_1 t_3 t_2 n_1},   (7.84)

F_2(j_1, j_2, n_3) := \sum_{n_2=0}^{t_2-1} F_1(j_1, n_2, n_3) \omega^{(j_1 + t_1 j_2) t_3 n_2},   (7.85)

\beta_j = F_3(j_1, j_2, j_3) := \frac{1}{N} \sum_{n_3=0}^{t_3-1} F_2(j_1, j_2, n_3) \omega^{(j_1 + t_1 j_2 + t_1 t_2 j_3) n_3}.   (7.86)
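The savings are visible already in the two-factor case: computing all the F_1(j_1, n_2) costs about N t_1 multiplications and all the F_2(j_1, j_2) about N t_2, versus N^2 for the direct sums. A sketch in Python checking the splitting (7.82)-(7.83) against the direct formula (function names are mine):

```python
import cmath

def beta_direct(f, N):
    """beta_j = (1/N) sum_n f(x_n) w^{nj},  w = e^{-2 pi i / N}."""
    w = cmath.exp(-2j * cmath.pi / N)
    return [sum(f[n] * w ** (n * j) for n in range(N)) / N for j in range(N)]

def beta_two_factor(f, t1, t2):
    """All beta_j via the two-factor splitting with N = t1 * t2,
    j = j1 + t1*j2 and n = n2 + t2*n1."""
    N = t1 * t2
    w = cmath.exp(-2j * cmath.pi / N)
    # F1[j1][n2] = sum over n1 of f(x_{n2 + t2 n1}) w^{j1 t2 n1}   (7.82)
    F1 = [[sum(f[n2 + t2 * n1] * w ** (j1 * t2 * n1) for n1 in range(t1))
           for n2 in range(t2)] for j1 in range(t1)]
    beta = [0j] * N
    for j1 in range(t1):
        for j2 in range(t2):
            # (7.83): combine the partial sums with the remaining twiddle factors
            beta[j1 + t1 * j2] = sum(
                F1[j1][n2] * w ** (j1 * n2 + t1 * j2 * n2)
                for n2 in range(t2)) / N
    return beta
```

The dropped exponent term t_1 t_2 j_2 n_1 = N j_2 n_1 contributes \omega^{N j_2 n_1} = 1, which is why the two computations agree exactly.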
7.7 Uniform Approximation
Theorem 7.7.1 Let f \in C[a, b], g \in P_{n-1}, and \rho := \max_{x \in [a,b]} |f(x) - g(x)|. Suppose there exist n + 1 points a \le x_1 < \ldots < x_{n+1} \le b such that

|f(x_i) - g(x_i)| = \rho, \quad i = 1, \ldots, n+1,   (7.87)

f(x_{i+1}) - g(x_{i+1}) = -(f(x_i) - g(x_i)), \quad i = 1, \ldots, n.   (7.88)

Then g is a best approximation of f, i.e., no polynomial in P_{n-1} deviates less from f.
(pf): Let

M := \{x \in [a, b] : |f(x) - g(x)| = \rho\}.   (7.89)

Suppose g is not a best approximation, so that some g + p with p \in P_{n-1}, p not identically zero, deviates less from f. Then for every x \in M,

|e(x) - p(x)| < \rho = |e(x)|,   (7.90)

where e(x) := f(x) - g(x). The inequality in (7.90) is possible if and only if the sign of p(x) is the same as that of e(x). That is, we must have (f(x) - g(x))p(x) > 0 for all x \in M. By (7.88), it follows that the polynomial p must change sign at least n times in [a, b]. That is, p must have at least n zeros. This contradicts the assumption that p is not identically zero.
Remark. The above theorem asserts only that g is a best approximation whenever there are at least n + 1 points satisfying (7.87) and (7.88). In general, there
can be more points where the maximal deviation is achieved.
Example. Suppose we want to approximate f(x) = \sin 3x over the interval [0, 2\pi]. It follows from the theorem that if n - 1 \le 4, then the polynomial g = 0 is a best approximation of f. Indeed, in this case the difference f - g alternates between its maximal absolute value at six points, whereas the theorem only requires n + 1 points. On the other hand, for n - 1 = 5 we have n + 1 = 7, and g = 0 no longer satisfies conditions (7.87) and (7.88). In fact, in this case g = 0 is not a best approximation from P_5.
Remark. The only property of P_{n-1} we have used to establish Theorem 7.7.1 is a weaker form of the Fundamental Theorem of Algebra, i.e., any polynomial of degree n - 1 has at most n - 1 distinct zeros in [a, b]. This property is in fact shared by a larger class of functions.
We now state without proof the famous result that the condition in Theorem 7.7.1 is not only sufficient but also necessary for a polynomial g to be a best approximation. The following theorem is also known as the Alternation Theorem:
Theorem 7.7.2 The polynomial g \in P_{n-1} is a best approximation of the function f \in C[a, b] if and only if there exist points a \le x_1 < \ldots < x_{n+1} \le b such that conditions (7.87) and (7.88) are satisfied.
Definition 7.7.2 The set of points \{x_1, \ldots, x_{n+1}\} in Theorem 7.7.2 is referred to as an alternant for f and g.
Corollary 7.7.1 For any f \in C[a, b], there is a unique best approximation.
Theorem 7.7.1 also provides a basis for designing a method for the computation of best approximations of continuous functions. The idea, known as the
exchange method of Remez, is as follows:
The objective here is to have the set \{x_k\} converge to a true alternant and hence the polynomial converge to a best approximation. It can be proved that the process does converge for any choice of starting values in step (1) for which the value computed in step (2) is not zero. With additional assumptions on the differentiability of f, it can also be shown that convergence is quadratic. Also, the assumption that e(x) possesses exactly n + 1 extreme points in step (3) is not essential. For more details, refer to G. Meinardus's book Approximation of Functions: Theory and Numerical Methods, Springer-Verlag, New York, 1967.
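The exchange method can be sketched compactly. The following Python code is one common variant of the iteration, not necessarily the precise steps referred to above: it levels the error on the current reference set, then replaces the references by the extrema of the new error curve.

```python
from math import exp

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            w = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= w * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def remez(f, m, a, b, iters=10, grid_n=2000):
    """Best degree-m polynomial on [a, b] by an exchange iteration (sketch)."""
    xs = [a + (b - a) * i / (m + 1) for i in range(m + 2)]   # initial references
    grid = [a + (b - a) * i / grid_n for i in range(grid_n + 1)]
    for _ in range(iters):
        # Level the error: solve p(x_k) + (-1)^k lam = f(x_k).
        A = [[x ** deg for deg in range(m + 1)] + [(-1.0) ** k]
             for k, x in enumerate(xs)]
        *coefs, lam = solve(A, [f(x) for x in xs])
        def e(x, c=tuple(coefs)):
            return f(x) - sum(cj * x ** deg for deg, cj in enumerate(c))
        # Exchange: between consecutive sign changes of e, keep the extremum.
        segs, cur = [], [grid[0]]
        for u, v in zip(grid, grid[1:]):
            if e(u) * e(v) < 0:
                segs.append(cur)
                cur = []
            cur.append(v)
        segs.append(cur)
        new_refs = [max(s, key=lambda t: abs(e(t))) for s in segs]
        if len(new_refs) == m + 2:
            xs = new_refs
    return coefs, abs(lam)
```

For a convex function such as e^x on [0, 1] with m = 1 the iteration settles almost immediately; the slope of the best linear approximation is the secant slope e - 1, a classical closed-form check.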