
Numerical Analysis

Jonas Tölle1

Lecture notes for MS-C1650 Numerical Analysis, Aalto University

Last updated: 30.5.2024

Largely based on the lecture transcript by Harri Hakula, 2021.

Intended learning outcomes. After the course, the student will be able to…

explain the fundamental concepts of numerical analysis, like condition number, stability, and convergence rate;
construct the floating point numbers;
discuss and employ basic numerical algorithms like Newton's method;
use the Monte Carlo method in basic problems in analysis and geometry;
apply different methods of interpolation polynomials and numerical quadrature rules;
understand the Euler scheme and linear multi-step methods for solving ordinary differential equations.

Floating-point numbers

The set of real numbers R is infinite in two ways: it is unbounded and continuous. In most practical computing, the second kind of infiniteness is more consequential than the first kind, so we turn our attention there first.

Instead of R, we shall introduce the set of floating-point numbers (floats) F. They come with different bases, precisions, exponent ranges, and other features. The basic representation is

x = ±(d_0.d_1 d_2 … d_p)_k · k^e.

k ∈ N ∖ {1} is called the base or radix, p ∈ N_0 := N ∪ {0} is called the precision, the d_i, i ∈ {0, …, p}, are the digits, and the number

(d_0.d_1 d_2 … d_p)_k := ∑_{i=0}^{p} d_i k^{−i}

is called the mantissa or significand. The exponent e ∈ Z is bounded, m ≤ e ≤ M, where m, M ∈ Z.


If k = 10, we can read off the usual decimal notation from the mantissa:

(1.01)_{10} = 1 · 10^0 + 0 · 10^{−1} + 1 · 10^{−2} = 1.01.

If k = 2, we have binary floats. In the binary case, we observe that we can always choose d_0 = 1, hence saving one bit, which can be expressed by e. We refer to this as normalization. The mantissa is always contained in the interval [1, 2).

Example. (Toy floating point system). Binary floats of the type

(1.b_1 b_2)_2

with exponents e = −1, 0, 1.


Hence (1.00)_2 = 1, (1.01)_2 = 5/4, (1.10)_2 = 3/2, and (1.11)_2 = 7/4. By multiplying with the exponents 2^{−1} = 1/2, 2^0 = 1, 2^1 = 2, we get the whole set:

e = −1:   1/2, 5/8, 3/4, 7/8
e = 0:    1, 5/4, 3/2, 7/4
e = 1:    2, 5/2, 3, 7/2

Important quantity: (1.01)_2 − 1 = 1/4, the so-called machine epsilon.

Define the machine epsilon by ε := 2^{−p} = (1.00…01)_2 − 1.
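As a quick illustration (my own sketch, not part of the original notes), the toy system can be enumerated exactly with Python's fractions module, recovering both the twelve values above and ε = 2^{−2} = 1/4:

```python
# Enumerate the toy system (1.b1 b2)_2 * 2^e, e in {-1, 0, 1}, with exact fractions.
from fractions import Fraction

mantissas = [1 + b1 * Fraction(1, 2) + b2 * Fraction(1, 4)
             for b1 in (0, 1) for b2 in (0, 1)]          # 1, 5/4, 3/2, 7/4
floats = sorted({m * Fraction(2) ** e for m in mantissas for e in (-1, 0, 1)})
print(floats)                                # 1/2, 5/8, ..., 3, 7/2 (12 values)
print(min(f for f in floats if f > 1) - 1)   # machine epsilon: 1/4 = 2**-p
```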

Rounding

For the rounding function round : R → F, we have 5 alternative definitions:

Rounding to nearest (default)


Rounding to +∞
Rounding to −∞
Rounding to 0
Rounding away from 0

It holds that round(x) = x(1 + δ), where |δ| < ε/2, where ε denotes the machine epsilon. Note that usually δ depends on x.
There is a way to define the standard arithmetic operations on F such that

a ⊕ b = round(a + b) = (a + b)(1 + δ_1),

a ⊖ b = round(a − b) = (a − b)(1 + δ_2),

a ⊗ b = round(ab) = ab(1 + δ_3),

a ⊘ b = round(a/b) = (a/b)(1 + δ_4),   b ≠ 0.

Here, generally δ_i ≠ δ_j for i ≠ j.

IEEE 754 “Double precision”

k = 2, 64 bits, where:

The sign: 1 bit;
The exponent field 0 ≤ ex ≤ 2047: 11 bits, where e = ex − 1023 and −1022 ≤ e ≤ 1023; ex = 0 and ex = 2047 are special cases.
The mantissa: 52 bits, precision p = 52.

Exponent field      Number                                 Type of number

00…00 = 0           ±(0.b_1 b_2 … b_52)_2 · 2^{−1022}      0 or subnormal
00…01 = 1           ±(1.b_1 b_2 … b_52)_2 · 2^{−1022}
00…10 = 2           ±(1.b_1 b_2 … b_52)_2 · 2^{−1021}
…                   …
01…11 = 1023        ±(1.b_1 b_2 … b_52)_2 · 2^0
…                   …
11…10 = 2046        ±(1.b_1 b_2 … b_52)_2 · 2^{1023}
11…11 = 2047        ±∞ if b_1 = b_2 = … = b_52 = 0,        exception
                    otherwise NaN

Thus, there are two zeros, two infinities, and NaN, which denotes “not a number”. The smallest positive normalized number is:

(1.0)_2 · 2^{−1022} ≈ 2.2 · 10^{−308}.

The largest positive number is:

(1.1…1)_2 · 2^{1023} ≈ 1.8 · 10^{308}.

The machine epsilon is:

2^{−52} ≈ 2.22 · 10^{−16}.

Here's an easy-to-follow video explaining floating point numbers (and a specific version of Newton's algorithm).
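These constants can be checked directly; a short illustrative sketch (my addition) in Python, whose float is an IEEE double:

```python
# Verify the IEEE 754 double precision constants quoted above.
import sys

print(sys.float_info.epsilon)   # 2.220446049250313e-16  == 2**-52
print(sys.float_info.min)       # 2.2250738585072014e-308 == 2**-1022 (smallest normal)
print(sys.float_info.max)       # 1.7976931348623157e+308, largest finite double
print(2.0 ** -1074)             # 5e-324: smallest positive subnormal
print(2.0 ** -1075)             # 0.0: underflow
```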

Condition number and stability

Conditioning of problems

Assume that f : R → R is the “solution map” of the problem, with input numbers x, x̂ close in value, e.g. x̂ = round(x). Set y := f(x), ŷ := f(x̂).

Definition. The absolute condition number C(x) is defined by the relation

|y − ŷ| ≈ C(x)|x − x̂|.

The relative condition number K(x) is defined by the relation

|y − ŷ|/|y| ≈ K(x) |x − x̂|/|x|.

By the normalization, we guarantee that

(relative error in the output) ≈ K(x) × (relative error in the input).

Now,

y − ŷ = f(x) − f(x̂) = [(f(x) − f(x̂))/(x − x̂)] (x − x̂),   where (f(x) − f(x̂))/(x − x̂) → f′(x) as x̂ → x.

Thus, C(x) = |f′(x)|.


Furthermore,

(y − ŷ)/y = (f(x) − f(x̂))/f(x) = [(f(x) − f(x̂))/(x − x̂)] · [(x − x̂)/x] · [x/f(x)],

where again (f(x) − f(x̂))/(x − x̂) → f′(x) as x̂ → x.

Thus, K(x) = |x f′(x)/f(x)|.
Example. f(x) = 2x, f′(x) = 2. Thus, C(x) = 2 and K(x) = |2x/(2x)| = 1. This is a well-conditioned problem.

Example. g(x) = √x, g′(x) = 1/(2√x). Thus, C(x) becomes unbounded for x > 0 close to zero; e.g. x ≈ 10^{−8} yields C(x) ≈ 10^4. On the other hand, K(x) = |x/(2√x · √x)| = 1/2.

Stability of algorithms

Definition. An algorithm or numerical process is called stable if small changes in the input produce small changes in the output. It is called unstable if large changes in the output are produced by small changes in the input.

An algorithm is stable if every step is well-conditioned (i.e. has a uniformly bounded condition number). It is unstable if any step is ill-conditioned (i.e. the condition number may become arbitrarily large).

Forward error analysis (FEA) asks:

“How far are we from the true solution?”

Backward error analysis (BEA) asks:

“Given the answer, what was the problem?”

Example.
Set:

fl(x + y) := round(x) ⊕ round(y) = (x(1 + δ_1) + y(1 + δ_2))(1 + δ_3),

where |δ_i| < ε/2, i = 1, 2, 3.


FEA:

fl(x + y) = x + y + x(δ_1 + δ_3 + δ_1δ_3) + y(δ_2 + δ_3 + δ_2δ_3).

The absolute error is

|fl(x + y) − (x + y)| ≤ (|x| + |y|)(ε + ε²/4).

The relative error is:

|fl(x + y) − (x + y)|/|x + y| ≤ (|x| + |y|)(ε + ε²/4)/|x + y|.

BEA:

fl(x + y) = x(1 + δ_1)(1 + δ_3) + y(1 + δ_2)(1 + δ_3).

Thus the relative error for each term is less than or equal to ε + ε²/4.
Hence the sum of two floating point numbers is backward stable.
Well-conditioned problems may have unstable algorithms. For stability, each step has to be well-conditioned. Some ill-conditioned steps produce an unstable algorithm. Ill-conditioned problems cannot be reliably solved with a stable algorithm.

Example. Consider evaluating f(x) = √(1 + x) − 1 for x close to zero. The relative condition number is:

K(x) = |x f′(x)/f(x)| = x/(2√(1 + x)(√(1 + x) − 1))
     = x(√(1 + x) + 1)/(2√(1 + x)(√(1 + x) − 1)(√(1 + x) + 1)) = (√(1 + x) + 1)/(2√(1 + x)),

and K(0) = 1.
Consider the following 3 steps:

1. t_1 := 1 + x, well-conditioned, x close to 0.
2. t_2 := √t_1, relatively well-conditioned, also absolutely well-conditioned, because t_1 is close to 1.
3. t_3 := t_2 − 1, ill-conditioned; the relative condition number of this step is K_3(t_2) = |t_2/(t_2 − 1)|, which becomes unbounded for t_2 close to 1!
On the other hand, the problem is well-conditioned. Solve it by writing:

f(x) = √(1 + x) − 1 = (√(1 + x) + 1)(√(1 + x) − 1)/(√(1 + x) + 1) = x/(√(1 + x) + 1),

which can be evaluated directly close to zero.
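A hedged sketch of this cancellation effect in Python (the printed values are indicative):

```python
# Naive evaluation of sqrt(1+x) - 1 near 0 vs. the rewritten stable form.
import math

x = 1e-12
naive  = math.sqrt(1 + x) - 1           # catastrophic cancellation
stable = x / (math.sqrt(1 + x) + 1)
print(naive)    # ≈ 5.0004e-13 -- only a few correct digits
print(stable)   # ≈ 5.0000e-13 -- essentially full accuracy
```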

Numerical differentiation

Recall Taylor's theorem: for a twice differentiable function f : R → R,

f(x) = f(x_0) + f′(x_0)(x − x_0) + (1/2)(x − x_0)² f″(ξ),

for any x_0, x ∈ R, where ξ ∈ [x_0, x].

What is that ξ (= “xi”)? Under certain assumptions, elementary functions have series expansions. If the series is truncated, we have the Taylor polynomial. The residual has an explicit expression, but due to the application of an intermediate value theorem, the exact location of the point ξ is not known, i.e. it is “generic”.

By setting x := z + h, x_0 := z, we obtain the useful equivalent formula

f(z + h) = f(z) + f′(z)h + (1/2)f″(ξ)h²,

for every z, h ∈ R, ξ ∈ [z, z + h].

Rate of convergence (Q-convergence)

Let (x_k) be an infinite sequence of real numbers. Let s_k := sup_{l≥k} x_l, k ∈ N, be the supremum (i.e., the lowest upper bound) of the tail (that is, large indices l ≥ k) of (x_k). Define the lim sup (limes superior) as

lim sup_{k→∞} x_k := lim_{k→∞} s_k ∈ [−∞, +∞].

Unlike a limit, it always exists, but it can be ±∞. If (x_k) is bounded, the lim sup is the largest limit of a converging subsequence. If lim_{k→∞} x_k ∈ (−∞, ∞) exists, then lim_{k→∞} x_k = lim sup_{k→∞} x_k. The opposite is not true.

Examples. (1) x_k := (−1)^k, then s_k = 1, and lim sup_{k→∞} x_k = 1.

(2) x_k = sin(k), then s_k = 1, and lim sup_{k→∞} x_k = 1.

Assume that lim_{k→∞} x_k = x and that there exists some large index M ∈ N such that x_k ≠ x for all k ≥ M. Then we define the following quantity for p ≥ 0:

C(p) := lim sup_{k→∞} |x_{k+1} − x|/|x_k − x|^p.

We observe that C(p*) < ∞ for some p* > 0 implies C(p) = 0 for every 0 ≤ p < p*. If C(p*) > 0 for some p* > 0, then C(p) = ∞ for any p > p*.

Proof. By the submultiplicative property of lim sup,

C(p) = lim sup_{k→∞} |x_{k+1} − x|/|x_k − x|^p = lim sup_{k→∞} [ (|x_{k+1} − x|/|x_k − x|^{p*}) · |x_k − x|^{p*−p} ]

≤ (lim sup_{k→∞} |x_{k+1} − x|/|x_k − x|^{p*}) · (lim sup_{k→∞} |x_k − x|^{p*−p}) = C(p*) · { 0 if p < p*,  ∞ if p > p* }.

Thus, there exists a (possibly infinite) p* such that

C(p) = 0 if 0 ≤ p < p*,   C(p) = C(p*) if p = p*,   C(p) = ∞ if p > p*.

The number p* is called the order of convergence for the sequence (x_k) and determines the rate of convergence as follows:

If p* = 1 and C(1) = 1, then we say the convergence is sublinear.
If p* = 1 and 0 < C(1) < 1, then we say the convergence is linear.
If p* > 1 or C(1) = 0, then we say the convergence is superlinear.
If p* = 2, then we say the convergence is quadratic.
If p* = 3, then we say the convergence is cubic, etc.

When working with convergence estimates it is often useful to use the following approximation:

|x_{k+1} − x| ≈ C|x_k − x|^{p*}

for some constant C > 0, not necessarily C(p*).

Here, it is useful to look at the logarithmic behavior:

log(|x_{k+1} − x|) ≈ log(C|x_k − x|^{p*}) = log(C) + log(|x_k − x|^{p*}) = log(C) + p* log(|x_k − x|).

The rate of convergence can be used interchangeably with the order of convergence. However, some caution is necessary, as different authors use different terminology here. Usually, the order of convergence always refers to the same thing, namely, the p*-exponent in the denominator of the limit defining the order of convergence. Most confusingly, some authors call the order of convergence “rate of convergence”, as e.g. here. The English Wikipedia article calls it the order of convergence, whereas here the rate of convergence is the constant in the definition, which also determines the speed of convergence, together with the order of convergence. So, please always check the context, as the use of the terminology should be clear from it. If there is no definition, try to figure out what is meant in each text. As a rule of thumb: The “order of convergence” is a unique terminology in numerical analysis. The “rate of convergence” can mean at least two different things. I will use both words for the same thing, but will try to make clear what I mean from case to case. In any case, to be sure, use “order of convergence”. My PhD advisor usually said that in mathematics “it's all hollow words” (meaning that one should check the definition).

Landau's little-o and big-O notation

Copied from Wikibooks under a Creative Commons BY-SA 4.0 license.

The Landau notation is an amazing tool applicable in all of real analysis. The reason it is so convenient and widely used is because it underlines a key principle of real analysis, namely “estimation”. Loosely speaking, the Landau notation introduces two operators which can be called the “order of magnitude” operators, which essentially compare the magnitude of two given functions.

The “little-o”

The “little-o” provides a function that is of lower order of magnitude than a given function, that is, the function o(g(x)) is of a lower order than the function g(x). Formally,

Definition.
Let A ⊆ R and let c ∈ R.
Let f, g : A → R.

If lim_{x→c} f(x)/g(x) = 0, then we say that

“As x → c, f(x) = o(g(x))”.

Examples.

As x → ∞ (and m < n), x^m = o(x^n);
As x → ∞ (and n ∈ N), log x = o(x^n);
As x → 0, sin x = o(1).

The “big-O”

The “big-O” provides a function that is at most of the same order as that of a given function, that is, the function O(g(x)) is at most of the same order as the function g(x). Formally,

Definition.
Let A ⊆ R and let c ∈ R.
Let f, g : A → R.

If there exists M > 0 such that lim_{x→c} |f(x)/g(x)| < M, then we say that

“As x → c, f(x) = O(g(x))”.

Examples.
As x → 0, sin x = O(x);
As x → π/2, sin x = O(1).

Applications

We will now consider a few examples which demonstrate the power of this notation.

Differentiability

Let f : U ⊆ R → R and x_0 ∈ U.

Then f is differentiable at x_0 if and only if

there exists λ ∈ R such that, as x → x_0, f(x) = f(x_0) + λ(x − x_0) + o(|x − x_0|).

Mean Value Theorem

Let f : [a, x] → R be differentiable on [a, x]. Then,

As x → a, f(x) = f(a) + O(x − a).

Taylor's Theorem

Let f : [a, x] → R be n-times differentiable on [a, x]. Then,

As x → a, f(x) = f(a) + (x − a)f′(a)/1! + (x − a)²f″(a)/2! + … + (x − a)^{n−1}f^{(n−1)}(a)/(n − 1)! + O((x − a)^n).

Finding roots of functions and fixed points

Let f : R → R be continuous. We are interested in methods for finding zeros, that is, roots of f; in other words, x ∈ R such that f(x) = 0.

Definition. If f : R^n → R, n ∈ N, is k-times continuously differentiable, then we write f ∈ C^k(R^n) or just f ∈ C^k. For k = 0, f ∈ C^0 or f ∈ C(R^n) or f ∈ C([a, b]) just means that f is assumed to be continuous.

Bisection method

The intermediate value theorem for continuous functions implies that x_1 < x < x_2 with f(x) = 0 exists if f(x_1)f(x_2) < 0, i.e., there is a sign change. The bisection method is based on halving the interval such that the sign condition is preserved. Note that, in principle, we have to look for intervals [x_1, x_2].

Let us analyze the convergence rate. Let [a, b] be an interval. After k steps the interval of analysis has length (b − a)/2^k, which converges to zero as k → ∞. Let us look at a neighborhood of radius δ > 0, so that

(b − a)/2^k ≤ 2δ ⇔ 2^{k+1} ≥ (b − a)/δ ⇔ k ≥ log_2((b − a)/δ) − 1.

Every step reduces the error by a factor of 1/2. The convergence rate is thus linear.
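A minimal sketch of the method (my own helper; it assumes f is continuous with a sign change on [a, b]):

```python
def bisect(f, a, b, tol=1e-12):
    fa = f(a)
    assert fa * f(b) < 0, "need a sign change on [a, b]"
    while b - a > tol:
        m = (a + b) / 2
        if fa * f(m) <= 0:        # keep the half that preserves the sign change
            b = m
        else:
            a, fa = m, f(m)
    return (a + b) / 2

# Root of x^3 - 2 on [1, 2]; the error halves each step (linear convergence).
print(bisect(lambda x: x**3 - 2, 1.0, 2.0))   # ≈ 1.2599210498948732
```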

Newton’s method

Assume that f ∈ C^1. For an initial value x_0, consider the iteration

x_{k+1} = x_k − f(x_k)/f′(x_k),   k = 0, 1, … .

Heuristics. If f(x*) = 0 and f ∈ C^2, by the Taylor expansion,

0 = f(x*) = f(x_0) + (x* − x_0)f′(x_0) + ((x* − x_0)²/2) f″(ξ)

for ξ ∈ [x_0, x*], and upon neglecting the 2nd order term,

x* ≈ x_0 − f(x_0)/f′(x_0).

Theorem. If f ∈ C^2 and x_0 is sufficiently good (i.e. close to the root x*) and if f′(x*) ≠ 0, then Newton's method converges quadratically.

Proof. By Taylor expansion, it follows that

x* = x_k − f(x_k)/f′(x_k) − ((x* − x_k)²/2) · f″(ξ_k)/f′(x_k)

for some ξ_k ∈ [x*, x_k]. Take x_{k+1} from the method and subtract:

x_{k+1} − x* = (x* − x_k)² f″(ξ_k)/(2f′(x_k)),   with |f″(ξ_k)/(2f′(x_k))| ≤ D.

In other words,

|x_{k+1} − x*| ≤ D|x_k − x*|²,

as k → ∞ and thus x_k → x*.
Hence the method is quadratic. Note that f′(x_k) does not vanish, by continuity, if x_k is close to x*. □


What happens if f′(x*) = 0? In

x_{k+1} − x* = (x* − x_k)² f″(ξ_k)/(2f′(x_k)),

both the squared factor and f′(x_k) tend to 0 as k → ∞. By Taylor expansion (η = “eta”):

f′(x_k) = f′(x*) + (x_k − x*)f″(η_k) = (x_k − x*)f″(η_k),   since f′(x*) = 0,

for some η_k ∈ [x*, x_k], and hence

x_{k+1} − x* = (f″(ξ_k)/(2f″(η_k))) (x_k − x*).

The method has degenerated to a linear method!

Example. f(x) = x², f′(x) = 2x. Newton:

x_{k+1} = x_k − x_k²/(2x_k) = (1/2) x_k.

Secant method


Some"mes it can be difficult or computa"onally expensive to compute the deriva"ve f (xk ). Newton 's method can be
adapted by approxima"ng the deriva"ve by the differen"al quo"ent. The secant method is the following two-step recursive
algorithm.

f (xk )(xk − xk−1 )


xk+1 = xk − , k = 1, 2, …
f (xk ) − f (xk−1 )

1+ 5
with two dis"nct star"ng points x0 
= x1 . The convergence rate is 2 ≈ 1.62, the golden ra!o.

Fixed point iteration

Definition. A point x ∈ R is called a fixed point of φ : R → R if φ(x) = x.

We could for instance use Newton's method to find fixed points by setting f(x) := φ(x) − x.

Banach's Fixed Point Theorem. Suppose that φ is a contraction, that is, there exists a constant L < 1 such that

|φ(x) − φ(y)| ≤ L|x − y|

for all x, y ∈ R. Then there exists a unique fixed point x* ∈ R of φ, i.e., φ(x*) = x*, and the fixed point iteration φ^n := φ ∘ … ∘ φ (n times) satisfies lim_{n→∞} φ^n(x_0) = x* for any starting point x_0 ∈ R. The convergence rate is at least linear.

Proof. We prove that the sequence {φ^k(x_0)}_{k=0}^∞ = {x_k}_{k=0}^∞ is a Cauchy sequence. Let k > j. Then, by the triangle inequality,

|x_k − x_j| ≤ |x_k − x_{k−1}| + |x_{k−1} − x_{k−2}| + … + |x_{j+1} − x_j|   ((k − j) summands).

Furthermore,

|x_m − x_{m−1}| = |φ(x_{m−1}) − φ(x_{m−2})| ≤ L|x_{m−1} − x_{m−2}| ≤ L^{m−1}|x_1 − x_0|.

Hence, by the geometric series,

|x_k − x_j| ≤ L^j ((1 − L^{k−j})/(1 − L)) |x_1 − x_0|.

If k > N, j > N, then

|x_k − x_j| ≤ L^N (1/(1 − L)) |x_1 − x_0| → 0

as N → ∞, which proves that {x_k} is a Cauchy sequence. The linear convergence rate follows also from this estimate.
The existence of a fixed point follows from the continuity of φ (as contractions are uniformly continuous, in fact, even Lipschitz continuous) as follows:

x* = lim_{k→∞} x_k = lim_{k→∞} x_{k+1} = lim_{k→∞} φ(x_k) = φ(lim_{k→∞} x_k) = φ(x*).

Theorem. Assume that φ ∈ C^p for p ≥ 1. Furthermore, assume that φ has a fixed point x* and assume that

φ′(x*) = φ″(x*) = … = φ^{(p−1)}(x*) = 0

for p ≥ 2, and

|φ′(x*)| < 1

if p = 1. Then the fixed point sequence {φ^k(x_0)} converges to x* at least with rate p, provided that the starting point x_0 is sufficiently close to x*. If, in addition, φ^{(p)}(x*) ≠ 0, then the rate of convergence is precisely p.

Proof. First note that by Banach's fixed point theorem the iteration indeed converges to x* for suitable starting points x_0. By the Taylor expansion,

x_{k+1} − x* = φ(x_k) − φ(x*) = ∑_{l=1}^{p−1} (φ^{(l)}(x*)/l!) (x_k − x*)^l + (φ^{(p)}(ξ_k)/p!) (x_k − x*)^p

for some ξ_k between x* and x_k. The sum is left empty for the case p = 1. Since φ^{(l)}(x*) = 0 for 1 ≤ l ≤ p − 1, we get that

|x_{k+1} − x*| = (|φ^{(p)}(ξ_k)|/p!) |x_k − x*|^p.
By continuity, there exists C > 0 (with C < 1 for p = 1) such that

|φ^{(p)}(ξ_k)|/p! ≤ C

for ξ_k sufficiently close to x* (that is, for sufficiently large k). Thus,

|x_{k+1} − x*| ≤ C|x_k − x*|^p

for large k, and thus the rate of convergence is at least p. Note that for p = 1, this also proves convergence, by

|x_{k+1} − x*| ≤ |φ′(ξ_k)||x_k − x*|   with |φ′(ξ_k)| < 1.

If φ^{(p)}(x*) ≠ 0, then by continuity, there exists K > 0 such that

|φ^{(p)}(ξ_k)|/p! ≥ K

for ξ_k sufficiently close to x*. Thus

|x_{k+1} − x*| ≥ K|x_k − x*|^p,

which implies that the rate of convergence cannot be higher than p. Thus the rate of convergence is precisely p. □

Note. From the above proof, we expect that close to the fixed point x*,

|x_{k+1} − x*| ≈ (|φ^{(p)}(x*)|/p!) |x_k − x*|^p,

when

φ′(x*) = φ″(x*) = … = φ^{(p−1)}(x*) = 0,

but φ^{(p)}(x*) ≠ 0.
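A short sketch (my own example, p = 1 case): φ(x) = cos(x) is a contraction near its fixed point, so the iteration converges linearly with factor ≈ |φ′(x*)| = sin(x*):

```python
import math

x = 1.0
for _ in range(50):
    x = math.cos(x)
print(x)              # ≈ 0.7390851332, the fixed point of cos
print(math.sin(x))    # ≈ 0.674, the asymptotic error reduction factor per step
```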

Polynomial interpolation

Idea. Approximate a function f : R → R over [a, b] by a polynomial p such that in distinct data points (x_i, y_i), i = 0, 1, …, n, the approximation is exact, that is,

f(x_i) = y_i = p(x_i),   for all i = 0, 1, …, n.

We may call x_i a node and y_i a value.

We need at least 2 data points. We usually just assume that x_i ≠ x_j for i ≠ j.

Note. Interpolation polynomials are not per se unique; for instance, the data {(−1, 1), (1, 1)} can be interpolated by p(x) = 1, q(x) = x², or r(x) = x⁴ − x² + 1. However, we will see later that p is the unique interpolation polynomial with deg p ≤ 1 = n.

Example. (1, 2), (2, 3), (3, 6), as data set {(x_i, y_i) : i = 0, 1, 2} on the interval [1, 3].
We are looking for a polynomial p_2(x) = ∑_{j=0}^{2} c_j x^j, which is chosen to be of 2nd order, because we have 3 data points and 3 unknown coefficients.
We can formulate the problem in matrix form:

⎛1 x_0 x_0²⎞   ⎛c_0⎞   ⎛y_0⎞
⎜1 x_1 x_1²⎟ · ⎜c_1⎟ = ⎜y_1⎟,
⎝1 x_2 x_2²⎠   ⎝c_2⎠   ⎝y_2⎠

whose matrix is a so-called Vandermonde matrix, which has determinant

∏_{i<j} (x_j − x_i) ≠ 0,

and is thus invertible. Here,

⎛1 1 1⎞   ⎛c_0⎞   ⎛2⎞
⎜1 2 4⎟ · ⎜c_1⎟ = ⎜3⎟.
⎝1 3 9⎠   ⎝c_2⎠   ⎝6⎠

As a result, c_0 = 3, c_1 = −2, and c_2 = 1, and thus

p_2(x) = x² − 2x + 3.

The computational complexity of solving the linear system is O(n³). We used the natural basis for the polynomials.
What would be the ideal basis?
Definition. (Lagrange basis polynomials) Suppose that x_i ≠ x_j if i ≠ j. We call

φ_i(x) := ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j)

the i-th Lagrange basis polynomial.
The Lagrange interpolation polynomial is given by

p(x) := ∑_{i=0}^{n} y_i φ_i(x).

Clearly,

φ_i(x_j) = δ_{i,j} := 1 if i = j, 0 if i ≠ j.

Example. (1, 2), (2, 3), (3, 6):

φ_0(x) = (x − 2)(x − 3)/((1 − 2)(1 − 3))

φ_1(x) = (x − 1)(x − 3)/((2 − 1)(2 − 3))

φ_2(x) = (x − 1)(x − 2)/((3 − 1)(3 − 2))

p_2(x) = 2φ_0(x) + 3φ_1(x) + 6φ_2(x) = x² − 2x + 3.

Evaluating the Lagrange polynomials has computational complexity O(n²).

Newton's interpolation

Idea. Extend the natural basis:

1,  x − x_0,  (x − x_0)(x − x_1),  …,  ∏_{j=0}^{n−1} (x − x_j).

Definition. Define Newton's interpolation polynomials by

p_n(x) = a_0 + a_1(x − x_0) + … + a_n ∏_{j=0}^{n−1} (x − x_j)

in such a way that p_n(x_i) = y_i.


Clearly,

p(x_0) = y_0 ⇒ a_0 = y_0,

and

p(x_1) = a_0 + a_1(x_1 − x_0) = y_1 ⇒ a_1 = (y_1 − a_0)/(x_1 − x_0).

More generally, we have the lower triangular linear system

⎛1      0              0                      ⋯   0⎞ ⎛a_0⎞   ⎛y_0⎞
⎜1   x_1 − x_0         0                      ⋯   0⎟ ⎜a_1⎟   ⎜y_1⎟
⎜1   x_2 − x_0   (x_2 − x_0)(x_2 − x_1)       ⋯   0⎟ ⎜a_2⎟ = ⎜y_2⎟.
⎜⋮       ⋮                  ⋮                 ⋱   ⋮⎟ ⎜ ⋮ ⎟   ⎜ ⋮ ⎟
⎝1   x_n − x_0   (x_n − x_0)(x_n − x_1)   ⋯   ∏_{j=0}^{n−1}(x_n − x_j)⎠ ⎝a_n⎠   ⎝y_n⎠

Example.

p_2(x) = a_0 + a_1(x − 1) + a_2(x − 1)(x − 2)

with the system

⎛1 0 0⎞   ⎛a_0⎞   ⎛2⎞
⎜1 1 0⎟ · ⎜a_1⎟ = ⎜3⎟
⎝1 2 2⎠   ⎝a_2⎠   ⎝6⎠

and hence a_0 = 2, a_1 = 1, and a_2 = 1, which yields p_2(x) = x² − 2x + 3.

Uniqueness

Theorem. Interpolation polynomials with n + 1 nodes x_i, i = 0, 1, …, n, are unique in the class of polynomials q with deg q ≤ n.

Proof. (Idea). Let p_n and q_n be two interpolating polynomials for the same set of data. Then

p_n(x_j) − q_n(x_j) = 0,   for any j = 0, 1, …, n.

Hence p_n − q_n has n + 1 distinct roots. As deg(p_n − q_n) ≤ max(deg p_n, deg q_n) = n, the only such polynomial with n + 1 roots is the polynomial which is constantly zero. Hence,

p_n = q_n.

We have used the corollary to the fundamental theorem of algebra which states that every non-constant real polynomial of degree m has at most m zeros. □

Divided differences

Let p be a Newton interpolation polynomial,

p(x) = a_0 + a_1(x − x_0) + a_2(x − x_0)(x − x_1) + … + a_n ∏_{j=0}^{n−1} (x − x_j).

Definition. The divided difference of order k, denoted by f[x_0, x_1, …, x_k], is defined as the a_k-coefficient of the Newton interpolation polynomial with data y_i = f(x_i); in other words,

f[x_0, x_1, …, x_k] := a_k.

Theorem.

f[x_0, x_1, …, x_k] = (f[x_1, …, x_k] − f[x_0, …, x_{k−1}])/(x_k − x_0).

Note. The recursion terminates because f[x_i] = y_i.

Example. (1, 2), (2, 3), (3, 6), p_2(x) = x² − 2x + 3, Newton: a_0 = 2, a_1 = 1, a_2 = 1.

f[x_0] = 2 = a_0,   f[x_1] = 3,   f[x_2] = 6,
f[x_0, x_1] = (3 − 2)/(2 − 1) = 1 = a_1,   f[x_1, x_2] = (6 − 3)/(3 − 2) = 3,
f[x_0, x_1, x_2] = (3 − 1)/(3 − 1) = 1 = a_2.

Why does this work?

One point: f[x_j] = f_j = y_j.

Two points: f[x_i, x_j] = (f[x_j] − f[x_i])/(x_j − x_i), which is the slope of the line through the two points (x_i, y_i) and (x_j, y_j), i.e.,

y − y_i = ((y_j − y_i)/(x_j − x_i)) (x − x_i).

Proof. (Idea). We have three interpolation polynomials p, q, r, where deg p = k, deg q = deg r = k − 1. p interpolates at x_0, x_1, …, x_k, q interpolates at x_0, x_1, …, x_{k−1}, and r interpolates at x_1, …, x_k.
Claim.

p(x) = q(x) + ((x − x_0)/(x_k − x_0)) (r(x) − q(x)).

Indeed:
x_0: p(x_0) = q(x_0) = f_0.
x_1, …, x_{k−1}: r(x_i) − q(x_i) = 0, so p(x_i) = q(x_i) = f_i.
x_k: p(x_k) = q(x_k) + ((x_k − x_0)/(x_k − x_0))(r(x_k) − q(x_k)) = r(x_k) = f_k.

The highest order term has the coefficient

p^{(k)}(x)/k! = (r_{k−1} − q_{k−1})/(x_k − x_0),

where r_{k−1} = f[x_1, x_2, …, x_k] = r^{(k−1)}/(k−1)! and q_{k−1} = f[x_0, x_1, …, x_{k−1}] = q^{(k−1)}/(k−1)!,
which can be proved by the general Leibniz rule. □
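A sketch computing the Newton coefficients via the divided difference recursion (my own helpers; the in-place table is a standard arrangement, not taken from the notes):

```python
def divided_differences(xs, ys):
    a = list(ys)                          # a[i] becomes f[x_0, ..., x_i]
    for k in range(1, len(xs)):
        for i in range(len(xs) - 1, k - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (xs[i] - xs[i - k])
    return a

def newton_eval(a, xs, x):                # Horner-like evaluation of the Newton form
    p = a[-1]
    for i in range(len(a) - 2, -1, -1):
        p = p * (x - xs[i]) + a[i]
    return p

a = divided_differences([1, 2, 3], [2, 3, 6])
print(a)                                  # [2, 1.0, 1.0] == [a_0, a_1, a_2]
print(newton_eval(a, [1, 2, 3], 4))       # 11.0, matches x^2 - 2x + 3 at x = 4
```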

Interpolation error

Assume that f ∈ C^{n+1}. We are interested in the local (pointwise) error (residual)

R(x) := f(x) − p(x),

where p is the interpolation polynomial with deg p = n.

Fix data (x_i, y_i), y_i = f(x_i), i = 0, 1, …, n, with x_i ≠ x_j for i ≠ j. Let x′ be a distinct extra point.
Define an auxiliary function:

h(x) = f(x) − p(x) − c w(x),

where

w(x) = ∏_{j=0}^{n} (x − x_j)

and

c = (f(x′) − p(x′))/w(x′).
We find that

h(x_i) = 0   for i = 0, 1, …, n.

Furthermore,

h(x′) = f(x′) − p(x′) − ((f(x′) − p(x′))/w(x′)) w(x′) = 0.

Hence h has n + 2 distinct zeros. By Rolle's theorem (see Differential and Integral Calculus 1), h^{(n+1)} will have at least one zero. Let's call this point ξ. Since p^{(n+1)} = 0 and w^{(n+1)} = (n + 1)!,

h^{(n+1)}(x) = f^{(n+1)}(x) − p^{(n+1)}(x) − c w^{(n+1)}(x) = f^{(n+1)}(x) − c(n + 1)!.

Hence

h^{(n+1)}(ξ) = f^{(n+1)}(ξ) − c(n + 1)! = 0

and thus

c = f^{(n+1)}(ξ)/(n + 1)!.

We have proved:

Theorem. If f ∈ C^{n+1}, the residual R = f − p at x has the form

R(x) = (f^{(n+1)}(ξ)/(n + 1)!) ∏_{j=0}^{n} (x − x_j).

Notice that, in general, R is not a polynomial, as ξ = ξ(x) depends nonlinearly on x.

Note. The constant c is a divided difference:

f[x_0, x_1, …, x_n, x] = f^{(n+1)}(ξ(x))/(n + 1)!,

which follows from the formula for R, R^{(n+1)} = f^{(n+1)}, and R(x_i) = 0 for i = 0, 1, …, n.

Piecewise interpolation

Setup. Fix a bounded interval [a, b] and a step size / mesh width

h := (b − a)/n

for some n ∈ N, where n is the number of subintervals.

Idea. Approximate the function on each subinterval using some low order interpolation polynomial such that the interpolation function is exact at the nodes.

Piecewise linear case:

ℓ_i(x) = f(x_{i−1}) (x − x_i)/(x_{i−1} − x_i) + f(x_i) (x − x_{i−1})/(x_i − x_{i−1}),   x ∈ [x_{i−1}, x_i].

By the residual formula R(x) = (f^{(n+1)}(ξ)/(n + 1)!) ∏_{j=0}^{n} (x − x_j) on each subinterval, we get the interpolation error

f(x) − ℓ_i(x) = (f″(ξ)/2!) (x − x_{i−1})(x − x_i),

which, if |f″(x)| ≤ M, simplifies by maximization as follows:

|f(x) − ℓ_i(x)| ≤ M h²/8,   x ∈ [x_{i−1}, x_i].

Note. If f″ is bounded over the whole interval [a, b], then this error bound holds uniformly over the whole interval.

Hermite interpolation

Piecewise interpolation by degree 3 polynomials p_3(x) = ∑_{j=0}^{3} c_j x^j. As we have 4 coefficients, we need 4 constraints. We demand that not only the function but also the derivatives are exact at the nodes. Let p be a cubic interpolation polynomial on [x_{i−1}, x_i]. Then p′ is a quadratic polynomial. Recall that h = x_i − x_{i−1}.
We have the conditions:

1. p(x_{i−1}) = f(x_{i−1}),
2. p(x_i) = f(x_i),
3. p′(x_{i−1}) = f′(x_{i−1}),
4. p′(x_i) = f′(x_i).

Set

p′(x) = f′(x_{i−1}) (x − x_i)/(x_{i−1} − x_i) + f′(x_i) (x − x_{i−1})/(x_i − x_{i−1}) + α(x − x_{i−1})(x − x_i).

Integrating yields

p(x) = −(f′(x_{i−1})/h) ∫_{x_{i−1}}^{x} (t − x_i) dt + (f′(x_i)/h) ∫_{x_{i−1}}^{x} (t − x_{i−1}) dt + α ∫_{x_{i−1}}^{x} (t − x_{i−1})(t − x_i) dt + C.

Plugging in x_{i−1} for x yields C = f(x_{i−1}). Plugging in x_i for x and integrating yields

α = (3/h²)(f′(x_{i−1}) + f′(x_i)) + (6/h³)(f(x_{i−1}) − f(x_i)).

Splines

Let us construct a global piecewise interpolation function s ∈ C² such that:

1. We do not impose exactness for derivatives.
2. We get a piecewise polynomial construction of cubic interpolation polynomials which is exact and has continuous 1st and 2nd derivatives.

This requires a global setup. All coefficients are defined first; only evaluation is piecewise.
Setup. Let h = x_i − x_{i−1} be constant. Define

z_i := s″(x_i),   i = 1, …, n − 1.

Now,

s″(x) = z_{i−1} (x_i − x)/h + z_i (x − x_{i−1})/h.

Denote s on the interval [x_{i−1}, x_i] by s_i. Integrating twice yields

s_i(x) = z_{i−1} (x_i − x)³/(6h) + z_i (x − x_{i−1})³/(6h) + C_i(x − x_{i−1}) + D_i,

where

s′_i(x) = −z_{i−1} (x_i − x)²/(2h) + z_i (x − x_{i−1})²/(2h) + C_i.

Set f_i := f(x_i). We get that

D_i = f_{i−1} − z_{i−1} h²/6

and

C_i = (1/h) [f_i − f_{i−1} + (h²/6)(z_{i−1} − z_i)].

Now s has been defined over all subintervals. However, the z_i are still unknown!
Using the condition for continuity of the derivatives, s′_i(x_i) = s′_{i+1}(x_i) for all i, yields

(h/2) z_i + (1/h)[f_i − f_{i−1} + (h²/6)(z_{i−1} − z_i)] = −(h/2) z_i + (1/h)[f_{i+1} − f_i + (h²/6)(z_i − z_{i+1})],

for i = 1, …, n − 1.
In fact, this constitutes a tridiagonal system:

(2h/3) z_i + (h/6) z_{i−1} + (h/6) z_{i+1} = (1/h)(f_{i+1} − 2f_i + f_{i−1}) =: b_i.

The values z_0 and z_n at the interval boundary have to be moved to the right hand side, and thus:

b_1 := (1/h)(f_2 − 2f_1 + f_0) − (h/6) z_0

and

b_{n−1} := (1/h)(f_n − 2f_{n−1} + f_{n−2}) − (h/6) z_n.

z_0 and z_n can be chosen freely, for example to force that the 1st derivative of the spline is exact at the interval boundary points. If z_0 = z_n = 0, s is called a natural spline.

Bézier curves

Bézier curves are parametrized curves in R², that is, r(t) = x(t)i + y(t)j, where i := (1, 0)ᵀ, j := (0, 1)ᵀ, and x, y : [0, 1] → R.

Bernstein polynomials

Define the Bernstein polynomial B_k^n(t), t ∈ [0, 1], n ∈ N ∪ {0}, k = 0, …, n, by

B_k^n(t) := \binom{n}{k} t^k (1 − t)^{n−k}.

Properties:

1. ∑_{k=0}^{n} B_k^n(t) = 1 (= (t + 1 − t)^n),
2. 0 ≤ B_k^n(t) ≤ 1,
3. B_0^n(0) = B_n^n(1) = 1; otherwise, if k ≠ 0, B_k^n(0) = 0, and if k ≠ n, B_k^n(1) = 0.

We have the combinatorial rule:

B_k^n(t) = (1 − t) B_k^{n−1}(t) + t B_{k−1}^{n−1}(t).

Bézier curves

Fix a finite set X = {x_0, x_1, …, x_k} of control points x_i ∈ R^n.

Definition. The convex hull of X is defined by

conv(X) = {y ∈ R^n : y = ∑_{i=0}^{k} λ_i x_i,  λ_i ∈ [0, 1],  ∑_{i=0}^{k} λ_i = 1}.

Definition. The Bézier curve β^n is defined by

β^n(t) = ∑_{k=0}^{n} x_k B_k^n(t).

Sanity check: at t = 0, B_k^n(0) = 0 except B_0^n(0) = 1.

⇒ β^n(0) = x_0,
⇒ β^n(1) = x_n.

We get closed curves if x_0 = x_n.

What about continuous tangents?

Recall that \binom{n}{k} = n!/(k!(n − k)!). Then

(d/dt) B_k^n(t) = \binom{n}{k} (k t^{k−1}(1 − t)^{n−k} − (n − k) t^k (1 − t)^{n−k−1})

= n [ ((n−1)!/((k−1)!(n−k)!)) t^{k−1}(1 − t)^{n−k} − ((n−1)!/(k!(n−k−1)!)) t^k (1 − t)^{n−k−1} ]

= n (B_{k−1}^{n−1}(t) − B_k^{n−1}(t)).

Therefore,

(d/dt) β^n(t) = n ∑_{k=0}^{n} (B_{k−1}^{n−1}(t) − B_k^{n−1}(t)) x_k

= n [ ∑_{k=1}^{n} B_{k−1}^{n−1}(t) x_k − ∑_{k=0}^{n−1} B_k^{n−1}(t) x_k ]

= n [ ∑_{k=0}^{n−1} B_k^{n−1}(t) x_{k+1} − ∑_{k=0}^{n−1} B_k^{n−1}(t) x_k ]

= n ∑_{k=0}^{n−1} (x_{k+1} − x_k) B_k^{n−1}(t).

Hence, at the endpoints:

(d/dt) β^n(0) = n(x_1 − x_0),
(d/dt) β^n(1) = n(x_n − x_{n−1}).

For a closed curve to be smooth at the joining point, we need that (x_1 − x_0) ∥ (x_n − x_{n−1}), i.e., x_1 − x_0 and x_n − x_{n−1} are parallel.

Lifting

Control points define the curve, but the converse is not true.
Consider:

β^n(t) = ∑_{k=0}^{n} B_k^n(t) x_k = ∑_{k=0}^{n+1} B_k^{n+1}(t) y_k = α^{n+1}(t).

Let us use the convention x_{−1} = x_{n+1} = 0. We get the condition

y_k = (1 − k/(n + 1)) x_k + (k/(n + 1)) x_{k−1}.

De Casteljau's algorithm

For control points x_0, x_1, …, x_n, the algorithm of De Casteljau is as follows:

1. Define constant curves β_i^0(t) = x_i.
2. Set

β_i^r(t) = (1 − t) β_i^{r−1}(t) + t β_{i+1}^{r−1}(t),   r = 1, …, n,  i = 0, …, n − r.

The algorithm terminates at β_0^n(t) and requires on the order of n²/2 interpolation operations.

There is also a reverse algorithm for splitting Bézier curves.
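A sketch of De Casteljau's algorithm for control points in R² (my own helper; tuples stand in for vectors):

```python
def de_casteljau(points, t):
    beta = [p for p in points]                     # beta_i^0 = x_i
    for r in range(1, len(points)):                # r = 1, ..., n
        beta = [tuple((1 - t) * a + t * b for a, b in zip(beta[i], beta[i + 1]))
                for i in range(len(beta) - 1)]
    return beta[0]                                 # beta_0^n(t)

ctrl = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]  # cubic, n = 3
print(de_casteljau(ctrl, 0.0))   # (0.0, 0.0) = x_0
print(de_casteljau(ctrl, 1.0))   # (1.0, 0.0) = x_3
print(de_casteljau(ctrl, 0.5))   # (0.5, 0.75), the curve's midpoint
```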


Numerical integration

Integration schemes are called quadratures. Therefore, numerical integration methods are simply called numerical quadratures.

Note. There are no simple integration schemes in higher dimensions. Already the 2D cases are complicated.

Newton-Cotes quadrature rules

Let f : [a, b] → R.
Idea. Approximate

∫_a^b f(x) dx =: I

by the integral of an interpolation polynomial,

I ≈ ∫_a^b p_k(x) dx =: Q(p_k),

where p_k is an interpolant of f over [a, b].

Lagrange:

∫_a^b f(x) dx ≈ ∑_{i=0}^{n} f(x_i) ∫_a^b ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j) dx.

Let n = 1:

p_1(x) = f(a) (x − b)/(a − b) + f(b) (x − a)/(b − a),

so

∫_a^b f(x) dx ≈ ∫_a^b p_1(x) dx = ((b − a)/2) [f(a) + f(b)].

⇒ Trapezoidal rule!
Error formulation:

∫_a^b f(x) dx − ∫_a^b p_1(x) dx = (1/2) ∫_a^b f″(ξ(x))(x − a)(x − b) dx.

Now, (x − a)(x − b) < 0 for x ∈ (a, b). Therefore, by the mean value theorem of integration, the error equals

(1/2) f″(η) ∫_a^b (x − a)(x − b) dx = −(1/12)(b − a)³ f″(η).
Composite rule: h = (b − a)/n, x_i = a + ih, i = 0, …, n.

∫_a^b f(x) dx ≈ (h/2) [f(x_0) + 2f(x_1) + … + 2f(x_{n−1}) + f(x_n)].

Total error: O(h²) ∼ O(1/n²). We say that the method is quadratic.

Let n = 2. When is a method exact for degree 2 (or lower)?

Note. In this context, exactness means that the integral and the method give the exact same result for polynomials of a certain order.

∫_a^b f(x) dx = A_1 f(a) + A_2 f((a + b)/2) + A_3 f(b),

where we call the A_i weights.

∫_a^b 1 dx = b − a ⇒ A_1 + A_2 + A_3 = b − a.

∫_a^b x dx = (b² − a²)/2 ⇒ A_1 a + A_2 (a + b)/2 + A_3 b = (b² − a²)/2.

∫_a^b x² dx = (b³ − a³)/3 ⇒ A_1 a² + A_2 ((a + b)/2)² + A_3 b² = (b³ − a³)/3.

Thus,

A_1 = A_3 = (b − a)/6

and

A_2 = 4(b − a)/6.

As integrals and the methods are linear, this extends to all polynomials of deg ≤ 2.

This is the so-called Simpson's rule:

∫_a^b f(x) dx ≈ ((b − a)/6) [f(a) + 4f((a + b)/2) + f(b)].

The associated composite rule becomes:

∫_a^b f(x) dx ≈ (h/6) [f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + … + 4f(x_{n−1}) + f(x_n)].

Error for n = 2:

(1/(4!·5!)) (b − a)⁵ f^{(4)}(η).

Error for the composite rule: O(h⁴). The method is exact for cubic polynomials!
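A sketch comparing the two composite rules (my own helpers; Simpson is written in the common h/3 form with n subintervals, whereas the notes' h/6 form takes h to be the width of a full two-subinterval panel):

```python
import math

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

def simpson(f, a, b, n):                  # n must be even
    h = (b - a) / n
    s = f(a) + f(b) + 4 * sum(f(a + i * h) for i in range(1, n, 2)) \
        + 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return h * s / 3

exact = 2.0                               # integral of sin over [0, pi]
for n in (8, 16, 32):
    print(abs(trapezoid(math.sin, 0, math.pi, n) - exact),
          abs(simpson(math.sin, 0, math.pi, n) - exact))
# trapezoid error drops ~4x per halving of h (O(h^2)), Simpson ~16x (O(h^4))
```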
Orthogonal polynomials

Define the inner product of two real-valued polynomials on [a, b] (it depends on a and b!) by:

⟨p, q⟩ = ∫_a^b p(x) q(x) dx.

The associated norm on [a, b] is given by

∥q∥ := (∫_a^b |q(x)|² dx)^{1/2}.

Definition. Two non-zero polynomials are said to be orthogonal on [a, b] if their inner product is zero. They are said to be orthonormal if they are orthogonal and both have norm 1.
In other words, orthogonality: ⟨p, q⟩ = 0, and then we write p ⊥ q.
Orthonormality: p ⊥ q and ⟨p, p⟩ = 1 = ⟨q, q⟩.

Gram-Schmidt (GS) procedure.

Idea. Transform a basis to an orthogonal one:

{1, x, x², …, x^k, …} ⟶ {q_0, q_1, …, q_k, …},   orthonormal.

Note. The GS procedure depends on the inner product, and thus, here, on a and b.

The elements of the orthonormal basis are called orthogonal polynomials.

1. q_0 = 1/∥1∥ = 1/(∫_a^b 1² dx)^{1/2} = 1/√(b − a).

2. For j = 1, 2, …,

q̃_j(x) = x q_{j−1}(x) − ∑_{i=0}^{j−1} ⟨x q_{j−1}, q_i⟩ q_i(x),

and

q_j(x) := q̃_j(x)/∥q̃_j∥.

The new basis is (pairwise) orthonormal!


Above, as usually, we denote the polynomial p(x) = x with the symbol x.

Observa!on. By bilinearity, qj−1 is orthogonal to all polynomials of deg ≤ j − 2.


Thus,

⟨xqj−1 , qi ⟩ = ⟨qj−1 , xqi ⟩ = 0, i ≤ j − 3.

As a consequence, the GS procedure reduces to

q~j (x) = xqj−1 (x) − ⟨xqj−1 , qj−1 ⟩qj−1 (x) − ⟨xqj−1 , qj−2 ⟩qj−2 (x)

which is a three-term recurrence rule!

Note. The trick ⟨xqj−1 , qi ⟩ = ⟨qj−1 , xqi ⟩ relies heavily on the fact that the inner product is defined by an integral and
that we are dealing with polynomials. The GS procedure works generally in pre-Hilbert spaces, however, then we do not
expect this kind of simplifica"on.

Claim. The GS procedure works.

Proof.

⟨q̃_j, q_{j−1}⟩ = ⟨x q_{j−1}, q_{j−1}⟩ − ∑_{i=0}^{j−1} ⟨x q_{j−1}, q_i⟩ ⟨q_i, q_{j−1}⟩,

where ⟨q_i, q_{j−1}⟩ = 0 except when i = j − 1, in which case it equals 1. Hence

⟨q̃_j, q_{j−1}⟩ = ⟨x q_{j−1}, q_{j−1}⟩ − ⟨x q_{j−1}, q_{j−1}⟩ = 0.

Gauss quadrature

Idea. Choose the nodes and the weights simultaneously.

One interval:

∫_a^b f(x) dx = A_0 f(x_0) + A_1 f(x_1),

with weights A_0, A_1 and nodes x_0, x_1; for n = 1, this is an (n + 1) = 2-point rule.

The coefficients are determined by the usual process:

∫_a^b 1 dx = b − a = A_0 + A_1.

∫_a^b x dx = (b² − a²)/2 = A_0 x_0 + A_1 x_1.

∫_a^b x² dx = (b³ − a³)/3 = A_0 x_0² + A_1 x_1².

The resulting system is nonlinear!

Let us use the orthogonal polynomials in the following way.

Theorem. Let x_0, x_1, …, x_n be the roots of an orthogonal polynomial q_{n+1} on [a, b] of degree n + 1. Then

∫_a^b f(x) dx ≈ ∑_{i=0}^{n} A_i f(x_i),

where

A_i := ∫_a^b φ_i(x) dx,   φ_i(x) = ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j),

is exact for all polynomials of degree 2n + 1 or less.

Proof. Let f be a polynomial with deg f = 2n + 1. By the polynomial division algorithm,

f = q_{n+1} p_n + r_n,

where deg p_n ≤ n and deg r_n ≤ n. Then,

f(x_i) = q_{n+1}(x_i) p_n(x_i) + r_n(x_i) = r_n(x_i),

since q_{n+1}(x_i) = 0. Integrate:

∫_a^b f(x) dx = ∫_a^b q_{n+1}(x) p_n(x) dx + ∫_a^b r_n(x) dx = ⟨q_{n+1}, p_n⟩ + ∫_a^b r_n(x) dx

= ∫_a^b r_n(x) dx = ∑_{i=0}^{n} A_i r_n(x_i) = ∑_{i=0}^{n} A_i f(x_i),

because ⟨q_{n+1}, p_n⟩ = 0 and r_n can be interpolated exactly with n + 1 nodes. The last equality follows from the reasoning before. □

We can extend the notion of orthogonal polynomials to so-called weighted orthogonal polynomials with respect to the inner product

⟨p, q⟩_w = ∫_a^b p(x) q(x) w(x) dx,

where w is a positive weight function.

One (mathematical) advantage: this also works on R = (−∞, ∞).

Example. If w(x) = e^{−x}, we get the so-called Laguerre polynomials. If w(x) = e^{−x²/2}, we get the so-called Hermite polynomials, which are meaningful in probability theory (the weight is the density of the Gaussian normal distribution up to multiplication by a normalization constant).

Theorem. The previous theorem holds for a ⟨·, ·⟩_w-orthogonal polynomial q_{n+1} with

A_i := ∫_a^b φ_i(x) w(x) dx,   φ_i(x) = ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j).

Error formula. For the (n + 1)-point rule with nodes x_0, x_1, …, x_n:

error = (f^{(2(n+1))}(ξ(x))/(2(n + 1))!) ∏_{j=0}^{n} (x − x_j)².

Where does the square come from? We assume that the derivatives of f are continuous; therefore Hermite interpolation is the natural choice.

Example. Gauss rule on [−1, 1], n = 1. Notice that, since we only want the roots, there is no need to normalize the q̃_i, i = 0, 1, 2.
GS: q̃_0 = 1.

q̃_1 = x · 1 − (⟨x, 1⟩/⟨1, 1⟩) · 1 = x − (∫_{−1}^{1} x dx / ∫_{−1}^{1} 1 dx) · 1 = x,

where ⟨1, 1⟩ = 2 and ⟨x, 1⟩ = 0.

q̃_2 = x · x − (⟨x², 1⟩/⟨1, 1⟩) · 1 − (⟨x², x⟩/⟨x, x⟩) · x = x² − 1/3,

where ⟨x, x⟩ = ⟨x², 1⟩ = 2/3 and ⟨x², x⟩ = 0.
The resulting orthogonal polynomials on [−1, 1] are called Legendre polynomials.
q̃_2 (and q_2) has the roots x = ±1/√3.
The associated Gauss quadrature rule is:

∫_{−1}^{1} f(x) dx ≈ A_0 f(−1/√3) + A_1 f(1/√3).

Let us check exactness:

∫_{−1}^{1} 1 dx = 2 = A_0 + A_1.

∫_{−1}^{1} x dx = 0 = −A_0/√3 + A_1/√3.

From this, we obtain easily that A_0 = A_1 = 1. These weights could of course also have been determined by integrating the Lagrange polynomials over [−1, 1].

∫_{−1}^{1} x² dx = 2/3 = 1 · (−1/√3)² + 1 · (1/√3)².

∫_{−1}^{1} x³ dx = 0 = 1 · (−1/√3)³ + 1 · (1/√3)³.

Thus, the Gauss quadrature is indeed exact up to degree 2n + 1 = 3!
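A quick check of the exactness claim (my own illustration):

```python
import math

def gauss2(f):
    x = 1 / math.sqrt(3)
    return f(-x) + f(x)                         # A_0 = A_1 = 1

for k in range(5):
    exact = (1 - (-1) ** (k + 1)) / (k + 1)     # integral of x^k over [-1, 1]
    print(k, gauss2(lambda x: x ** k), exact)
# exact for k = 0, 1, 2, 3; the first error appears at k = 4 (2/9 vs 2/5)
```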

Probabilistic examples

Monte Carlo integration

Let X_i, i ∈ N, be i.i.d. (independent, identically distributed) random variables with mean μ and variance σ². Consider the arithmetic mean (also called Cesàro sum)

A_N := (1/N) ∑_{i=1}^{N} X_i.

By the law of large numbers, we have almost surely

lim_{N→∞} A_N = μ.

We have that

var(A_N) = (1/N²) ∑_{i=1}^{N} var(X_i) = σ²/N.

In order to get the right unit, we have to consider the standard deviation

σ(A_N) = σ/√N.

Consequence:
If our integration problem can be cast into an averaging problem, the convergence rate will be O(1/√N).

Note. The rate is independent of the spatial dimension.

Example. Estimating the value of π. The area of a circle is A = πr². Set r = 1. Consider the box V = [−1, 1] × [−1, 1] with volume |V| = 4. The ratio of the area of the circle enclosed by the box to that of the enclosing box is π/4. Let

g_i = 1 if the sample point p_i is inside the circle, and g_i = 0 otherwise.

Idea: Let us sample the points p_i uniformly from V. In the limit, the number of “hits” over all samples tends to the ratio of the areas!

Buffon's needle

Suppose we are doing a random experiment with a large number of needles of the same length L that we throw on a floor which has parallel strips drawn on it, all at the same distance D from their neighboring strip.

What is the probability that a dropped needle intersects with a line?

Let y be the distance from the center of the needle to the closest line and let θ be the acute angle of the intersection point.

We assume, for simplicity, L = D = 1.

Both y and θ are random variables, with distributions

y ∼ Unif([0, 1/2]),

where y = 0 means that the needle is centered on a line and y = 1/2 means that y is perfectly centered between two lines, and

θ ∼ Unif([0, π/2]),

where θ = 0 means that the needle is parallel to the lines and θ = π/2 means that the needle is perpendicular to the lines.

We may assume that y and θ are independent random variables (why?).

By the law of sines, the condition for intersection with a line is

2y ≤ sin θ.

The joint probability distribution of two independent random variables is the product of the respective distributions on [0, π/2] × [0, 1/2]. Determining the probability requires calculating the ratio of the area of the region of intersection to the total area π/4.

The condition is fulfilled on a region of area

∫_0^{π/2} (1/2) sin θ dθ = 1/2.

Thus the probability of intersection is

(1/2)/(π/4) = 2/π.

By the law of large numbers, the ratio of needles intersecting the lines to all needles converges to 2/π ≈ 0.6366. Hence, we have found yet another probabilistic algorithm to determine the value of π.

Initial value problems (IVPs)

Problem. (Not necessarily linear) ordinary differential equation (ODE), with initial value y_0 at initial time t_0, up to a finite time horizon T > t_0:

y′(t) = f(t, y(t)),   y(t_0) = y_0.

Assumptions. Existence and uniqueness of the solutions are understood (by e.g. Picard iteration).
Let us assume that f is continuous as a function from [t_0, T] × R → R and Lipschitz continuous in the following sense: there exists L > 0 such that for every y, z ∈ R, t ∈ [t_0, T],

|f(t, y) − f(t, z)| ≤ L|y − z|.

Euler's method

Fix a constant step size h > 0.

1. y_0 := y(t_0).
2. t_k := t_{k−1} + h and y_{k+1} = y_k + h f(t_k, y_k), k = 0, 1, 2, … .

Applying Taylor's formula yields:

y(t_{k+1}) = y(t_k) + h y′(t_k) + (h²/2) y″(ξ_k) = y(t_k) + h f(t_k, y(t_k)) + (h²/2) y″(ξ_k),

with ξ_k ∈ [t_k, t_{k+1}].

We shall deal with two types of error:

(A) truncation error (local),
(B) global error.

Notation. y(t_k) denotes the exact solution of the IVP at t = t_k, whereas y_k denotes the result of the method at step k.

For Euler, we get that

(y_{k+1} − y_k)/h = f(t_k, y_k) + (h/2) y″(ξ_k),

where the last term is the local error, O(h). Hence the Euler method is locally (in each point) of order 1.

The method is consistent:

lim_{h→0} (y_{k+1} − y_k)/h = y′(t_k) = f(t_k, y(t_k)).

Note that k depends on h, which we omit in the notation.

What about the global error, that is, uniform convergence on [t_0, T]? Does

max_k |y(t_k) − y_k| → 0

as h → 0?

Theorem. Assume the following:

1. f is Lipschitz in the second component.
2. max_k |y″(t_k)| ≤ M for some global constant M > 0.
3. y_0 → y(t_0) as h → 0.

Then Euler's method is uniformly convergent to the exact solution on [t_0, T] and the global error is O(h).

Proof. Set d_j := y(t_j) − y_j.

By Taylor and Euler:

d_{k+1} = d_k + h[f(t_k, y(t_k)) − f(t_k, y_k)] + (h²/2) y″(ξ_k).

We get that

|d_{k+1}| ≤ |d_k| + hL|d_k| + (h²/2) M = (1 + hL)|d_k| + (h²/2) M.

We shall need a lemma on recursive inequalities.

Lemma. If, for α, β > 0,

γ_{k+1} ≤ (1 + α)γ_k + β,

then

γ_n ≤ e^{nα} γ_0 + ((e^{nα} − 1)/α) β.

Proof. Iterating the inequality yields

γ_n ≤ (1 + α)² γ_{n−2} + [(1 + α) + 1]β ≤ … ≤ (1 + α)^n γ_0 + [∑_{j=0}^{n−1} (1 + α)^j] β.

Note that, by the Taylor formula,

e^α = e^0 + e^0 α + (α²/2) e^ξ = 1 + α + (α²/2) e^ξ

with ξ ∈ [0, α]. Hence, since (α²/2)e^ξ > 0,

1 + α ≤ 1 + α + (α²/2) e^ξ = e^α.

And thus, by the geometric sum,

γ_n ≤ (1 + α)^n γ_0 + (((1 + α)^n − 1)/α) β ≤ e^{nα} γ_0 + ((e^{nα} − 1)/α) β.

Returning to the proof of the theorem, we get that

|d_k| ≤ e^{khL} |d_0| + ((e^{khL} − 1)/(Lh)) (h²/2) M.

Now, for kh ≤ T − t_0,

max_k |d_k| ≤ e^{L(T−t_0)} |d_0| + ((e^{L(T−t_0)} − 1)/L) (h/2) M.

|d_0| → 0 as h → 0 by the 3rd assumption. Hence, the method converges uniformly with linear convergence rate. □
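A sketch verifying the O(h) global error on a test problem of my choosing (y′ = −y, y(0) = 1, exact solution e^{−t}):

```python
import math

def euler(f, t0, y0, T, h):
    n = round((T - t0) / h)            # number of steps
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)
        t = t + h
    return y

for h in (0.1, 0.05, 0.025):
    approx = euler(lambda t, y: -y, 0.0, 1.0, 1.0, h)
    print(h, abs(approx - math.exp(-1)))   # the error roughly halves with h
```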

Heun's method

Idea. Predictor-corrector.

ỹ_{k+1} = y_k + h f(t_k, y_k)   (prediction)

y_{k+1} = y_k + (h/2) [f(t_k, y_k) + f(t_{k+1}, ỹ_{k+1})]   (correction)
Explicit vs. implicit

Quadrature. Integral formulation of the IVP:

y(t + h) = y(t) + ∫_t^{t+h} f(s, y(s)) ds;

apply your favorite quadrature rule, for instance:

(h/2) [f(t, y(t)) + f(t + h, y(t + h))] + O(h³).

Combined, we get:

y_{k+1} = y_k + (h/2) [f(t_k, y_k) + f(t_{k+1}, y_{k+1})].

This method is implicit. Every step requires the solution of a nonlinear (fixed point) problem.
Heun's method and Euler's method are explicit.

Linear multistep methods

y(t_{k+1}) = y(t_k) + ∫_{t_k}^{t_{k+1}} f(s, y(s)) ds.

Adams-Bashforth (explicit)

Interpolation nodes t_k, t_{k−1}, …, t_{k−m+1}. Polynomial p_{m−1}.

y_{k+1} = y_k + ∫_{t_k}^{t_{k+1}} p_{m−1}(s) ds = y_k + h ∑_{l=0}^{m−1} b_l f(t_{k−l}, y_{k−l}),

where

b_l = (1/h) ∫_{t_k}^{t_{k+1}} ∏_{j=0, j≠l}^{m−1} (s − t_{k−j})/(t_{k−l} − t_{k−j}) ds.

The truncation error is O(h^m).

What methods can be recovered?

For m = 1, l = 0, we get b_0 = 1 and

y_{k+1} = y_k + h f(t_k, y_k),

and thus Euler's method!

Adams-Moulton (implicit)

Add t_{k+1} as an interpolation node. Interpolation polynomial q_m.

y_{k+1} = y_k + ∫_{t_k}^{t_{k+1}} q_m(s) ds = y_k + h ∑_{l=0}^{m} c_l f(t_{k+1−l}, y_{k+1−l}),

where

c_l = (1/h) ∫_{t_k}^{t_{k+1}} ∏_{j=0, j≠l}^{m} (s − t_{k+1−j})/(t_{k+1−l} − t_{k+1−j}) ds.

The truncation error is O(h^{m+1}).

For m = 0, l = 0, we get c_0 = 1 and

y_{k+1} = y_k + h f(t_{k+1}, y_{k+1}),

which is the so-called backward Euler method (also called the implicit Euler method)!

Why do we need multistep methods?

Bad Example.

y′ = −15y,   y(0) = 1.

The exact solution is y(t) = e^{−15t}. For h = 1/4, Euler's method oscillates about zero, while Adams-Moulton (the trapezoidal method) works!
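A sketch of this bad example (my own arrangement; for this linear ODE the implicit trapezoidal step can be solved in closed form):

```python
h, n = 0.25, 8
y_euler, y_trap = 1.0, 1.0
for _ in range(n):
    y_euler *= 1 - 15 * h                     # explicit Euler factor: -2.75
    # trapezoid: y+ = y + (h/2)(-15 y - 15 y+), solved for y+:
    y_trap *= (1 - 7.5 * h) / (1 + 7.5 * h)   # factor ≈ -0.304, |factor| < 1
print(y_euler)   # ≈ 3.3e+03 -- blows up, oscillating in sign along the way
print(y_trap)    # ≈ 7.4e-05 -- decays, as the exact solution e^(-15 t) does
```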

General form.

The general form of a linear multistep method is given, for s ∈ N, by

∑_{j=0}^{s} a_j y_{n+j} = h ∑_{j=0}^{s} b_j f(t_{n+j}, y_{n+j}),

where a_s = 1 (normalization) and the coefficients a_0, …, a_{s−1} and b_0, …, b_s determine the method.

The method is called explicit if b_s = 0, and implicit otherwise.

We call the multistep method consistent if the truncation error is O(h) or better.

Theorem. The linear multistep method is consistent if and only if

∑_{k=0}^{s−1} a_k = −1

and

∑_{k=0}^{s} b_k = s + ∑_{k=0}^{s−1} k a_k.

If, moreover,

q ∑_{k=0}^{s} k^{q−1} b_k = s^q + ∑_{k=0}^{s−1} k^q a_k

for every q = 1, …, p, then the truncation error is O(h^{1+p}).

(Here, we follow the non-standard convention that k^0 = 0 if k = 0.)

See [Ernst Hairer, Gerhard Wanner, Syvert P. Nørsett. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer, 2nd ed., 1993] for a proof.

The stability of multistep methods depends on the convergence of the initial values y_1, …, y_{s−1} to y_0 as h → 0. The second condition yields a global error O(h^p).

Example. s = 1: a_0 + a_1 = 0, 0 · a_0 + 1 · a_1 = b_1, and by the normalization assumption, a_1 = 1, a_0 = −1, b_1 = 1, and thus we get

y_{k+1} = y_k + h f(t_{k+1}, y_{k+1}),

backward Euler!

Example. (Good bad example)

y_{k+2} − 3y_{k+1} + 2y_k = h [(13/12) f(t_{k+2}, y_{k+2}) − (5/3) f(t_{k+1}, y_{k+1}) − (5/12) f(t_k, y_k)].

This method satisfies the first condition. For the second condition, with p = q = 1, we get −1 ≠ 13/12, so the second condition is not satisfied and the method is not stable. Indeed, for

y′ = 0,   y(0) = 1,

which has the explicit solution y(t) = 1, we consider a small perturbation of the initial value, δ > 0, so that

y_0 = 1,   y_1 = 1 + δ,

y_2 = 3y_1 − 2y_0 = 1 + 3δ,

y_k = 3y_{k−1} − 2y_{k−2} = 1 + (2^k − 1)δ.

Hence, for δ ∼ 2^{−53} and k = 100, we get an error ∼ 2^{47}.

We note, however, that the method is consistent and the exact differential equation is stable (in the mathematical sense), and the perturbation y_δ(t) = 1 + δ converges uniformly to the exact solution y(t) = 1 as δ → 0.

Example. (Effect of rounding error)

Returning to the proof of convergence for Euler, for the rounding error δ > 0,

|d_{k+1}| ≤ (1 + hL)|d_k| + δ,

we get

|d_{k+1}| ≤ e^{L(T−t_0)} |d_0| + ((e^{L(T−t_0)} − 1)/(Lh)) δ,

where the first term is the initial error or uncertainty, and the second term is dominant if h is sufficiently small.

Gradient descent

The following algorithm is widely used in machine learning, together with its probabilistic counterpart, stochastic gradient descent (SGD).

The goal is to find the minima of a function

f : D → R,   D ⊂ R^d,

which is assumed suitably regular, e.g. f ∈ C^1(D ∖ ∂D).

Gradient descent algorithm.

For simplicity, assume that 0 ∈ D.

1. Fix the initial point w_0 := 0 ∈ D.
2. For k = 0, …, N, where N ∈ N, iterate: w_{k+1} is obtained by moving away from w_k in the opposite direction of the gradient of f at w_k, with step size η_{k+1} > 0; more precisely,

w_{k+1} := w_k − η_{k+1} ∇f(w_k),   k = 0, 1, …, N.

The constants η_k are called learning rates.

3. After the N-th step, we may choose different outputs, e.g. just w̄_N := w_N, or

w̄_N := arg min_{k=0,…,N} f(w_k).

Less obviously, one may also choose

w̄_N := (1/(N + 1)) ∑_{k=0}^{N} w_k,

which is particularly useful for SGD.

Definition. f : R^d → R is called convex if, for every λ ∈ [0, 1], x, y ∈ R^d,

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).

Note that if f is convex,

f(∑_{i=0}^{N} λ_i x_i) ≤ ∑_{i=0}^{N} λ_i f(x_i)

for every x_0, x_1, …, x_N ∈ R^d, whenever λ_i ∈ [0, 1] satisfy ∑_{i=0}^{N} λ_i = 1.

Continuously differentiable convex functions f : R^d → R enjoy the so-called subgradient property, i.e.

f(x) − f(y) ≤ ⟨∇f(x), x − y⟩,   x, y ∈ R^d,

where ⟨·, ·⟩ denotes the Euclidean scalar product.

Theorem. Let f : R^d → R be convex, continuously differentiable and L-Lipschitz continuous, i.e.,

|f(x) − f(y)| ≤ L|x − y|,   x, y ∈ R^d.

Let R > 0, N ∈ N. Set

m := min_{|w|≤R} f(w),   η_k := η := R/(L√(N + 1)).

Then for

w̄_N := (1/(N + 1)) ∑_{k=0}^{N} w_k,

we have that

f(w̄_N) − m ≤ RL/√(N + 1).

Note. The point w̄_N is doubly dependent on N: not only through the number of steps, but also through the choice of η.

Remark.

1. Assume that f has a global minimum at w* ∈ R^d. Then the above result ensures the convergence of f(w̄_N) to the minimum f(w*), provided that R ≥ |w*|. Indeed, the claimed estimate, together with

f(w̄_N) − f(w*) ≥ 0,

yields

|f(w̄_N) − f(w*)| ≤ RL/√(N + 1).

2. It is not guaranteed that {w̄_N}_{N∈N} converges to w* unless w* is the unique minimizer (e.g. if f is so-called strictly convex, i.e., the inequality in the definition of convexity is strict for λ ∈ (0, 1)).
3. The convergence rate is sublinear unless f is so-called strongly convex (i.e., f − δ|·|² is convex for some δ > 0), which gives a linear convergence rate.

We start by proving an auxiliary result.

Lemma. Let v_1, v_2, …, v_{N+1}, w* be vectors in R^d, and let η > 0. Setting w_0 = 0 and

w_k := w_{k−1} − η v_k,   k ∈ N,

we get that

∑_{k=0}^{N} ⟨v_{k+1}, w_k − w*⟩ ≤ |w*|²/(2η) + (η/2) ∑_{k=0}^{N} |v_{k+1}|².

In particular, we have that

(1/(N + 1)) ∑_{k=0}^{N} ⟨v_{k+1}, w_k − w*⟩ ≤ RL/√(N + 1)

for any R, L > 0 such that

η = R/(L√(N + 1))

and

|w*| ≤ R,   |v_k| ≤ L,   k = 1, …, N + 1.

Proof. A direct computation shows (polarization identity)

⟨v_{k+1}, w_k − w*⟩ = (1/(2η)) (|w_k − w*|² + η²|v_{k+1}|² − |w_k − w* − ηv_{k+1}|²) = (1/(2η)) (|w_k − w*|² − |w_{k+1} − w*|²) + (η/2)|v_{k+1}|².

Adding up with respect to k yields

∑_{k=0}^{N} ⟨v_{k+1}, w_k − w*⟩ = (1/(2η)) ∑_{k=0}^{N} (|w_k − w*|² − |w_{k+1} − w*|²) + (η/2) ∑_{k=0}^{N} |v_{k+1}|².

The first term is a telescoping sum and w_0 = 0, so that we get

∑_{k=0}^{N} ⟨v_{k+1}, w_k − w*⟩ = (1/(2η)) (|w*|² − |w_{N+1} − w*|²) + (η/2) ∑_{k=0}^{N} |v_{k+1}|²,

which proves the first assertion.

For the second assertion it is enough to observe that under our conditions

|w*|²/(2η) + (η/2) ∑_{k=0}^{N} |v_{k+1}|² ≤ R²/(2η) + η(N + 1)L²/2 = RL√(N + 1). □

Proof of the Theorem. Recalling that f is convex, we get that

f(w̄_N) = f((1/(N + 1)) ∑_{k=0}^{N} w_k) ≤ (1/(N + 1)) ∑_{k=0}^{N} f(w_k).

Therefore, for any w* ∈ arg min_{|w|≤R} f(w), we obtain by the Lemma (with v_{k+1} := ∇f(w_k)),

f(w̄_N) − m = f(w̄_N) − f(w*) ≤ (1/(N + 1)) ∑_{k=0}^{N} (f(w_k) − f(w*)) ≤ (1/(N + 1)) ∑_{k=0}^{N} ⟨∇f(w_k), w_k − w*⟩ ≤ RL/√(N + 1). □

Literature

1. Anne Greenbaum and Tim P. Chartier. Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms. Princeton University Press, 2012.
2. L. Ridgway Scott. Numerical Analysis. Princeton University Press, 2011.
3. Qingkai Kong, Timmy Siauw, and Alexandre Bayen. Python Programming and Numerical Methods. A Guide for Engineers and Scientists. Academic Press, 2020.
4. Tobin A. Driscoll and Richard J. Braun. Fundamentals of Numerical Computation. SIAM, 2017.
5. Ernst Hairer, Gerhard Wanner, Syvert P. Nørsett. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer, 2nd ed., 1993.
6. Real Analysis, Wikibooks, Creative Commons BY-SA 4.0.
7. Stefano Pagliarani. An introduction to discrete-time stochastic processes and their applications. Lecture notes, Alma Mater Studiorum - Università di Bologna, 2024.
8. Harri Hakula. Numerical Analysis. Lecture transcript, Aalto University, 2021.

1. jonas.tolle@aalto.fi ↩︎
