
Numerical Methods, MATHS 2104/7104

Prof Tony Roberts, Lewis Mitchell & Trent Mattner


School of Mathematical Sciences

July 16, 2023

Contents

0 Introduction
1 Matlab
2 Polynomial interpolation
3 Numerical integration
4 Numerical differentiation
5 Splines
6 Numerical linear algebra
7 Nonlinear equations
8 Ordinary differential equations
9 Monte Carlo methods
A Summary

0 Introduction
Numerical methods Numerical methods are algorithms used to obtain approximate solutions to algebraically intractable mathematical problems. Such problems arise when modelling many physical, biological, chemical and financial systems and phenomena. Computer technology facilitates the application of numerical methods and the simulation of complex systems. These simulations (or numerical experiments) allow us to better understand, predict and optimize such systems.

Course learning outcomes

• Demonstrate understanding of common numerical methods, and how they provide approximate solutions (to otherwise intractable mathematical problems).
• Apply numerical methods to obtain approximate solutions to mathematical problems.
• Derive numerical methods for various mathematical operations and tasks, such as interpolation, differentiation, integration, the solution of linear and nonlinear equations, and the solution of differential equations.
• Analyse and evaluate the accuracy of common numerical methods.
• Implement numerical methods in Matlab.
• Write efficient, well-documented Matlab code, and present numerical results in an informative way.

Overview of numerical methods


• Polynomial interpolation
• Numerical integration and differentiation
• Splines
• Numerical linear algebra
• Numerical solution of nonlinear systems of equations
• Numerical solution of ordinary differential equations
• Monte Carlo methods
Good reference books are by Kreyszig (2011) and Quarteroni et al.
(2014).

References
Fornberg, B. & Flyer, N. (2015), 'Solving PDEs with radial basis functions', Acta Numerica 24, 215–258.
Hahn, B. H. & Valentine, D. T. (2013), Essential Matlab for Engineers and Scientists, 5th edn, Academic Press. http://www.sciencedirect.com/science/article/pii/B9780123943989000204
Johnson, R. (2014), Matlab style guidelines 2.0, Technical report, Datatool. https://au.mathworks.com/matlabcentral/fileexchange/46056-matlab-style-guidelines-2-0
Kreyszig, E. (2011), Advanced engineering mathematics, 10th edn, Wiley.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (2007), Numerical recipes. The art of scientific computing, 3rd edn, CUP. http://numerical.recipes
Quarteroni, A., Saleri, F. & Gervasio, P. (2014), Scientific Computing with MATLAB and Octave, Vol. 2 of Texts in Computational Science and Engineering, 4th edn, Springer. https://link-springer-com.proxy.library.adelaide.edu.au/book/10.1007%2F978-3-642-45367-0


1 Matlab
1.1 Assumed knowledge about Matlab/Octave
For more details about these concepts:

• read the first nine chapters of the introductory book by Hahn & Valentine (2013) (freely available as pdf via the library);
• study the content made freely available by Software Carpentry on Programming with Matlab, https://swcarpentry.github.io/matlab-novice-inflammation/;
• read the first chapter of the book by Quarteroni et al. (2014) (freely available as pdf via the library);
• and/or read Programming with Matlab or Octave available in MyUni;
• and/or view the screencasts in MyUni and the accompanying document Numerical Methods: Matlab Screencasts.

1.1.1 Basic Matlab

• Variables access and store data in the computer's memory. [1]
• Mostly, in this course the data will be real numbers, albeit limited by computer representation, and some are integer-valued (Quarteroni et al. 2014, §1.2).
• Occasionally we use strings enclosed in single quotes, '...'
• The command name = expression assigns the result of the expression to the variable called name (Quarteroni et al. 2014, §1.7).
• lookfor searchterm searches the first comment line in functions for matches to the specified searchterm.
• help functionName lists for your information the first block of comments in the specification of the function functionName.

1.1.2 Vectors and arrays

• Overwhelmingly we use the data structures of vectors and matrices of numbers (Quarteroni et al. 2014, §1.4). [2] A number is often termed a scalar.
  – Square brackets, [ ... ], construct matrices and vectors (Quarteroni et al. 2014, §1.4), including concatenating matrices and vectors together.


  – The colon operator or the linspace() function constructs a row vector of an arithmetic sequence (Quarteroni et al. 2014, §1.5–1.6).
  – length(v) returns the number of elements of vector v.
  – When A is an m×n array, size(A) returns the two-vector [m,n].
  – Useful m × n matrices are zeros(m,n), ones(m,n), and nan(m,n) (not-a-number, which usefully helps catch errors) (Quarteroni et al. 2014, §1.4).
  – A(i,j) refers to the element of array A in the ith row and jth column.
  – If i and/or j are 'index' vectors themselves, then A(i,j) refers to the corresponding sub-block of A.
  – A(i,:) refers to the entire ith row, and similarly for columns.
  – The operator ' transposes a matrix or vector (Quarteroni et al. 2014, §1.4).

• The operators ^, *, +, - follow the usual rules of matrix arithmetic (Quarteroni et al. 2014, §1.7).
• The operators .^, .*, ./ apply element-by-element, just like + and - (Quarteroni et al. 2014, §1.4.1).
• Operators are evaluated in-line with the usual precedence in mathematics: operators with equal precedence are evaluated left-to-right.
• For element-by-element operators, Matlab automatically replicates scalars and vectors as appropriate (Section 1.3.2).

[1] https://swcarpentry.github.io/matlab-novice-inflammation/01-intro.html
[2] https://swcarpentry.github.io/matlab-novice-inflammation/02-arrays.html

1.1.3 Command control structures

• A % character indicates that the rest of a line is a comment that is not parsed or executed by Matlab.
• Putting ... at the end of a line indicates the command is continued on the next line: a practical maximum line length is 80 characters (60 is better).
• if condition, statements, end conditionally executes the statements (Quarteroni et al. 2014, §1.7.1). Such an if-command may also have an else group of statements. [3]


• A for-loop is the standard way to repeatedly execute a group of statements (Quarteroni et al. 2014, §1.7.1). [4] [5]
• The break command (inside an if-statement) is used to terminate a for-loop upon some desired condition being reached.
• Relational operators ==, ~=, <, <=, >, and >= are evaluated after arithmetic: Matlab represents the result as 0 or 1 to represent false or true respectively (Quarteroni et al. 2014, §1.7).
• Logical operators &, | and ~ are evaluated after relations.
• functionName = @(argumentName) expression defines simple, one-line, 'anonymous' functions.
• Define complicated functions (Quarteroni et al. 2014, §1.7.2): [6]
  – usually in a separate file, functionName.m;
  – function result = functionName(arguments) should be the first line;
  – followed by description and code;
  – variables within the function are purely local;
  – within the code, executing return immediately terminates the function;
  – and end with the last line end.
• Scripts are more informal than functions (Quarteroni et al. 2014, §1.7.2): [7]
  – a script is a sequence of Matlab commands stored in a text file, say scriptName.m;
  – execute a script by typing its file name (scriptName);
  – usually begin a script with clear all (but not in Matlab Grader);
  – use scripts to develop code and test functions.
• With care it is possible to have a script and/or multiple functions stored in the one text file—later in your career you might do so. For simplicity, in this course we require that each script and each (non-anonymous) function be stored in separate files.

[3] https://swcarpentry.github.io/matlab-novice-inflammation/06-cond.html
[4] https://swcarpentry.github.io/matlab-novice-inflammation/05-loops.html
[5] We avoid while-loops as all too often they result in never-ending infinite loops.
[6] https://swcarpentry.github.io/matlab-novice-inflammation/07-func.html
[7] https://swcarpentry.github.io/matlab-novice-inflammation/04-scripts.html
1.1.4 Builtin Matlab functions

• Many common mathematical functions, such as sin(), cos(), log() and exp(), apply element-by-element to vector or array arguments (Quarteroni et al. 2014, §1.7).
• Other common functions, such as min(), max(), sum() and prod() (products), act
  – over all elements in a vector argument (resulting in a scalar answer); but
  – columnwise when the argument is a 2D array (resulting in a row vector of minima, maxima, sums, and products, respectively).
• Moreover, [m,j]=min(a) or [M,j]=max(a) results in j being set to the index of the element that equals the minimum/maximum (for each column when an array, resulting in a row vector j); see the sketch after this list.
• Further, min(a,b) or max(a,b) results in the min/max of an (auto-replicated) element-by-element comparison of a and b.
• rand(m,n) generates an m × n array of random numbers uniformly distributed in [0, 1) (Quarteroni et al. 2014, §1.6): randn() generates N(0, 1) random numbers.
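
A minimal sketch of these min()/max()/sum() behaviours (the example data here is ours, not from the notes):

a = [3 1 4; 1 5 9];
m = min(a)        % columnwise minima: m = [1 1 4]
[m,j] = min(a)    % j = [2 1 1], the row index of each minimum
c = max(a,4)      % auto-replicated comparison: c = [4 4 4; 4 5 9]
s = sum(a)        % columnwise sums: s = [4 6 13]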
1.1.5 Input and output
• Matlab writes to the Command Window the result of each command, unless the command is terminated by a semicolon (;).
• format compact conveniently omits most of the blank lines Matlab writes to the Command Window.
• Especially in a script or debugging, fprintf('\n...\n') prints a message to the Command Window. This often usefully replaces comments in a script as it may have the dual role of commenting the script and commenting the output.
• save(...) stores the specified variables to the specified file in an unreadable binary format: save(...,'-ascii') stores them in a human readable and editable text format (Quarteroni et al. 2014, §1.7).
• load(...) gets the variables/data from the specified file into Matlab variables or expression (Quarteroni et al. 2014, §1.7).
• plot() draws 2D graphs, as do loglog() and semilogy() (Quarteroni et al. 2014, §1.6) [8], with optional
  – axes labelled by xlabel() and ylabel(),
  – legend via legend(), and
  – title via title('...').
• contour(), surf() and mesh() draw 3D surface graphs.
• To save a plot for including in a report:
  1. execute set(gcf,'PaperPosition',[0 0 14 10]) (this is when Matlab is using centimetres; if it uses inches then execute set(gcf,'PaperPosition',[0 0 5.5 4])); then
  2. save with print('filename.eps','-depsc2') to ensure good quality (subsequently convert as needed).
• tic ... elapsedTime=toc reports the elapsed time (in seconds) of the code executed between the tic and the toc.

[8] https://swcarpentry.github.io/matlab-novice-inflammation/03-plotting.html
1.2 Require good programming style
Programming style Code should be written and presented in a way that maximises clarity and efficiency. A logical, consistent style and good documentation helps to achieve this objective.
For comprehensive advice on good programming style in Matlab, see MATLAB Style Guidelines 2.0 by Johnson (2014). These are guides, not absolute rules.
Nicholson (2014) published a one page summary (see MyUni).
In this course, we require the following.
• Develop code first in a script, with small vectors of length 4–7. When the code is working, then formalise into a function.
• Name variables according to the context of the problem, and according to some conventions (see Naming Conventions) (Johnson 2014, pp. 7–12).
• Documentation: at the start of every function or script must be a description, author, and date (of last revision); comments must add information (see Documentation).
• Write no more than one executable statement per line, except for short single-statement if or for statements.
• Limit lines to at most 80 characters, preferably 60 (use ... to continue a statement onto the next line).
• Reveal logical structure with standard Matlab indentation (use Smart Indent).
• Suppress unwanted output with semicolons (;).
• When a number or expression is used several times, then define it once and use the name.


• Vectorise code where reasonable (Section 1.3).

Naming conventions (Johnson 2014, pp. 7–12)

• Variable names should be mixed case starting with lower case: velocity, angularAcceleration. Often, variables starting with a capital letter are used for matrices.
• Variables with a large scope should have meaningful names: velocity, acceleration. Variables with a small scope can have short names: x, y, z.
• The prefix n should be used for variables representing the number of objects: nFiles, nCars, nLines.
• Use a convention on pluralisation consistently: point, pointArray, points.
• Variables representing a single entity number can be suffixed by No: tableNo, employeeNo.
• Prefix iterator variable names with i, j, k, etc: iFiles, jColumns.

However, the context in which you write a program overrules these guidelines.

Documentation Always document your code with comments. Comments are preceded by the percent character %.
The bare minimum is
• description,
• author, and
• date (of last revision).
Where appropriate, provide judicious comments to help a reader understand the function, usage and limitations of parts of the code. Such comments must add information. Avoid otiose commenting.
1.3 Vectorise for clarity and efficiency
Vectorisation Vectorisation refers to the process of replacing
scalar data and operations with highly efficient vector or matrix
data and operations.

Table 1: Computers have extremely complicated hardware.

Intel Core i7 - Nehalem ('08–'10)
2 or 4 cores up to 3 GHz
14 stage pipeline with stream prefetching
Return of hyper-threading
32 KB L1 cache (8-way set associative)
256 KB L2 cache per core (8-way set associative)
8 MB L3 cache, shared (16-way set associative)


First, computer efficiency

• Multi-core and hyper-threading implies multiple separate tasks may be done in parallel.
• The 14 stage pipeline means that the same operation may be done on 14 numbers simultaneously, staggered, like an 'assembly line', speeding computation by a factor of 14.
• The three levels of cache are buffers between the fast compute core and the much slower main memory: for fast large-scale computations one has to 'block-up' the numbers involved in computations in order to minimise data transfers between core, caches, and slow main memory.

There is no way that scientists and engineers like you and me can spare the time to learn and write code that micro-manages computations to be effective on such complicated hardware.

Solution Use the lapack and blas functions developed by one of the longest running public domain software projects: these functions are used by almost all good scientific and engineering software, including Matlab.
For modern complicated computers lapack and blas functions perform vector and matrix operations highly efficiently. We must take advantage of these efficient functions by programming in terms of vectors and matrices.

Second, human efficiency Programming in terms of vectors and matrices reduces and simplifies the code. It is quicker to write. Because the code is shorter, we make fewer errors. Further, thinking in terms of vectors and matrices empowers us to think and operate at a powerful system-wide level. Coding in terms of vectors and matrices is also efficient for humans.
We must develop skills in vectorisation.

1.3.1 Basic vectorisation examples

Example 1.1. For every vector a, b and c,
for i = 1:length(b)
    a(i) = b(i) + c(i);
end
must be replaced with a = b + c;


Example 1.2. For every vector a and b, one could compute the inner product $a \cdot b = \sum_{i=1}^{n} a_i b_i$ as
s = 0;
for i = 1:length(a)
    s = s + a(i)*b(i);
end
To vectorise, replace with s = sum(a.*b),
or, via a transpose, with s = a'*b (when column vectors).

Example 1.3. Similarly, to compute the product $1 \cdot 3 \cdot 5 \cdots n$ we could code
oddProd = 1;
for j = 3:2:n
    oddProd = j*oddProd;
end
but much better is oddProd = prod(1:2:n);

Example 1.4. For every vector a,
for i = 1:length(a)-1
    da(i) = a(i+1) - a(i);
end
must be replaced with da = diff(a)
or alternatively da = a(2:end) - a(1:end-1)
Remember: end evaluates to the length of the vector in which it appears (and so is often useful).

Example 1.5. For every vector a, (Inf ≡ ∞ in Matlab)
lowest = Inf;
for i = 1:length(a)
    if a(i) < lowest
        lowest = a(i);
    end
end
must be replaced with lowest = min(a);

Preallocation If you generate a vector/matrix piecemeal, and you know the size of the matrix or vector you are creating, then preallocate storage (so that Matlab does not waste time re-allocating storage with the various pieces).


Example 1.6. In computing n Fibonacci numbers, we should preallocate the vector F before the recursion:

F = nan(1,n);
F(1:2) = 1:2;
for k = 3:n
    F(k) = F(k-1) + F(k-2);
end

nan or NaN denotes Not-a-Number: we use it in case we miss assigning some of the elements (here of F), and subsequently try to use an unassigned element; the NaNs propagate in calculations, warning us of such an error.

Example 1.7. The Legendre polynomial $P_k(x)$ may be computed from the recursion $(k+1)P_{k+1}(x) = (2k+1)x P_k(x) - k P_{k-1}(x)$, for k = 1, 2, .... Given $P_0(x) = 1$ and $P_1(x) = x$, compute and plot the first n Legendre polynomials over −1 ≤ x ≤ 1: here P(k+1,:) stores $P_k(x)$.

x = linspace(-1,1);
P = nan(n+1,length(x));
P(1,:) = ones(size(x)); % or 1+0*x
P(2,:) = x;
for k = 1:n-1
    P(k+2,:) = ((2*k+1)*x.*P(k+1,:)-k*P(k,:))/(k+1);
end
plot(x',P'), legend(num2str((0:n)'))

Glitch: Matlab vectors/arrays always start with index 1, whereas here we want index 0.

1.3.2 Auto-replication
Matlab 2016+ automatically replicates array operations Although the product $2\begin{bmatrix}1&-2\\-3&4\end{bmatrix}$ is well defined mathematically, the sum $2+\begin{bmatrix}1&-2\\-3&4\end{bmatrix}$ is not defined mathematically. Nonetheless, such sums are so useful that Matlab defines them, and many others of the same ilk—be wary.

Re +, -, .*, ./, .^, <, >=, & et al.: for such element-by-element operators, Matlab automatically replicates scalars and vectors as appropriate. For example,

• $2 + \begin{bmatrix}1&-2\\-3&4\end{bmatrix} \mapsto \begin{bmatrix}2&2\\2&2\end{bmatrix} + \begin{bmatrix}1&-2\\-3&4\end{bmatrix} \mapsto \begin{bmatrix}3&0\\-1&6\end{bmatrix}$

• $\max\Bigl(1, \begin{bmatrix}1&-2\\-3&4\end{bmatrix}\Bigr) \mapsto \max\Bigl(\begin{bmatrix}1&1\\1&1\end{bmatrix}, \begin{bmatrix}1&-2\\-3&4\end{bmatrix}\Bigr) \mapsto \begin{bmatrix}1&1\\1&4\end{bmatrix}$

• $\begin{bmatrix}1&-2\end{bmatrix}$ .* $\begin{bmatrix}1&-2\\-3&4\end{bmatrix} \mapsto \begin{bmatrix}1&-2\\1&-2\end{bmatrix}$ .* $\begin{bmatrix}1&-2\\-3&4\end{bmatrix} \mapsto \begin{bmatrix}1&4\\-3&-8\end{bmatrix}$

• $\begin{bmatrix}1\\-2\end{bmatrix} + \begin{bmatrix}1&-2\\-3&4\end{bmatrix} \mapsto \begin{bmatrix}1&1\\-2&-2\end{bmatrix} + \begin{bmatrix}1&-2\\-3&4\end{bmatrix} \mapsto \begin{bmatrix}2&-1\\-5&2\end{bmatrix}$

• $\begin{bmatrix}3\\4\end{bmatrix}$ ./ $\begin{bmatrix}1&-2&3\end{bmatrix} \mapsto \begin{bmatrix}3&3&3\\4&4&4\end{bmatrix}$ ./ $\begin{bmatrix}1&-2&3\\1&-2&3\end{bmatrix} \mapsto \begin{bmatrix}3&-1.5&1\\4&-2&1.33\cdots\end{bmatrix}$

• but $\begin{bmatrix}1&-2&3\end{bmatrix}$ .* $\begin{bmatrix}1&-2\\-3&4\end{bmatrix}$ is an error as the non-1 sizes do not match.

Example 1.8. We meet the Hilbert matrix in Section 6.3: $H_{ij} = 1/(i+j-1)$ for $i, j = 1, \ldots, n$. Use auto-replication to construct it for the case n = 3,
$$H = \begin{bmatrix}1&1/2&1/3\\1/2&1/3&1/4\\1/3&1/4&1/5\end{bmatrix}.$$

• Set i=1:3
• First, i'+i is mathematical nonsense, but Matlab auto-replicates the vectors to
$$\begin{bmatrix}1\\2\\3\end{bmatrix} + \begin{bmatrix}1&2&3\end{bmatrix} \mapsto \begin{bmatrix}1&1&1\\2&2&2\\3&3&3\end{bmatrix} + \begin{bmatrix}1&2&3\\1&2&3\\1&2&3\end{bmatrix} \mapsto \begin{bmatrix}2&3&4\\3&4&5\\4&5&6\end{bmatrix}$$
• Second, (i'+i)-1 is also mathematical nonsense, but Matlab auto-replicates the scalar 1 to
$$\begin{bmatrix}2&3&4\\3&4&5\\4&5&6\end{bmatrix} - 1 \mapsto \begin{bmatrix}2&3&4\\3&4&5\\4&5&6\end{bmatrix} - \begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix} \mapsto \begin{bmatrix}1&2&3\\2&3&4\\3&4&5\end{bmatrix}$$
• Lastly, for 1./(i'+i-1) Matlab also auto-replicates the scalar 1 to compute the required Hilbert matrix:
$$1 ./ \begin{bmatrix}1&2&3\\2&3&4\\3&4&5\end{bmatrix} \mapsto \begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix} ./ \begin{bmatrix}1&2&3\\2&3&4\\3&4&5\end{bmatrix} \mapsto \begin{bmatrix}1&1/2&1/3\\1/2&1/3&1/4\\1/3&1/4&1/5\end{bmatrix}$$
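
Putting these steps together, the whole construction is just two lines of Matlab:

i = 1:3;
H = 1./(i' + i - 1)   % the 3-by-3 Hilbert matrix via auto-replication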

Example 1.9. For every number in the row vector x, which is the nearest number in the row vector xi? Example 1.5 introduced using min(), but here use its index output: scalar x(j) is auto-replicated in the subtraction with vector xi.
iNearest = nan(size(x));
for j = 1:length(x)
    [~,iNearest(j)] = min( abs(x(j)-xi) );
end
But better is to use that min() operates along columns—columns in a 2D array obtained by further auto-replication:
[~,iNearest] = min( abs(x-xi') )
Try it on say xi=[2 3 5 7], x=sort(10*rand(1,5))


Example 1.10. Given a set of points in the plane with coordinates $(x_j, y_j)$, stored in vectors x and y, compute the array of all inter-point distances $d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$, the distance from point i to point j.
d = nan(length(x));
for i = 1:length(x)
    for j = 1:length(x)
        d(i,j) = sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
    end
end
Such double for-loops of scalar code are poor (six lines): better code auto-replicates some scalars to vectors (four lines).
d = nan(length(x));
for i = 1:length(x)
    d(i,:) = sqrt( (x(i)-x).^2+(y(i)-y).^2 );
end
But best is to auto-replicate to array code (one line):
d = sqrt( (x'-x).^2 + (y'-y).^2 )
1.3.3 Succeed in vectorisation
What do you need to do to succeed? Regard every for-loop and every if-statement as a challenge to remove.
• Remove many loops with array-vector operations and/or the power of vectors of indices.
• Remove many ifs using find(), min(), max(), et al. (see the sketch after this list).
Some for-loops are necessary, but if so:
• poor is for things, scalar-code, end;
• good is for things, vector-code, end;
• very good is for things, array-code, end.
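
As a minimal sketch of removing an if (the data here is ours, not from the notes), find() and logical indexing locate elements satisfying a condition without any explicit loop:

a = [3 -1 4 -1 5];
% instead of a for-loop with an if-test on each element:
iNeg = find(a < 0)   % indices of the negative elements: [2 4]
aNeg = a(a < 0)      % logical indexing extracts them: [-1 -1]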
1.4 Worked example: vectorisation, functions, documentation
Vibrating strings The boundary value problem describing a vibrating string with given initial deflection and velocity is
$$\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}, \qquad 0 < x < L,\ t > 0,$$
subject to boundary conditions
$$u(0, t) = u(L, t) = 0, \qquad t > 0,$$
and initial conditions
$$u(x, 0) = f(x), \quad u_t(x, 0) = g(x), \qquad 0 < x < L.$$


Solution Using separation of variables, the solution is
$$u(x, t) = \sum_{n=1}^{\infty} (B_n \cos \lambda_n t + C_n \sin \lambda_n t) \sin \frac{n\pi x}{L},$$
where $\lambda_n = cn\pi/L$ and for n = 1, 2, ...
$$B_n = \frac{2}{L} \int_0^L f(x) \sin \frac{n\pi x}{L}\, dx, \qquad C_n = \frac{2}{cn\pi} \int_0^L g(x) \sin \frac{n\pi x}{L}\, dx.$$

Plucked string For a plucked string
$$u(x, 0) = f(x) = \begin{cases} \dfrac{x}{\ell}, & 0 < x < \ell, \\ \dfrac{L - x}{L - \ell}, & \ell < x < L, \end{cases} \qquad u_t(x, 0) = g(x) = 0.$$

The relevant Fourier series obtains
$$u(x, t) = \frac{L^2}{\ell(L - \ell)} \sum_{n=1}^{\infty} \frac{2}{n^2\pi^2} \sin \frac{n\pi\ell}{L} \cos \frac{cn\pi t}{L} \sin \frac{n\pi x}{L}.$$

Example
1. Write a Matlab script that evaluates this solution for 0 ≤ x ≤ L at a single time t. Use for loops and scalar operations only.
   • string1a.m computes u at one given x and t.
   • string1b.m computes u at one given t, and for all x along the string.
2. Vectorise your script and compare the run time of the vectorised script with your previous script.
   • string1c.m vectorises the code somewhat and executes twenty times faster.
   • string2.m fully vectorises the code and executes faster still (but not by much as Matlab, and other compilers, do parse code and automatically try to vectorise it for you).
3. Animate the solution by stepping forward through time using a for loop.
   • See string3.m
4. Write a vectorised Matlab function that evaluates the solution for 0 ≤ x ≤ L at a single time t. Document the function using the conventions required for this course.


   • After the script has been developed and tested, we edit into the function string4.m
5. Write a vectorised Matlab function that evaluates the solution for 0 ≤ t ≤ T at a single point x by modifying your previous function. Use Matlab's sound function to listen to the solution.
Answer:

Vectorised function

function u = synthesize(x, t, ell, L, c, nSum)
% synthesize calculates vibrating string deflection
% at a fixed location x for a vector of times t
% LM & AJR, Numerical Methods, 26 July 2018
% Output:
% u = the deflection of the plucked string.
% Input:
% x - the position along the string.
% t - a row vector containing the times.
% ell - the position along the string that is plucked.
% L - length of the string.
% c - wavespeed of the string.
% nSum - number of terms in the Fourier sum.
B = @(n) L^2/(ell*(L-ell))*2./(n.^2*pi^2).*sin(pi*ell/L*n);
n = 1:nSum;
u = B(n).*sin(pi*x/L*n)*cos(c*pi/L*n'*t);
end

Answer:

Vectorised implementation

% Simulates a sound using solutions of the wave
% equation. Requires synthesize.m
% Numerical Methods, 26 July 2018
close all, clear all

fs = 44000;   % Sampling frequency (Hz)
f1 = 440;     % Fundamental frequency (Hz)
L = 1;        % String length (m)
ell = 0.5;    % Plucking position (m)
T = 1;        % Duration of tone (sec)
x = 0.5;      % Position along string (m)
nSum = 50;    % Number of modes

dt = 1/fs;
u = synthesize(x, 0:dt:T, ell, L, 2*f1*L, nSum);
sound(u, fs)

1.4.1 Past exam question

Past exam question The following Matlab function intTrap() uses the trapezoidal rule to estimate the integral of a user-defined function func(). The input vector x contains the sample points in ascending order. The x-points are generally not equi-spaced.
function s = intTrap(func, x)
s = 0;
for i = 1:length(x)-1
    h = x(i+1) - x(i);
    s = s + 0.5*h*(func(x(i+1)) + func(x(i)));
end
Vectorise the function intTrap(). Assume that func() returns vector output when given a vector argument.
Answer:

Solution
function s = intTrap(func, x)
% Trapezoidal rule estimate of the integral of func()
% s = intTrap(func, x) estimates the definite integral
% of func from x(1) to x(end) using the trapezoidal rule.
% Numerical Methods, Aug 2018
% Input/Output:
% func - a user defined function handle.
% x - a vector of sample points in ascending order.
% s - the trapezoidal estimate of the integral.

f = func(x);
j = 1:length(x)-1;
s = 0.5*sum((f(j+1) + f(j)).*diff(x));
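
A usage sketch of our own (not part of the exam answer): because the x-points need not be equi-spaced, random interior sample points work fine.

x = sort([0, pi*rand(1,30), pi]);   % non-equispaced points on [0,pi]
s = intTrap(@sin, x)                % approximately 2, the exact integral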

1.5 Summary
• All function and script programs—as for all documents—must have a descriptive title, author and date.
• The first comment line in a function/script must be a one line summary so that lookfor is useful.
• Comments in code must add information.
• When a value or expression is used several times, define once and refer to it by name. Unless a significant parameter, avoid introducing a variable which is then only used once.
• Code with style: use meaningful names in the context of the problem; generally avoid multiple statements on a line; use good indentation and blank spaces; avoid long lines.
• Include clean graphics in your document: do not use a bit-image format; the information must be a good size; label axes; include informative caption or title (not both).
• Vectorise code where feasible: prefer to code at the level of vectors and matrices.
• Use for-loops only where necessary; prefer vectorisation.

• Use if-statements only when necessary; prefer min(), max() and friends.


2 Polynomial interpolation

2.1 Interpolation

Interpolation is what Artificial Intelligence, Machine Learning, Voice/Image Recognition (AI etc.), all do. Here we do the simplest case of a moderate amount of known data, as a function of one variable. AI etc. have lots of data, noisy/inaccurate/conflicting, as a function of thousands/millions of variables. Section 5 (splines) takes the next step towards these by exploring data that is a function of several variables.

[Photo: a 5 Mbyte hard drive, 1956]

Interpolation Suppose we only know the values of a function f(x) at a certain set of predetermined points $(x_0, f_0), (x_1, f_1)$ through to $(x_{N-1}, f_{N-1})$, where $f_j = f(x_j)$ and $x_0 < x_1 < \cdots < x_{N-1}$.

The aim of interpolation is to construct a curve that
• passes precisely through the data points, and
• approximates the underlying function between the data points (known in AI etc. as 'generalisation').

Applications of interpolation include climatology, finance, biomechanics and robotics (Quarteroni et al. 2014, §3.1). Interpolation is also the basis for numerical integration and differentiation.


[Figures: a function f(x) sampled at data points $(x_0, f_0), (x_1, f_1), \ldots, (x_{N-1}, f_{N-1})$, and an interpolant y(x) that passes through every data point, $y(x_j) = f_j$, while approximating f(x) between them.]

2.2 Interpolation using polynomials

Let's start with polynomial interpolation. AI etc. construct a so-called 'neural network', which sounds impressive but is really just a complicated composition of many functions that look like tanh x.

Polynomial interpolation

Example 2.1. Find the polynomial $p_2(x) = a_0 + a_1 x + a_2 x^2$ that interpolates through the points (0, 0), (1, 1) and (2, 0).
The polynomial must pass through each point, hence
$$p_2(0) = a_0 = 0, \qquad p_2(1) = a_0 + a_1 + a_2 = 1, \qquad p_2(2) = a_0 + 2a_1 + 4a_2 = 0.$$
The polynomial is $p_2(x) = 2x - x^2$.


A general polynomial of degree n is
$$p_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i,$$
where $a_0, \ldots, a_n$ are n + 1 coefficients (Quarteroni et al. 2014, §3.3).

For $p_n$ to interpolate the n + 1 points $(x_0, f_0), (x_1, f_1), \ldots, (x_n, f_n)$ we require
$$p_n(x_j) = \sum_{i=0}^{n} a_i x_j^i = f_j, \qquad j = 0, \ldots, n.$$

The Vandermonde system Write this as the matrix-vector linear system
$$\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_n \end{bmatrix}.$$

This matrix is an example of a Vandermonde matrix. It is invertible/non-singular provided that $x_0, x_1, \ldots, x_n$ are distinct. In that case, the system may be solved for the unknown vector of coefficients $(a_0, a_1, \ldots, a_n)$.
Once the coefficients are known, the interpolating polynomial $p_n(x)$ may be evaluated for any x.
However, the Vandermonde matrix is often ill-conditioned (more about this in Section 6), meaning that the calculated coefficients are often inaccurate.
Furthermore, solving the linear system numerically requires about $n^3$ arithmetic operations. More efficient and robust methods are available.
In Matlab, use the function polyfit to find the coefficients $(a_0, a_1, \ldots, a_n)$ using a Vandermonde matrix, and then use polyval to evaluate the polynomial using these coefficients.
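
For instance, a minimal sketch using the data of Example 2.1 (note that polyfit/polyval order the coefficients from the highest degree down):

xj = [0 1 2]; fj = [0 1 0];
a = polyfit(xj, fj, 2)    % a = [-1 2 0], that is, p2(x) = -x^2 + 2x
x = linspace(0, 2);
y = polyval(a, x);        % evaluate p2(x) at many points
plot(x, y, xj, fj, 'o')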


2.3 Lagrange form of the interpolation polynomial

Interpolation polynomials can just be written down, without having to solve the Vandermonde system (Kreyszig 2011, §19.3).

Example 2.2. Write down the linear polynomial that interpolates $(x_0, f_0)$ and $(x_1, f_1)$.
The linear interpolant is
$$p_1(x) = f_0 \underbrace{\frac{x - x_1}{x_0 - x_1}}_{L_0(x)} + f_1 \underbrace{\frac{x - x_0}{x_1 - x_0}}_{L_1(x)}.$$
Notice that
• $L_0(x_0) = 1$, $L_1(x_0) = 0 \implies p_1(x_0) = f_0$, and
• $L_0(x_1) = 0$, $L_1(x_1) = 1 \implies p_1(x_1) = f_1$.

Example 2.3. Write down the Lagrange form of the interpolation polynomial that passes through the points (0, 0), (1, 1) and (2, 0).
Three quadratic polynomials that are equal to unity at one of the data points and zero at the others, respectively, are
$$L_0 = \frac{(x-1)(x-2)}{2}, \qquad L_1 = \frac{x(x-2)}{-1}, \qquad L_2 = \frac{x(x-1)}{2}.$$
The quadratic interpolating polynomial is
$$p_2(x) = 0 \cdot \frac{(x-1)(x-2)}{2} + 1 \cdot \frac{x(x-2)}{-1} + 0 \cdot \frac{x(x-1)}{2}.$$
The Lagrange form of the interpolation polynomial that passes through the n + 1 points $(x_0, f_0), (x_1, f_1), \ldots, (x_n, f_n)$ is
$$p_n(x) = \sum_{j=0}^{n} f_j L_j(x), \qquad L_j(x) = \prod_{\substack{i=0 \\ i \ne j}}^{n} \frac{x - x_i}{x_j - x_i},$$
where n is the degree of the polynomial (Quarteroni et al. 2014, §3.3).
Although different representations of the polynomial are possible, the interpolation polynomial of degree n that passes through n + 1 distinct points is unique.
The functions $L_j(x)$ satisfy the point properties
$$L_j(x_k) = \begin{cases} 0 & k \ne j, \\ 1 & k = j. \end{cases}$$
The Lagrange form of the interpolation polynomial can just be written down, so it does not take any computer time to find the polynomial. However, it takes about $2n^2$ arithmetic operations to evaluate. Section 3 uses it to derive formulae to integrate and differentiate discrete data.
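
A minimal sketch of evaluating the Lagrange form directly (lagrangeEval is our illustrative name, not a builtin):

function p = lagrangeEval(x, xj, fj)
% Evaluate the Lagrange interpolation polynomial through the
% points (xj(k), fj(k)) at each element of the vector x.
p = zeros(size(x));
for j = 1:length(xj)
    L = ones(size(x));               % build L_j(x) as a product
    for i = [1:j-1, j+1:length(xj)]
        L = L.*(x - xj(i))/(xj(j) - xj(i));
    end
    p = p + fj(j)*L;                 % accumulate f_j L_j(x)
end
end

For the data of Example 2.3, lagrangeEval(1.5, [0 1 2], [0 1 0]) returns 0.75, matching $p_2(1.5) = 2(1.5) - 1.5^2$.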


[Figure 1: $y = e^x$ and the linear interpolant $p_1(x) = 1 - x + ex$, plotted for $-1 \le x \le 1$.]

2.4 Error in polynomial interpolation

Error in polynomial interpolation
Theorem 2.4. Suppose f(x) has n + 1 continuous derivatives on the smallest interval I that contains $\{x, x_0, \ldots, x_n\}$. Then the error of the polynomial interpolant is
$$\epsilon_n(x) = f(x) - p_n(x) = \frac{f^{(n+1)}(t)}{(n+1)!} \prod_{j=0}^{n} (x - x_j)$$
for some $t \in I$ (t depends upon x) (Quarteroni et al. 2014, Prop. 3.2) (Kreyszig 2011, §19.3, Thm. 1).

Error in linear interpolation

Example 2.5. Consider linear interpolation of $f(x) = e^x$ through $x_0 = 0$ and $x_1 = 1$. The interpolant is $p_1(x) = (1 - x) + ex$ (Figure 1). Using Theorem 2.4, for some t (varying with x) the error is
$$\epsilon_1(x) = \frac{e^t}{2} x(x - 1).$$
When $x \in [0, 1]$, then $t \in [0, 1] = I$, and on this interval $e^t \le e$, hence
$$|\epsilon_1(x)| = \frac{e^t}{2} |x(x - 1)| \le \frac{e}{2} |x(x - 1)| \le \frac{e}{8}.$$
The last inequality comes from the max of $|x(x - 1)|$ at $x = \frac{1}{2}$ of $\frac{1}{4}$.
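
A quick numerical check of this bound (our own sketch, not from the notes):

x = linspace(0, 1, 1000);
p1 = (1 - x) + exp(1)*x;          % interpolant through (0,1) and (1,e)
maxErr = max(abs(exp(x) - p1))    % about 0.212, below the bound e/8 = 0.3399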
2.5 Piecewise polynomial interpolation
Avoid high-order polynomial interpolation The Lagrange form of the interpolation polynomial that passes through n + 1 points $(x_0, f_0), (x_1, f_1), \ldots, (x_n, f_n)$ is
$$p_n(x) = \sum_{j=0}^{n} f_j L_j(x), \qquad L_j(x) = \prod_{\substack{i=0 \\ i \ne j}}^{n} \frac{x - x_i}{x_j - x_i},$$
where n is the degree of the polynomial.


High-order polynomials may not be a good idea Polynomial interpolant of $f(x) = 1/(1 + 25x^2)$ using uniformly spaced data points (Quarteroni et al. 2014, §3.3.3).

[Figure: polynomial interpolant of $f(x) = 1/(1 + 25x^2)$ through uniformly spaced points on $[-1, 1]$.]

Unless appropriate data points are used Polynomial interpolant of $f(x) = 1/(1 + 25x^2)$ using nonuniformly spaced data points (Quarteroni et al. 2014, §3.3.3). Challenge: think how the theorem guarantees that the error is small with this sort of non-uniformity.

[Figure: polynomial interpolant of $f(x) = 1/(1 + 25x^2)$ through nonuniformly spaced points on $[-1, 1]$.]

Piecewise polynomial interpolation Instead of increasing the degree of the interpolation polynomial to accommodate more data points, we usually divide the interval into subintervals and use a low-degree polynomial to interpolate the data in each subinterval. Rule-of-thumb: avoid order higher than cubic.
In piecewise constant or nearest-neighbour interpolation, the value of the interpolant at x is simply the value $f_j$ at the nearest point $x_j$.

Piecewise constant interpolation

[Figures: data points $(x_j, f_j)$ and the piecewise constant (nearest-neighbour) interpolant y(x) compared with f(x).]

Piecewise linear interpolation In piecewise linear interpolation, a linear interpolation polynomial is found for each interval $[x_j, x_{j+1}]$ (Quarteroni et al. 2014, §3.4).
The linear polynomial that interpolates the data $(x_j, f_j)$ and $(x_{j+1}, f_{j+1})$ on the jth interval is
$$p_1(x) = f_j \frac{x - x_{j+1}}{x_j - x_{j+1}} + f_{j+1} \frac{x - x_j}{x_{j+1} - x_j}.$$

[Figures: data points $(x_j, f_j)$ and $(x_{j+1}, f_{j+1})$, the linear interpolant on each interval $[x_j, x_{j+1}]$, and the resulting piecewise linear interpolant y(x) compared with f(x).]

Piecewise quadratic interpolation In piecewise quadratic interpolation, find a quadratic interpolation polynomial for each interval $[x_j, x_{j+2}]$.

[Figures: data points $(x_j, f_j), (x_{j+1}, f_{j+1}), (x_{j+2}, f_{j+2})$, the quadratic interpolant on each interval $[x_j, x_{j+2}]$, and the resulting piecewise quadratic interpolant y(x) compared with f(x).]

Error in piecewise linear interpolation

Example 2.6. Consider linear interpolation of an unknown function f(x) through $x_j$ and $x_{j+1}$. Using Theorem 2.4, the error is
$$\epsilon_1(x) = \frac{f''(t)}{2} (x - x_j)(x - x_{j+1}),$$
where $t \in [x_j, x_{j+1}]$. Suppose the points are equi-spaced and $h = x_{j+1} - x_j$. Then (Quarteroni et al. 2014, Proposition 3.3)
$$|\epsilon_1(x)| \le \frac{h^2}{8} \max |f''|.$$
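
A sketch of our own verifying this $O(h^2)$ bound with the builtin interp1 (which defaults to piecewise linear interpolation); for sin x, $\max|f''| = 1$:

x = linspace(0, pi, 1000);
for n = [10 20 40]
    xj = linspace(0, pi, n);  h = xj(2) - xj(1);
    err = max(abs(sin(x) - interp1(xj, sin(xj), x)));
    fprintf('h = %.4f  max error = %.2e  bound h^2/8 = %.2e\n', h, err, h^2/8)
end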


2.6 Exercise: Piecewise polynomial interpolation

Nearest neighbour interpolation The aim of interpolation is to construct a curve y(x) that passes precisely through a set of data points $(x_1, f_1), (x_2, f_2)$ through to $(x_N, f_N)$.

Example 2.7. Let x be a row vector containing the points at which we want to evaluate the interpolant. Let xj and fj be row vectors containing the known data.
Consider the following data:
>> xj = [ 1 4 6 10 ]
>> fj = [ 3 5 2 0 ]
>> x = [ 2 3 4 5 6 7 ]
Let y be a row vector containing the values of y(x) for each element of x. Write down the elements of y.
Answer:

y = [ 3 5 5 5 2 2 ]

Write down an algorithm for finding the elements of y.
Answer:
1: for each element of x do
2:     find the index j of the closest element of xj
3:     y = fj(j)
4: end for

Write a Matlab implementation of this algorithm.

Scalar implementation
Answer: In a scalar implementation, we loop through each value of x and use the min(abs()) functions to identify the index j of the element in xj that is closest to x(i).
If a is a vector, then:
• aMin = min(a) returns the value of the minimum element in the vector a.
• [aMin,j] = min(a) returns the index j of that element.
for i = 1:length(x)
    [~, j] = min(abs(x(i)-xj));
    y(i) = fj(j);
end
The tilde ~ tells Matlab that the output in this position in the list is not needed.

Vectorised implementation We need to consider all combinations of the distance from every $x_j$ to every x ⟹ use a 2D-array.
Write out the elements of the array
A(j,i) = x(i) - xj(j)
for i=1:length(x) and j=1:length(xj).
Answer:

A =
 1  2  3  4  5  6
-2 -1  0  1  2  3
-4 -3 -2 -1  0  1
-8 -7 -6 -5 -4 -3

How can you use this matrix to find the index of the nearest point in xj to each element of x?
Answer: [~, j] = min(abs(A)) returns a row vector j containing the index of the minimum value along each column of abs(A).

j =
 1  2  2  2  3  3

If A is a matrix, then
• aMin = min(A) returns a row vector containing the minimum element from each column.
• [aMin,i] = min(A) also returns a row vector i containing the index of that element.
What Matlab commands could be used to create this matrix?
The old answer was via meshgrid. Now we use auto-replication: A = x-xj' to get
A =
 1  2  3  4  5  6
-2 -1  0  1  2  3
-4 -3 -2 -1  0  1
-8 -7 -6 -5 -4 -3
Write a fully vectorised function called nearest.m that implements nearest neighbour interpolation. Your function must use the interface:
function y = nearest(x,xj,fj)
where x is a row vector of points at which the interpolant is to be evaluated, xj is a vector containing the data points $x_j$ and fj is a vector containing the data values $f_j$. The output vector y must be the same size as x and contain the values of the interpolant for each element of the vector x.
Answer:

function y = nearest(x,xj,fj)
% documentation: name, date, description of function,
% inputs, outputs
[~,j] = min(abs(x-xj'));
y = fj(j);


Piecewise linear interpolation In piecewise linear interpolation, the interpolant is
$$y(x) = p_1(x) = f_j \frac{x - x_{j+1}}{x_j - x_{j+1}} + f_{j+1} \frac{x - x_j}{x_{j+1} - x_j},$$
where $x_j \le x < x_{j+1}$.

Example 2.8. Let x be a row vector containing the points at which we want to evaluate the interpolant. Let xj and fj be row vectors containing the known data.
Consider the following data:
>> xj = [ 1 4 6 8 ]
>> fj = [ 2 5 3 5 ]
>> x = [ 2 3 4 5 6 7 ]
Let y be a row vector containing the values of y(x) for each element of x. Write down the elements of y.
Answer:

y = [ 3 4 5 4 3 4 ]

Write down an algorithm for finding the elements of y.
Answer:
1: for each element of x do
2:     find the index j such that xj(j) <= x < xj(j+1)
3:     y = fj(j )*(x - xj(j+1))/(xj(j ) - xj(j+1)) ...
         + fj(j+1)*(x - xj(j ))/(xj(j+1) - xj(j ))
4: end for

Write a Matlab implementation of this algorithm.

Scalar implementation
Answer: In a scalar implementation, we loop through each value of x and use the sum() function to identify the index j of the last element in xj satisfying xj(j) <= x(i).
n = length(xj);
for i = 1:length(x)
    j = min(max(1,sum(x(i) >= xj)),n-1);
    y(i) = ( fj(j+1)*(x(i) - xj(j )) ...
        - fj(j )*(x(i) - xj(j+1)) )/(xj(j+1) - xj(j));
end
The min(max(1, ... ),n-1) ensures 1 <= j <= n-1.

Vectorised implementation Write out the elements of the matrix
I(j,i) = x(i) >= xj(j)
for i=1:length(x) and j=1:length(xj).


Answer:

I =
 1 1 1 1 1 1
 0 0 1 1 1 1
 0 0 0 0 1 1
 0 0 0 0 0 0

How can you use this matrix to find the index j such that xj(j) <= x < xj(j+1)?
Answer: j = sum(I) returns a row vector j that is the sum of the elements down the columns of I, which are the required indices.

j =
 1 1 2 2 3 3

What Matlab commands could be used to create this matrix?
One uses auto-replication: I = x >= xj' to get
I =
 1 1 1 1 1 1
 0 0 1 1 1 1
 0 0 0 0 1 1
 0 0 0 0 0 0
Write a fully vectorised function called linearInterp.m that implements piecewise linear interpolation. Your function must use the interface:
function y = linearInterp(x,xj,fj)
where x is a row vector of points at which the interpolant is to be evaluated, xj is a row vector containing the data points $x_j$ and fj is a row vector containing the data values $f_j$. The output vector y must be the same size as x and contain the values of the interpolant for each element of the vector x.
Answer:

function y = linearInterp(x,xj,fj)
% Add appropriate documentation
n = length(xj);
j = sum(x >= xj');
j = min(max(1,j),n-1);
% [~, j] = histc(x,xj);
% j(x < xj(1)) = 1;
% j(x >= xj(n)) = n-1;
y = ( fj(j+1).*(x - xj(j )) ...
    - fj(j ).*(x - xj(j+1)) )./(xj(j+1) - xj(j));


3 Numerical integration
Numerical integration The aim of numerical integration is to obtain approximate values of the definite integral
$$I = \int_a^b f(x)\, dx$$
in areas such as hydraulics, optics, electromagnetism and demography (Quarteroni et al. 2014, §4.1). Numerical integration is needed when:
• it is impossible to calculate the integral analytically—for example, $I = \int_0^2 \exp(\sin x)\, dx$; or
• the function is measured from experiments; or
• the function is computed as a table of values.
One way to derive numerical integration formulae is to approximate the function f(x) using an interpolant, and then integrate the interpolant.
This is accomplished by sampling the function on a discrete set of points, and finding the interpolant that passes through those points. Piecewise polynomials are ideal for this—they are easily integrated!
3.1 Midpoint rule

Midpoint rule

Example 3.1. Consider the integral $I = \int_0^\pi \sin x\, dx = 2$. Approximate $f(x) = \sin x$ using piecewise constant interpolation through three equi-spaced data points:

[Figure: sin x on $[0, \pi]$ and its piecewise constant interpolant]
$$f(x) \approx \begin{cases} \frac{1}{2} & 0 \le x \le \frac{\pi}{3}, \\ 1 & \frac{\pi}{3} \le x \le \frac{2\pi}{3}, \\ \frac{1}{2} & \frac{2\pi}{3} \le x \le \pi. \end{cases}$$

The area under the interpolant is $I \approx \frac{2\pi}{3} = 2.0944$.
More generally, divide the interval [a, b] into n equispaced subintervals of width $h = (b - a)/n$. Let $f_j = f(x_j)$, where $x_j$ is the midpoint of the jth interval, and use piecewise constant interpolation (Quarteroni et al. 2014, §4.3.1).
The integral over the jth subinterval is
$$\int_{x_j - h/2}^{x_j + h/2} f(x)\, dx \approx \int_{x_j - h/2}^{x_j + h/2} f_j\, dx = h f_j,$$


hence the total integral is
$$\int_a^b f(x)\, dx = \sum_{j=1}^{n} \int_{x_j - h/2}^{x_j + h/2} f(x)\, dx \approx h \sum_{j=1}^{n} f_j.$$


Example 3.2. Evaluate $I = \int_0^\pi \sin x\, dx = 2$ using the midpoint rule.
n      I       error
--------------------
3    2.0944   0.0944
6    2.0230   0.0230
12   2.0057   0.0057
24   2.0014   0.0014
48   2.0004   0.0004
--------------------
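
A minimal sketch reproducing a row of this table (our own code, not from the notes):

n = 12;  a = 0;  b = pi;  h = (b - a)/n;
xj = a + h/2 + (0:n-1)*h;   % midpoints of the n subintervals
I = h*sum(sin(xj))          % gives 2.0057, matching the n = 12 row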

Theorem 3.3. For a smooth function f(x) with bounded second derivative $|f''(x)| < M$ for $a \le x \le b$, the error $\epsilon_{mid}$ of the midpoint integration rule is bounded by
$$|\epsilon_{mid}| \le \frac{1}{24} (b - a) M h^2,$$
where h is the grid spacing.
3.2 Lagrange's Remainder Theorem
Lagrange's Remainder Theorem A special case of the Polynomial Interpolation Error Theorem 2.4.
Theorem 3.4. If a function f(x) has (n + 1) derivatives in a neighbourhood of a point x = a, then
$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + \underbrace{\frac{f^{(n+1)}(t)}{(n+1)!}(x - a)^{n+1}}_{\text{remainder}},$$
for some t (depending upon x) between a and x.

Error scaling
Definition 3.5. For any quantity q (such as the error) and any parameter h (such as the size of the subintervals) we write "$q = O(h^p)$ as $h \to 0$", said "q is of order $h^p$", when
$$\lim_{h \to 0} \frac{q}{h^p} = C$$
for some finite constant C (Quarteroni et al. 2014, §1.6). Analogously, we write "$q = O(n^p)$ as $n \to \infty$", said "q is of order $n^p$", when $\lim_{n\to\infty} (q/n^p) = C$. Often "as $h \to 0$" and "as $n \to \infty$" is omitted as implicit from the context.


Example 3.6. For midpoint integration, show that $|\epsilon_{mid}| = O(h^2)$ as $h \to 0$.

3.3 Trapezoidal rule
Trapezoidal rule Divide the interval [a, b] into (n − 1) equispaced subintervals of width $h = (b - a)/(n - 1)$. Then the edges of each subinterval are at $x_j = a + (j - 1)h$ for $j = 1, \ldots, n$. Let $f_j = f(x_j)$, and use piecewise linear interpolation (Quarteroni et al. 2014, §4.3.2) (Kreyszig 2011, §19.5).
The integral over the jth subinterval is
$$\int_{x_j}^{x_{j+1}} f(x)\, dx \approx \frac{h}{2} (f_j + f_{j+1}).$$
Hence the total integral is
$$\int_a^b f(x)\, dx \approx h \left( \frac{f_1}{2} + \sum_{j=2}^{n-1} f_j + \frac{f_n}{2} \right).$$

Theorem 3.7. For a smooth function f(x) with bounded second derivative $|f''(x)| \le M$ for $a \le x \le b$, the error of the trapezoidal integration rule is
$$|\epsilon_{trap}| \le \frac{1}{12} (b - a) M h^2,$$
where h is the grid spacing (Kreyszig 2011, §19.5).
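
Matlab's builtin trapz implements exactly this composite rule; a one-line check of our own:

x = linspace(0, pi, 49);   % 48 subintervals, h = pi/48
I = trapz(x, sin(x))       % approximately 2, with error O(h^2)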
3.4 Simpson's rule
Simpson's rule Divide the interval [a, b] into (n − 1) equispaced subintervals of width $h = (b - a)/(n - 1)$, where n is odd. Let $f_j = f(x_j)$, where $x_j = a + (j - 1)h$ for $j = 1, \ldots, n$, and consider the area under the piecewise quadratic interpolant (Quarteroni et al. 2014, §4.3.3) (Kreyszig 2011, §19.5).
The integral over each piecewise quadratic is
$$\int_{x_{j-1}}^{x_{j+1}} f(x)\, dx \approx \cdots = \frac{h}{3} (f_{j-1} + 4f_j + f_{j+1}).$$

The total integral is
$$\int_a^b f(x)\, dx = \sum_{\substack{j=2 \\ j\ \text{even}}}^{n-1} \int_{x_{j-1}}^{x_{j+1}} f(x)\, dx \approx \frac{h}{3} \sum_{\substack{j=2 \\ j\ \text{even}}}^{n-1} (f_{j-1} + 4f_j + f_{j+1}) = \frac{h}{3} \Bigl( f_1 + 4 \sum_{\substack{j=2 \\ j\ \text{even}}}^{n-1} f_j + 2 \sum_{\substack{j=3 \\ j\ \text{odd}}}^{n-2} f_j + f_n \Bigr)$$


Theorem 3.8. For a smooth function f(x) with bounded fourth derivative $|f^{(iv)}(x)| < M$ for $a \le x \le b$, the error of Simpson's integration rule is
$$|\epsilon_{Simp}| \le \frac{1}{180} (b - a) M h^4,$$
where h is the grid spacing (Kreyszig 2011, §19.5).
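
A minimal sketch of Simpson's rule on $\int_0^\pi \sin x\, dx$ (our own code, following the sums above):

n = 9;                      % the number of points n must be odd
x = linspace(0, pi, n);  h = x(2) - x(1);
f = sin(x);
I = h/3*( f(1) + 4*sum(f(2:2:n-1)) + 2*sum(f(3:2:n-2)) + f(n) )
% gives 2.00027, in error by about 2.7e-4, consistent with the h^4 bound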

4 Numerical differentiation
Numerical differentiation The goal of numerical differentiation is to obtain approximate values of the derivatives $f'(x), f''(x), \ldots, f^{(n)}(x)$ of a given function f(x).
Numerical derivatives are needed when:
• the function is computed as a table of values; or
• we want to fit trends to experimental data; or
• numerically solving ordinary and partial differential equations.
One way to derive numerical differentiation formulae is to approximate the function f(x) using an interpolant, and then differentiate the interpolant.
As with integration, this is facilitated by sampling the function on a discrete set of points (or grid), and finding the interpolant that passes through those points.
Taylor series can also be used to derive and analyse numerical differentiation formulae.
4.1 Differentiation using polynomial interpolation

Differentiation using linear interpolation

Example 4.1. Let $f_j = f(x_j)$. Find the linear polynomial that passes through $(x_j, f_j)$ and $(x_{j+1}, f_{j+1})$ and differentiate it to obtain an approximation of $f'(x)$.
The linear polynomial is
$$p_1(x) = f_j \frac{x - x_{j+1}}{x_j - x_{j+1}} + f_{j+1} \frac{x - x_j}{x_{j+1} - x_j}.$$
The derivative is
$$f'(x) \approx p_1'(x) = \frac{f_{j+1} - f_j}{x_{j+1} - x_j}, \qquad x_j \le x \le x_{j+1}.$$

Error analysis


Example 4.2. What is the error of the first-derivative approximation for $f_j' = f'(x_j)$ and $f_{j+1}' = f'(x_{j+1})$ from Example 4.1?
The error is $\epsilon(x) = f'(x) - p_1'(x) = \epsilon_1'(x)$, where $\epsilon_1(x)$ is the error given by the Polynomial Interpolation Error Theorem 2.4. Hence the error at the grid points is
$$|\epsilon(x_j)|, |\epsilon(x_{j+1})| = \frac{1}{2} |f''(t)| h \le \frac{1}{2} M h = O(h) \text{ as } h \to 0,$$
where $h = x_{j+1} - x_j$ and $|f''(t)| \le M$.
For example, use this formula to differentiate $f(x) = e^x \sin x$ at $x_j = 0$ and $x_{j+1} = h$, and loglog-plot the error for various h. The blue crosses in this plot show that the error decreases by an order of magnitude as h decreases by an order of magnitude: hence the error is $O(h^1)$; we turn to the red circles next.

[Figure: loglog plot of abs-error versus h, comparing the one-sided (crosses) and centred (circles) difference approximations.]

% error in differentiating this function at 0
f = @(x) sin(x).*exp(x)
df0 = cos(0)*exp(0)+sin(0)*exp(0);
h = 0.5.^(1:9);
df1 = (f(h)-f(0))./h;
df2 = (f(h)-f(-h))./(2*h);
loglog(h,abs(df1-df0),'x',h,abs(df2-df0),'o')
grid on, xlabel('h'), ylabel('abs-error')
legend('one-sided','centred' ...
    ,'location','northwest')
matlab2tikz('diffError.ltx')
4.2 Differentiation using Taylor's theorem

Differentiation using Taylor's theorem

Example 4.3. Let $f_j = f(x_j)$, where $x_j$ are equally spaced with grid spacing h. Use Taylor's theorem to find an expression for $f_j' = f'(x_j)$ in terms of discrete data $(x_{j-1}, f_{j-1})$, $(x_j, f_j)$ and $(x_{j+1}, f_{j+1})$.
Neglecting "⋯", terms of $O(h^2)$, the derivative
$$f_j' \approx \frac{f_{j+1} - f_{j-1}}{2h}.$$

Error analysis
Example 4.4. What is the error of the approximate derivative in Example 4.3? The error is defined as
$$\epsilon_j = f_j' - \frac{f_{j+1} - f_{j-1}}{2h}.$$
Using Lagrange's Remainder Theorem 3.4, the error satisfies
$$|\epsilon_j| \le \frac{1}{6} M h^2 = O(h^2) \text{ as } h \to 0,$$
where $h = x_{j+1} - x_j$ and $|f'''| \le M$.
4.3 Finite difference formulae

Differentiation using quadratic interpolation

Example 4.5. Let $f_j = f(x_j)$, where $x_j$ are equally spaced with constant grid spacing h. Find the quadratic polynomial that passes through $(x_{j-1}, f_{j-1})$, $(x_j, f_j)$ and $(x_{j+1}, f_{j+1})$ and differentiate it to obtain approximations for $f'(x)$ and $f''(x)$.
The derivatives are
$$f'(x) \approx p_2'(x) = f_{j-1} \frac{2x - x_j - x_{j+1}}{2h^2} - f_j \frac{2x - x_{j-1} - x_{j+1}}{h^2} + f_{j+1} \frac{2x - x_{j-1} - x_j}{2h^2};$$
$$f''(x) \approx p_2''(x) = \frac{f_{j-1} - 2f_j + f_{j+1}}{h^2}.$$

Finite difference formulae Evaluating the derivative of the quadratic interpolant at the data points yields the finite difference formulae (Kreyszig 2011, §19.5)
$$f_{j-1}' \approx p_2'(x_{j-1}) = \cdots = \frac{-f_{j+1} + 4f_j - 3f_{j-1}}{2h},$$
$$f_j' \approx p_2'(x_j) = \cdots = \frac{f_{j+1} - f_{j-1}}{2h},$$
$$f_{j+1}' \approx p_2'(x_{j+1}) = \cdots = \frac{3f_{j+1} - 4f_j + f_{j-1}}{2h},$$
$$f_j'' \approx p_2''(x_j) = \frac{f_{j-1} - 2f_j + f_{j+1}}{h^2}.$$
Taylor's theorem also derives these formulae.
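
A quick sketch of our own checking two of these formulae on $f(x) = e^x$ at $x_j = 1$, where both errors should be $O(h^2)$:

f = @(x) exp(x);  h = 1e-3;  xj = 1;
d1 = (f(xj+h) - f(xj-h))/(2*h);          % centred estimate of f'(x_j)
d2 = (f(xj-h) - 2*f(xj) + f(xj+h))/h^2;  % estimate of f''(x_j)
[d1 - exp(1), d2 - exp(1)]               % errors about 4.5e-7 and 2.3e-7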

5 Splines

[Image: a spline]

5.1 Linear splines

Linear splines

Example 5.1. Consider piecewise linear interpolation through (0, 1), (1, 0), (2, 0) and (3, 2). The interpolant is
$$y(x) = \begin{cases} 1 - x & 0 \le x \le 1, \\ 0 & 1 \le x \le 2, \\ 2(x - 2) & 2 \le x \le 3. \end{cases}$$
[Figure: the piecewise linear interpolant y(x) through the four data points]

We now write this globally on the interval $0 \le x \le 3$ as
$$y(x) = -\frac{3}{2} + \frac{1}{2} x + \frac{1}{2} |x - 1| + |x - 2|.$$

Let's find such a global interpolant to the data directly. Let
$$y(x) = a + bx + c_2 |x - 1| + c_3 |x - 2|,$$
where a, b, $c_2$ and $c_3$ are unknown coefficients. The interpolant must pass through each point, hence
$$y(0) = a + c_2 + 2c_3 = 1,$$
$$y(1) = a + b + c_3 = 0,$$
$$y(2) = a + 2b + c_2 = 0,$$
$$y(3) = a + 3b + 2c_2 + c_3 = 2.$$
The coefficients are $a = -\frac{3}{2}$, $b = \frac{1}{2}$, $c_2 = \frac{1}{2}$ and $c_3 = 1$.


General linear spline Suppose we wish to interpolate N points $(x_1, f_1), (x_2, f_2), \ldots, (x_N, f_N)$. Let
$$y(x) = a + bx + \sum_{j=2}^{N-1} c_j |x - x_j|,$$
where a, b and $c_j$ are coefficients.

For y(x) to interpolate the N points we require
$$y(x_i) = a + bx_i + \sum_{j=2}^{N-1} c_j |x_i - x_j| = f_i, \qquad i = 1, \ldots, N.$$

Write as the linear matrix-vector system
$$\begin{bmatrix} 1 & x_1 & |x_1 - x_2| & \cdots & |x_1 - x_{N-1}| \\ 1 & x_2 & |x_2 - x_2| & \cdots & |x_2 - x_{N-1}| \\ 1 & x_3 & |x_3 - x_2| & \cdots & |x_3 - x_{N-1}| \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & |x_N - x_2| & \cdots & |x_N - x_{N-1}| \end{bmatrix} \begin{bmatrix} a \\ b \\ c_2 \\ \vdots \\ c_{N-1} \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_N \end{bmatrix}$$
which can be solved for the unknown vector of coefficients $(a, b, c_2, \ldots, c_{N-1})$. [9] Once the coefficients are known, y(x) may be evaluated for any value of x.

Matlab implementation Given xj is a column vector, we construct the matrix
$$\begin{bmatrix} 1 & x_1 & |x_1 - x_2| & \cdots & |x_1 - x_{N-1}| \\ 1 & x_2 & |x_2 - x_2| & \cdots & |x_2 - x_{N-1}| \\ 1 & x_3 & |x_3 - x_2| & \cdots & |x_3 - x_{N-1}| \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & |x_N - x_2| & \cdots & |x_N - x_{N-1}| \end{bmatrix}$$
using the code

N = length(xj);
A = [ones(N,1) xj abs(xj - xj(2:N-1)')];

Matlab solution of linear systems If A is an invertible n × n matrix, x is an n × 1 column vector of unknowns and b is an n × 1 column vector, then the solution of the linear system A*x = b is computed via the backslash operator:
x = A\b

[9] In mathematics, we use a list in parentheses to denote the corresponding column vector: e.g., $(a, b) = \begin{bmatrix} a \\ b \end{bmatrix}$.


Example 5.2 (solve a 3 × 3 system).
>> A = [1 2 1; 1 2 2; 2 0 1]; b = [1 2 3]';
>> x = A\b
x =
    1.0000
   -0.5000
    1.0000
>> A*x
ans =
    1
    2
    3

Matlab implementation Assume that x and fj are column vectors. To find the vector of coefficients, we solve the linear system using
abc = A\fj;
abc is the column vector of coefficients. In particular, abc(1) = a, abc(2) = b and abc(3:N) = $(c_2, \ldots, c_{N-1})$.
Evaluate the interpolant using
y = abc(1) + abc(2)*x + abs(x - xj(2:N-1)')*abc(3:N);
Saved as linearspline.m on MyUni.

5.2 Cubic splines

Cubic splines Linear splines have discontinuities in the derivative. Piecewise Lagrange polynomials also usually have discontinuous derivatives. Cubic splines enforce continuity of the first and second derivatives (Quarteroni et al. 2014, §3.5) (Kreyszig 2011, §19.4).
The only change here is $|x - x_j| \mapsto |x - x_j|^3$.
The method specifies that the interpolant has 'jumps' only in the third derivative, and only at the data points. For some coefficients a, b and $c_j$, the cubic spline interpolant is
$$y(x) = a + bx + \sum_{j=2}^{N-1} c_j |x - x_j|^3.$$


Example 5.3. Consider cubic spline interpolation through (0, 1), (1, 0), (2, 0) and (3, 2). The interpolant is
$$y(x) = a + bx + c_2 |x - 1|^3 + c_3 |x - 2|^3.$$
[Figure: the cubic spline y(x) through the four data points]

The interpolant must pass through each point, hence
$$y(0) = a + c_2 + 8c_3 = 1,$$
$$y(1) = a + b + c_3 = 0,$$
$$y(2) = a + 2b + c_2 = 0,$$
$$y(3) = a + 3b + 8c_2 + c_3 = 2.$$
The coefficients are $a = \frac{3}{16}$, $b = -\frac{1}{4}$, $c_2 = \frac{5}{16}$ and $c_3 = \frac{1}{16}$.

Differentiating cubic splines
$$\frac{d}{dx} |x - x_j| = \operatorname{sign}(x - x_j), \quad x \ne x_j,$$
$$\frac{d}{dx} \operatorname{sign}(x - x_j) = 0, \quad x \ne x_j,$$
where $\operatorname{sign}(x) = \begin{cases} 1 & x > 0, \\ -1 & x < 0, \end{cases}$ and $|x - x_j| = (x - x_j) \operatorname{sign}(x - x_j)$.


Example 5.4. The cubic spline obtained in Example 5.3 is
$$y(x) = \frac{3}{16} - \frac{1}{4} x + \frac{5}{16} |x - 1|^3 + \frac{1}{16} |x - 2|^3.$$
The derivatives are
$$y'(x) = -\frac{1}{4} + \frac{15}{16} \operatorname{sign}(x - 1)(x - 1)^2 + \frac{3}{16} \operatorname{sign}(x - 2)(x - 2)^2,$$
$$y''(x) = \frac{15}{8} |x - 1| + \frac{3}{8} |x - 2|.$$
[Figure: the data $f_j$, the spline y(x), and its continuous derivatives y' and y''.]

Differentiating the cubic spline interpolant
$$y(x) = a + bx + \sum_{j=2}^{N-1} c_j |x - x_j|^3$$
yields the everywhere continuous derivatives
$$y'(x) = b + 3 \sum_{j=2}^{N-1} c_j \operatorname{sign}(x - x_j)(x - x_j)^2,$$
$$y''(x) = 6 \sum_{j=2}^{N-1} c_j |x - x_j|.$$
Implementation saved as cubicspline.m on MyUni.
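
A minimal sketch of the whole construction (our own code, not the MyUni cubicspline.m), fitting and plotting the cubic spline of Example 5.3:

xj = [0 1 2 3]';  fj = [1 0 0 2]';           % the data of Example 5.3
N = length(xj);
A = [ones(N,1) xj abs(xj - xj(2:N-1)').^3];  % basis 1, x, |x - x_j|^3
abc = A\fj;                                  % (a, b, c_2, ..., c_{N-1})
x = linspace(0, 3, 101)';
y = abc(1) + abc(2)*x + abs(x - xj(2:N-1)').^3*abc(3:N);
plot(x, y, xj, fj, 'o')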


5.3 Two-dimensional splines
Splines in two dimensions To interpolate N points $(x_1, y_1, f_1), (x_2, y_2, f_2), \ldots, (x_N, y_N, f_N)$, where $f_j = f(x_j, y_j)$.
Generalising $|x - x_j|$, a global interpolant in two dimensions is
$$u(x, y) = a + b_1 x + b_2 y + \sum_{j=1}^{N} c_j r_j(x, y),$$
where $r_j(x, y) = \sqrt{(x - x_j)^2 + (y - y_j)^2}$ for $j = 1, \ldots, N$, and $\{a, b_1, b_2, c_1, \ldots, c_N\}$ are N + 3 coefficients.
The basis functions are 1, x, y and $r_j$.


Radial basis functions Seek to approximate a surface as a sum
of these, but all based at different points, and with different strengths.

[Figure: the cone r = √(x² + y²) plotted over −1 ≤ x, y ≤ 1.]

Splines in two dimensions For u(x, y) to interpolate the N
points we require

u(xi , yi ) = fi ,   i = 1, . . . , N,

⇐⇒  a + b1 xi + b2 yi + ∑_{j=1}^{N} cj √[(xi − xj )² + (yi − yj )²] = fi .

But we have N + 3 unknowns, so we need three additional equations.
The three additional equations we choose are

∑_{j=1}^{N} cj = 0,   ∑_{j=1}^{N} cj xj = 0,   ∑_{j=1}^{N} cj yj = 0.
j=1 j=1 j=1
With ri,j = √[(xi − xj )² + (yi − yj )²] these yield the symmetric
matrix in the system

[ 0  0   0   1     1     · · ·  1    ] [ a  ]   [ 0  ]
[ 0  0   0   x1    x2    · · ·  xN   ] [ b1 ]   [ 0  ]
[ 0  0   0   y1    y2    · · ·  yN   ] [ b2 ]   [ 0  ]
[ 1  x1  y1  r1,1  r1,2  · · ·  r1,N ] [ c1 ] = [ f1 ]
[ ⋮  ⋮   ⋮   ⋮     ⋮            ⋮    ] [ ⋮  ]   [ ⋮  ]
[ 1  xN  yN  rN,1  rN,2  · · ·  rN,N ] [ cN ]   [ fN ]

Matlab implementation Assuming xj, yj and fj are column
vectors of length N, we construct the matrix

A = [ 0  0   0   1     1     · · ·  1
      0  0   0   x1    x2    · · ·  xN
      0  0   0   y1    y2    · · ·  yN
      1  x1  y1  r1,1  r1,2  · · ·  r1,N
      ⋮  ⋮   ⋮   ⋮     ⋮            ⋮
      1  xN  yN  rN,1  rN,2  · · ·  rN,N ]


using the code

dist = sqrt((xj-xj').^2 + (yj-yj').^2);
A = [ zeros(3,3) [ones(N,1) xj yj]'
      ones(N,1) xj yj dist ];
Find the vector of coefficients by solving the linear system: use
abc = A\[zeros(3,1); fj];
abc is the column vector of coefficients. In particular, abc(1) = a,
abc(2) = b1 , abc(3) = b2 and abc(4:N+3) = (c1 , . . . , cN ).
Recall the interpolant is in terms of rj (x, y) = √[(x − xj )² + (y − yj )²],
and is

u(x, y) = a + b1 x + b2 y + c1 r1 (x, y) + · · · + cN rN (x, y),

that is,

u(x, y) = [ 1  x  y  r1 (x, y)  · · ·  rN (x, y) ] abc ,

where abc = (a, b1 , b2 , c1 , . . . , cN ) is the column vector of coefficients.
Replicate such a row vector for all the (x, y) you want.
That is, evaluate the interpolant using
dist = sqrt((x-xj').^2 + (y-yj').^2);
u = abc(1) + abc(2)*x + abc(3)*y + dist*abc(4:N+3);
where x and y are equal-length vectors containing the coordinates
of the points at which the interpolant is to be evaluated.
Saved as rbfspline2d.m on MyUni.

Radial basis functions In general, a 2D spline is

u(x, y) = a + b1 x + b2 y + ∑_{j=1}^{N} cj φ(rj ).

The basis functions are 1, x, y and φ(rj ), where previously

φ(rj ) = rj = √[(x − xj )² + (y − yj )²],   j = 1, . . . , N.

This is a simple example of a radial basis function.
For smoother interpolation, try

φ(rj ) = rj³               cubic spline,
φ(rj ) = rj² log rj        'thin plate' spline,
φ(rj ) = exp(−rj²/ℓ²)      diffusion kernel.

Read Fornberg & Flyer (2015).
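For example, a hedged sketch of swapping the thin-plate basis into
the two-dimensional code above (the guard against log 0 is my own
implementation choice):

% sketch: thin-plate spline, phi(r) = r^2 log r, with phi(0) = 0
phi = @(r) r.^2 .* log(r + (r==0));   % (r==0) guard avoids log(0)
dist = sqrt((xj-xj').^2 + (yj-yj').^2);
A = [ zeros(3,3) [ones(N,1) xj yj]'
      ones(N,1) xj yj phi(dist) ];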


5.4 Multidimensional splines


Classification challenge: Can we identify which cultivar (first
column) a wine comes from (upon measuring thirteen characteristics,
d = 13, given N = 178 rows/points)? http://archive.ics.uci.
edu/ml/datasets/Wine

Multidimensional splines Now we must interpolate in d dimen-
sions! Let x be a d-dimensional vector, where each row in the data is
mathematically a column vector xi . We aim to interpolate N points
(x1 , f1 ), (x2 , f2 ), . . . , (xN , fN ), where fj = f (xj ).
A global interpolant in d dimensions is

u(x) = a + b · x + ∑_{j=1}^{N} cj ‖x − xj ‖,


where we have 1 + d + N coefficients: a scalar a, a d-dimensional
vector b, and N coefficients c1 , . . . , cN .
For u(x) to interpolate the N points we require

u(xi ) = fi ,   i = 1, . . . , N,

⇐⇒  a + b · xi + ∑_{j=1}^{N} cj ‖xi − xj ‖ = fi .

But we have 1 + d + N unknown coefficients, so we need 1 + d
additional equations.
The 1 + d additional equations we choose are

∑_{j=1}^{N} cj = 0   and   ∑_{j=1}^{N} cj xj = 0.

The second of these is a vector equation, so is equivalent to d scalar
equations.
Setting ri,j = ‖xi − xj ‖, the resulting linear system is

[ 0  0ᵀ   1     1     · · ·  1    ] [ a  ]   [ 0  ]
[ 0  0    x1    x2    · · ·  xN   ] [ b  ]   [ 0  ]
[ 1  x1ᵀ  r1,1  r1,2  · · ·  r1,N ] [ c1 ] = [ f1 ]
[ ⋮  ⋮    ⋮     ⋮            ⋮    ] [ ⋮  ]   [ ⋮  ]
[ 1  xNᵀ  rN,1  rN,2  · · ·  rN,N ] [ cN ]   [ fN ]

Matlab implementation Let xj be an N × d matrix such that
xj(j,:) = xjᵀ for j = 1, . . . , N .
In d dimensions, the squared distance

ri,j² = ‖xi − xj ‖² = ∑_{k=1}^{d} (xi,k − xj,k )² ,

where xi,k is the kth component of xi . Here there are three-fold
combinations: all k, all i and all j. We want to vectorise over (at
least) two of them; choose the two of largest length to get the most
computational benefit.
Calculate interpoint distances via auto-replication using
[N, d] = size(xj);
dist2 = zeros(N,N);
for k=1:d
    dist2 = dist2 + (xj(:,k) - xj(:,k)').^2;
end


Thus we construct the matrix

A = [ 0  0ᵀ   1     1     · · ·  1
      0  0    x1    x2    · · ·  xN
      1  x1ᵀ  r1,1  r1,2  · · ·  r1,N
      ⋮  ⋮    ⋮     ⋮            ⋮
      1  xNᵀ  rN,1  rN,2  · · ·  rN,N ]

using
A = [zeros(d+1,d+1) [ones(N,1) xj]'
     ones(N,1) xj sqrt(dist2) ];
Similar to before, we solve with
abc = A\[zeros(d+1,1); fj];
When x has the same structure as xj, evaluate the interpolant using
dist2 = zeros(size(x,1),N);
for k=1:d
    dist2 = dist2 + (x(:,k) - xj(:,k)').^2;
end
u = abc(1) + x*abc(2:d+1) + sqrt(dist2)*abc(d+2:N+d+1);


6 Numerical linear algebra


6.1 LU factorisation
Numerical linear algebra Consider a system of n linear equa-
tions in n unknowns x1 , x2 , . . . , xn . We write such a system in
matrix-vector form:

[ a11  a12  a13  · · ·  a1n ] [ x1 ]   [ b1 ]
[ a21  a22  a23  · · ·  a2n ] [ x2 ]   [ b2 ]
[ a31  a32  a33  · · ·  a3n ] [ x3 ] = [ b3 ]
[ ⋮    ⋮    ⋮    ⋱      ⋮   ] [ ⋮  ]   [ ⋮  ]
[ an1  an2  an3  · · ·  ann ] [ xn ]   [ bn ]

or more succinctly as
Ax = b.
Solving such systems is one of the central tasks in scientific com-
puting (Quarteroni et al. 2014, Ch. 5).
Examples abound including hydraulic networks, spectrometry, eco-
nomics, and capillary networks (Quarteroni et al. 2014, §5.1).

Never invert a matrix Mathematically, the solution of the sys-
tem may be written as x = A⁻¹b where A⁻¹ is the inverse of A.
But calculating the solution using the inverse is both computationally
expensive and badly error prone. Instead we solve the system
Ax = b by factoring A.

Example 6.1. Execute cfInvLU.m


>> cfInvLU
n=5000; A=randn(n,n); b=randn(n,1);
tic; x=A\b; toc
Elapsed time is 1.285061 seconds.
tic; x=inv(A)*b; toc
Elapsed time is 3.321828 seconds.
For this size matrix, solving Ax = b by using the inverse inv(A)
takes three times as long.
More discussion at:
http://blogs.mathworks.com/loren/2007/05/16/purpose-of-inv/
http://www.mathworks.com/help/matlab/ref/inv.html

The 'normal equation' matrix (here XᵀX) is often bad


Example 6.2. Execute neverInvert.m


>> neverInvert
X =
1 1
1e-08 0
0 1e-08
Warning: Matrix is singular to working precision.
> In neverInvert (line 6)
tryInverse =
Inf Inf
Inf Inf
inverseShouldBe =
5e+15 -5e+15
-5e+15 5e+15
But worse are the gross inaccuracies when d = 5e-8 as then there
is no warning.

Never invert a matrix Issues when using A⁻¹:
1. often it is slow; and
2. often it is wrong.

Factorisation is handy!
Solve the equation 42x = 588 by hand in ten seconds!

Gaussian elimination (Kreyszig 2011, §20.1)

Example 6.3. Solve the linear system of equations

[ 1   2   2  ] [ x1 ]   [ −4 ]
[ −1  −4  −6 ] [ x2 ] = [ 4  ] .
[ −3  2   6  ] [ x3 ]   [ 8  ]

Using elementary operations, we obtain the equivalent system

[ 1  2   2  ] [ x1 ]   [ −4 ]
[ 0  −2  −4 ] [ x2 ] = [ 0  ]
[ 0  0   −4 ] [ x3 ]   [ −4 ]

from which the solution is x = (−2, −2, 1).

Gaussian elimination: never code your own Underlying Mat-


lab, Octave, and all good numerical software, are the routines of
the lapack and blas3 libraries.
The lapack and blas3 libraries are the fruits of one of the longest
running public domain software projects. Their routines are de-
signed to carry out matrix-vector computations as quickly as possi-
ble. They take advantage of (Table 1)


• the vector nature of modern cpu chips, and


• the multiple hierarchy of caches between the cpu and your
computer’s memory.
There is no way we could write comparably good code.
Use these libraries by coding, in Matlab etc, in terms of matrices
and vectors. Then Gauss Elimination is known as LU factorisation.

Solution using LU factorisation This is implemented using ma-
trix operations by writing the matrix as the product (Quarteroni
et al. 2014, §5.3) (Kreyszig 2011, §20.2)

[ 1   2   2  ]   [ 1   0   0 ] [ 1  2   2  ]
[ −1  −4  −6 ] = [ −1  1   0 ] [ 0  −2  −4 ] .
[ −3  2   6  ]   [ −3  −4  1 ] [ 0  0   −4 ]

Substituting into the matrix equation, and writing y = (y1 , y2 , y3 )
for the product of the upper triangular factor with x, turns the
system into two triangular solves.

First find y via

[ 1   0   0 ] [ y1 ]   [ −4 ]
[ −1  1   0 ] [ y2 ] = [ 4  ] .
[ −3  −4  1 ] [ y3 ]   [ 8  ]

This features a 'lower triangular' matrix: from first to third rows,

y1 = −4  ↦  y2 = 0  ↦  y3 = −4 .

Second, solve the following for x:

[ 1  2   2  ] [ x1 ]   [ y1 ]   [ −4 ]
[ 0  −2  −4 ] [ x2 ] = [ y2 ] = [ 0  ] .
[ 0  0   −4 ] [ x3 ]   [ y3 ]   [ −4 ]

This features an 'upper triangular' matrix: from third up to first
rows,

x3 = 1  ↦  x2 = −2  ↦  x1 = −2 .

Lower triangular matrices and forward substitution


Definition 6.4. A matrix L is said to be lower triangular if all
the elements above the diagonal are zero; that is, Lij = 0 for
i < j. For example,

[ 1   0   0 ]
[ −1  1   0 ]
[ −3  −4  1 ]

Solve lower triangular systems straightforwardly by forward substi-
tution (Quarteroni et al. 2014, §5.3).

Upper triangular matrices and backward substitution

Definition 6.5. A matrix U is said to be upper triangular if all
the elements below the diagonal are zero; that is, Uij = 0 for
i > j. For example,

[ 1  2   2  ]
[ 0  −2  −4 ]
[ 0  0   −4 ]

Upper triangular systems are easily solved by backward substitution
(Quarteroni et al. 2014, §5.3).

Finding the factors

Example 6.6. Find the LU factorisation of

[ 1   2   2  ]
[ −1  −4  −6 ]
[ −3  2   6  ]

by writing the factors in the form

[ 1    0    0 ] [ U11  U12  U13 ]
[ L21  1    0 ] [ 0    U22  U23 ] .
[ L31  L32  1 ] [ 0    0    U33 ]

Solution using LU factorisation Compute the solution to Ax = b:
• first find the factorisation A = LU ;
• second solve Ly = b;
• third solve U x = y.

Matlab solution using LU factorisation We will not discuss


the details of how the factors L and U are obtained.
[L, U, P] = lu(A) returns lower and upper triangular factors
of P A—for a row permutation matrix P . For an n × n matrix A,
this takes O(n³) arithmetic operations (as n → ∞).
Multiplying by P , we solve P Ax = P b via the factorisation LU x =
P b.


y = L\(P*b) finds the solution of Ly = P b in O(n²) operations.
x = U\y then finds the solution of U x = y in O(n²) operations. The
vector x then contains the solution of the original matrix equation.
If it is subsequently necessary to solve the matrix equation with a
different right hand side b, then reuse the existing factors L and U
and thereby avoid repeating the expensive LU factorisation.

Example 6.7. Execute egSimpleLU.m


>> A = [1 2 2; -1 -4 -6; -3 2 6]
A =
1 2 2
-1 -4 -6
-3 2 6
>> b = [-4; 4; 8]
b =
-4
4
8
>> [L, U, P] = lu(A)
L =
1.0000 0 0
0.3333 1.0000 0
-0.3333 -0.5714 1.0000
U =
-3.0000 2.0000 6.0000
0 -4.6667 -8.0000
0 0 -0.5714
P =
0 0 1
0 1 0
1 0 0
>> y = L\(P*b)
y =
8.0000
1.3333
-0.5714
>> x = U\y
x =
-2.0000
-2.0000
1.0000


Example 6.8. Execute cfLUPfac.m

>> n = 5000; A = randn(n,n); b = randn(n,1);


>> tic; x = A\b; toc
Elapsed time is 1.259202 seconds.

>> tic; [L,U,P] = lu(A); x = U\(L\(P*b)); toc


Elapsed time is 1.306710 seconds.

>> b = randn(n,1); % new RHS vector b


>> tic; x = U\(L\(P*b)); toc
Elapsed time is 0.047264 seconds.

When is this useful? Recall 2D spline coefficients: to fit the
interpolant

u(x, y) = a + b1 x + b2 y + ∑_{j=1}^{N} cj √[(x − xj )² + (y − yj )²] ,

to data (xj , yj , fj ), we must solve

[ 0  0   0   1     1     · · ·  1    ] [ a  ]   [ 0  ]
[ 0  0   0   x1    x2    · · ·  xN   ] [ b1 ]   [ 0  ]
[ 0  0   0   y1    y2    · · ·  yN   ] [ b2 ]   [ 0  ]
[ 1  x1  y1  r1,1  r1,2  · · ·  r1,N ] [ c1 ] = [ f1 ]
[ ⋮  ⋮   ⋮   ⋮     ⋮            ⋮    ] [ ⋮  ]   [ ⋮  ]
[ 1  xN  yN  rN,1  rN,2  · · ·  rN,N ] [ cN ]   [ fN ]

where ri,j = √[(xi − xj )² + (yi − yj )²].

Consider xj , yj fixed, but the fj's changing in time . . .

http://www.bom.gov.au/sa/observations/adelaidemap.shtml
Need to resolve onto ≈ 4 km grid:


6.2 QR factorisation
Exceptionally tricky problems for LU factorisation Sometimes
LU factorisation does not work usefully.

Example 6.9. Consider Ax = b, where

    [ 1   0   0   0   · · ·  1 ]
    [ −1  1   0   0   · · ·  1 ]
    [ −1  −1  1   0   · · ·  1 ]
A = [ −1  −1  −1  1   · · ·  1 ] .
    [ ⋮   ⋮   ⋮   ⋮   ⋱      ⋮ ]
    [ −1  −1  −1  −1  · · ·  1 ]

In the following Matlab transcripts, we choose an exact solution x


and calculate the corresponding b. We then use that b and try to
recover the original x using LU factorisation.
Using inv() or LU factorisation to solve this problem for small n
is fine: execute compareMethods.m
>> compareMethods
n =
10
A =


1 0 0 0 0 0 0 0 0 1
-1 1 0 0 0 0 0 0 0 1
-1 -1 1 0 0 0 0 0 0 1
-1 -1 -1 1 0 0 0 0 0 1
-1 -1 -1 -1 1 0 0 0 0 1
-1 -1 -1 -1 -1 1 0 0 0 1
-1 -1 -1 -1 -1 -1 1 0 0 1
-1 -1 -1 -1 -1 -1 -1 1 0 1
-1 -1 -1 -1 -1 -1 -1 -1 1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1 1
Inverse method:
timeInv =
0.0001
relErrorInv =
1.4560e-16

LU decomposition, same as A\ :
timeLU =
0.0414
relErrorLU =
9.7798e-15

But for n > 50, LU runs into big trouble:

Example 6.10.

>> compareMethods
n =
150
Inverse method:
timeInv =
0.0006
relErrorInv =
5.2904e+03

LU decomposition, same as A\ :
> In compareMethods (line 25)
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 1.401298e-45.
timeLU =
0.0008
relErrorLU =
0.7713

QR decomposition:
timeQR =
0.0012
relErrorQR =
5.8472e-14


QR factorisation LU difficulties are often resolved using QR fac-


torisation (Quarteroni et al. 2014, §5.7).
Although QR factorisation is slower than LU factorisation, by a
factor of three, it is much more robust: e.g., it is less susceptible to
amplification of errors arising from finite precision arithmetic.
QR factorisation also solves approximation problems, such as curve
fitting.

Orthogonal matrices
Definition 6.11. An orthogonal matrix Q is a square matrix whose
transpose is its inverse, that is,
Qᵀ = Q⁻¹ and QQᵀ = QᵀQ = I.

This is one class of matrices where we do ‘compute’ the inverse—


because there is nothing to compute!

Example 6.12. The matrix Q = [ 3/5  4/5 ; −4/5  3/5 ] is orthogonal. Its
inverse is

Q⁻¹ = Qᵀ = [ 3/5  −4/5 ; 4/5  3/5 ] .

QR factorisation
Definition 6.13. A QR factorisation of an m × n matrix A is
A = QR, where Q is an m × m orthogonal matrix and R is an
m × n upper triangular matrix.

Example 6.14. A QR factorisation of A = [ 3  1 ; −4  2 ] is

A = QR = [ 3/5  4/5 ; −4/5  3/5 ] [ 5  −1 ; 0  2 ] .

Solution using QR factorisation

Example 6.15. Solve the linear system

[ 3  1 ; −4  2 ] [ x1 ; x2 ] = [ 2 ; −1 ] .

Answer: via QR factorisation, write this as

[ 3/5  4/5 ; −4/5  3/5 ] [ 5  −1 ; 0  2 ] [ x1 ; x2 ] = [ 2 ; −1 ] .

Using the orthogonality of Q,

[ 5  −1 ; 0  2 ] [ x1 ; x2 ] = [ 3/5  −4/5 ; 4/5  3/5 ] [ 2 ; −1 ] = [ 2 ; 1 ] .

Solving the triangular system then gives x = (1/2, 1/2).


Obtain the solution to Ax = b by
• finding the factorisation A = QR with [Q,R]=qr(A),
• then solving Rx = Qᵀb with x=R\(Q'*b).
However, this needs some modification in common scenarios.

Matlab solution using QR factorisation We do not discuss


details of how to obtain the factors Q and R—although it is very
like Gaussian Elimination, so leave it to lapack.
For an m × n matrix A, [Q, R] = qr(A) returns an m × m orthog-
onal matrix Q and an m × n upper triangular matrix R such that
A = QR. This takes O(n³) arithmetic operations (about the same
as calculating a full inverse, and more than LU factorisation).
x = R\(Q'*b) finds the solution of Rx = Qᵀb.

Example 6.16. Resolve Example 6.9.


>> compareMethods
n =
1000
...
QR decomposition:
timeQR =
0.0766
relErrorQR =
9.3977e-13

Further properties of Q Recall that the dot product of two
vectors u and v is

u · v = uᵀv = u1 v1 + u2 v2 + · · · + un vn .

The Euclidean norm, or length, of u is

‖u‖ = √(uᵀu) = √(u1² + u2² + · · · + un²) .

The angle θ between u and v is given by

cos θ = (u · v)/(‖u‖ ‖v‖) = uᵀv/(‖u‖ ‖v‖) .

Theorem 6.17. Multiplication by an orthogonal matrix preserves


all lengths and angles: it rotates and/or reflects the vectors by a
fixed amount.

Linear least squares curve fitting


Example 6.18. Suppose you know the power consumption of


a room heater at three temperatures: at t1 = 5◦ C the power
p1 = 2.4 kW; at t2 = 10◦ C the power p2 = 1.4 kW; at t3 = 15◦ C
the power p3 = 0.7 kW.
We propose that the power consumption depends linearly upon the
temperature; that is, p(t) = at + b . What are the ‘best’ values to
choose for the coefficients x = (a, b)?
[Figure: the three data points (tj , pj ) and a fitted line p = at + b.]

Errors in satisfying equations The differences between the model


and the data are

5a + b − 2.4 ,
10a + b − 1.4 ,
15a + b − 0.7 ,

which we write as a vector of residuals r = Ax − b (the vector of
the equation-errors), where

A = [ 5  1 ; 10  1 ; 15  1 ] ,   x = [ a ; b ] ,   and   b = [ 2.4 ; 1.4 ; 0.7 ] .

Generally: use residual r for errors in the equations, Ax − b; and


use error e for the error in ‘solution’, x − xtrue .

Linear least squares curve fitting We choose x so that the resid-


uals r are as small as possible.
Let's choose to measure the size of r using the Euclidean norm ‖r‖.


By minimising ‖r‖, we equivalently minimise the sum of the squared
errors in the fit.
• Other choices have advantages and, now that computation is
cheap, are sometimes implemented—be aware.
• Many solve least-squares problems via the normal equation,
(AᵀA)x = Aᵀb. Like the inverse, this is fine in theory, but
avoid it in practice (as any badness in A is greatly exacerbated
in forming AᵀA).
Using the QR factorisation of A:

r = Ax − b = QRx − b .

Now apply the rotation Qᵀ:

Qᵀr = QᵀQRx − Qᵀb = Rx − Qᵀb .

Recall that Qᵀ preserves lengths (Theorem 6.17), hence

‖r‖ = ‖Qᵀr‖ = ‖Rx − Qᵀb‖ .

Minimising the residual of the equations For Example 6.18,

Qᵀr = Rx − Qᵀb = [ 18.7083  1.6036 ; 0  0.6547 ; 0  0 ] [ a ; b ] − [ 1.9510 ; 2.0949 ; −0.1225 ]

Changing a and b does nothing to affect the bottom row, which
evaluates to 0.1225. This is an unavoidable residual (equation-
error).
However, the top two rows are set to zero by solving

[ 18.7083  1.6036 ; 0  0.6547 ] [ a ; b ] = [ 1.9510 ; 2.0949 ]

using back-substitution. This determines the best a and b as it
minimises ‖Qᵀr‖ = ‖r‖.

Matlab solution For Example 6.18 (see pwrQRsolve.m):


>> A = [5 1; 10 1; 15 1];
>> b = [2.4; 1.4; 0.7];
>> [Q, R] = qr(A)
Q =
-0.2673 0.8729 0.4082
-0.5345 0.2182 -0.8165
-0.8018 -0.4364 0.4082
R =
-18.7083 -1.6036


0 0.6547
0 0
>> j = find(abs(diag(R)) > 1e-8)
j =
1
2
>> x = R(j,:)\(Q(:,j)'*b)
x =
-0.1700
3.2000
Matlab’s \ operator (usually) finds the least squares solution
automatically, without informing you (Quarteroni et al. 2014, §5.8):
>> x = A\b
x =
-0.1700
3.2000

Generalised linear least squares curve fitting The method ex-


tends to find the least squares fit of a function to data points (t1 , b1 ),
(t2 , b2 ), . . . , (tm , bm ).
Given specified basis functions φj (t), the method determines the
coefficients xj of the function

f (t) = ∑_{j=1}^{n} xj φj (t)

that best fits the data points.

Generalised linear least squares curve fitting In the general
case, the error/residual to be minimised is r = Ax − b, where

A = [ φ1 (t1 )  φ2 (t1 )  · · ·  φn (t1 )
      φ1 (t2 )  φ2 (t2 )  · · ·  φn (t2 )
      ⋮         ⋮                ⋮
      φ1 (tm )  φ2 (tm )  · · ·  φn (tm ) ] ,

x = (x1 , . . . , xn ) and b = (b1 , . . . , bm ).


Using qr factorisation determines the vector of coefficients x that
minimises krk.
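For instance, a sketch fitting a quadratic f (t) = x1 + x2 t + x3 t²
(the data here are invented for illustration):

% sketch: least squares fit of a quadratic via QR
t = [5; 10; 15; 20]; b = [2.4; 1.4; 0.7; 0.3];  % illustrative data
A = [ones(size(t)) t t.^2];    % basis functions 1, t, t^2
[Q, R] = qr(A);
x = R(1:3,:)\(Q(:,1:3)'*b);    % coefficients minimising ||r||
% or, equivalently, x = A\b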
6.3 Norms and condition numbers
Ill conditioned problems
The expression ‘ill-conditioned’ is sometimes used merely
as a term of abuse . . . It is characteristic of ill-conditioned
sets of equations that small percentage errors in the coef-
ficients given may lead to large percentage errors in the
solution. Alan Turing, 1934 (Higham 1996, p.131)


This is crucial, especially to analysing experimental data.

Example 6.19. A Hilbert matrix H has entries Hij = 1/(i + j − 1)
for i, j = 1, . . . , n (Quarteroni et al. 2014, Example 5.9). For
example, for n = 4,

H = [ 1    1/2  1/3  1/4
      1/2  1/3  1/4  1/5
      1/3  1/4  1/5  1/6
      1/4  1/5  1/6  1/7 ]

Try solving Hx = b. Choose b such that the exact solution x is a


vector of ones.

Poorly conditioned problems See hilbert.m (with its auto-replication


Example 1.8):
>> n=6 % also try 11
n =
6
>> i = 1:n;
>> H = 1./(i+i'-1);
>> b = H*ones(n,1);
>> x = H\b;
>> errorLU=norm(x-1)
errorLU =
5.7735e-10
>> [Q,R]=qr(H);
>> x=R\(Q'*b);
>> errorQR=norm(x-1)
errorQR =
5.2351e-10
>> conditionNumberH=cond(H)
conditionNumberH =
1.4951e+07
No big problem with n=6, but try the case n=11.
Here the growth of errors is not due to LU factorisation, but is due
to the intrinsic nature of the matrix H.
The condition number helps us detect problematic matrices such as
this (Kreyszig 2011, §20.4).

Matrix norm


Definition 6.20. The norm of a matrix A (norm(A)) is

‖A‖ = max_{x≠0} ‖Ax‖/‖x‖ = max_{‖x‖=1} ‖Ax‖.

Consequently:
• ‖A‖ ≥ 0 ;
• ‖A‖ = 0 if and only if A is all zero;
• ‖Ax‖ ≤ ‖A‖ ‖x‖ for all x; and
• ‖Ax‖ = ‖A‖ ‖x‖ for at least one x ≠ 0 .

Example 6.21. • The matrix A = [ 1  0 ; 1  1 ] has norm
‖A‖ = √[(3 + √5)/2] ≈ 1.618.
• Every orthogonal matrix Q has norm ‖Q‖ = 1.

Condition number
Definition 6.22. (Kreyszig 2011, §20.4) The condition number of
a square matrix A is

cond A = ‖A‖ ‖A⁻¹‖.

Compute with cond(A), or quicker is the approximate condest(A).

Example 6.23. 1. For every orthogonal matrix Q, its transpose
is also orthogonal, and so

cond Q = ‖Q‖ ‖Q⁻¹‖ = ‖Q‖ ‖Qᵀ‖ = 1 · 1 = 1 .

This is the smallest possible condition number.


2. The Hilbert Matrix of Example 6.19 has a terrible condition
number that flags fundamental problems:
>> i = 1:11; H = 1./(i+i'-1);
>> cond(H)
ans =
5.2e+14
3. For Example 6.10 the moderate condition number indicates
the issues are with the algorithm, not the matrix:
>> n=100;
>> A=toeplitz([1 -ones(1,n-1)],[1 zeros(1,n-1)]);
>> A(:,n)=ones(n,1);
>> cond(A)
ans =
44.8023


4. The linear system

[ 1  2 ; 2  4.01 ] [ x1 ; x2 ] = [ 5 ; 10 ]

has exact solution x = (5, 0). But if the second element of b
is perturbed to 10.01, then the solution changes to x = (3, 1) !

Since the condition number is 2508 (via cond()), the up-
coming Theorem 6.24 shows that the relative change in the
right hand side of just 0.01/‖b‖ = 0.01/√(10² + 5²) ≈ 0.0009
can cause a relative change in the solution of up to about
0.0009 · 2508 ≈ 2.24 —consistent with the large change seen here.

Error bounds

Theorem 6.24. An approximate solution x∗ to the linear system
Ax = b has relative error ‖e‖/‖x‖ bounded by

‖e‖/‖x‖ ≤ cond(A) ‖r‖/‖b‖

where the error e = x∗ − x, and the residual r = Ax∗ − b (Quarteroni
et al. 2014, §5.5) (Kreyszig 2011, §20.4).

Remember When solving linear equations Ax = b, relative


errors are multiplied by cond A.

Conditioning A non-zero residual r is almost inevitable be-


cause computer arithmetic is not exact. Computers can only store
a finite number of significant digits.

In most computers, eps ≈ 2 · 10⁻¹⁶ is the smallest number for which
1 + eps ≠ 1, see Matlab, and is a measure of the relative error in
any arithmetic operation and, in this case, of ‖r‖ (Quarteroni et al.
2014, §1.2).

Consequently, the relative error in any solution of a linear system
is at least 2 · 10⁻¹⁶ times the condition number.

Rule-of-thumb Consequently, just from computational errors a
condition number is

1 ≤ good < 10² < poor < 10⁴ < bad < 10⁸ < terrible.

But when a matrix or right-hand side has experimental errors, then


be more conservative.

Experiments may only measure quantities to two–four digit accu-


racy: analysis of those experiments may easily lose all significance.


Rule-of-thumb Let the relative error in measurements be de-
noted by ε, then the condition number is

1 ≤ good < 1/√ε < bad < 1/ε < terrible .

Example 6.25 (Optional: conditioning to matrix-error). Suppose


we seek the solution x to the system Ax = b, but due to some errors
we only have available the approximation A∗ to matrix A. Let x∗
be the solution to A∗ x∗ = b. Given that the errors e = x∗ − x
and E = A∗ − A are both small, derive that the relative error in x
is bounded by the condition number of A times the relative error
in A:

‖e‖/‖x‖ ≤ cond(A) ‖E‖/‖A‖ .

6.4 Jacobi iteration


Iterative solvers Solvers based on LU and QR factorisation are
often referred to as direct solvers because the solution is calculated
in a finite number of steps.

Iterative solvers work by generating a sequence of approximations
to the solution. The iteration is terminated when the error is
acceptably small. Iterative solvers are often used when the linear
system is very large, direct solvers are impracticable, or the matrix
is not explicitly known (Quarteroni et al. 2014, §5.9) (Kreyszig 2011,
§20.3).

Jacobi iteration is a simple scheme that works for diagonally domi-


nant linear systems. Better and more sophisticated iterative solvers
should be used, such as Conjugate Gradients taught in Optimisa-
tion III (Quarteroni et al. 2014, §5.11).

Jacobi iteration


Example 6.26. Consider the linear system

10x1 − x2 + 2x3 = 6 ,
−x1 + 11x2 − x3 + 3x4 = 25 ,
2x1 − x2 + 10x3 − x4 = −11 ,
3x2 − x3 + 8x4 = 15 .

Rewrite as (Quarteroni et al. 2014, §5.9.1)

x1 = (1/10)(6 + x2 − 2x3 ),
x2 = (1/11)(25 + x1 + x3 − 3x4 ),
x3 = (1/10)(−11 − 2x1 + x2 + x4 ),
x4 = (1/8)(15 − 3x2 + x3 ).

Given some previous values of x1 , x2 , x3 and x4 , we use the above
formulae to generate new values.
Start with the guess x^(0) = (0, 0, 0, 0). Then

x1^(1) = (1/10)(6 + x2^(0) − 2x3^(0)) = 6/10 ,
x2^(1) = (1/11)(25 + x1^(0) + x3^(0) − 3x4^(0)) = 25/11 ,
x3^(1) = (1/10)(−11 − 2x1^(0) + x2^(0) + x4^(0)) = −11/10 ,
x4^(1) = (1/8)(15 − 3x2^(0) + x3^(0)) = 15/8 .

A second iteration yields

x1^(2) = (1/10)(6 + x2^(1) − 2x3^(1)) = 1.0473 ,
x2^(2) = (1/11)(25 + x1^(1) + x3^(1) − 3x4^(1)) = 1.7159 ,
x3^(2) = (1/10)(−11 − 2x1^(1) + x2^(1) + x4^(1)) = −0.8052 ,
x4^(2) = (1/8)(15 − 3x2^(1) + x3^(1)) = 0.8852 .

And so on . . . (see egJacobi.m)

Matrix splitting Consider the linear system Ax = b. We split A
into the sum of two matrices, D and A − D, so that

Ax = b
⇐⇒  Dx + (A − D)x = b
⇐⇒  Dx = (D − A)x + b .

For Jacobi iteration, we choose to make D a diagonal matrix whose
diagonal elements are the same as the diagonal elements of A, that
is, Dii = Aii .
Then D⁻¹ is simply a diagonal matrix whose diagonal elements
are 1/Dii = 1/Aii .

Example 6.27. In our previous example,

A = [ 10  −1  2   0
      −1  11  −1  3
      2   −1  10  −1
      0   3   −1  8 ]    and    b = (6, 25, −11, 15).

We write

D = [ 10  0   0   0
      0   11  0   0
      0   0   10  0
      0   0   0   8 ]

and

A − D = [ 0   −1  2   0
          −1  0   −1  3
          2   −1  0   −1
          0   3   −1  0 ]

Jacobi iteration Jacobi iteration is defined by

Dx^(k) = (D − A)x^(k−1) + b
       = Dx^(k−1) − (Ax^(k−1) − b) .

Implement via the residual r^(k−1) = Ax^(k−1) − b ,
and update using x^(k) = x^(k−1) − D⁻¹r^(k−1) .
The elements of D⁻¹r are ri /Dii = ri /Aii = r./diag(A).

Termination The iteration proceeds until the magnitude of
every element of the residual r^(k−1) is smaller than some value
called the tolerance.
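A minimal sketch of the iteration (cf. egJacobi.m on MyUni; the
tolerance and iteration cap here are assumptions):

% sketch of Jacobi iteration: x <- x - D^{-1} r until residual small
tol = 1e-8; maxIterations = 100;  % assumed values
x = zeros(size(b));               % initial guess x^(0)
for k = 1:maxIterations
    r = A*x - b;                  % residual r^(k-1)
    if max(abs(r)) < tol, break, end
    x = x - r./diag(A);           % x^(k) = x^(k-1) - D^{-1} r^(k-1)
end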

Diagonal dominance
Definition 6.28. An n × n matrix A is diagonally dominant if the
diagonal elements in each row of A are larger in magnitude than
the sum of the absolute values of the off-diagonal elements: that is,
if

|Aii | > ∑_{j≠i} |Aij | ,   i = 1, . . . , n.

Example 6.29. The matrix

A = [ 10  −1  2   0
      −1  11  −1  3
      2   −1  10  −1
      0   3   −1  8 ]

is diagonally dominant (check the definition holds).


Convergence
Theorem 6.30. If the matrix A is diagonally dominant, then
Jacobi iteration converges to the solution of Ax = b (Quarteroni
et al. 2014, Prop. 5.3).

Proof overview
• Define the overall error ε^(k) = maxᵢ |xᵢ^(k) − xᵢ |.
• Then ε^(k) ≤ G ε^(k−1) where here

G = maxᵢ (1/|Aii |) ∑_{j≠i} |Aij | .

• So when G < 1, that is when A is diagonally dominant, the
errors tend to zero.


This proof is analogous to those for methods to solve nonlinear
equations.

Use iteration in large problems In large problems iteration may


be much more effective than direct solution (Quarteroni et al. 2014,
§5.13).
But typically Jacobi iteration converges excruciatingly slowly: so
avoid in practice.
In practice, use a Krylov iteration method (Quarteroni et al. 2014,
§5.11) (these do not require diagonal dominance) such as
• Conjugate Gradients (Optimisation III), or
• the Matlab function gmres() (see gmresVsJacobi.m).


7 Nonlinear equations
Nonlinear equations Given some function f (x), we explore the
problem of finding solutions x of

f (x) = 0 .

Such solutions are referred to as the roots of the equation, or the


zeros of the function (Quarteroni et al. 2014, Ch. 2) (Kreyszig 2011,
§19.2).
Applications arise in investment funds, state equations of a gas,
systems of rods, population dynamics (Quarteroni et al. 2014, §2.1),
among many others.
7.1 Fixed-point iteration
Fixed-point iteration Algebraically rearrange f (x) = 0 to the
form x = g(x).
Then guess x = x0 and iterate according to (Quarteroni et al. 2014,
§2.6) (Kreyszig 2011, §19.2)

xk+1 = g(xk ), k = 0, 1, . . .

If limk→∞ xk = s and g is continuous, then s = g(s). The point s


is called a fixed point of g.

Example 7.1. Suppose we do not know the formula for solving
x² − 3x + 1 = 0.
We may rearrange this equation as either of the following three
possibilities (there are many more)

x = (x² + 1)/3 ,   or   x = 3 − 1/x ,   or   x = (x² − 1)/(2x − 3) ,

where mysteriously to get the last, add x² − 1 to both sides then
divide by 2x − 3. These three yield the three iteration formulae,
respectively,

xk+1 = (xk² + 1)/3   or   xk+1 = 3 − 1/xk   or   xk+1 = (xk² − 1)/(2xk − 3) .

Starting at x0 = 1, each iteration formula produces a different


sequence (see fixedPoint1.m), respectively,
• 1, 0.6667, 0.4815, . . . , 0.3820, . . .
• 1, 2, 2.5, . . . , 2.6180, . . .
• 1, 0, 0.3333, . . . , 0.3820, . . .
These converge to the two roots of x² − 3x + 1 = 0.
However, other initial guesses, such as x0 = 3 or x0 = 0, are not
successful.
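A hedged sketch of such an iteration, here for the first rearrange-
ment (cf. fixedPoint1.m; the stopping test is my choice):

% sketch: fixed-point iteration for x = g(x) = (x^2+1)/3
g = @(x) (x^2 + 1)/3;
x = 1;                            % initial guess x0
for k = 1:40
    xNew = g(x);
    if abs(xNew - x) < 1e-6, break, end
    x = xNew;
end
x                                 % converges to about 0.3820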


Theorem 7.2. Let x = s be a solution of x = g(x) and suppose
that the function g has a continuous derivative in some interval I
containing s and that |g′(x)| ≤ G < 1 in I. Then the iteration
xk+1 = g(xk ) converges for every initial x0 in I and limk→∞ xk = s
(Quarteroni et al. 2014, Thm. 2.1) (Kreyszig 2011, §19.2, Thm. 1).

Example 7.3. Consider Example 7.1
• |g′| = |2x/3| < 1 only when |x| < 3/2: so this iteration only
converges to the root s = 0.382, and cannot converge to 2.618.
• |g′| = |1/x²| < 1 only when |x| > 1: so this iteration only
converges to the root s = 2.618, and cannot converge to 0.382.
• g′ = · · · = 2(x² − 3x + 1)/(2x − 3)² which cunningly has the
function in the numerator and so |g′| is zero at every root of
the equation. So |g′| < 1 in some region around every root.
Hence this scheme converges to one or other root provided
the initial guess is close enough.
Further, because |g′| is small near each root, so the upper
bound G is small 'near' each root, and the convergence is
rapid.

7.2 Newton iteration


Newton iteration Suppose f (x) can be approximated near xk
by a truncated Taylor series. Then

f (xk+1 ) ≈ f (xk ) + f ′(xk )(xk+1 − xk ).

We want f (xk+1 ) ≈ 0; that is, (Quarteroni et al. 2014, §2.3)
(Kreyszig 2011, §19.2)

f (xk ) + f ′(xk )(xk+1 − xk ) = 0
⇐⇒  xk+1 = xk − f (xk )/f ′(xk ) .

Example 7.4. Use Newton's method on the quadratic f (x) =
x² − 3x + 1 = 0 for which f ′(x) = 2x − 3 . Then

xk+1 = xk − (xk² − 3xk + 1)/(2xk − 3)
     = · · · = (xk² − 1)/(2xk − 3) .

This is the third alternative introduced by Example 7.1.
Only four iterations are necessary to obtain four decimal place
accuracy.


Convergence
Theorem 7.5. If f (x) is twice continuously differentiable, and
f ′ is not zero at a root s of f (x) = 0, then for initial x0 sufficiently
close to s, the rate of convergence of Newton's method is quadratic,
where 'quadratic' means that errorₖ ≤ M (errorₖ₋₁)² (Quarteroni et al.
2014, Prop. 2.2) (Kreyszig 2011, §19.2, Thm. 2).

Termination Consider further Example 7.1. To solve f (x) =
x² − 3x + 1 = 0 using the Newton iteration xk+1 = (xk² − 1)/(2xk − 3),
we need to terminate the iteration. Various possibilities are:
• an absolute test on the iterates, |xk+1 − xk | < ε ;
if abs(xNew - x) < tol, break, end
• a relative test on the iterates, |xk+1 − xk | < ε|xk |;
• an absolute test on the residual, |f (xk )| < ε ;
if abs(r) < tol, break, end
• a relative test on the residual, |f (xk )| < ε|f (x0 )|.
Depending on the circumstances, any of these may be misleading
or even fail. Choose a termination condition that is suitable for
your problem.
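A sketch of Newton iteration with, say, the absolute test on the
iterates (the tolerance and iteration cap are assumptions):

% sketch: Newton iteration for f(x) = x^2 - 3x + 1 = 0
f  = @(x) x^2 - 3*x + 1;
df = @(x) 2*x - 3;
x = 1; tol = 1e-10;
for k = 1:20
    xNew = x - f(x)/df(x);        % Newton step
    if abs(xNew - x) < tol, x = xNew; break, end
    x = xNew;
end
x                                 % converges to about 0.3820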

No Derivative? Sometimes the derivative f ′(x) is unavailable
(perhaps the function is calculated by some 'black box' software).
In that case, we may use a finite difference approximation to the
derivative, such as those derived in Section 4. For example,

f ′(x) ≈ [f (x + h) − f (x)]/h ,

where h is in the range 10⁻⁶ to 10⁻⁸.

Example 7.6. Use Newton’s method to find the zeros of the


smallest singular value of the matrix

[ −2 − 2x   2 + 3x   5x
  −4        4x       2 + x
  5 − 5x    2 + 4x   x ] .

We just need to know that svd(A) (the ‘black-box’) returns a vector


of ‘singular values’ of matrix A (non-negative), and so min(svd(A))
computes the smallest singular value.
In this case, compute the residual function with code
C = [-2 2 0; -4 0 2; 5 2 0]
B = [-2 3 5; 0 4 1; -5 4 1]
f=@(x) min(svd(B*x+C))


No algebraic expression for the derivative f ′(x) is known!
Newton's method, using a difference approximation for f ′(x) and
starting at x0 = 1, converges to x = 0.5356 (see egNoDerivative.m).

7.3 Systems of equations


Nonlinear systems The simplest nonlinear system consists of
two nonlinear equations in two variables:

f (x, y) = 0 ,
g(x, y) = 0 .

A solution to this system consists of those values of x and y that


satisfy both equations simultaneously. Newton iteration generalises
to find such solutions (Quarteroni et al. 2014, §2.5).

Newton iteration of two equations


Example 7.7. Let's solve the nonlinear system

f (x, y) = x² − xy² − xy − 1 = 0 ,
g(x, y) = y² + x³ + xy − 3 = 0 .

Try guessing (x, y) = (1, 0). Then f (1, 0) = 0 and g(1, 0) = −2.
This is not a solution, because a solution must satisfy both equa-
tions.
Try to find a better solution by modifying the original guess as
(x, y) = (1 + ∆x, ∆y), where ∆x and ∆y are notionally small.
Ignoring the very much smaller product terms ∆x², ∆y², ∆x∆y,
and so on, we obtain

f (1 + ∆x, ∆y) = · · · ≈ 2∆x − ∆y + 0 ,
g(1 + ∆x, ∆y) = · · · ≈ 3∆x + ∆y − 2 .

We want both f (1 + ∆x, ∆y) = 0 and g(1 + ∆x, ∆y) = 0, and so


we rewrite this pair as the linear system

[ 2  −1 ; 3  1 ] [ ∆x ; ∆y ] = − [ 0 ; −2 ] .

Solving the linear system yields (∆x, ∆y) = (2/5, 4/5). The next
approximation is thus (x1 , y1 ) = (1 + ∆x, ∆y) = (7/5, 4/5).
We could continue this process by searching for a next better
approximation (x2 , y2 ) = (x1 + ∆x′ , y1 + ∆y′ ), then (x3 , y3 ) =
(x2 + ∆x″ , y2 + ∆y″ ), and so on. But to do so we need to find the
linear equations systematically.
Now the approximation process leading to f (1 + ∆x, ∆y) ≈ 2∆x −
∆y + 0, g(1 + ∆x, ∆y) ≈ 3∆x + ∆y − 2 gives a tangent plane


approximation to the multivariable functions. Obtain such tangent


plane approximations systematically with multivariable calculus.
For differentiable f (x, y) and g(x, y), approximate them near (xk , yk )
by the tangent plane from the truncated multivariable Taylor series.
Then

f (xk+1 , yk+1 ) = f (xk + ∆x, yk + ∆y)
    ≈ f (xk , yk ) + ∂f/∂x (xk , yk ) ∆x + ∂f/∂y (xk , yk ) ∆y ,
g(xk+1 , yk+1 ) = g(xk + ∆x, yk + ∆y)
    ≈ g(xk , yk ) + ∂g/∂x (xk , yk ) ∆x + ∂g/∂y (xk , yk ) ∆y ,

where ∆x = xk+1 − xk and ∆y = yk+1 − yk .
We want both f (xk+1 , yk+1 ) ≈ 0 and g(xk+1 , yk+1 ) ≈ 0, hence

f (xk , yk ) + ∂f/∂x (xk , yk ) ∆x + ∂f/∂y (xk , yk ) ∆y = 0 ,
g(xk , yk ) + ∂g/∂x (xk , yk ) ∆x + ∂g/∂y (xk , yk ) ∆y = 0 ,

which we write as the linear system

[ ∂f/∂x  ∂f/∂y ] [ ∆x ]     [ f (xk , yk ) ]
[ ∂g/∂x  ∂g/∂y ] [ ∆y ] = − [ g(xk , yk ) ] ,

with the partial derivatives evaluated at (xk , yk ).

The matrix on the left hand side is called the Jacobian Jk .


The right-hand side is the (negative) vector residual r = (f, g)
evaluated at (xk , yk ).
Assuming this system can be solved for ∆x and ∆y, the next
approximation is
xk+1 = xk + ∆x ,
yk+1 = yk + ∆y .

Example 7.8. Let's solve the nonlinear system

f (x, y) = x² + y² − 2 = 0,
g(x, y) = y − cos x = 0.

Via differentiating, Newton iteration is

xk+1 = xk + ∆x,
yk+1 = yk + ∆y,

[ 2xk      2yk ] [ ∆x ]     [ xk² + yk² − 2 ]
[ sin xk   1   ] [ ∆y ] = − [ yk − cos xk   ] .

(see newton2d.m)
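A sketch of one possible realisation (cf. newton2d.m on MyUni;
the initial guess and tolerance are assumptions):

% sketch: Newton iteration for the system of Example 7.8
x = [1; 1];                        % initial guess (x0, y0)
for k = 1:20
    r = [x(1)^2 + x(2)^2 - 2       % residual (f, g)
         x(2) - cos(x(1))];
    if norm(r) < 1e-10, break, end
    J = [2*x(1)    2*x(2)          % Jacobian
         sin(x(1)) 1];
    x = x - J\r;                   % solve J*dx = -r and update
end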


Let xk = (xk , yk ) and ∆x = (∆x, ∆y). Then the iteration
formulae are

xk+1 = xk + ∆x ,   Jk ∆x = −r k ,

where residual r k = (f (xk , yk ), g(xk , yk )).
We do not compute the inverse of the Jacobian, but mathematically
the above iteration is

xk+1 = xk − Jk⁻¹ r k ,

which is analogous to the single variable formula

xk+1 = xk − [f ′(xk )]⁻¹ f (xk ).

General Newton iteration Write a system of n nonlinear equa-


tions as f (x) = 0, where f and x are vectors of length n (Quarteroni
et al. 2014, §2.5).
The Newton iteration formulae for such a system are

xk+1 = xk + ∆x, Jk ∆x = −r k .

where residual r k = f (xk ), and the elements of the Jacobian are

Jij = ∂fi /∂xj .

Start with an initial guess x0 . Continue iteration until a specified
tolerance ε is achieved:
• ‖xk+1 − xk ‖ < ε, or
• ‖f (xk )‖ < ε, or
• ‖xk+1 − xk ‖ < ε‖xk ‖, or
• ‖f (xk )‖ < ε‖f (x0 )‖.

Optional: moderate Newton's method In many problems New-
ton's method makes big jumps to useless points xk+1 , and then
takes very many iterations to recover. One simple strategy is to
limit the bad jumps: whenever the new residual at xk + ∆x is
worse, then try half the jump (∆x = ∆x/2); repeatedly halve as
needed.
This works because, in the direction of ∆x, the norm ‖f ‖ must
decrease for small enough steps. Consider the directional derivative
of ‖f (x)‖² to establish this.
This connects to part of Optimisation III.


Aside: Newton Fractals Consider solving f (z) = z⁴ − 1 = 0


over the complex plane using Newton’s method. The roots are
1, −1, i, −i. Almost every starting point z0 ∈ C will eventually
converge to one of these roots.
Newton’s method over C works exactly as it does over R (Quarteroni
et al. 2014, §1.3):

zk+1 = zk − f (zk )/f ′(zk ) = zk − (zk⁴ − 1)/(4zk³ )

Starting points near 1 converge to the root 1 + 0i, and similarly for
the other roots.
What happens at the boundaries between regions?
Points are coloured by the root which they eventually converge to.
The boundaries dividing basins of attraction for each z0 turn out
to be fascinating! Other fractals are generated by other f (z).
[Figure: the four basins of attraction, coloured by root, over the
square −1 ≤ Re z0 , Im z0 ≤ 1.]

For more information:


• Simon Tatham, Fractals derived from Newton–Raphson, http:
//www.chiark.greenend.org.uk/~sgtatham/newton/
• Johannes Rueckert, Newton’s Method as a Dynamical System
(PhD thesis), http://www.math.stonybrook.edu/cgi-bin/
thesis.pl?thesis06-1


8 Ordinary differential equations


Differential equations Differential equations model an enormous
range of phenomena in physics, chemistry, biology, engineering, and
so on (Quarteroni et al. 2014, §8.1).
More often than not, it is impracticable or impossible to write down
the solution. But numerical methods allow us to find approximate
solutions.
8.1 Initial value problems
Initial value problems Consider the initial value problem con-
sisting of the ordinary differential equation (ode)

dy/dt = y′ = f (t, y),
subject to the initial condition y(t0 ) = y0 (Quarteroni et al. 2014,
§8.2) (Kreyszig 2011, §21.1).

Example 8.1. A basic model of the cooling of an object consists


of Newton’s law of cooling,
dT /dt = −k(T − Ta ),
where T (t) is the temperature of an object, t is time, k is a constant
and Ta is the ambient temperature, subject to the initial condition

T (0) = T0 .

8.2 Euler’s method


Euler's method We develop Euler's method in the context of
Example 8.1 with Ta = 0 and k = 1, for which the analytic solution
is T (t) = T0 e⁻ᵗ . Here we seek a numerical solution.
We start at t = 0, where the solution is T (0) = T0 . Take a small
step forward in time from t = 0 to t = h. An approximation of the
derivative at t = 0 is

dT /dt |_{t=0} ≈ [T (h) − T (0)]/h .

Substituting this approximation into the ode yields (Quarteroni
et al. 2014, §8.3)

[T (h) − T0 ]/h ≈ −T0   =⇒   T (h) ≈ T0 − T0 h = (1 − h)T0 .

In general, we take a small step forward in time from t = tk to
t = tk+1 = tk + h. Let Tk ≈ T (tk ). Then

dT /dt |_{t=tk} ≈ (Tk+1 − Tk )/h ≈ −Tk   =⇒   Tk+1 = (1 − h)Tk .


Repeated application of this formula gives

Tk = (1 − h)ᵏ T0 .

We next derive that the numerical solution given by this formula
becomes more accurate as the step size h is reduced. But if h > 2,
then disaster strikes!

Effect of step-size The Taylor series of the exact solution T (t) =
T0 e⁻ᵗ is

T (h) = T0 e⁻ʰ = T0 (1 − h + h²/2! − h³/3! + · · ·) .

This matches the first time step T1 = T0 (1 − h) to a local error
O(h²).
If h > 2 then |1 − h| > 1, hence |Tk | = T0 |1 − h|ᵏ grows exponentially
quickly (cf. the exact solution which decays)! For excessively large
time steps, the numerical solution is utterly wrong.

Euler's method For the general initial value problem,

(yk+1 − yk )/h ≈ f (tk , yk ),

where yk ≈ y(tk ) and tk = t0 + kh. Rearranging gives (Quarteroni
et al. 2014, §8.3) (Kreyszig 2011, §21.1)

yk+1 = yk + hf (tk , yk ).

This is an explicit formula because all terms on the right hand side
are known from the previous time step.
Theorem 8.2. Euler's method estimates y(t) to an error of O(h)
as h → 0, provided that |y″| ≤ M .

Rule-of-thumb Avoid methods poorer than O(h²): so avoid
Euler; we improve it soon.

8.3 Systems of ordinary differential equations


Systems of first-order differential equations Euler's method gen-
eralises to systems of first-order odes of the form

dy/dt = y′ = f (t, y),   y, f ∈ Rᵐ,

subject to the initial condition y(t0 ) = y 0 (Quarteroni et al. 2014,
§8.9) (Kreyszig 2011, §21.3).


Example 8.3. Human heartbeats are commonly modelled by the
pair of coupled nonlinear odes

ε dx/dt = −x³ + ax − b,
db/dt = x − xa ,

where x(t) is the time-dependent muscle fiber length, b(t) is an
electrochemical control that stimulates the muscles, and ε, a and xa
are constants.
See code euler.m and testeuler.m

Example 8.4. The angular displacement θ(t) of a pendulum is
governed by the second-order nonlinear ode

d²θ/dt² + (g/L) sin θ = 0,

where g is the acceleration due to gravity and L is the length of the
pendulum.
For numerical solution we transform this into a system of two
first-order odes by defining y1 (t) = θ(t) and y2 (t) = θ′(t). Then

[ y1′ ]   [ y2            ]
[ y2′ ] = [ −(g/L) sin y1 ] .

For a general system of first-order odes, Euler's method is

(y k+1 − y k )/h ≈ f (tk , y k )

where y k ≈ y(tk ) and tk = t0 + kh. Rearranging gives

y k+1 = y k + hf (tk , y k ).
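A minimal sketch of this time stepping (cf. euler.m and testeuler.m
on MyUni—the argument names here are my assumptions):

% sketch: Euler's method for y' = f(t,y), with f returning a column
function [t, y] = eulerSketch(f, tSpan, y0, n)
  h = (tSpan(2) - tSpan(1))/n;      % fixed step size
  t = tSpan(1) + (0:n)'*h;
  y = zeros(n+1, length(y0));
  y(1,:) = y0(:)';
  for k = 1:n
      y(k+1,:) = y(k,:) + h*f(t(k), y(k,:)')';   % Euler step
  end
end

For example, [t,T] = eulerSketch(@(t,T) -T, [0 5], 1, 50) ap-
proximates the cooling example.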

8.4 Higher order schemes


Improved Euler method The first few terms of the Taylor series
for y(t + h) are

y(t + h) = y(t) + hy′(t) + (h²/2!) y″(t) + O(h³).

Replace the derivatives y′(t) and y″(t) using the ode and its first
derivative

y′(t) = f (t, y),
y″(t) = ft (t, y) + fy (t, y) y′(t) = ft + fy f,

where the subscripts denote partial derivatives. Here, any functions
whose arguments are suppressed are evaluated at (t, y), hence f =
f (t, y), ft = ft (t, y), and so on.
Thus

y(t + h) = y(t) + hf + (h²/2)(ft + f fy ) + O(h³).

On the other hand, the first few terms of the multivariable Taylor
series give

f (t + h, y + hf ) = f (t, y) + hft (t, y) + hf fy (t, y) + O(h²)
                  = f + h(ft + f fy ) + O(h²).

Hence

y(t + h) = y(t) + (h/2)[f (t + h, y + hf ) + f ] + O(h³).

Let tk = t, yk = y(tk ) and fk = f (tk , yk ), and neglect the O(h³)
term.
The improved Euler method (Heun method, RK2 method) is (Quar-
teroni et al. 2014, §8.8) (Kreyszig 2011, §21.1)

yk+1 = yk + ½(F1 + F2 ),
F1 = hf (tk , yk ),
F2 = hf (tk + h, yk + F1 ).

The improved Euler method is just one of a broader family of
methods, referred to as Runge–Kutta methods (it is a second-order
Runge–Kutta method).
Theorem 8.5. The improved Euler method has error O(h²).

Example 8.6. Consider again the earlier initial value problem

dT /dt = −T,   T (0) = T0 .

With Improved Euler, the first time step is

F1 = −hT0 ,
F2 = −h(T0 − hT0 ) = (h² − h)T0 ,
T1 = T0 + ½[−hT0 − h(T0 − hT0 )] = (1 − h + ½h²)T0 .

The second time step is

F1 = −hT1 = −h(1 − h + ½h²)T0 ,
F2 = −h(T1 − hT1 ) = (h² − h)(1 − h + ½h²)T0 ,
T2 = T1 + ½[−hT1 − h(T1 − hT1 )] = (1 − h + ½h²)²T0 .


And so on. At the kth time step,

Tk = (1 − h + ½h²)ᵏ T0 .

The Taylor series of the exact solution T (t) = T0 e⁻ᵗ is

T (h) = T0 e⁻ʰ = T0 (1 − h + h²/2! − h³/3! + · · ·) .

This matches the first time step T1 to a local error O(h³).
If h > 2, then |1 − h + ½h²| > 1, hence |Tk | grows exponentially
(cf. the exact solution which decays)! For excessively large time
steps, the numerical solution is utterly wrong.

Example 8.7. Consider the initial value problem ivp

dy/dt = ty²   subject to   y(1) = 2.

The exact solution is y = 2/(2 − t²), hence y → ∞ as t → 2.
Blow-up is not necessarily due to the numerical algorithm!
One step of the improved Euler scheme using h = 0.1 gives

F1 = 0.4,   F2 = 0.6336,   y1 = 2.5168,

which compares well with the exact solution y(1.1) = 2.5316.

Example 8.8. In the cases where the right-hand side f (t, y) does
not explicitly depend on y, then

dy/dt = f (t)   =⇒   y(tk+1 ) = y(tk ) + ∫_{tk}^{tk+1} f (t) dt.

The improved Euler method gives

yk+1 = yk + ½h(fk + fk+1 ),

which is the trapezoidal approximation of yk + ∫_{tk}^{tk+1} f (t) dt.
The improved Euler method generalises to systems of first-order
odes of the form

y′ = f (t, y),   y(t0 ) = y 0 .

Write the method as

y k+1 = y k + ½(F 1 + F 2 ),
F 1 = hf (tk , y k ),
F 2 = hf (tk+1 , y k + F 1 ).


Example 8.9. Consider the nonlinear pendulum with g = L = 1

y′ = [ y1′ ; y2′ ] = [ y2 ; − sin y1 ]   such that   y(0) = [ π/2 ; 0 ].

One step of the improved Euler method using h = 0.1 gives y 1 =
(1.5658, −0.1).

Runge–Kutta schemes The procedure for deriving the improved
Euler method extends to higher order in a straightforward but te-
dious way. A very popular scheme is the fourth-order Runge–Kutta
scheme (Quarteroni et al. 2014, §8.7) (Kreyszig 2011, §21.3)

y k+1 = y k + (1/6)(F 1 + 2F 2 + 2F 3 + F 4 ),
F 1 = hf (tk , y k ),
F 2 = hf (tk + ½h, y k + ½F 1 ),
F 3 = hf (tk + ½h, y k + ½F 2 ),
F 4 = hf (tk + h, y k + F 3 ).
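One step of this scheme in Matlab might read (a sketch; f is
assumed to return a column vector):

% sketch: one fourth-order Runge-Kutta step from (tk, yk)
F1 = h*f(tk,       yk);
F2 = h*f(tk + h/2, yk + F1/2);
F3 = h*f(tk + h/2, yk + F2/2);
F4 = h*f(tk + h,   yk + F3);
yk = yk + (F1 + 2*F2 + 2*F3 + F4)/6;
tk = tk + h;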

Example 8.10. Consider the ivp

dy/dt = ty²   subject to   y(1) = 2.

One step of the fourth-order Runge–Kutta scheme using h = 0.1
gives (you do the details)

F1 = 0.4,   F2 = 0.5082,   F3 = 0.5335,   F4 = 0.7060,
y1 = 2.5316,

which agrees with the exact solution to four decimal places.


8.5 Adaptive error control
Adaptive error control How do we choose the step size h?
For efficiency, we would like h as large as possible. But not so large
that the error exceeds some predefined tolerance.
The optimal value of h depends on the solution y(t). We expect
that h needs to be small when the solution is rapidly varying,
whereas a larger value of h is satisfactory when the solution is
slowly varying.
Ideally, we would like to adaptively control h. But this is not so
simple, as it couples the dynamics of the control to that of the
system (Quarteroni et al. 2014, §8.6.3).


Example 8.11. Consider the ode y′ = −y, for which the Euler
method is yk+1 = yk − hk yk , but now where hk is an adaptively
chosen step size.
Since the error in Euler's method is ∝ |y″|, we try choosing hk =
c/|y″|, where c is a constant. Then pleasingly the time-step hk is
large when the curvature |y″| is small, and vice versa.
However, in this example we know y″ = −y′ = +y and so hk =
c/|yk |. Consequently, Euler's method becomes

yk+1 = yk − c sign(yk )   and   tk+1 = tk + c/|yk | .

We want something that approximates the solution y ∝ e⁻ᵗ, but
instead we obtain a numerical solution that eventually oscillates
by c each step.

Matlab ODE solvers Fortunately, Matlab is already equipped


with functions ode23 and ode45 that provide sophisticated adap-
tive error control for the time integration of systems of odes:
help ode23 describes that
[t,y] = ode23(odefun,[t0 tfinal],y0)
integrates the system of differential equations y′ = f (t, y) from
time t0 to tfinal with initial value y0.
odefun is a function handle. For a scalar t and a vector y,
odefun(t,y) must return a column vector corresponding to f (t, y).
• ode23(@(t,y) t*y^2, [1 1.4], 2) solves y′ = ty² over
1 ≤ t ≤ 1.4 such that y(1) = 2, and plots the solution.
• ode23(@(t,y) [y(2);-sin(y(1))], [0 9], [pi/2;0]) solves
y1′ = y2 and y2′ = − sin y1 over 0 ≤ t ≤ 9 such that y(0) =
(π/2, 0), and plots the solution.
In the output [t,y]=ode23(...), each row in array y is the solution
at the corresponding time in the vector t.

Example 8.12. Use ode23 to solve the heartbeat model

ε dx/dt = −x³ + ax − b,
db/dt = x − xa ,

with parameters ε = 0.2, a = 1.6 and xa = 0.7, and initial values
x(0) = 1 and b(0) = 0.5.
See heart.m on MyUni. Notice the nonuniform time steps that are
used.


8.6 Stiff systems


Multiple time scales Many systems exhibit dynamics that oc-
cur on vastly different time scales. Such systems are said to be stiff
(Quarteroni et al. 2014, §8.10.3).

Systems of chemical reactions : Some reactions take a few nanosec-


onds to complete, while others might take milliseconds.

Atmospheric evolution : Acoustic waves occur on a scale of mil-


liseconds, while seasons are annual (and climate varies over
scales of decades/centuries or more).

Material diffusion : The characteristic time scale is t ∝ L2 , where


L is a length scale. So if L varies between 1 cm and 10 m,
then the characteristic time scales vary over a factor of 106 .

Stiff systems

Example 8.13. For some constant λ > 0, consider the ivp

y′ = −λy,   y(0) = y0 ,

which has algebraic solution y(t) = y0 e^{−λt}.
Euler's method gives (and similarly for other schemes)

yk = (1 − λh)ᵏ y0 .

For this to decay, we require λh < 2 ⇐⇒ h < 2/λ.
Example 8.14. Consider the system

u′ = −100u
v′ = −v

• The equations are uncoupled, so each equation can be solved
separately. The analysis of Example 8.13 requires that h <
0.02 for the first ode, and h < 2 for the second ode.

• If these equations were to be solved as a system, then we


would require
h < min(0.02, 2) = 0.02.

Stiff systems: ‘take-home message’ The fastest component in


the system limits the time step for all components.

This is often prohibitive, especially if we are mainly interested in


the long-term evolution.


Implicit Euler method Suppose we use the approximation

(yk+1 − yk )/h ≈ f (tk+1 , yk+1 ),

where yk ≈ y(tk ) and tk = t0 + kh. Rearranging gives

yk+1 − hf (tk+1 , yk+1 ) = yk .

This is an implicit formula because we now have to solve a generally
nonlinear equation for yk+1 .

Example 8.15. Reconsider the ivp y′ = −λy, y(0) = y0 , for
constant λ > 0, which has the solution y(t) = y0 e^{−λt}. The implicit
Euler method gives

yk = y0 /(1 + λh)ᵏ .
This decays for every h > 0. For big time-step h the scheme will
be quantitatively inaccurate, but at least it will be qualitatively
correct—and not constraining.
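For a scalar ode y′ = f (t, y), a hedged sketch of one implicit Euler
step using Matlab's root-finder fzero (the explicit-Euler predictor
as starting guess is my choice):

% sketch: one implicit Euler step, solving y - h*f(t+h,y) = yk
guess = yk + h*f(tk, yk);                    % explicit predictor
yk = fzero(@(y) y - h*f(tk+h, y) - yk, guess);
tk = tk + h;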

Matlab solvers for stiff systems of odes These include ode23s


and ode15s which use implicit methods:
• the huge advantage is that time-steps are not constrained by
the fast variables;
• the cost is extra computation in solving nonlinear equations
for yk+1 at each time-step.
Alternatively, mathematical modelling may eliminate the unwanted
fast modes. For example, fluids are often assumed to be incom-
pressible in order to eliminate fast acoustic waves.
Current research in the School of Maths aims to computationally
do the equivalent.

8.7 Boundary value problems


Boundary value problems Boundary value problems consist of
a differential equation subject to certain conditions that are specified
at different values of the independent variable (Quarteroni et al.
2014, Ch. 9).

Example 8.16. Consider the ode for u(t)

u″ + 4u = 0,   0 < t < π/4,

subject to u(0) = 0 and u(π/4) = 1.
The solution is u = sin 2t.


Numerical solution of boundary value problems We briefly look


at

• shooting, and

• finite-difference methods (Quarteroni et al. 2014, §9.2.1–2).

Shooting

Example 8.17. Write the second-order ode from the previous
example as a system of two first-order odes

[ y1′ ; y2′ ] = [ y2 ; −4y1 ]

where y1 = u and y2 = u′. In principle, we could integrate using
the fourth-order Runge–Kutta scheme. However, we only have one
initial condition, that y1 (0) = u(0) = 0.

The approach guesses the other initial value.

Thus for some guessed x, we set y2 (0) = x and integrate forward
to obtain y1 (π/4), hoping that it satisfies the boundary condition
y1 (π/4) = u(π/4) = 1. If y1 (π/4) ≠ 1, then we adjust x and try
again.
again.

This amounts to solving the algebraic equation

f (x) = y1 (π/4) − 1 = 0,

where y1 (π/4) depends on x. This may be solved using Newton


iteration.

The idea extends to higher order, nonlinear equations.

One nice aspect is that we can make use of the existing computa-
tional infrastructure that we have developed, including Newton and
Runge–Kutta methods.

One disadvantage is that for many guessed initial values the solution
gets enormously large at the other boundary, which tends to ruin
Newton’s method.
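A sketch of shooting for Example 8.17, using ode45 to integrate
and fzero for the root-find (a sketch, not the course's code):

% sketch: shooting for u'' + 4u = 0, u(0) = 0, u(pi/4) = 1
odefun = @(t,y) [y(2); -4*y(1)];               % y1 = u, y2 = u'
miss = @(x) deval(ode45(odefun,[0 pi/4],[0; x]), pi/4, 1) - 1;
x = fzero(miss, 1)   % gives 2, since u = sin 2t has u'(0) = 2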

Finite difference method


Example 8.18. For the ode of Example 8.16 we seek u(t). Con-
struct a grid of points in time t:

tj = jh,   h = (π/4)/n,   j = 0, . . . , n.

Then let uj = u(tj ). Earlier in the course (Section 4.3) we showed
that

u″j = u″(tj ) = (uj−1 − 2uj + uj+1 )/h² + O(h²).

Substituting this into the differential equation and ignoring the O(h²)
error term gives

uj−1 + (4h² − 2)uj + uj+1 = 0,   j = 1, . . . , n − 1.

At the boundaries

u0 = u(0) = 0   and   un = u(π/4) = 1 .

Applying these to the finite difference equation at j = 1 and j = n−1
yields

(4h² − 2)u1 + u2 = −u0 = 0 ,
un−2 + (4h² − 2)un−1 = −un = −1 .

Then write this as the linear system


Au = b,
where u = (u1 , . . . , un−1 ), b = (0, . . . , 0, −1) and A is a sparse
tridiagonal matrix.
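A sketch assembling and solving this system (spdiags constructs
the sparse tridiagonal matrix; the grid size n is my choice):

% sketch: finite differences for u'' + 4u = 0, u(0)=0, u(pi/4)=1
n = 50; h = (pi/4)/n;
d = (4*h^2 - 2)*ones(n-1,1);                 % main diagonal
A = spdiags([ones(n-1,1) d ones(n-1,1)], -1:1, n-1, n-1);
b = [zeros(n-2,1); -1];
u = A\b;                                      % interior u1..u_{n-1}
t = (1:n-1)'*h;
plot(t, u, t, sin(2*t), '--')                 % compare with exact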
8.8 Partial differential equation for heat
Partial differential equations The finite difference method ex-
tends to the solution of partial differential equations, which have
more than one independent variable.
There are many methods for ordinary differential equations, but
there are vastly more methods for partial differential equations. We
just introduce one method on one problem to introduce some of
the main characteristics of most methods.

Heat equation
Example 8.19. Consider an insulated rod of unit length. Let
u(x, t) denote the temperature at position x and time t and sup-
pose the temperature at each end is zero (relative to the ambient
temperature). The temperature in the rod is governed by the heat
(diffusion) equation

∂u/∂t = α ∂²u/∂x² ,   0 < x < 1,   t > 0,
subject to
u(x, 0) = f (x), u(0, t) = 0, u(1, t) = 0,
where f (x) is the initial temperature distribution in the rod.


Finite difference formulation of the heat equation The domain
of the problem is 0 ≤ x ≤ 1 and 0 ≤ t. Construct a lattice-grid
(instead of h use ∆x and ∆t):

    x_j = j ∆x,   ∆x = 1/n,   j = 0, . . . , n;
    t_k = k ∆t,   k = 0, 1, . . . .

Let u_j^k = u(x_j, t_k) (the most common convention on the sub/super-
scripts). Using the finite difference approximations developed in
this course (Quarteroni et al. 2014, §9.2.6),

    ∂u/∂t at (x_j, t_k) = (u_j^{k+1} − u_j^k)/∆t + O(∆t),
    ∂^2u/∂x^2 at (x_j, t_k) = (u_{j−1}^k − 2u_j^k + u_{j+1}^k)/∆x^2 + O(∆x^2).

Substituting these into the heat equation and ignoring the O(∆t)
and O(∆x^2) error terms gives

    u_j^{k+1} = (1 − 2s)u_j^k + s(u_{j−1}^k + u_{j+1}^k),   s = α ∆t/∆x^2.   (8.1)

From the boundary conditions, u_0^k = 0 and u_n^k = 0. Starting from
the initial condition

    u_j^0 = f(x_j),

we can use (8.1) to calculate u_j^k for j = 1, . . . , n − 1 and k = 1, 2, . . . .
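For instance, here is a minimal Matlab sketch of this explicit scheme,
assuming for illustration α = 1 and f(x) = sin πx (not values from
the notes); note this explicit scheme is only stable when s ≤ 1/2,
which constrains the time step ∆t.
% Explicit time-stepping (8.1) for the heat equation.
% Assumed for illustration: alpha = 1, f(x) = sin(pi*x).
alpha = 1;  n = 20;  dx = 1/n;
dt = 0.4*dx^2/alpha;                  % chosen so that s <= 1/2
s = alpha*dt/dx^2;
x = dx*(0:n)';
u = sin(pi*x);  u([1 n+1]) = 0;       % u_j^0 = f(x_j), ends held at zero
for k = 1:200                         % march forward in time
    j = 2:n;                          % interior points (1-based indexing)
    u(j) = (1-2*s)*u(j) + s*(u(j-1) + u(j+1));
end
% for this f the exact solution is exp(-pi^2*alpha*t)*sin(pi*x)
t = 200*dt;  err = max(abs(u - exp(-pi^2*alpha*t)*sin(pi*x)))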
This and other methods for solving partial differential equations
are covered in Level III maths.


9 Monte Carlo methods


Monte Carlo methods
While nothing is more uncertain than the duration of
a single life, nothing is more certain than the average
duration of a thousand lives. Elizur Wright

Numerical methods that make extensive use of random numbers
are often referred to as Monte Carlo methods.
Here we focus only on Monte Carlo integration, which is useful for
approximating multidimensional integrals over complicated regions
(Quarteroni et al. 2014, §4.6). It is often used in finance, as the
integrals arising in financial mathematics are often very high-
dimensional.
Monte Carlo methods are ‘embarrassingly parallel ’. In particular,
they are highly vectorisable, so do vectorise your code.
For complex problems one may instead parallelise over many com-
puter cores (as in gpus), but such techniques are beyond the scope
of this course.

9.1 Random number generation


Random number generation An essential element of Monte Carlo
methods is a random number generator, such as the Matlab func-
tions rand and randn.
Random number generators typically use deterministic algorithms,
so the numbers are not really random! They are better called
pseudorandom numbers.
A linear congruential generator produces a sequence of uniform
pseudorandom integers X_k via the recurrence

    X_k = (a X_{k−1} + c) mod m,   k = 1, 2, . . . .

Example 9.1. m = 8, a = c = 3, X_0 = 0 gives what?

Useful parameters for simple purposes are a = 7^5, c = 0 and
m = 2^31 − 1 (Lewis, Goodman & Miller, 1969). A chosen integer X_0
is the seed: different seeds X_0 lead to different sequences of pseudo-
random integers; if c = 0 then choose X_0 > 0.
See randLinCon.m: in Matlab rem(now,1) is the fraction of the
current day that has elapsed.
Since 0 ≤ X_k < m, U_k = X_k/m are random real numbers uniform
on [0, 1).
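As a sketch of such a generator (randLinCon.m is the course file;
the code below is an independent illustration using the Lewis–
Goodman–Miller parameters):
% Linear congruential generator sketch (illustrative, not randLinCon.m).
m = 2^31 - 1;  a = 7^5;  c = 0;    % Lewis-Goodman-Miller parameters
X = 1;                             % seed X_0 > 0 since c = 0
U = zeros(1,5);
for k = 1:5
    X = mod(a*X + c, m);           % X_k = (a X_{k-1} + c) mod m
    U(k) = X/m;                    % uniform on [0,1)
end
disp(U)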
Although the linear congruential generator conveys the basic idea,


    the whole field of random number generation was mes-
    merized, for far too long, by the simple [linear congru-
    ential generator]   Press et al. (2007)

In practice, use better methods: in Matlab, rand and randn.

Random number generation rand(m,n) produces an m × n ma-
trix of random numbers drawn from a uniform distribution on
the interval (0, 1); rand(n) is an abbreviation for rand(n,n); and
simply rand is an abbreviation for rand(1,1).
rng(seed) sets the random number seed (a nonnegative integer).
To set a seed depending upon how many milliseconds have elapsed
in the day, use
rng( round( 1e8*rem(now,1) ))
randn is the same but generates normally distributed numbers with
mean zero and standard deviation one.

Example 9.2. To create an m × n matrix of random numbers
drawn from a uniform distribution on the interval [a, b) use
a + (b-a)*rand(m,n)

9.2 Monte Carlo integration


Monte Carlo integration Consider the integral I = ∫_a^b g(x) dx.
Since the ‘mean value’ of g(x) is (1/(b − a)) ∫_a^b g(x) dx, the
integral I = (b − a) mean(g). Suppose we generate N random
numbers X_i, uniformly distributed between a and b. Then an
estimate of the mean value of g is

    (1/N) Σ_{i=1}^N g(X_i).

Hence a Monte Carlo approximation of the integral is

    I_N = ((b − a)/N) Σ_{i=1}^N g(X_i).

Example 9.3. Consider the case g(x) = C, where C is a constant.
In this basic case I = ∫_a^b C dx = (b − a)C, whereas the Monte Carlo
approximation

    I_N = ((b − a)/N) Σ_{i=1}^N g(X_i) = ((b − a)/N) Σ_{i=1}^N C
        = ((b − a)/N) N C = (b − a)C.



Figure 2: error in Monte Carlo estimates of 0 sin x dx for various N .
The reference line is N −1/2 .
10 0

10 -1
Error

10 -2

10 -3

10 -4
10 2 10 3 10 4 10 5
N


Example 9.4. Estimate I = ∫_0^π sin x dx = 2 using Monte Carlo
integration. See mcsin.m (could also use mean()):
N I_mc e_mc e_mid
-----------------------------
100 1.9217 7.83e-02 8.22e-05
200 1.8987 1.01e-01 2.06e-05
400 1.9736 2.64e-02 5.14e-06
800 2.0025 2.54e-03 1.29e-06
1600 1.9915 8.54e-03 3.21e-07
-----------------------------
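The following is a minimal sketch in the spirit of mcsin.m (the code
below is illustrative, not the course file itself):
% Monte Carlo estimate of int_0^pi sin(x) dx = 2
N = 1000;
X = pi*rand(N,1);              % N uniform samples on [0,pi)
I_mc = (pi/N)*sum(sin(X));     % I_N = (b-a)/N * sum of g(X_i)
e_mc = abs(I_mc - 2)           % error against the exact value 2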
9.3 Multidimensional integrals

Multidimensional integrals Monte Carlo integration extends to
multi-dimensional integrals. For example, the mean/average of g(x, y)
over a rectangle [a, b] × [c, d] is

    (∫_c^d ∫_a^b g(x, y) dx dy) / ((d − c)(b − a)) ≈ (1/N) Σ_{i=1}^N g(X_i, Y_i),

and hence

    ∫_c^d ∫_a^b g(x, y) dx dy ≈ ((d − c)(b − a)/N) Σ_{i=1}^N g(X_i, Y_i),

where (X_i, Y_i) are independent random numbers uniformly dis-
tributed in the rectangle.
Similarly for higher dimensional integrals.


Example 9.5. Evaluate the integral

    I = ∫_1^4 ∫_2^5 (x^2 − y^2 + xy − 3) dx dy = 423/4 = 105.75.

A Monte Carlo estimate, with X_i uniform on [2, 5] and Y_i uniform
on [1, 4], is

    I_N = (9/N) Σ_{i=1}^N (X_i^2 − Y_i^2 + X_i Y_i − 3).

An implementation is
% first 2D MC example in notes, Oct 2018
N = 1e6;
x = 2 + 3*rand(N,1);
y = 1 + 3*rand(N,1);
g = x.^2 - y.^2 + x.*y - 3;
I = 9/N*sum(g)

Example 9.6. Find the area of the region Ω defined by

    0 ≤ x ≤ 1,   10 ≤ y ≤ 13,   y ≥ 12 cos x,   y ≥ 10 + x^3.

• Among N random points in the rectangle 0 ≤ x ≤ 1 and
10 ≤ y ≤ 13, the area of Ω is the fraction of random points
that lie in Ω times the area of the rectangle. That is, a Monte
Carlo estimate is A_N = (3/N) Σ_{(X_i,Y_i)∈Ω} 1.
% Example MC.7 in notes, Oct 2018
N = 1e6;
x = rand(N,1);
y = 10 + 3*rand(N,1);
H = y>=12*cos(x) & y>=10+x.^3 ;
A = 3/N*sum(H)
• Equivalently, let H(x, y) = 1 for (x, y) ∈ Ω, and 0 otherwise;
then the area (as above) is

    A = ∫_{10}^{13} ∫_0^1 H(x, y) dx dy
      ≈ A_N = (3/N) Σ_{i=1}^N H(X_i, Y_i) = (3/N) Σ_{(X_i,Y_i)∈Ω} 1.


Example 9.7. Extend the previous Example 9.6 to estimate
∫∫_Ω g(x, y) dx dy for the function g(x, y) = x^2 − y^2 + xy − 3, say.
% Example MC.7 in notes, extended
% iint_Omega g dx dy
N = 1e4;
x = rand(N,1);
y = 10 + 3*rand(N,1);
i = find( y>=12*cos(x) & y>=10+x.^3 );
gi = x(i).^2 - y(i).^2 + x(i).*y(i) - 3;
intg = 3/N*sum(gi)

Comparison with midpoint integration Consider the integral
I = ∫_0^1 ∫_0^1 f(x, y) dx dy. A midpoint rule approximation of this
integral is

    I_N = h^2 Σ_{i=1}^n Σ_{j=1}^n f(x_i, y_j),

where h = 1/n. The total number of points is N = n^2. The midpoint
error is

    O(h^2) = O(n^{−2}) = O((N^{1/2})^{−2}) = O(N^{−1}).

For a d-dimensional integral, the total number of points is N = n^d
and the error is

    O(h^2) = O(n^{−2}) = O((N^{1/d})^{−2}) = O(N^{−2/d}).

The error in Monte Carlo integration is proportional to N^{−1/2},
regardless of dimension d. So Monte Carlo integration becomes a
more attractive alternative to midpoint integration for dimension
d > 4.

Further, methods such as the midpoint, trapezoidal and Simpson’s
rules require the integrand to be smooth; Monte Carlo places no
such constraint and is valid for very ‘rough’ integrands.
9.4 Theoretical support
Random variables A random variable X is a real number deter-
mined by the random outcome of an experiment.
A discrete random variable is restricted to certain discrete values
(typically from counting things).
A continuous random variable can take any real value in a given
interval (typically from measuring something).

Probability density function For a continuous random vari-
able, the probability density function (pdf) p(x) characterises the
probability that numbers near x are generated. It is defined via

    approximately,   Pr(x ≤ X ≤ x + ∆x) = p(x) ∆x;
    more precisely,   p(x) = lim_{∆x→0} Pr(x ≤ X ≤ x + ∆x) / ∆x.


Example 9.8 (Uniform distribution). The uniform distribution on
the interval (a, b) is denoted by U(a, b). The pdf is

    p(x) = 1/(b − a) for a ≤ x ≤ b,   and p(x) = 0 otherwise.

[Plot: the pdf p(x) of U(0, 1), a rectangle of height one on 0 ≤ x ≤ 1.]

Expected value and variance For a continuous random vari-
able X, the expected value of Y = g(X) is

    E[Y] = ∫_{−∞}^{∞} g(x) p(x) dx.

The variance of Y is

    var[Y] = E[(Y − E[Y])^2] = E[Y^2] − (E[Y])^2.

The standard deviation of Y is

    σ[Y] = √(var[Y]).

Linear combinations c_1 Y_1 + c_2 Y_2 For Y_j = g_j(X), the expected
value

    E[c_1 Y_1 + c_2 Y_2] = c_1 E[Y_1] + c_2 E[Y_2],

and similarly for arbitrarily many components.

Monte Carlo integration Consider the integral

    I = ∫_{−∞}^{∞} g(x) dx.

Independently sample N points X_i, i = 1, . . . , N, from a distri-
bution with pdf p(x). Estimate the integral I with the random
variable

    I_N = (1/N) Σ_{i=1}^N g(X_i)/p(X_i).

Since E[g(X_i)/p(X_i)] = ∫ (g(x)/p(x)) p(x) dx = I, the estimate I_N
is an unbiased estimate of the integral I. But how big are the
errors? We answer soon via the variance.


Monte Carlo integration with uniform sampling

Example 9.9. Consider the integral I = ∫_a^b g(x) dx. In the case
where p(x) corresponds to a uniform distribution over [a, b], then
p(x) = 1/(b − a) for a ≤ x ≤ b and so

    I_N = (1/N) Σ_{i=1}^N g(X_i)(b − a) = ((b − a)/N) Σ_{i=1}^N g(X_i),

which is what we earlier derived.

Example 9.10 (examinations). Let the domain of knowledge ad-
dressed by a course be D, and let g(x) be a student’s knowledge
of x ∈ D. An examination aims to estimate the student’s total
knowledge I = ∫_D g(x) dx.
• The examiner chooses N ‘questions’, X_i, sampled from all
possibilities but in accordance with his/her view of the im-
portance p(x) of various aspects of the domain D.
• Then the examination estimates a student’s knowledge I with
the Monte Carlo I_N = (1/N) Σ_{i=1}^N g(X_i)/p(X_i).
• That is, the examiner may well ask more questions on ‘impor-
tant’ topics, but should allocate fewer marks to each of those
questions!
• A student, as we soon see, should try to learn topics so that
her/his knowledge g(x) ∝ p(x), the examiner’s view.

Accuracy of Monte Carlo integration The expected value, vari-
ance and standard deviation of I_N are

    E[I_N] = E[(1/N) Σ_{i=1}^N g(X_i)/p(X_i)] = · · · = I,
    var[I_N] = var[(1/N) Σ_{i=1}^N g(X_i)/p(X_i)] = (1/N) var[g(X)/p(X)],
    σ[I_N] = (1/√N) σ[g(X)/p(X)].

To establish these we need one more variance property: provided Y_1
and Y_2 are independent, defined as requiring E[Y_1 Y_2] = E[Y_1]E[Y_2],
the variance

    var[c_1 Y_1 + c_2 Y_2] = c_1^2 var[Y_1] + c_2^2 var[Y_2].

• The standard deviation of estimates I_N about their average I
decreases like 1/√N as N gets larger.
• Also, the constant is made smaller if we choose a random
variable with pdf p(x) so that g(x)/p(x) is as constant as
feasible; this is called variance reduction.
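
For instance, here is a small sketch of variance reduction (an assumed
example, not from the notes): estimate I = ∫_0^1 e^x dx = e − 1,
comparing uniform sampling with importance sampling from the
pdf p(y) = (1 + y)/1.5, which is roughly proportional to e^y on [0, 1]
and is easy to sample via its inverse CDF.
% Variance reduction sketch (assumed example, not from the notes):
% I = int_0^1 exp(x) dx = e - 1, by plain and importance sampling.
N = 1e5;
X  = rand(N,1);
Iu = mean(exp(X));                  % plain Monte Carlo: p(x) = 1
U  = rand(N,1);
Y  = -1 + sqrt(1 + 3*U);            % inverse-CDF samples, pdf (1+y)/1.5
Ii = mean( exp(Y) ./ ((1+Y)/1.5) ); % I_N = mean of g(Y)/p(Y)
exact = exp(1) - 1;
[abs(Iu-exact), abs(Ii-exact)]      % second error is usually smaller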


Estimate integral with error

Possible Monte Carlo algorithm For a large enough N:

1. compute ten estimates I_{N/10} by generating ten sets of N/10
random data points;
2. estimate I_N = mean(I_{N/10}), with standard deviation σ_N ≈
std(I_{N/10})/√10.

For N sufficiently large, the central limit theorem implies

    Pr( |I_N − I| < 3σ_N ) ≈ 0.997.
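
A minimal Matlab sketch of this algorithm, again applied to
∫_0^π sin x dx (an illustrative choice of integrand):
% Ten-batch Monte Carlo estimate with error bar
N = 1e5;  nb = 10;                   % nb batches of N/nb points each
Ib = zeros(nb,1);
for b = 1:nb
    X = pi*rand(N/nb,1);
    Ib(b) = (pi*nb/N)*sum(sin(X));   % one estimate I_{N/10}
end
I_N = mean(Ib)
sigma_N = std(Ib)/sqrt(nb)
% with probability about 0.997, |I_N - I| < 3*sigma_N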


A Summary
A.1 Introduction
Empty.

A.2 Matlab
Matlab

• Write or interpret short Matlab scripts and functions using
the language elements used in practicals, assignments and
lectures.
• Write Matlab code in terms of vectors and arrays/matrices.

A.3 Interpolation
Interpolation

• Apply Lagrange polynomial interpolation formula for interpo-
lation, integration or differentiation of discrete data.
• Apply polynomial interpolation error theorem or Taylor’s the-
orem to estimate or bound the error. Use error bounds to
choose parameters that will satisfy error specifications.

A.4 Integration and differentiation

Numerical integration and differentiation

• Derive the midpoint, trapezoidal and Simpson’s rules using
the Lagrange polynomial interpolation formula.
• Derive error bounds for integration formulae using Lagrange’s
Remainder theorem and/or the Polynomial Interpolation Er-
ror theorem.
• Derive finite difference formulae and estimate error using
Lagrange polynomial interpolation formula or Remainder
theorem.
• Understand error scaling O(h^p) as h → 0 and interpret error
plots.

A.5 Spline interpolation

Spline interpolation

• Given a set of basis functions, construct a curve that inter-
polates the data by solving linear equations for unknown
coefficients.
• Use cubic splines to interpolate discrete data to ensure con-
tinuous first and second derivatives.
• Use radial basis functions to interpolate scattered multidi-
mensional data.


A.6 Numerical linear algebra


Numerical linear algebra
• Never invert a matrix.
• Never write your own Gaussian Elimination code.
• To solve Ax = b :
– if condest(A) is not bad (not large), then first try back-
slash or LU factorisation;
– if condest(A) is bad or at the slightest other difficulty,
then use QR factorisation.
• Understand and use basic properties of orthogonal matrices.
• Use QR factorisation to solve least-squares curve fitting prob-
lems.
• Understand matrix norms and the condition number and use
them to derive error bounds for the solution of linear systems.
• Use the condition number and error bounds to estimate errors
in the solution of linear systems.
• Apply Jacobi iteration for given linear systems.
• Estimate and prove the rate of convergence of the Jacobi
method.

A.7 Nonlinear equations


Nonlinear equations
• Apply fixed-point iteration to solve nonlinear equations.
• Use Lagrange Remainder for Taylor’s series to prove conver-
gence of fixed-point iteration.
• Derive formulae for Newton iteration for one or two variables
using Taylor’s theorem.
• Apply Newton iteration to solve one or more nonlinear equa-
tions.
• Use the Lagrange Remainder to prove second-order conver-
gence of Newton iteration.

A.8 Ordinary differential equations


Ordinary differential equations
• Use Taylor’s theorem to derive ode solvers.
• Estimate the order of the error for ode solvers.
• Apply ode solvers to solve first-order odes, or systems of
first-order odes.


• Apply ode solvers to solve higher-order odes by converting
to a system of first-order odes.
• Understand the effect of step-size on the numerical solution
of odes by analysing model problems.
• Derive and solve boundary value problems.
A.9 Monte Carlo methods
• rand and randn generate high quality pseudo-random num-
bers.
• An N-point Monte Carlo approximation of the 1D integral
I = ∫_a^b g(x) dx is I_N = ((b − a)/N) Σ_{i=1}^N g(X_i) for random
X_i ∼ U[a, b]. And analogously for multi-dimensional integrals.
• The error in Monte Carlo integration is ∝ 1/√N.
