MAT1841notesfor2024 S2
Lecture Notes
2024 Semester 2
Contents
2 Matrices
2.1 Introduction - notation and operations
2.1.1 Operations on matrices
2.1.2 Some special matrices
2.1.3 Properties of matrices
2.1.4 Inverses of square matrices
2.2 Gaussian Elimination
2.2.1 Gaussian elimination strategy
2.2.2 Exceptions
2.3 Systems of equations using matrices
2.3.1 The augmented matrix
2.4 Row echelon form
2.4.1 Rank
2.4.2 Homogeneous systems
2.4.3 Summary
2.5 Matrix Inverse
2.6 Matrix Transpose
2.7 Determinants
2.7.1 Properties of determinants
2.7.2 Vector cross product using determinants
2.7.3 Cramer's rule
2.8 Obtaining inverses using Gauss-Jordan elimination
2.8.1 Inverse - another method
3 Calculus
3.1 Differentiation
3.1.1 Rate of change
3.1.2 Definition of the derivative $f'(x)$ and the slope of a tangent line
3.1.3 Techniques of differentiation - rules
3.2 Maximum and minimum of functions
3.3 Differentiating inverse, circular and exponential functions
3.3.1 Inverse functions and their derivatives
3.3.2 Exponential and logarithmic functions: $e^x$ and $\ln x$
3.3.3 Derivatives of circular functions
3.4 Higher order derivatives
3.5 Parametric curves and differentiation
3.5.1 Parametric curves
3.5.2 Parametric differentiation
3.6 Function approximations
3.6.1 Introduction to power series
3.6.2 Power series
3.6.3 Taylor series
3.6.4 Derivation of Taylor polynomials from first principles
3.6.5 Taylor series centred at $x \neq 0$
3.6.6 Cubic splines interpolation
4 Integration
4.1 Fundamental theorem of calculus
4.1.1 Revision
4.1.2 Fundamental Theorem of Calculus
4.2 Area under the curve
4.3 Trapezoidal rule
Chapter 1
Common forms of vector notation are bold symbols ($\mathbf{v}$), arrow notation ($\vec{v}$) and tilde notation ($\underset{\sim}{v}$). Throughout this Study Guide we will use the tilde notation. This notation compares suitably with the handwritten notation for vectors. Points in space are represented by a capital letter (for example the point $P$). Note that capital letters are also used for matrices, but the context makes clear which kind of object is meant, so this does not lead to ambiguity.
————————————————–
Vectors can be defined in (at least) two ways: algebraically as objects like
$\underset{\sim}{v} = (1, 7, 3)$
$\underset{\sim}{u} = (2, -1, 4)$
or geometrically as arrows in space.
Note that vectors have both magnitude and direction. A quantity specified only by a number (but no direction) is known as a scalar.
How can we be sure that these two definitions actually describe the same object? Equally, how do we convert from one form to the other? That is, given $\underset{\sim}{v} = (1, 2, 7)$ how do we draw the arrow and likewise, given the arrow how do we extract the numbers (1, 2, 7)?
Suppose we are given two points P and Q. Suppose also that we find the
change in coordinates from P to Q is (say) (1, 2, 7). We could also draw an
arrow from P to Q. Thus we have two ways of recording the path from P
to Q, either as the numbers (1, 2, 7) or the arrow.
Suppose now that we have another pair of points R and S and further that
we find the change in coordinates to be (1, 2, 7). Again, we can join the
points with an arrow. This arrow will have the same direction and length
as that for P to Q.
In both cases, the displacement, from start to finish, is represented by
either the numbers (1, 2, 7) or the arrow – thus we can use either form to
represent the vector. Note that this means that a vector does not live at
any one place in space – it can be moved anywhere provided its length and
direction are unchanged.
To extract the numbers (1, 2, 7) given just the arrow, simply place the arrow somewhere in the x, y, z space, and then measure the change in coordinates from tail to tip of the vector. Equally, to draw the vector given the numbers (1, 2, 7), choose (0, 0, 0) as the tail then the point (1, 2, 7) is the tip.
The components of a vector are just the numbers we use to describe the vector. In the above, the components of $\underset{\sim}{v}$ are 1, 2 and 7.
Another very common way to write a vector, such as $\underset{\sim}{v} = (1, 7, 3)$ for example, is $\underset{\sim}{v} = 1\underset{\sim}{i} + 7\underset{\sim}{j} + 3\underset{\sim}{k}$. The three vectors $\underset{\sim}{i}, \underset{\sim}{j}, \underset{\sim}{k}$ are a simple way to remind us that the three numbers in $\underset{\sim}{v} = (1, 7, 3)$ refer to directions parallel to the three coordinate axes (with $\underset{\sim}{i}$ parallel to the x-axis, $\underset{\sim}{j}$ parallel to the y-axis and $\underset{\sim}{k}$ parallel to the z-axis).
In this way we can always write down any 3-dimensional vector as a linear combination of the vectors $\underset{\sim}{i}, \underset{\sim}{j}, \underset{\sim}{k}$ and thus these vectors are also known as basis vectors.
Two or more vectors are linearly independent if we cannot take any one of the vectors and write it as a linear combination of the others. We cannot write $\underset{\sim}{i}$ as a linear combination of $\underset{\sim}{j}$ and $\underset{\sim}{k}$. In other words there are no scalars α and β such that $\underset{\sim}{i} = \alpha\underset{\sim}{j} + \beta\underset{\sim}{k}$. Thus the basis vectors $\underset{\sim}{i}, \underset{\sim}{j}, \underset{\sim}{k}$ are linearly independent.
If we can take one of the vectors and write it as a linear combination of the others, then the vectors are known as linearly dependent. For example the vectors $\underset{\sim}{u} = (7, 17, -3)$, $\underset{\sim}{v} = (1, 2, 3)$ and $\underset{\sim}{w} = (3, 7, 1)$ are linearly dependent as $\underset{\sim}{u} = 3\underset{\sim}{w} - 2\underset{\sim}{v}$.
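The dependence claimed for u, v and w can be checked component by component. The following is a minimal Python sketch; the helper names `scale` and `sub` are illustrative choices and do not come from these notes.

```python
# Check that u = 3w - 2v using plain tuples (no libraries needed).

def scale(c, a):
    """Multiply every component of the vector a by the scalar c."""
    return tuple(c * x for x in a)

def sub(a, b):
    """Componentwise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))

u = (7, 17, -3)
v = (1, 2, 3)
w = (3, 7, 1)

combo = sub(scale(3, w), scale(2, v))  # the linear combination 3w - 2v
print(combo == u)
```

Because one vector equals a linear combination of the other two, the three vectors are linearly dependent.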
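• Equality
$\underset{\sim}{v} = \underset{\sim}{w}$ only when the arrows for $\underset{\sim}{v}$ and $\underset{\sim}{w}$ are identical.
magnitude of $\lambda\underset{\sim}{v}$ is $|\lambda|$ times the magnitude of $\underset{\sim}{v}$.
• Addition
To add two vectors $\underset{\sim}{v}$ and $\underset{\sim}{w}$ arrange the two so that they are tip to tail. Then $\underset{\sim}{v} + \underset{\sim}{w}$ is the vector that starts at the first tail and ends at the second tip. Thus the sum of two vectors $\underset{\sim}{v}$ and $\underset{\sim}{w}$ is the displacement vector resulting from first applying $\underset{\sim}{v}$ then $\underset{\sim}{w}$.
• Subtraction
The difference $\underset{\sim}{v} - \underset{\sim}{w}$ of two vectors $\underset{\sim}{v}$ and $\underset{\sim}{w}$ is the displacement vector resulting from first applying $\underset{\sim}{v}$ then $-\underset{\sim}{w}$. Note that $-\underset{\sim}{w}$ is simply the vector $\underset{\sim}{w}$ reversed, so that it points in the opposite direction to $\underset{\sim}{w}$.
Example 1.1. Express each of the above rules in terms of the components of vectors (i.e. in terms of numbers like (1, 2, 7) and (a, b, c)).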
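Example 1.3. Given $\underset{\sim}{v} = (1, 2, 7)$ draw $\underset{\sim}{v}$, $2\underset{\sim}{v}$ and $-\underset{\sim}{v}$.
$$\underset{\sim}{v} \cdot \underset{\sim}{w} = v_x w_x + v_y w_y + v_z w_z.$$
Example 1.5. Let $\underset{\sim}{v} = (1, 2, 7)$ and $\underset{\sim}{w} = (-1, 3, 4)$. Compute $\underset{\sim}{v} \cdot \underset{\sim}{v}$, $\underset{\sim}{w} \cdot \underset{\sim}{w}$ and $\underset{\sim}{v} \cdot \underset{\sim}{w}$.
What do we observe?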
1.2.1 Length of a vector
Example 1.6. Let $\underset{\sim}{v} = (1, 2, 7)$. Compute the distance from (0, 0, 0) to (1, 2, 7). Compare this with $\sqrt{\underset{\sim}{v} \cdot \underset{\sim}{v}}$.
This gives us a convenient way to compute the angle between any pair of vectors. If we find $\cos\theta = 0$ then we can say that $\underset{\sim}{v}$ and $\underset{\sim}{w}$ are orthogonal (perpendicular).
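The dot product, the length and the angle can be computed together. The sketch below assumes the relation $\cos\theta = \underset{\sim}{v} \cdot \underset{\sim}{w} / (|\underset{\sim}{v}||\underset{\sim}{w}|)$ from the dot product summary; the function names `dot`, `length` and `angle` are illustrative only.

```python
import math

def dot(a, b):
    """Dot product: the sum of the products of matching components."""
    return sum(x * y for x, y in zip(a, b))

def length(a):
    """Length of a vector, computed as sqrt(a . a)."""
    return math.sqrt(dot(a, a))

def angle(a, b):
    """Angle in radians between two non-zero vectors."""
    c = dot(a, b) / (length(a) * length(b))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against rounding

v = (1, 2, 7)
w = (-1, 3, 4)
print(dot(v, w), length(v), math.degrees(angle(v, w)))
```

An angle of exactly a quarter turn, as for (1, 0, 0) and (0, 1, 0), corresponds to a dot product of zero.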
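• Vectors $\underset{\sim}{v}$ and $\underset{\sim}{w}$ are orthogonal when $\underset{\sim}{v} \cdot \underset{\sim}{w} = 0$ (provided neither $\underset{\sim}{v}$ nor $\underset{\sim}{w}$ is the zero vector).
Example 1.7. Find the angle between the vectors $\underset{\sim}{v} = (2, 7, 1)$ and $\underset{\sim}{w} = (3, 4, -2)$.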
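A vector is said to be a unit vector if its length is one. That is, $\underset{\sim}{v}$ is a unit vector when $\underset{\sim}{v} \cdot \underset{\sim}{v} = 1$. The notation for a unit vector is $\hat{\underset{\sim}{v}}$ (called 'v hat').
Unit vectors are calculated by:
$$\hat{\underset{\sim}{v}} = \frac{\underset{\sim}{v}}{|\underset{\sim}{v}|}$$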
This is simply the length of the shadow cast by one vector onto another.
1.2.4 Vector projection
This time we produce a vector shadow with length equal to the scalar projection.
Example 1.10. This example shows how a vector may be resolved into its parts parallel and perpendicular to another vector.
Given $\underset{\sim}{v} = (1, 2, 7)$ and $\underset{\sim}{w} = (2, 3, 4)$ express $\underset{\sim}{v}$ in terms of $\underset{\sim}{w}$ and a vector perpendicular to $\underset{\sim}{w}$.
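One way to carry out such a resolution is sketched below. It assumes the standard formula for the vector projection, $\frac{\underset{\sim}{v} \cdot \underset{\sim}{w}}{\underset{\sim}{w} \cdot \underset{\sim}{w}}\,\underset{\sim}{w}$, which is not written out in the text above; the function name `resolve` is hypothetical. Exact fractions are used so that the perpendicularity check is not affected by rounding.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def resolve(v, w):
    """Split v into a part parallel to w and a part perpendicular to w."""
    c = Fraction(dot(v, w), dot(w, w))            # coefficient multiplying w
    parallel = tuple(c * x for x in w)            # vector projection of v onto w
    perpendicular = tuple(a - p for a, p in zip(v, parallel))
    return parallel, perpendicular

par, perp = resolve((1, 2, 7), (2, 3, 4))
print(par, perp, dot(perp, (2, 3, 4)))
```

The two parts add back to the original vector, and the second part has zero dot product with $\underset{\sim}{w}$.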
Vector Dot Product - Summary
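Then
$$\cos\theta = \frac{\underset{\sim}{v} \cdot \underset{\sim}{w}}{|\underset{\sim}{v}||\underset{\sim}{w}|}$$
$$\underset{\sim}{v} \cdot \underset{\sim}{w} = 0$$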
The vector cross product is another way to multiply vectors. We start with vectors $\underset{\sim}{v} = (v_x, v_y, v_z)$ and $\underset{\sim}{w} = (w_x, w_y, w_z)$. Then we define the cross product $\underset{\sim}{v} \times \underset{\sim}{w}$ by
$$\underset{\sim}{v} \times \underset{\sim}{w} = (v_y w_z - v_z w_y,\; v_z w_x - v_x w_z,\; v_x w_y - v_y w_x)$$
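From this definition we can observe
• $\underset{\sim}{v} \times \underset{\sim}{w}$ is a vector
• $\underset{\sim}{v} \times \underset{\sim}{w} = -\underset{\sim}{w} \times \underset{\sim}{v}$
• $\underset{\sim}{v} \times \underset{\sim}{v} = \underset{\sim}{0} = (0, 0, 0)$ (the zero vector)
• $(\lambda\underset{\sim}{v}) \times \underset{\sim}{w} = \lambda(\underset{\sim}{v} \times \underset{\sim}{w})$
• $(\underset{\sim}{a} + \underset{\sim}{b}) \times \underset{\sim}{v} = \underset{\sim}{a} \times \underset{\sim}{v} + \underset{\sim}{b} \times \underset{\sim}{v}$
• $(\underset{\sim}{v} \times \underset{\sim}{w}) \cdot \underset{\sim}{v} = (\underset{\sim}{v} \times \underset{\sim}{w}) \cdot \underset{\sim}{w} = 0$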
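The observations listed above can be spot-checked numerically. The sketch below implements the component definition of the cross product directly; the sample vectors are chosen only for illustration, and a check on one pair of vectors is of course not a proof.

```python
def cross(v, w):
    """Cross product from the component definition."""
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

v = (1, 2, 7)
w = (-1, 3, 4)
c = cross(v, w)

print(c)                                        # a vector with three components
print(cross(w, v) == tuple(-x for x in c))      # v x w = -(w x v)
print(cross(v, v))                              # the zero vector
print(dot(c, v), dot(c, w))                     # both dot products vanish
```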
1.3.1 Interpreting the cross product
We know that $\underset{\sim}{v} \times \underset{\sim}{w}$ is a vector and we know how to compute it. But can we describe this vector? First we need a vector, so let's assume that $\underset{\sim}{v} \times \underset{\sim}{w} \neq \underset{\sim}{0}$. Then what can we say about the direction and length of $\underset{\sim}{v} \times \underset{\sim}{w}$?
The first thing we should note is that the cross product is a vector which is orthogonal to both of the original vectors. Thus $\underset{\sim}{v} \times \underset{\sim}{w}$ is a vector that is orthogonal to $\underset{\sim}{v}$ and to $\underset{\sim}{w}$. This fact follows from the definition of the cross product.
For any choice of $\underset{\sim}{v}$ and $\underset{\sim}{w}$ you can see that there are two choices for $\underset{\sim}{n}$: one points in the opposite direction to the other. Which one do we choose? It's up to us to make a hard rule. This is it. Place your right hand palm so that your fingers curl over from $\underset{\sim}{v}$ to $\underset{\sim}{w}$. Your thumb then points in the direction of $\underset{\sim}{v} \times \underset{\sim}{w}$.
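$$|\underset{\sim}{v} \times \underset{\sim}{w}| = \lambda = |\underset{\sim}{v}||\underset{\sim}{w}| \sin\theta$$
How? First we build a triangle from $\underset{\sim}{v}$ and $\underset{\sim}{w}$ and then compute the cross product for each pair of vectors
$$\underset{\sim}{v} \times \underset{\sim}{w} = \lambda_\theta \underset{\sim}{n}$$
$$(\underset{\sim}{v} - \underset{\sim}{w}) \times \underset{\sim}{v} = \lambda_\phi \underset{\sim}{n}$$
$$(\underset{\sim}{v} - \underset{\sim}{w}) \times \underset{\sim}{w} = \lambda_\rho \underset{\sim}{n}$$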
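(one λ for each of the three vertices). We need to compute each λ.
Now since $(\beta\underset{\sim}{v}) \times \underset{\sim}{w} = \beta(\underset{\sim}{v} \times \underset{\sim}{w})$ for any number β we must have $\lambda_\theta$ in $\underset{\sim}{v} \times \underset{\sim}{w} = \lambda_\theta \underset{\sim}{n}$ proportional to $|\underset{\sim}{v}||\underset{\sim}{w}|$, likewise for the other λ's. Thus
$$\lambda_\theta = |\underset{\sim}{v}||\underset{\sim}{w}|\,\alpha_\theta$$
$$\lambda_\phi = |\underset{\sim}{v}||\underset{\sim}{v} - \underset{\sim}{w}|\,\alpha_\phi$$
$$\lambda_\rho = |\underset{\sim}{w}||\underset{\sim}{v} - \underset{\sim}{w}|\,\alpha_\rho$$
where each α depends only on the angle between the two vectors on which it was built (i.e. $\alpha_\phi$ depends only on the angle φ between $\underset{\sim}{v}$ and $\underset{\sim}{v} - \underset{\sim}{w}$).
But we also have $\underset{\sim}{v} \times \underset{\sim}{w} = (\underset{\sim}{v} - \underset{\sim}{w}) \times \underset{\sim}{v} = (\underset{\sim}{v} - \underset{\sim}{w}) \times \underset{\sim}{w}$ which implies that $\lambda_\theta = \lambda_\phi = \lambda_\rho$ which in turn gives us
$$\frac{\alpha_\theta}{|\underset{\sim}{v} - \underset{\sim}{w}|} = \frac{\alpha_\phi}{|\underset{\sim}{w}|} = \frac{\alpha_\rho}{|\underset{\sim}{v}|}$$
But we also have the Sine Rule for triangles
$$\frac{\sin\theta}{|\underset{\sim}{v} - \underset{\sim}{w}|} = \frac{\sin\phi}{|\underset{\sim}{w}|} = \frac{\sin\rho}{|\underset{\sim}{v}|}$$
and so
$$\alpha_\theta = k\sin\theta, \quad \alpha_\phi = k\sin\phi, \quad \alpha_\rho = k\sin\rho$$
where k is a number that does not depend on any of the angles nor on any of the lengths of the edges; the value of k is the same for every triangle. We can choose a trivial case to compute k, simply put $\underset{\sim}{v} = (1, 0, 0)$ and $\underset{\sim}{w} = (0, 1, 0)$. Then we find k = 1.
$$|\underset{\sim}{v} \times \underset{\sim}{w}| = |\underset{\sim}{v}||\underset{\sim}{w}| \sin\theta$$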
————————————————————
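Example 1.13. Show that $|\underset{\sim}{v} \times \underset{\sim}{w}|$ also equals the area of the parallelogram formed by $\underset{\sim}{v}$ and $\underset{\sim}{w}$.
Vector Cross Product - Summary
$$\underset{\sim}{v} \times \underset{\sim}{w} = (v_y w_z - v_z w_y,\; v_z w_x - v_x w_z,\; v_x w_y - v_y w_x)$$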
1.4 Lines in 3-dimensional space
Through any pair of distinct points we can always construct a straight line.
These lines are normally drawn to be infinitely long in both directions.
Example 1.14. Find all points on the line joining (2, 4, 0) and (2, 4, 7).
Example 1.15. Find all points on the line joining (2, 0, 0) and (2, 4, 7).
Example 1.16. Find the equation of the line joining the two points (1, 7, 3)
and (2, 0, −3).
Example 1.18. In some cases you may find a small problem with the form
suggested in the previous example. What is that problem and how would
you deal with it?
Example 1.19. Determine if the line defined by the points (1, 0, 1) and
(1, 2, 0) intersects with the line defined by the points (3, −1, 0) and (1, 2, 5).
Example 1.20. Is the line defined by the points (3, 7, −1) and (2, −2, 1) parallel to the line defined by the points (1, 4, −1) and (0, −5, 1)?
Example 1.21. Is the line defined by the points (3, 7, −1) and (2, −2, 1) parallel to the line defined by the points (1, 4, −1) and (−2, −23, 5)?
1.4.1 Vector equation of a line
Note that
(a, b, c) = the vector to one point (P) on the line
(p, q, r) = the vector from the first point to the second point on the line (P to Q)
          = a vector parallel to the line
Let's relabel these and put $\underset{\sim}{d} = (a, b, c)$, $\underset{\sim}{v} = (p, q, r)$ and $\underset{\sim}{r}(t) = (x(t), y(t), z(t))$, then
$$\underset{\sim}{r}(t) = \underset{\sim}{d} + t\underset{\sim}{v}$$
This is known as the vector equation of a line.
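The vector equation translates directly into a small function. The sketch below builds the direction vector from two points (those of Example 1.16 are used purely as sample data) and evaluates $\underset{\sim}{r}(t)$; the name `line_point` is illustrative.

```python
def line_point(d, v, t):
    """Return the point r(t) = d + t v on the line."""
    return tuple(di + t * vi for di, vi in zip(d, v))

P = (1, 7, 3)
Q = (2, 0, -3)
direction = tuple(q - p for p, q in zip(P, Q))  # vector from P to Q

print(direction)
print(line_point(P, direction, 0))  # t = 0 gives the first point
print(line_point(P, direction, 1))  # t = 1 gives the second point
```

Other values of t, including negative ones, give the remaining points of the line.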
Example 1.22. Write down the vector equation of the line that passes
through the points (1, 2, 7) and (2, 3, 4).
Example 1.23. Write down the vector equation of the line that passes
through the points (2, 3, 7) and (4, 1, 2).
Lines in $\mathbb{R}^3$
1.5 Planes in 3-dimensional space
ax + by + cz = d
where a, b, c and d are numbers that distinguish this plane from all other planes. (There are other ways to write an equation for a plane, as we shall see.)
A plane is uniquely determined by any three points (provided the three points do not all lie on a single line). Recall that a line is fully determined by any pair of points on the line.
We can find the equation of the plane that passes through the three points (1, 0, 0), (0, 3, 0) and (0, 0, 2). To do this we need to compute a, b, c and d. We do this by substituting each point into the above equation, which gives a = d, 3b = d and 2c = d.
Example 1.25. What equation do you get if you choose d = 1 in the previous example? What happens if you choose d = 0?
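As a cross-check on the substitution approach, the plane through three points can also be found from a normal vector. Note that this sketch uses a different technique from the text: it takes the cross product of two edge vectors as the normal (a, b, c) and then fixes d from one of the points. The name `plane_through` is hypothetical.

```python
def cross(v, w):
    """Cross product from the component definition."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def plane_through(P, Q, R):
    """Return (a, b, c, d) with ax + by + cz = d through P, Q and R.

    Assumes the three points do not all lie on a single line.
    """
    PQ = tuple(q - p for p, q in zip(P, Q))
    PR = tuple(r - p for p, r in zip(P, R))
    a, b, c = cross(PQ, PR)               # a normal vector to the plane
    d = a * P[0] + b * P[1] + c * P[2]    # fix d by substituting P
    return a, b, c, d

print(plane_through((1, 0, 0), (0, 3, 0), (0, 0, 2)))
```

Any non-zero multiple of the resulting (a, b, c, d) describes the same plane, which is why the substitution approach leaves one number free to choose.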
Example 1.26. Find an equation of the plane that passes through the three
points (−1, 0, 0), (1, 2, 0) and (2, −1, 5).
x(t) = a + pt
y(t) = b + qt
z(t) = c + rt
x(u, v) = a + pu + lv
y(u, v) = b + qu + mv
z(u, v) = c + ru + nv
Example 1.28. Show that the parametric equations found in the previous example describe exactly the same plane as found in Example 1.26 (Hint: substitute the answers from Example 1.27 into the equation found in Example 1.26).
Example 1.29. Find the parametric equations of the plane that passes
through the three points (−1, 2, 1), (1, 2, 3) and (2, −1, 5).
Example 1.30. Repeat the previous example but with points re-arranged as
(−1, 2, 1), (2, −1, 5) and (1, 2, 3). You will find that the parametric equations
look different yet you know they describe the same plane. If you did not
know this last fact, how would you prove that the two sets of parametric
equations describe the same plane?
ax + by + cz = d
for some numbers a, b, c and d. We will now re-express this in a vector form. Suppose we know one point on the plane, say $(x, y, z) = (x, y, z)_0$, then
Put $\Delta\underset{\sim}{x}_{10} = (x_1 - x_0, y_1 - y_0, z_1 - z_0)$ and $\Delta\underset{\sim}{x}_{20} = (x_2 - x_0, y_2 - y_0, z_2 - z_0)$. Notice that both of these vectors lie in the plane and that
Example 1.31. Find the vector equation of the plane that contains the
points (1, 2, 7), (2, 3, 4) and (−1, 2, 1).
Planes in $\mathbb{R}^3$
1.6 Systems of Linear Equations
1.6.1 Examples of Linear Systems
Silly puzzles
John and Mary’s ages add to 75 years. When John was half his present age
he was twice as old as Mary. How old are they?
We have just two equations in our system:
$J + M = 75$
$\tfrac{1}{2}J - 2M = 0$
Intersections of planes
It is easy to imagine three planes in space. Is it possible that they share one
point in common? Here are the equations for three such planes
3x + 7y − 2z = 0
6x + 16y − 3z = −1
3x + 9y + 3z = 3
Can we solve this system for (x, y, z)?
In all of the above examples we need to unscramble the set of linear equations to extract the unknowns (e.g. G, S, C etc).
To solve a system of linear equations is to find solutions to the sets of equations. In other words we find values that the variables can take such that each of the equations in the system is true.
1.6.2 A standard strategy
3x + 7y − 2z = 0     (1)
6x + 16y − 3z = −1   (2)
3x + 9y + 3z = 3     (3)
Suppose by some process we were able to rearrange these equations into the following form
3x + 7y − 2z = 0     (1)
     2y + z = −1     (2)′
         4z = 4      (3)″
(3)″ ⇒ 4z = 4 ⇒ z = 1
(2)′ ⇒ 2y + 1 = −1 ⇒ y = −1
(1) ⇒ 3x − 7 − 2 = 0 ⇒ x = 3
3x + 7y − 2z = 0     (1)
6x + 16y − 3z = −1   (2)
At this point our system of equations is
3x + 7y − 2z = 0     (1)
     2y + z = −1     (2)′
    2y + 5z = 3      (3)′
The last step is to eliminate the 2y term in the last equation. We do this by replacing equation (3)′ with (3)′ − (2)′
⇒ 4z = 4     (3)″
So finally we arrive at the system of equations
3x + 7y − 2z = 0     (1)
     2y + z = −1     (2)′
         4z = 4      (3)″
which, as before, we solve to find z = 1, y = −1 and x = 3.
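The elimination and back-substitution steps can be sketched in code as follows. This is a minimal illustration rather than a general-purpose solver: it assumes a square system in which every diagonal entry met during elimination is non-zero, so no rows ever need to be swapped. The function name `solve` is illustrative.

```python
from fractions import Fraction

def solve(aug):
    """Solve a square system given as an augmented matrix [A | b].

    Assumes every pivot on the diagonal is non-zero (no row swaps).
    """
    rows = [[Fraction(x) for x in row] for row in aug]
    n = len(rows)
    # Forward elimination: clear the entries below each diagonal term.
    for k in range(n):
        for i in range(k + 1, n):
            m = rows[i][k] / rows[k][k]
            rows[i] = [a - m * b for a, b in zip(rows[i], rows[k])]
    # Back-substitution: solve from the last equation upwards.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - known) / rows[i][i]
    return x

# Equations (1), (2), (3) from the text, one row per equation.
print(solve([[3, 7, -2, 0], [6, 16, -3, -1], [3, 9, 3, 3]]))
```

Exact fractions are used so that the result is not blurred by floating-point rounding.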
In previous lectures we saw how we could construct the equations for lines
and planes. Now we can answer some simple questions.
How do we compute the intersection between a line and a plane? Can we
be sure that they do intersect? And what about the intersection of a pair
or more of planes?
The general approach to all of these questions is simply to write down equations for each of the lines and planes and then to search for a common point (i.e. a consistent solution to the system of equations).
Example 1.33. Is the point (1, 2, 3) on the line $\underset{\sim}{r}(t) = (3, 4, 5) + (2, 2, 2)t$?
Solution: We simply check if the following system of equations yields the same value for t.
1 = 3 + 2t
2 = 4 + 2t
3 = 5 + 2t
Rearranging the top equation gives t = −1. In fact, each of these three equations gives t = −1 hence the point (1, 2, 3) is on the line $\underset{\sim}{r}(t) = (3, 4, 5) + (2, 2, 2)t$.
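The same check can be automated: solve each component equation for t and see whether all three values agree. The sketch below assumes that every component of the direction vector is non-zero, so that each equation can be solved for t; the name `parameter_on_line` is hypothetical.

```python
from fractions import Fraction

def parameter_on_line(point, d, v):
    """Return t with d + t v == point, or None if the point is off the line.

    Assumes every component of the direction vector v is non-zero.
    """
    ts = {Fraction(p - di, vi) for p, di, vi in zip(point, d, v)}
    return ts.pop() if len(ts) == 1 else None

print(parameter_on_line((1, 2, 3), (3, 4, 5), (2, 2, 2)))
print(parameter_on_line((1, 2, 4), (3, 4, 5), (2, 2, 2)))
```

A result of `None` means the three component equations disagree about t, so the point does not lie on the line.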
Example 1.34. Is the point (1, 2, 4) on the line $\underset{\sim}{r}(t) = (3, 4, 5) + (2, 2, 2)t$?
Example 1.35. Do the lines $\underset{\sim}{r}_1(t) = (1, 0, 0) + (1, 0, 0)t$ and $\underset{\sim}{r}_2(s) = (0, 0, 0) + (0, 1, 0)s$ intersect? If so, find the point of intersection.
Example 1.37. Find the intersection of the line x(t) = 1 + 3t, y(t) = 3 − 2t,
z(t) = 1 − t with the plane 2x + 3y − 4z = 1.
Example 1.38. Find the intersection of the plane y = 0 with the plane
2x + 3y − 4z = 1.
Now we are well equipped to be able to find the distances between points,
lines and planes. There are various combinations we can have, such as the
distance between a point and a plane, or the distance between a line and a
plane.
Example 1.40. Find the distance between the point (1, 2, 3) and the line given by the equation $\underset{\sim}{r}(t) = (0, 0, 7) + (1, 1, 1)t$.
Solution: Firstly we subtract the position vector of the point $\underset{\sim}{v} = (1, 2, 3)$ from the equation of the line. This will give us a vector $\underset{\sim}{u}$ that is dependent on the parameter t, namely $\underset{\sim}{u}(t) = (-1 + t, -2 + t, 4 + t)$.
Think of the tail of this vector being fixed at the point (1, 2, 3) and its tip running along the line as t changes. In order to find the shortest distance (note that when asked to find the distance, it is implied that this means find the shortest distance) we want to find the value of t for which the length of $\underset{\sim}{u}$ is as short as possible. We can also note that the shortest vector $\underset{\sim}{u}$ will be perpendicular to the direction of the line. This means that the dot product of the vectors (1, 1, 1) and (−1 + t, −2 + t, 4 + t) will be zero.
(1, 1, 1) · (−1 + t, −2 + t, 4 + t) = −1 + t − 2 + t + 4 + t = 1 + 3t = 0
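Hence $t = -\tfrac{1}{3}$. Now put this value of t into the vector $\underset{\sim}{u}$ to give:
$$\underset{\sim}{u}\left(-\tfrac{1}{3}\right) = \left(-1 - \tfrac{1}{3},\; -2 - \tfrac{1}{3},\; 4 - \tfrac{1}{3}\right) = \left(-\tfrac{4}{3},\; -\tfrac{7}{3},\; \tfrac{11}{3}\right)$$
Now simply calculate the length of $\underset{\sim}{u}(-\tfrac{1}{3}) = (-\tfrac{4}{3}, -\tfrac{7}{3}, \tfrac{11}{3})$. This gives $|\underset{\sim}{u}| = \tfrac{\sqrt{186}}{3} \approx 4.55$. If you plug $t = -\tfrac{1}{3}$ into the vector equation of the line, you get the coordinates of the point on the line that is closest to the point (1, 2, 3).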
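The steps of this solution can be sketched in code. The function below solves the perpendicularity condition $(\underset{\sim}{d} + t\underset{\sim}{v} - \text{point}) \cdot \underset{\sim}{v} = 0$ for t; the name `closest_parameter` is illustrative and the data are those of Example 1.40.

```python
from fractions import Fraction
import math

def closest_parameter(point, d, v):
    """Value of t that minimises the distance from point to d + t v."""
    diff = [di - pi for di, pi in zip(d, point)]      # u at t = 0
    return Fraction(-sum(a * b for a, b in zip(diff, v)),
                    sum(b * b for b in v))

point, d, v = (1, 2, 3), (0, 0, 7), (1, 1, 1)
t = closest_parameter(point, d, v)
u = [di + t * vi - pi for di, vi, pi in zip(d, v, point)]  # shortest vector
dist_sq = sum(c * c for c in u)                            # |u| squared

print(t, u, math.sqrt(dist_sq))
```

Working with the squared length in exact fractions and taking the square root only at the end keeps the intermediate values exact.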
and
(0, 0, 7) + (1, 1, 1)s
Parallel lines
If two lines are parallel, then it is easy to calculate the distance between
them. Simply pick a point on one of the lines and calculate its distance
from the other line, as per finding the distance between a point and a line.
Remember that two lines are parallel if the direction vector of one line is a
scalar multiple of the direction vector of the other line.
and
(0, 0, 7) + (2, 2, 4)s
Another way of finding the distance between two lines uses scalar projection. Using this method we find any vector that joins a point on one line to the other line, and then compute the scalar projection of this vector onto the vector orthogonal to both lines (it helps to draw a diagram).
Example 1.43. Find the distance between the point (2, 3, 4) and the plane given by the equation x + 2y + 3z = 4.
Solution: First of all we need to find a point on the plane. By setting y = 0 and z = 0 we find x = 4. Thus (4, 0, 0) is a point on the plane. Now we find the normal vector of the plane, $\underset{\sim}{n} = (1, 2, 3)$. We then form a vector $\underset{\sim}{v}$ from the point on the plane (4, 0, 0) to the given point (2, 3, 4). Thus $\underset{\sim}{v} = (-2, 3, 4)$, and the distance is the absolute value of the scalar projection of $\underset{\sim}{v}$ onto $\underset{\sim}{n}$.
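The steps of Example 1.43 can be sketched as follows, taking the distance to be the absolute value of the scalar projection of $\underset{\sim}{v}$ onto the normal $\underset{\sim}{n}$ (an assumption consistent with the scalar projection method described above). The variable names are illustrative.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

on_plane = (4, 0, 0)   # a point satisfying x + 2y + 3z = 4
given = (2, 3, 4)      # the point whose distance we want
n = (1, 2, 3)          # normal vector, read off from the plane equation

v = tuple(g - p for g, p in zip(given, on_plane))   # from the plane to the point
distance = abs(dot(v, n)) / math.sqrt(dot(n, n))    # |scalar projection onto n|

print(v, distance)
```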
Example 1.44. Find the distance between the line $\underset{\sim}{r}(t) = (2, 3, 4) + (3, 0, -1)t$ and the plane x + 2y + 3z = 4.
1.6.5 Summary
• There is no solution
• There is exactly one solution
• There are infinitely many solutions
This seems obvious for the case of two or three variables if we view the equations geometrically. In this case a solution of a system of linear equations (of two or three variables) is a point in the intersection of the lines or planes represented by the equations.
One point of intersection (unique solution)
One point of intersection (unique solution)
Example 1.46. What other examples can you draw of intersecting planes?
Chapter 2
Matrices
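Entries within a matrix are denoted by subscripted lower case letters. For the matrix A below we have $a_{11} = 3$, $a_{12} = 2$, $a_{13} = -1$, $a_{21} = 1$, $a_{22} = -1$, $a_{23} = 1$ and so forth. Here, A is a 3 × 3 matrix
$$A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
where $a_{ij}$ = the entry in row i and column j of A.
An m × n matrix can be represented similarly. Note that m denotes the number of rows in the matrix, and n denotes the number of columns.
$$A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}$$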
2.1.1 Operations on matrices
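• Equality:
A = B
only when all entries in A equal those in B.
• Multiplication of matrices:
$$\begin{bmatrix} \dots & \dots & \dots & \dots & \dots \\ a & b & c & d & \dots \\ \dots & \dots & \dots & \dots & \dots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \dots & e & \dots \\ \dots & f & \dots \\ \dots & g & \dots \\ \dots & h & \dots \\ \dots & \vdots & \dots \end{bmatrix} = \begin{bmatrix} \dots & \dots & \dots \\ \dots & i & \dots \\ \dots & \dots & \dots \\ \vdots & \vdots & \ddots \end{bmatrix}$$
$$i = a \cdot e + b \cdot f + c \cdot g + d \cdot h + \dots$$
Note that we can only multiply matrices that fit together. That is, if A and B are a pair of matrices, then in order that AB make sense, we must have the number of columns of A equal to the number of rows of B. We also say that matrices A and B are compatible for multiplication if A has size m × n and B has size n × r. The product AB is then a matrix of size m × r.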
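The row-times-column rule can be written as a short function. The sketch below stores a matrix as a list of rows; the name `matmul` and the two sample matrices are illustrative. It also shows that the order of multiplication matters.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x r matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j) is row i of A times column j of B, summed term by term.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, 3], [4, 1]]
B = [[1, 7], [0, 2]]

print(matmul(A, B))
print(matmul(B, A))
```

The size check at the top enforces the compatibility condition: the number of columns of the first matrix must equal the number of rows of the second.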
2.1.2 Some special matrices
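• $AB \neq BA$
• $(AB)C = A(BC)$
• $(A^T)^T = A$
• $(AB)^T = B^T A^T$
Example 2.2. Given $A = \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 7 \\ 0 & 2 \end{bmatrix}$ and $C = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}$ verify the above four properties.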
2.1.4 Inverses of square matrices
2x + 3y + z = 10      (1)
x + 2y + 2z = 10      (2)
4x + 8y + 11z = 49    (3)

2x + 3y + z = 10      (1)
     y + 3z = 10      (2)′ ← 2(2) − (1)
    2y + 9z = 29      (3)′ ← (3) − 2(1)

2x + 3y + z = 10      (1)
     y + 3z = 10      (2)′
         3z = 9       (3)″ ← (3)′ − 2(2)′

2x + 3y + z = 10      (1)
     y + 3z = 10      (2)′
         3z = 9       (3)″
of row reduction is known as Gaussian elimination.¹
Example 2.3. Continue from the previous example and use row-operations to eliminate the terms above the diagonal. Hence solve the system of equations.
If you stop after step one you are doing Gaussian elimination with back-substitution (this is usually the easier option).
2.2.2 Exceptions
2x + y + 2z + w = 2         (1)
    0y − 3z + w = −1        (2)′ ← (2) − (1)
  − 5y + 0z − 3w = −6       (3)′ ← 2(3) − (1)
  + 5y − 4z + 3w = 2        (4)′ ← 2(4) − (1)
¹ In some texts, using row operations to eliminate terms below the diagonal only is known as Gaussian elimination, whereas using row operations to eliminate terms below and above the diagonal is known as Gauss-Jordan elimination.
The zero on the diagonal in the second equation is a serious problem: it means we cannot use that row to eliminate the elements below the diagonal term. Hence we swap the second row with any other lower row so that we get a non-zero term on the diagonal. Then we proceed as usual.
2x + y + 2z + w = 2         (1)
  − 5y + 0z − 3w = −6       (2)″ ← (3)′
    0y − 3z + w = −1        (3)″ ← (2)′
  + 5y − 4z + 3w = 2        (4)′ ← 2(4) − (1)
2x + 3y − z = 1       (1)
  − 5y + 5z = −1      (2)′
         0z = 0       (3)″
The last equation tells us nothing! We can't solve it for any of x, y and z. We really only have 2 equations, not 3. That is 2 equations for 3 unknowns. This is an under-determined system.
We solve the system by choosing any number for one of the unknowns. Say we put z = λ where λ is any number (our choice). Then we can leap back into the equations and use back-substitution.
The result is a one-parameter family of solutions
$$x = \tfrac{1}{5} - \lambda, \quad y = \tfrac{1}{5} + \lambda, \quad z = \lambda$$
Since we found a solution we say that the system is consistent.
Example 2.6. An inconsistent system
Had we started with
2x + 3y − z = 1       (1)
x − y + 2z = 0        (2)
3x + 2y + z = 0       (3)
we would have arrived at
2x + 3y − z = 1       (1)
  − 5y + 5z = −1      (2)′
         0z = −2      (3)″
This last equation makes no sense as there are no finite values for z such that 0z = −2 and thus we say that this system is inconsistent and that the system has no solution.
3x + 2y − z = −1
x − y + z = 4
2x + y − z = −1
We can rewrite this system using matrix notation. The coefficients of our equations form a 3 × 3 matrix A
$$A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix}$$
Example 2.7. Write the system of equations
3x + 2y − z = −1
x − y + z = 4
2x + y − z = −1
in matrix notation.
2x + 3y + z = 10 (1)
x + 2y + 2z = 10 (2)
4x + 8y + 11z = 49 (3)
2.4 Row echelon form
• If there are any zero rows, they are at the bottom of the matrix.
• The first non-zero entry in each non-zero row (called the leading
entry or pivot) is to the right of the pivots in the rows above it.
Example 2.8. Write down three matrices in echelon form and circle the
pivots.
The variables corresponding to the columns containing pivots are called the leading variables. The variables corresponding to the columns that do not contain pivots are called free variables. Free variables are not restricted by the linear equations: they can take arbitrary values, and we often denote these by Greek letters (such as α, β and so forth). The leading variables are then expressed in terms of the free variables.
(d) Give a geometric interpretation of your results.
(i)
x + y = 1
x − 2y = 4
(ii)
x − y = 1
x − y = 2
(iii)
x − y = 1
3x − 3y = 3
(a) Write down the augmented matrix and bring it to echelon form.
(b) Identify the free variables and the leading variables.
(c) For what values of the number k does the system have (i) no solution,
(ii) infinitely many solutions, (iii) exactly one solution?
(d) When a solution or solutions exist, find them.
2.4.1 Rank
The rank of a matrix is the number of non-zero rows (also the number of
pivots) in its row echelon form. The rank of a matrix is denoted by rank(A).
The rank of a matrix gives us information about the solutions of the associ-
ated linear system.
Importantly, if the system is consistent and the rank is equal to the number of variables, then the system of linear equations has a unique solution.
A linear system of m equations in n variables will give an m × n matrix A.
Once we have reduced the matrix to echelon form, and found the rank = r
of the reduced matrix (let’s call the reduced matrix U ) we can deduce the
following informative properties:
Properties
1. Number of variables = n
2. Number of leading variables = r
3. Number of free variables = n − r
4. r ≤ m (because there is at most one pivot in each of the m rows of
U ).
5. r ≤ n (because there is at most one pivot in each of the n columns of
U ).
6. If r = n there are no free variables and there will be either no solution
or one solution.
7. If r < n there is at least one free variable and there will be either no
solution or infinitely many solutions.
8. If there are more variables than equations, that is n > m, then r < n
and so there will be either no solution or infinitely many solutions.
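The rank can be computed by reducing the matrix to echelon form and counting the pivots. The following is a minimal sketch using exact fractions and row interchanges; the function name `rank` is illustrative, and the sample matrix is the coefficient matrix of the inconsistent system in Example 2.6, with and without its right-hand side.

```python
from fractions import Fraction

def rank(matrix):
    """Number of pivots in an echelon form of the matrix (a list of rows)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # Look for a non-zero entry in this column, at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column belongs to a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]   # row interchange
        for i in range(r + 1, len(rows)):
            m = rows[i][col] / rows[r][col]
            rows[i] = [a - m * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

print(rank([[2, 3, -1], [1, -1, 2], [3, 2, 1]]))
print(rank([[2, 3, -1, 1], [1, -1, 2, 0], [3, 2, 1, 0]]))
```

Comparing the two results shows how the right-hand side can introduce an extra pivot, which corresponds to a row of the form 0 = non-zero.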
Example 2.12. What is the rank of each of the matrices in the previous
examples?
2.4.2 Homogeneous systems
2.4.3 Summary
Gaussian Elimination
2. Otherwise, find the first column with a non-zero entry (say a) and
use a row interchange to bring that entry to the top row.
3. Subtract multiples of the top row from the rows below it so that
each entry below the pivot a becomes zero. (This completes the first
row. All subsequent operations are carried out on the rows below it.)
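A linear system of equations $A\underset{\sim}{x} = \underset{\sim}{b}$, or AX = B, can be written in the general form:
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}$$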
2. There are infinitely many solutions - this happens when the equations are consistent and there is at least one free variable.
3. There is exactly one solution - this happens when the equations are consistent and there are no free variables.
2.5 Matrix Inverse
AX = B
Note that not all matrices will have an inverse. For example, if
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
then
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
and for this to be possible we must have $ad - bc \neq 0$.
In later lectures we will see some different methods for computing the inverse $A^{-1}$ for other (square) matrices larger than 2 × 2.
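The 2 × 2 formula is short enough to code directly. The sketch below assumes integer entries and uses exact fractions; the function name `inverse_2x2` and the sample matrix are illustrative.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix [[a, b], [c, d]] with integer entries."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so no inverse exists")
    # Swap the diagonal entries, negate the off-diagonal ones, divide by det.
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

print(inverse_2x2([[2, 1], [5, 3]]))
```

The explicit test on ad − bc mirrors the condition stated above: when it is zero the formula would divide by zero, so the function refuses to return a result.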
Example 2.13. If $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, then $A^{-1} =$
Properties of inverses
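Example 2.14. Let $B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 5 \end{bmatrix}$ and $C = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 2 & 0 & 1 & 1 & 0 \end{bmatrix}$.
Find $B^T$ and $C^T$.
• $(A^T)^T = A$
• $(cA)^T = cA^T$
• $(A + B)^T = A^T + B^T$
• $(AB)^T = B^T A^T$
Example 2.15. Verify the above using matrices $A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.
2.7 Determinants
The determinant function det is a function that assigns to each n × n matrix A a number det A called the determinant of A. The function is defined as follows: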
2.7 Determinants
The determinant function det is a function that assigns to each n×n matrix
A a number det A called the determinant of A. The function is defined
as follows:
• If n > 2: It gets a bit complicated now, but it is not too bad. Firstly create a sub-matrix $S_{ij}$ of A by deleting the $i$th row and the $j$th column. Then define
$$\det A := a_{11} \det S_{11} - a_{12} \det S_{12} + a_{13} \det S_{13} - \cdots \pm a_{1n} \det S_{1n}$$
The quantity $\det S_{ij}$ is called the minor of entry $a_{ij}$ and is denoted $M_{ij}$. The number $(-1)^{i+j} M_{ij}$ is called the cofactor of entry $a_{ij}$. Thus to compute det A you have to compute a chain of determinants from (n − 1) × (n − 1) determinants all the way down to 2 × 2 determinants.
This method of defining and evaluating det A is called Laplace's expansion along the first row. We can, in fact, use any row (or any column) to calculate det A.
We often write det A = |A|.
When we expand the determinant about any row or column, we must observe the following pattern of ± signs (these correspond to the $(-1)^{i+j}$ in $C_{ij}$; check!).
$$\begin{matrix} + & - & + & - & + & - & \cdots \\ - & + & - & + & - & + & \cdots \\ + & - & + & - & + & - & \cdots \\ - & + & - & + & - & + & \cdots \end{matrix}$$
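Laplace's expansion along the first row lends itself to a recursive function. The sketch below takes the 1 × 1 case as the base of the recursion, which is an assumption made for brevity; the function name `det` is illustrative and the sample matrix is the 3 × 3 coefficient matrix used earlier in this chapter.

```python
def det(A):
    """Determinant by Laplace expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]  # base case: a 1 x 1 matrix
    total = 0
    for j in range(n):
        # Sub-matrix obtained by deleting the first row and column j.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        # Signs alternate +, -, +, ... along the first row.
        total += (-1) ** j * A[0][j] * det(sub)
    return total

print(det([[3, 2, -1], [1, -1, 1], [2, 1, -1]]))
```

Each call spawns n smaller determinants, so this is practical only for small matrices, which matches the remark that the method involves a chain of determinants.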
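Example 2.17. By expanding about the second row compute the determinant of
$$A = \begin{bmatrix} 1 & 7 & 2 \\ 3 & 4 & 5 \\ 6 & 0 & 9 \end{bmatrix}$$
Example 2.18. Compute the determinant of
$$A = \begin{bmatrix} 1 & 2 & 7 \\ 0 & 0 & 3 \\ 1 & 2 & 1 \end{bmatrix}$$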
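• For any fixed i = 1, . . . , n we have $\det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}$.
The rule for a vector cross product can be conveniently expressed as a determinant. Thus if $\underset{\sim}{v} = v_x\underset{\sim}{i} + v_y\underset{\sim}{j} + v_z\underset{\sim}{k}$ and $\underset{\sim}{w} = w_x\underset{\sim}{i} + w_y\underset{\sim}{j} + w_z\underset{\sim}{k}$ then
$$\underset{\sim}{v} \times \underset{\sim}{w} = \begin{vmatrix} \underset{\sim}{i} & \underset{\sim}{j} & \underset{\sim}{k} \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{vmatrix}$$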
2.7.3 Cramer’s rule
Recall that some texts use the term Gaussian elimination to refer to reducing a matrix to its echelon form, and the term Jordan elimination to refer to reducing a matrix to its reduced echelon form. In this manner, the Gauss-Jordan algorithm can be described diagrammatically as follows:
$$[A|I] \xrightarrow{\text{G.A}} [U|*] \xrightarrow{\text{J.A}} [I|B] \quad \text{where } B = A^{-1}.$$
• augment A by the identity matrix;
• then B = A−1 .
x + y + 3z = 2
2y + z = 0
x + 4y + 4z = 1
2.8.1 Inverse - another method
• Store this entry at aji (row j and column i) in the inverse matrix.
That is, if
$$A = [\, a_{ij} \,]$$
then
$$A^{-1} = \frac{1}{\det A} \left[\, (-1)^{i+j} \det S_{ji} \,\right]$$
This method works but it is rather tedious.
————————————————–
Chapter 3
Calculus
Reference books
3.1 Differentiation
$$\frac{\text{change in } s}{\text{change in } t} = \frac{\Delta s}{\Delta t} = v = \text{average speed over the time interval } \Delta t.$$
Suppose now that the speed of P varies with time. By making ∆t become
very small, i.e. taking the limit as ∆t → 0, we define the derivative of s
with respect to t at time t as the instantaneous rate of change of s with
respect to t.
If we let ds and dt be the infinitesimal changes in s and t, then we can
write:
$$v = \frac{ds}{dt} = \lim_{\Delta t \to 0} \frac{\Delta s}{\Delta t} = \text{instantaneous speed of } P \text{ at time } t.$$
Consider the graph of the function y = f (x) of the single variable x shown
in the plot below.
We will now compare the average rate of change of f (x) with the derivative
of f (x).
Let $\Delta f = f(x + \Delta x) - f(x)$ be the change in f as we go from point P to
Q and as x changes from x to x + ∆x. The average rate of change of the
function f(x) on the interval ∆x is $\frac{\Delta f}{\Delta x}$. This is the slope of the chord PQ.
The derivative of f(x) at the point x is defined as
$$\frac{df}{dx} = f'(x) = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$
The derivative is the slope (or gradient) of the local tangent line to the
curve y = f(x) at the point P. That is, $f'(x) = \tan\theta$, where θ is the angle
between the two dashed lines.
The derivative $f'(x)$ is thus the instantaneous rate of change of f with
respect to x at the point P.
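A small numerical sketch may help make the limit concrete. Using the function f(x) = 25x − 5x² from Example 3.1, the helper `difference_quotient` (a name chosen here for illustration, not taken from the notes) evaluates ∆f/∆x at x = 1 for shrinking ∆x. The printed values should settle towards the slope of the tangent line at that point.

```python
def f(x):
    """The function from Example 3.1."""
    return 25 * x - 5 * x ** 2


def difference_quotient(func, x, dx):
    """Average rate of change of func over [x, x + dx]: the slope of the chord PQ."""
    return (func(x + dx) - func(x)) / dx


# Shrink dx and watch the chord slope approach the tangent slope.
for dx in (0.1, 0.01, 0.001, 0.0001):
    print(dx, difference_quotient(f, 1.0, dx))
```

This is only numerical evidence for the value of the limit; the first-principles calculation asked for in the example still has to be done algebraically.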
Example 3.1. Use the definition of the derivative to obtain from first
principles the value of $f'(x)$ for the function $f(x) = 25x - 5x^2$ at x = 1.
Find the equation of the tangent line to the graph of y = f(x) at the point
(1, 20) in the xy-plane.
Solution:
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
=
3.1.3 Techniques of differentiation - rules
$$\frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = f'(u)\,g'(x)$$
(c) Find the derivative of $f(x) = \frac{x^2 + 1}{x^2 - 1}$ with respect to x.
(d) Use the chain rule to find $\frac{dy}{dx}$ when
(i) $y = (2x + 3)^5$
(ii) $y = \sqrt{3x^2 + 1}$
3.2 Maximum and minimum of functions
The derivative of a function f (x) tells us important information regarding
the graph of y = f (x).
• If $f'(x) > 0$ on the interval [a, b] then the function f(x) is increasing
on that interval.
• If $f'(x) < 0$ on the interval [a, b] then the function f(x) is decreasing
on that interval.
• If $f'(x) = 0$ on the interval [a, b] then the function f(x) is constant
on that interval.
Example 3.3. Identify the local maxima and minima on the graph of the
function below.
How do we find the local maxima and minima?
Local maxima and local minima occur where the derivative of the function
is zero. They can also occur where the derivative does not exist (consider
the function f(x) = |x| at the point x = 0). For the function f(x) we define
the critical points, the candidates for local extrema, as the points x = c such that:
• $f'(c) = 0$, or
• $f'(c)$ does not exist.
It is important to note that $f'(c) = 0$ does not imply that the function f(x)
must have a local maximum or minimum at x = c. Consider the function
$f(x) = x^3$ at x = 0 to explore this further. Thus having $f'(c) = 0$ is only a
necessary requirement, rather than a sufficient requirement for the existence
of local maxima or minima.
We can also note the following with regards to the graph of f (x).
Note that the tangent line (if it exists) is horizontal at the point x = c
corresponding to either a local maximum or a local minimum.
• If $f'(x)$ does not change sign at x = c, then f(c) is neither a maximum nor a
minimum value for f(x).
In summary to find the local extrema we
Example 3.4. Find the local extrema for the function $f(x) = x^3 - 5x^2 - 8x + 7$
over the interval R.
Solution: First we find the critical points by differentiating the function and
solving for x when $f'(x) = 0$.
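The procedure can be sketched numerically as a check on hand working. In the sketch below the derivative 3x² − 10x − 8 was obtained by hand with the power rule, the quadratic formula is used to solve f′(x) = 0, and each critical point is classified by looking at the sign of f′ just to its left and right. The names `df` and `classify` and the step size `h` are illustrative choices, not part of the notes.

```python
import math


def f(x):
    return x ** 3 - 5 * x ** 2 - 8 * x + 7


def df(x):
    """Derivative of f, found by hand with the power rule."""
    return 3 * x ** 2 - 10 * x - 8


# Solve 3x^2 - 10x - 8 = 0 with the quadratic formula.
a, b, c = 3, -10, -8
disc = b * b - 4 * a * c
critical_points = sorted([(-b - math.sqrt(disc)) / (2 * a),
                          (-b + math.sqrt(disc)) / (2 * a)])


def classify(x0, h=1e-3):
    """First derivative test: inspect the sign of f' on either side of x0."""
    left, right = df(x0 - h), df(x0 + h)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither"


for x0 in critical_points:
    print(x0, f(x0), classify(x0))
```

The small step `h` is an assumption that works here because the two critical points are far apart; for critical points closer together than `h` the sign check would need a smaller step.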
Absolute (Global) Maximum and Minimum
Since we have been talking about local extrema, we must also mention ab-
solute (global) extrema.
Note that the interval [a, b] must be a closed interval. Why is this necessary?
Example 3.5. What happens with the Extreme Value Theorem if a func-
tion f (x) is not continuous?
To find the absolute extrema of a continuous function on a closed
interval:
1. Find the values of the function at all critical points in the interval.
2. Find the values of the function at the end points of the interval.
Example 3.6. Find the absolute extrema for the function $f(x) = e^{-x^2}$.
3.3 Differentiating inverse, circular and exponential functions
The inverse of a function f is the function that reverses the operation done
by f. The inverse function is denoted by $f^{-1}$. It satisfies the relation
$$y = f(x) \Leftrightarrow x = f^{-1}(y).$$
$$y = f^{-1}(x) \Leftrightarrow x = f(y).$$
Example 3.7. Find the inverse function of $y = f(x) = \frac{1}{5}(4x - 3)$. A sketch
of f(x) is given below. Note that y = x is the thin line shown in the diagram.
Sketch $f^{-1}(x)$ on the same axes.
3.3.2 Exponential and logarithmic functions: $e^x$ and ln x
A very important example of a function and its inverse are the exponential
function $y = e^x$ and the natural logarithm function y = ln x. From the
definition of the inverse function we have
$$y = f(x) = e^x \Leftrightarrow x = \ln y$$
• $\ln(e^x) = \ln(y) = x$
• $e^{\ln y} = e^x = y$
Sometimes we will need to restrict the domain of a function in order to find
its inverse.
Example 3.8. Find the inverse function $f^{-1}(x)$ of $f(x) = x^2$ by first
restricting the domain to [0, ∞).
If $y = f^{-1}(x) \Leftrightarrow x = f(y)$, then $\frac{dy}{dx} = \frac{1}{dx/dy} = \frac{1}{f'(y)}$
Example 3.9. Find the derivative of the function f(x) and its inverse
$f^{-1}(x)$ for $f(x) = \frac{1}{5}(4x - 3)$. Check that the answers satisfy the
derivative rule for inverse functions.
Example 3.10. Show that $\frac{d}{dx}(\ln x) = \frac{1}{x}$ given that $e^x$ is the inverse
function of ln x and $\frac{d}{dx}(e^x) = e^x$.
3.3.3 Derivatives of circular functions
Circular (or trigonometric) functions sin x, cos x, tan x, etc arise in prob-
lems involving functions that are periodic and repetitive, such as those that
describe the orbit of a planet about its parent star. Here we acquaint our-
selves with the derivatives of such functions.
Example 3.11. Sketch the graphs of f(x) = sin x and g(x) = cos x on the
same diagram for the interval 0 ≤ x ≤ 3π. Use the tangent line method to
estimate the values of $f'(x)$ at the points $x = 0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}, 2\pi, \ldots$ on the same
diagram. Do these values seem to match the curve g(x)?
Before examining the derivatives of circular functions in more detail, we
consider two basic inverse circular functions $\sin^{-1} x$ and $\tan^{-1} x$.
Note that the alternative notation for inverse circular functions is: $\sin^{-1} x = \arcsin x$,
$\cos^{-1} x = \arccos x$ and $\tan^{-1} x = \arctan x$.
Example 3.12. The graphs of y = sin x and y = tan x are shown below by
the heavy curves for the restricted domains $[-\frac{\pi}{2}, \frac{\pi}{2}]$, i.e. $-\frac{\pi}{2} \le x \le \frac{\pi}{2}$, and
$(-\frac{\pi}{2}, \frac{\pi}{2})$, i.e. $-\frac{\pi}{2} < x < \frac{\pi}{2}$, respectively.
Sketch the inverse functions $\sin^{-1} x$ and $\tan^{-1} x$ using mirror reflection
across the line y = x.
y = sin x y = tan x
Note: The reason for the use of a restricted domain in specifying inverse
circular functions is that if all of the graph of y = sin x or tan x were naively
reflected across the line y = x, there would be more than one choice (in
fact there would be an infinite number of choices) for the ordinate (y) value
of the inverse function. We saw this at work in a previous example. A
function may have only a single value for each x in its domain. We
also note that $\tan\frac{\pi}{2}$ and $\tan\left(-\frac{\pi}{2}\right)$ are not defined (tan x tends to ±∞ as x approaches these values).
The values of the derivatives of the six basic circular (i.e. trigonometric)
functions are shown in the table below, along with the derivatives of the
three main inverse circular functions sin−1 x, cos−1 x and tan−1 x. Also
listed are the derivatives of the basic exponential and logarithm functions.
sin x cos x
cos x − sin x
Example 3.13. Find $\frac{dy}{dx}$ when y is given by
(a) sin(2x + 3)
(b) $x^2 \cos x$
(c) x tan(2x + 1)
(d) $\tan^{-1}(x^2)$
(e) Prove the differentiation formula $\frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$.
(f) Find $\frac{dy}{dx}$ when $y = \arcsin(e^{2x})$
(g) Prove the differentiation formula $\frac{d}{dx} \cos^{-1} x = \frac{-1}{\sqrt{1 - x^2}}$.
(h) Find $\frac{ds}{dt}$ when $s = \ln(\tan(2t))$
(i) Find $\frac{dg}{dt}$ when $g = \sqrt{t}\, \sin^{-1}(t^2)$
(j) Find $\frac{dg}{dx}$ when $g = \frac{5x^3 + 3x}{(x^2 + 3)^2}$
3.4 Higher order derivatives
If f(x) is a differentiable function, then its derivative $f'(x) = \frac{df}{dx}$ is also a
function and so may have a derivative itself. The derivative of a derivative
is called the second derivative and is denoted by $f''(x)$. There are various
ways it can be written:
$$f''(x) = \frac{d}{dx} f'(x) = \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d^2 f}{dx^2} = f^{(2)}(x)$$
Interpretation:
Earlier we used the first derivative to find local maxima and local minima.
We can also use the second derivative at x = c to find these.
The second derivative $f''(x)$ also measures the rate of change of the first
derivative $f'(x)$. As $f'(x)$ is the gradient or slope of the tangent line to the
graph of y = f(x) in the xy-plane, we see that:
(i) if $\frac{d^2 f}{dx^2} > 0$ then $\frac{df}{dx}$ increases with increasing x and the graph of
y = f(x) is said to be locally concave up.
(ii) if $\frac{d^2 f}{dx^2} < 0$ then $\frac{df}{dx}$ decreases with increasing x and the graph of
y = f(x) is said to be locally concave down.
Example 3.14. If $f(x) = x^3 - 3x + 1$ find the first four derivatives $f'(x)$,
$f''(x)$, $f^{(3)}(x)$ and $f^{(4)}(x)$. Determine for what values of x the curve is
concave up and concave down. Also locate the turning points (a, f(a)),
where $f'(a) = 0$. Mark these features on the graph of y = f(x) shown
below.
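For a polynomial, repeated differentiation is mechanical enough to sketch in code. The representation below (a list whose entry k is the coefficient of x^k) and the names `differentiate` and `evaluate` are assumptions made for this illustration. The sketch builds the first four derivatives of the polynomial in Example 3.14 and applies the second derivative at the points x = ±1, which were found by hand from f′(a) = 0.

```python
def differentiate(coeffs):
    """coeffs[k] is the coefficient of x**k; return the derivative's coefficients."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]


def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


f = [1, -3, 0, 1]  # 1 - 3x + x^3

# derivs[0] is f itself, derivs[1] is f', and so on up to the fourth derivative.
derivs = [f]
for _ in range(4):
    derivs.append(differentiate(derivs[-1]))

for order, coeffs in enumerate(derivs):
    print(order, coeffs)

# Turning points: f'(a) = 0 at a = -1 and a = 1 (solved by hand).
for a in (-1, 1):
    second = evaluate(derivs[2], a)
    kind = "concave up, local minimum" if second > 0 else "concave down, local maximum"
    print(a, evaluate(f, a), kind)
```

Because the second derivative here is linear in x, its sign changes exactly once, which is where the curve switches between concave down and concave up.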
Example 3.15. Find the second derivative of the function $y = e^{-x} \sin 2x$.
Example 3.16. Find the second derivative of the function $y = \frac{\ln x}{x}$.
Example application: Find the point on the graph of $y = \sqrt{x}$, x ≥ 0, closest
to (2, 0).
3.5 Parametric curves and differentiation
The equation that describes a curve C in the Cartesian xy-plane can some-
times be very complicated. In that case it can be easier to introduce an
independent parameter t, so that the coordinates x and y become functions
of t. We explored this when we looked at the vector equations of lines and
planes. That is, x = f (t) and y = g(t). The curve C is parametrically
represented by
(b) $C = \{(x, y) : x = 2t,\ y = 4t^2,\ -\infty < t < \infty\}$
(c) C = {(x, y) : x = 5 cos(t), y = 2 sin(t), 0 ≤ t ≤ 2π}
$$\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt} = \frac{g'(t)}{f'(t)} \quad \text{where } f'(t) = \frac{df}{dt} \text{ and } g'(t) = \frac{dg}{dt}.$$
Find the derivative function $\frac{dy}{dx}$. Find the equation of the tangent line to
the curve at the point corresponding to $t = \frac{\pi}{4}$ and draw this on the sketch
for the case a = 2.
Solution: Here f(t) = a cos t, g(t) = a sin t. Therefore $x^2 + y^2 = a^2(\cos^2 t + \sin^2 t) = a^2$.
The curve C is thus a circle of radius a centred at the origin.
As t goes from 0 → 2π, the circle is described once, in the positive direction,
starting at the point (a, 0), which is (2, 0) for a = 2. Now, $f'(t) = -a \sin t = -y$, $g'(t) = a \cos t = x$
and so
$$\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt} = \frac{g'(t)}{f'(t)} = \frac{x}{-y} = -\frac{a \cos t}{a \sin t} = -\cot t$$
At $t = \frac{\pi}{4}$, $x = a \cos\frac{\pi}{4} = \frac{a}{\sqrt{2}}$, $y = a \sin\frac{\pi}{4} = \frac{a}{\sqrt{2}}$, and so
$$\frac{dy}{dx} = -\frac{a/\sqrt{2}}{a/\sqrt{2}} = -1.$$
y − y1 = m(x − x1 )
x1 = y1 = m=
and thus the required answer is:
(i) Find $\frac{dy}{dx}$ as a function of t
(ii) Evaluate $\frac{dy}{dx}$ at t = 1 and find the tangent line to the curve at this point.
3.6 Function approximations
We are now going to take a step in an interesting direction, and look at how
to approximate a function by a number of different methods.
$$a_0, a_1, a_2, a_3, \ldots = a_0, ra_0, r^2 a_0, r^3 a_0, \ldots$$
A finite geometric series consists of the sum $S_n$ of the first n terms of the
geometric sequence. Setting the initial term $a_0 = a$, a constant, we have:
$$(1 - r)S_n = a - ar^n$$
Hence as long as $r \neq 1$, the sum of the first n terms of a geometric series is:
$$S_n = \sum_{k=0}^{n-1} ar^k = \frac{a(1 - r^n)}{1 - r} \qquad \text{(iii)}$$
For the particular case of r = 1, we see from Equation (i) that $S_n = an$.
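A quick way to gain confidence in the closed form is to compare it with a term-by-term sum. The sketch below does this with exact fractions; the function names are hypothetical and the sample values a = 1, r = 1/2 are chosen to match the sequence 1, 1/2, 1/4, ... used in the next example.

```python
from fractions import Fraction


def geometric_sum_direct(a, r, n):
    """Add up the first n terms a, ar, ar^2, ..., ar^(n-1) one at a time."""
    return sum(a * r ** k for k in range(n))


def geometric_sum_formula(a, r, n):
    """Closed form a(1 - r^n)/(1 - r), with the r = 1 case handled separately."""
    if r == 1:
        return a * n
    return a * (1 - r ** n) / (1 - r)


a, r = Fraction(1), Fraction(1, 2)
for n in (1, 2, 3, 10):
    print(n, geometric_sum_direct(a, r, n), geometric_sum_formula(a, r, n))
```

Using `Fraction` rather than floating point keeps the comparison exact, so any disagreement would point to an algebra slip rather than rounding.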
Example 3.21. Find the geometric series of the following sequence
$$1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots$$
(i) when n = 3, i.e. find $S_3$.
Solution: At the end of the first year, the value of the investment is
P1 = P0 + iP0 .
Year no. | Investment value at start of year | Investment value at end of year | Amount withdrawn
1 | $P_0$ | $P_0 + iP_0$ | $P_0$
3 | | |
⋮ | ⋮ | ⋮ | ⋮
N − 1 | | |
N | $i^{N-1} P_0$ | $i^{N-1} P_0 + i^N P_0$ |
Hence at the end of year N the investor has got back a total sum
$$S_N = P_N = P_0 + iP_0 + i^2 P_0 + \ldots + i^{N-1} P_0 =$$
Substituting numerical values, the final value of the investment for the
nervous investor is: $133,333.21.
Final return for the bank is:
3.6.2 Power series
We have seen that a finite geometric series of n terms has the sum
$$S_n = a + ar + ar^2 + ar^3 + \cdots + ar^{n-1} = \sum_{k=0}^{n-1} ar^k = \frac{a(1 - r^n)}{1 - r}.$$
Suppose we allow n to become very large. Then provided that −1 < r < 1,
we have $r^n \to 0$ as n → ∞. For example $\lim_{n \to \infty} \left(\frac{1}{2}\right)^n = 0$. Now setting a = 1,
r = x and taking n → ∞, it follows that
$$\frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots + x^{n-1} + \cdots = \sum_{k=0}^{\infty} x^k. \qquad (3.1)$$
The right hand side of Equation (3.1) is called a power series in the variable
x, and is represented using a so-called infinite sum.¹ Here, this power series
evaluates to the function $f(x) = \frac{1}{1 - x}$.
Example 3.23. Two trains 200 km apart are moving toward each other.
Each one is going at a constant speed of 50 kilometres per hour. A fly
starting on the front of one of them flies back and forth between them at
a rate of 75 kilometres per hour (fast fly!). The fly does this until the two
trains collide. What is the total distance the fly has flown?
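One way to connect this puzzle to geometric series is to add up the fly's trips one leg at a time. The sketch below is a hedged illustration with made-up variable names: on each leg the fly and the oncoming train close the current gap at 75 + 50 km/h, and during that time the two trains shorten the gap at 50 + 50 km/h. The running total is compared with the shortcut of multiplying the fly's speed by the time until the trains meet.

```python
gap = 200.0          # km between the trains at the start
train_speed = 50.0   # km/h, each train
fly_speed = 75.0     # km/h

total = 0.0
# Each leg shrinks the gap by a constant factor, so the leg lengths form a
# geometric sequence; 60 legs is far more than needed for convergence.
for _ in range(60):
    leg_time = gap / (fly_speed + train_speed)   # fly meets the oncoming train
    total += fly_speed * leg_time                # distance flown on this leg
    gap -= 2 * train_speed * leg_time            # both trains have moved closer

# Shortcut: the trains meet after 200 / (50 + 50) hours, and the fly flies throughout.
shortcut = fly_speed * 200.0 / (2 * train_speed)

print(total, shortcut)
```

The leg-by-leg total is an infinite geometric series in disguise, while the shortcut avoids the series entirely; both routes should give the same distance.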
Power series
A general power series in the variable x has the form
$$\sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots \qquad (3.2)$$
1 A series is an object which allows us to give rigorous meaning to the concept of
‘infinite sum’. To be precise, for a sequence $b_0, b_1, \ldots$, its series is defined as
$\sum_{n=0}^{\infty} b_n = \lim_{n \to \infty} S_n$, where $S_n = \sum_{i=0}^{n} b_i$ is called its partial sum.
A power series is just a special case of a series, namely take $b_n = a_n x^n$. In this course
we will not cover series in detail, only power series.
A power series is actually a limit, and hence it may not converge for all
x ∈ (−∞, ∞). However, if x is taken to be sufficiently small, namely
−R < x < R, the power series will exist as a function of x, which we will call
f (x). The largest R for which this occurs is called the radius of conver-
gence, and guarantees that the power series f (x) exists for −R < x < R.
In fact, it may even exist at x = R or x = −R. This leads us to the idea of
representing continuous functions of x by a power series. That is, we have
$$f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots \qquad (3.3)$$
where the domain of f (x) is either (−R, R), [−R, R), (−R, R] or [−R, R].
3.6.3 Taylor series
$$T_n(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n$$
Since $T_n(x)$ is a polynomial, its domain includes all x, that is,
−∞ < x < ∞. The Taylor series of f is then given by $\lim_{n \to \infty} T_n(x) = f(x)$.
In other words, a function’s Taylor series is precisely its power series
representation. Please take a moment to understand the distinction between
a power series and a Taylor series!
Example 3.24. Use the table of basic power series to find T0 (x), T1 (x),
T2 (x), T3 (x) for ex .
Solution:
$$T_0(x) = 1$$
$$T_1(x) = 1 + x$$
$$T_2(x) = 1 + x + \frac{1}{2}x^2$$
$$T_3(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3$$
The first of these Taylor polynomials, namely T0 (x) = 1 simply matches the
height of the graph of y = f (x) at x = 0. It is sometimes called the zeroth
approximation to y = f (x) at x = 0. It can also be called the zeroth
Taylor polynomial for f at a = 0.
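To see how quickly these polynomials approach the function, they can be evaluated numerically. The sketch below assumes the standard pattern that the coefficient of x^k in the series for e^x is 1/k!, which matches the coefficients 1, 1, 1/2, 1/6 listed above; the function name `taylor_exp` and the sample point x = 0.5 are illustrative choices.

```python
import math


def taylor_exp(x, n):
    """Taylor polynomial T_n(x) for e^x centred at 0: the sum of x^k / k! for k = 0..n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


x = 0.5
for n in range(4):
    approx = taylor_exp(x, n)
    print(n, approx, abs(approx - math.exp(x)))
```

The error column should shrink as n grows, and it shrinks faster the closer x is to the centre 0.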
of the tangent line to the graph y = f (x) at x = 0.
The diagram below shows the graph of y = ex (thick curve) on the do-
main −1.5 ≤ x ≤ 1.5, along with the graphs of y = T0 (x) = 1 and the
linear approximation function y = T1 (x) = 1 + x.
(iv) T0 (x), T2 (x), T6 (x), for f (x) = cos 3x
Example 3.26. For parts (i) and (ii) of the example above, draw sketches
of the graphs of y = f (x), T0 (x) and T1 (x).
Linear Approximation
$$y = f(x) = \frac{e^{3x}}{2 + x}$$
centred at x = 0.
Solution:
$f(x) = \frac{e^{3x}}{2 + x}$, $f(0) =$
$f'(x) =$ , $f'(0) =$
Hence T1 (x) =
(ii) Use the linear approximation to f (x) to estimate the value of f (0.1).
3.6.4 Derivation of Taylor polynomials from first principles
Suppose we do not know the Taylor series for a given function f(x) but wish
to derive the first few Taylor polynomial approximations to f(x) near x = 0.
How do we find $T_0(x)$, $T_1(x)$, $T_2(x)$, . . ., $T_n(x)$? Determining these objects
requires finding the numbers $a_0, a_1, a_2, \ldots$. In order to do this we need to
know not only the value of f(x) at x = 0, namely f(0), but also the values
of the first n derivatives of f(x) at x = 0. That is, we need to be given the
values of $f(0)$, $f'(0)$, $f''(0) \equiv f^{(2)}(0)$, $f^{(3)}(0)$, . . ., $f^{(n)}(0)$. We will build
each of the $T_i(x)$, 0 ≤ i ≤ n, so that its function value and derivatives at
x = 0 match up to (and include) $f^{(i)}(0)$.
Solution:
We write
Tn (x) = a0 + a1 x + a2 x2 + a3 x3 + · · · + an xn (3.4)
where a0 , a1 , a2 , . . ., an are undetermined constants.
Therefore $T_n^{(3)}(0) = 3 \times 2 \times 1\, a_3 + \cdots + n(n-1)(n-2)\, a_n 0^{n-3} = 3 \times 2 \times 1\, a_3$.
By insisting that $T_n^{(3)}(0) = f^{(3)}(0)$ we get $a_3 = \frac{1}{3!} f^{(3)}(0)$.
Repeating this process n times, we find $a_n = \frac{1}{n!} f^{(n)}(0)$. Lastly, we substitute
these $a_i$ values into Equation (3.4), to obtain the Taylor polynomial of degree
n centred at x = 0:
$$T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k = f(0) + \frac{f'(0)}{1!} x + \frac{f''(0)}{2!} x^2 + \frac{f^{(3)}(0)}{3!} x^3 + \cdots + \frac{f^{(n)}(0)}{n!} x^n.$$
Knowing that $f(x) = \lim_{n \to \infty} T_n(x)$, this gives the Taylor series of f(x)
centred at x = 0 as
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n = f(0) + \frac{f'(0)}{1!} x + \frac{f''(0)}{2!} x^2 + \frac{f^{(3)}(0)}{3!} x^3 + \cdots.$$
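The recipe "divide the k-th derivative at 0 by k!" is easy to mechanise once the derivative values are known. In the sketch below the derivative values are supplied by hand for f(x) = 1/(1 + x), the function used in the next example; they are an assumption of this illustration (obtained from repeated use of the chain rule), as are the names `taylor_coefficients` and `taylor_value`.

```python
from fractions import Fraction
from math import factorial


def taylor_coefficients(derivatives_at_0):
    """Given f(0), f'(0), f''(0), ..., return a_k = f^(k)(0) / k!."""
    return [Fraction(d, factorial(k)) for k, d in enumerate(derivatives_at_0)]


def taylor_value(coeffs, x):
    """Evaluate a_0 + a_1 x + a_2 x^2 + ... at x."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


# Hand-computed values of f, f', f'', f''' at x = 0 for f(x) = 1/(1 + x).
derivatives = [1, -1, 2, -6]
coeffs = taylor_coefficients(derivatives)
print(coeffs)

x = Fraction(1, 10)
print(float(taylor_value(coeffs, x)), float(1 / (1 + x)))
```

Only the list of derivative values needs to change to reuse the sketch for another function, provided those values are worked out correctly first.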
Example 3.28. (i) Derive the first four Taylor polynomials $T_0(x)$, $T_1(x)$,
$T_2(x)$, $T_3(x)$ for the function $f(x) = \frac{1}{1 + x}$ centred at x = 0.
Function | Value at x = 0
$f(x) = \frac{1}{1 + x}$ | $f(0) =$
$f'(x) = -\frac{1}{(1 + x)^2}$ | $f'(0) =$
$f^{(2)}(x) =$ | $f^{(2)}(0) =$
(ii) Sketch f (x), T1 (x) and T2 (x).
(iii) Deduce the Taylor polynomial of degree three for the function
$g(x) = \frac{1}{1 + 3x}$ centred at x = 0.
Example 3.29. (i) Derive the Taylor polynomials T0 (x), T2 (x), T4 (x) for
the function y = cos x centred at x = 0.
(ii) Using Mathematica (or otherwise) plot y = cos x, as well as T0 (x), T2 (x),
and T4 (x) for the domain −π ≤ x ≤ π.
(iii) Deduce the Taylor polynomial of degree four for y = cos 3x centred
at x = 0.
Function Value at x = 0
f 0 (x) = f 0 (0) =
Example 3.30. Is it possible to have two different power series for the one
function?
3.6.5 Taylor series centred at x ≠ 0
Taking the limit as n → ∞ of $T_n(x; c)$, we get the Taylor series of f centred
at x = c as
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!} (x - c)^n = f(c) + \frac{f'(c)}{1!}(x - c) + \frac{f''(c)}{2!}(x - c)^2 + \cdots.$$
Note that f (x) is given by its Taylor series regardless of the centring value c
(as long as the convergence occurs!). So, it’s not particularly useful to look
at Taylor series centred at values other than x = 0, unless x = 0 does not
allow for convergence. Taylor polynomials centred at x = c on the other
hand are very useful. They allow us to obtain good approximations to f
near x = c, and we usually only require a low order Taylor polynomial (often
2 or 3 suffices!).
Example 3.31. Derive the third degree Taylor polynomial of f(x) = cos(x),
centred at x = π/2. That is, find $T_3(x; \pi/2)$. Use this to estimate $f(\frac{\pi}{2} + 0.1)$.
3.6.6 Cubic splines interpolation
Polynomial interpolation
Suppose we are given a set of data points (xi , f (xi )) (note that we do not
have the function f (x) explicitly given to us) and we want to build a function
that approximates f (x) with as much continuity as we can get. What do
we do? We will introduce a method of interpolation to do this. We are
essentially going to find a curve which best fits the data given. In science and
engineering, numerical methods often involve curve fitting of experimental
data.
Polynomial interpolation is simply a method of estimating the values of a
function, between known data points. Thus linear interpolation would use
two data points, quadratic interpolation would use three data points, and
so on.
As an example, suppose you are asked to evaluate a function f (x) at x = 3.4
but all you are given is the following table of data points.
Since x = 3.4 is not in the table, the best we can do is find an estimate of
f (3.4). We could construct a straight line built on the two points either side
of x = 3.4 namely (2.400, 0.675) and (3.600, −0.443). Or we could build a
quadratic based on any three points (that cover x = 3.4). If we were really
keen we could build a cubic by selecting four points around x = 3.4. All we
are doing is simply using a set of points near the target point (x = 3.4) to
build a polynomial. Then we estimate f (x) by evaluating the polynomial at
the target point. This process is called polynomial interpolation.
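The simplest case, the straight line through the two points either side of the target, can be sketched as follows. The two data points are the ones quoted above; the function name `linear_interpolate` is a hypothetical helper, and the result is only an estimate of f(3.4), not its true value.

```python
def linear_interpolate(p0, p1, x):
    """Evaluate at x the straight line through the points p0 = (x0, y0) and p1 = (x1, y1)."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# The two tabulated points that bracket the target x = 3.4.
estimate = linear_interpolate((2.400, 0.675), (3.600, -0.443), 3.4)
print(estimate)
```

A quadratic or cubic through more neighbouring points follows the same idea with more coefficients to solve for.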
This method will give us a unique polynomial for our approximation of the
function f (x). This is fine but we know that our accuracy is likely to decline
as we get further away from our target point. What else can we do?
linear spline
cubic spline
Suppose we are given a simple dataset and we are asked to estimate the
derivative at, say, x = 0.35. How do we proceed? Here is one approach.
Construct, by whatever means, a smooth approximation $\tilde{y}(x)$ to y(x) near
x = 0.35. Then put $y'(0.35) \approx \tilde{y}'(0.35)$.
What we want is a method which
• Interpolation condition
• Continuity of the second derivative
$$\tilde{y}''_{i-1}(x_i) = \tilde{y}''_i(x_i) \qquad (4)$$
$$y''_i = \begin{cases} \tilde{y}''_i(x_i), & i = 0, 1, 2, \ldots, n - 1, \\ \tilde{y}''_{n-1}(x_n), & i = n. \end{cases}$$
Now we turn to equation (4), $y''_{i+1} = y''_i + 6c_i(x_{i+1} - x_i)$, which gives
$$c_i = (y''_{i+1} - y''_i)/(6h_i) \qquad (7)$$
$$y_{i+1} = y_i + a_i h_i + \frac{1}{6}(y''_{i+1} + 2y''_i)h_i^2 \qquad (8)$$
and so
$$a_i = \frac{y_{i+1} - y_i}{h_i} - \frac{1}{6} h_i (y''_{i+1} + 2y''_i). \qquad (9)$$
$$6\left(\frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}}\right) = h_i y''_{i+1} + 2(h_i + h_{i-1})y''_i + h_{i-1} y''_{i-1}. \qquad (10)$$
The only unknowns in this equation are the $y''_i$ of which there are n + 1.
But there are only n − 1 equations. Thus we must supply two extra pieces
of information.
The simplest choice is to set $y''_0 = y''_n = 0$. Then we have a tridiagonal system
of equations² to solve for $y''_i$. That’s as far as we need to push the algebra;
we can simply now use technology (such as Matlab, Mathematica, Wolfram
Alpha...) to solve the tridiagonal system.
The recipe
Our job is done. We have computed the cubic spline for our set of
data points.
Example 3.32. Let us say we are given the set of data points in the fol-
lowing table. Find the cubic spline that best fits this data.
2
Often a system of equations will give a coefficient matrix of a special structure. A
tridiagonal system of equations is one such that the coefficient matrix has zero entries
everywhere except for in the main diagonal and in the diagonals above and below the
main diagonal.
x −2 −1 1 3
f (x) 3 0 2 1
We are going to use equation 5 to give three cubics, ỹ0 (x), ỹ1 (x) and ỹ2 (x).
Recall
Solving this system of equations we find that $y''_1 = \frac{105}{22}$ and $y''_2 = -\frac{51}{22}$.
The use of equation 9 will give us $a_0 = -\frac{167}{44}$, $a_1 = -\frac{31}{22}$ and $a_2 = \frac{23}{22}$.
Equation 6 gives $b_0 = 0$, $b_1 = \frac{105}{44}$ and $b_2 = -\frac{51}{44}$.
Equation 7 gives $c_0 = \frac{35}{44}$, $c_1 = -\frac{13}{22}$ and $c_2 = \frac{17}{88}$.
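The numbers in this example can be reproduced with a short script, which may be useful for checking hand working. The sketch rests on a few assumptions that are consistent with equations (7), (9) and (10) and with the values quoted here, but are not spelled out in this part of the notes: each piece has the form y_i + a_i(x − x_i) + b_i(x − x_i)² + c_i(x − x_i)³, the spacing is h_i = x_{i+1} − x_i, and b_i is half of the second derivative at x_i. The function name and the simple elimination used to solve the tridiagonal system are illustrative choices.

```python
from fractions import Fraction


def natural_cubic_spline(xs, ys):
    """Return (second derivatives, a, b, c) for the spline with y''_0 = y''_n = 0."""
    n = len(xs) - 1
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    h = [xs[i + 1] - xs[i] for i in range(n)]  # assumed spacing h_i
    size = n - 1                               # unknowns y''_1 .. y''_{n-1}

    # Build equation (10) for i = 1..n-1; the last entry of each row is the right-hand side.
    rows = []
    for i in range(1, n):
        row = [Fraction(0)] * (size + 1)
        if i - 1 >= 1:
            row[i - 2] = h[i - 1]
        row[i - 1] = 2 * (h[i] + h[i - 1])
        if i + 1 <= n - 1:
            row[i] = h[i]
        row[size] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        rows.append(row)

    # Forward elimination, then back substitution (the system is tridiagonal).
    for k in range(1, size):
        factor = rows[k][k - 1] / rows[k - 1][k - 1]
        rows[k] = [u - factor * v for u, v in zip(rows[k], rows[k - 1])]
    inner = [Fraction(0)] * size
    for k in reversed(range(size)):
        upper = rows[k][k + 1] * inner[k + 1] if k + 1 < size else 0
        inner[k] = (rows[k][size] - upper) / rows[k][k]

    ypp = [Fraction(0)] + inner + [Fraction(0)]
    a = [(ys[i + 1] - ys[i]) / h[i] - h[i] * (ypp[i + 1] + 2 * ypp[i]) / 6 for i in range(n)]
    b = [ypp[i] / 2 for i in range(n)]  # assumed form of equation 6
    c = [(ypp[i + 1] - ypp[i]) / (6 * h[i]) for i in range(n)]
    return ypp, a, b, c


ypp, a, b, c = natural_cubic_spline([-2, -1, 1, 3], [3, 0, 2, 1])
print(ypp, a, b, c)
```

Exact fractions are used so that the output can be compared digit for digit with the values above; for larger data sets floating point and a library solver would be the more practical route.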
Example 3.33. For the above example, check that the four conditions (1,
2, 3 and 4) are met.
Example 3.34. Compute the cubic spline that passes through the following
data set points
x 0 1 2 3
Chapter 4
Integration
4.1.1 Revision
Computing the indefinite integral $I = \int f(x)\,dx$ is no different from finding
a function F(x) such that $\frac{dF}{dx} = f(x)$. Thus $\int \frac{dF}{dx}\,dx = F(x)$. The
function F(x) is called an anti-derivative of f(x).
You should recall some of the basic integrals.
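$$\int k\,dx = kx + C, \quad \text{where } C \in \mathbb{R}$$
$$\int x^n\,dx = \frac{1}{n+1} x^{n+1} + C, \quad n \neq -1$$
$$\int \sin(x)\,dx = -\cos(x) + C$$
$$\int \cos(x)\,dx = \sin(x) + C$$
$$\int e^x\,dx = e^x + C$$
$$\int \frac{1}{x}\,dx = \ln|x| + C$$
$$\int \big(f(x) + g(x)\big)\,dx = \int f(x)\,dx + \int g(x)\,dx$$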
$$\int \big(f(x) - g(x)\big)\,dx = \int f(x)\,dx - \int g(x)\,dx$$
$$\int k f(x)\,dx = k \int f(x)\,dx \quad \text{for any constant } k$$
There are also a few tricks we can use to find F (x), such as integration by
substitution and integration by parts.
Integration by substitution
If $I = \int f(x)\,dx$ looks nasty, try changing the variable of integration. That
is, put u = u(x) for some chosen function u(x), then invert the function to
find x = x(u) and substitute into the integral.
$$I = \int f(x)\,dx = \int f(x(u))\,\frac{dx}{du}\,du$$
If we have chosen well, then this second integral will be easy to do.
Integration by parts
This is a very powerful technique based on the product rule for derivatives.
Recall that
$$\frac{d(fg)}{dx} = g\frac{df}{dx} + f\frac{dg}{dx}$$
Now integrate both sides
$$\int \frac{d(fg)}{dx}\,dx = \int g\frac{df}{dx}\,dx + \int f\frac{dg}{dx}\,dx$$
But integration is the inverse of differentiation, thus we have
$$fg = \int g\frac{df}{dx}\,dx + \int f\frac{dg}{dx}\,dx$$
which we can re-arrange to
$$\int f\frac{dg}{dx}\,dx = fg - \int g\frac{df}{dx}\,dx$$
Thus we have converted one integral into another. The hope is that the
second integral is easier than the first. This will depend on the choices we
make for f and $\frac{dg}{dx}$.
Example 4.2. Find $\int x e^x\,dx$.
Solution: We have to split the integrand $x e^x$ into two pieces, f and $\frac{dg}{dx}$.
If we choose f(x) = x and $\frac{dg}{dx} = e^x$ then $\frac{df}{dx} = 1$ and $g(x) = e^x$.
Then
$$\begin{aligned}
\int x e^x\,dx &= fg - \int g\frac{df}{dx}\,dx \\
&= x e^x - \int 1 \cdot e^x\,dx \\
&= x e^x - e^x + C
\end{aligned}$$
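Example 4.3. Find $\int x \cos(x)\,dx$.
Solution: Choose f(x) = x and $\frac{dg}{dx} = \cos(x)$ then $\frac{df}{dx} = 1$ and g(x) = sin(x).
Then
$$\begin{aligned}
\int x \cos(x)\,dx &= fg - \int g\frac{df}{dx}\,dx \\
&= x \sin(x) - \int 1 \cdot \sin(x)\,dx \\
&= x \sin(x) + \cos(x) + C
\end{aligned}$$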
Example 4.4. Find $\int x \sin(x)\,dx$
Note that $\int_a^b f(x)\,dx$ is known as the definite integral from a to b as we
are integrating the function f(x) between the values x = a and x = b.
Can we interpret this theorem in some physical way? Of course! Let s(t) be
a continuous function which gives the position of a moving object at time
t where t is in the interval [a, b]. We know that $s'(t)$ gives the velocity of
the object at time t, and we want to know what is the meaning of $\int_a^b s'(t)\,dt$.
Recall that distance = velocity × time. Thus for any small interval ∆t in
[a, b] we have s0 (t) × ∆t ≈ distance travelled in ∆t. Adding each successive
calculation of the distance travelled for the small intervals of time ∆t from
t = a to t = b will give us (approximately) the total distance travelled
over the interval [a, b].
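Integrating the velocity function $s'(t)$ over the interval [a, b] will then give us
the net change in position s(b) − s(a) over the interval [a, b], which equals the
total distance travelled when the velocity does not change sign. Thus the definite
integral of a velocity function can be interpreted as the total distance travelled
in the interval [a, b], provided the object does not reverse direction.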
The integral of the rate of change of any quantity gives the total
change in that quantity.
4.2 Area under the curve
When f(x) is a positive function and a < b then the definite integral
$$\int_a^b f(x)\,dx$$
gives the area between the graph of the function f(x) and the x-axis. In
other words
$$\int_a^b f(x)\,dx = A$$
Example 4.5. Find the area between the graph of y = sin x and the x-axis,
between x = 0 and $x = \frac{\pi}{2}$.
When f(x) is a negative function and a < b then the definite integral gives
the negative of the area between the graph of the function f(x) and the x-axis.
$$\int_a^b f(x)\,dx = -A$$
Example 4.6. Find the area between the graph of y = sin x and the x-axis,
between x = π and $x = \frac{3\pi}{2}$.
When f(x) is positive for some values of x in the interval [a, b] and negative
for other values in the interval [a, b] then the definite integral gives the sum
of the areas above the x-axis and subtracts the areas below the x-axis.
In other words
$$\int_a^b f(x)\,dx = A - B + C$$
Example 4.7. Find the area between the graph of y = sin x and the x-axis,
between x = 0 and $x = \frac{3\pi}{2}$.
Area between two curves. Given two continuous functions f(x) and
g(x) where f(x) ≥ g(x) for all x in the interval [a, b], the area of the region
bounded by the curves y = f(x) and y = g(x), and the lines x = a and
x = b is given by the definite integral
$$\int_a^b \big(f(x) - g(x)\big)\,dx$$
Example 4.8. Find the area between the graphs of y = sin x and y = cos x
between $x = \frac{\pi}{4}$ and x = π.
Example 4.9. Find the area between the graphs of y = sin x and y = cos x
between x = 0 and x = π.
4.3 Trapezoidal rule
$$A = \frac{1}{2}(m + n)w$$
where m and n are lengths of the parallel sides of the trapezium, and w is
the distance between the parallel lengths (i.e. the width).
$$A = \frac{1}{2}\,\frac{b - a}{n}\Big( f(a) + 2f(a + \Delta n) + \ldots + 2f(a + (n - 1)\Delta n) + f(b) \Big).$$
$$A = \sum_{i=0}^{n-1} \frac{b - a}{2n}\Big( f(x_i) + f(x_i + \Delta n) \Big).$$
$$\int_a^b f(x)\,dx \approx \frac{b - a}{2n}\left( f(a) + f(b) + 2\sum_{i=1}^{n-1} f(x_i) \right)$$
Example 4.11. Use the Trapezoidal rule with n = 4 to find an approximate
value of $\int_0^2 2^x\,dx$.
Solution
In the interval [0, 2] when n = 4 we have four trapezoids each of width $\frac{1}{2}$.
The endpoints of our interval are a = 0 and b = 2 thus $f(a) = f(0) = 2^0 = 1$
and $f(b) = f(2) = 2^2 = 4$. Note that $\frac{b - a}{2n} = \frac{2}{2 \times 4} = \frac{1}{4}$. Thus
$$\begin{aligned}
\int_0^2 2^x\,dx &\approx \frac{b - a}{2n}\left( f(a) + f(b) + 2\sum_{i=1}^{n-1} f(x_i) \right) \\
&= \frac{1}{4}\left( 1 + 4 + 2\sum_{i=1}^{3} 2^{x_i} \right) \\
&= \frac{1}{4}\left( 1 + 4 + 2(2^{1/2} + 2^1 + 2^{3/2}) \right) \\
&= \frac{1}{4}\left( 5 + 2(\sqrt{2} + 2 + 2\sqrt{2}) \right) \\
&= \frac{1}{4}(9 + 6\sqrt{2})
\end{aligned}$$
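The worked answer can be confirmed with a few lines of code. The function `trapezoidal` below is a straightforward transcription of the rule, with equally spaced points x_i = a + i(b − a)/n assumed; the comparison value 3/ln 2 is the exact integral of 2^x from 0 to 2, included here only to show the size of the error.

```python
import math


def trapezoidal(f, a, b, n):
    """Trapezoidal rule with n equal-width strips on [a, b]."""
    width = (b - a) / n
    # Interior points are counted twice because adjacent trapezoids share them.
    interior = sum(f(a + i * width) for i in range(1, n))
    return (b - a) / (2 * n) * (f(a) + f(b) + 2 * interior)


approx = trapezoidal(lambda x: 2 ** x, 0.0, 2.0, 4)
by_hand = (9 + 6 * math.sqrt(2)) / 4
exact = 3 / math.log(2)

print(approx, by_hand, exact)
```

Increasing n should move the approximation closer to the exact value; because the graph of 2^x is concave up, the rule overestimates the integral here.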
Example 4.14. Use the Trapezoidal rule with n = 4 to find an approximate
value of $\int_0^1 e^{x^2}\,dx$
Chapter 5
Multivariable Calculus
We are all familiar with simple functions such as y = x3 . And we all know
the answers to questions such as
• What does the function look like as a plot in the xy-plane?
5.1.1 Definition
What does this mean? Simply that for any allowed value of x and y we can
compute a single value for f (x, y). In a sense f is a process for converting
pairs of numbers (x and y) into a single number f .
The notation R2 means all possible choices of x and y such as all points
in the xy-plane. The symbol R denotes all real numbers (for example all
points on the real line). The use of the word subset in the above definition
is simply to remind us that functions have an allowed domain (i.e. a subset
of R2 ) and a corresponding range (i.e. a subset of R).
Notice that we are restricting ourselves to real variables, that is the func-
tion’s value and its arguments (x, y) are all real numbers. This game gets
very exciting and somewhat tricky when we enter the world of complex num-
bers. Such adventures await you in later year mathematics (not surprisingly
this area is known as Complex Analysis).
5.1.2 Notation
f (x, y) = sin(x + y)
We can choose the domain to be R2 and then the range will be the closed
set [−1, +1]. Another common way of writing all of this is
w(u, v) = log(u − v)
Example 5.1. What would be a sensible choice of domain for the previous
function?
5.1.3 Surfaces
the xy-plane. If we use standard Cartesian coordinates then such a surface
could be described by the equation
z = f (x, y)
This surface has a height z units above each point (x, y) in the xy-plane.
Just as the equation y = f(x) describes the curve in the xy-plane, the
equation z = f (x, y) describes the surface in R3 . Just as the curve C = f (x)
is made up of the points (x, y), the surface S = f (x, y) is made up of the
points (x, y, z). As z = f (x, y) describes this surface explicitly as a height
function over a plane, we say that the surface is given in explicit form.
A surface such as z = f (x, y) is also often called the graph of the function
f.
Here are some simple examples. A very good exercise is to try to convince
yourself that the following images are correct (i.e. that they do represent
the given equation).
Note that in each of the following r is defined as $r = +\sqrt{x^2 + y^2}$.
$$z = x^2 + y^2$$
$$1 = x^2 + y^2 - z^2$$
$$z = \sqrt{1 + y^2 - x^2}$$
$$z = -xy \exp(-x^2 - y^2)$$
1=x+y+z
Example 5.2. Sketch and describe the graph of the surface z = f (x, y) =
6 + 3x + 2y.
5.1.4 Alternative forms
We might ask are there any other ways in which we can describe a surface?
We should be clear that (in this subject) when we say surface we are talking
about a 2-dimensional surface in our familiar 3-dimensional space. With
that in mind, consider the equation
0 = g(x, y, z)
z = f (x, y)
for some function f . In this form we see that we have a surface, and thus the
previous equation 0 = g(x, y, z) also describes a surface. When the surface
is described by an equation of the form 0 = g(x, y, z) we say that the surface
is given in implicit form.
Consider all of the points in R^3 (i.e. all possible (x, y, z) points). If we now
introduce the equation 0 = g(x, y, z) we are forced to consider only those
(x, y, z) values that satisfy this constraint. We could do so by, for example,
arbitrarily choosing (x, y) and using the equation (in the form z = f(x, y)) to
compute z. Or we could choose say (y, z) and use the equation 0 = g(x, y, z)
to compute x. Whichever road we travel it is clear that we are free to choose
just two of the (x, y, z) with the third constrained by the equation.
Now consider some simple surface and let’s suppose we are able to drape a
sheet of graph paper over the surface. We can use this graph paper to select
individual points on the surface (well as far as the graph paper covers the
surface). Suppose we label the axes of the graph paper by the symbols u
and v. Then each point on the surface is described by a unique pair of values
(u, v). This makes sense – we are dealing with a 2-dimensional surface and
so we expect we would need 2 numbers ((u, v)) to describe each point on the
surface. The parameters (u, v) are often referred to as (local) coordinates
on the surface.
How does this picture fit in with our previous description of a surface, as
an equation of the form 0 = g(x, y, z)? Pick any point on the surface. This
point will have both (x, y, z) and (u, v) coordinates. That means that we can
describe the point in terms of either (u, v) or (x, y, z). As we move around
the surface all of these coordinates will vary. So given (u, v) we should be
able to compute the corresponding (x, y, z) values. That is we should be
able to find functions P(u, v), Q(u, v) and R(u, v) such that
x = P(u, v) y = Q(u, v) z = R(u, v)
The above equations describe the surface in parametric form.
Example 5.3. Identify (i.e. describe) the surface given by the equations
x = 2u + 3v + 1 y = u − 4v + 2 z = u + 2v − 1
Hint : Try to combine the three equations into one equation involving x, y
and z but not u and v.
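A candidate answer to Example 5.3 can be checked numerically. The sketch below is one possible check, not the method the hint suggests: instead of eliminating u and v by hand, it takes the cross product of the two direction vectors (the coefficients of u and v) to get a normal vector, then tests whether sampled points satisfy the resulting single equation. The helper names `surface_point`, `cross` and `residual` are made up for this illustration.

```python
def surface_point(u, v):
    """Parametric equations from Example 5.3."""
    return (2 * u + 3 * v + 1, u - 4 * v + 2, u + 2 * v - 1)

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Direction vectors: the coefficients of u and of v in (x, y, z).
du = (2, 1, 1)
dv = (3, -4, 2)
normal = cross(du, dv)

# The point with u = v = 0 fixes the constant in normal . (x, y, z) = d.
base = surface_point(0, 0)
d = sum(n * p for n, p in zip(normal, base))

def residual(u, v):
    """Zero exactly when the point (u, v) satisfies normal . (x, y, z) = d."""
    point = surface_point(u, v)
    return sum(n * c for n, c in zip(normal, point)) - d

print("normal:", normal, "constant:", d)
print("residuals:", [residual(u, v) for u, v in [(1, 0), (0, 1), (3, -7)]])
```

If every residual is zero, all sampled points lie on a single plane with the printed normal vector, which is the kind of description the example asks for.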
Example 5.5. How would your answer to the previous example change if
the domain for θ was 0 < θ < π/2?
Explicit z = f (x, y)
Implicit 0 = g(x, y, z)
5.2 Partial derivatives
We are all familiar with the definition of the derivative of a function of one
variable
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
The natural question to ask is: Is there a similar rule for functions of more
than one variable? The answer is yes, and we will develop the necessary
formulas by a simple generalisation of the above definition.
Let us suppose we have a function, say f (x, y). Suppose for the moment
that we pick a particular value of y, say y = 3. Then only x is allowed to
vary and in effect we now have a function of just one variable. Thus we can
apply the above definition for a derivative which we write as
$$\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}$$
Notice the use of the symbol ∂ rather than d. This is to remind us that
in computing this derivative all other variables are held constant (which in
this instance is just y).
Of course we could do the same again but with x held constant. This gives
us the derivative in y
$$\frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}$$
Each of these derivatives, ∂f/∂x and ∂f/∂y, is known as a first order partial
derivative of f (the derivative of a function of one variable is often
called an ordinary derivative).
We can also look at this in terms of the rate of change, as we did for single
variable functions. If z is a function of two independent variables x and y
(i.e. z = f (x, y)) then there are two independent rates of change. One of
these is the rate of change of f with respect to the variable x, and
the other is the rate of change of f with respect to the variable y.
You might think that we would now need to invent new rules for the (partial)
derivatives of products, quotients and so on. But our definition of partial
derivatives is built upon the definition of an ordinary derivative of a function
of one variable. Thus all the familiar rules carry over without modification.
For example, the product rule for partial derivatives is
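$$\frac{\partial (fg)}{\partial x} = \frac{\partial f}{\partial x}\,g + f\,\frac{\partial g}{\partial x}$$
$$\frac{\partial (fg)}{\partial y} = \frac{\partial f}{\partial y}\,g + f\,\frac{\partial g}{\partial y}$$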
• To find $\frac{\partial f}{\partial x}$, treat y as a constant and differentiate f(x, y) with respect
to x only.
• To find $\frac{\partial f}{\partial y}$, treat x as a constant and differentiate f(x, y) with respect
to y only.
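$$\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\bigl(\sin(x)\cos(y)\bigr) = \cos(y)\,\frac{\partial \sin(x)}{\partial x} = \cos(y)\cos(x)$$
Also find $\frac{\partial f}{\partial y}$.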
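The limit definition of a partial derivative can be imitated numerically by nudging one variable while holding the other fixed. The sketch below is only an illustration: it uses a central difference with a small step h (a choice made here, not taken from the notes) and compares the estimate of ∂f/∂x for f(x, y) = sin(x) cos(y) with the result cos(y) cos(x) derived above. The function names and the test point (0.7, 0.3) are arbitrary.

```python
import math

def f(x, y):
    return math.sin(x) * math.cos(y)

def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x (y held fixed)."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y (x held fixed)."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 0.7, 0.3                       # arbitrary test point
approx = partial_x(f, x0, y0)
exact = math.cos(y0) * math.cos(x0)     # the result obtained in the text
print("estimate of df/dx:", approx)
print("cos(y) cos(x):    ", exact)
```

The same helper `partial_y` can be used to check your own answer for ∂f/∂y.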
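Example 5.8. If $g(x, y, z) = e^{-x^2 - y^2 - z^2}$ then
$$\frac{\partial g}{\partial z} = \frac{\partial}{\partial z}\, e^{-x^2 - y^2 - z^2} = e^{-x^2 - y^2 - z^2}\,\frac{\partial(-x^2 - y^2 - z^2)}{\partial z} = -2z\,e^{-x^2 - y^2 - z^2}$$
Also find $\frac{\partial g}{\partial x}$ and $\frac{\partial g}{\partial y}$.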
5.3.1 Geometric interpretation
Earlier we noted that the partial derivative $\frac{\partial f}{\partial x}$ of the function of two variables
z = f(x, y) is the rate of change of f in the x-direction, keeping y
fixed. To visualise $\frac{\partial f}{\partial x}$ as the slope (or gradient) of a straight line, consider
the diagram below.
Similarly, the diagram below illustrates the intersection of the vertical plane
x = ‘constant’ with the surface z = f (x, y).
If we zoom in onto the surface at P it becomes locally flat. We can then
draw the two tangent lines T1 and T2 to the surface at P which are tangential
to the two curves C1 and C2 that lie in the vertical planes y = b
and x = a. These tangent lines, which have the slopes $\frac{\partial f}{\partial x} = \tan\alpha$ and
$\frac{\partial f}{\partial y} = \tan\beta$ shown previously, give the rate of change of f(x, y) in both the
x and y directions.
The tangent plane to the surface at the point P is the plane that contains
both of the tangent lines T1 and T2. Let us now find the equation
of this plane. We know that the general equation of a plane that passes
through the point P(a, b, c) is
$$z - c = m(x - a) + n(y - b) \qquad (5.1)$$
Here m and n are the slopes of the lines of intersection of the general plane
with the two vertical planes y = b and x = a that are parallel to the principal
coordinate planes (the xz-plane and the yz-plane respectively). If we
now put y = b in equation (5.1), we have z − c = m(x − a). This is the
equation of the line of intersection of our general plane with the plane y = b.
It clearly has slope m. Next we put x = a in equation (5.1) and this yields
z −c = n(y −b). This is the equation of the line of intersection of our general
plane with the plane x = a. It clearly has slope n. Lastly, if we choose
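$$m = \tan\alpha = \frac{\partial f}{\partial x} = f_x(a, b) \quad\text{and}\quad n = \tan\beta = \frac{\partial f}{\partial y} = f_y(a, b)$$
then, since c = f(a, b), the equation of the tangent plane to the surface z = f(x, y) at the
point (x, y) = (a, b) is
$$z = f(a, b) + f_x(a, b)\,(x - a) + f_y(a, b)\,(y - b) \qquad (5.2)$$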
Example 5.10. Find the equation of the tangent plane to the surface
z = 2x2 + y 2 at the point (a, b) = (1, 1).
$\frac{\partial f}{\partial y} = 2y$ therefore $\frac{\partial f}{\partial y}(1, 1) =$
We have done the hard work, and now it is time to enjoy the fruits of our
labour. Just as we used the tangent line in approximations for functions of
one variable, we can use the tangent plane as a way to estimate the original
function f (x, y) in a region close to the chosen point.
The equation of the tangent plane to the surface z = f (x, y) at the point
(a, b) is also the equation for the linear approximation to z = f (x, y) for
points (x, y) near (a, b). We can regard the tangent plane equation (5.2)
as the natural extension to functions of two variables (x, y) of the Taylor
polynomial of degree one equation
the linear approximation to f (x, y) for points (x, y) near (a, b). Please
note, we will omit the centring point (a, b) from the argument of T1 as
the notation becomes too cumbersome! You will need to understand from
context which centring point is being utilised.
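To see how well a tangent plane estimates a function near the centring point, the following sketch builds the linear approximation T1(x, y) = f(a, b) + f_x(a, b)(x − a) + f_y(a, b)(y − b) for the surface z = 2x^2 + y^2 of Example 5.10, centred at (1, 1). It is an illustration under stated assumptions: the partial derivatives are estimated by central differences rather than found by hand, and the helper name `linear_approximation` and the evaluation point (1.1, 0.9) are choices made here.

```python
def f(x, y):
    # Surface from Example 5.10.
    return 2 * x**2 + y**2

def linear_approximation(func, a, b, h=1e-6):
    """Return T1(x, y), the tangent-plane approximation of func at (a, b).

    The partial derivatives are estimated with central differences,
    so the coefficients are themselves only approximate.
    """
    fa = func(a, b)
    fx = (func(a + h, b) - func(a - h, b)) / (2 * h)
    fy = (func(a, b + h) - func(a, b - h)) / (2 * h)
    return lambda x, y: fa + fx * (x - a) + fy * (y - b)

T1 = linear_approximation(f, 1.0, 1.0)
estimate = T1(1.1, 0.9)     # value read off the tangent plane
actual = f(1.1, 0.9)        # value on the surface itself
print("tangent-plane estimate:", estimate)
print("actual value:          ", actual)
```

Moving the evaluation point further from (1, 1) shows the gap between the two values growing, which is why the approximation is only trusted near the centring point.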
Example 5.11. Derive the linear approximation function T1(x, y) for the
function $f(x, y) = \sqrt{3x - y}$ at the point (4, 3).
Example 5.12. Use the result of example 5.9 to estimate sin(x) sin(y) at
$\left(\frac{5\pi}{16}, \frac{5\pi}{16}\right)$.
5.4 Chain rule
In a previous lecture we saw how we could compute (partial) derivatives
of functions of several variables. The trick we employed was to reduce the
number of independent variables to just one (which we did by keeping all
but one variable constant). There is another way in which we can achieve
this reduction, which involves parametrising the function.
Consider a function of two variables f (x, y) and let’s suppose we are given
a smooth (continuous, with derivatives which are also continuous) curve in
the xy-plane. Each point on this curve can be characterised by its distance
from some arbitrary starting point on the curve. In this way we can imagine
that the (x, y) pairs on this curve are given as functions of one variable, let’s
call it s. That is, our curve is described by the parametric equations
x = x(s) y = y(s)
for some functions x(s) and y(s). The values of the function f (x, y) on this
curve are therefore given by
f = f (x(s), y(s))
and this is just a function of one variable s. Thus we can compute its
derivative df /ds. We will soon see that df /ds can be computed in terms of
the partial derivatives.
Example 5.14. Show that for the curve x(s) = s, y(s) = 2 we get $\frac{df}{ds} = \frac{\partial f}{\partial x}$.
Example 5.15. Show that for the curve x(s) = −1, y(s) = s we get $\frac{df}{ds} = \frac{\partial f}{\partial y}$.
The last two examples show that df /ds is somehow tied to the partial deriva-
tives of f . The exact link will be made clear in a short while.
What meaning can we assign to this number df /ds? It helps to imagine that
we have drawn a graph of f (x, y) (i.e. as a surface over the xy-plane).
Now draw the curve (x(s), y(s)) in the xy-plane and imagine walking along
that curve, let’s call it C. At each point on C, f (s) is the height of the
surface above the xy-plane. If you walk a short distance ∆s then the height
might change by an amount ∆f . The rate at which the height changes with
respect to the distance travelled is then ∆f /∆s. In the limit of infinitesimal
distances we recover df /ds. Thus we can interpret df /ds as measuring the
rate of change of f along the curve. This is exactly what we would have
expected – after all, derivatives measure rates-of-change.
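The first example above showed how you could compute df/ds by first reducing
f to an explicit function of s. It was also hinted that it is possible
to evaluate df/ds using partial derivatives. We start from the definition of the derivative,
$$\frac{df}{ds} = \lim_{\Delta s \to 0} \frac{f(x(s + \Delta s), y(s + \Delta s)) - f(x(s), y(s))}{\Delta s}$$
We will re-write this by adding and subtracting f(x(s), y(s + ∆s)) just before
the minus sign. After a little rearranging we get
$$\frac{df}{ds} = \lim_{\Delta s \to 0} \frac{f(x(s + \Delta s), y(s + \Delta s)) - f(x(s), y(s + \Delta s))}{\Delta s} + \lim_{\Delta s \to 0} \frac{f(x(s), y(s + \Delta s)) - f(x(s), y(s))}{\Delta s}$$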
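Now let’s look at the first limit. If we introduce ∆x = x(s + ∆s) − x(s) then
we can write
$$\lim_{\Delta s \to 0} \frac{f(x(s) + \Delta x, y(s + \Delta s)) - f(x(s), y(s + \Delta s))}{\Delta s} = \lim_{\Delta s \to 0} \frac{f(x(s) + \Delta x, y(s + \Delta s)) - f(x(s), y(s + \Delta s))}{\Delta x}\,\frac{\Delta x}{\Delta s} = \frac{\partial f}{\partial x}\frac{dx}{ds}.$$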
We can write a similar equation for the second limit. Combining the two
leads us to
$$\frac{df}{ds} = \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds}$$
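The chain rule can be sanity-checked numerically by computing df/ds in two ways: directly, by differencing f(x(s), y(s)) as a function of s, and through the formula (∂f/∂x)(dx/ds) + (∂f/∂y)(dy/ds). The function f(x, y) = sin(x) cos(y) appears earlier in these notes; the curve x(s) = cos(s), y(s) = s^2 and the value s = 0.8 are assumptions chosen purely for illustration (the parameter s here is not arc length, which the formula does not require).

```python
import math

def f(x, y):
    return math.sin(x) * math.cos(y)

def fx(x, y):           # partial derivative of f in x
    return math.cos(x) * math.cos(y)

def fy(x, y):           # partial derivative of f in y
    return -math.sin(x) * math.sin(y)

# Hypothetical curve in the xy-plane and its derivatives.
def x_of(s): return math.cos(s)
def y_of(s): return s * s
def dx_ds(s): return -math.sin(s)
def dy_ds(s): return 2 * s

def chain_rule(s):
    """df/ds from the chain rule formula."""
    x, y = x_of(s), y_of(s)
    return fx(x, y) * dx_ds(s) + fy(x, y) * dy_ds(s)

def direct(s, h=1e-6):
    """df/ds by central differencing f along the curve."""
    ahead = f(x_of(s + h), y_of(s + h))
    behind = f(x_of(s - h), y_of(s - h))
    return (ahead - behind) / (2 * h)

s0 = 0.8
print("chain rule:", chain_rule(s0))
print("direct:    ", direct(s0))
```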
Now that we have covered this much, it’s rather easy to see an important
extension of the above result. Suppose the path was obtained by holding
some other parameter constant. That is, imagine that the path x = x(s), y =
y(s) arose from some more complicated expressions such as x = x(s, t), y =
y(s, t) with t held constant. How would our formula for the chain rule
change? Not much other than we would have to keep in mind throughout
that t is constant. We encountered this issue once before and that led to
partial rather than ordinary derivatives. Clearly the same change of notation
applies here, and thus we would write
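$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}$$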
as the first partial derivative of f with respect to s.
Let’s see where we are at so far. We are given a function of two variables
f = f (x, y) and we are also given two other functions, also of two variables,
x = x(s, t), y = y(s, t). Then ∂f /∂s can be calculated using the above chain
rule.
The Chain Rule : Episode 2
Let f = f (x, y) be a differentiable function. If x = x(u, v), y = y(u, v)
then
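$$\frac{\partial f}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u}$$
$$\frac{\partial f}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v}$$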
5.5 Gradient and Directional Derivative
$$\nabla f = \frac{\partial f}{\partial x}\,\underset{\sim}{i} + \frac{\partial f}{\partial y}\,\underset{\sim}{j}$$
This is known as the gradient of f and is often pronounced grad f.
This may be pretty but what use is it? If we look back at the formula for
the chain rule we see that we can write it out as a vector dot-product
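$$\begin{aligned}
\frac{df}{ds} &= \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds} \\
&= \left(\frac{\partial f}{\partial x}\,\underset{\sim}{i} + \frac{\partial f}{\partial y}\,\underset{\sim}{j}\right) \cdot \left(\frac{dx}{ds}\,\underset{\sim}{i} + \frac{dy}{ds}\,\underset{\sim}{j}\right) \\
&= (\nabla f) \cdot \left(\frac{dx}{ds}\,\underset{\sim}{i} + \frac{dy}{ds}\,\underset{\sim}{j}\right)
\end{aligned}$$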
The number that we calculate in this process, i.e. df/ds, is known as the
directional derivative of f in the direction $\underset{\sim}{t}$. What do we make of the
vector on the far right of this equation, i.e. $\frac{dx}{ds}\,\underset{\sim}{i} + \frac{dy}{ds}\,\underset{\sim}{j}$? It is not hard to
see that it is a tangent vector to the curve (x(s), y(s)). And if we chose the
parameter s to be distance along the curve then we also see that it is a unit
vector.
Example 5.17. Prove the last pair of statements, i.e. that the vector is a
tangent vector and that it is a unit vector.
It is customary to denote the tangent vector by $\underset{\sim}{t}$ (some people prefer $\underset{\sim}{u}$).
With the above definitions we can now write the equation for a directional
derivative as follows
$$\frac{df}{ds} = \underset{\sim}{t} \cdot \nabla f$$
Directional derivative
Example 5.18. Given f(x, y) = sin(x) cos(y) compute the directional derivative
of f in the direction $\underset{\sim}{t} = (\underset{\sim}{i} + \underset{\sim}{j})/\sqrt{2}$.
Example 5.19. Given $\nabla f = 2x\,\underset{\sim}{i} + 2y\,\underset{\sim}{j}$ and x(s) = s cos(0.1), y(s) =
s sin(0.1) compute df/ds at s = 1.
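A hand calculation for Example 5.19 can be checked with a few lines of code. The sketch below evaluates the dot product of the tangent vector (dx/ds, dy/ds) with the gradient at the point reached when s = 1. The function names are invented for this illustration; the gradient and the curve are those stated in the example.

```python
import math

def grad_f(x, y):
    # Gradient given in Example 5.19: 2x i + 2y j.
    return (2 * x, 2 * y)

def curve(s):
    # The straight line x(s) = s cos(0.1), y(s) = s sin(0.1).
    return (s * math.cos(0.1), s * math.sin(0.1))

def tangent(s):
    # (dx/ds, dy/ds) for the line above; it does not depend on s.
    return (math.cos(0.1), math.sin(0.1))

def directional_derivative(s):
    """df/ds as the dot product of the tangent vector with grad f."""
    gx, gy = grad_f(*curve(s))
    tx, ty = tangent(s)
    return tx * gx + ty * gy

df_ds = directional_derivative(1.0)
print("df/ds at s = 1:", df_ds)
```

Because the tangent here has unit length, s measures distance along the line, which matches the setting in which the directional derivative was introduced.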
At this point we can dispense with the curves and retain just the tangent
vector $\underset{\sim}{t}$ at P. All that we require to compute df/ds is the direction we wish
to head in, $\underset{\sim}{t}$, and the gradient vector, ∇f, at P. Choose a different $\underset{\sim}{t}$ and
you will get a different answer for df/ds. In each case df/ds measures how
rapidly f is changing in the direction of $\underset{\sim}{t}$.
5.6 Second order partial derivatives
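Example 5.21. Let f(x, y) = sin(x) sin(y). Then we can define $g(x, y) = \frac{\partial f}{\partial x}$
and $h(x, y) = \frac{\partial g}{\partial x}$.
That is
$$g(x, y) = \frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\bigl(\sin(x)\sin(y)\bigr) = \cos(x)\sin(y)$$
and
$$h(x, y) = \frac{\partial g}{\partial x} = \frac{\partial}{\partial x}\bigl(\cos(x)\sin(y)\bigr) = -\sin(x)\sin(y)$$
Example 5.22. Compute $\frac{\partial g}{\partial y}$ for the above example.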
and this is normally written as
$$h(x, y) = \frac{\partial^2 f}{\partial y\,\partial x}$$
Note the order on the bottom line: you should read this from right to left.
It tells you to take a partial derivative in x and then a partial derivative in
y.
The function z = f (x, y) has two partial derivatives fx and fy . Taking
partial derivatives of fx and fy yields four second order partial derivatives
of the function f (x, y).
It’s now a short leap to cases where we might try to find, say, the fifth partial
derivatives, such as
$$P(x, y) = \frac{\partial^5 Q}{\partial x\,\partial y\,\partial y\,\partial x\,\partial x}$$
Partial derivatives that involve two or more of the independent variables are
known as mixed partial derivatives.
Example 5.23. Given $f(x, y) = 3x^2 + 2xy$ compute $\frac{\partial^2 f}{\partial x\,\partial y}$ and $\frac{\partial^2 f}{\partial y\,\partial x}$. What
do you notice?
This is not immediately obvious but it can be proved and it is a very useful
result.
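The order-independence of mixed partial derivatives can be explored numerically before it is proved. The sketch below is an illustration only: it nests central-difference operators in both orders for the function of Example 5.23 and compares the two results at an arbitrarily chosen point. The helper names `d_dx` and `d_dy`, the step size and the test point are assumptions made here.

```python
def d_dx(g, h=1e-3):
    """Return a function approximating the partial derivative of g in x."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-3):
    """Return a function approximating the partial derivative of g in y."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

def f(x, y):
    # Function from Example 5.23.
    return 3 * x**2 + 2 * x * y

x_then_y = d_dy(d_dx(f))   # differentiate in x first, then in y
y_then_x = d_dx(d_dy(f))   # differentiate in y first, then in x

point = (1.5, -0.5)
print("x then y:", x_then_y(*point))
print("y then x:", y_then_x(*point))
```

Agreement at a handful of points is evidence, not proof; the theorem itself requires the second partial derivatives to be continuous.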
A quick word on notation: The second order partial derivatives can be
written as follows:
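$$\frac{\partial^2 f}{\partial x^2} = f_{xx}$$
$$\frac{\partial^2 f}{\partial y^2} = f_{yy}$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = f_{xy}$$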
The theorem allows us to simplify our notation, all we need do is record how
many of each type of partial derivative are required, thus the above can be
written as
$$P(x, y) = \frac{\partial^5 Q}{\partial x^3\,\partial y^2} = \frac{\partial^5 Q}{\partial y^2\,\partial x^3}$$
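Example 5.25. Show that the function $u(x, y) = e^{-x}\cos y$ is a solution of
Laplace’s equation
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.$$
Solution:
$$u_x = \frac{\partial}{\partial x}\, e^{-x}\cos y = -e^{-x}\cos y \quad\text{and}\quad u_y = \frac{\partial}{\partial y}\, e^{-x}\cos y = -e^{-x}\sin y$$
Then
$$u_{xx} = \frac{\partial}{\partial x}\bigl(-e^{-x}\cos y\bigr) =$$
and
$$u_{yy} = \frac{\partial}{\partial y}\bigl(-e^{-x}\sin y\bigr) =$$
Hence
$$u_{xx} + u_{yy} =$$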
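Once the blanks above are filled in by hand, the claim of Example 5.25 can be cross-checked numerically. The sketch below approximates u_xx + u_yy with second-order central differences at a few arbitrarily chosen points; the step size h and the sample points are assumptions made for illustration, and the residuals are only expected to be small, not exactly zero.

```python
import math

def u(x, y):
    # Function from Example 5.25.
    return math.exp(-x) * math.cos(y)

def laplacian(func, x, y, h=1e-3):
    """Central-difference estimate of u_xx + u_yy at (x, y)."""
    uxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    uyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    return uxx + uyy

sample_points = [(0.0, 0.0), (0.5, -1.2), (2.0, 0.7)]
residuals = [laplacian(u, x, y) for x, y in sample_points]
for point, r in zip(sample_points, residuals):
    print(point, r)
```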
5.6.1 Taylor polynomials of higher degree
Example 5.26. Derive the Taylor polynomial of degree two for the function
f (x, y) = e−x cos y near the point (a, b) = (0, 0).
Solution:
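$$f_{xy}(x, y) = \frac{\partial}{\partial y} f_x = \frac{\partial}{\partial y}\bigl(-e^{-x}\cos y\bigr) = e^{-x}\sin y, \qquad f_{xy}(0, 0) = e^{-0}\sin 0 = 0$$
$$f_{yy}(x, y) = \frac{\partial}{\partial y} f_y = \frac{\partial}{\partial y}\bigl(-e^{-x}\sin y\bigr) = -e^{-x}\cos y, \qquad f_{yy}(0, 0) = -e^{-0}\cos 0 = -1$$
Collecting terms and substituting them into equation (5.4) we obtain:
$$T_2(x, y) = 1 - x + \tfrac{1}{2}(x^2 - y^2)$$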
Lastly we can graph the surface z = f (x, y) = e−x cos y and the quadratic
approximation function T2 (x, y).
Graph of $z = 1 - x + \tfrac{1}{2}(x^2 - y^2)$
Looking at these two graphs we see that the Taylor polynomial of degree
two, namely T2 (x, y), does a good job in mimicking the shape of the surface
z = f (x, y) for points (x, y) close to (0, 0). In the plane x = 1, the quadratic
approximation z = T2 (x, y) falls off too steeply along the y axis as we move
away from y = 0, while in the plane x = −1 it falls away too slowly along
the y axis as we move away from y = 0.
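The behaviour described above can also be read off from numbers rather than graphs. The following sketch tabulates f(x, y) = e^{-x} cos y against the quadratic approximation T2(x, y) = 1 − x + ½(x^2 − y^2) at a few points chosen here for illustration, and compares how far each falls between y = 0 and y = 1 in the planes x = 1 and x = −1. The helper name `drop` is made up for this example.

```python
import math

def f(x, y):
    return math.exp(-x) * math.cos(y)

def T2(x, y):
    # Degree-two Taylor polynomial about (0, 0) obtained in Example 5.26.
    return 1 - x + 0.5 * (x**2 - y**2)

# Error of the approximation at increasing distance from (0, 0).
for point in [(0.1, 0.1), (0.5, 0.5), (1.0, 1.0)]:
    print(point, "error:", abs(f(*point) - T2(*point)))

def drop(func, x):
    """How far func falls between y = 0 and y = 1 in the plane x = const."""
    return func(x, 0.0) - func(x, 1.0)

print("x =  1: f drops", drop(f, 1.0), " T2 drops", drop(T2, 1.0))
print("x = -1: f drops", drop(f, -1.0), " T2 drops", drop(T2, -1.0))
```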
5.6.2 Exceptions: when derivatives do not exist
In earlier lectures we noted that at the very least a function must be con-
tinuous if it is to have a meaningful derivative. When we take successive
derivatives we may need to revisit the question of continuity for each new
function that we create.
If a function fails to be continuous at some point then we most certainly can
not take its derivative at that point.
that the domain of the function is the set of real numbers (−∞, ∞). Despite
the function being twice differentiable, we would not call it a C 2 function,
since the second derivative is not continuous.
We should always keep in mind that a function may only possess a finite
number of derivatives before we encounter a discontinuity. The tell-tale
signs to watch out for are sharp edges, holes or singularities in the graph of
the function.
Suppose you run a commercial business and that by some means you have
constructed the following formula for the profit of one of your lines of busi-
ness
f = f (x, y) = 4 − x2 − y 2 .
Clearly the profit f depends on two variables x and y. Sound business practice
suggests that you would like to maximise your profits. In mathematical
terms this means find the values of (x, y) such that f is a maximum. A
simple plot of the graph of f shows us that the maximum occurs at (0, 0)
(corresponding to a maximum profit of 4 units). We may not be able to do
this so easily for other functions, and thus we need some systematic way of
computing the points (x, y) at which f is maximised.
You have seen similar problems for the case of a function of one variable.
And from that you may expect that for the present problem we will be
making a statement about the derivatives of f in order that we have a
maximum (i.e. that the derivatives should be zero). Let’s make this precise.
Let’s denote the (as yet unknown) point at which the function is a maximum
by P . Now if we have a maximum at this point, then moving in any direction
from this point should see the function decrease. That is the directional
derivative must be non-positive in every direction from P . In other words
we must have
$$\frac{df}{ds} = \underset{\sim}{t} \cdot (\nabla f)_P \le 0$$
for every choice of $\underset{\sim}{t}$. Let us assume (for the moment) that $(\nabla f)_P \ne 0$; then
we should be able to compute λ > 0 so that $\underset{\sim}{t} = \lambda\,(\nabla f)_P$ is a unit vector. If
you now substitute this into the above you will find
$$\lambda\,(\nabla f)_P \cdot (\nabla f)_P \le 0$$
Look carefully at the left hand side. Each term is positive (remember $\underset{\sim}{a} \cdot \underset{\sim}{a}$
is the squared length of a vector $\underset{\sim}{a}$) yet the right hand side is either zero or
negative. Thus this equation does not make sense and we have to reject our
only assumption, that $(\nabla f)_P \ne 0$.
We have thus found that if f is to have a maximum at P then we must have
$$0 = (\nabla f)_P$$
$$0 = \frac{\partial f}{\partial x} \quad\text{and}\quad 0 = \frac{\partial f}{\partial y} \quad\text{at } P$$
It is from these equations that we would then compute the (x, y) coordinates
of P .
Of course we could have posed the related question of finding the points at
which a function is minimised. The mathematics would be much the same
except for a change in words (maximum to minimum) and a corresponding
change in ± signs. The end result is the same though, the gradient ∇f must
vanish at P .
• A local minimum
• A local maximum
• A saddle point
$$0 = (\nabla f)_P$$
we might get more than one point P . What do we make of these points?
Some of them might correspond to minimums while others might correspond
to maximums of f , and others still may correspond to saddle points. The
three options are shown in the following graphs.
A typical saddle point
A typical case might consist of any number of points like the above.
5.7.2 Notation
We have just seen that a function of two variables has stationary points
when $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$. If (a, b) are the (x, y) coordinates of a stationary point
for f(x, y), then we can say
• A local maximum occurs when f(x, y) ≤ f(a, b) for all (x, y) close to
(a, b)
• A local minimum occurs when f(x, y) ≥ f(a, b) for all (x, y) close to
(a, b)
Footnote 1: In fact, like the one variable case, two variable functions can possess points of
singularity, i.e., where the partial derivatives do not exist, and these can correspond to local
extrema as well. As you’d expect, the stationary points and singularity points are together
called the critical points. However, we will not encounter two variable functions
with points of singularity in this course!
• If $\frac{d^2 f}{dx^2} > 0$ then this corresponds to a local minimum
• If $\frac{d^2 f}{dx^2} < 0$ then this corresponds to a local maximum
• If $\frac{d^2 f}{dx^2} = 0$ then no decision can be made (e.g. $x^3$ or $x^4$).
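$$D = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2$$
A local minimum when D > 0 and $\frac{\partial^2 f}{\partial x^2} > 0$
A local maximum when D > 0 and $\frac{\partial^2 f}{\partial x^2} < 0$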
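The test based on D translates directly into a small decision function. The sketch below is a minimal illustration: the function name is made up, the inputs are the three second partial derivatives evaluated at a stationary point, and the two branches for D < 0 (saddle point) and D = 0 (no conclusion) are not stated in the lines above but follow the standard form of the second derivative test.

```python
def classify_stationary_point(fxx, fyy, fxy):
    """Classify a stationary point from its second partial derivatives."""
    D = fxx * fyy - fxy**2
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        # Assumption: standard result, not listed in the text above.
        return "saddle point"
    # D == 0: the test gives no information.
    return "inconclusive"

# Profit function f = 4 - x^2 - y^2 at (0, 0): fxx = fyy = -2, fxy = 0.
print(classify_stationary_point(-2, -2, 0))
```

For the profit function f = 4 − x^2 − y^2 discussed earlier this agrees with what a plot of the graph suggests at (0, 0).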
5.7.4 Application of extrema
As a final note, we will now turn to some applications of the use of extrema.
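$$\frac{\partial A}{\partial x} =$$
and
$$\frac{\partial A}{\partial y} =$$
Now let $\frac{\partial A}{\partial x} = 0$ and $\frac{\partial A}{\partial y} = 0$ and solve for both x and y.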
Thus the dimensions of the box are x = ,y= and z = .
The second partial derivatives (for the above values of x and y) are
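$$\frac{\partial^2 A}{\partial x^2} =$$
$$\frac{\partial^2 A}{\partial y^2} =$$
$$\frac{\partial^2 A}{\partial x\,\partial y} =$$
$$D = \frac{\partial^2 A}{\partial x^2}\,\frac{\partial^2 A}{\partial y^2} - \left(\frac{\partial^2 A}{\partial x\,\partial y}\right)^2$$
Since D =       the dimensions of the box of volume 12 cm^3 that uses the
minimum amount of material are x =     , y =     , z =     .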
Earlier we looked at methods of finding the shortest distance from a point
to a plane. We can now use extrema to answer questions such as these.
Example 5.32. A plane has the equation 2x + 3y + z = 12. Find the point
on the plane closest to the origin.
Solution: To answer this question we clearly want to minimise the distance
from the origin to the point on the plane. Let (x, y, z) be the point on the
plane. The distance between two points is given by
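$$d = \sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}$$
With $(x_0, y_0, z_0)$ at the origin, minimising d is the same as minimising its square,
$$G = x^2 + y^2 + z^2$$
and substituting z = 12 − 2x − 3y from the equation of the plane gives
$$G = x^2 + y^2 + (12 - 2x - 3y)^2$$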
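$$\frac{\partial G}{\partial x} =$$
and
$$\frac{\partial G}{\partial y} =$$
Let both $\frac{\partial G}{\partial x} = 0$ and $\frac{\partial G}{\partial y} = 0$ and solve for x and y (simultaneous equations
are handy here).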
The second partial derivatives, using the above values for x and y, are
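$$\frac{\partial^2 G}{\partial x^2} =$$
$$\frac{\partial^2 G}{\partial y^2} =$$
and
$$\frac{\partial^2 G}{\partial x\,\partial y} =$$
$$D = \frac{\partial^2 G}{\partial x^2}\,\frac{\partial^2 G}{\partial y^2} - \left(\frac{\partial^2 G}{\partial x\,\partial y}\right)^2 =$$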
Thus the point on the plane that gives the minimum distance to the origin
is (x, y, z) = ( ).
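After working Example 5.32 by hand, the arithmetic can be checked with exact fractions. The sketch below writes out the two equations ∂G/∂x = 0 and ∂G/∂y = 0 for G = x^2 + y^2 + (12 − 2x − 3y)^2 as a 2×2 linear system, solves it with Cramer's rule, and evaluates D. The coefficient names (`a11`, `b1`, and so on) are labels introduced here, and the expanded derivatives in the comments are this sketch's own working, so compare them against your own before trusting the output.

```python
from fractions import Fraction

# G(x, y) = x^2 + y^2 + (12 - 2x - 3y)^2
# dG/dx = 2x - 4(12 - 2x - 3y) = 10x + 12y - 48
# dG/dy = 2y - 6(12 - 2x - 3y) = 12x + 20y - 72
# Setting both to zero gives the linear system
#   10x + 12y = 48
#   12x + 20y = 72
a11, a12, b1 = 10, 12, 48
a21, a22, b2 = 12, 20, 72

# Cramer's rule for a 2x2 system.
det = a11 * a22 - a12 * a21
x = Fraction(b1 * a22 - a12 * b2, det)
y = Fraction(a11 * b2 - b1 * a21, det)
z = 12 - 2 * x - 3 * y          # back-substitute into the plane equation

# Second derivative test: G_xx = 10, G_yy = 20, G_xy = 12 (all constant).
D = 10 * 20 - 12**2

print("stationary point:", x, y, z)
print("D =", D, "and G_xx = 10")
```

Using `Fraction` avoids rounding, so the printed coordinates can be compared digit for digit with a hand calculation.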
Example 5.33. You are given three positive numbers. The product of the
three numbers is P. The sum of the three numbers is 10.
(a) Find the three numbers that will give a maximum product.
(b) Show that this gives a maximum product.
(c) If it was required that the three numbers be whole numbers, can you find
the three (non-zero) numbers that sum to 10 and give a maximum product?
Does your answer to (a) help you find this?
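Part (c) of Example 5.33 has only finitely many candidates, so an answer found by reasoning can be confirmed by exhaustive search. The sketch below is one way to do that: it tries every triple of positive whole numbers a ≤ b ≤ c with a + b + c = 10 and keeps the one with the largest product. The function name and the ordering convention are choices made here for illustration.

```python
def best_whole_number_triple(total=10):
    """Search all positive whole-number triples a <= b <= c with the given sum."""
    best = None
    for a in range(1, total):
        for b in range(a, total):
            c = total - a - b
            if c < b:
                continue        # keep a <= b <= c so each triple is tried once
            if best is None or a * b * c > best[0] * best[1] * best[2]:
                best = (a, b, c)
    return best

triple = best_whole_number_triple(10)
product = triple[0] * triple[1] * triple[2]
print("best triple:", triple, "product:", product)
```

Comparing the triple found this way with the answer to part (a) shows how the continuous optimum guides the whole-number one.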