MAT1841notesfor2024 S2

MAT1841
Continuous Mathematics for Computer Science
Lecture Notes
2024 Semester 2
Contents
1 Vectors, Lines and Planes
1.1 Introduction to Vectors
1.1.1 Notation and definition
1.1.2 Linear independence
1.1.3 Algebraic properties
1.2 Vector Dot Product
1.2.1 Length of a vector
1.2.2 Unit Vectors
1.2.3 Scalar projections
1.2.4 Vector projection
1.3 Vector Cross Product
1.3.1 Interpreting the cross product
1.3.2 Right hand thumb rule
1.4 Lines in 3-dimensional space
1.4.1 Vector equation of a line
1.5 Planes in 3-dimensional space
1.5.1 Constructing the equation of a plane
1.5.2 Parametric equations for a plane
1.5.3 Vector equation of a plane
1.6 Systems of Linear Equations
1.6.1 Examples of Linear Systems
1.6.2 A standard strategy
1.6.3 Points, lines and planes - intersections
1.6.4 Points, lines and planes - distances
1.6.5 Summary
2 Matrices
2.1 Introduction - notation and operations
2.1.1 Operations on matrices
2.1.2 Some special matrices
2.1.3 Properties of matrices
2.1.4 Inverses of square matrices
2.2 Gaussian Elimination
2.2.1 Gaussian elimination strategy
2.2.2 Exceptions
2.3 Systems of equations using matrices
2.3.1 The augmented matrix
2.4 Row echelon form
2.4.1 Rank
2.4.2 Homogeneous systems
2.4.3 Summary
2.5 Matrix Inverse
2.6 Matrix Transpose
2.7 Determinants
2.7.1 Properties of determinants
2.7.2 Vector cross product using determinants
2.7.3 Cramer’s rule
2.8 Obtaining inverses using Gauss-Jordan elimination
2.8.1 Inverse - another method
3 Calculus
3.1 Differentiation
3.1.1 Rate of change
3.1.2 Definition of the derivative $f'(x)$ and the slope of a tangent line
3.1.3 Techniques of differentiation - rules
3.2 Maximum and minimum of functions
3.3 Differentiating inverse, circular and exponential functions
3.3.1 Inverse functions and their derivatives
3.3.2 Exponential and logarithmic functions: $e^x$ and $\ln x$
3.3.3 Derivatives of circular functions
3.4 Higher order derivatives
3.5 Parametric curves and differentiation
3.5.1 Parametric curves
3.5.2 Parametric differentiation
3.6 Function approximations
3.6.1 Introduction to power series
3.6.2 Power series
3.6.3 Taylor series
3.6.4 Derivation of Taylor polynomials from first principles
3.6.5 Taylor series centred at $x \neq 0$
3.6.6 Cubic splines interpolation
4 Integration
4.1 Fundamental theorem of calculus
4.1.1 Revision
4.1.2 Fundamental Theorem of Calculus
4.2 Area under the curve
4.3 Trapezoidal rule
5 Multivariable Calculus
5.1 Functions of several variables
5.1.1 Definition
5.1.2 Notation
5.1.3 Surfaces
5.1.4 Alternative forms
5.2 Partial derivatives
5.2.1 First partial derivatives
5.3 The tangent plane
5.3.1 Geometric interpretation
5.3.2 Linear approximations
5.4 Chain rule
5.5 Gradient and Directional Derivative
5.6 Second order partial derivatives
5.6.1 Taylor polynomials of higher degree
5.6.2 Exceptions: when derivatives do not exist
5.7 Stationary points
5.7.1 Finding stationary points
5.7.2 Notation
5.7.3 Minima, Maxima or Saddle point
5.7.4 Application of extrema
Chapter 1
Vectors, Lines and Planes
1.1 Introduction to Vectors
1.1.1 Notation and definition
Common forms of vector notation are bold symbols ($\mathbf{v}$), arrow notation ($\vec{v}$) and tilde notation ($\tilde{v}$). Throughout this Study Guide we will use the tilde notation, which is close to the handwritten notation for vectors. Points in space are represented by a capital letter (for example the point $P$). Capital letters are also used for matrices, but the context makes clear which object is meant, so this does not lead to ambiguity.
————————————————–
Vectors can be defined in (at least) two ways: algebraically, as objects like
$\tilde{v} = (1, 7, 3)$
$\tilde{u} = (2, -1, 4)$
or geometrically, as arrows in space.
Note that vectors have both magnitude and direction. A quantity speci-
fied only by a number (but no direction) is known as a scalar.
How can we be sure that these two definitions actually describe the same object? Equally, how do we convert from one form to the other? That is, given $\tilde{v} = (1, 2, 7)$, how do we draw the arrow and likewise, given the arrow, how do we extract the numbers $(1, 2, 7)$?
Suppose we are given two points P and Q. Suppose also that we find the
change in coordinates from P to Q is (say) (1, 2, 7). We could also draw an
arrow from P to Q. Thus we have two ways of recording the path from P
to Q, either as the numbers (1, 2, 7) or the arrow.
Suppose now that we have another pair of points R and S and further that
we find the change in coordinates to be (1, 2, 7). Again, we can join the
points with an arrow. This arrow will have the same direction and length
as that for P to Q.
In both cases, the displacement, from start to finish, is represented by
either the numbers (1, 2, 7) or the arrow – thus we can use either form to
represent the vector. Note that this means that a vector does not live at
any one place in space – it can be moved anywhere provided its length and
direction are unchanged.
To extract the numbers (1, 2, 7) given just the arrow, simply place the arrow somewhere in the x, y, z space, and then measure the change in coordinates from tail to tip of the vector. Equally, to draw the vector given the numbers (1, 2, 7), choose (0, 0, 0) as the tail; then the point (1, 2, 7) is the tip.
The components of a vector are just the numbers we use to describe the vector. In the above, the components of $\tilde{v}$ are 1, 2 and 7.
Another very common way to write a vector, such as $\tilde{v} = (1, 7, 3)$ for example, is $\tilde{v} = 1\tilde{i} + 7\tilde{j} + 3\tilde{k}$. The three vectors $\tilde{i}, \tilde{j}, \tilde{k}$ are a simple way to remind us that the three numbers in $\tilde{v} = (1, 7, 3)$ refer to directions parallel to the three coordinate axes (with $\tilde{i}$ parallel to the x-axis, $\tilde{j}$ parallel to the y-axis and $\tilde{k}$ parallel to the z-axis).
In this way we can always write down any 3-dimensional vector as a linear combination of the vectors $\tilde{i}, \tilde{j}, \tilde{k}$ and thus these vectors are also known as basis vectors.
1.1.2 Linear independence
Two or more vectors are linearly independent if we cannot take any one of the vectors and write it as a linear combination of the others. We cannot write $\tilde{i}$ as a linear combination of $\tilde{j}$ and $\tilde{k}$. In other words there are no scalars $\alpha$ and $\beta$ such that $\tilde{i} = \alpha\tilde{j} + \beta\tilde{k}$. The same holds for $\tilde{j}$ and for $\tilde{k}$, and thus the basis vectors $\tilde{i}, \tilde{j}, \tilde{k}$ are linearly independent.
If we can take a vector and write it as a linear combination of the other vectors, then those vectors are known as linearly dependent. For example the vectors $\tilde{u} = (7, 17, -3)$, $\tilde{v} = (1, 2, 3)$ and $\tilde{w} = (3, 7, 1)$ are linearly dependent as $\tilde{u} = 3\tilde{w} - 2\tilde{v}$.
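The dependence claimed above can be checked numerically. The following is a minimal Python sketch; the helper names `scale` and `sub` are illustrative and not part of these notes.

```python
def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return tuple(c * x for x in v)

def sub(v, w):
    """Component-wise difference v - w."""
    return tuple(a - b for a, b in zip(v, w))

u = (7, 17, -3)
v = (1, 2, 3)
w = (3, 7, 1)

# Build 3w - 2v and compare it with u component by component.
combination = sub(scale(3, w), scale(2, v))
print(combination == u)  # True
```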
1.1.3 Algebraic properties
What rules must we observe when we are working with vectors?
• Equality
$\tilde{v} = \tilde{w}$ only when the arrows for $\tilde{v}$ and $\tilde{w}$ have the same length and the same direction.
• Stretching (scalar multiple)
The vector $\lambda\tilde{v}$ is parallel to $\tilde{v}$ but is stretched by a factor $\lambda$ (and reversed in direction when $\lambda$ is negative). The magnitude of $\lambda\tilde{v}$ is $|\lambda|$ times the magnitude of $\tilde{v}$.
• Addition
To add two vectors $\tilde{v}$ and $\tilde{w}$ arrange the two so that they are tip to tail. Then $\tilde{v} + \tilde{w}$ is the vector that starts at the first tail and ends at the second tip. Thus the sum of two vectors $\tilde{v}$ and $\tilde{w}$ is the displacement vector resulting from first applying $\tilde{v}$ then $\tilde{w}$.
• Subtraction
The difference $\tilde{v} - \tilde{w}$ of two vectors $\tilde{v}$ and $\tilde{w}$ is the displacement vector resulting from first applying $\tilde{v}$ then $-\tilde{w}$. Note that $-\tilde{w}$ is simply the vector $\tilde{w}$ now pointing in the opposite direction to $\tilde{w}$.
Example 1.1. Express each of the above rules in terms of the components of vectors (i.e. in terms of numbers like (1, 2, 7) and (a, b, c)).
Example 1.2. Given $\tilde{v} = (3, 4, 2)$ and $\tilde{w} = (1, 2, 3)$ compute $\tilde{v} + \tilde{w}$ and $2\tilde{v} + 7\tilde{w}$.
Example 1.3. Given $\tilde{v} = (1, 2, 7)$ draw $\tilde{v}$, $2\tilde{v}$ and $-\tilde{v}$.
Example 1.4. Given $\tilde{v} = (1, 2, 7)$ and $\tilde{w} = (3, 4, 5)$ draw and compute $\tilde{v} - \tilde{w}$.
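In components, the rules above act entry by entry. The sketch below shows one way to write them in Python, with vectors stored as tuples; the function names are illustrative choices, and the vectors are those of Examples 1.2 and 1.4.

```python
def add(v, w):
    """Addition: add matching components."""
    return tuple(a + b for a, b in zip(v, w))

def scale(lam, v):
    """Stretching: multiply every component by the scalar lam."""
    return tuple(lam * a for a in v)

def sub(v, w):
    """Subtraction: apply v, then -w."""
    return add(v, scale(-1, w))

v = (3, 4, 2)
w = (1, 2, 3)
print(add(v, w))                        # (4, 6, 5)
print(add(scale(2, v), scale(7, w)))    # (13, 22, 25)
print(sub((1, 2, 7), (3, 4, 5)))        # (-2, -2, 2)
```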
1.2 Vector Dot Product
How do we multiply vectors? We have already seen one form, where we stretch $\tilde{v}$ by a scalar $\lambda$, i.e. $\tilde{v} \to \lambda\tilde{v}$. This is called scalar multiplication.
Another form is the vector dot product. Let $\tilde{v} = (v_x, v_y, v_z)$ and $\tilde{w} = (w_x, w_y, w_z)$ be a pair of vectors, then we define the dot product $\tilde{v} \cdot \tilde{w}$ by
$$\tilde{v} \cdot \tilde{w} = v_x w_x + v_y w_y + v_z w_z.$$
Example 1.5. Let $\tilde{v} = (1, 2, 7)$ and $\tilde{w} = (-1, 3, 4)$. Compute $\tilde{v} \cdot \tilde{v}$, $\tilde{w} \cdot \tilde{w}$ and $\tilde{v} \cdot \tilde{w}$.
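What do we observe?
• $\tilde{v} \cdot \tilde{w}$ is a single number not a vector (i.e. it is a scalar)
• $\tilde{v} \cdot \tilde{w} = \tilde{w} \cdot \tilde{v}$
• $(\lambda\tilde{v}) \cdot \tilde{w} = \lambda(\tilde{v} \cdot \tilde{w})$
• $(\tilde{a} + \tilde{b}) \cdot \tilde{v} = \tilde{a} \cdot \tilde{v} + \tilde{b} \cdot \tilde{v}$
The last two cases display what we call linearity.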
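The definition and the observations above can be tried out directly. This is a small Python sketch using the vectors of Example 1.5; the function name `dot` and the extra vectors `a`, `b` are illustrative assumptions.

```python
def dot(v, w):
    """Dot product: sum of products of matching components."""
    return sum(a * b for a, b in zip(v, w))

v = (1, 2, 7)
w = (-1, 3, 4)
print(dot(v, v), dot(w, w), dot(v, w))  # 54 26 33

# Symmetry and linearity, checked on sample values.
a, b, lam = (2, 0, 1), (1, -1, 3), 5
print(dot(v, w) == dot(w, v))                                   # True
print(dot(tuple(lam * x for x in v), w) == lam * dot(v, w))     # True
a_plus_b = tuple(x + y for x, y in zip(a, b))
print(dot(a_plus_b, v) == dot(a, v) + dot(b, v))                # True
```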
1.2.1 Length of a vector
The length of a vector $\tilde{v}$ is defined by
$$|\tilde{v}| = \sqrt{\tilde{v} \cdot \tilde{v}}.$$
The notation $|\tilde{v}|$ should be distinguished from the absolute value for a scalar (for example $|-5| = 5$). The length of a vector is one example of a norm, which is a quantity used in higher level mathematics.
Example 1.6. Let $\tilde{v} = (1, 2, 7)$. Compute the distance from (0, 0, 0) to (1, 2, 7). Compare this with $\sqrt{\tilde{v} \cdot \tilde{v}}$.
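We can now show that
$$\tilde{v} \cdot \tilde{w} = |\tilde{v}||\tilde{w}| \cos\theta$$
where
$$|\tilde{v}| = \text{the length of } \tilde{v} = \left(v_x^2 + v_y^2 + v_z^2\right)^{1/2}$$
$$|\tilde{w}| = \text{the length of } \tilde{w} = \left(w_x^2 + w_y^2 + w_z^2\right)^{1/2}$$
and $\theta$ is the angle between the two vectors.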
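How do we prove this? Simply start with $\tilde{v} - \tilde{w}$ and compute its length,
$$|\tilde{v} - \tilde{w}|^2 = (\tilde{v} - \tilde{w}) \cdot (\tilde{v} - \tilde{w})$$
$$= \tilde{v} \cdot \tilde{v} - \tilde{v} \cdot \tilde{w} - \tilde{w} \cdot \tilde{v} + \tilde{w} \cdot \tilde{w}$$
$$= |\tilde{v}|^2 + |\tilde{w}|^2 - 2\tilde{v} \cdot \tilde{w}$$
and from the Cosine Rule for triangles we know
$$|\tilde{v} - \tilde{w}|^2 = |\tilde{v}|^2 + |\tilde{w}|^2 - 2|\tilde{v}||\tilde{w}| \cos\theta$$
Thus we have
$$\tilde{v} \cdot \tilde{w} = |\tilde{v}||\tilde{w}| \cos\theta$$
This gives us a convenient way to compute the angle between any pair of vectors. If we find $\cos\theta = 0$ then we can say that $\tilde{v}$ and $\tilde{w}$ are orthogonal (perpendicular).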
• Vectors $\tilde{v}$ and $\tilde{w}$ are orthogonal when $\tilde{v} \cdot \tilde{w} = 0$ (provided neither $\tilde{v}$ nor $\tilde{w}$ is zero).
Example 1.7. Find the angle between the vectors $\tilde{v} = (2, 7, 1)$ and $\tilde{w} = (3, 4, -2)$.
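Rearranging the formula gives $\cos\theta = \tilde{v} \cdot \tilde{w} / (|\tilde{v}||\tilde{w}|)$, which is easy to evaluate numerically. The Python sketch below applies it to the vectors of Example 1.7; the function names are illustrative, and the clamp to $[-1, 1]$ is an added safeguard against floating-point rounding rather than part of the formula.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Length of v, computed as sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in radians between two non-zero vectors."""
    c = dot(v, w) / (length(v) * length(w))
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return math.acos(c)

theta = angle((2, 7, 1), (3, 4, -2))
print(math.degrees(theta))  # roughly 36 degrees
```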
1.2.2 Unit Vectors
A vector is said to be a unit vector if its length is one. That is, $\tilde{v}$ is a unit vector when $\tilde{v} \cdot \tilde{v} = 1$. The notation for a unit vector is $\hat{v}$ (called ‘v hat’).
Unit vectors are calculated by:
$$\hat{v} = \frac{\tilde{v}}{|\tilde{v}|}$$
1.2.3 Scalar projections
This is simply the length of the shadow cast by one vector onto another.
The scalar projection, $v_w$, of $\tilde{v}$ in the direction of $\tilde{w}$ is given by
$$v_w = \frac{\tilde{v} \cdot \tilde{w}}{|\tilde{w}|}$$
Example 1.8. What is the length (i.e. scalar projection) of $\tilde{v} = (1, 2, 7)$ in the direction of the vector $\tilde{w} = (2, 3, 4)$?
1.2.4 Vector projection
This time we produce a vector shadow with length equal to the scalar projection.
The vector projection, $\tilde{v}_w$, of $\tilde{v}$ in the direction of $\tilde{w}$ is given by
$$\tilde{v}_w = \left(\frac{\tilde{v} \cdot \tilde{w}}{|\tilde{w}|^2}\right) \tilde{w}$$
Example 1.9. Find the vector projection of $\tilde{v} = (1, 2, 7)$ in the direction of $\tilde{w} = (2, 3, 4)$.
Example 1.10. This example shows how a vector may be resolved into its parts parallel and perpendicular to another vector.
Given $\tilde{v} = (1, 2, 7)$ and $\tilde{w} = (2, 3, 4)$ express $\tilde{v}$ in terms of $\tilde{w}$ and a vector perpendicular to $\tilde{w}$.
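The two projection formulas, and the splitting of a vector into parallel and perpendicular parts, can be sketched in Python as follows. The function names are illustrative, and exact fractions are used on the assumption that the components are integers, as in Examples 1.8 to 1.10.

```python
import math
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def scalar_projection(v, w):
    """(v . w) / |w|"""
    return dot(v, w) / math.sqrt(dot(w, w))

def vector_projection(v, w):
    """((v . w) / |w|^2) w, kept exact for integer components."""
    c = Fraction(dot(v, w), dot(w, w))
    return tuple(c * x for x in w)

v = (1, 2, 7)
w = (2, 3, 4)
parallel = vector_projection(v, w)                        # part of v along w
perpendicular = tuple(a - b for a, b in zip(v, parallel)) # remainder

print(scalar_projection(v, w))   # 36 / sqrt(29)
print(parallel)                  # components 72/29, 108/29, 144/29
print(dot(perpendicular, w))     # 0, so the remainder is orthogonal to w
```

Since `parallel + perpendicular` reproduces $\tilde{v}$ by construction, this gives the decomposition asked for in Example 1.10.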
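Vector Dot Product - Summary
Let $\tilde{v} = (v_x, v_y, v_z)$ and $\tilde{w} = (w_x, w_y, w_z)$. Then the Dot Product of $\tilde{v}$ and $\tilde{w}$ is the scalar defined by
$$\tilde{v} \cdot \tilde{w} = v_x w_x + v_y w_y + v_z w_z$$
Consider the angle $\theta$ between the two vectors such that $0 \le \theta \le \pi$. Then
$$\cos\theta = \frac{\tilde{v} \cdot \tilde{w}}{|\tilde{v}||\tilde{w}|}$$
Two vectors are orthogonal if and only if
$$\tilde{v} \cdot \tilde{w} = 0$$
The scalar projection, $v_w$, of $\tilde{v}$ in the direction of $\tilde{w}$ is given by
$$v_w = \frac{\tilde{v} \cdot \tilde{w}}{|\tilde{w}|}$$
The vector projection, $\tilde{v}_w$, of $\tilde{v}$ in the direction of $\tilde{w}$ is given by
$$\tilde{v}_w = \left(\frac{\tilde{v} \cdot \tilde{w}}{|\tilde{w}|^2}\right) \tilde{w}$$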
1.3 Vector Cross Product
The vector cross product is another way to multiply vectors. We start with vectors $\tilde{v} = (v_x, v_y, v_z)$ and $\tilde{w} = (w_x, w_y, w_z)$. Then we define the cross product $\tilde{v} \times \tilde{w}$ by
$$\tilde{v} \times \tilde{w} = (v_y w_z - v_z w_y,\; v_z w_x - v_x w_z,\; v_x w_y - v_y w_x)$$
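From this definition we can observe
• $\tilde{v} \times \tilde{w}$ is a vector
• $\tilde{v} \times \tilde{w} = -\tilde{w} \times \tilde{v}$
• $\tilde{v} \times \tilde{v} = \tilde{0} = (0, 0, 0)$ (the zero vector)
• $(\lambda\tilde{v}) \times \tilde{w} = \lambda(\tilde{v} \times \tilde{w})$
• $(\tilde{a} + \tilde{b}) \times \tilde{v} = \tilde{a} \times \tilde{v} + \tilde{b} \times \tilde{v}$
• $(\tilde{v} \times \tilde{w}) \cdot \tilde{v} = (\tilde{v} \times \tilde{w}) \cdot \tilde{w} = 0$
Example 1.11. Verify all of the above.
Example 1.12. Given $\tilde{v} = (1, 2, 7)$ and $\tilde{w} = (-2, 3, 5)$ compute $\tilde{v} \times \tilde{w}$, and its dot product with each of $\tilde{v}$ and $\tilde{w}$.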
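The component formula translates directly into code. Here is a short Python sketch using the vectors of Example 1.12; the function names are illustrative.

```python
def cross(v, w):
    """Cross product of two 3-dimensional vectors, from the component formula."""
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy,
            vz * wx - vx * wz,
            vx * wy - vy * wx)

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

v = (1, 2, 7)
w = (-2, 3, 5)
n = cross(v, w)
print(n)                      # (-11, -19, 7)
print(dot(n, v), dot(n, w))   # 0 0
print(cross(w, v))            # (11, 19, -7), i.e. -(v x w)
print(cross(v, v))            # (0, 0, 0)
```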
1.3.1 Interpreting the cross product
We know that $\tilde{v} \times \tilde{w}$ is a vector and we know how to compute it. But can we describe this vector? First we need a vector, so let’s assume that $\tilde{v} \times \tilde{w} \neq \tilde{0}$. Then what can we say about the direction and length of $\tilde{v} \times \tilde{w}$?
The first thing we should note is that the cross product is a vector which is orthogonal to both of the original vectors. Thus $\tilde{v} \times \tilde{w}$ is a vector that is orthogonal to $\tilde{v}$ and to $\tilde{w}$. This fact follows from the definition of the cross product.
Thus we must have
$$\tilde{v} \times \tilde{w} = \lambda\tilde{n}$$
where $\tilde{n}$ is a unit vector orthogonal to both $\tilde{v}$ and $\tilde{w}$ and $\lambda$ is some unknown number (at this stage).
How do we construct $\tilde{n}$ and $\lambda$? Let’s do it!
1.3.2 Right hand thumb rule
For any choice of $\tilde{v}$ and $\tilde{w}$ you can see that there are two choices for $\tilde{n}$: one points in the opposite direction to the other. Which one do we choose? It’s up to us to make a hard rule. This is it. Place your right hand palm so that your fingers curl over from $\tilde{v}$ to $\tilde{w}$. Your thumb then points in the direction of $\tilde{v} \times \tilde{w}$.
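Now for $\lambda$, we will show that
$$|\tilde{v} \times \tilde{w}| = \lambda = |\tilde{v}||\tilde{w}| \sin\theta$$
How? First we build a triangle from $\tilde{v}$ and $\tilde{w}$ and then compute the cross product for each pair of vectors
$$\tilde{v} \times \tilde{w} = \lambda_\theta \tilde{n}$$
$$(\tilde{v} - \tilde{w}) \times \tilde{v} = \lambda_\phi \tilde{n}$$
$$(\tilde{v} - \tilde{w}) \times \tilde{w} = \lambda_\rho \tilde{n}$$
(one $\lambda$ for each of the three vertices). We need to compute each $\lambda$.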
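Now since $(\beta\tilde{v}) \times \tilde{w} = \beta(\tilde{v} \times \tilde{w})$ for any number $\beta$ we must have $\lambda_\theta$ in $\tilde{v} \times \tilde{w} = \lambda_\theta \tilde{n}$ proportional to $|\tilde{v}||\tilde{w}|$, likewise for the other $\lambda$’s. Thus
$$\lambda_\theta = |\tilde{v}||\tilde{w}| \alpha_\theta$$
$$\lambda_\phi = |\tilde{v}||\tilde{v} - \tilde{w}| \alpha_\phi$$
$$\lambda_\rho = |\tilde{w}||\tilde{v} - \tilde{w}| \alpha_\rho$$
where each $\alpha$ depends only on the angle between the two vectors on which it was built (i.e. $\alpha_\phi$ depends only on the angle $\phi$ between $\tilde{v}$ and $\tilde{v} - \tilde{w}$).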
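But we also have $\tilde{v} \times \tilde{w} = (\tilde{v} - \tilde{w}) \times \tilde{v} = (\tilde{v} - \tilde{w}) \times \tilde{w}$ which implies that $\lambda_\theta = \lambda_\phi = \lambda_\rho$ which in turn gives us
$$\frac{\alpha_\theta}{|\tilde{v} - \tilde{w}|} = \frac{\alpha_\phi}{|\tilde{w}|} = \frac{\alpha_\rho}{|\tilde{v}|}$$
But we also have the Sine Rule for triangles
$$\frac{\sin\theta}{|\tilde{v} - \tilde{w}|} = \frac{\sin\phi}{|\tilde{w}|} = \frac{\sin\rho}{|\tilde{v}|}$$
and so
$$\alpha_\theta = k \sin\theta, \quad \alpha_\phi = k \sin\phi, \quad \alpha_\rho = k \sin\rho$$
where $k$ is a number that does not depend on any of the angles nor on any of the lengths of the edges: the value of $k$ is the same for every triangle. We can choose a trivial case to compute $k$, simply put $\tilde{v} = (1, 0, 0)$ and $\tilde{w} = (0, 1, 0)$. Then we find $k = 1$.
We have now found that
$$|\tilde{v} \times \tilde{w}| = |\tilde{v}||\tilde{w}| \sin\theta$$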
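The result can be sanity-checked numerically. The Python sketch below compares the two sides for one sample pair of vectors (an arbitrary choice), with $\theta$ obtained independently from the dot product; it also reproduces the trivial case used to fix $k = 1$. The function names are illustrative.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

def length(v):
    return math.sqrt(dot(v, v))

v = (1, 2, 7)
w = (-2, 3, 5)

theta = math.acos(dot(v, w) / (length(v) * length(w)))  # angle from the dot product
lhs = length(cross(v, w))                               # |v x w|
rhs = length(v) * length(w) * math.sin(theta)           # |v||w| sin(theta)
print(math.isclose(lhs, rhs))  # True

# Trivial case: perpendicular unit vectors give a cross product of length 1.
print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
```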
————————————————————
Example 1.13. Show that $|\tilde{v} \times \tilde{w}|$ also equals the area of the parallelogram formed by $\tilde{v}$ and $\tilde{w}$.
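Vector Cross Product - Summary
Let $\tilde{v} = (v_x, v_y, v_z)$ and $\tilde{w} = (w_x, w_y, w_z)$. Then the Cross Product is defined by
$$\tilde{v} \times \tilde{w} = (v_y w_z - v_z w_y,\; v_z w_x - v_x w_z,\; v_x w_y - v_y w_x)$$
$\tilde{v} \times \tilde{w} = -\tilde{w} \times \tilde{v}$ gives a vector orthogonal to both $\tilde{v}$ and $\tilde{w}$, with direction defined by the right-hand rule.
If $\theta$ is the angle between $\tilde{v}$ and $\tilde{w}$, such that $0 \le \theta \le \pi$:
$$\sin\theta = \frac{|\tilde{v} \times \tilde{w}|}{|\tilde{v}||\tilde{w}|}.$$
Two (non-zero) vectors are parallel if and only if
$$\tilde{v} \times \tilde{w} = \tilde{0}.$$
The area of the parallelogram spanned by vectors $\tilde{v}$ and $\tilde{w}$ is:
$$A = |\tilde{v} \times \tilde{w}|.$$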
1.4 Lines in 3-dimensional space
Through any pair of distinct points we can always construct a straight line.
These lines are normally drawn to be infinitely long in both directions.
Example 1.14. Find all points on the line joining (2, 4, 0) and (2, 4, 7).
Example 1.15. Find all points on the line joining (2, 0, 0) and (2, 4, 7).
These equations for the line are all of the form

x(t) = a + pt , y(t) = b + qt , z(t) = c + rt
where t is a parameter (it selects each point on the line) and the numbers
a, b, c, p, q, r are computed from the coordinates of two points on the line.
(There are other ways to write an equation for a line.)
How do we compute a, b, c, p, q, r? It is a simple recipe.
• First put t = 0, then x = a, y = b, z = c. That is (a, b, c) are the

coordinates of one point (such as P ) on the line and so a, b, c are
known.
• Next, put t = 1, then x = a + p, y = b + q, z = c + r. Take this to be
the second point (such as Q) on the line, and thus solve for p, q, r.
A common interpretation is that (a, b, c) are the coordinates of one (any)

point on the line and (p, q, r) are the components of a (any) vector parallel
to the line.
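The recipe above amounts to taking $(a, b, c)$ as the first point and $(p, q, r)$ as the difference between the two points. A minimal Python sketch follows, applied here to the points $(1, 7, 3)$ and $(2, 0, -3)$; the function name `line_through` is an illustrative choice.

```python
def line_through(P, Q):
    """Return the direction (p, q, r) and a function t -> (x(t), y(t), z(t))."""
    direction = tuple(q - p for p, q in zip(P, Q))   # from t = 1: Q = P + direction

    def point(t):
        return tuple(a + d * t for a, d in zip(P, direction))

    return direction, point

direction, point = line_through((1, 7, 3), (2, 0, -3))
print(direction)            # (1, -7, -6)
print(point(0), point(1))   # (1, 7, 3) (2, 0, -3)
print(point(2))             # (3, -7, -9), a further point on the same line
```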
Example 1.16. Find the equation of the line joining the two points (1, 7, 3)
and (2, 0, −3).
Example 1.17. Show that a line may also be expressed as
$$\frac{x - a}{p} = \frac{y - b}{q} = \frac{z - c}{r}$$
provided $p \neq 0$, $q \neq 0$ and $r \neq 0$. This is known as the Symmetric Form of the equation for a straight line.
Example 1.18. In some cases you may find a small problem with the form
suggested in the previous example. What is that problem and how would
you deal with it?
Example 1.19. Determine if the line defined by the points (1, 0, 1) and
(1, 2, 0) intersects with the line defined by the points (3, −1, 0) and (1, 2, 5).
Example 1.20. Is the line defined by the points (3, 7, −1) and (2, −2, 1) parallel to the line defined by the points (1, 4, −1) and (0, −5, 1)?
Example 1.21. Is the line defined by the points (3, 7, −1) and (2, −2, 1) parallel to the line defined by the points (1, 4, −1) and (−2, −23, 5)?
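One way to test two lines for parallelism is to compare their direction vectors: two non-zero vectors are parallel exactly when their cross product is the zero vector. The Python sketch below applies this test to the lines of Examples 1.20 and 1.21; the function names are illustrative, and this is only one of several possible approaches.

```python
def direction(P, Q):
    """Direction vector of the line through P and Q."""
    return tuple(q - p for p, q in zip(P, Q))

def cross(v, w):
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

def are_parallel(v, w):
    """True when the non-zero vectors v and w are parallel."""
    return cross(v, w) == (0, 0, 0)

d1 = direction((3, 7, -1), (2, -2, 1))
d2 = direction((1, 4, -1), (0, -5, 1))     # Example 1.20
d3 = direction((1, 4, -1), (-2, -23, 5))   # Example 1.21

print(d1, d2, d3)             # (-1, -9, 2) (-1, -9, 2) (-3, -27, 6)
print(are_parallel(d1, d2))   # True
print(are_parallel(d1, d3))   # True
```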
1.4.1 Vector equation of a line
The parametric equations of a line are
x(t) = a + pt , y(t) = b + qt , z(t) = c + rt
Note that
(a, b, c) = the vector to one point (P) on the line
(p, q, r) = the vector from the first point to the second point on the line (P to Q)
= a vector parallel to the line
Let’s relabel these and put $\tilde{d} = (a, b, c)$, $\tilde{v} = (p, q, r)$ and $\tilde{r}(t) = (x(t), y(t), z(t))$, then
$$\tilde{r}(t) = \tilde{d} + t\tilde{v}$$
This is known as the vector equation of a line.
Example 1.22. Write down the vector equation of the line that passes
through the points (1, 2, 7) and (2, 3, 4).
Example 1.23. Write down the vector equation of the line that passes
through the points (2, 3, 7) and (4, 1, 2).
Lines in $\mathbb{R}^3$
The vector equation of a line L is determined using a point P on the line and a vector $\tilde{v}$ in the direction of the line. Let
$$\tilde{d} = (a, b, c)$$
be the position vector of P, and
$$\tilde{v} = (p, q, r)$$
be a vector parallel to the line. The line then consists of all points that can be reached from P by moving parallel to the vector $\tilde{v}$.
Thus the vector (or parametric) equation of the line L is given by
$$\tilde{r}(t) = \tilde{d} + t\tilde{v}$$
where t is a parameter. As t is varied all the points on L are traced out.
1.5 Planes in 3-dimensional space
A plane in 3-dimensional space is a flat 2-dimensional surface. The standard equation for a plane in 3-d is
ax + by + cz = d
where a, b, c and d are numbers that distinguish this plane from all other planes. (There are other ways to write an equation for a plane, as we shall see.)
Example 1.24. Sketch each of the planes z = 1, y = 3 and x = 1.
1.5.1 Constructing the equation of a plane
A plane is uniquely determined by any three points (provided not all three
points are contained on a line). Recall, that a line is fully determined by
any pair of points on the line.
We can find the equation of the plane that passes through the three points
(1, 0, 0), (0, 3, 0) and (0, 0, 2). To do this we need to compute a, b, c and d.
We do this by substituting each point into the above equation,
1st point: a·1 + b·0 + c·0 = d
2nd point: a·0 + b·3 + c·0 = d
3rd point: a·0 + b·0 + c·2 = d
Now we have a slight problem: we are trying to compute four numbers, a, b, c, d, but we only have three equations. We have to make an arbitrary choice for one of the four numbers a, b, c, d. Let’s set d = 6. Then we find from the above that a = 6, b = 2 and c = 3. Thus the equation of the plane is
6x + 2y + 3z = 6
Example 1.25. What equation do you get if you choose d = 1 in the previous example? What happens if you choose d = 0?
Example 1.26. Find an equation of the plane that passes through the three
points (−1, 0, 0), (1, 2, 0) and (2, −1, 5).
1.5.2 Parametric equations for a plane
Recall that a line could be written in the parametric form
x(t) = a + pt
y(t) = b + qt
z(t) = c + rt
A line is one-dimensional so its points can be selected by a single parameter t.
However, a plane is two-dimensional and so we need two parameters (say u
and v) to select each point. Thus it’s no surprise that every plane can also
be described by the following equations
x(u, v) = a + pu + lv
y(u, v) = b + qu + mv
z(u, v) = c + ru + nv
Now we have nine parameters a, b, c, p, q, r, l, m and n. These can be computed from the coordinates of three (distinct) points on the plane. For the first point put (u, v) = (0, 0), for the second put (u, v) = (1, 0) and for the final point put (u, v) = (0, 1). Then solve for a through to n.
Example 1.27. Find the parametric equations of the plane that passes
through the three points (−1, 0, 0), (1, 2, 0) and (2, −1, 5).
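Putting $(u, v) = (0, 0)$, $(1, 0)$ and $(0, 1)$ in turn shows that $(a, b, c)$ is the first point, $(p, q, r)$ is the second point minus the first, and $(l, m, n)$ is the third point minus the first. The Python sketch below follows that recipe for the points of Example 1.27; the function name `parametric_plane` is an illustrative choice.

```python
def parametric_plane(P0, P1, P2):
    """Return (p, q, r), (l, m, n) and a function (u, v) -> point on the plane."""
    pqr = tuple(b - a for a, b in zip(P0, P1))   # from (u, v) = (1, 0)
    lmn = tuple(b - a for a, b in zip(P0, P2))   # from (u, v) = (0, 1)

    def point(u, v):
        return tuple(c + s * u + t * v for c, s, t in zip(P0, pqr, lmn))

    return pqr, lmn, point

pqr, lmn, point = parametric_plane((-1, 0, 0), (1, 2, 0), (2, -1, 5))
print(pqr, lmn)        # (2, 2, 0) (3, -1, 5)
print(point(0, 0))     # (-1, 0, 0)
print(point(1, 0))     # (1, 2, 0)
print(point(0, 1))     # (2, -1, 5)
```

So for these points the parametric equations read x = −1 + 2u + 3v, y = 2u − v, z = 5v.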
Example 1.28. Show that the parametric equations found in the previous example describe exactly the same plane as found in Example 1.26 (Hint: substitute the answers from Example 1.27 into the equation found in Example 1.26).
Example 1.29. Find the parametric equations of the plane that passes
through the three points (−1, 2, 1), (1, 2, 3) and (2, −1, 5).
Example 1.30. Repeat the previous example but with points re-arranged as
(−1, 2, 1), (2, −1, 5) and (1, 2, 3). You will find that the parametric equations
look different yet you know they describe the same plane. If you did not
know this last fact, how would you prove that the two sets of parametric
equations describe the same plane?
1.5.3 Vector equation of a plane
The Cartesian equation for a plane is
ax + by + cz = d
for some numbers a, b, c and d. We will now re-express this in a vector form.
Suppose we know one point on the plane, say $(x, y, z) = (x_0, y_0, z_0)$, then
$$a x_0 + b y_0 + c z_0 = d$$
$$\Rightarrow \; a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$
This is an equivalent form of the above equation.
Now suppose we have two more points on the plane, $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$. Then
$$a(x_1 - x_0) + b(y_1 - y_0) + c(z_1 - z_0) = 0$$
$$a(x_2 - x_0) + b(y_2 - y_0) + c(z_2 - z_0) = 0$$
Put $\Delta\tilde{x}_{10} = (x_1 - x_0, y_1 - y_0, z_1 - z_0)$ and $\Delta\tilde{x}_{20} = (x_2 - x_0, y_2 - y_0, z_2 - z_0)$. Notice that both of these vectors lie in the plane and that
$$(a, b, c) \cdot \Delta\tilde{x}_{10} = (a, b, c) \cdot \Delta\tilde{x}_{20} = 0$$
What does this tell us? Simply that both vectors are orthogonal to the vector (a, b, c). Thus we must have that
(a, b, c) = the normal vector to the plane
Now let’s put
$\tilde{n} = (a, b, c)$ = the normal vector to the plane
$\tilde{d} = (x_0, y_0, z_0)$ = one (any) point on the plane
$\tilde{r} = (x, y, z)$ = a typical point on the plane
Then we have
$$\tilde{n} \cdot (\tilde{r} - \tilde{d}) = 0$$
This is the vector equation of a plane.
Example 1.31. Find the vector equation of the plane that contains the
points (1, 2, 7), (2, 3, 4) and (−1, 2, 1).
Example 1.32. Re-express the previous result in the form ax + by + cz = d.
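Since the normal is orthogonal to every vector lying in the plane, one way to construct it from three points is to take the cross product (Section 1.3) of two difference vectors. The Python sketch below uses that approach, which is an alternative to the substitution method of Section 1.5.1; the function name `plane_through` is illustrative.

```python
def cross(v, w):
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def plane_through(P0, P1, P2):
    """Return (n, d) with n . r = d for every point r on the plane.

    Assumes the three points do not all lie on one line (otherwise n is zero).
    """
    e1 = tuple(b - a for a, b in zip(P0, P1))   # vector lying in the plane
    e2 = tuple(b - a for a, b in zip(P0, P2))   # second vector in the plane
    n = cross(e1, e2)                           # orthogonal to both
    return n, dot(n, P0)

print(plane_through((1, 0, 0), (0, 3, 0), (0, 0, 2)))   # ((6, 2, 3), 6)
print(plane_through((1, 2, 7), (2, 3, 4), (-1, 2, 1)))  # ((-6, 12, 2), 32)
```

The first call reproduces 6x + 2y + 3z = 6 from Section 1.5.1. The second uses the points of Example 1.31 and gives −6x + 12y + 2z = 32; any non-zero multiple of this equation describes the same plane.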
Planes in $\mathbb{R}^3$
The vector equation of a plane is determined using a point P on the plane and a direction $\tilde{n}$ (known as the normal direction) which is perpendicular to the plane. Then every vector that lies in the plane and starts at P is orthogonal to $\tilde{n}$, i.e.
$$\tilde{n} \cdot (\tilde{r} - \tilde{d}) = 0$$
where $\tilde{r} = (x, y, z)$ is a typical point on the plane, and $\tilde{d} = (x_0, y_0, z_0)$ is a particular point (P) on the plane.
1.6 Systems of Linear Equations
1.6.1 Examples of Linear Systems
The central problem in linear algebra is to solve systems of simultaneous linear equations. A system of linear equations is a collection of equations which have the same set of variables. Let’s look at some examples.
Bags of coins
We have three bags with a mixture of gold, silver and copper coins. We are
given the following information
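Bag 1 contains 10 gold, 3 silver, 1 copper and weighs 60g
Bag 2 contains 5 gold, 1 silver, 2 copper and weighs 30g
Bag 3 contains 3 gold, 2 silver, 4 copper and weighs 25g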
The question is – What are the respective weights of the gold, silver and
copper coins?
Let G, S and C denote the weight of each of the gold, silver and copper
coins. Then we have the system of equations
10G + 3S + C = 60
5G + S + 2C = 30
3G + 2S + 4C = 25
Silly puzzles
John and Mary’s ages add to 75 years. When John was half his present age
he was twice as old as Mary. How old are they?
We have just two equations in our system:
J + M = 75
$\frac{1}{2}J - 2M = 0$
Intersections of planes
It is easy to imagine three planes in space. Is it possible that they share one
point in common? Here are the equations for three such planes
3x + 7y − 2z = 0
6x + 16y − 3z = −1
3x + 9y + 3z = 3
Can we solve this system for (x, y, z)?
In all of the above examples we need to unscramble the set of linear equa-
tions to extract the unknowns (e.g. G, S, C etc).
To solve a system of linear equations is to find solutions to the sets of
equations. In other words we find values that the variables can take such
that each of the equations in the system is true.
1.6.2 A standard strategy
We start with the previous example
3x + 7y − 2z = 0 (1)
6x + 16y − 3z = −1 (2)
3x + 9y + 3z = 3 (3)
Suppose by some process we were able to rearrange these equations into the following form
3x + 7y − 2z = 0 (1)
2y + z = −1 (2)′
4z = 4 (3)″
Then we could solve (3)″ for z
(3)″ ⇒ 4z = 4 ⇒ z = 1
and then substitute into (2)′ to solve for y
(2)′ ⇒ 2y + 1 = −1 ⇒ y = −1
and substitute into (1) to solve for x
(1) ⇒ 3x − 7 − 2 = 0 ⇒ x = 3
How do we get the modified equations (1), (2)′ and (3)″?
The general method is to take suitable combinations of the equations so that we can eliminate various terms. This method is applied as many times as we need to turn the original equations into a simple form like (1), (2)′ and (3)″.
Let’s start with the first pair of the original equations
3x + 7y − 2z = 0 (1)
6x + 16y − 3z = −1 (2)
We can eliminate the 6x in equation (2) by replacing equation (2) with (2) − 2(1),
⇒ 0x + (16 − 14)y + (−3 + 4)z = −1 (2)′
⇒ 2y + z = −1 (2)′
Likewise, for the 3x term in equation (3) we replace equation (3) with (3) − (1),
⇒ 2y + 5z = 3 (3)′
At this point our system of equations is
3x + 7y − 2z = 0 (1)
2y + z = −1 (2)′
2y + 5z = 3 (3)′
The last step is to eliminate the 2y term in the last equation. We do this by replacing equation (3)′ with (3)′ − (2)′
⇒ 4z = 4 (3)″
So finally we arrive at the system of equations
3x + 7y − 2z = 0 (1)
2y + z = −1 (2)′
4z = 4 (3)″
which, as before, we solve to find z = 1, y = −1 and x = 3.
The procedure we just went through is known as a reduction to upper triangular form and we used elementary row operations to do so. We then solved for the unknowns by back substitution.
This procedure is applicable to any system of linear equations (though be-
ware, for some systems the back substitution method requires special care,
we’ll see examples later).
The general strategy is to eliminate all terms below the main diagonal,
working column by column from left to right. More on this later!
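The strategy can be written as a short program. The Python sketch below eliminates below the diagonal column by column and then back-substitutes, using exact fractions. It is a simplified illustration under the assumption that every pivot is non-zero, so it does not handle the special cases mentioned above; the function name is a hypothetical choice.

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Solve A x = b by reduction to upper triangular form and back substitution.

    Assumes a square system whose pivots are all non-zero (no row swaps).
    """
    n = len(A)
    # Each row holds the coefficients followed by the right-hand side.
    rows = [[Fraction(x) for x in A[i]] + [Fraction(b[i])] for i in range(n)]

    # Elimination: clear the entries below the diagonal, left to right.
    for col in range(n):
        pivot = rows[col][col]
        if pivot == 0:
            raise ValueError("zero pivot: this sketch does not swap rows")
        for r in range(col + 1, n):
            factor = rows[r][col] / pivot
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]

    # Back substitution: solve the last equation first, then work upwards.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - known) / rows[i][i]
    return x

A = [[3, 7, -2],
     [6, 16, -3],
     [3, 9, 3]]
b = [0, -1, 3]
solution = solve_by_elimination(A, b)
print([int(value) for value in solution])  # [3, -1, 1]
```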
1.6.3 Points, lines and planes - intersections
In previous lectures we saw how we could construct the equations for lines
and planes. Now we can answer some simple questions.
How do we compute the intersection between a line and a plane? Can we
be sure that they do intersect? And what about the intersection of a pair
or more of planes?
The general approach to all of these questions is simply to write down equa-
tions for each of the lines and planes and then to search for a common point
(i.e. a consistent solution to the system of equations).
Example 1.33. Is the point (1, 2, 3) on the line $\tilde{r}(t) = (3, 4, 5) + (2, 2, 2)t$?
Solution: We simply check if the following system of equations yields the same value for t.
1 = 3 + 2t
2 = 4 + 2t
3 = 5 + 2t
Rearranging the top equation gives t = −1. In fact, each of these three equations gives t = −1, hence the point (1, 2, 3) is on the line $\tilde{r}(t) = (3, 4, 5) + (2, 2, 2)t$.
Example 1.34. Is the point (1, 2, 4) on the line $\tilde{r}(t) = (3, 4, 5) + (2, 2, 2)t$?
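The check used in Example 1.33 can be automated: solve each coordinate equation for t and see whether all the values agree. Below is a minimal Python sketch, assuming integer coordinates and a non-zero direction vector; the function name `parameter_on_line` is illustrative.

```python
from fractions import Fraction

def parameter_on_line(point, d, v):
    """Return t with point = d + t v, or None if the point is not on the line.

    Assumes integer coordinates and a non-zero direction vector v.
    """
    t = None
    for x, a, p in zip(point, d, v):
        if p == 0:
            if x != a:            # this coordinate never changes along the line
                return None
        else:
            candidate = Fraction(x - a, p)
            if t is None:
                t = candidate
            elif candidate != t:  # the equations disagree about t
                return None
    return t

print(parameter_on_line((1, 2, 3), (3, 4, 5), (2, 2, 2)))  # -1
print(parameter_on_line((1, 2, 4), (3, 4, 5), (2, 2, 2)))  # None
```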
Example 1.35. Do the lines $\tilde{r}_1(t) = (1, 0, 0) + (1, 0, 0)t$ and $\tilde{r}_2(s) = (0, 0, 0) + (0, 1, 0)s$ intersect? If so, find the point of intersection.
Solution: To answer this question we simply solve the system of equations as follows.
1 + 1t = 0 + 0s
0 + 0t = 0 + 1s
0 + 0t = 0 + 0s
This system of equations has the solution t = −1 and s = 0. Hence the two lines do intersect. To find the point of intersection, we put t = −1 into the first line (or s = 0 into the second line). This gives $\tilde{r}_1(-1) = (1, 0, 0) + (1, 0, 0)(-1) = (0, 0, 0)$. Hence the point of intersection of the two lines is the origin.
Example 1.36. Do the lines $\tilde{r}_1(t) = (1, 2, 3) + (1, 1, 2)t$ and $\tilde{r}_2(s) = (0, 0, 7) + (1, 1, 1)s$ intersect?
Example 1.37. Find the intersection of the line x(t) = 1 + 3t, y(t) = 3 − 2t,
z(t) = 1 − t with the plane 2x + 3y − 4z = 1.
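For a line and a plane, substituting the parametric equations into ax + by + cz = d leaves one linear equation in t. Writing the plane as $\tilde{n} \cdot \tilde{r} = d$ and the line as $\tilde{r}(t) = \tilde{p}_0 + t\tilde{v}$, this gives $t = (d - \tilde{n} \cdot \tilde{p}_0) / (\tilde{n} \cdot \tilde{v})$ whenever $\tilde{n} \cdot \tilde{v} \neq 0$. The Python sketch below applies this to Example 1.37; the function name is illustrative and integer inputs are assumed.

```python
from fractions import Fraction

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def line_plane_intersection(p0, v, n, d):
    """Intersect the line r(t) = p0 + t v with the plane n . r = d.

    Returns (t, point), or None when the line is parallel to the plane
    (in which case it either misses the plane or lies entirely inside it).
    """
    denominator = dot(n, v)
    if denominator == 0:
        return None
    t = Fraction(d - dot(n, p0), denominator)
    return t, tuple(a + t * b for a, b in zip(p0, v))

# Line x = 1 + 3t, y = 3 - 2t, z = 1 - t and plane 2x + 3y - 4z = 1.
t, point = line_plane_intersection((1, 3, 1), (3, -2, -1), (2, 3, -4), 1)
print(t)       # -3/2
print(point)   # components -7/2, 6, 5/2
```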
Example 1.38. Find the intersection of the plane y = 0 with the plane
2x + 3y − 4z = 1.
Example 1.39. Find the intersection of the three planes 2x + 3y − z = 1,
x − y = 2 and x = 1.
1.6.4 Points, lines and planes - distances
Now we are well equipped to be able to find the distances between points,
lines and planes. There are various combinations we can have, such as the
distance between a point and a plane, or the distance between a line and a
plane.
Example 1.40. Find the distance between the point (1, 2, 3) and the line
given by the equation r(t) = (0, 0, 7) + (1, 1, 1)t.
Solution: Firstly we subtract the position vector of the point,
v = (1, 2, 3), from the equation of the line. This gives us a vector u
that depends on the parameter t.
u(t) = (0, 0, 7) + (1, 1, 1)t − (1, 2, 3)
     = (−1, −2, 4) + (1, 1, 1)t
     = (−1 + t, −2 + t, 4 + t)
Think of the tail of this vector being fixed at the point (1, 2, 3) and its tip
running along the line as t changes. In order to find the shortest distance
(note that when asked to find the distance, it is implied that this means find
the shortest distance) we want to find the value of t for which the length of
u is as short as possible. We can also note that the shortest vector u will
be perpendicular to the direction of the line. This means that the dot
product of the vectors (1, 1, 1) and (−1 + t, −2 + t, 4 + t) will be zero.
(1, 1, 1) · (−1 + t, −2 + t, 4 + t) = −1 + t − 2 + t + 4 + t = 1 + 3t = 0
Hence t = −1/3. Now put this value of t into the vector u to give:
\[ u\left(-\tfrac{1}{3}\right) = \left(-1 - \tfrac{1}{3},\; -2 - \tfrac{1}{3},\; 4 - \tfrac{1}{3}\right) = \left(-\tfrac{4}{3},\; -\tfrac{7}{3},\; \tfrac{11}{3}\right) \]
Now simply calculate the length of u(−1/3) = (−4/3, −7/3, 11/3). This gives
\[ |u| = \tfrac{1}{3}\sqrt{16 + 49 + 121} = \tfrac{\sqrt{186}}{3} \approx 4.55 \]
If you plug t = −1/3 into the vector equation of the line, you get the
coordinates of the point on the line that is closest to the point (1, 2, 3).
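The perpendicularity argument above can be sketched as a short Python function. This is a minimal sketch under stated assumptions: the name `point_line_distance` is hypothetical, and the value of t is obtained by solving d · (u0 + t d) = 0 for a general direction vector d, which is the same equation as the one solved by hand above.

```python
import math

def point_line_distance(point, base, direction):
    """Return (t, distance) for the point on base + t*direction closest to point."""
    # u(t) = base + t*direction - point; u0 is its value at t = 0.
    u0 = [b - p for b, p in zip(base, point)]
    # Perpendicularity: direction . (u0 + t*direction) = 0, solved for t.
    t = -sum(d * x for d, x in zip(direction, u0)) / sum(d * d for d in direction)
    u = [x + t * d for x, d in zip(u0, direction)]
    return t, math.sqrt(sum(c * c for c in u))

t, dist = point_line_distance((1, 2, 3), (0, 0, 7), (1, 1, 1))
print(t, dist)  # t is about -0.3333 and dist is about 4.546
```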
Example 1.41. Find the distance between the two lines
(1, 2, 3) + (1, 1, 2)t
and
(0, 0, 7) + (1, 1, 1)s
Parallel lines
If two lines are parallel, then it is easy to calculate the distance between
them. Simply pick a point on one of the lines and calculate its distance
from the other line, as per finding the distance between a point and a line.
Remember that two lines are parallel if the direction vector of one line is a
scalar multiple of the direction vector of the other line.
Example 1.42. Find the distance between the two lines
(1, 2, 3) + (1, 1, 2)t
and
(0, 0, 7) + (2, 2, 4)s
Another way of finding the distance between two lines uses scalar projection.
Using this method we find any vector that joins a point on one line
to a point on the other line, and then compute the scalar projection of this
vector onto a vector orthogonal to both lines (it helps to draw a diagram).
The distance is the absolute value of this scalar projection.
Example 1.43. Find the distance between the point (2, 3, 4) and the plane
given by the equation x + 2y + 3z = 4.
Solution: First of all we need to find a point on the plane. By setting y = 0
and z = 0 we find x = 4. Thus (4, 0, 0) is a point on the plane. Now we find
the normal vector of the plane, n = (1, 2, 3). We then form a vector v from
the point on the plane (4, 0, 0) to the given point (2, 3, 4). Thus
v = (2, 3, 4) − (4, 0, 0) = (−2, 3, 4)
Now find the scalar projection of v onto the normal vector n. Its absolute
value is the shortest distance from the point (2, 3, 4) to the plane
x + 2y + 3z = 4.
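The final step of this solution, the scalar projection of v onto n, can be sketched in Python as follows. The function name `point_plane_distance` is hypothetical; the point on the plane and the normal vector are the ones found in the solution above.

```python
import math

def point_plane_distance(point, plane_point, normal):
    """Distance from point to the plane through plane_point with the given normal."""
    # v runs from the known point on the plane to the given point.
    v = [p - q for p, q in zip(point, plane_point)]
    # Scalar projection of v onto the normal: (v . n) / |n|.
    projection = sum(a * b for a, b in zip(v, normal)) / math.sqrt(sum(n * n for n in normal))
    return abs(projection)

distance = point_plane_distance((2, 3, 4), (4, 0, 0), (1, 2, 3))
print(distance)  # 16 / sqrt(14), about 4.276
```

Any other point on the plane could be used in place of (4, 0, 0); the projection onto the normal removes the in-plane part of v.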
Example 1.44. Find the distance between the line r(t) = (2, 3, 4) + (3, 0, −1)t
and the plane x + 2y + 3z = 4.
Example 1.45. Find the distance between the two planes 2x + 3y − 4z = 2
and 4x + 6y − 8z = 3.
1.6.5 Summary
• The equation ax + by = c (or equivalently $a_1 x_1 + a_2 x_2 = b$) represents
a straight line in 2-space.
• The equation ax + by + cz = d (or equivalently $a_1 x_1 + a_2 x_2 + a_3 x_3 = b$)
represents a plane in 3-space.
• The equation $a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n = b$ represents a
hyperplane in n-space.
The equation $a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n = b$ is called a linear
equation in the variables $x_1, x_2, x_3, \ldots, x_n$ with coefficients
$a_1, a_2, a_3, \ldots, a_n$ and constant term b. In general we can study m linear
equations in n variables:
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m
\end{aligned}
\]
and call this a system of linear equations.
Every linear system satisfies one of the following:
• There is no solution
• There is exactly one solution
• There are infinitely many solutions
This seems obvious for the case of two or three variables if we view the equa-
tions geometrically. In this case a solution of a system of linear equations
(of two or three variables) is a point in the intersection of the lines or planes
represented by the equations.
When n = 2: Two lines may intersect at a single point (unique solution), or
not at all (no solution), or they may even be the same line (infinitely many
solutions).
[Figure: No point of intersection (no solution)]
[Figure: One point of intersection (unique solution)]
[Figure: Infinitely many points of intersection (infinitely many solutions, the two lines coincide)]
When n = 3: Three planes may intersect at a single point or along a common
line or even not at all.
[Figure: No point of intersection (no solution)]
[Figure: One point of intersection (unique solution)]
[Figure: Intersection in a common line (infinitely many solutions)]
Example 1.46. What other examples can you draw of intersecting planes?
Chapter 2
Matrices
2.1 Introduction - notation and operations

An m × n matrix A is a rectangular array of numbers consisting of m rows
and n columns. We say A is of size m × n.
We use capital letters to represent matrices, for example:
\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix},
\qquad
B = \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}
\]
Entries within a matrix are denoted by subscripted lower case letters. For
the matrix A above we have $a_{11} = 3$, $a_{12} = 2$, $a_{13} = -1$, $a_{21} = 1$,
$a_{22} = -1$, $a_{23} = 1$ and so forth. Here, A is a 3 × 3 matrix
\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix}
  = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]
where $a_{ij}$ = the entry in row i and column j of A.
An m × n matrix can be represented similarly. Note that m denotes the number
of rows in the matrix, and n denotes the number of columns.
\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]
For brevity we sometimes write $A = [a_{ij}]$. This also reminds us that A is a
matrix with elements $a_{ij}$.
A square matrix is an n × n matrix.
2.1.1 Operations on matrices
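• Equality: A = B only when A and B have the same size and all entries in A
equal the corresponding entries in B.
• Addition: Normal addition of corresponding elements. For example:
\[
\begin{bmatrix} 1 & 2 & 7 \\ 2 & -1 & 3 \end{bmatrix}
+ \begin{bmatrix} 3 & 1 & 4 \\ 1 & 2 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & 3 & 11 \\ 3 & 1 & 4 \end{bmatrix}
\]
• Multiplication by a number: λA = λ times each entry of A. For example:
\[
5 \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}
= \begin{bmatrix} 10 & 15 \\ 20 & 5 \end{bmatrix}
\]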
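• Multiplication of matrices: each entry of the product is a row of the first
matrix multiplied, term by term, into a column of the second matrix.
\[
\begin{bmatrix}
\cdots & \cdots & \cdots & \cdots & \cdots \\
a & b & c & d & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
\vdots & & & & \vdots
\end{bmatrix}
\begin{bmatrix}
\cdots & e & \cdots \\
\cdots & f & \cdots \\
\cdots & g & \cdots \\
\cdots & h & \cdots \\
 & \vdots &
\end{bmatrix}
=
\begin{bmatrix}
\cdots & \cdots & \cdots \\
\cdots & i & \cdots \\
\cdots & \cdots & \cdots \\
 & \vdots &
\end{bmatrix}
\]
i = a · e + b · f + c · g + d · h + ...
Note that we can only multiply matrices that fit together. That is, if
A and B are a pair of matrices, then in order that AB make sense,
we must have the number of columns of A equal to the number of
rows of B. We also say that matrices A and B are compatible for
multiplication if A has size m × n and B has size n × r. The product
AB is then a matrix of size m × r.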
• Transpose: Flip rows and columns, denoted by $[\cdots]^T$. For example:
\[
\begin{bmatrix} 1 & 2 & 7 \\ 0 & 3 & 4 \end{bmatrix}^T
= \begin{bmatrix} 1 & 0 \\ 2 & 3 \\ 7 & 4 \end{bmatrix}
\]
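The operations listed above can be sketched in Python with matrices stored as lists of rows. This is an illustrative sketch only; the helper names `transpose` and `matmul` are hypothetical and no external library is assumed.

```python
def transpose(A):
    """Flip rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product AB; A must have as many columns as B has rows."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j) is row i of A multiplied term by term into column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

print(transpose([[1, 2, 7], [0, 3, 4]]))         # [[1, 0], [2, 3], [7, 4]]
print(matmul([[2, 3], [4, 1]], [[1, 7], [0, 2]]))  # [[2, 20], [4, 30]]
```

The size check mirrors the compatibility rule: an m × n matrix times an n × r matrix gives an m × r matrix, and any other pairing raises an error.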
Example 2.1. Does the following make sense?
\[
\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 7 \\ 0 & 2 \\ 4 & 1 \end{bmatrix}
\]
2.1.2 Some special matrices
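• The Identity matrix:
\[
I = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]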
For any square matrix A we have IA = AI = A.
• The Zero matrix : A matrix whose entries are all zeroes.
• Symmetric matrices: Any matrix A for which $A = A^T$.
• Skew-symmetric matrices: Any matrix A for which $A = -A^T$. Sometimes
also called anti-symmetric.
2.1.3 Properties of matrices
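• AB ≠ BA (in general)
• (AB)C = A(BC)
• $(A^T)^T = A$
• $(AB)^T = B^T A^T$
Example 2.2. Given
\[
A = \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix}, \qquad
B = \begin{bmatrix} 1 & 7 \\ 0 & 2 \end{bmatrix} \qquad \text{and} \qquad
C = \begin{bmatrix} 2 & 1 \\ 3 & 0 \end{bmatrix}
\]
verify the above four properties.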
2.1.4 Inverses of square matrices
A square matrix A is called invertible if there is a matrix B such that
AB = BA = I (where I is the identity matrix). We call B the inverse of
A and write $B = A^{-1}$.
In later lectures we will see how to compute the inverse of a square matrix.
2.2 Gaussian Elimination
In previous lectures we introduced systems of linear equations, and briefly

looked at how to solve these. The most efficient method for solving systems
of linear equations is by using Gaussian elimination. This is essentially
the row reduction that we have already encountered, but with a few extra
steps. We will walk through this method using a typical example.
2x + 3y +   z = 10    (1)
 x + 2y +  2z = 10    (2)
4x + 8y + 11z = 49    (3)

2x + 3y +   z = 10    (1)
      y +  3z = 10    (2)′ ← 2(2) − (1)
     2y +  9z = 29    (3)′ ← (3) − 2(1)

2x + 3y +   z = 10    (1)
      y +  3z = 10    (2)′
           3z =  9    (3)″ ← (3)′ − 2(2)′
Previously we would then solve this system using back-substitution, giving
z = 3, y = 1, x = 2.
Note how we record the next set of row-operations on each equation. This
makes it much easier for someone else to see what you are doing and it also
helps you track down any arithmetic errors.
In this example we found
2x + 3y +   z = 10    (1)
      y +  3z = 10    (2)′
           3z =  9    (3)″
Why stop there? We can apply more row-operations to eliminate terms
above the diagonal. This does not involve back-substitution. This method
of row reduction is known as Gaussian elimination.¹
Example 2.3. Continue from the previous example and use row-operations
to eliminate the terms above the diagonal. Hence solve the system of equa-
tions.
2.2.1 Gaussian elimination strategy
1. Use row-operations to eliminate elements below the diagonal.

2. Use row-operations to eliminate elements above the diagonal.
3. If possible, re-scale each equation so that each diagonal element = 1.
4. The right hand side is now the solution of the system of equations.
If you stop after step one you are doing Gaussian elimination with back-
substitution (this is usually the easier option).
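Step 1 followed by back-substitution can be sketched in Python as below. This is a minimal sketch, not a definitive implementation: the function name `solve_by_elimination` is hypothetical, exact fractions are used so that no rounding occurs, and every diagonal entry is assumed to be non-zero when it is used as a pivot.

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Eliminate below the diagonal, then back-substitute.

    Assumes A is square and no zero appears on the diagonal during elimination.
    """
    n = len(A)
    # Work on the rows of A with the right hand side attached as a last entry.
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            M[row] = [x - factor * y for x, y in zip(M[row], M[col])]
    # Back-substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

solution = solve_by_elimination([[2, 3, 1], [1, 2, 2], [4, 8, 11]], [10, 10, 49])
print([int(v) for v in solution])  # [2, 1, 3], that is x = 2, y = 1, z = 3
```

The sketch divides by each pivot, so it fails with a division by zero as soon as a zero appears on the diagonal.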
2.2.2 Exceptions
Here are some examples where problems arise.

Example 2.4. A zero on the diagonal
2x +  y + 2z +  w =  2    (1)
2x +  y −  z + 2w =  1    (2)
 x − 2y +  z −  w = −2    (3)
 x + 3y −  z + 2w =  2    (4)

2x +  y + 2z +  w =  2    (1)
     0y − 3z +  w = −1    (2)′ ← (2) − (1)
   − 5y + 0z − 3w = −6    (3)′ ← 2(3) − (1)
   + 5y − 4z + 3w =  2    (4)′ ← 2(4) − (1)

The zero on the diagonal in the second equation is a serious problem: it
means we cannot use that row to eliminate the elements below the diagonal
term. Hence we swap the second row with a lower row so that we
get a non-zero term on the diagonal. Then we proceed as usual.

2x +  y + 2z +  w =  2    (1)
   − 5y + 0z − 3w = −6    (2)″ ← (3)′
     0y − 3z +  w = −1    (3)″ ← (2)′
   + 5y − 4z + 3w =  2    (4)′

The result is w = 2, z = 1, y = 0 and x = −1.

¹ In some texts, using row operations to eliminate terms below the diagonal only is
known as Gaussian elimination, whereas using row operations to eliminate terms below
and above the diagonal is known as Gauss-Jordan elimination.
Example 2.5. A consistent and under-determined system
Suppose we start with three equations and we wind up with
2x + 3y −  z =  1    (1)
   − 5y + 5z = −1    (2)′
          0z =  0    (3)″
The last equation tells us nothing! We can’t solve it for any of x, y and z.
We really only have 2 equations, not 3. That is 2 equations for 3 unknowns.
This is an under-determined system.
We solve the system by choosing any number for one of the unknowns. Say
we put z = λ where λ is any number (our choice). Then we can leap back
into the equations and use back-substitution.
The result is a one-parameter family of solutions
x = 1/5 − λ,    y = 1/5 + λ,    z = λ
Since we found a solution we say that the system is consistent.
Example 2.6. An inconsistent system
Had we started with
2x + 3y −  z = 1    (1)
 x −  y + 2z = 0    (2)
3x + 2y +  z = 0    (3)
we would have arrived at
2x + 3y −  z =  1    (1)
   − 5y + 5z = −1    (2)′
          0z = −2    (3)″
This last equation makes no sense, as there is no value of z such
that 0z = −2. Thus we say that this system is inconsistent and that
the system has no solution.
2.3 Systems of equations using matrices
Consider the system of equations
3x + 2y − z = −1
 x −  y + z =  4
2x +  y − z = −1
We can rewrite this system using matrix notation. The coefficients of our
equations form a 3 × 3 matrix A
\[ A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix} \]
The variables (x, y and z) can be written as a 3 × 1 matrix (also known as
a column vector) X
\[ X = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]
and the right hand side can also be written as a column vector B
\[ B = \begin{bmatrix} -1 \\ 4 \\ -1 \end{bmatrix} \]
Thus our system of equations AX = B becomes
\[
\begin{bmatrix} 3 & 2 & -1 \\ 1 & -1 & 1 \\ 2 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} -1 \\ 4 \\ -1 \end{bmatrix}
\]
Example 2.7. Write the system of equations
3x + 2y − z = −1
x − y + z = 4
2x + y − z = −1
in matrix notation.
2.3.1 The augmented matrix
Consider the system of equations:
2x + 3y +   z = 10    (1)
 x + 2y +  2z = 10    (2)
4x + 8y + 11z = 49    (3)
The augmented matrix of the system is the coefficient matrix A augmented
by the column vector b of right hand sides.
\[
[A|b] = \left[\begin{array}{ccc|c} 2 & 3 & 1 & 10 \\ 1 & 2 & 2 & 10 \\ 4 & 8 & 11 & 49 \end{array}\right]
\]
Previously we used Gaussian elimination to solve systems of linear equa-

tions, where we labelled our equations (1), (2), (3) and so forth. It is much
more efficient to set up our system using matrices, and then perform Gaus-
sian elimination on the augmented matrix. Gaussian elimination (using
matrices) consists of bringing the augmented matrix to echelon form us-
ing elementary row operations. This allows us to then solve a much
simpler system of equations.
2.4 Row echelon form
A matrix is in row echelon form if it satisfies the following two conditions:
• If there are any zero rows, they are at the bottom of the matrix.
• The first non-zero entry in each non-zero row (called the leading
entry or pivot) is to the right of the pivots in the rows above it.
A matrix is in reduced echelon form if it also satisfies:
• Each pivot entry is equal to 1.
• Each pivot is the only non-zero entry in its column.
Example 2.8. Write down three matrices in echelon form and circle the
pivots.
Example 2.9. Write down three matrices in reduced echelon form.
The variables corresponding to the columns containing pivots are called the
leading variables. The variables corresponding to the columns that do not
contain pivots are called free variables. Free variables are not restricted
by the linear equations - they can take arbitrary values, and we often denote
these by Greek letters (such as α, β and so forth). The leading variables are
then expressed in terms of the free variables.
Example 2.10. For the following linear systems
(a) Write down the augmented matrix and bring it to echelon form.
(b) Identify the free variables and the leading variables.
(c) Write down the solution(s) if any exist.
(d) Give a geometric interpretation of your results.
(i)
x + y = 1
x − 2y = 4
(ii)
x − y = 1
x − y = 2
(iii)
x − y = 1
3x − 3y = 3
Example 2.11. Consider the linear system
\[
\begin{aligned}
x_1 + 3x_2 + 3x_3 + 2x_4 &= 1 \\
2x_1 + 6x_2 + 9x_3 + 5x_4 &= 1 \\
-x_1 - 3x_2 + 3x_3 &= k
\end{aligned}
\]
(a) Write down the augmented matrix and bring it to echelon form.
(b) Identify the free variables and the leading variables.
(c) For what values of the number k does the system have (i) no solution,
(ii) infinitely many solutions, (iii) exactly one solution?
(d) When a solution or solutions exist, find them.
2.4.1 Rank
The rank of a matrix is the number of non-zero rows (also the number of
pivots) in its row echelon form. The rank of a matrix is denoted by rank(A).
The rank of a matrix gives us information about the solutions of the associ-
ated linear system.
Importantly, if the system is consistent and the rank of the coefficient
matrix is equal to the number of variables, then there are no free variables
and the system of linear equations has a unique solution.
A linear system of m equations in n variables will give an m × n matrix A.
Once we have reduced the matrix to echelon form, and found the rank = r
of the reduced matrix (let’s call the reduced matrix U ) we can deduce the
following informative properties:
Properties
1. Number of variables = n
2. Number of leading variables = r
3. Number of free variables = n − r
4. r ≤ m (because there is at most one pivot in each of the m rows of
U ).
5. r ≤ n (because there is at most one pivot in each of the n columns of
U ).
6. If r = n there are no free variables and there will be either no solution
or one solution.
7. If r < n there is at least one free variable and there will be either no
solution or infinitely many solutions.
8. If there are more variables than equations, that is n > m, then r < n
and so there will be either no solution or infinitely many solutions.
Example 2.12. What is the rank of each of the matrices in the previous
examples?
2.4.2 Homogeneous systems
A homogeneous system is one of the form Ax = 0. The augmented matrix
is therefore [A|0] and its echelon form is [U|0]. The last non-zero row cannot
be [0 0 . . . 0 d] with d ≠ 0, so a homogeneous system is never inconsistent.
In fact x = 0 is always a solution. Geometrically, the lines, planes or
hyperplanes represented by the equations in a homogeneous system all pass
through the origin.
2.4.3 Summary
When we reduce a matrix to echelon form, we do so by performing elemen-

tary row operations. On a matrix, these operations are
• Interchange two rows (which we denote by $R_i \leftrightarrow R_j$).
• Multiply one row by a non-zero number ($cR_i \to R_i$).
• Add a multiple of one row to another row ($R_i + cR_j \to R_i$).
Every matrix can be brought to echelon form by a sequence of elementary

row operations using Gaussian elimination. This is sometimes given as
an algorithm:
Gaussian Elimination
1. If the matrix consists entirely of zeros, stop (it is in echelon form).
2. Otherwise, find the first column with a non-zero entry (say a) and
use a row interchange to bring that entry to the top row.
3. Subtract multiples of the top row from the rows below it so that
each entry below the pivot a becomes zero. (This completes the first
row. All subsequent operations are carried out on the rows below it.)
4. Repeat steps 1 to 3 on the remaining rows.
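The four steps above can be sketched in Python as follows. This is only a sketch under stated assumptions: the names `row_echelon` and `rank` are hypothetical, the matrix is a non-empty list of rows, and exact fractions are used instead of floating point numbers. The rank is read off as the number of non-zero rows of the echelon form.

```python
from fractions import Fraction

def row_echelon(M):
    """Return a row echelon form of M using row swaps and row subtractions."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    top = 0  # index of the first row that has not been completed yet
    for col in range(cols):
        # Step 2: look for a non-zero entry in this column among the remaining rows.
        pivot = next((r for r in range(top, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        # Step 3: subtract multiples of the pivot row from the rows below it.
        for r in range(top + 1, rows):
            factor = M[r][col] / M[top][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[top])]
        # Step 4: repeat on the remaining rows.
        top += 1
        if top == rows:
            break
    return M

def rank(M):
    """Number of non-zero rows in the echelon form of M."""
    return sum(any(x != 0 for x in row) for row in row_echelon(M))

print(rank([[1, -1], [3, -3]]))  # 1
```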
A linear system of equations Ax = b, or AX = B, can be written in the
general form:
\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]
where A is the m × n matrix of coefficients, x (or X) is the n × 1 matrix
(or column vector) of variables, and b (or B) is the m × 1 matrix (or column
vector) of constant terms.
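The augmented matrix is the matrix
\[
[A|b] = \left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right]
\]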
In order to solve a general linear system Ax = b we:
1. Bring the augmented matrix to echelon form: [A|b] → [U|c]. Since
each elementary row operation is reversible, the two systems Ax = b
and Ux = c have exactly the same solutions.
2. Solve the triangular system Ux = c by back-substitution.
For a general linear system Ax = b or its corresponding triangular form
Ux = c there are three possibilities.
1. There is no solution - this happens when the last non-zero row of
[U|c] is [0 0 . . . 0 d] with d ≠ 0, in which case the equations are
inconsistent.
2. There are infinitely many solutions - this happens when the
equations are consistent and there is at least one free variable.
3. There is exactly one solution - this happens when the equations are
consistent and there are no free variables.
2.5 Matrix Inverse
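Suppose we have a system of equations
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} u \\ v \end{bmatrix}
\]
and that we write this in the matrix form
AX = B
Can we find another matrix, call it $A^{-1}$, such that
$A^{-1} A = I$ = the identity matrix?
If so, then we have
\[ A^{-1} A X = A^{-1} B \quad\Rightarrow\quad X = A^{-1} B \]
Thus we have found the solution of the original system of equations.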
For a 2 × 2 matrix it is easy to verify that
\[
A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}
= \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]
Note that not all matrices will have an inverse. For example, if
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
then
\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]
and for this to be possible we must have ad − bc ≠ 0.
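The 2 × 2 formula, including the condition on ad − bc, can be sketched in Python. The function name `inverse_2x2` is hypothetical and integer entries are assumed so that the result can be returned as exact fractions.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] for integer entries, as exact fractions."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("no inverse: ad - bc = 0")
    # (1 / (ad - bc)) * [[d, -b], [-c, a]]
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

inv = inverse_2x2(2, 3, 4, 1)
print(inv[0][0], inv[0][1], inv[1][0], inv[1][1])  # -1/10 3/10 2/5 -1/5
```

Multiplying the result back into the original matrix should give the identity, which is a quick way to check any inverse computed by hand.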
In later lectures we will see some different methods for computing the inverse
$A^{-1}$ for other (square) matrices larger than 2 × 2.
Example 2.13. If $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, then $A^{-1} =$
Properties of inverses
• A square matrix has at most one inverse.
Proof: If $B_1$ and $B_2$ are both inverses of A, then $AB_1 = B_1 A = I$ and
$AB_2 = B_2 A = I$. So $B_1 = B_1 I = B_1 (AB_2) = (B_1 A)B_2 = IB_2 = B_2$.
So the inverse of A is unique.
• If A is invertible, then so is $A^T$ and $(A^T)^{-1} = (A^{-1})^T$.
• If A is invertible, then so is $A^{-1}$ and $(A^{-1})^{-1} = A$.
• If A and B are invertible matrices of the same size, then AB is invertible
and $(AB)^{-1} = B^{-1} A^{-1}$.
• $(A_1 A_2 \cdots A_m)^{-1} = A_m^{-1} \cdots A_2^{-1} A_1^{-1}$.
• If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and ad − bc ≠ 0, then
$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
• Cancellation laws: If A is invertible, then
– AB = AC implies B = C (just multiply on the left by $A^{-1}$).
– BA = CA implies B = C (just multiply on the right by $A^{-1}$).
– BAC = DAE does not imply BC = DE.
• Solving systems: Let A be n × n and invertible. Then the linear
system Ax = b always has exactly one solution, namely $x = A^{-1} b$.
• Rank test: An n × n matrix A is invertible if and only if it has full
rank r = n.
2.6 Matrix Transpose
If A is a matrix of size m × n then the transpose $A^T$ of A is the n × m
matrix defined by $A^T(j, i) = A(i, j)$.
Consider the matrix $A = \begin{bmatrix} 2 & 3 & 4 \\ 0 & 1 & 5 \end{bmatrix}$.
The transpose of A, namely $A^T$, is simple to find. Row 1 of matrix A becomes
column 1 of matrix $A^T$, and row 2 of matrix A becomes column 2 of matrix $A^T$.
Thus
\[ A^T = \begin{bmatrix} 2 & 0 \\ 3 & 1 \\ 4 & 5 \end{bmatrix} \]
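Example 2.14. Let
\[
B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 5 \end{bmatrix}
\qquad \text{and} \qquad
C = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 0 \\ 2 & 0 & 1 & 1 & 0 \end{bmatrix}
\]
Find $B^T$ and $C^T$.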
Note the following:
• $(A^T)^T = A$
• $(cA)^T = cA^T$
• $(A + B)^T = A^T + B^T$
• $(AB)^T = B^T A^T$
Example 2.15. Verify the above using matrices
$A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}$ and
$B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$.
2.7 Determinants
The determinant function det is a function that assigns to each n × n matrix
A a number det A called the determinant of A. The function is defined
as follows:
• If n = 1: A = [a] and we define det A := a.
• If n = 2: $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and we define det A := ad − bc.
• If n > 2: It gets a bit complicated now, but it is not too bad. Firstly
create a sub-matrix $S_{ij}$ of A by deleting the ith row and the jth column.
Then define
\[ \det A := a_{11} \det S_{11} - a_{12} \det S_{12} + a_{13} \det S_{13} - \cdots \pm a_{1n} \det S_{1n} \]
The quantity $\det S_{ij}$ is called the minor of entry $a_{ij}$ and is denoted $M_{ij}$.
The number $C_{ij} = (-1)^{i+j} M_{ij}$ is called the cofactor of entry $a_{ij}$. Thus to
compute det A you have to compute a chain of determinants from
(n − 1) × (n − 1) determinants all the way down to 2 × 2 determinants.
This method of defining and evaluating det A is called Laplace’s expansion
along the first row. We can, in fact, use any row (or any column)
to calculate det A.
We often write det A = |A|.
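The recursive definition above translates almost directly into code. The following Python sketch expands along the first row; the function name `det` and the list-of-rows representation are assumptions made for illustration, and the method is far too slow for large matrices.

```python
def det(A):
    """Determinant by Laplace expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Sub-matrix obtained by deleting the first row and column j.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        # Signs alternate +, -, +, ... along the first row.
        total += (-1) ** j * A[0][j] * det(sub)
    return total

print(det([[2, 3], [4, 1]]))  # -10, which agrees with ad - bc
```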
Example 2.16. Compute the determinant of
\[ A = \begin{bmatrix} 1 & 7 & 2 \\ 3 & 4 & 5 \\ 6 & 0 & 9 \end{bmatrix} \]
When we expand the determinant about any row or column, we must observe
the following pattern of ± signs (these correspond to the $(-1)^{i+j}$ in $C_{ij}$ -
check!).
\[
\begin{bmatrix}
+ & - & + & - & + & - & \cdots \\
- & + & - & + & - & + & \cdots \\
+ & - & + & - & + & - & \cdots \\
- & + & - & + & - & + & \cdots
\end{bmatrix}
\]
This is best seen in an example.
Example 2.17. By expanding about the second row compute the determinant of
\[ A = \begin{bmatrix} 1 & 7 & 2 \\ 3 & 4 & 5 \\ 6 & 0 & 9 \end{bmatrix} \]
Example 2.18. Compute the determinant of
\[ A = \begin{bmatrix} 1 & 2 & 7 \\ 0 & 0 & 3 \\ 1 & 2 & 1 \end{bmatrix} \]
2.7.1 Properties of determinants
• If we interchange two rows (or two columns) of A the resulting matrix
has determinant equal to −det A.
• If we add a multiple of one row to another row (similarly for columns),
the resulting matrix has determinant equal to det A.
• If we multiply a row or column of A by a scalar α, the resulting matrix
has determinant equal to α(det A).
• If A has a row or column of zeros, then det A = 0.
• If two rows (or columns) of A are identical, then det A = 0.
• For any fixed i = 1, . . . , n we have $\det A = a_{i1} C_{i1} + a_{i2} C_{i2} + \cdots + a_{in} C_{in}$.
• For any fixed j = 1, . . . , n we have $\det A = a_{1j} C_{1j} + a_{2j} C_{2j} + \cdots + a_{nj} C_{nj}$.
Determinant test: An n × n matrix A is invertible if and only if det(A) ≠ 0.
2.7.2 Vector cross product using determinants
The rule for a vector cross product can be conveniently expressed as a
determinant. Thus if $v = v_x i + v_y j + v_z k$ and $w = w_x i + w_y j + w_z k$ then
\[
v \times w = \begin{vmatrix} i & j & k \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{vmatrix}
\]
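Expanding this determinant along its first row gives the three components of the cross product. A minimal Python sketch of that expansion follows; the function name `cross` is hypothetical and vectors are assumed to be given as (x, y, z) tuples.

```python
def cross(v, w):
    """Cross product from expanding the determinant along the row (i, j, k)."""
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy,      # coefficient of i
            -(vx * wz - vz * wx),   # coefficient of j (note the minus sign)
            vx * wy - vy * wx)      # coefficient of k

print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1), that is i x j = k
```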
2.7.3 Cramer’s rule
Recall that if a linear system Ax = b has a unique solution, then
$x = A^{-1} b$ is this solution. If we substitute the cofactor formula for the
inverse $A^{-1}$ (the formula built from $\det S_{ji}$, see Section 2.8.1) into the
product $A^{-1} b$ we arrive at Cramer’s rule for solving the linear system Ax = b.
Cramer’s rule: Let Ax = b be a linear system with a unique solution. This
means that A is a square matrix with non-zero determinant. Let $A_i$ be the
matrix that results from A by replacing the ith column of A by b. Then
\[ x_i = \frac{\det A_i}{\det A} \]
Examples of Cramer’s rule will be given in tutorials.
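As a hedged illustration of the rule, the sketch below builds each matrix $A_i$ by replacing column i with b and divides the two determinants. The names `det` and `cramer` are hypothetical, integer entries are assumed, and the determinant helper uses expansion along the first row, so this is only practical for small systems.

```python
from fractions import Fraction

def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b for square A with integer entries and non-zero determinant."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        # A_i is A with its ith column replaced by b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution

x = cramer([[2, 3, 1], [1, 2, 2], [4, 8, 11]], [10, 10, 49])
print([int(v) for v in x])  # [2, 1, 3]
```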
2.8 Obtaining inverses using Gauss-Jordan elimination
The most efficient method for computing the inverse of a matrix is by
Gauss-Jordan elimination which we have met earlier.
• Use row-operations to reduce A to the identity matrix.
• Apply exactly the same row-operations to a matrix set initially to the
identity.
• The final matrix is the inverse of A.
We usually record this process in a large augmented matrix.
• Start with [A|I].
• Apply row operations to obtain $[I|A^{-1}]$.
• Read off $A^{-1}$, the inverse of A.
Recall that some texts use the term Gaussian elimination to refer to
reducing a matrix to its echelon form, and the term Jordan elimination to
refer to reducing a matrix to its reduced echelon form. In this manner, the
Gauss-Jordan algorithm can be described diagrammatically as follows:
\[ [A|I] \xrightarrow{\text{G.A.}} [U|*] \xrightarrow{\text{J.A.}} [I|B] \qquad \text{where } B = A^{-1}. \]
In words, provided A has rank n:
• augment A by the identity matrix;
• perform the Gaussian algorithm to bring A to echelon form U and
[A|I] to [U|∗];
• perform the Jordan algorithm to bring U to reduced echelon form I
and [U|∗] to [I|B] (in other words use elementary row operations to
make the pivots all 1s and to produce zeros above the pivots);
• then $B = A^{-1}$.
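The procedure can be sketched in Python as below. This is a sketch under stated assumptions rather than the notes' own algorithm: the function name `inverse` is hypothetical, exact fractions are used, and for brevity the Gaussian and Jordan passes are merged, so each pivot is scaled to 1 and the entries both below and above it are cleared in a single sweep.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row-reducing [A|I] to [I|B]."""
    n = len(A)
    # Build the augmented matrix [A|I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible (rank less than n)")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot to 1, then clear the rest of the column.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # The right-hand block is now the inverse.
    return [row[n:] for row in M]
```

A result B can be checked by confirming that AB is the identity matrix.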
Example 2.19. Use the Gauss-Jordan algorithm to invert the following
matrix A.
\[
\left[\begin{array}{ccc|ccc}
1 & 1 & 3 & 1 & 0 & 0 \\
0 & 2 & 1 & 0 & 1 & 0 \\
1 & 4 & 4 & 0 & 0 & 1
\end{array}\right]
\]
Example 2.20. Solve the linear system
x + y + 3z = 2
2y + z = 0
x + 4y + 4z = 1
2.8.1 Inverse - another method
Here is another way to compute the inverse of a matrix.
• Select the ith row and jth column of A.
• Compute $(-1)^{i+j} \dfrac{\det S_{ij}}{\det A}$.
• Store this entry at position (j, i) (row j and column i) in the inverse matrix.
• Repeat for all other entries in A.
That is, if
$A = [a_{ij}]$
then
\[ A^{-1} = \frac{1}{\det A} \left[ (-1)^{i+j} \det S_{ji} \right] \]
This method works but it is rather tedious.
————————————————–
Chapter 3
Calculus
Reference books
1. G. James, Modern Engineering Mathematics, 4th edition, Prentice
Hall, 2007.
2. J. Stewart, Calculus: Early Transcendentals, 7th edition, Cengage
Learning, 2012.
3.1 Differentiation
3.1.1 Rate of change
Differentiation is the mathematical method that we use to study the rate
of change of physical quantities.
Let us look at this with an example. Consider a point P that is moving
with a constant speed v along a straight line. Let s be the distance moved
by the point after time t. The distance moved after time t is given by the
formula s = vt. If ∆t denotes a finite change in time t, the corresponding
change of distance is given by ∆s = v∆t.
The rate of change of s with t is then simply
\[ \frac{\text{change in } s}{\text{change in } t} = \frac{\Delta s}{\Delta t} = v = \text{average speed over the time interval } \Delta t. \]
Suppose now that the speed of P varies with time. By making ∆t become
very small, i.e. taking the limit as ∆t → 0, we define the derivative of s
with respect to t at time t as the limit of the rate of change ∆s/∆t as
∆t → 0.
If we let ds and dt be the infinitesimal changes in s and t, then we can
write:
\[ v = \frac{ds}{dt} = \lim_{\Delta t \to 0} \frac{\Delta s}{\Delta t} = \text{instantaneous speed of } P \text{ at time } t. \]
3.1.2 Definition of the derivative f′(x) and the slope of a tangent line
Consider the graph of the function y = f(x) of the single variable x shown
in the plot below.
We will now compare the average rate of change of f(x) with the derivative
of f(x).
Let ∆f = f(x + ∆x) − f(x) be the change in f as we go from point P to
Q and as x changes from x to x + ∆x. The average rate of change of the
function f(x) on the interval ∆x is ∆f/∆x. This is the slope of the chord PQ.
The derivative of f(x) at the point x is defined as
\[ \frac{df}{dx} = f'(x) = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. \]
The derivative is the slope (or gradient) of the local tangent line to the
curve y = f(x) at the point P. That is, f′(x) = tan θ, where θ is the angle
between the two dashed lines.
The derivative f′(x) is thus the instantaneous rate of change of f with
respect to x at the point P.
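The limit in the definition can be explored numerically by evaluating the slope of the chord for smaller and smaller ∆x. The Python sketch below is only an illustration: the function name `difference_quotient` and the test function f(x) = x² are assumptions chosen for the example, not taken from the notes.

```python
def difference_quotient(f, x, dx):
    """Slope of the chord from (x, f(x)) to (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx

def square(x):
    return x * x

# For f(x) = x^2 the chord slope at x = 3 is exactly 6 + dx,
# so the values approach 6 as dx shrinks.
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, difference_quotient(square, 3.0, dx))
```

The printed slopes approach the slope of the tangent line, but floating point rounding means that extremely small values of dx eventually make the estimate worse rather than better.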
Example 3.1. Use the definition of the derivative to obtain from first
principles the value of f′(x) for the function $f(x) = 25x - 5x^2$ at x = 1.
Find the equation of the tangent line to the graph of y = f(x) at the point
(1, 20) in the xy-plane.
Solution:
\[ \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \]
=
3.1.3 Techniques of differentiation - rules
Most mathematical functions are readily differentiable without the need to

resort to the first principles definition. It is simply a matter of applying one
or more of the rules of differentiation which are collected in the following
table. It is assumed that c and n are constants.
Description                        Function        Derivative
Constant                           f(x) = c        f′(x) = 0
Power of x                         f(x) = x^n      f′(x) = n x^(n−1)
Multiplication by a constant c     c f(x)          c f′(x)
Sum (or difference) of functions   f(x) ± g(x)     f′(x) ± g′(x)
Product of functions               f(x) g(x)       f(x) g′(x) + g(x) f′(x)
Quotient of functions              f(x) / g(x)     (g(x) f′(x) − f(x) g′(x)) / (g(x))^2
Chain rule for composite functions
If u = g(x) and y = f(u) so that y = f(g(x)) then
\[ \frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx} = f'(u)\, g'(x) \]
Example 3.2. (a) Find the derivative of $f(x) = x^3 + 2x^2 - 5x - 6$ with
respect to x.
(b) Find the derivative of $y = (x^5 + 6x^2 + 2)(x^3 - x + 1)$ with respect to x.
(c) Find the derivative of $f(x) = \dfrac{x^2 + 1}{x^2 - 1}$ with respect to x.
(d) Use the chain rule to find $\dfrac{dy}{dx}$ when
(i) $y = (2x + 3)^5$
(ii) $y = \sqrt{3x^2 + 1}$
3.2 Maximum and minimum of functions
The derivative of a function f (x) tells us important information regarding
the graph of y = f (x).
• If f′(x) > 0 on the interval [a, b] then the function f(x) is increasing
on that interval.
• If f′(x) < 0 on the interval [a, b] then the function f(x) is decreasing
on that interval.
• If f′(x) = 0 on the interval [a, b] then the function f(x) is constant
on that interval.
A function f(x) has a local maximum at x = c if f(x) ≤ f(c)
for values of x in some open interval containing c. (Note an ‘open’
interval is an interval NOT including the end points, i.e. the interval
(a, b) rather than [a, b].)
A function f(x) has a local minimum at x = c if f(x) ≥ f(c) for
values of x in some open interval containing c.
Note that the interval endpoints cannot correspond to a local maximum or
a local minimum. Can you see why this is so?
Example 3.3. Identify the local maxima and minima on the graph of the
function below.
How do we find the local maxima and minima?
Local maxima and local minima occur where the derivative of the function
is zero. They can also occur where the derivative does not exist (consider
the function f(x) = |x| at the point x = 0). For the function f(x) we define
the critical points (the candidates for extrema) as the points x = c such that:
• f′(c) = 0, or
• f′(c) does not exist.
It is important to note that f′(c) = 0 does not imply that the function f(x)
must have a local maximum or minimum at x = c. Consider the function
$f(x) = x^3$ at x = 0 to explore this further. Thus, for a differentiable
function, having f′(c) = 0 is only a necessary requirement, rather than a
sufficient requirement, for the existence of local maxima or minima.
We can also note the following with regards to the graph of f (x).
• At a point on the graph of the function f (x) corresponding to a local

maximum, the function changes from increasing to decreasing.
• At a point on the graph of the function f (x) corresponding to a local

minimum, the function changes from decreasing to increasing.
Note that the tangent line (if it exists) is horizontal at the point x = c
corresponding to either a local maximum or a local minimum.
The First Derivative Test

Using this test is simple. All we do is look at the sign of the derivative at
each side of the critical point c:
• If $f'(x)$ changes from positive to negative (i.e. $f(x)$ changes from increasing to decreasing) then $f(x)$ has a local maximum at $x = c$.
• If $f'(x)$ changes from negative to positive (i.e. $f(x)$ changes from decreasing to increasing) then $f(x)$ has a local minimum at $x = c$.
• If $f'(x)$ does not change sign, then $f(c)$ is neither a maximum nor a minimum value for $f(x)$.
In summary to find the local extrema we
• Find all critical points.

• For each critical point, decide whether it corresponds to a local
maximum or minimum (or neither) using the First Derivative Test.
Example 3.4. Find the local extrema for the function $f(x) = x^3 - 5x^2 - 8x + 7$ over the interval $\mathbb{R}$.
Solution: First we find the critical points by differentiating the function and solving for $x$ when $f'(x) = 0$.
Next we inspect the critical points found above.
Using the First Derivative Test there is a local maximum at $x =$ ____ and a local minimum at $x =$ ____.
The corresponding values of the function at the local extrema are $f($____$) =$ ____ and $f($____$) =$ ____.
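One way to check hand working for this example is a short Python sketch. It assumes the derivative $f'(x) = 3x^2 - 10x - 8$ has been found by hand; the helper names `critical_points` and `classify` are illustrative only and are not part of the notes.

```python
import math

def f(x):
    return x**3 - 5*x**2 - 8*x + 7

def df(x):
    # Derivative worked out by hand: f'(x) = 3x^2 - 10x - 8
    return 3*x**2 - 10*x - 8

def critical_points():
    # Solve 3x^2 - 10x - 8 = 0 with the quadratic formula.
    a, b, c = 3.0, -10.0, -8.0
    root = math.sqrt(b*b - 4*a*c)
    return sorted([(-b - root) / (2*a), (-b + root) / (2*a)])

def classify(c, eps=1e-3):
    # First Derivative Test: compare the sign of f' just left and right of c.
    left, right = df(c - eps), df(c + eps)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither"

results = [(c, classify(c), f(c)) for c in critical_points()]
for c, kind, value in results:
    print(f"x = {c:.4f}: {kind}, f(x) = {value:.4f}")
```

The step size `eps` is an arbitrary small number; it only needs to be smaller than the gap between neighbouring critical points.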
Absolute (Global) Maximum and Minimum
Since we have been talking about local extrema, we must also mention ab-
solute (global) extrema.
• A function f (x) has an absolute minimum at x = c if f (x) ≥ f (c)

for all x in the domain [a, b] for a ≤ c ≤ b.
• A function f (x) has an absolute maximum at x = c if f (x) ≤ f (c)

for all x in the domain [a, b] for a ≤ c ≤ b.
The Extreme Value Theorem states that if a function $f(x)$ is continuous on a closed interval $[a, b]$ then $f(x)$ attains an absolute maximum and an absolute minimum at some points in the interval.
Note that the interval [a, b] must be a closed interval. Why is this necessary?
Example 3.5. What happens with the Extreme Value Theorem if a func-
tion f (x) is not continuous?
To find the absolute extrema of a continuous function on a closed
interval:
1. Find the values of the function at all critical points in the interval.
2. Find the values of the function at the end points of the interval.
3. Compare all of these for maximum / minimum.
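The three steps above can be sketched in Python. The function is the cubic from Example 3.4; the closed interval $[-2, 5]$ is a hypothetical choice made only for illustration, and the critical points are entered by hand from solving $3x^2 - 10x - 8 = 0$.

```python
def f(x):
    return x**3 - 5*x**2 - 8*x + 7

a, b = -2.0, 5.0                  # hypothetical closed interval [a, b]
critical = [-2/3, 4.0]            # roots of f'(x) = 3x^2 - 10x - 8, found by hand

# Steps 1 and 2: evaluate f at critical points inside the interval and at the end points.
candidates = [a, b] + [c for c in critical if a <= c <= b]
values = {x: f(x) for x in candidates}

# Step 3: compare all of the values.
x_max = max(values, key=values.get)
x_min = min(values, key=values.get)
print(f"absolute maximum {values[x_max]:.4f} at x = {x_max:.4f}")
print(f"absolute minimum {values[x_min]:.4f} at x = {x_min:.4f}")
```

On a different interval the end points may win, which is why step 2 cannot be skipped.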
To find the absolute extrema of a continuous function over an open

interval (including R):
1. Find the values of the function at all critical points in the interval.
2. Find the limit of the function as x approaches the endpoints of
the interval (or ±∞).
3. Compare all of these for maximum / minimum.
Example 3.6. Find the absolute extrema for the function $f(x) = e^{-x^2}$.
3.3 Differentiating inverse, circular and exponential functions
3.3.1 Inverse functions and their derivatives
The inverse of a function $f$ is the function that reverses the operation done by $f$. The inverse function is denoted by $f^{-1}$. It satisfies the relation
$y = f(x) \Leftrightarrow x = f^{-1}(y)$.
Here $\Leftrightarrow$ means ‘implies in both directions’. Since $x$ is normally chosen as the independent variable of a function and as $x$ is always plotted on the horizontal axis in the xy-plane, the graph of the inverse function of $y = f(x)$ is defined by the relation
$y = f^{-1}(x) \Leftrightarrow x = f(y)$.
In practice, to obtain the inverse function $f^{-1}$ of a given function $y = f(x)$, we
• solve this equation to obtain $x$ in terms of $y$
• interchange the labels $x$ and $y$ to give $y = f^{-1}(x)$
Note that $f(f^{-1}(x)) = x = f^{-1}(f(x))$.
It is possible to plot the graphs of $y = f(x)$ and $y = f^{-1}(x)$ on the same diagram. In this case, the graph of $y = f^{-1}(x)$ is the mirror image of the graph of $y = f(x)$ in the line $y = x$.
Example 3.7. Find the inverse function of $y = f(x) = \frac{1}{5}(4x - 3)$. A sketch of $f(x)$ is given below. Note that $y = x$ is the thin line shown in the diagram. Sketch $f^{-1}(x)$ on the same axes.
3.3.2 Exponential and logarithmic functions: $e^x$ and $\ln x$
A very important example of a function and its inverse are the exponential function $y = e^x$ and the natural logarithm function $y = \ln x$. From the definition of the inverse function we have
$y = f(x) = e^x \Leftrightarrow x = \ln y$
Now re-labelling $x$ and $y$, we obtain $f^{-1}(x) = \ln x$ as the inverse function of $f(x) = e^x$. We note from the definition $y = e^x \Leftrightarrow x = \ln y$, that
• $\ln(e^x) = \ln(y) = x$
• $e^{\ln y} = e^x = y$
This explicitly demonstrates the inverse behaviour of $e^x$ and $\ln x$.
As an illustration, since $e^0 = 1$, we have $\ln 1 = \ln e^0 = 0$. This means that as the point $(0, 1)$ lies on the graph of $y = e^x$, the point $(1, 0)$ must lie on the graph of $y = \ln x$. This feature is seen in the graphs of $y = e^x$ and $y = \ln x$ given below.
In general, for any function $f(x)$, since $b = f(a) \Leftrightarrow a = f^{-1}(b)$, it follows that if the point $(a, b)$ lies on the graph of $y = f(x)$, then $(b, a)$ is a point on the graph of $y = f^{-1}(x)$.
Sometimes we will need to restrict the domain of a function in order to find
its inverse.
Example 3.8. Find the inverse function $f^{-1}(x)$ of $f(x) = x^2$ by first restricting the domain to $[0, \infty)$.
Derivative rule for inverse functions
If $y = f^{-1}(x) \Leftrightarrow x = f(y)$, then $\dfrac{dy}{dx} = \dfrac{1}{dx/dy} = \dfrac{1}{f'(y)}$.
Example 3.9. Find the derivative of the function $f(x)$ and its inverse $f^{-1}(x)$ for $f(x) = \frac{1}{5}(4x - 3)$. Check that the answers satisfy the derivative rule for inverse functions.
Example 3.10. Show that $\dfrac{d}{dx}(\ln x) = \dfrac{1}{x}$ given that $e^x$ is the inverse function of $\ln x$ and $\dfrac{d}{dx}(e^x) = e^x$.
Solution: Put $y = \ln x$. This implies $x = e^y$.
Therefore $\dfrac{d}{dy}(x) = \dfrac{d}{dy}(e^y) = e^y$ and hence $\dfrac{dy}{dx} = \dfrac{1}{dx/dy} = \dfrac{1}{e^y} = \dfrac{1}{x}$.
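The result can be checked numerically. The sketch below compares the value given by the derivative rule for inverse functions with a central-difference estimate of the slope of $\ln x$; the sample point $x = 2.5$ and the step size are arbitrary choices, and `central_difference` is a helper written for this illustration.

```python
import math

def central_difference(g, x, h=1e-6):
    # Numerical estimate of g'(x) using points on either side of x.
    return (g(x + h) - g(x - h)) / (2 * h)

x = 2.5                       # arbitrary sample point with x > 0
y = math.log(x)               # y = ln x, so x = e^y
rule = 1.0 / math.exp(y)      # 1 / (dx/dy), where dx/dy = e^y
numeric = central_difference(math.log, x)

print(f"rule: {rule:.8f}  numeric: {numeric:.8f}  1/x: {1/x:.8f}")
```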
3.3.3 Derivatives of circular functions
Circular (or trigonometric) functions $\sin x$, $\cos x$, $\tan x$, etc. arise in problems involving functions that are periodic and repetitive, such as those that describe the orbit of a planet about its parent star. Here we acquaint ourselves with the derivatives of such functions.
Example 3.11. Sketch the graphs of $f(x) = \sin x$ and $g(x) = \cos x$ on the same diagram for the interval $0 \le x \le 3\pi$. Use the tangent line method to estimate the values of $f'(x)$ at the points $x = 0, \frac{\pi}{2}, \pi, \frac{3\pi}{2}, 2\pi, \ldots$ on the same diagram. Do these values seem to match the curve $g(x)$?
Before examining the derivatives of circular functions in more detail, we consider two basic inverse circular functions $\sin^{-1} x$ and $\tan^{-1} x$.
Note that the alternative notation for inverse circular functions is: $\sin^{-1} x = \arcsin x$, $\cos^{-1} x = \arccos x$ and $\tan^{-1} x = \arctan x$.
Example 3.12. The graphs of $y = \sin x$ and $y = \tan x$ are shown below by the heavy curves for the restricted domains $[-\frac{\pi}{2}, \frac{\pi}{2}]$, i.e. $-\frac{\pi}{2} \le x \le \frac{\pi}{2}$, and $(-\frac{\pi}{2}, \frac{\pi}{2})$, i.e. $-\frac{\pi}{2} < x < \frac{\pi}{2}$, respectively.
Sketch the inverse functions $\sin^{-1} x$ and $\tan^{-1} x$ using mirror reflection across the line $y = x$.
The domain of $\sin^{-1} x$ is
The range of $\sin^{-1} x$ is
The domain of $\tan^{-1} x$ is
The range of $\tan^{-1} x$ is
[Figure: graphs of $y = \sin x$ and $y = \tan x$ on the restricted domains]
Note: The reason for the use of a restricted domain in specifying inverse circular functions is that if all of the graph of $y = \sin x$ or $y = \tan x$ were naively reflected across the line $y = x$, there would be more than one choice (in fact there would be an infinite number of choices) for the ordinate ($y$) value of the inverse function. We saw this at work in a previous example. A function may have only a single value for each $x$ in its domain. We also note that $\tan\frac{\pi}{2}$ and $\tan(-\frac{\pi}{2})$ are not defined (the function tends to $\pm\infty$ there).
The values of the derivatives of the six basic circular (i.e. trigonometric) functions are shown in the table below, along with the derivatives of the three main inverse circular functions $\sin^{-1} x$, $\cos^{-1} x$ and $\tan^{-1} x$. Also listed are the derivatives of the basic exponential and logarithm functions.
Table of the derivatives of the basic functions of calculus
Original function $f$                                              Derivative function $f'$
$\sin x$                                                           $\cos x$
$\cos x$                                                           $-\sin x$
$\tan x$                                                           $\sec^2 x \equiv 1 + \tan^2 x$
$\operatorname{cosec} x \equiv 1/\sin x$                           $-\operatorname{cosec} x \cdot \cot x$
$\sec x \equiv 1/\cos x$                                           $\sec x \cdot \tan x$
$\cot x \equiv 1/\tan x$                                           $-\operatorname{cosec}^2 x$
$\sin^{-1} x$, domain: $-1 \le x \le 1$ (i.e. $|x| \le 1$)         $\dfrac{1}{\sqrt{1 - x^2}}$
$\cos^{-1} x$, domain: $-1 \le x \le 1$ (i.e. $|x| \le 1$)         $-\dfrac{1}{\sqrt{1 - x^2}}$
$\tan^{-1} x$, domain: $-\infty < x < \infty$                      $\dfrac{1}{1 + x^2}$
$e^x$                                                              $e^x$
$\ln x$, domain: $x > 0$                                           $\dfrac{1}{x}$
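As a rough sanity check on the table, each entry can be compared with a numerical slope estimate. The sketch below does this at a single arbitrary point $x_0 = 0.4$ (chosen to lie inside every domain listed); `central_difference` and the dictionary `table` are names made up for this illustration.

```python
import math

def central_difference(g, x, h=1e-6):
    # Numerical estimate of g'(x) using points on either side of x.
    return (g(x + h) - g(x - h)) / (2 * h)

# Each entry pairs a function with the derivative claimed in the table.
table = {
    "sin x": (math.sin, math.cos),
    "cos x": (math.cos, lambda x: -math.sin(x)),
    "tan x": (math.tan, lambda x: 1 + math.tan(x) ** 2),
    "arcsin x": (math.asin, lambda x: 1 / math.sqrt(1 - x * x)),
    "arccos x": (math.acos, lambda x: -1 / math.sqrt(1 - x * x)),
    "arctan x": (math.atan, lambda x: 1 / (1 + x * x)),
    "e^x": (math.exp, math.exp),
    "ln x": (math.log, lambda x: 1 / x),
}

x0 = 0.4  # arbitrary sample point inside every domain above
errors = {name: abs(central_difference(g, x0) - dg(x0))
          for name, (g, dg) in table.items()}
for name, err in errors.items():
    print(f"{name:10s} difference = {err:.2e}")
```

Agreement at one point is not a proof, but a large difference would flag a mis-copied formula.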
Example 3.13. Find $\dfrac{dy}{dx}$ when $y$ is given by
(a) $\sin(2x + 3)$
(b) $x^2 \cos x$
(c) $x \tan(2x + 1)$
(d) $\tan^{-1}(x^2)$
(e) Prove the differentiation formula $\dfrac{d}{dx}\left(\sin^{-1} x\right) = \dfrac{1}{\sqrt{1 - x^2}}$.
(f) Find $\dfrac{dy}{dx}$ when $y = \arcsin(e^{2x})$
(g) Prove the differentiation formula $\dfrac{d}{dx}\left(\cos^{-1} x\right) = \dfrac{-1}{\sqrt{1 - x^2}}$.
(h) Find $\dfrac{ds}{dt}$ when $s = \ln(\tan(2t))$
(i) Find $\dfrac{dg}{dt}$ when $g = \sqrt{t}\,\sin^{-1}(t^2)$
(j) Find $\dfrac{dg}{dx}$ when $g = \dfrac{5x^3 + 3x}{(x^2 + 3)^2}$
3.4 Higher order derivatives
If $f(x)$ is a differentiable function, then its derivative $f'(x) = \dfrac{df}{dx}$ is also a function and so may have a derivative itself. The derivative of a derivative is called the second derivative and is denoted by $f''(x)$. There are various ways it can be written:
$f''(x) = \dfrac{d}{dx} f'(x) = \dfrac{d}{dx}\left(\dfrac{df}{dx}\right) = \dfrac{d^2 f}{dx^2} = f^{(2)}(x)$
The second derivative $f''(x)$ can be differentiated with respect to $x$ to yield the third derivative $f'''(x) = \dfrac{d^3 f}{dx^3} = f^{(3)}(x)$. And so on!
In general, the nth derivative of $f(x)$ is denoted by $\dfrac{d^n f}{dx^n}$ or $f^{(n)}(x)$.
Interpretation:
Earlier we used the first derivative to find local maxima and local minima. At a critical point $x = c$ where $f'(c) = 0$, we can also use the second derivative to classify it.
• If $f''(c) > 0$ then the function has a local minimum at $c$.
• If $f''(c) < 0$ then the function has a local maximum at $c$.
• If $f''(c) = 0$ then the test is inconclusive and we cannot determine if there is a local maximum or minimum at $c$.
The second derivative $f''(x)$ also measures the rate of change of the first derivative $f'(x)$. As $f'(x)$ is the gradient or slope of the tangent line to the graph of $y = f(x)$ in the xy-plane, we see that:
(i) if $\dfrac{d^2 f}{dx^2} > 0$ then $\dfrac{df}{dx}$ increases with increasing $x$ and the graph of $y = f(x)$ is said to be locally concave up.
(ii) if $\dfrac{d^2 f}{dx^2} < 0$ then $\dfrac{df}{dx}$ decreases with increasing $x$ and the graph of $y = f(x)$ is said to be locally concave down.
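The second derivative test and the concavity rule can be tried on a small example. The sketch below assumes the cubic $f(x) = x^3 - 3x + 1$ with derivatives $f'(x) = 3x^2 - 3$ and $f''(x) = 6x$ worked out by hand; the function names are illustrative only.

```python
def d2f(x):
    # Second derivative of f(x) = x^3 - 3x + 1, worked out by hand.
    return 6 * x

def second_derivative_test(c):
    # Assumes c is already known to satisfy f'(c) = 0.
    s = d2f(c)
    if s > 0:
        return "local minimum"
    if s < 0:
        return "local maximum"
    return "inconclusive"

def concavity(x):
    s = d2f(x)
    if s > 0:
        return "concave up"
    if s < 0:
        return "concave down"
    return "neither"

critical = [-1.0, 1.0]   # roots of f'(x) = 3x^2 - 3
labels = {c: second_derivative_test(c) for c in critical}
print(labels)
print(concavity(-2.0), concavity(2.0))
```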
Example 3.14. If $f(x) = x^3 - 3x + 1$ find the first four derivatives $f'(x)$, $f''(x)$, $f^{(3)}(x)$ and $f^{(4)}(x)$. Determine for what values of $x$ the curve is concave up and concave down. Also locate the turning points $(a, f(a))$ where $f'(a) = 0$. Mark these features on the graph of $y = f(x)$ shown below.
Example 3.15. Find the second derivative of the function $y = e^{-x} \sin 2x$.
Example 3.16. Find the second derivative of the function $y = \dfrac{\ln x}{x}$.
Example application: Find the point on the graph of $y = \sqrt{x}$, $x \ge 0$, closest to $(2, 0)$.
3.5 Parametric curves and differentiation
3.5.1 Parametric curves
The equation that describes a curve C in the Cartesian xy-plane can some-
times be very complicated. In that case it can be easier to introduce an
independent parameter t, so that the coordinates x and y become functions
of t. We explored this when we looked at the vector equations of lines and
planes. That is, $x = f(t)$ and $y = g(t)$. The curve $C$ is parametrically represented by
$C = \{(x, y) : x = f(t),\ y = g(t),\ t_1 \le t \le t_2\}$
As the independent parameter $t$ goes from $t_1$ to $t_2$, the point $P = (x(t), y(t))$ on the curve $C$ moves from $P_1 = (x_1, y_1)$ to $P_2 = (x_2, y_2)$.
Example 3.17. What do the following parametric curves represent?
(a) $C = \{(x, y) : x = 2\cos(t),\ y = 2\sin(t),\ 0 \le t \le 2\pi\}$
(b) $C = \{(x, y) : x = 2t,\ y = 4t^2,\ -\infty < t < \infty\}$
(c) $C = \{(x, y) : x = 5\cos(t),\ y = 2\sin(t),\ 0 \le t \le 2\pi\}$
(d) $C = \{(x, y) : x = t\cos(t),\ y = t\sin(t),\ t > 0\}$
3.5.2 Parametric differentiation

It is now a natural progression to ask what is the value of the slope $\dfrac{dy}{dx}$ at the point $(x(t), y(t))$ on the curve. Provided $f'(t) \ne 0$, this is given by
$\dfrac{dy}{dx} = \dfrac{dy}{dt} \Big/ \dfrac{dx}{dt} = \dfrac{g'(t)}{f'(t)}$ where $f'(t) = \dfrac{df}{dt}$ and $g'(t) = \dfrac{dg}{dt}$.
Example 3.18. Sketch the curve represented parametrically by:
$C = \{(x, y) : x = a\cos t,\ y = a\sin t,\ 0 \le t \le 2\pi\}$.
Find the derivative function $\dfrac{dy}{dx}$. Find the equation of the tangent line to the curve at the point corresponding to $t = \dfrac{\pi}{4}$ and draw this on the sketch for the case $a = 2$.
Solution: Here $f(t) = a\cos t$, $g(t) = a\sin t$. Therefore $x^2 + y^2 = a^2(\cos^2 t + \sin^2 t) = a^2$. The curve $C$ is thus a circle of radius $a$ centred at the origin. As $t$ goes from $0$ to $2\pi$, the circle is described once, in the positive direction, starting at the point $(a, 0)$, which is $(2, 0)$ when $a = 2$. Now, $f'(t) = -a\sin t = -y$, $g'(t) = a\cos t = x$ and so
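$\dfrac{dy}{dx} = \dfrac{dy}{dt} \Big/ \dfrac{dx}{dt} = \dfrac{g'(t)}{f'(t)} = \dfrac{x}{-y} = \dfrac{a\cos t}{-a\sin t} = -\cot t$
At $t = \dfrac{\pi}{4}$, $x = a\cos\dfrac{\pi}{4} = \dfrac{a}{\sqrt{2}}$, $y = a\sin\dfrac{\pi}{4} = \dfrac{a}{\sqrt{2}}$, and so
$\dfrac{dy}{dx} = -\dfrac{a/\sqrt{2}}{a/\sqrt{2}} = -1$.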
Tangent Line: The equation of a straight line of slope m which passes

through the point (x1 , y1 ) is
y − y1 = m(x − x1 )
Now taking a = 2 and collecting values, we have:
$x_1 =$ ____    $y_1 =$ ____    $m =$ ____
and thus the required answer is:
Example 3.19. Consider the curve represented parametrically by:
$C = \{(x, y) : x = 1 + 3t^2,\ y = 1 + 2t^3,\ -\infty < t < \infty\}$.
(i) Find $\dfrac{dy}{dx}$ as a function of $t$
(ii) Evaluate $\dfrac{dy}{dx}$ at $t = 1$ and find the tangent line to the curve at this point.
Example 3.20. Find the equation of the tangent line to
$x = 5\cos(t)$, $y = 2\sin(t)$ for $0 \le t \le 2\pi$ at $t = \dfrac{\pi}{4}$.
3.6 Function approximations
We are now going to take a step in an interesting direction, and look at how
to approximate a function by a number of different methods.
3.6.1 Introduction to power series
A geometric sequence $a_n$ ($n = 0, 1, 2, 3, \ldots$) is one in which the ratio of successive terms, namely $a_{n+1}/a_n$, is a constant, say $r$. That is, $a_{n+1}/a_n = r$. Thus $a_1 = r a_0$, $a_2 = r a_1 = r^2 a_0$, and so on. We write:
$a_0, a_1, a_2, a_3, \ldots = a_0, r a_0, r^2 a_0, r^3 a_0, \ldots$
A finite geometric series consists of the sum $S_n$ of the first $n$ terms of the geometric sequence. Setting the initial term $a_0 = a$, a constant, we have:
$S_n = a + ar + ar^2 + ar^3 + \cdots + ar^{n-1}$ (i)
We can easily find $S_n$. To do this we multiply both sides of Equation (i) by $r$ to obtain:
$r S_n = ar + ar^2 + ar^3 + ar^4 + \cdots + ar^n$ (ii)
Subtracting Equation (ii) from (i) then yields:
$(1 - r) S_n = a - ar^n$
Hence as long as $r \ne 1$, the sum of the first $n$ terms of a geometric series is:
$S_n = \sum_{k=0}^{n-1} ar^k = \dfrac{a(1 - r^n)}{1 - r}$ (iii)
For the particular case of $r = 1$, we see from Equation (i) that $S_n = na$.
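Equation (iii) can be compared against a term-by-term sum. In the sketch below the values $a = 3$, $r = 0.5$ and $n = 10$ are hypothetical, picked only to exercise the formula, and the function names are illustrative.

```python
def geometric_sum_direct(a, r, n):
    # Add the first n terms a, ar, ar^2, ..., ar^(n-1) one by one.
    return sum(a * r**k for k in range(n))

def geometric_sum_formula(a, r, n):
    # Closed form from Equation (iii); the case r = 1 is handled separately.
    if r == 1:
        return a * n
    return a * (1 - r**n) / (1 - r)

a, r, n = 3.0, 0.5, 10   # hypothetical values for illustration
direct = geometric_sum_direct(a, r, n)
formula = geometric_sum_formula(a, r, n)
print(direct, formula)
```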
Example 3.21. Find the geometric series of the following sequence
$1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \ldots$
(i) when $n = 3$, i.e. find $S_3$.
(ii) when $n = 5$, i.e. find $S_5$.
(iii) what happens as $n \to \infty$?
Example 3.22. A nervous investor is deciding whether to invest a sum of $P_0$ dollars in a company that is advertising a high fixed interest rate of $I\%$ for the next $N$ years. To allay all fear it is agreed that at the end of each year, the investor can withdraw all principal less the interest earned in that year. That interest is then used as principal for the next year’s investment. The company is to pocket the interest on the last year of the plan as a penalty. What is the total value $P_N$ of the investor's asset at the end of the final year? Calculate $P_N$ for the case where $P_0 = 100000$, $I = 25$ and $N = 10$.
Solution: Write $i = I/100$ for the interest rate as a decimal. At the end of the first year, the value of the investment is
$P_1 = P_0 + iP_0$.
Year no.    Investment value at start of year    Investment value at end of year    Amount withdrawn
1           $P_0$                                $P_0 + iP_0$                       $P_0$
2           $iP_0$                               $iP_0 + i^2 P_0$                   $iP_0$
3
...         ...                                  ...                                ...
N − 1
N           $i^{N-1} P_0$                        $i^{N-1} P_0 + i^N P_0$
Hence at the end of year $N$ the investor has got back a total sum
$S_N = P_N = P_0 + iP_0 + i^2 P_0 + \cdots + i^{N-1} P_0 =$ ____
Substituting numerical values, the final value of the investment for the nervous investor is 133,333.21 dollars.
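The quoted figure can be reproduced with a few lines of Python. The sketch treats the total as a geometric series with first term $P_0$ and ratio $i = 0.25$ and applies Equation (iii); the variable names are illustrative.

```python
P0 = 100_000.0   # initial principal
i = 0.25         # interest rate of 25% written as a decimal
N = 10           # number of years

# Total withdrawn: P0 + i*P0 + i^2*P0 + ... + i^(N-1)*P0
total_direct = sum(P0 * i**k for k in range(N))
total_formula = P0 * (1 - i**N) / (1 - i)

print(f"direct sum:  {total_direct:,.2f}")
print(f"closed form: {total_formula:,.2f}")
```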
Final return for the bank is:
3.6.2 Power series
We have seen that a finite geometric series of $n$ terms has the sum
$S_n = a + ar + ar^2 + ar^3 + \cdots + ar^{n-1} = \sum_{k=0}^{n-1} ar^k = \dfrac{a(1 - r^n)}{1 - r}$.
Suppose we allow $n$ to become very large. Then provided that $-1 < r < 1$, we have $r^n \to 0$ as $n \to \infty$. For example $\left(\frac{1}{2}\right)^n \to 0$ as $n \to \infty$. Now setting $a = 1$, $r = x$ and taking $n \to \infty$, it follows that
$\dfrac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots + x^{n-1} + \cdots = \sum_{k=0}^{\infty} x^k$. (3.1)
The right hand side of Equation (3.1) is called a power series in the variable $x$, and is represented using a so-called infinite sum (see footnote 1). Here, this power series evaluates to the function $f(x) = \dfrac{1}{1 - x}$.
Example 3.23. Two trains 200 km apart are moving toward each other.
Each one is going at a constant speed of 50 kilometres per hour. A fly
starting on the front of one of them flies back and forth between them at
a rate of 75 kilometres per hour (fast fly!). The fly does this until the two
trains collide. What is the total distance the fly has flown?
Power series
A general power series in the variable $x$ has the form
$\sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots$ (3.2)
where $a_0, a_1, a_2, \ldots$ are constants. Putting $a_0 = 1$, $a_1 = 1$, $a_2 = 1$, etc., we recover the geometric power series that is defined in Equation (3.1).
Footnote 1: A series is an object which allows us to give rigorous meaning to the concept of ‘infinite sum’. To be precise, for a sequence $b_0, b_1, \ldots$, its series is defined as $\sum_{n=0}^{\infty} b_n = \lim_{n \to \infty} S_n$, where $S_n = \sum_{i=0}^{n} b_i$ is called its partial sum. A power series is just a special case of a series, namely take $b_n = a_n x^n$. In this course we will not cover series in detail, only power series.
A power series is actually a limit, and hence it may not converge for all $x \in (-\infty, \infty)$. However, if $x$ is taken to be sufficiently small, namely $-R < x < R$, the power series will exist as a function of $x$, which we will call $f(x)$. The largest $R$ for which this occurs is called the radius of convergence, and guarantees that the power series $f(x)$ exists for $-R < x < R$. In fact, it may even exist at $x = R$ or $x = -R$. This leads us to the idea of representing continuous functions of $x$ by a power series. That is, we have
$f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n + \cdots$ (3.3)
where the domain of $f(x)$ is either $(-R, R)$, $[-R, R)$, $(-R, R]$ or $[-R, R]$.
Table of Useful Power Series
Power series                                                                                                                      Domain
$\dfrac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots + x^n + \cdots$                                                                      $-1 < x < 1$
$\dfrac{1}{1+x} = 1 - x + x^2 - x^3 + \cdots + (-1)^n x^n + \cdots$                                                               $-1 < x < 1$
$e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots + \frac{1}{n!}x^n + \cdots$                                             $-\infty < x < \infty$
$\ln(1+x) \equiv \log_e(1+x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \cdots + (-1)^n \dfrac{x^{n+1}}{n+1} + \cdots$    $-1 < x \le 1$
$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \frac{1}{7!}x^7 + \cdots + (-1)^n \dfrac{x^{2n+1}}{(2n+1)!} + \cdots$           $-\infty < x < \infty$
$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \frac{1}{6!}x^6 + \cdots + (-1)^n \dfrac{x^{2n}}{(2n)!} + \cdots$               $-\infty < x < \infty$
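To get a feel for the table, the partial sums of a few of these series can be compared with Python's built-in functions. This is only a sketch: the function names, the sample points and the numbers of terms are arbitrary choices. The last line shows what happens outside the stated domain of the geometric series.

```python
import math

def exp_partial(x, n):
    # 1 + x + x^2/2! + ... + x^n/n!
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def sin_partial(x, n):
    # x - x^3/3! + ... + (-1)^n x^(2n+1)/(2n+1)!
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1) for k in range(n + 1))

def log1p_partial(x, n):
    # x - x^2/2 + ... + (-1)^n x^(n+1)/(n+1)
    return sum((-1)**k * x**(k + 1) / (k + 1) for k in range(n + 1))

def geometric_partial(x, n):
    # 1 + x + x^2 + ... + x^n
    return sum(x**k for k in range(n + 1))

print(exp_partial(1.0, 15), math.e)
print(sin_partial(0.5, 5), math.sin(0.5))
print(log1p_partial(0.5, 40), math.log(1.5))
# Outside -1 < x < 1 the geometric partial sums grow without bound
# instead of approaching 1/(1 - x).
print(geometric_partial(2.0, 20), 1 / (1 - 2.0))
```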
3.6.3 Taylor series
Taylor polynomials and linear approximation

We have just learnt that power series define functions. What if we ask the reverse question: can a given function be expressed as a power series? That is, rather than defining a function through a power series, what if we already have some function $f(x)$ that we want to express as a power series? For many functions the answer is yes, and such a power series representation of $f(x)$ is called its Taylor series.
Let’s say we have a function $f(x)$ and let’s assume it can be represented as a power series like Equation (3.3). Suppose we truncate the power series defined in Equation (3.3) after the first $(n + 1)$ terms. That is, let us stop the series at the term $a_n x^n$. We then obtain an nth degree polynomial in $x$. This polynomial, denoted by $T_n(x)$, is called the Taylor polynomial of degree $n$ for the function $f(x)$ centred at $x = 0$. We have
$T_n(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n$
Since $T_n(x)$ is a finite polynomial, its domain includes all $x$, that is, $-\infty < x < \infty$. The Taylor series of $f$ is then given by $\lim_{n \to \infty} T_n(x) = f(x)$, for those $x$ where the limit exists. In other words, a function’s Taylor series is precisely its power series representation. Please take a moment to understand the distinction between a power series and a Taylor series!
Example 3.24. Use the table of basic power series to find $T_0(x)$, $T_1(x)$, $T_2(x)$, $T_3(x)$ for $e^x$.
Solution:
$T_0(x) = 1$
$T_1(x) = 1 + x$
$T_2(x) = 1 + x + \frac{1}{2}x^2$
$T_3(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3$
The first of these Taylor polynomials, namely $T_0(x) = 1$, simply matches the height of the graph of $y = f(x)$ at $x = 0$. It is sometimes called the zeroth approximation to $y = f(x)$ at $x = 0$. It can also be called the zeroth Taylor polynomial for $f$ at $a = 0$.
The next Taylor polynomial, namely $T_1(x) = 1 + x$, is called the linear approximation to $f(x)$ at $a = 0$. The equation $y = T_1(x)$ is the equation of the tangent line to the graph $y = f(x)$ at $x = 0$.
The diagram below shows the graph of $y = e^x$ (thick curve) on the domain $-1.5 \le x \le 1.5$, along with the graphs of $y = T_0(x) = 1$ and the linear approximation function $y = T_1(x) = 1 + x$.
Example 3.25. Use the power series table to find:
(i) $T_0(x)$, $T_1(x)$, $T_2(x)$, $T_3(x)$, for $f(x) = \ln(1 + x)$
(ii) $T_1(x)$, $T_3(x)$, $T_5(x)$, for $f(x) = \sin x$
(iii) $T_1(x)$, $T_3(x)$, $T_5(x)$, for $f(x) = \sin 2x$
(iv) $T_0(x)$, $T_2(x)$, $T_6(x)$, for $f(x) = \cos 3x$
Example 3.26. For parts (i) and (ii) of the example above, draw sketches of the graphs of $y = f(x)$, $T_0(x)$ and $T_1(x)$.
Linear Approximation
The Taylor polynomial of degree one ($y = T_1(x)$) is the linear approximation to $y = f(x)$ centred at $x = 0$. Clearly $T_1(x) = f(0) + f'(0)x$.
Example 3.27. (i) Find the linear approximation to
$y = f(x) = \dfrac{e^{3x}}{2 + x}$
centred at $x = 0$.
Solution:
$f(x) = \dfrac{e^{3x}}{2 + x}$,    $f(0) =$ ____
$f'(x) =$ ____,    $f'(0) =$ ____
Hence $T_1(x) =$ ____
(ii) Use the linear approximation to $f(x)$ to estimate the value of $f(0.1)$.
3.6.4 Derivation of Taylor polynomials from first principles
Suppose we do not know the Taylor series for a given function $f(x)$ but wish to derive the first few Taylor polynomial approximations to $f(x)$ near $x = 0$. How do we find $T_0(x)$, $T_1(x)$, $T_2(x)$, $\ldots$, $T_n(x)$? Determining these objects requires finding the numbers $a_0, a_1, a_2, \ldots$. In order to do this we need to know not only the value of $f(x)$ at $x = 0$, namely $f(0)$, but also the values of the first $n$ derivatives of $f(x)$ at $x = 0$. That is, we need to be given the values of $f(0)$, $f'(0)$, $f''(0) \equiv f^{(2)}(0)$, $f^{(3)}(0)$, $\ldots$, $f^{(n)}(0)$. We will build each of the $T_i(x)$, $0 \le i \le n$, so that its value and its first $i$ derivatives at $x = 0$ match those of $f$, up to and including $f^{(i)}(0)$.
Solution:
We write
$T_n(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n$ (3.4)
where $a_0, a_1, a_2, \ldots, a_n$ are undetermined constants.
To find the constant $a_0$:
Put $x = 0$ in Equation (3.4). Therefore $T_n(0) = a_0 + 0 + 0 + 0 + \cdots + 0 = a_0$. We next insist that $T_n(0) = f(0)$. Hence $a_0 = f(0)$.
Differentiate Equation (3.4) with respect to $x$ and then set $T_n^{(1)}(0) = f^{(1)}(0)$.
$T_n^{(1)}(x) = \dfrac{dT_n}{dx} = 0 + a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1}$ (3.5)
Therefore $T_n^{(1)}(0) = 0 + a_1 + 2a_2 \cdot 0 + 3a_3 \cdot 0^2 + \cdots + n a_n \cdot 0^{n-1} = a_1$. By insisting that $T_n^{(1)}(0) = f^{(1)}(0)$ we get $a_1 = f^{(1)}(0)$.
Differentiate Equation (3.5) with respect to $x$ and then set $T_n^{(2)}(0) = f^{(2)}(0)$.
$T_n^{(2)}(x) = \dfrac{d^2 T_n}{dx^2} = 0 + 2 \times 1\, a_2 x^0 + 3 \times 2\, a_3 x^1 + \cdots + n(n-1) a_n x^{n-2}$ (3.6)
Therefore $T_n^{(2)}(0) = 2 a_2 + 3 \times 2\, a_3 \cdot 0 + \cdots + n(n-1) a_n \cdot 0^{n-2} = 2a_2$. By insisting that $T_n^{(2)}(0) = f^{(2)}(0)$ we get $a_2 = \dfrac{1}{2!} f^{(2)}(0)$.
Differentiate Equation (3.6) with respect to $x$ and then set $T_n^{(3)}(0) = f^{(3)}(0)$.
$T_n^{(3)}(x) = \dfrac{d^3 T_n}{dx^3} = 0 + 3 \times 2 \times 1\, a_3 x^0 + \cdots + n(n-1)(n-2) a_n x^{n-3}$ (3.7)
Therefore $T_n^{(3)}(0) = 3 \times 2 \times 1\, a_3 + \cdots + n(n-1)(n-2) a_n \cdot 0^{n-3} = 3 \times 2 \times 1\, a_3$. By insisting that $T_n^{(3)}(0) = f^{(3)}(0)$ we get $a_3 = \dfrac{1}{3!} f^{(3)}(0)$.
Repeating this process $n$ times, we find $a_n = \dfrac{1}{n!} f^{(n)}(0)$. Lastly, we substitute these $a_i$ values into Equation (3.4), to obtain the Taylor polynomial of degree $n$ centred at $x = 0$:
$T_n(x) = \sum_{k=0}^{n} \dfrac{f^{(k)}(0)}{k!} x^k = f(0) + \dfrac{f'(0)}{1!} x + \dfrac{f''(0)}{2!} x^2 + \dfrac{f^{(3)}(0)}{3!} x^3 + \cdots + \dfrac{f^{(n)}(0)}{n!} x^n$.
Knowing that $f(x) = \lim_{n \to \infty} T_n(x)$, this gives the Taylor series of $f(x)$ centred at $x = 0$ as
$f(x) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(0)}{n!} x^n = f(0) + \dfrac{f'(0)}{1!} x + \dfrac{f''(0)}{2!} x^2 + \dfrac{f^{(3)}(0)}{3!} x^3 + \cdots$.
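The formula $a_k = f^{(k)}(0)/k!$ translates directly into code. The sketch below builds $T_n(x)$ from a list of derivative values at $x = 0$, which are assumed to be supplied by hand; the name `taylor_polynomial` is illustrative. It is tried on $e^x$, where every derivative at $0$ equals $1$.

```python
import math

def taylor_polynomial(derivs_at_0):
    # derivs_at_0[k] is assumed to hold f^(k)(0), supplied by the user.
    coeffs = [d / math.factorial(k) for k, d in enumerate(derivs_at_0)]

    def T(x):
        # Evaluate a_0 + a_1 x + ... + a_n x^n.
        return sum(a * x**k for k, a in enumerate(coeffs))

    return T

# For f(x) = e^x every derivative at 0 equals 1, so this gives T_3.
T3 = taylor_polynomial([1.0, 1.0, 1.0, 1.0])
print(T3(0.1), math.exp(0.1))
```

Passing a longer list of derivative values gives a higher degree polynomial without changing the function.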
Example 3.28. (i) Derive the first four Taylor polynomials $T_0(x)$, $T_1(x)$, $T_2(x)$, $T_3(x)$ for the function $f(x) = \dfrac{1}{1+x}$ centred at $x = 0$.
Function                                Value at $x = 0$
$f(x) = \dfrac{1}{1+x}$                 $f(0) =$ ____
$f'(x) = -\dfrac{1}{(1+x)^2}$           $f'(0) =$ ____
$f^{(2)}(x) =$ ____                     $f^{(2)}(0) =$ ____
$f^{(3)}(x) =$ ____                     $f^{(3)}(0) =$ ____
(ii) Sketch $f(x)$, $T_1(x)$ and $T_2(x)$.
(iii) Deduce the Taylor polynomial of degree three for the function $g(x) = \dfrac{1}{1 + 3x}$ centred at $x = 0$.
Example 3.29. (i) Derive the Taylor polynomials $T_0(x)$, $T_2(x)$, $T_4(x)$ for the function $y = \cos x$ centred at $x = 0$.
(ii) Using Mathematica (or otherwise) plot $y = \cos x$, as well as $T_0(x)$, $T_2(x)$, and $T_4(x)$ for the domain $-\pi \le x \le \pi$.
(iii) Deduce the Taylor polynomial of degree four for $y = \cos 3x$ centred at $x = 0$.
Function                                Value at $x = 0$
$f(x) = \cos x$                         $f(0) = \cos(0) = 1$
$f'(x) =$ ____                          $f'(0) =$ ____
$f^{(2)}(x) =$ ____                     $f^{(2)}(0) =$ ____
$f^{(3)}(x) =$ ____                     $f^{(3)}(0) =$ ____
$f^{(4)}(x) =$ ____                     $f^{(4)}(0) =$ ____
Example 3.30. Is it possible to have two different power series for the one function?
3.6.5 Taylor series centred at $x \ne 0$
In the previous section, we have been considering our Taylor polynomials and Taylor series centred at $x = 0$. This means that the Taylor polynomials $T_1(x), T_2(x), \ldots, T_n(x)$ of a function $f(x)$ will serve as good approximations to $f(x)$ at values near to $x = 0$. For example, the degree two Taylor polynomial $T_2(0.1)$ will serve as a good approximation of $f(0.1)$. On the other hand, $T_2(100)$ will usually serve as a lousy approximation to $f(100)$. Roughly speaking, this is because $T_n(x)$ is built up from information about the function $f(x)$ at $x = 0$ (namely, its derivatives at $x = 0$). So although you can evaluate $T_2(100)$, $T_2(x)$ does not have a good idea about what is going on with $f(x)$ at $x = 100$. To circumvent this issue you could perhaps look at a higher degree Taylor polynomial, maybe $T_{50}(100)$. Depending on the chosen function $f(x)$, this may be a good approximation to $f(100)$. But this requires us to calculate 50 derivatives! What if, instead of centring our Taylor polynomials at $x = 0$, we centre them at some other number, say $x = c$, which is near our evaluation point? This leads us to a general Taylor polynomial centred at $x = c$:
$T_n(x; c) = \sum_{k=0}^{n} \dfrac{f^{(k)}(c)}{k!} (x - c)^k = f(c) + \dfrac{f'(c)}{1!}(x - c) + \dfrac{f''(c)}{2!}(x - c)^2 + \cdots + \dfrac{f^{(n)}(c)}{n!}(x - c)^n$.
Hence, $T_n(x; 0) \equiv T_n(x)$.
Taking the limit as $n \to \infty$ of $T_n(x; c)$, we get the Taylor series of $f$ centred at $x = c$ as
$f(x) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(c)}{n!} (x - c)^n = f(c) + \dfrac{f'(c)}{1!}(x - c) + \dfrac{f''(c)}{2!}(x - c)^2 + \cdots$.
Note that f (x) is given by its Taylor series regardless of the centring value c
(as long as the convergence occurs!). So, it’s not particularly useful to look
at Taylor series centred at values other than x = 0, unless x = 0 does not
allow for convergence. Taylor polynomials centred at x = c on the other
hand are very useful. They allow us to obtain good approximations to f
near x = c, and we usually only require a low order Taylor polynomial (often
2 or 3 suffices!).
Example 3.31. Derive the third degree Taylor polynomial of $f(x) = \cos(x)$, centred at $x = \pi/2$. That is, find $T_3(x; \pi/2)$. Use this to estimate $f(\frac{\pi}{2} + 0.1)$.
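A numerical check of this kind of calculation might look like the sketch below. It assumes the derivative values of $\cos x$ at $c = \pi/2$ have been worked out by hand ($\cos c = 0$, $-\sin c = -1$, $-\cos c = 0$, $\sin c = 1$); the helper name `taylor_polynomial_at` is illustrative.

```python
import math

def taylor_polynomial_at(derivs_at_c, c):
    # derivs_at_c[k] is assumed to hold f^(k)(c), supplied by the user.
    coeffs = [d / math.factorial(k) for k, d in enumerate(derivs_at_c)]

    def T(x):
        # Evaluate the sum of a_k (x - c)^k.
        return sum(a * (x - c)**k for k, a in enumerate(coeffs))

    return T

c = math.pi / 2
# Derivatives of cos x at c = pi/2, worked out by hand: 0, -1, 0, 1.
T3 = taylor_polynomial_at([0.0, -1.0, 0.0, 1.0], c)

estimate = T3(c + 0.1)
print(f"T3 estimate: {estimate:.8f}")
print(f"math.cos:    {math.cos(c + 0.1):.8f}")
```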
3.6.6 Cubic splines interpolation
Power series (and Taylor series) provide us with a method of approximating

values of a function at some particular point. When we construct Taylor
polynomials we get a higher level of accuracy when we use a higher degree
polynomial. Suppose we are asked to evaluate a function at a particular
point x = c (for example finding the zeros of a function, or the intersection
point of two functions) but we cannot use algebraic methods (such as the
quadratic formula) to solve such a problem. What if we are not even given
the function? What can we do? Lucky for us there are methods available
that will give a good approximation of the solution.
Algebraic methods will give us exact solutions, but the algebraic

methods may not be simple to use.
Numerical methods will give us an approximation, but are much

easier to use.
Polynomial interpolation
Suppose we are given a set of data points (xi , f (xi )) (note that we do not
have the function f (x) explicitly given to us) and we want to build a function
that approximates f (x) with as much continuity as we can get. What do
we do? We will introduce a method of interpolation to do this. We are
essentially going to find a curve which best fits the data given. In science and
engineering, numerical methods often involve curve fitting of experimental
data.
Polynomial interpolation is simply a method of estimating the values of a
function, between known data points. Thus linear interpolation would use
two data points, quadratic interpolation would use three data points, and
so on.
As an example, suppose you are asked to evaluate a function f (x) at x = 3.4
but all you are given is the following table of data points.
x 0.000 1.200 2.400 3.600 4.000 6.000 7.000
f (x) 0.000 0.932 0.675 −0.443 −0.757 −0.279 0.657
Since x = 3.4 is not in the table, the best we can do is find an estimate of
f (3.4). We could construct a straight line built on the two points either side
of x = 3.4 namely (2.400, 0.675) and (3.600, −0.443). Or we could build a
quadratic based on any three points (that cover x = 3.4). If we were really
keen we could build a cubic by selecting four points around x = 3.4. All we
are doing is simply using a set of points near the target point (x = 3.4) to
build a polynomial. Then we estimate f (x) by evaluating the polynomial at
the target point. This process is called polynomial interpolation.
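The simplest case, a straight line through the two tabulated points either side of the target, can be sketched in Python using the data from the table above. The function name `linear_interpolate` is illustrative only.

```python
xs = [0.000, 1.200, 2.400, 3.600, 4.000, 6.000, 7.000]
ys = [0.000, 0.932, 0.675, -0.443, -0.757, -0.279, 0.657]

def linear_interpolate(xs, ys, x):
    # Find the pair of neighbouring data points that bracket x,
    # then evaluate the straight line through them.
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the tabulated range")

estimate = linear_interpolate(xs, ys, 3.4)
print(f"linear estimate of f(3.4): {estimate:.4f}")
```

A quadratic or cubic through three or four nearby points would follow the same pattern, with more algebra in the evaluation step.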
This method will give us a unique polynomial for our approximation of the
function f (x). This is fine but we know that our accuracy is likely to decline
as we get further away from our target point. What else can we do?
Piecewise polynomial interpolation: Cubic splines

Instead of trying to find one polynomial to fit our data points, what if we take sections of the data and fit polynomials to each section, ensuring that the overall piecewise function is continuous? We would also like differentiability, but let us not be too picky for now.
The simplest polynomial to use is a linear approximation. This will produce
a path that consists of line segments that pass through the points of the
data set. The resulting linear spline function can be written as a piecewise
function. Unfortunately though, we do not usually have continuity of the
first derivatives at the data set points.
[Figure: linear spline]
This technique can be easily extended to higher order polynomials. If we

take piecewise quadratic polynomials, we can get continuity of the first
derivatives, but the second derivatives will have discontinuities. If we take
piecewise cubic polynomials, then we can make (with some work) both the
first and second derivatives continuous.
[Figure: cubic spline]
Suppose we are given a simple dataset and we are asked to estimate the derivative at, say, $x = 0.35$. How do we proceed? Here is one approach. Construct, by whatever means, a smooth approximation $\tilde{y}(x)$ to $y(x)$ near $x = 0.35$. Then put $y'(0.35) \approx \tilde{y}'(0.35)$.
What we want is a method which
• Produces a unique approximation,

• Is continuous over the domain and
• Has, at least, a continuous first derivative over the domain.
Let us say we have a set of $n + 1$ data points $(x_i, y_i)$, $i = 0, 1, 2, \ldots, n$, and we wish to build an approximation $\tilde{y}(x)$ which has as much continuity as we can get.
Between each pair of points we will construct a cubic. Let $\tilde{y}_i(x)$ be the cubic function for the interval $x_i \le x \le x_{i+1}$, for $i = 0, 1, 2, \ldots, n - 1$. We demand that the following conditions are met
• Interpolation condition
$y_i = \tilde{y}_i(x_i)$ (1)
• Continuity of the function
$\tilde{y}_{i-1}(x_i) = y_i$ (2)
• Continuity of the first derivative
$\tilde{y}'_{i-1}(x_i) = \tilde{y}'_i(x_i)$ (3)
• Continuity of the second derivative
$\tilde{y}''_{i-1}(x_i) = \tilde{y}''_i(x_i)$ (4)
Can we solve this system of equations? We need to balance the number of

unknowns against the number of equations. We have n + 1 data points and
thus $n$ cubics to compute. Each cubic ($f(x) = ax^3 + bx^2 + cx + d$) has 4
coefficients, thus we have 4n unknowns. And how many equations? From
the above we count n equations in each of (1) and (2), and n − 1 equations
in each of (3) and (4). A total of 4n − 2 equations for 4n unknowns. We see
that we will have to provide two extra pieces of information. For now let us
press on, and see what comes up.
We start by putting
$$\tilde{y}_i(x) = y_i + a_i (x - x_i) + b_i (x - x_i)^2 + c_i (x - x_i)^3 \qquad (5)$$
which automatically satisfies equation (1). For the moment suppose we
happen to know all of the second derivatives $\tilde{y}''_i(x)$. We then have
$\tilde{y}''_i(x) = 2b_i + 6c_i (x - x_i)$ and evaluating this at $x = x_i$ leads to
$$b_i = y''_i / 2, \qquad (6)$$
where we have introduced the shorthand notation
$$y''_i = \begin{cases} \tilde{y}''_i(x_i), & i = 0, 1, 2, \ldots, n - 1, \\ \tilde{y}''_{n-1}(x_n), & i = n. \end{cases}$$
Now we turn to equation (4), $y''_{i+1} = y''_i + 6c_i (x_{i+1} - x_i)$, which gives
$$c_i = (y''_{i+1} - y''_i)/(6h_i) \qquad (7)$$
where we have introduced $h_i = x_{i+1} - x_i$. Next we compute the $a_i$ by
applying equation (2),
$$y_{i+1} = y_i + a_i h_i + \frac{1}{6}\,(y''_{i+1} + 2y''_i)\,h_i^2 \qquad (8)$$
and so
$$a_i = \frac{y_{i+1} - y_i}{h_i} - \frac{1}{6}\,h_i\,(y''_{i+1} + 2y''_i). \qquad (9)$$
It appears that we have completely determined each of the cubics, though we
are yet to use (3), continuity in the first derivative. But remember that we
don't yet know the values of $y''_i$. Thus equation (3) will be used to compute
the $y''_i$. Using our values for $a_i$, $b_i$ and $c_i$ we find (after much fiddling) that
equation (3) is
$$6\left(\frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}}\right) = h_i\,y''_{i+1} + 2(h_i + h_{i-1})\,y''_i + h_{i-1}\,y''_{i-1}. \qquad (10)$$
The only unknowns in this equation are the $y''_i$, of which there are $n + 1$.
But there are only $n - 1$ equations. Thus we must supply two extra pieces
of information.
The simplest choice is to set $y''_0 = y''_n = 0$. Then we have a tridiagonal system
of equations [2] to solve for $y''_i$. That's as far as we need push the algebra;
we can simply now use technology (such as Matlab, Mathematica, Wolfram
Alpha...) to solve the tridiagonal system.
The recipe
• Solve equation (10) for $y''_i$,
• Compute all of the $a_i$ from equation (9),
• Compute all of the $b_i$ from equation (6),
• Compute all of the $c_i$ from equation (7) and finally
• Assemble all of the cubics using equation (5).
Our job is done. We have computed the cubic spline for our set of
data points.
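The recipe above can be sketched in a few lines of Python. This is only an illustrative sketch, not the notes' prescribed method: the function names `natural_cubic_spline` and `spline_value` are made up for this example, the end conditions $y''_0 = y''_n = 0$ are assumed, and the tridiagonal system is solved by plain forward elimination and back substitution instead of a package. Exact fractions are used so the output can be compared with hand calculations; the sample data are the four points of Example 3.32.

```python
from fractions import Fraction


def natural_cubic_spline(xs, ys):
    """Return (ypp, a, b, c): the second derivatives y''_i and the
    coefficients of equation (5), assuming y''_0 = y''_n = 0."""
    x = [Fraction(v) for v in xs]
    y = [Fraction(v) for v in ys]
    n = len(x) - 1                                   # number of cubics
    h = [x[i + 1] - x[i] for i in range(n)]          # h_i = x_{i+1} - x_i

    # Equation (10) for i = 1, ..., n-1 with the end conditions substituted.
    sub = [h[i - 1] for i in range(1, n)]            # multiplies y''_{i-1}
    diag = [2 * (h[i] + h[i - 1]) for i in range(1, n)]
    sup = [h[i] for i in range(1, n)]                # multiplies y''_{i+1}
    rhs = [6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
           for i in range(1, n)]

    # Forward elimination, then back substitution, on the tridiagonal system.
    m = n - 1
    for k in range(1, m):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    inner = [Fraction(0)] * m
    for k in range(m - 1, -1, -1):
        above = sup[k] * inner[k + 1] if k + 1 < m else 0
        inner[k] = (rhs[k] - above) / diag[k]
    ypp = [Fraction(0)] + inner + [Fraction(0)]

    a = [(y[i + 1] - y[i]) / h[i] - h[i] * (ypp[i + 1] + 2 * ypp[i]) / 6
         for i in range(n)]                          # equation (9)
    b = [ypp[i] / 2 for i in range(n)]               # equation (6)
    c = [(ypp[i + 1] - ypp[i]) / (6 * h[i]) for i in range(n)]  # equation (7)
    return ypp, a, b, c


def spline_value(xs, ys, coeffs, t):
    """Evaluate equation (5) at t, which must satisfy xs[0] <= t <= xs[-1]."""
    _, a, b, c = coeffs
    i = max(k for k in range(len(xs) - 1) if xs[k] <= t)
    d = t - xs[i]
    return ys[i] + a[i] * d + b[i] * d ** 2 + c[i] * d ** 3


xs, ys = [-2, -1, 1, 3], [3, 0, 2, 1]
coeffs = natural_cubic_spline(xs, ys)
ypp, a, b, c = coeffs
print(ypp, a, b, c)
```

Because the arithmetic is exact, the printed fractions can be checked line by line against a hand computation for the same data.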
Example 3.32. Let us say we are given the set of data points in the fol-
lowing table. Find the cubic spline that best fits this data.
x       −2   −1    1    3
f(x)     3    0    2    1

[2] Often a system of equations will give a coefficient matrix of a special
structure. A tridiagonal system of equations is one such that the coefficient
matrix has zero entries everywhere except for in the main diagonal and in the
diagonals above and below the main diagonal.

We are going to use equation (5) to give three cubics, $\tilde{y}_0(x)$, $\tilde{y}_1(x)$ and $\tilde{y}_2(x)$.
Recall
$$\tilde{y}_i(x) = y_i + a_i (x - x_i) + b_i (x - x_i)^2 + c_i (x - x_i)^3 \qquad (5)$$
From the data points we have $x_0 = -2$, $x_1 = -1$, $x_2 = 1$, $x_3 = 3$, $y_0 = 3$,
$y_1 = 0$, $y_2 = 2$ and $y_3 = 1$.
We also know that $y''_0 = y''_3 = 0$.
Putting this information into equation (10) we obtain the following two
equations
When $i = 1$: $\quad 24 = 6y''_1 + 2y''_2$
When $i = 2$: $\quad -9 = 2y''_1 + 8y''_2$
Solving this system of equations we find that $y''_1 = \frac{105}{22}$ and $y''_2 = -\frac{51}{22}$.
The use of equation (9) will give us $a_0 = -\frac{167}{44}$, $a_1 = -\frac{31}{22}$ and $a_2 = \frac{23}{22}$.
Equation (6) gives $b_0 = 0$, $b_1 = \frac{105}{44}$ and $b_2 = -\frac{51}{44}$.
Equation (7) gives $c_0 = \frac{35}{44}$, $c_1 = -\frac{13}{22}$ and $c_2 = \frac{17}{88}$.
Now using equation (5) we produce the following three cubic polynomials,
which determine the cubic spline:
$$\tilde{y}_0(x) = 3 - \frac{167}{44}(x + 2) + \frac{35}{44}(x + 2)^3 \quad \text{for } -2 \le x < -1$$
$$\tilde{y}_1(x) = -\frac{31}{22}(x + 1) + \frac{105}{44}(x + 1)^2 - \frac{13}{22}(x + 1)^3 \quad \text{for } -1 \le x < 1$$
$$\tilde{y}_2(x) = 2 + \frac{23}{22}(x - 1) - \frac{51}{44}(x - 1)^2 + \frac{17}{88}(x - 1)^3 \quad \text{for } 1 \le x \le 3$$
Example 3.33. For the above example, check that the four conditions (1,
2, 3 and 4) are met.
Example 3.34. Compute the cubic spline that passes through the following
data set points
x 0 1 2 3
f (x) 0 0.5 2 1.5
Chapter 4
Integration
4.1 Fundamental theorem of calculus
4.1.1 Revision
Computing the indefinite integral $I = \int f(x)\,dx$ is no different from finding
a function $F(x)$ such that $\frac{dF}{dx} = f(x)$. Thus $\int \frac{dF}{dx}\,dx = F(x)$. The
function $F(x)$ is called an anti-derivative of $f(x)$.
You should recall some of the basic integrals.
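$$\int k\,dx = kx + C, \quad \text{where } C \in \mathbb{R}$$
$$\int x^n\,dx = \frac{1}{n+1}\,x^{n+1} + C, \quad n \ne -1$$
$$\int \sin(x)\,dx = -\cos(x) + C$$
$$\int \cos(x)\,dx = \sin(x) + C$$
$$\int e^x\,dx = e^x + C$$
$$\int \frac{1}{x}\,dx = \ln|x| + C$$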
Recall also the properties of indefinite integrals:
$$\int \big(f(x) + g(x)\big)\,dx = \int f(x)\,dx + \int g(x)\,dx$$
$$\int \big(f(x) - g(x)\big)\,dx = \int f(x)\,dx - \int g(x)\,dx$$
$$\int k f(x)\,dx = k \int f(x)\,dx \quad \text{for any constant } k$$
There are also a few tricks we can use to find F (x), such as integration by
substitution and integration by parts.
Integration by substitution
If $I = \int f(x)\,dx$ looks nasty, try changing the variable of integration. That
is, put $u = u(x)$ for some chosen function $u(x)$, then invert the function to
find $x = x(u)$ and substitute into the integral.
$$I = \int f(x)\,dx = \int f(x(u))\,\frac{dx}{du}\,du$$
If we have chosen well, then this second integral will be easy to do.
Example 4.1. Find $\int 4x\cos(x^2 + 5)\,dx$.

Integration by parts
This is a very powerful technique based on the product rule for derivatives.
Recall that
$$\frac{d(fg)}{dx} = g\,\frac{df}{dx} + f\,\frac{dg}{dx}$$
Now integrate both sides
$$\int \frac{d(fg)}{dx}\,dx = \int g\,\frac{df}{dx}\,dx + \int f\,\frac{dg}{dx}\,dx$$
But integration is the inverse of differentiation, thus we have
$$fg = \int g\,\frac{df}{dx}\,dx + \int f\,\frac{dg}{dx}\,dx$$
which we can re-arrange to
$$\int f\,\frac{dg}{dx}\,dx = fg - \int g\,\frac{df}{dx}\,dx$$
Thus we have converted one integral into another. The hope is that the
second integral is easier than the first. This will depend on the choices we
make for $f$ and $\frac{dg}{dx}$.
Example 4.2. Find $\int x e^x\,dx$.
Solution: We have to split the integrand $x e^x$ into two pieces, $f$ and $\frac{dg}{dx}$.
If we choose $f(x) = x$ and $\frac{dg}{dx} = e^x$ then $\frac{df}{dx} = 1$ and $g(x) = e^x$.
Then
$$\int x e^x\,dx = fg - \int g\,\frac{df}{dx}\,dx$$
$$= x e^x - \int 1 \cdot e^x\,dx$$
$$= x e^x - e^x + C$$
Example 4.3. Find $\int x\cos(x)\,dx$.
Solution: Choose $f(x) = x$ and $\frac{dg}{dx} = \cos(x)$ then $\frac{df}{dx} = 1$ and $g(x) = \sin(x)$.
Then
$$\int x\cos(x)\,dx = fg - \int g\,\frac{df}{dx}\,dx$$
$$= x\sin(x) - \int 1 \cdot \sin(x)\,dx$$
$$= x\sin(x) + \cos(x) + C$$
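A quick way to check an anti-derivative found by parts is to differentiate it and compare with the integrand. The sketch below does this numerically for the results of Examples 4.2 and 4.3. The helper name `derivative`, the central-difference formula and the sample points are choices made for this illustration only; they are not part of the method above.

```python
import math


def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x); h is an assumed step size."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Pairs (anti-derivative with C = 0, integrand) from Examples 4.2 and 4.3.
candidates = [
    (lambda x: x * math.exp(x) - math.exp(x), lambda x: x * math.exp(x)),
    (lambda x: x * math.sin(x) + math.cos(x), lambda x: x * math.cos(x)),
]

# Largest mismatch between F'(x) and f(x) over a few sample points.
max_error = max(abs(derivative(F, x) - f(x))
                for F, f in candidates
                for x in (-1.0, 0.3, 2.0))
print(max_error)
```

A very small `max_error` supports the hand calculation; it does not prove it, since only a handful of points are tested.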
Example 4.4. Find $\int x\sin(x)\,dx$.
4.1.2 Fundamental Theorem of Calculus
The Fundamental Theorem of Calculus states that:

If $f(x)$ is a continuous function on the interval $[a, b]$ and there is a function
$F(x)$ such that $F'(x) = f(x)$, then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
Note that $\int_a^b f(x)\,dx$ is known as the definite integral from $a$ to $b$ as we
are integrating the function $f(x)$ between the values $x = a$ and $x = b$.
Can we interpret this theorem in some physical way? Of course! Let s(t) be
a continuous function which gives the position of a moving object at time
t where t is in the interval $[a, b]$. We know that $s'(t)$ gives the velocity of
the object at time t, and we want to know what is the meaning of $\int_a^b s'(t)\,dt$.
Recall that distance = velocity × time. Thus for any small interval $\Delta t$ in
$[a, b]$ we have $s'(t) \times \Delta t \approx$ distance travelled in $\Delta t$. Adding each successive
calculation of the distance travelled for the small intervals of time $\Delta t$ from
$t = a$ to $t = b$ will give us (approximately) the total distance travelled
over the interval $[a, b]$.
Integrating the velocity function $s'(t)$ over the interval $[a, b]$ will then give us
the total distance travelled over the interval $[a, b]$. Thus the definite
integral of a velocity function can be interpreted as the total distance travelled
in the interval $[a, b]$. (Strictly, the integral gives the net change in position
$s(b) - s(a)$; this equals the total distance travelled when the velocity does
not change sign on $[a, b]$.)
The integral of the rate of change of any quantity gives the total
change in that quantity.
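The "add up velocity × Δt" argument can be tried numerically. In the sketch below the position function $s(t) = t^3 - 2t$ on $[0, 2]$ is a hypothetical choice made only for illustration, as are the function names and the number of sub-intervals. The sum of $s'(t)\,\Delta t$ over many small steps should come out close to $s(b) - s(a)$.

```python
def position(t):
    """Hypothetical position function s(t) = t^3 - 2t (an assumption)."""
    return t ** 3 - 2 * t


def velocity(t):
    """Its derivative s'(t) = 3t^2 - 2."""
    return 3 * t ** 2 - 2


a, b = 0.0, 2.0
steps = 100_000                     # number of small intervals (assumed)
dt = (b - a) / steps

# Sum of s'(t) * dt using the left end of each small interval.
accumulated = sum(velocity(a + k * dt) * dt for k in range(steps))
net_change = position(b) - position(a)
print(accumulated, net_change)
```

Here the velocity is negative for small t, so the sum measures the net change in position, in line with the boxed statement about total change.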
4.2 Area under the curve
When $f(x)$ is a positive function and $a < b$ then the definite integral
$$\int_a^b f(x)\,dx$$
gives the area between the graph of the function $f(x)$ and the x-axis. In
other words
$$\int_a^b f(x)\,dx = A$$
Example 4.5. Find the area between the graph of $y = \sin x$ and the x-axis,
between $x = 0$ and $x = \frac{\pi}{2}$.
When $f(x)$ is a negative function and $a < b$ then the definite integral gives
the negative of the area between the graph of the function $f(x)$ and the
x-axis.
$$\int_a^b f(x)\,dx = -A$$
Example 4.6. […] x-axis, between $x = \pi$ and $x = \frac{3\pi}{2}$.
When $f(x)$ is positive for some values of x in the interval $[a, b]$ and negative
for other values in the interval $[a, b]$ then the definite integral gives the sum
of the areas above the x-axis and subtracts the areas below the x-axis.
In other words
$$\int_a^b f(x)\,dx = A - B + C$$
Example 4.7. […] x-axis, between $x = 0$ and $x = \frac{3\pi}{2}$.
Area between two curves. Given two continuous functions $f(x)$ and
$g(x)$ where $f(x) \ge g(x)$ for all x in the interval $[a, b]$, the area of the region
bounded by the curves $y = f(x)$ and $y = g(x)$, and the lines $x = a$ and
$x = b$ is given by the definite integral
$$\int_a^b \big(f(x) - g(x)\big)\,dx$$
This is true regardless of whether the functions are positive, negative, or a
combination of both. Can you see why?
Example 4.8. Find the area between the graphs of $y = \sin x$ and $y = \cos x$
between $x = \frac{\pi}{4}$ and $x = \pi$.
Look carefully at the next example.
Example 4.9. Find the area between the graphs of y = sin x and y = cos x
between x = 0 and x = π.
Example 4.10. Find the area bounded by the graphs of $x = y^2 - 5y$ and
$x = -2y^2 + 4y$.
4.3 Trapezoidal rule
Sometimes it may not be all that simple to integrate a function. (As an
example, try finding the anti-derivative of $e^{-x^2}$.) When we encounter
situations such as this we can again turn to numerical methods of
approximation to help us out, avoiding the need to integrate the function. One
such method is the Trapezoidal rule which (as its name suggests) uses the
area of the trapezium to approximate the area under the graph of a function
$f(x)$. Recall the area of a trapezium is given by
$$A = \frac{1}{2}(m + n)\,w$$
where m and n are lengths of the parallel sides of the trapezium, and w is
the distance between the parallel lengths (i.e. the width).
If the interval is $[a, b]$ then $w = b - a$. If the interval $[a, b]$ is divided into n
equal sub-intervals, then each sub-interval has width $w_i = \frac{b - a}{n} = \Delta n$ and
the successive heights $(m + n)$ of the parallel sides are given by $f(a) + f(a + \Delta n)$;
$f(a + \Delta n) + f(a + 2\Delta n)$; $\ldots$; $f(a + (n - 1)\Delta n) + f(b)$.
The sum of the areas of each of the trapezoids created by each sub-interval
can then be stated as:
$$A = \frac{1}{2}\,\frac{b - a}{n}\Big(f(a) + 2f(a + \Delta n) + \ldots + 2f(a + (n - 1)\Delta n) + f(b)\Big).$$
Altering our notation slightly (writing $x_i = a + i\,\Delta n$) gives
$$A = \sum_{i=0}^{n-1} \frac{b - a}{2n}\Big(f(x_i) + f(x_i + \Delta n)\Big).$$
Note that when $i = 0$, $x_0 = a$ and thus $f(x_0) = f(a)$, and when $i = n - 1$,
$f(x_{n-1} + \Delta n) = f(b)$.
Thus the sum of the areas of each of the trapezoids created by each
sub-interval can be stated as:
$$\int_a^b f(x)\,dx \approx \frac{b - a}{2n}\left(f(a) + f(b) + 2\sum_{i=1}^{n-1} f(x_i)\right)$$
Example 4.11. Use the Trapezoidal rule with $n = 4$ to find an approximate
value of $\int_0^2 2^x\,dx$.
Solution
In the interval $[0, 2]$ when $n = 4$ we have four trapezoids each of width $\frac{1}{2}$.
The endpoints of our interval are $a = 0$ and $b = 2$ thus $f(a) = f(0) = 2^0 = 1$
and $f(b) = f(2) = 2^2 = 4$. Note that $\frac{b - a}{2n} = \frac{2}{2 \times 4} = \frac{1}{4}$. Thus
$$\int_0^2 2^x\,dx \approx \frac{b - a}{2n}\left(f(a) + f(b) + 2\sum_{i=1}^{n-1} f(x_i)\right)$$
$$= \frac{1}{4}\left(1 + 4 + 2\sum_{i=1}^{3} 2^{x_i}\right)$$
$$= \frac{1}{4}\left(1 + 4 + 2\big(2^{1/2} + 2^{1} + 2^{3/2}\big)\right)$$
$$= \frac{1}{4}\left(5 + 2\big(\sqrt{2} + 2 + 2\sqrt{2}\big)\right)$$
$$= \frac{1}{4}\big(9 + 6\sqrt{2}\big)$$
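The final formula of this section translates directly into code. The following is a minimal sketch (the function name `trapezoidal` is chosen here for illustration) applied to the integral of Example 4.11, so that the numerical value can be compared with $\frac{1}{4}(9 + 6\sqrt{2})$. The comparison value $3/\ln 2$ is the exact integral of $2^x$ over $[0, 2]$, included only to show the size of the approximation error.

```python
import math


def trapezoidal(f, a, b, n):
    """Trapezoidal rule with n equal sub-intervals on [a, b]."""
    width = (b - a) / n                               # the width of each strip
    interior = sum(f(a + i * width) for i in range(1, n))
    return (b - a) / (2 * n) * (f(a) + f(b) + 2 * interior)


approx = trapezoidal(lambda x: 2 ** x, 0, 2, 4)
by_hand = (9 + 6 * math.sqrt(2)) / 4                  # result of Example 4.11
exact = 3 / math.log(2)                               # exact value of the integral
print(approx, by_hand, exact)
```

Increasing `n` in the call brings `approx` closer to `exact`, at the cost of more function evaluations.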
Example 4.12. Use the Trapezoidal rule with $n = 5$ to find an approximate
value of
$$\int_0^{\pi} \sqrt{\sin x}\,dx$$
Example 4.13. […] value of
$$\int_0^1 \frac{1}{1 + x^2}\,dx$$
Example 4.14. […] value of
$$\int_0^1 e^{x^2}\,dx$$
Chapter 5
Multivariable Calculus
5.1 Functions of several variables
We are all familiar with simple functions such as $y = x^3$. And we all know
the answers to questions such as
• What is the domain and range of the function?
• What does the function look like as a plot in the xy-plane?
• What is the derivative of the function?
Single variable calculus encompasses functions such as $y = x^3$ where y
is a function of the single (independent) variable x. The graph of $y = f(x)$
is a curve in the xy-plane. In nature, many physical quantities depend
on more than one independent variable. We are now going to explore how
to answer similar questions to the above for functions such as $z = x^3 + y^2$.
This is just one example of what we call functions of several variables.
We can have as many variables as we want; $z = x^3 + y^2$ is a function of
two independent variables (x and y), $w = x^3 + y^2 - z^2$ is a function of three
independent variables (x, y and z), and so forth. Just as we would write
$f(x) = x^3$ we can write $f(x, y) = x^3 + y^2$ and $f(x, y, z) = x^3 + y^2 - z^2$ and
so on. For the remainder of this course we will focus on functions involving
two independent variables, but bear in mind that the lessons learnt here will
be applicable to functions of any number of variables.
5.1.1 Definition
A function f of two (independent) variables (x, y) is a single valued mapping
of a subset of $\mathbb{R}^2$ into a subset of $\mathbb{R}$.
What does this mean? Simply that for any allowed value of x and y we can
compute a single value for $f(x, y)$. In a sense f is a process for converting
pairs of numbers (x and y) into a single number f.
The notation $\mathbb{R}^2$ means all possible choices of x and y such as all points
in the xy-plane. The symbol $\mathbb{R}$ denotes all real numbers (for example all
points on the real line). The use of the word subset in the above definition
is simply to remind us that functions have an allowed domain (i.e. a subset
of $\mathbb{R}^2$) and a corresponding range (i.e. a subset of $\mathbb{R}$).
Notice that we are restricting ourselves to real variables, that is the func-
tion’s value and its arguments (x, y) are all real numbers. This game gets
very exciting and somewhat tricky when we enter the world of complex num-
bers. Such adventures await you in later year mathematics (not surprisingly
this area is known as Complex Analysis).
5.1.2 Notation
Here is a function of two variables
f (x, y) = sin(x + y)
We can choose the domain to be $\mathbb{R}^2$ and then the range will be the closed
set $[-1, +1]$. Another common way of writing all of this is
$$f : (x, y) \in \mathbb{R}^2 \mapsto \sin(x + y) \in [-1, 1]$$
This notation identifies the function as f, the domain as $\mathbb{R}^2$, the range as
$[-1, 1]$ and most importantly the rule that (x, y) is mapped to $\sin(x + y)$.
For this subject we will stick with the former notation.
You should also note that there is nothing sacred about the symbols x, y and
f. We are free to choose whatever symbols take our fancy, for example we
could create the function
w(u, v) = log(u − v)
Example 5.1. What would be a sensible choice of domain for the previous
function?
5.1.3 Surfaces
A very common application of functions of two variables is to describe a

surface in 3-dimensional space. How do we do this? The idea is that we
take the value of the function to describe the height of the surface above
the xy-plane. If we use standard Cartesian coordinates then such a surface
could be described by the equation
z = f (x, y)
This surface has a height z units above each point (x, y) in the xy-plane.
Just as the equation $y = f(x)$ describes the curve in the xy-plane, the
equation $z = f(x, y)$ describes the surface in $\mathbb{R}^3$. Just as the curve $C = f(x)$
is made up of the points (x, y), the surface $S = f(x, y)$ is made up of the
points (x, y, z). As $z = f(x, y)$ describes this surface explicitly as a height
function over a plane, we say that the surface is given in explicit form.
A surface such as z = f (x, y) is also often called the graph of the function
f.
Here are some simple examples. A very good exercise is to try to convince
yourself that the following images are correct (i.e. that they do represent
the given equation).
Note that in each of the following r is defined as $r = +\sqrt{x^2 + y^2}$.
[Figures: surface plots of each of the following equations]
$z = x^2 + y^2$
$1 = x^2 + y^2 - z^2$
$z = \cos(3\pi r)\exp(-2r^2)$
$z = \sqrt{1 + y^2 - x^2}$
$z = -xy\exp(-x^2 - y^2)$
$1 = x + y + z$
Example 5.2. Sketch and describe the graph of the surface z = f (x, y) =
6 + 3x + 2y.
5.1.4 Alternative forms
We might ask are there any other ways in which we can describe a surface?
We should be clear that (in this subject) when we say surface we are talking
about a 2-dimensional surface in our familiar 3-dimensional space. With
that in mind, consider the equation
0 = g(x, y, z)
What do we make of this equation? Well, after some algebra we might be

able to re-arrange the above equation into the familiar form
z = f (x, y)
for some function f . In this form we see that we have a surface, and thus the
previous equation 0 = g(x, y, z) also describes a surface. When the surface
is described by an equation of the form 0 = g(x, y, z) we say that the surface
is given in implicit form.
Consider all of the points in $\mathbb{R}^3$ (i.e. all possible (x, y, z) points). If we now
introduce the equation $0 = g(x, y, z)$ we are forced to consider only those
(x, y, z) values that satisfy this constraint. We could do so by, for example,
arbitrarily choosing (x, y) and using the equation (in the form $z = f(x, y)$) to
compute z. Or we could choose say (y, z) and use the equation $0 = g(x, y, z)$
to compute x. Whichever road we travel it is clear that we are free to choose
just two of the (x, y, z) with the third constrained by the equation.
Now consider some simple surface and let’s suppose we are able to drape a
sheet of graph paper over the surface. We can use this graph paper to select
individual points on the surface (well, as far as the graph paper covers the
surface). Suppose we label the axes of the graph paper by the symbols u
and v. Then each point on the surface is described by a unique pair of values
(u, v). This makes sense: we are dealing with a 2-dimensional surface and
so we expect we would need 2 numbers, (u, v), to describe each point on the
surface. The parameters (u, v) are often referred to as (local) coordinates
on the surface.
How does this picture fit in with our previous description of a surface, as
an equation of the form 0 = g(x, y, z)? Pick any point on the surface. This
point will have both (x, y, z) and (u, v) coordinates. That means that we can
describe the point in terms of either (u, v) or (x, y, z). As we move around
the surface all of these coordinates will vary. So given (u, v) we should be
able to compute the corresponding (x, y, z) values. That is we should be
able to find functions P (u, v), Q(u, v) and R(u, v) such that
x = P (u, v) y = Q(u, v) z = R(u, v)
The above equations describe the surface in parametric form.
Example 5.3. Identify (i.e. describe) the surface given by the equations
x = 2u + 3v + 1 y = u − 4v + 2 z = u + 2v − 1
Hint : Try to combine the three equations into one equation involving x, y
and z but not u and v.
Example 5.4. Describe the surface defined by the equations

x = 3 cos(φ) sin(θ) y = 4 sin(φ) sin(θ) z = 5 cos(θ)
for 0 < φ < 2π and 0 < θ < π
Example 5.5. How would your answer to the previous example change if
the domain for θ was 0 < θ < π/2?
Equations for surfaces

A 2-dimensional surface in 3-dimensional space may be described by
any of the following forms.
Explicit z = f (x, y)
Implicit 0 = g(x, y, z)
Parametric x = P (u, v), y = Q(u, v), z = R(u, v)
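The three forms in the box can describe the same surface. As a small illustration, the sketch below takes the upper half of the unit sphere (a surface chosen for this example, not one taken from the examples above) and checks numerically that points generated from a parametric form satisfy both an implicit and an explicit equation. All function names are hypothetical.

```python
import math


def parametric(phi, theta):
    """Parametric form: (u, v) = (phi, theta) with 0 < theta < pi/2."""
    return (math.cos(phi) * math.sin(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(theta))


def implicit_g(x, y, z):
    """Implicit form 0 = g(x, y, z) with g = x^2 + y^2 + z^2 - 1."""
    return x * x + y * y + z * z - 1


def explicit_f(x, y):
    """Explicit form z = f(x, y) = sqrt(1 - x^2 - y^2)."""
    return math.sqrt(1 - x * x - y * y)


samples = [(0.3, 0.2), (1.0, 0.7), (4.0, 1.2), (6.0, 1.5)]  # (phi, theta) pairs
worst = 0.0
for phi, theta in samples:
    x, y, z = parametric(phi, theta)
    worst = max(worst, abs(implicit_g(x, y, z)), abs(z - explicit_f(x, y)))
print(worst)
```

The explicit form only covers the part of the sphere with $z \ge 0$, which is why θ is restricted here; the implicit form describes the whole sphere.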
5.2 Partial derivatives
5.2.1 First partial derivatives
We are all familiar with the definition of the derivative of a function of one
variable
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
The natural question to ask is: Is there a similar rule for functions of more
than one variable? The answer is yes, and we will develop the necessary
formulas by a simple generalisation of the above definition.
Let us suppose we have a function, say $f(x, y)$. Suppose for the moment
that we pick a particular value of y, say $y = 3$. Then only x is allowed to
vary and in effect we now have a function of just one variable. Thus we can
apply the above definition for a derivative which we write as
$$\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}$$
Notice the use of the symbol $\partial$ rather than d. This is to remind us that
in computing this derivative all other variables are held constant (which in
this instance is just y).
Of course we could do the same again but with x held constant. This gives
us the derivative in y
$$\frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}$$
Each of these derivatives, $\partial f/\partial x$ and $\partial f/\partial y$, is known as a first order
partial derivative of f (the derivative of a function of one variable is often
called an ordinary derivative).
We can also look at this in terms of the rate of change, as we did for single
variable functions. If z is a function of two independent variables x and y
(i.e. z = f (x, y)) then there are two independent rates of change. One of
these is the rate of change of f with respect to the variable x, and
the other is the rate of change of f with respect to the variable y.
You might think that we would now need to invent new rules for the (partial)
derivatives of products, quotients and so on. But our definition of partial
derivatives is built upon the definition of an ordinary derivative of a function
of one variable. Thus all the familiar rules carry over without modification.
For example, the product rule for partial derivatives is
$$\frac{\partial (fg)}{\partial x} = g\,\frac{\partial f}{\partial x} + f\,\frac{\partial g}{\partial x}$$
$$\frac{\partial (fg)}{\partial y} = g\,\frac{\partial f}{\partial y} + f\,\frac{\partial g}{\partial y}$$
Computing partial derivatives is no more complicated than computing
ordinary derivatives. Hooray for us!
Rules for finding partial derivatives
• To find $\frac{\partial f}{\partial x}$, treat y as a constant and differentiate $f(x, y)$ with respect
to x only.
• To find $\frac{\partial f}{\partial y}$, treat x as a constant and differentiate $f(x, y)$ with respect
to y only.
Example 5.6. If $f(x, y) = x^3 + x^2 y^3 - 2y^2$, find $f_x(2, 1)$ and $f_y(2, 1)$.
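Example 5.7. If $f(x, y) = \sin(x)\cos(y)$ then
$$\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\big(\sin(x)\cos(y)\big)$$
$$= \cos(y)\,\frac{\partial \sin(x)}{\partial x}$$
$$= \cos(y)\cos(x)$$
Also find $\frac{\partial f}{\partial y}$.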
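The limit definition of a partial derivative suggests a simple numerical check: hold one variable fixed and take a small step in the other. The sketch below does this for the function of Example 5.7 and compares the estimate of $\partial f/\partial x$ with $\cos(y)\cos(x)$. It uses a symmetric (central) difference instead of the one-sided quotient in the definition, because that is usually more accurate; the helper names, step size and sample point are assumptions made for this illustration.

```python
import math


def partial_x(f, x, y, h=1e-6):
    """Estimate df/dx at (x, y): y is held constant, x is nudged by +/- h."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Estimate df/dy at (x, y): x is held constant, y is nudged by +/- h."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def f(x, y):
    return math.sin(x) * math.cos(y)


x0, y0 = 0.7, 1.1                       # an arbitrary sample point
numeric_fx = partial_x(f, x0, y0)
exact_fx = math.cos(y0) * math.cos(x0)  # the result derived in Example 5.7
numeric_fy = partial_y(f, x0, y0)       # compare with your own answer for df/dy
print(numeric_fx, exact_fx, numeric_fy)
```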
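Example 5.8. If $g(x, y, z) = e^{-x^2 - y^2 - z^2}$ then
$$\frac{\partial g}{\partial z} = \frac{\partial}{\partial z}\,e^{-x^2 - y^2 - z^2}$$
$$= e^{-x^2 - y^2 - z^2}\,\frac{\partial(-x^2 - y^2 - z^2)}{\partial z}$$
$$= -2z\,e^{-x^2 - y^2 - z^2}$$
Also find $\frac{\partial g}{\partial x}$ and $\frac{\partial g}{\partial y}$.
A word on notation: An alternative notation for $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ is $f_x$ and $f_y$
respectively. You will find both versions are commonly used.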
5.3 The tangent plane

For functions of one variable we found that a tangent line provides a useful
means of approximating the function. It is natural to ask how we might
generalise this idea to functions of several variables.
Constructing a tangent line for a function of a single variable, $f = f(x)$, is
quite simple. (This should be revision!) First we compute the function's
value f and its gradient $\frac{df}{dx}$ at some chosen point. We then construct a
straight line equation ($y = mx + c$) with these values at the chosen point.
This line is the tangent line of the function f at the given point.
Example 5.9. Find the tangent line to the function $f(x) = \sin x$ at $x = \frac{\pi}{4}$.
How do we relate this to functions of several variables?
5.3.1 Geometric interpretation
Earlier we noted that the partial derivative $\frac{\partial f}{\partial x}$ of the function of two
variables $z = f(x, y)$ is the rate of change of f in the x-direction, keeping y
fixed. To visualise $\frac{\partial f}{\partial x}$ as the slope (or gradient) of a straight line, consider
the diagram below.
This diagram shows the intersection of the vertical plane y = 'constant'
with the smooth differentiable surface $z = f(x, y)$ in $\mathbb{R}^3$.
The intersection of the plane with this surface is a curve, $C_1$ say. On $C_1$, x
can vary but y stays constant. We now draw the tangent line to the surface
at the point P that also lies in the vertical plane y = constant. This tangent
line, $T_1$ say, strikes the xy-plane at angle $\alpha$ as shown.
This tangent line has slope $\tan\alpha$ which equals the rate of change of the
height z of the surface $z = f(x, y)$ in the x-direction at the point P. We
thus have
$$\frac{\partial f}{\partial x} = \tan\alpha = \text{slope of the tangent line to the surface } z = f(x, y) \text{ in the x-direction.}$$
Similarly, the diagram below illustrates the intersection of the vertical plane
x = 'constant' with the surface $z = f(x, y)$.
This intersection is a smooth curve, $C_2$ say, on the surface. On $C_2$, y can
vary but x stays fixed. As we move along this curve, the height z to the
surface changes only with respect to the change in the independent variable
y. Now we can draw the tangent line to the curve $C_2$ at the point P. This
strikes the xy-plane at angle $\beta$. The slope of this tangent line, namely $\tan\beta$,
gives the rate of change of f with respect to y at the point P. That is
$$\frac{\partial f}{\partial y} = \tan\beta = \text{slope of the tangent line to the surface } z = f(x, y) \text{ in the y-direction.}$$
What happens if we consider the rate of change of f with respect to x and
with respect to y together? It is helpful to look at the diagram below, which
shows a section of the surface S of a differentiable function $z = f(x, y)$ at a
point $P(a, b, c)$, where $c = f(a, b)$.
If we zoom in onto the surface at P it becomes locally flat. We can then
draw the two tangent lines $T_1$ and $T_2$ to the surface at P which are
tangential to the two curves $C_1$ and $C_2$ that lie in the vertical planes $y = b$
and $x = a$. These tangent lines, which have the slopes $\frac{\partial f}{\partial x} = \tan\alpha$ and
$\frac{\partial f}{\partial y} = \tan\beta$ shown previously, give the rate of change of $f(x, y)$ in both the
x and y directions.
The tangent plane to the surface at the point P is the plane that
contains both of the tangent lines $T_1$ and $T_2$. Let us now find the equation
of this plane. We know that the general equation of a plane that passes
through the point $P(a, b, c)$ is
$$z - c = m(x - a) + n(y - b) \qquad (5.1)$$
Here m and n are the slopes of the lines of intersection of the general plane
with the two vertical planes $y = b$ and $x = a$ that are parallel to the principal
coordinate planes (the xz-plane and the yz-plane respectively). If we
now put $y = b$ in equation (5.1), we have $z - c = m(x - a)$. This is the
equation of the line of intersection of our general plane with the plane $y = b$.
It clearly has slope m. Next we put $x = a$ in equation (5.1) and this yields
$z - c = n(y - b)$. This is the equation of the line of intersection of our general
plane with the plane $x = a$. It clearly has slope n. Lastly, if we choose
$$m = \tan\alpha = \frac{\partial f}{\partial x} = f_x(a, b) \quad \text{and} \quad n = \tan\beta = \frac{\partial f}{\partial y} = f_y(a, b)$$
then the equation of the tangent plane to the surface $z = f(x, y)$ at the
point $(x, y) = (a, b)$ is
$$z = f(a, b) + f_x(a, b) \cdot (x - a) + f_y(a, b) \cdot (y - b) \qquad (5.2)$$
Our work is done!
Example 5.10. Find the equation of the tangent plane to the surface
z = 2x2 + y 2 at the point (a, b) = (1, 1).
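Solution: Here $f(x, y) = 2x^2 + y^2$. Thus $f(a, b) = f(1, 1) = 2 \cdot 1^2 + 1^2 = 3$.
Next,
$\frac{\partial f}{\partial x} = 4x$ therefore $\frac{\partial f}{\partial x}(1, 1) =$
$\frac{\partial f}{\partial y} = 2y$ therefore $\frac{\partial f}{\partial y}(1, 1) =$
Using equation (5.2) the equation of the tangent plane is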
5.3.2 Linear approximations
We have done the hard work, and now it is time to enjoy the fruits of our
labour. Just as we used the tangent line in approximations for functions of
one variable, we can use the tangent plane as a way to estimate the original
function f (x, y) in a region close to the chosen point.
The equation of the tangent plane to the surface $z = f(x, y)$ at the point
(a, b) is also the equation for the linear approximation to $z = f(x, y)$ for
points (x, y) near (a, b). We can regard the tangent plane equation (5.2)
as the natural extension to functions of two variables (x, y) of the Taylor
polynomial of degree one equation
$$y = T_1(x; a) = f(a) + f'(a) \cdot (x - a).$$
This is the linear approximation equation for functions of one variable,
namely $y = f(x)$, for x near a.
Hence we call
$$z = T_1(x, y) = f(a, b) + f_x(a, b) \cdot (x - a) + f_y(a, b) \cdot (y - b)$$
the linear approximation to $f(x, y)$ for points (x, y) near (a, b). Please
note, we will omit the centring point (a, b) from the argument of $T_1$ as
the notation becomes too cumbersome! You will need to understand from
context which centring point is being utilised.
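To see the linear approximation at work, here is a small sketch. The function $f(x, y) = x^2 + 3xy$ and the centring point (1, 2) are hypothetical choices made for this illustration (they do not come from the examples in these notes), and the partial derivatives are coded by hand from the rules of Section 5.2.

```python
def f(x, y):
    """Hypothetical example function f(x, y) = x^2 + 3xy."""
    return x ** 2 + 3 * x * y


def fx(x, y):
    """Partial derivative of f with respect to x (y held constant)."""
    return 2 * x + 3 * y


def fy(x, y):
    """Partial derivative of f with respect to y (x held constant)."""
    return 3 * x


def linear_approximation(a, b):
    """Return T1(x, y) = f(a,b) + fx(a,b)(x - a) + fy(a,b)(y - b)."""
    f0, m, n = f(a, b), fx(a, b), fy(a, b)
    return lambda x, y: f0 + m * (x - a) + n * (y - b)


T1 = linear_approximation(1.0, 2.0)
estimate = T1(1.1, 1.9)     # tangent-plane estimate near (1, 2)
actual = f(1.1, 1.9)        # true function value at the same point
print(estimate, actual)
```

The two printed values agree to within a few hundredths, and the gap grows as (x, y) moves further from the centring point.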
Example 5.11. Derive the linear approximation function $T_1(x, y)$ for the
function $f(x, y) = \sqrt{3x - y}$ at the point (4, 3).
Example 5.12. Use the result of example 5.9 to estimate $\sin(x)\sin(y)$ at
$\left(\frac{5\pi}{16}, \frac{5\pi}{16}\right)$.
5.4 Chain rule
In a previous lecture we saw how we could compute (partial) derivatives
of functions of several variables. The trick we employed was to reduce the
number of independent variables to just one (which we did by keeping all
but one variable constant). There is another way in which we can achieve
this reduction, which involves parametrising the function.
Consider a function of two variables f (x, y) and let’s suppose we are given
a smooth (continuous, with derivatives which are also continuous) curve in
the xy-plane. Each point on this curve can be characterised by its distance
from some arbitrary starting point on the curve. In this way we can imagine
that the (x, y) pairs on this curve are given as functions of one variable, let’s
call it s. That is, our curve is described by the parametric equations
x = x(s) y = y(s)
for some functions x(s) and y(s). The values of the function f (x, y) on this
curve are therefore given by
f = f (x(s), y(s))
and this is just a function of one variable s. Thus we can compute its
derivative df /ds. We will soon see that df /ds can be computed in terms of
the partial derivatives.
Example 5.13. Given the curve
$$x(s) = 2s, \quad y(s) = 4s^2, \quad -1 < s < 1$$
and the function
$$f(x, y) = 5x - 7y + 2$$
compute $\frac{df}{ds}$ at $s = 0$.
Example 5.14. Show that for the curve $x(s) = s$, $y(s) = 2$ we get $\frac{df}{ds} = \frac{\partial f}{\partial x}$.
Example 5.15. Show that for the curve $x(s) = -1$, $y(s) = s$ we get $\frac{df}{ds} = \frac{\partial f}{\partial y}$.
The last two examples show that df /ds is somehow tied to the partial deriva-
tives of f . The exact link will be made clear in a short while.
What meaning can we assign to this number df /ds? It helps to imagine that
we have drawn a graph of f (x, y) (i.e. as a surface over the xy-plane).
Now draw the curve (x(s), y(s)) in the xy-plane and imagine walking along
that curve, let’s call it C. At each point on C, f (s) is the height of the
surface above the xy-plane. If you walk a short distance ∆s then the height
might change by an amount ∆f . The rate at which the height changes with
respect to the distance travelled is then ∆f /∆s. In the limit of infinitesimal
distances we recover df /ds. Thus we can interpret df /ds as measuring the
rate of change of f along the curve. This is exactly what we would have
expected – after all, derivatives measure rates-of-change.
The first example above showed how you could compute df /ds by first re-
ducing f to an explicit function of s. It was also hinted that it is also possible
to evaluate df /ds using partial derivatives.
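Let's go back to basics. The derivative $df/ds$ could be calculated as
$$\frac{df}{ds} = \lim_{\Delta s \to 0} \frac{f(x(s + \Delta s), y(s + \Delta s)) - f(x(s), y(s))}{\Delta s}$$
We will re-write this by adding and subtracting $f(x(s), y(s + \Delta s))$ just before
the minus sign. After a little rearranging we get
$$\frac{df}{ds} = \lim_{\Delta s \to 0} \frac{f(x(s + \Delta s), y(s + \Delta s)) - f(x(s), y(s + \Delta s))}{\Delta s} + \lim_{\Delta s \to 0} \frac{f(x(s), y(s + \Delta s)) - f(x(s), y(s))}{\Delta s}$$
Now let's look at the first limit. If we introduce $\Delta x = x(s + \Delta s) - x(s)$ then
we can write
$$\lim_{\Delta s \to 0} \frac{f(x(s + \Delta s), y(s + \Delta s)) - f(x(s), y(s + \Delta s))}{\Delta s}$$
$$= \lim_{\Delta s \to 0} \frac{f(x(s + \Delta s), y(s + \Delta s)) - f(x(s), y(s + \Delta s))}{\Delta x}\,\frac{\Delta x}{\Delta s}$$
$$= \frac{\partial f}{\partial x}\,\frac{dx}{ds}.$$
We can write a similar equation for the second limit. Combining the two
leads us to
$$\frac{df}{ds} = \frac{\partial f}{\partial x}\,\frac{dx}{ds} + \frac{\partial f}{\partial y}\,\frac{dy}{ds}$$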
This is an extremely useful and important result. It is an example of what

is known as the chain rule for functions of several variables.
The Chain Rule

Let $f = f(x, y)$ be a differentiable function. If the function is
parametrized by $x = x(s)$ and $y = y(s)$ then the chain rule for
derivatives of f along a path $x = x(s)$, $y = y(s)$ is
$$\frac{df}{ds} = \frac{\partial f}{\partial x}\,\frac{dx}{ds} + \frac{\partial f}{\partial y}\,\frac{dy}{ds}$$
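The chain rule can be checked numerically by comparing it with a direct estimate of the derivative of $f(x(s), y(s))$. In the sketch below the function $f(x, y) = x^2 y$ and the path $x = \cos s$, $y = \sin s$ are hypothetical choices made for illustration, as are the helper names and the step size.

```python
import math


def f(x, y):
    """Hypothetical example function f(x, y) = x^2 y."""
    return x ** 2 * y


def fx(x, y):
    return 2 * x * y          # partial derivative with respect to x


def fy(x, y):
    return x ** 2             # partial derivative with respect to y


def chain_rule(s):
    """df/ds = fx dx/ds + fy dy/ds along the path x = cos s, y = sin s."""
    x, y = math.cos(s), math.sin(s)
    dx_ds, dy_ds = -math.sin(s), math.cos(s)
    return fx(x, y) * dx_ds + fy(x, y) * dy_ds


def direct(s, h=1e-6):
    """Central-difference estimate of the derivative of f(x(s), y(s))."""
    F = lambda u: f(math.cos(u), math.sin(u))
    return (F(s + h) - F(s - h)) / (2 * h)


s0 = 0.4
print(chain_rule(s0), direct(s0))
```

Both numbers describe the rate of change of f along the curve at $s = 0.4$, so they should agree to many decimal places.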
Now that we have covered this much, it's rather easy to see an important
extension of the above result. Suppose the path was obtained by holding
some other parameter constant. That is, imagine that the path $x = x(s)$,
$y = y(s)$ arose from some more complicated expressions such as $x = x(s, t)$,
$y = y(s, t)$ with t held constant. How would our formula for the chain rule
change? Not much other than we would have to keep in mind throughout
that t is constant. We encountered this issue once before and that led to
partial rather than ordinary derivatives. Clearly the same change of notation
applies here, and thus we would write
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\,\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial s}$$
as the first partial derivative of f with respect to s.
Let’s see where we are at so far. We are given a function of two variables
f = f (x, y) and we are also given two other functions, also of two variables,
x = x(s, t), y = y(s, t). Then ∂f /∂s can be calculated using the above chain
rule.
Of course you could also compute ∂f /∂s directly by substituting x = x(s, t)

and y = y(s, t) into f (x, y) before taking the partial derivatives. Both
approaches will give you exactly the same answer.
Note that there is nothing special in the choice of symbols, x, y, s or t. You
will often find (u, v) used rather than (s, t).
Example 5.16. Given f = f (x, y) and x = 2s + 3t, y = s − 2t compute

∂f /∂t directly and by way of the chain rule.
The Chain Rule : Episode 2
Let f = f (x, y) be a differentiable function. If x = x(u, v), y = y(u, v)
then
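\[
\frac{\partial f}{\partial u} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u}
\]
\[
\frac{\partial f}{\partial v} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v}
\]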
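The claim that direct substitution and the chain rule give the same answer can be checked numerically. The sketch below uses the substitution x = 2s + 3t, y = s − 2t from Example 5.16 together with a hypothetical f(x, y) = x^2 + xy, which is an assumption made here because the example leaves f general.

```python
def f(x, y):
    # Hypothetical choice of f; Example 5.16 leaves f unspecified.
    return x**2 + x * y

def x_of(s, t):
    return 2 * s + 3 * t

def y_of(s, t):
    return s - 2 * t

def df_dt_chain(s, t):
    # Chain rule: df/dt = f_x * dx/dt + f_y * dy/dt with dx/dt = 3, dy/dt = -2.
    x, y = x_of(s, t), y_of(s, t)
    f_x = 2 * x + y
    f_y = x
    return f_x * 3 + f_y * (-2)

def df_dt_direct(s, t, h=1e-6):
    # Substitute first, then differentiate in t (s held constant) numerically.
    def g(tt):
        return f(x_of(s, tt), y_of(s, tt))
    return (g(t + h) - g(t - h)) / (2 * h)

print(df_dt_chain(1.0, 2.0), df_dt_direct(1.0, 2.0))
```

Holding s fixed inside `df_dt_direct` is what makes the result a partial rather than an ordinary derivative.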
5.5 Gradient and Directional Derivative
Given any differentiable function of several variables we can compute each

of its first partial derivatives. Let’s do something ‘out of the square’. We
will assemble these partial derivatives as a vector which we will denote by
∇f . So for a function f (x, y) of two variables we define
\[
\nabla f = \frac{\partial f}{\partial x}\,\underset{\sim}{i} + \frac{\partial f}{\partial y}\,\underset{\sim}{j}
\]
This is known as the gradient of f and is often pronounced grad f.
This may be pretty but what use is it? If we look back at the formula for
the chain rule we see that we can write it out as a vector dot-product
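\[
\begin{aligned}
\frac{df}{ds} &= \frac{\partial f}{\partial x}\frac{dx}{ds} + \frac{\partial f}{\partial y}\frac{dy}{ds} \\
&= \left(\frac{\partial f}{\partial x}\,\underset{\sim}{i} + \frac{\partial f}{\partial y}\,\underset{\sim}{j}\right) \cdot \left(\frac{dx}{ds}\,\underset{\sim}{i} + \frac{dy}{ds}\,\underset{\sim}{j}\right) \\
&= (\nabla f) \cdot \left(\frac{dx}{ds}\,\underset{\sim}{i} + \frac{dy}{ds}\,\underset{\sim}{j}\right)
\end{aligned}
\]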
The number that we calculate in this process, i.e. df/ds, is known as the
directional derivative of f in the direction $\underset{\sim}{t}$. What do we make of the
vector on the far right of this equation, i.e.
$\frac{dx}{ds}\,\underset{\sim}{i} + \frac{dy}{ds}\,\underset{\sim}{j}$? It is not hard to
see that it is a tangent vector to the curve (x(s), y(s)). And if we choose the
parameter s to be distance along the curve then we also see that it is a unit
vector.
Example 5.17. Prove the last pair of statements, i.e. that the vector is a
tangent vector and that it is a unit vector.
It is customary to denote the tangent vector by $\underset{\sim}{t}$ (some people prefer $\underset{\sim}{u}$).
With the above definitions we can now write the equation for a directional
derivative as follows
\[
\frac{df}{ds} = \underset{\sim}{t} \cdot \nabla f
\]
Yet another variation on the notation is to include the tangent vector as

subscript on ∇. Thus we also have
\[
\frac{df}{ds} = \nabla_{\underset{\sim}{t}} f
\]
Directional derivative
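The directional derivative df/ds of a function f in the direction $\underset{\sim}{t}$ is
given by
\[
\frac{df}{ds} = \underset{\sim}{t} \cdot \nabla f = \nabla_{\underset{\sim}{t}} f
\]
where the gradient ∇f is defined by
\[
\nabla f = \frac{\partial f}{\partial x}\,\underset{\sim}{i} + \frac{\partial f}{\partial y}\,\underset{\sim}{j}
\]
and $\underset{\sim}{t}$ is a unit vector, $\underset{\sim}{t} \cdot \underset{\sim}{t} = 1$.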
Example 5.18. Given $f(x, y) = \sin(x)\cos(y)$ compute the directional derivative
of f in the direction $\underset{\sim}{t} = (\underset{\sim}{i} + \underset{\sim}{j})/\sqrt{2}$.
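The recipe "normalise the direction, then take the dot product with the gradient" can be sketched in a few lines. The evaluation point (0.3, 0.5) is a hypothetical choice, and the helper name `directional_derivative` is ours rather than anything defined in the notes. The gradients are worked out by hand from f(x, y) = sin(x) cos(y) and from f(x, y) = (xy)^2.

```python
import math

def grad_sin_cos(x, y):
    # Gradient of f(x, y) = sin(x) cos(y), computed by hand.
    return (math.cos(x) * math.cos(y), -math.sin(x) * math.sin(y))

def grad_xy_squared(x, y):
    # Gradient of f(x, y) = (x y)^2, computed by hand.
    return (2 * x * y**2, 2 * x**2 * y)

def directional_derivative(grad, direction):
    # Rescale the direction so that t . t = 1, then form t . grad f.
    norm = math.hypot(direction[0], direction[1])
    t = (direction[0] / norm, direction[1] / norm)
    return t[0] * grad[0] + t[1] * grad[1]

# Direction (i + j)/sqrt(2) at a hypothetical point (0.3, 0.5).
value_sin_cos = directional_derivative(grad_sin_cos(0.3, 0.5), (1.0, 1.0))

# Direction v = 2i + 7j at (1, 1); v is not a unit vector, so it is rescaled.
value_xy_squared = directional_derivative(grad_xy_squared(1.0, 1.0), (2.0, 7.0))

print(value_sin_cos, value_xy_squared)
```

Normalising inside the helper means a non-unit vector such as 2i + 7j can be passed in directly without changing the meaning of "derivative per unit distance".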
Example 5.19. Given $\nabla f = 2x\,\underset{\sim}{i} + 2y\,\underset{\sim}{j}$ and x(s) = s cos(0.1), y(s) =
s sin(0.1) compute df/ds at s = 1.
Example 5.20. Given $f(x, y) = (xy)^2$ and the vector $\underset{\sim}{v} = 2\underset{\sim}{i} + 7\underset{\sim}{j}$ compute
the directional derivative at (1, 1). Hint: Is $\underset{\sim}{v}$ a unit vector?
We began this discussion by restricting a function of many variables to

a function of one variable. We achieved this by choosing a path such as
x = x(s), y = y(s). We might ask if the value of df /ds depends on the choice
of the path? That is we could imagine many different paths all sharing the
one point, call it P , in common. Amongst these different paths might we
get different answers for df /ds?
This is a very good question. To answer it let’s look at the directional
derivative in the form
\[
\frac{df}{ds} = \underset{\sim}{t} \cdot \nabla f
\]
First we note that ∇f depends only on the values of (x, y) at P . It knows
nothing about the curves passing through P . That information is contained
solely in the vector $\underset{\sim}{t}$. Thus if a family of curves passing through P share
the same $\underset{\sim}{t}$ then we most certainly will get the same value for df/ds for each
member of that family. But what class of curves share the same $\underset{\sim}{t}$ at P?
Clearly they are all tangent to each other at P. None of the curves cross
any other curve at P.
At this point we can dispense with the curves and retain just the tangent
vector $\underset{\sim}{t}$ at P. All that we require to compute df/ds is the direction we wish
to head in, $\underset{\sim}{t}$, and the gradient vector, ∇f, at P. Choose a different $\underset{\sim}{t}$ and
you will get a different answer for df/ds. In each case df/ds measures how
rapidly f is changing in the direction of $\underset{\sim}{t}$.
5.6 Second order partial derivatives
The result of a partial derivative of a function yields another function of

one or more variables. We are thus at liberty to take another derivative,
generating yet another function. Clearly we can repeat this any number of
times (though possibly subject to some technical limitations as noted below,
see Exceptions).
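Example 5.21. Let f(x, y) = sin(x) sin(y). Then we can define
\[
g(x, y) = \frac{\partial f}{\partial x} \quad\text{and}\quad h(x, y) = \frac{\partial g}{\partial x}.
\]
That is
\[
g(x, y) = \frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\bigl(\sin(x)\sin(y)\bigr) = \cos(x)\sin(y)
\]
and
\[
h(x, y) = \frac{\partial g}{\partial x} = \frac{\partial}{\partial x}\bigl(\cos(x)\sin(y)\bigr) = -\sin(x)\sin(y)
\]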
Example 5.22. Compute $\frac{\partial g}{\partial y}$ for the above example.
From this we see that h(x, y) was computed as follows

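\[
h(x, y) = \frac{\partial g}{\partial x} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)
\]
This is often written as
\[
h(x, y) = \frac{\partial^2 f}{\partial x^2}
\]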
and is known as a second order partial derivative of the function f (x, y).
Now consider the case where we compute h(x, y) by first taking a partial
derivative in x then followed by a partial derivative in y, that is

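\[
h(x, y) = \frac{\partial g}{\partial y} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)
\]
and this is normally written as
\[
h(x, y) = \frac{\partial^2 f}{\partial y\,\partial x}
\]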
Note the order on the bottom line – you should read this from right to left.
It tells you to take a partial derivative in x and then a partial derivative
in y.
The function z = f (x, y) has two partial derivatives fx and fy . Taking
partial derivatives of fx and fy yields four second order partial derivatives
of the function f (x, y).
It’s now a short leap to cases where we might try to find, say, the fifth partial
derivatives, such as
\[
P(x, y) = \frac{\partial^5 Q}{\partial x\,\partial y\,\partial y\,\partial x\,\partial x}
\]
Partial derivatives that involve more than one of the independent variables
are known as mixed partial derivatives.
Example 5.23. Given $f(x, y) = 3x^2 + 2xy$ compute $\frac{\partial^2 f}{\partial x\,\partial y}$ and $\frac{\partial^2 f}{\partial y\,\partial x}$. What
do you notice?
Order of partial derivatives does not matter: Clairaut’s Theorem

If f (x, y) is a twice-differentiable function whose second order mixed par-
tial derivatives are continuous, then the order in which its mixed partial
derivatives are calculated does not matter. Each ordering will yield the
same function. For a function of two variables this means
\[
\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}
\]
This is not immediately obvious but it can be proved and it is a very useful
result.
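The equality of the mixed partial derivatives can be illustrated numerically. The sketch below uses f(x, y) = sin(x) sin(y) from Example 5.21 and nested central differences; the helper names `d_dx` and `d_dy`, the step size and the evaluation point are all assumptions made for this illustration. A numerical check is only evidence, not a proof.

```python
import math

def f(x, y):
    return math.sin(x) * math.sin(y)

def d_dx(g, h=1e-4):
    # Returns a function approximating the partial derivative of g in x.
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    # Returns a function approximating the partial derivative of g in y.
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

x_then_y = d_dy(d_dx(f))   # differentiate in x first, then in y
y_then_x = d_dx(d_dy(f))   # differentiate in y first, then in x

a, b = 0.4, 1.1            # hypothetical evaluation point
exact = math.cos(a) * math.cos(b)
print(x_then_y(a, b), y_then_x(a, b), exact)
```

Both orderings should land close to cos(a) cos(b), which is the mixed partial derivative obtained by hand.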
A quick word on notation: The second order partial derivatives can be
written as follows:
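\[
\frac{\partial^2 f}{\partial x^2} = f_{xx}, \qquad
\frac{\partial^2 f}{\partial y^2} = f_{yy}, \qquad
\frac{\partial^2 f}{\partial x\,\partial y} = f_{xy}
\]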
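Example 5.24. Use the above theorem to show that
\[
P(x, y) = \frac{\partial^5 Q}{\partial x\,\partial y\,\partial y\,\partial x\,\partial x}
= \frac{\partial^5 Q}{\partial y\,\partial y\,\partial x\,\partial x\,\partial x}
= \frac{\partial^5 Q}{\partial x\,\partial x\,\partial x\,\partial y\,\partial y}
\]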
The theorem allows us to simplify our notation: all we need do is record how
many of each type of partial derivative are required. Thus the above can be
written as
\[
P(x, y) = \frac{\partial^5 Q}{\partial x^3\,\partial y^2} = \frac{\partial^5 Q}{\partial y^2\,\partial x^3}
\]
Example 5.25. Show that the function $u(x, y) = e^{-x}\cos y$ is a solution of
Laplace’s equation
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0.
\]
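Solution:
\[
u_x = \frac{\partial}{\partial x}\bigl(e^{-x}\cos y\bigr) = -e^{-x}\cos y
\quad\text{and}\quad
u_y = \frac{\partial}{\partial y}\bigl(e^{-x}\cos y\bigr) = -e^{-x}\sin y
\]
Then
\[
u_{xx} = \frac{\partial}{\partial x}\bigl(-e^{-x}\cos y\bigr) = \underline{\hspace{6em}}
\]
and
\[
u_{yy} = \frac{\partial}{\partial y}\bigl(-e^{-x}\sin y\bigr) = \underline{\hspace{6em}}
\]
Hence
\[
u_{xx} + u_{yy} = \underline{\hspace{6em}}
\]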
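Once the blanks have been completed by hand, the result can be checked numerically. The following sketch approximates u_xx + u_yy for u(x, y) = e^{-x} cos y with central second differences; the sample points, the step size and the helper names are hypothetical choices. The sum should come out close to zero at every point if u satisfies Laplace's equation.

```python
import math

def u(x, y):
    return math.exp(-x) * math.cos(y)

def second_difference(g, x, y, axis, h=1e-4):
    # Central second difference in x (axis=0) or in y (axis=1).
    if axis == 0:
        return (g(x + h, y) - 2 * g(x, y) + g(x - h, y)) / h**2
    return (g(x, y + h) - 2 * g(x, y) + g(x, y - h)) / h**2

def laplacian(g, x, y):
    # Approximates g_xx + g_yy at the point (x, y).
    return second_difference(g, x, y, 0) + second_difference(g, x, y, 1)

sample_points = [(0.0, 0.0), (0.5, -1.0), (2.0, 3.0)]
residuals = [laplacian(u, x, y) for x, y in sample_points]
print(residuals)
```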
5.6.1 Taylor polynomials of higher degree
In earlier lectures we discovered that the linear approximation function

T1 (x, y) to a function of two variables f (x, y) near the point (a, b) is the
same as the equation of the tangent plane at the point (a, b). In other
words, for (x, y) near (a, b) we have
\[
f(x, y) \approx T_1(x, y) = f(a, b) + f_x(a, b) \cdot (x - a) + f_y(a, b) \cdot (y - b) \tag{5.3}
\]
The function T1 (x, y) is also known as the Taylor polynomial of degree

one for f (x, y) near (a, b), and clearly uses first partial derivatives of f (x, y).
Now the tangent plane provides a good fit to f (x, y) only if (x, y) are suf-
ficiently close to (a, b). But this is obviously not always going to be the
case. If we want to obtain a more accurate polynomial approximation to
the graph of the surface z = f (x, y), we need to take into account the local
curvature of the surface at (a, b). This is done by including the second
partial derivatives of f (x, y), namely fxx , fxy and fyy . Using these gives us
T2 (x, y) which is the Taylor polynomial of degree two. T2 (x, y) is also
known as the quadratic approximation function:
\[
\begin{aligned}
T_2(x, y) = {} & f(a, b) + f_x(a, b) \cdot (x - a) + f_y(a, b) \cdot (y - b) \\
& + \frac{1}{2!}\Bigl[ f_{xx}(a, b)(x - a)^2 + 2 f_{xy}(a, b)(x - a)(y - b) + f_{yy}(a, b)(y - b)^2 \Bigr]
\end{aligned}
\tag{5.4}
\]
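Example 5.26. Derive the Taylor polynomial of degree two for the function
$f(x, y) = e^{-x}\cos y$ near the point (a, b) = (0, 0).
Solution:
\[
\begin{array}{ll}
\text{Function} & \text{Value at } (0, 0) \\
\hline
f(x, y) = e^{-x}\cos y & f(0, 0) = e^{-0}\cos 0 = 1 \cdot 1 = 1 \\
f_x(x, y) = \frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\bigl(e^{-x}\cos y\bigr) = -e^{-x}\cos y & f_x(0, 0) = -e^{-0}\cos 0 = -1 \\
f_y(x, y) = \frac{\partial f}{\partial y} = \frac{\partial}{\partial y}\bigl(e^{-x}\cos y\bigr) = -e^{-x}\sin y & f_y(0, 0) = -e^{-0}\sin 0 = 0 \\
f_{xx}(x, y) = \frac{\partial}{\partial x} f_x = \frac{\partial}{\partial x}\bigl(-e^{-x}\cos y\bigr) = e^{-x}\cos y & f_{xx}(0, 0) = e^{-0}\cos 0 = 1 \\
f_{xy}(x, y) = \frac{\partial}{\partial y} f_x = \frac{\partial}{\partial y}\bigl(-e^{-x}\cos y\bigr) = e^{-x}\sin y & f_{xy}(0, 0) = e^{-0}\sin 0 = 0 \\
f_{yy}(x, y) = \frac{\partial}{\partial y} f_y = \frac{\partial}{\partial y}\bigl(-e^{-x}\sin y\bigr) = -e^{-x}\cos y & f_{yy}(0, 0) = -e^{-0}\cos 0 = -1
\end{array}
\]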
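Collecting terms and substituting them into equation (5.4) we obtain:
\[
\begin{aligned}
T_2(x, y) &= f(0, 0) + f_x(0, 0) \cdot (x - 0) + f_y(0, 0) \cdot (y - 0) \\
&\quad + \tfrac{1}{2} f_{xx}(0, 0) \cdot (x - 0)^2 + f_{xy}(0, 0) \cdot (x - 0)(y - 0) + \tfrac{1}{2} f_{yy}(0, 0) \cdot (y - 0)^2 \\
&= 1 - 1 \cdot x + 0 \cdot y + \tfrac{1}{2} \cdot 1 \cdot x^2 + 0 \cdot xy - \tfrac{1}{2} y^2 \\
&= 1 - x + \tfrac{1}{2}(x^2 - y^2)
\end{aligned}
\]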
Lastly we can graph the surface $z = f(x, y) = e^{-x}\cos y$ and the quadratic
approximation function $T_2(x, y)$.
[Figure: graph of $z = e^{-x}\cos y$]
[Figure: graph of $z = 1 - x + \tfrac{1}{2}(x^2 - y^2)$]
Looking at these two graphs we see that the Taylor polynomial of degree
two, namely T2 (x, y), does a good job in mimicking the shape of the surface
z = f (x, y) for points (x, y) close to (0, 0). In the plane x = 1, the quadratic
approximation z = T2 (x, y) falls off too steeply along the y axis as we move
away from y = 0, while in the plane x = −1 it falls away too slowly along
the y axis as we move away from y = 0.
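The quality of the quadratic approximation can also be judged numerically rather than from the graphs. This sketch evaluates f(x, y) = e^{-x} cos y and T2(x, y) = 1 − x + (x^2 − y^2)/2 at a few hypothetical sample points at increasing distance from (0, 0) and prints the absolute error at each.

```python
import math

def f(x, y):
    return math.exp(-x) * math.cos(y)

def T2(x, y):
    # Degree-two Taylor polynomial of f about (0, 0).
    return 1 - x + 0.5 * (x**2 - y**2)

def approximation_error(x, y):
    return abs(f(x, y) - T2(x, y))

# Hypothetical sample points moving away from the expansion point.
for point in [(0.1, 0.1), (0.5, 0.5), (1.0, 1.0)]:
    print(point, approximation_error(*point))
```

The error should be small near the origin and grow as the sample point moves away, which is the behaviour described for the two graphs.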
5.6.2 Exceptions: when derivatives do not exist
In earlier lectures we noted that at the very least a function must be con-
tinuous if it is to have a meaningful derivative. When we take successive
derivatives we may need to revisit the question of continuity for each new
function that we create.
If a function fails to be continuous at some point then we most certainly can
not take its derivative at that point.
Example 5.27. Consider the function
\[
f(x) =
\begin{cases}
0, & -\infty < x < 0, \\
3x^2, & 0 < x < \infty.
\end{cases}
\]


It is easy to see that something interesting might happen at x = 0. It’s also

not hard to see that the function is continuous over its whole domain, and
thus we can compute its derivative everywhere, leading to
\[
\frac{df(x)}{dx} =
\begin{cases}
0, & -\infty < x < 0, \\
6x, & 0 < x < \infty.
\end{cases}
\]


This too is continuous and we thus attempt to compute its derivative,
\[
\frac{d^2 f(x)}{dx^2} =
\begin{cases}
0, & -\infty < x < 0, \\
6, & 0 < x < \infty.
\end{cases}
\]


Now we notice that this second derivative is not continuous at x = 0. We

thus can not take any more derivatives at x = 0. Our chain of differentiation
has come to an end.
We began with a continuous function f(x) and we were able to compute
only its first two derivatives over the domain x ∈ R. However, as we noted,
its second derivative was not continuous at x = 0. We call such a function
a $C^1$ function, meaning that the function has a first derivative which is also
continuous. The symbol C reminds us that we are talking about continuity
and the superscript 1 tells us how many derivatives we can apply before we
encounter a non-continuous function. The clause ‘over R’ just reminds us
that the domain of the function is the set of real numbers (−∞, ∞). Even
though a second derivative can be computed on either side of x = 0, we would
not call f a $C^2$ function, since the second derivative is not continuous
at x = 0.
We should always keep in mind that a function may only possess a finite
number of derivatives before we encounter a discontinuity. The tell-tale
signs to watch out for are sharp edges, holes or singularities in the graph of
the function.
5.7 Stationary points
5.7.1 Finding stationary points
Suppose you run a commercial business and that by some means you have
constructed the following formula for the profit of one of your lines of busi-
ness
\[
f = f(x, y) = 4 - x^2 - y^2.
\]
Clearly the profit f depends on two variables x and y. Sound business
practice suggests that you would like to maximise your profits. In mathematical
terms this means finding the values of (x, y) such that f is a maximum. A
simple plot of the graph of f shows us that the maximum occurs at (0, 0)
(corresponding to a maximum profit of 4 units). We may not be able to do
this so easily for other functions, and thus we need some systematic way of
computing the points (x, y) at which f is maximised.
You have seen similar problems for the case of a function of one variable.
And from that you may expect that for the present problem we will be
making a statement about the derivatives of f in order that we have a
maximum (i.e. that the derivatives should be zero). Let’s make this precise.
Let’s denote the (as yet unknown) point at which the function is a maximum
by P . Now if we have a maximum at this point, then moving in any direction
from this point should see the function decrease. That is the directional
derivative must be non-positive in every direction from P . In other words
we must have
\[
\frac{df}{ds} = \underset{\sim}{t} \cdot (\nabla f)_P \le 0
\]
for every choice of $\underset{\sim}{t}$. Let us assume (for the moment) that $(\nabla f)_P \neq 0$; then
we should be able to compute λ > 0 so that $\underset{\sim}{t} = \lambda (\nabla f)_P$ is a unit vector. If
you now substitute this into the above you will find
\[
\lambda\,(\nabla f)_P \cdot (\nabla f)_P \le 0
\]
Look carefully at the left hand side. Each term is positive (remember $\underset{\sim}{a} \cdot \underset{\sim}{a}$
is the squared length of a vector $\underset{\sim}{a}$) yet the right hand side is either zero or
negative. Thus this equation does not make sense and we have to reject our
only assumption, that $(\nabla f)_P \neq 0$.
We have thus found that if f is to have a maximum at P then we must have
\[
0 = (\nabla f)_P
\]
This is a vector equation and thus each component of ∇f is zero at P, that
is
\[
0 = \frac{\partial f}{\partial x} \quad\text{and}\quad 0 = \frac{\partial f}{\partial y} \quad\text{at } P
\]
It is from these equations that we would then compute the (x, y) coordinates
of P .
Of course we could have posed the related question of finding the points at
which a function is minimised. The mathematics would be much the same
except for a change in words (maximum to minimum) and a corresponding
change in ± signs. The end result is the same though, the gradient ∇f must
vanish at P .
Example 5.28. Find the points at which $f = 4 - x^2 - y^2$ attains its maximum.
Recall that stationary points were found in functions of one variable by

setting the derivative to zero and solving for x. We either found a local
maximum, a local minimum, or an inflection point (for example $f(x) = x^3$
at x = 0). As we have just seen, for functions of two variables, the stationary
points are found similarly, and we obtain the following types:
• A local minimum
• A local maximum
• A saddle point
When we solve the equations
\[
0 = (\nabla f)_P
\]
we might get more than one point P . What do we make of these points?
Some of them might correspond to minimums while others might correspond
to maximums of f , and others still may correspond to saddle points. The
three options are shown in the following graphs.
[Figure: a typical local minimum]
[Figure: a typical local maximum]
[Figure: a typical saddle point]
A typical case might consist of any number of points like the above.
5.7.2 Notation
Rather than continually having to qualify the point as corresponding to a

local minimum or local maximum of f we commonly lump these into the
one term local extrema. Please note that although saddle points are a type
of stationary point, we do not call them extrema (there is nothing ‘extreme’
going on with a saddle point!).
Note when we talk of minima, maxima and extrema we are talking about
the (x, y) points at which the function has a local minimum, maximum or
extremum respectively (see footnote 1).
Footnote 1: In fact, like the one variable case, two variable functions can
possess points of singularity, i.e., where the partial derivatives do not
exist, and these can correspond to local extrema as well. As you’d expect,
the collection of all stationary points and singularity points are called
the critical points. However, we will not encounter two variable functions
with points of singularity in this course!
5.7.3 Minima, Maxima or Saddle point
We have just seen that a function of two variables has stationary points
when $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$. If (a, b) are the (x, y) coordinates of a stationary point
for f(x, y), then we can say
• A local maximum occurs when f(x, y) ≤ f(a, b) for all (x, y) close to
(a, b)
• A local minimum occurs when f(x, y) ≥ f(a, b) for all (x, y) close to
(a, b)
• A saddle point occurs when the stationary point is neither a local
maximum nor a local minimum
There is another way we can classify the stationary points of a function

of several variables. You should recall that for a function of one variable,
f = f(x), its extrema could be characterised simply by evaluating the
sign of the second derivative. In other words, for y = f(x), the stationary
points are where
\[
\frac{df}{dx} = 0
\]
Then for these values of x (where $\frac{df}{dx} = 0$) we examine the second derivative.
If
• $\frac{d^2 f}{dx^2} > 0$ then this corresponds to a local minimum
• $\frac{d^2 f}{dx^2} < 0$ then this corresponds to a local maximum
• $\frac{d^2 f}{dx^2} = 0$ then no decision can be made (e.g. $x^3$ or $x^4$).
Now we want to take this idea to functions of several variables. Can we

do this? Yes, but with some modifications. Without going into the details
(these are covered in a different course) we state the following test.
Characterising stationary points - second derivative test

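If 0 = ∇f at a point P then, at P compute
\[
D = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2
\]
Using D we can now classify the stationary points P.
\[
\begin{array}{lll}
\text{A local minimum} & \text{when } D > 0 & \text{and } \dfrac{\partial^2 f}{\partial x^2} > 0 \\
\text{A local maximum} & \text{when } D > 0 & \text{and } \dfrac{\partial^2 f}{\partial x^2} < 0 \\
\text{A saddle point} & \text{when } D < 0 & \\
\text{Inconclusive} & \text{when } D = 0 &
\end{array}
\]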
Example 5.29. Classify the stationary points of $f(x, y) = x^2 + y^2 - 2x - 6y + 14$.
Example 5.30. Classify the stationary points of $f(x, y) = y^2 - x^2$.
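The second derivative test is mechanical enough to be written as a small function. The sketch below is one possible encoding; the function name `classify` and its argument order are our own choices. It takes the values of f_xx, f_yy and f_xy at a stationary point, which must still be computed by hand (or otherwise) before calling it.

```python
def classify(f_xx, f_yy, f_xy):
    # Second derivative test at a stationary point, using
    # D = f_xx * f_yy - (f_xy)^2.
    D = f_xx * f_yy - f_xy**2
    if D > 0 and f_xx > 0:
        return "local minimum"
    if D > 0 and f_xx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# f = x^2 + y^2 - 2x - 6y + 14 has f_xx = 2, f_yy = 2, f_xy = 0 everywhere.
print(classify(2, 2, 0))
# f = y^2 - x^2 has f_xx = -2, f_yy = 2, f_xy = 0 everywhere.
print(classify(-2, 2, 0))
# f = 4 - x^2 - y^2 has f_xx = -2, f_yy = -2, f_xy = 0 everywhere.
print(classify(-2, -2, 0))
```

The final `return` only fires when D = 0, because D > 0 forces f_xx to be non-zero.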
5.7.4 Application of extrema
As a final note, we will now turn to some applications of the use of extrema.
Example 5.31. We are required to build a rectangular box with a volume

of 12 cubic centimetres. Since we are trying to economise on building costs,
we also require the box to be made out of the smallest amount of material.
What are the dimensions of the box that will satisfy these requirements?
Solution: First we need to set up our equations. Let the dimensions of the
box be x, y and z for the length, width and height respectively. The volume
of the box is given by
V = xyz = 12
The total surface area of the box is
A = 2xy + 2xz + 2yz
Rearranging the equation for the volume will give

\[
z = \frac{12}{xy}
\]
Substituting this into our equation for the surface area will give
\[
A = 2xy + 2x\,\frac{12}{xy} + 2y\,\frac{12}{xy} = 2xy + \frac{24}{y} + \frac{24}{x}
\]
Now we have a function A = f (x, y) which we can minimise. Taking partial

derivatives of A = f (x, y) with respect to x and y gives
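\[
\frac{\partial A}{\partial x} = \underline{\hspace{6em}}
\]
and
\[
\frac{\partial A}{\partial y} = \underline{\hspace{6em}}
\]
Now let $\frac{\partial A}{\partial x} = 0$ and $\frac{\partial A}{\partial y} = 0$ and solve for both x and y.
Thus the dimensions of the box are x = ______, y = ______ and z = ______.
The second partial derivatives (for the above values of x and y) are
\[
\frac{\partial^2 A}{\partial x^2} = \underline{\hspace{6em}}
\]
\[
\frac{\partial^2 A}{\partial y^2} = \underline{\hspace{6em}}
\]
\[
\frac{\partial^2 A}{\partial x\,\partial y} = \underline{\hspace{6em}}
\]
Using the Second Derivative Test,
\[
D = \frac{\partial^2 A}{\partial x^2}\,\frac{\partial^2 A}{\partial y^2} - \left(\frac{\partial^2 A}{\partial x\,\partial y}\right)^2
\]
Since D = ______ the dimensions of the box of volume 12 cm^3 that uses the
minimum amount of material are x = ______, y = ______, z = ______.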
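A completed solution can be checked with a few lines of code. The sketch below assumes the candidate answer x = y = 12^(1/3), which is our own working and not a value printed in the notes, and confirms that both first partial derivatives of A = 2xy + 24/y + 24/x vanish there and that the second derivative test indicates a minimum. The derivative formulas in the helpers are computed by hand.

```python
def area(x, y):
    # Surface area after eliminating z = 12 / (x y).
    return 2 * x * y + 24 / y + 24 / x

def grad_area(x, y):
    # Hand-computed first partial derivatives (A_x, A_y).
    return (2 * y - 24 / x**2, 2 * x - 24 / y**2)

def discriminant(x, y):
    # D = A_xx * A_yy - (A_xy)^2 with hand-computed second derivatives.
    a_xx = 48 / x**3
    a_yy = 48 / y**3
    a_xy = 2
    return a_xx * a_yy - a_xy**2

side = 12 ** (1 / 3)            # assumed candidate: x = y = cube root of 12
gx, gy = grad_area(side, side)
D = discriminant(side, side)
height = 12 / (side * side)     # z recovered from the volume constraint
print(side, height, gx, gy, D)
```

If the candidate is right, the gradient components print as (numerically) zero, D is positive, and the height equals the other two sides, so the box is a cube.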
Earlier we looked at methods of finding the shortest distance from a point
to a plane. We can now use extrema to answer questions such as these.
Example 5.32. A plane has the equation 2x + 3y + z = 12. Find the point
on the plane closest to the origin.
Solution: To answer this question we clearly want to minimise the distance
from the origin to the point on the plane. Let (x, y, z) be the point on the
plane. The distance between two points is given by
\[
d = \sqrt{(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2}
\]
Note that if we let $G = d^2$ and minimise G this is the same as minimising
d. Since $(x_0, y_0, z_0) = (0, 0, 0)$ we can now write
\[
G = x^2 + y^2 + z^2
\]
Using the equation of the plane, where z = 12 − 2x − 3y we now have
\[
G = x^2 + y^2 + (12 - 2x - 3y)^2
\]
We now take partial derivatives of G with respect to x and y.
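\[
\frac{\partial G}{\partial x} = \underline{\hspace{6em}}
\]
and
\[
\frac{\partial G}{\partial y} = \underline{\hspace{6em}}
\]
Let both $\frac{\partial G}{\partial x} = 0$ and $\frac{\partial G}{\partial y} = 0$ and solve for x and y (simultaneous equations
are handy here).
Thus x = ______ and y = ______. We can substitute these into the equation of
the plane to find z = ______.
The second partial derivatives, using the above values for x and y, are
\[
\frac{\partial^2 G}{\partial x^2} = \underline{\hspace{6em}}
\]
\[
\frac{\partial^2 G}{\partial y^2} = \underline{\hspace{6em}}
\]
and
\[
\frac{\partial^2 G}{\partial x\,\partial y} = \underline{\hspace{6em}}
\]
Using the Second Derivative Test,
\[
D = \frac{\partial^2 G}{\partial x^2}\,\frac{\partial^2 G}{\partial y^2} - \left(\frac{\partial^2 G}{\partial x\,\partial y}\right)^2 = \underline{\hspace{6em}}
\]
Thus the point on the plane that gives the minimum distance to the origin
is (x, y, z) = ( ______ ).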
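The simultaneous equations in this example are linear, so they can be solved exactly with a short script. The sketch below assumes the first partial derivatives work out to G_x = 10x + 12y − 48 and G_y = 12x + 20y − 72 (our own expansion of G = x^2 + y^2 + (12 − 2x − 3y)^2, not a result printed in the notes) and applies Cramer's rule with exact fractions.

```python
from fractions import Fraction

# Assumed stationarity conditions, expanded by hand:
#   G_x = 10x + 12y - 48 = 0
#   G_y = 12x + 20y - 72 = 0
a11, a12, b1 = 10, 12, 48
a21, a22, b2 = 12, 20, 72

# The coefficient matrix holds G_xx, G_xy, G_yy, so its determinant is also
# the quantity D used in the second derivative test.
det = a11 * a22 - a12 * a21

# Cramer's rule for the 2 x 2 system.
x = Fraction(b1 * a22 - a12 * b2, det)
y = Fraction(a11 * b2 - b1 * a21, det)
z = 12 - 2 * x - 3 * y          # back-substitute into the plane equation

print(x, y, z, det)
```

Because G is quadratic, its second partial derivatives are constants, which is why the same determinant serves both to solve the system and to classify the stationary point.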
Example 5.33. You are given three positive numbers. The product of the
three numbers is P. The sum of the three numbers is 10.
(a) Find the three numbers that will give a maximum product.
(b) Show that this gives a maximum product.
(c) If it was required that the three numbers be whole numbers, can you find
the three (non-zero) numbers that sum to 10 and give a maximum product?
Does your answer to (a) help you find this?
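Part (c) has only a handful of candidates, so it can be explored by brute force. The sketch below enumerates positive whole-number triples a ≤ b ≤ c with a + b + c = 10 and keeps the one with the largest product. For comparison it also evaluates the product of three equal real numbers summing to 10, which is assumed here to be the natural candidate for part (a) rather than taken from the notes.

```python
best_triple = None
best_product = 0

# Enumerate whole-number triples a <= b <= c with a + b + c = 10.
for a in range(1, 9):
    for b in range(a, 9):
        c = 10 - a - b
        if c < b:
            continue                      # skip repeats and non-positive c
        product = a * b * c
        if product > best_product:
            best_triple, best_product = (a, b, c), product

# Product when the three numbers are equal real numbers (assumed candidate).
continuous_product = (10 / 3) ** 3

print(best_triple, best_product, continuous_product)
```

Comparing the two products shows how close the best whole-number choice sits to the unconstrained one.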