Calculus 1A Course Notes
Edward Green
School of Mathematical Sciences
University of Adelaide
These lecture notes contain all the material you will be required to know for Mathematics IA
(Calculus). They are an aid to your learning, and will complement the lectures. Whilst there will
be considerable overlap, the material presented here will not be identical to what you see in lectures,
and you will put yourself at a disadvantage if you do not attend them. In particular, lectures will
feature active learning experiences which will help you develop and test your understanding, and
provide opportunities to ask questions of the lecturer or discuss important concepts with your fellow
students.
Contents
Preface
1 Functions
1.1 Functions
1.2 Definition of a function
1.2.1 How to denote the domain and range of a function
1.3 Other examples of functions
1.4 Inverse functions
1.4.1 Calculating the inverse of a function
1.5 Inverse trigonometric functions
1.6 Zeros of functions
1.7 Limits
1.7.1 Basic properties of limits
1.7.2 The limit laws
1.8 Continuous functions
1.9 The interval bisection method
1.10 Summary of learning outcomes
2.7 Optimisation
2.8 Applications to marginality
2.9 Summary of learning outcomes
3 Integration
3.1 Finding the displacement of a vehicle from the velocity
3.2 Summation notation
3.3 Defining the definite integral
3.3.1 Using summation to calculate definite integrals
3.3.2 Definite integrals and areas
3.4 Antiderivatives and The Fundamental Theorem of Calculus
3.4.1 Indefinite integrals
3.5 Integration by substitution
3.6 Integration by parts
3.7 Application: rocket flight
3.8 Trigonometric substitutions
3.9 Application: population growth
3.10 Partial fractions
3.11 Integration of rational functions
3.12 Improper integrals
3.12.1 Improper integrals of the first kind
3.12.2 Improper integrals of the second kind
3.13 Summary of learning outcomes
Chapter 1
Functions
Lecture 1
1.1 Functions
In science, technology, finance and many other fields, we need to be able to express relationships
between quantities in a precise way. For example, we might be interested in the force required to
make a body accelerate at a particular rate, or the amount of energy that could be released from
a certain mass. Mathematics provides a natural way of expressing these relationships, e.g. $F = ma$,
$E = mc^2$. The quantities which are related to each other are called the variables, and the
relationship itself is encapsulated by a function, a rule which relates the values of the variables.
For the simplest case where there are just two quantities which are related to each other, you will
already be familiar with writing these kinds of relationships in the form:
y = f (x),
where, of course, the choice of symbols is arbitrary, and the meaning of the quantities x and y, and
the nature of the function, f , will depend on the situation being considered.
We can think of x (which is usually called the independent variable) as the input. Then, the
function, f , tells us what to do to x to get the value of the dependent variable, y. In this course, we
will deal only with relationships where x and y are real numbers. In precise mathematical language,
we say that f is a real-valued function of a single real variable.
You will have already met a wide variety of functions of this type, such as:
• Polynomials
$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$
is a polynomial function of degree n (if $a_n \neq 0$; each of the $a_0, a_1, \ldots, a_n$ is a (constant) real
number).
• Rational Functions
If p(x) and q(x) are polynomial functions, then
$r(x) = \frac{p(x)}{q(x)}$
is a rational function. Note that when q(x) = 0, r(x) is undefined.
• Trigonometric Functions
Recall that the basic definitions of the trigonometric functions are based on the unit circle
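[Figure: the unit circle, with the point (x, y) on the circle at angle θ.]
$\cos\theta = x, \quad \sin\theta = y, \quad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{y}{x}$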
Remember: In mathematics, angles are always radians.
[Figure: graphs of sin x and cos x.]
There are three other trigonometric functions, which may be a little less familiar:
$\sec\theta = \frac{1}{\cos\theta}, \quad \csc\theta = \frac{1}{\sin\theta}, \quad \cot\theta = \frac{1}{\tan\theta} = \frac{\cos\theta}{\sin\theta}$
• Exponential functions
These are functions of the form $f(x) = a^x$, where a is a positive real number.
You should ensure you are familiar with the basic properties of exponential functions, such
as:
$a^0 = 1, \quad a^x a^y = a^{x+y}, \quad (a^x)^y = a^{xy},$
where x and y are any real numbers. You will need to use these properties in assignments
and tutorials, and in the final examination.
[Figure: graphs of $e^x$, $2^x$ and $(1/2)^x$.]
• Logarithmic functions
These are closely related to exponential functions. If $x = a^y$ ($a > 0$, $a \neq 1$), then the
logarithm to the base a is $\log_a x = y$ (i.e. $\log_a x$ is the power to which we must raise a to get
x).
The logarithm to the base e ( ≈ 2.72) is called the natural logarithm (or natural log for short)
and has the special notation ln x. Like its counterpart, the exponential function, it also plays
an important role in calculus.
You should ensure you are familiar with the following properties:
$\log_a(xy) = \log_a x + \log_a y, \quad \log_a\frac{x}{y} = \log_a x - \log_a y,$
$\log_a x^y = y \log_a x, \quad \log_b x = \frac{\log_a x}{\log_a b},$
where x and y are positive real numbers (in the third property, y may be any real number), and
$b > 0$, $b \neq 1$.
[Figure: graphs of ln x, $\log_2 x$ and $\log_{10} x$.]
In fact, the mathematical concept of a function is very broad. In earlier work, you have probably
become used to seeing functions which can be described by a single, straightforward equation
(e.g. $f(x) = x^2$), and which have graphs that are nice smooth curves. But functions are allowed to
have jumps or kinks, can be specified using words or tables rather than formulae, or be described
by different formulae for different values of x. The precise mathematical definition of a function
below makes this clear.
Definition 1.1 (Definition of a function). Let D be a set of real numbers. A function f with
domain D is a rule (or a set of rules) that assigns to each number x in D a unique real number
called f (x).
The range of f is the set of all values of f (x), that is, R = {f (x) | x is a number in D}.
Remarks:
• We will sometimes use the notation D(f ) and R(f ) for the domain and range of f , respectively,
particularly if we need to distinguish between the domains (or ranges) of different functions
(e.g. the domain of g versus the domain of f ).
• Note the significance of the phrase ‘. . . assigns to each number x ∈ D a unique real number
called f (x).’ This means that, for example, $f(x) = \pm\sqrt{x}$ (x ≥ 0) does not satisfy the
definition of a function, because it assigns two values of f (x) to each nonzero value of x. (For
x = 4, it would give the values of +2 and −2 for f (x).)
• The definition does not mean that there are no values $x_1$, $x_2$ ($x_1 \neq x_2$) such that
$f(x_1) = f(x_2)$. For example, $f(x) = x^2$ (where x can be any real number) obeys the definition of a
function, even though f (2) = f (−2) = 4; each value of x is only assigned a single value, f (x).
• Formally, in order to specify a function completely we must give the domain D and range R.
However, for brevity, often we do not write them down explicitly. In these cases we adopt the
convention that the domain is the largest possible subset of the real numbers for which the
function can be defined. The range is then the set of points which are the images under the
function of points in the domain.
Sometimes, we may want to exclude certain points from an interval, or join two (or more)
intervals together to give the complete domain or range. Intervals are examples of sets, and the
following items of set notation can be useful for expressing these kinds of relationships concisely.
If A and B are sets (e.g. intervals), then:
• x ∈ A means that the object x is an element of the set A (e.g. x ∈ [0, 1] means 0 ≤ x ≤ 1)
• A ∩ B means the intersection of the sets A and B: A ∩ B = {x | x ∈ A and x ∈ B}, so [0, 3] ∩ [1, 4] =
[1, 3]
• A\B means the difference of the sets A and B: A\B = {x | x ∈ A but x ∉ B}. For example,
(0, 2) ∪ (2, 3) = (0, 3)\{2}.
Example 1.1. If f (x) = sin x, x ∈ R (that is, f has domain R) and if g(x) = sin x, −π/2 ≤ x ≤ π/2,
then f and g are different functions.
Usually we consider sin x, x ∈ R (that is, function f ), but sometimes we will want to consider
sin x for some (proper) interval in R; after a few lectures we return to this example.
Example 1.2. If we write $f(x) = \sqrt{x + 2}$, and do not specify the domain, then implicitly we mean
$f(x) = \sqrt{x + 2}, \quad x \ge -2;$
that is, the domain is the set {x | x ≥ −2}, the largest subset of the real numbers for which
$f(x) = \sqrt{x + 2}$ is defined (as a real number).
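• The hyperbolic functions are a collection of functions defined in terms of the exponential
function. The most important are sinh, cosh and tanh, which are defined to be:
$\sinh x = \frac{e^x - e^{-x}}{2}, \quad \cosh x = \frac{e^x + e^{-x}}{2}, \quad \tanh x = \frac{\sinh x}{\cosh x}.$
The remaining hyperbolic functions are their reciprocals:
$\operatorname{sech} x = \frac{1}{\cosh x}, \quad \operatorname{cosech} x = \frac{1}{\sinh x}, \quad \coth x = \frac{1}{\tanh x}.$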
The naming of the hyperbolic functions deliberately echoes that of the trigonometric functions.
As we will see during the course, they have a number of similar properties: for example
$\cosh^2 x - \sinh^2 x = 1$ (compare the trigonometric identity $\cos^2 x + \sin^2 x = 1$). (In later
courses, you will learn how the two types are closely connected through the theory of complex
functions.) However, unlike the trigonometric functions, the hyperbolic functions are
not periodic. Graphs of cosh x, sinh x and tanh x are shown below.
[Figure: The hyperbolic functions. Graphs of cosh x, sinh x and tanh x.]
Many people encounter difficulty with pronouncing the names of some of the hyperbolic
functions. Although there are some variations between people and countries (particularly US
vs. UK) the following pronunciations are widely used:
• The absolute value function (also known as the modulus function), f (x) = |x|, is defined
by a two part rule:
$f(x) = |x| = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$
[Figure: graph of |x|.]
• The Heaviside function, H(x), which is defined to be
$H(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$
This function is also called the unit step function. It is frequently encountered in electronics
applications, since, if we let the independent variable be time, t, H(t) represents a signal which
switches on at t = 0.
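• The Dirichlet function, D(x), which is defined to be
$D(x) = \begin{cases} 1 & \text{if } x \text{ is a rational number,} \\ 0 & \text{otherwise.} \end{cases}$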
The Dirichlet function has many strange properties - for example, what would its graph look
like? However, it clearly is a function, as it satisfies Definition 1.1 above!
Consider the problem of pricing a ticket for a bus or train journey. Since many of the costs involved
scale with the distance travelled, in the past, it was common for companies to calculate fares using
a fixed rate per kilometre (this method is still used in some places). Mathematically, we would
say the cost C (in dollars) of your ticket would be a function of the distance x that you would be
travelling - i.e. C = f (x). For example, if the transport company set a rate of $0.30 per km, we
would have C = f (x) = 0.3x (a linear function).
For this type of pricing scheme, as well as being able to calculate the cost of the ticket when
the distance to be travelled is known, we can reverse the process, and find the distance travelled
if we know the cost. For example, if a ticket from my home to the city is $3.60, then the distance
between these two points must be 12 km (3.60/0.3). In mathematical terms, what we have done
here is calculate the inverse of the function, f , which we denote by $f^{-1}$. If f tells us the price of
the ticket (output) given the distance (input), the inverse $f^{-1}$ reverses the process; it tells us the
distance given the cost. We would write $C = f(x) = 0.3x \Leftrightarrow x = f^{-1}(C) = C/0.3 = 10C/3$.
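The distance-based fare and its inverse can be sketched in a few lines of Python. This is only an illustration of the example above; the names `RATE_PER_KM`, `fare` and `fare_inverse` are chosen here and are not part of the notes.

```python
RATE_PER_KM = 0.30  # dollars per kilometre, the rate used in the example


def fare(distance_km: float) -> float:
    """C = f(x) = 0.3x: ticket cost (dollars) for a given distance (km)."""
    return RATE_PER_KM * distance_km


def fare_inverse(cost: float) -> float:
    """x = f^{-1}(C) = C / 0.3: distance (km) for a given cost (dollars)."""
    return cost / RATE_PER_KM


distance = fare_inverse(3.60)
print(round(distance, 6))        # distance for a $3.60 ticket
print(round(fare(distance), 6))  # applying f again recovers the cost
```

Applying `fare` after `fare_inverse` returns the original cost (up to floating-point rounding), which is the defining property of an inverse.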
Nowadays, most cities have introduced a more streamlined pricing system, where fares are con-
stant within certain zones, or simply charged at a flat rate. For example, Adelaide Metro charges
around $3.60 irrespective of distance. In this case, the cost of the ticket can still be thought of
as a function of distance C = f (x) = 3.60 (a constant function). Note that now, although it is
straightforward to calculate the fare when the distance to be travelled is known (the result is always
$3.60), we can no longer reverse the process to find out the distance travelled from the ticket cost.
[Figure: Finding the cost given the distance travelled (and vice versa). Graphs of C(x) against distance x for the distance-based fare and the flat fare.]
When can we find the inverse of a function f ? What property of f does it rely on?
If we draw the graphs of the two functions, what is going on becomes clearer. In both cases,
given a value of x, we can find the value of C by moving from x vertically upwards to the graph of
the function, then moving leftwards to read off the value of C. As both functions obey Definition
1.1, we get one corresponding value of C for each x (though the value of C we get may be the
same for many values of x). Now, let’s try to reverse the process, beginning with the first pricing
scheme (by distance). We start by reading off the cost on the vertical axis, and move rightwards
until we hit the line, then move vertically down until we hit the x-axis, where we can read off the
corresponding distance. If we try this with the flat fare pricing structure, we cannot determine a
value of x (it could be any distance x).
The examples illustrate the property we need to be able to invert a function. Whilst for any
function (by definition) there will only be one output value f (x) for any input value x, in order
to reverse the process, we need to be certain that for every output value f (x) there is only one input
value x. If we have two values $x_1 \neq x_2$, with $f(x_1) = f(x_2)$ (which is permitted by the definition of
a function), then knowing the value of f does not allow us to determine the value of x; it could be
$x_1$ or $x_2$. Functions that have the property that each value of f (x) corresponds to only one value
of x are called one-to-one (1-1) or injective. The definition below makes this idea precise.
Definition 1.2 (One-to-one functions). A function f : D → R (D, R, the domain and range
of f , are subsets of the real numbers) is one-to-one (1–1 ) if, for all $x_1, x_2$ in D,
$f(x_1) = f(x_2)$ implies $x_1 = x_2$.
or more, then we would have a y value that corresponds to two or more x values. Then the function
could not be one-to-one). This gives us a useful way of testing if a particular function is one-to-one
or not.
Horizontal Line Test A function f is 1–1 if and only if any horizontal line meets the graph of f
at most once.
Increasing, decreasing and monotonic functions. Functions which are either always increasing
or always decreasing are called monotonic. Mathematically, we define them as follows:
If f (x) is monotonic (either always increasing, or always decreasing), any horizontal line can
cross the graph of f (x) at most once. (If the graph crosses a horizontal line twice, then there must
be x1 < x2 with f (x1 ) = f (x2 ), violating monotonicity.) Thus, using the horizontal line test, we
can see that monotonic functions are one-to-one.
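As a rough numerical illustration of this argument, the sketch below checks whether sampled values of a function are strictly increasing or strictly decreasing. The helper name `is_strictly_monotonic_on_samples` and the sample grid are assumptions made for this example, and a check on finitely many points can only suggest monotonicity; it does not prove it.

```python
def is_strictly_monotonic_on_samples(f, xs):
    """Return True if f is strictly increasing or strictly decreasing on the sorted samples xs."""
    ys = [f(x) for x in xs]
    increasing = all(a < b for a, b in zip(ys, ys[1:]))
    decreasing = all(a > b for a, b in zip(ys, ys[1:]))
    return increasing or decreasing


# Sample points from -2 to 2 in steps of 0.01
xs = [-2 + 0.01 * k for k in range(401)]

print(is_strictly_monotonic_on_samples(lambda x: x**3, xs))  # x^3 is increasing on R
print(is_strictly_monotonic_on_samples(lambda x: x**2, xs))  # x^2 decreases, then increases
```

The second function fails the check because it takes the same value at x and −x, which is exactly the situation the horizontal line test rules out.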
Now we need to find a method for calculating the inverse. Consider the process we use when
reading off the value of f (x) given x from a graph. We find the required value of x on the horizontal
axis, move vertically upward until we meet the graph of the function, and then move horizontally
to read off the value of y = f (x). If we want to find x given y = f (x), we do this by starting
with the y value, tracing across to the graph of f (x), and then reading off the value of x (this is
equivalent to finding $x = f^{-1}(y)$). Note that the only difference is that we have exchanged the
roles of the x and y axes. Swapping the x and y axes is equivalent to performing a reflection in the
line y = x. Hence, if we take the graph y = f (x), and reflect it in the line y = x, we end up with
the graph of $x = f^{-1}(y)$ (where the y axis is now the horizontal axis, and the x axis is the vertical
one). Finally, if we swap the symbols x and y to get a horizontal x axis and vertical y axis, then
our graph will be $y = f^{-1}(x)$. This gives us a method by which we can find the inverse function
$f^{-1}(x)$ for a given f (x).
We now have the functional form for the inverse function, specified in terms of the variable y.
There is nothing particularly special about the symbols x and y; we could simply change them
to, say, α and β, giving us $\alpha = f^{-1}(\beta)$. However, we often want to have the inverse function
specified as a function of x (for example, this can be convenient for plotting purposes). To
obtain it in that form, there is one final step.
Using the formula V = f (t), we can find the volume of the balloon at a given time, t. However,
we in fact want to find the time, given the volume. This is given by $t = f^{-1}(V)$ (we can think of this
as taking the relationship V = f (t) and applying $f^{-1}$ on both sides, using the fact $f^{-1}(f(t)) = t$).
Note that we know $f^{-1}$ exists, since we are told f is increasing. Now, we know that, for a sphere,
the relationship between the volume V and radius r is $V = \frac{4}{3}\pi r^3$. Thus, when the radius is 1 m,
the volume is $\frac{4}{3}\pi$ m$^3$. So, the time when the balloon attains a radius of 1 m is $t = f^{-1}\left(\frac{4}{3}\pi\right)$.
given the lengths of two sides. In order to do this, you will at some point have used the button on
your calculator marked $\sin^{-1}$ or arcsin, which gives you the angle, θ, when sin θ is known. Hence,
this button calculates the inverse function of sin θ. How is this possible?
Recall that we said that, formally, f (x) = sin x, x ∈ R and g(x) = sin x, x ∈ [−π/2, π/2] are
different functions, since their domains are different. If we look at the plot of g(x) below, we can
see that it is increasing on [−π/2, π/2] and so it is one-to-one. Hence, it has an inverse function, $g^{-1}(x)$;
this inverse function is written as arcsin x, or $\sin^{-1} x$. Essentially, in order to be able to find the
inverse of the sin function, we ‘restrict the domain’ to make the function one-to-one. The choice to
restrict the domain to [−π/2, π/2] is conventional, but arbitrary; any other domain on which the sin
function is one-to-one would be equally valid.
[Figure: graph of g(x) = sin x, −π/2 ≤ x ≤ π/2.]
As the range of g is R(g) = [−1, 1], the domain D(arcsin) = [−1, 1]. Also the domain of g is
D(g) = [−π/2, π/2], so the range R(arcsin) = [−π/2, π/2].
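The effect of restricting the domain can be seen numerically. The sketch below uses `math.asin` from Python's standard library, which returns values in [−π/2, π/2]; the two test angles are chosen here purely for illustration.

```python
import math

# 0.5 lies inside [-pi/2, pi/2]; 2.0 lies outside it
for theta in (0.5, 2.0):
    recovered = math.asin(math.sin(theta))
    print(theta, recovered)
```

For θ = 0.5 the original angle is recovered, but for θ = 2.0 the result is π − 2 rather than 2, because arcsin inverts only the restricted function g, not sin on all of R.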
[Figure: graph of $g^{-1}(x) = \arcsin x = \sin^{-1} x$, −1 ≤ x ≤ 1.]
Inverse tangent We similarly restrict the domain of tan so we can define the inverse tangent
function arctan or $\tan^{-1}$. Let g(x) = tan x, −π/2 < x < π/2. As we can see from the graph below, on
the interval (−π/2, π/2) tan x is increasing and therefore 1–1.
[Figure: graph of g(x) = tan x, −π/2 < x < π/2.]
Example 1.5. Although we can define inverse sin, cos and tan functions by restricting their
domains in this way, we need to be aware that we have done this when we are using them to solve
problems. Recall the Sine Rule, which relates the angles and lengths of the sides of any triangle
(not necessarily right-angled):
$\frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c}.$
[Figure: Definition sketch for the Sine Rule and Cosine Rule: a triangle with angles A, B, C and opposite sides a, b, c.]
Consider a triangle such that a = 11 cm, b = 15.5 cm and the angle A = 0.75. What is the
angle B?
In fact, it does not. Recall that, for any angle θ, sin θ = sin(π − θ). Hence, we need to check
whether π − 1.29 ≈ 1.85 is also a solution for B. Since the angles in a triangle must add up to π
and 0.75 + 1.85 = 2.60 < π, this is indeed another possible solution.
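The two candidate values of B can be reproduced with a short calculation. This is a sketch using the numbers from the example (a = 11 cm, b = 15.5 cm, A = 0.75); the variable names are chosen here for illustration.

```python
import math

a, b, A = 11.0, 15.5, 0.75  # sides in cm, angle in radians (values from the example)

sin_B = b * math.sin(A) / a  # Sine Rule: sin A / a = sin B / b
B1 = math.asin(sin_B)        # the value in [-pi/2, pi/2] returned by arcsin
B2 = math.pi - B1            # second candidate, since sin(pi - B) = sin B

for B in (B1, B2):
    fits_in_triangle = A + B < math.pi  # the remaining angle C must be positive
    print(round(B, 2), fits_in_triangle)
```

Both candidates leave a positive third angle, so both are geometrically possible, in agreement with the discussion above.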
Question: The Cosine Rule states that for any triangle (not necessarily right-angled)
$a^2 = b^2 + c^2 - 2bc \cos A$
(where we have used the same notation as in the Sine Rule above). Can we ever have two possible
solutions if we use the Cosine Rule to calculate an angle? If not, why not?
[Figure: graph of the height h(x) = x(10 − x) (blue curve) together with the horizontal line at height 5 (red line).]
In practical applications, we frequently need to be able to find the value of x, such that f (x)
takes a particular value. For example, let the height above ground of a ball (in metres) when it is a
horizontal distance x from the point where it was thrown be given by h(x) = x(10−x). How far has
the ball travelled horizontally when it is five metres above the ground? Mathematically, we need to
find the values of x which satisfy the equation x(10−x) = 5. These values occur where the blue curve
meets the red line in the graph above. If we define a new function f (x) = h(x) − 5 = x(10 − x) − 5,
our problem reduces to finding the values of x satisfying f (x) = 0. These values of x are called the
zeros of the function, f (x).
This is a quadratic equation, which we can easily solve using the quadratic formula. The zeros of
f (x) are
$x = \frac{-10 \pm \sqrt{100 - 20}}{-2} = 5 \pm \sqrt{20} = 5 \pm 2\sqrt{5}.$
As this example illustrates, many problems reduce to finding the zeros of a particular function.
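The zeros found above can be checked numerically. The sketch below writes f (x) = x(10 − x) − 5 as −x² + 10x − 5 and applies the quadratic formula; the variable names are chosen here for illustration.

```python
import math

# f(x) = x(10 - x) - 5 = -x^2 + 10x - 5, so a = -1, b = 10, c = -5
a, b, c = -1.0, 10.0, -5.0
disc = b * b - 4 * a * c  # discriminant: 100 - 20 = 80
roots = sorted((-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1))


def f(x):
    return x * (10 - x) - 5


for r in roots:
    print(r, f(r))  # f(r) should be zero up to rounding error
```

The two roots agree with 5 − 2√5 and 5 + 2√5, the two horizontal distances at which the ball is five metres above the ground.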
Now consider the general problem of finding the zeros of a function, f (x). If we can write down
the inverse function f −1 , this is very straightforward, since if f (x) = 0, then x = f −1 (0). However,
there are many situations, including the example of the thrown ball above, where we want to find
the zeros of a function which is not one-to-one and hence has no inverse. In other cases, it may not
be possible to write down the inverse in an explicit form. Then what should we do?
In the case of the thrown ball, although the function is not one-to-one, the problem is still
straightforward, because we can use the quadratic formula to calculate the zeros. But for most
functions that you could think of, no such simple formula would exist. Suppose we want to find
solutions of cos x = x. This is equivalent to finding the zeros of g(x) = cos x − x. Although this
function is one-to-one on x ∈ [0, 1] (as can be seen from the graph of g(x)), we cannot write down
an expression for the inverse function explicitly. However, the graph does suggest we might proceed
by noting that g(0.8) ≈ −0.103 < 0 and g(0.7) ≈ 0.065 > 0. Since the value of the function changes
sign between x = 0.7 and x = 0.8, we could reason that a solution will lie somewhere in the interval
(0.7, 0.8). What property of the function are we relying on here when we make that deduction? Can
you think of a function where f (a) < 0 and f (b) > 0, but there is no value of x ∈ (a, b) where
f (x) = 0?
[Figure: graph of g(x) = cos x − x for 0 ≤ x ≤ 1.]
This f obeys the definition of a function we gave earlier, and f (−1) < 0 and f (1) > 0, but since
f (x) is nowhere equal to zero, there cannot be a solution of f (x) = 0 for x ∈ (−1, 1). How has the
reasoning that worked for g(x) = cos x − x gone wrong here?
The reasoning fails because the function f (x) ‘jumps’ from the value −1 to +1, ‘skipping
out’ zero, whereas g(x) goes through every value between −0.103 and 0.065. The graph of g(x)
is an unbroken line, whilst that for f (x) has a ‘gap’ at x = 0. Clearly, our method of tracking
down the zero by finding places where the function changes sign will only work for functions like
g(x) where the graph is an unbroken curve, which excludes some functions.
The property that a function must have in order for our zero-finding method to work is called
continuity. Roughly speaking, it means that we can draw the graph of the function without ever
having to take our pen off the paper. However, in order to define more precisely what we mean by
continuity, we must first revisit the idea of a limit, which you have probably already come across
in connection with the definition of a derivative (a topic we will return to in Chapter 2).
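The sign-change reasoning used for g(x) = cos x − x can be sketched numerically. The code below assumes that g is continuous on [0.7, 0.8] and repeatedly halves the interval, keeping the half on which the sign change occurs. The function name `bisect_zero` and the tolerance are choices made for this illustration, not a prescribed method.

```python
import math


def g(x):
    return math.cos(x) - x


def bisect_zero(f, lo, hi, tol=1e-8):
    """Narrow [lo, hi] around a zero of f, assuming f is continuous and changes sign on it."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # sign change (or a zero) lies in the left half
        else:
            lo = mid  # otherwise it lies in the right half
    return (lo + hi) / 2


print(round(g(0.7), 3), round(g(0.8), 3))  # opposite signs at the two endpoints
root = bisect_zero(g, 0.7, 0.8)
print(root, g(root))
```

Without continuity the same procedure can fail: for a function that jumps over zero, the interval still shrinks, but it closes in on the jump rather than on a zero.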
Lecture 4
1.7 Limits
In order to define continuity, we first need to think carefully about what happens to the value of
f (x) as x approaches some value of interest. We intuitively define the limit of f (x) as x approaches
c as follows.
Definition 1.5 (Limits). Let f be a function defined on a domain that includes all values
close to c, but need not include c itself. For example, the domain could be a set of the form
(a, c) ∪ (c, b) = (a, b) \ {c} where a < c < b. We say that the limit of f (x) as x approaches c is
the real number L if the values of f (x) get closer and closer to L as x gets closer and closer to
c (with x ≠ c). We write
$\lim_{x \to c} f(x) = L.$
The example above is straightforward; in this case, c = −2 is actually in the domain of f , and
the value of f (x) gets closer and closer to f (c) as x → c.
It is important to realise that not all examples of limits are like the preceding example where we
can substitute for the value of f (x) at c. In fact, the number $L = \lim_{x \to c} f(x)$ depends on the
values of f (x) for x close to c but not at x = c. The definition makes this clear, since the function
f does not even need to be defined at c. However, if f is defined at c, we can arbitrarily redefine
its value at c without affecting L.
and c = −2.
Now, suppose we let x get closer and closer to −2, but making sure we always choose values
such that x > −2. Then f (x) approaches 1. If we do the same thing, but this time always choosing
values of x < −2, f (x) again approaches 1. Thus, once again, L = 1 – despite the fact that
f (−2) = −7.
In the next example, we illustrate that the limit L can exist even if f (c) is undefined.
x f (x)
0.1 0.9983341665
0.01 0.9999833334
0.001 0.9999998333
0.0001 0.9999999983
It is important to note that, in determining $\lim_{x \to c} f(x)$, we must consider values on both sides
of c. Although we did not explicitly check negative values of x in our table above, if we were to do
so, we would get the same result, L = 1, since for this function, f (x) = f (−x).
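The table can be reproduced numerically. The sketch below assumes the tabulated function is f (x) = sin(x)/x, which is consistent with the values shown and with the symmetry f (x) = f (−x); that formula is an assumption here, since it is not restated in this passage.

```python
import math


def f(x):
    # Assumed form of the tabulated function; it is undefined at x = 0
    return math.sin(x) / x


for x in (0.1, 0.01, 0.001, 0.0001):
    # Columns: x, f(x), f(-x)
    print(f"{x:<8} {f(x):.10f}  {f(-x):.10f}")
```

The second and third columns agree, illustrating that approaching 0 from either side gives the same values.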
Thus far we have seen examples where the limit exists. However limits often do not exist. For
example consider
$f(x) = \sin\frac{1}{x}.$
As x approaches 0 this oscillates wildly between −1 and 1 and is definitely not getting closer and
closer to any particular value L.
Similarly, if we consider the Heaviside function,
$H(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases}$
with c = 0, then we can see that if we always take negative values of x, as x gets closer and closer
to zero, H(x) = 0. However, if we let x take only positive values, then, as we get closer and closer
to zero, H(x) = 1. Here, the limit again does not exist, since we cannot get a consistent value for L
when we consider x values on both sides of zero. An alternative way to show this is to think about
what happens to $H(x_n)$ for $x_n = \left(-\frac{1}{2}\right)^n$ for n = 1, 2, 3, . . .. Although $x_n$ gets closer and closer to
zero as n increases, the value of the function oscillates between the values of 0 and 1; it does not
get any closer to a fixed value, L.
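The oscillation along this sequence is easy to see numerically. The following sketch evaluates H at the first few terms of the sequence; the number of terms shown is an arbitrary choice.

```python
def H(x):
    """Heaviside (unit step) function."""
    return 1 if x >= 0 else 0


# x_n = (-1/2)^n for n = 1, ..., 8: the terms alternate in sign and shrink towards zero
xs = [(-0.5) ** n for n in range(1, 9)]
values = [H(x) for x in xs]
print(xs)
print(values)
```

The inputs approach zero, but the outputs keep alternating between 0 and 1, so they do not settle on any single value L.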
The definition of limit we have given above is ‘intuitive’ in that it suffers from a number of
ambiguities due to the lack of precision of the everyday language employed. For example what does
‘closer and closer to’ mean? It could be read as meaning that the values of f (x) are monotonically
approaching L as x approaches c. This is definitely not the case as we can see by considering the
example
$f(x) = x \sin\frac{1}{x}.$
Like the preceding example this also oscillates up and down but the amplitude of the oscillation
decreases as we approach 0 because of the x factor. It has limit L = 0 as x approaches 0.
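A few sample values illustrate this behaviour. The sketch below evaluates x sin(1/x) at points approaching 0 and checks the bound |x sin(1/x)| ≤ |x|, which holds because |sin(1/x)| ≤ 1; the sample points are chosen here for illustration.

```python
import math


def f(x):
    # x * sin(1/x); undefined at x = 0
    return x * math.sin(1 / x)


for x in (0.1, 0.01, 0.001, 0.0001):
    # The final column checks the bound |f(x)| <= |x|
    print(x, f(x), abs(f(x)) <= abs(x))
```

The values need not shrink monotonically and may change sign, but they are squeezed towards 0 because their size never exceeds |x|.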
In order to avoid these kinds of issues, which result from the imprecision of everyday language,
mathematicians use a more formal definition of limits. Many of you will meet this definition in
future, when taking more advanced mathematics courses. However, for the purposes of this course,
the intuitive definition we have presented above will suffice.
Later in the course, we will need to be able to calculate limits. Hence, in the next two sections,
we present some basic facts about them, together with some useful properties that are known as
the limit laws.
Property 1.1 (Limits are unique). If $\lim_{x \to c} f(x)$ exists then it is unique.
That is, if $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} f(x) = M$ then L = M .
Note that this makes sense intuitively, since if L and M are not equal they must be some distance
apart and if f (x) is very close to one of them it cannot also be very close to the other.
Again these are intuitive: if x gets closer and closer to c then x gets closer and closer to c! Also,
if K is a constant then as x gets close to c, K does not change and so can only be close to itself.
To calculate results such as $\lim_{x \to 0} (x^{34} + 3x^7 + 2) = 2$ we use these elementary limits and the limit
laws.
Property 1.4 (Limit Laws). Let f and g be functions both defined in (a, b)\{c} (a neighbourhood
of c, not including c itself) and suppose that $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$.
(iii) $\lim_{x \to c} \frac{f(x)}{g(x)}$ $= \frac{L}{M}$ if $M \neq 0$; does not exist if $M = 0$, $L \neq 0$; may or may not exist if $M = 0$, $L = 0$.
(iv) Composition of functions: if $\lim_{x \to c} g(x) = M$ and $\lim_{x \to M} f(x) = f(M)$ then, if f is defined in
an open interval containing M , $\lim_{x \to c} f(g(x)) = f(M)$.
Example 1.9.
$P(x) = a_0 + a_1 x + \cdots + a_n x^n,$
then
$\lim_{x \to c} P(x) = P(c).$
Similarly for any rational function $r(x) = \frac{P(x)}{Q(x)}$ where P (x) and Q(x) are polynomials, we can
use the limit laws to show that
$\lim_{x \to c} r(x) = \frac{P(c)}{Q(c)}$ provided $Q(c) \neq 0$.
Example 1.11.
Example 1.12.
$\lim_{x \to 1} (x^2 + 3)^6 = \left[ \lim_{x \to 1} (x^2 + 3) \right]^6$ (using limit law for composition of functions)
$= [4]^6 = 4096.$
Sometimes a limit does not exist because as x approaches c, f (x) becomes arbitrarily large so
that no matter what number L you choose f (x) will not get closer and closer to it. If f (x) is always
positive as x → c, we denote this limit by ∞, but be aware that ∞ is not a number and writing
$\lim_{x \to c} f(x) = \infty$
is just a shorthand way of saying that f (x) gets larger and larger as x gets closer and closer to c.
Similarly, if f (x) is negative, but can become arbitrarily large in magnitude as x → c, then we
would write
$\lim_{x \to c} f(x) = -\infty.$
However, it is not true that $\lim_{x\to 0} \frac{1}{x} = \infty$, since $\frac{1}{x}$ is large and negative for negative values of $x$ close to zero and large and positive for positive values of $x$ close to zero (on the other hand, $\lim_{x\to 0} \frac{1}{|x|} = \infty$).
Recall also $\lim_{x\to 0} \sin\frac{1}{x}$, which does not exist even though it is not unbounded (the values always lie between $1$ and $-1$).
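The behaviour of $\sin\frac{1}{x}$ near zero can be explored numerically. The following is a minimal sketch, not part of the original notes: the function name `sample_points` and the chosen values of `n` are illustrative assumptions. It picks points arbitrarily close to zero at which $\sin\frac{1}{x}$ equals $+1$ and $-1$, so the values cannot settle down to a single limit.

```python
import math

def sample_points(n):
    """Return two points near zero where sin(1/x) is +1 and -1 respectively."""
    x_plus = 1 / ((2 * n + 0.5) * math.pi)   # 1/x = 2n*pi + pi/2, so sin(1/x) = 1
    x_minus = 1 / ((2 * n + 1.5) * math.pi)  # 1/x = 2n*pi + 3*pi/2, so sin(1/x) = -1
    return x_plus, x_minus

# Larger n gives points closer to zero; the two sine values stay at +1 and -1.
for n in (1, 10, 100, 1000):
    x_plus, x_minus = sample_points(n)
    print(n, x_plus, math.sin(1 / x_plus), x_minus, math.sin(1 / x_minus))
```

However close to zero we look, both values keep occurring, which is consistent with the limit not existing.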
Definition 1.6 (Continuity at a point). Let $f(x)$ be defined on $(a, b)$ and let $c$ be a point in $(a, b)$. We say that $f$ is continuous at $c$ if $\lim_{x\to c} f(x) = f(c)$.
2. the limit $\lim_{x\to c} f(x)$ exists, and
If the function is defined at $c$ but either of conditions (2) and (3) is not met, we say that $f$ has a discontinuity at $c$, or equivalently, that $f$ is discontinuous at $c$.
Definition 1.7 (Continuity on an interval). We say that f is continuous on the open interval
(a, b) if f is continuous at every point of (a, b).
From this definition, we note that polynomial functions, $|x|$, $\cos x$ and $\sin x$ are all continuous on $(-\infty, \infty)$. Rational functions $f(x) = p(x)/q(x)$ are not defined at points where $q(x) = 0$; hence they cannot be continuous at such points. For example $x^{-1}$ is undefined at $x = 0$. Similarly, $\tan x$ is undefined at $x = \pm\frac{(2n+1)\pi}{2}$ (where $n = 0, 1, 2, \ldots$).
The Heaviside function is discontinuous at $x = 0$, since, as we saw earlier, the limit $\lim_{x\to 0} H(x)$ does not exist. However, it is continuous at all other points. The Dirichlet function $D(x)$ is discontinuous everywhere, despite being bounded and defined for all real numbers. We can see this is true if we consider what happens to $D(x)$ as $x \to c$ for both rational and irrational values of $c$. If $c$ is irrational, $D(c) = 0$, but there are rational values of $x$ arbitrarily close to $c$, so $\lim_{x\to c} D(x) \neq D(c) = 0$. Similarly, if $c$ is rational, there are irrational values of $x$ arbitrarily close to it, and so $\lim_{x\to c} D(x) \neq D(c) = 1$.
The following properties of continuous functions are often useful in applications where we need
to demonstrate that a function is continuous, and avoid the need to return to the definition of
continuity every time. They follow from the Limit Laws (Property 1.4).
Property 1.5 (Properties of continuous functions). If f and g are continuous at c, and α and
β are any real numbers, then
would all usually be continuous functions of time. Less familiar examples would include the variation
in gravitational field strength with the distance from a planet, or the variation in the strength of
the electric field with distance from a charged body. However, you must not let this blind you
to the fact that there are many applications which give rise to functions with discontinuities: for
example, the change over time of the population of a city or the amount of money in a bank account.
Since the values of these functions must change by multiples of a fixed amount (one person for the
population, or one cent for the bank account), we can expect their graphs to have ‘jumps’ when
they are plotted as functions of time.
In a large number of cases, whether a relationship is represented by a continuous or discontinuous
function will depend on the situation we are interested in. Consider, for example, volumes of air
or water, which are composed of large (but finite) numbers of individual molecules. If we think
about the mass of water in a container as a function of time, at a microscopic level it must change
only in discrete ‘jumps’, equal to the mass of a single water molecule. However, in most everyday
applications, like filling up a fish tank, these jumps in mass, and the time intervals between them,
are so small that it makes more sense to treat the mass of water as a continuous function of time.
Conversely, we may need to approximate a continuous function by a discontinuous one: for example,
many computer programmes approximate continuous functions by constant values over very short
intervals.
Property 1.6 (The Intermediate Value Theorem). Let f (x) be a continuous function defined
on an interval [a, b]. Then f (x) takes every value between f (a) and f (b) at least once.
An obvious consequence of this result is that if we have a continuous function $f$, and we know two values of $x$, $a$ and $b$ (with $a < b$) in the domain of $f$, such that $\operatorname{sign} f(a) \neq \operatorname{sign} f(b)$, then a zero of $f$ must be somewhere in the interval $(a, b)$. We progressively narrow down the interval by considering the sign of $f(m)$, where $m = (a + b)/2$ is the midpoint of the interval. If $f(m)$ has the same sign as $f(a)$ then a zero must lie in the interval $(m, b)$; conversely, if $f(m)$ has the same sign as $f(b)$, then a zero lies in $(a, m)$. We continue this process until we have narrowed the interval sufficiently to achieve the required accuracy.
We demonstrate the method using the function f (x) = cos x − x which we discussed earlier.
Suppose we want to find the zero of this function to an accuracy of 0.01. To begin the procedure,
we need to find the two values for a and b. Often this can be done by looking at a graph of the
relevant function. Earlier, we saw that a = 0.7 and b = 0.8 are suitable values, since f (0.7) > 0
and f (0.8) < 0. The steps are now as follows:
1. We calculate the mid-point of the interval (0.7, 0.8), m = (0.7 + 0.8)/2 = 0.75.
2. We find f (m) = f (0.75) = −0.018 < 0. This has the same sign as f (0.8). Hence, the root
must lie in the smaller interval (0.7, 0.75).
3. Now we go through the procedure again, calculating the mid-point of the new interval, m =
(0.7 + 0.75)/2 = 0.725.
4. Since $f(0.725) = 0.023 > 0$, we now know our root must lie in the even-smaller interval $(0.725, 0.75)$.
5. The mid-point of this new interval is m = 0.7375, and f (0.7375) = 0.0027 > 0. Hence, we
now know the zero lies in the interval (0.7375, 0.75).
6. The midpoint of this interval m = 0.74375 and f (0.74375) = −0.0078 < 0. Thus, our zero
lies in the interval (0.7375, 0.74375).
7. Since the lower and upper end points of the interval now differ by less than the required accuracy (0.01), we can say that the zero of $f$ occurs at $x \approx 0.74$ (to two decimal places).
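The steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original notes: the function name `bisect` and its return convention (final interval plus the number of midpoints evaluated) are assumptions made for the example.

```python
import math

def bisect(f, a, b, tol):
    """Interval bisection: halve [a, b] until its width is below tol.

    Assumes f is continuous on [a, b] and f(a), f(b) have opposite signs.
    Returns the final interval end points and the number of midpoints used.
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    steps = 0
    while b - a >= tol:
        m = (a + b) / 2
        fm = f(m)
        steps += 1
        if fm == 0:
            return m, m, steps       # landed exactly on the zero
        if fa * fm > 0:
            a, fa = m, fm            # f(m) has the sign of f(a): zero lies in (m, b)
        else:
            b, fb = m, fm            # f(m) has the sign of f(b): zero lies in (a, m)
    return a, b, steps

def f(x):
    return math.cos(x) - x

lo, hi, steps = bisect(f, 0.7, 0.8, 0.01)
print(lo, hi, steps)
```

With the starting values $a = 0.7$, $b = 0.8$ and accuracy 0.01, the loop evaluates the same four midpoints as the worked steps (0.75, 0.725, 0.7375 and 0.74375) and stops on the interval $(0.7375, 0.74375)$.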
• Specify the domain and range of a function using interval notation and simple set notation
• Define the inverse of a function, and determine if a given function has an inverse
• Define the inverse trigonometric functions by appropriately restricting the domain of the
function
• Use the interval bisection method to find zeros of functions which are continuous on an
interval.
Chapter 2
Lecture 6
In the previous chapter, we have revisited ideas about functions, and defined them in precise
mathematical terms. In practical applications, the function giving the relationship between the
variables we are interested in encapsulates in a compact form a great deal of information about the
situation. However, this information may not always be in the form that is most useful to us. For
example, we might know the position of an aeroplane as a function of time, but want to find out
how its speed depends on time. Understanding relationships involving rates of change is the goal
of differential calculus.
From your previous studies, you will already know that the rate at which a function $f(x)$ changes with $x$ is called the derivative of $f$. Common notations for the derivative include
$$\frac{df}{dx} = f'(x).$$
For example, velocity $v$ is the rate of change of position $x$ with time. Hence, if we know a vehicle is moving such that its distance (in m) from some start point at time $t$ (in s) is given by $x = 15t$, then
$$v = \frac{dx}{dt} = x'(t) = 15\ \mathrm{m\,s^{-1}}.$$
Note that when the independent variable represents time, as here, you will sometimes see the
notation ẋ(t) for the derivative of x.
[Figure: graph of $f(x)$ against $x$. The derivative $f'(x)$ gives the gradient of the tangent to the graph of $f$ at $x$.]
Geometrically, the derivative of a function f (x) at a point x is the gradient of the tangent to
the graph of the function f (x) at x (or, equivalently, the slope of the graph of f (x) at x).
You already know how to calculate the derivatives of a variety of different functions. For example:
$$\frac{d}{dx}x^r = r x^{r-1}$$
$$\frac{d}{dx}\sin x = \cos x$$
$$\frac{d}{dx}\cos x = -\sin x$$
$$\frac{d}{dx}e^x = e^x$$
Question: In theory, can I always calculate the derivative $\frac{df}{dx}$ if I know $f(x)$?
Consider the Heaviside function, H(x). What is the derivative of H(x) at x = 0? In fact, it is
undefined. We note that, at x = 0, the function jumps by one unit in the y direction. Calculating
the slope ( = change in y / change in x) would involve a division by zero; hence the slope (or
derivative) is not defined there. We have already seen that the function is not continuous at x = 0,
so perhaps it is not so surprising that we cannot differentiate it there. But what about the absolute
value function |x| - can we calculate its derivative at x = 0? Note that |x| is continuous at x = 0,
as its graph is an unbroken curve. However, if we try to calculate the slope of the graph at x = 0
using our usual methods, we might get one of three different answers:
$$\frac{|0.01| - |0|}{0.01} = 1, \qquad \frac{|0.01| - |-0.01|}{0.02} = 0, \qquad \frac{|0| - |-0.01|}{0.01} = -1.$$
This is not acceptable; we require that the definition of the derivative should give us a well-
defined, unique answer. Hence the derivative of |x| is undefined at x = 0. This means that being
differentiable is a stronger condition on a function than merely being continuous. All differentiable
functions are continuous, but not all continuous functions are differentiable.
We can see that the problem with |x| at x = 0 is due to the ‘kink’ in the graph of the function
there. If we look at the slope of |x|, we see it is −1 for x < 0 and 1 for x > 0. Hence, the value
of the slope jumps by two at x = 0. Informally, we can think of a function being differentiable at
x as meaning that the function is continuous at x, and does not have a ‘kink’ or ‘twist’ (where the
slope suddenly changes) there.
$$\frac{df}{dx} = f'(x) = \lim_{h\to 0}\frac{f(x+h) - f(x)}{h}$$
if this limit exists. If the limit exists, then we say the function $f$ is differentiable at $x$.
The limit laws tell us that limits are unique. Thus, if $f$ is differentiable at $x$, then the derivative is unique. Note that the derivative $f'(x)$ is itself a function of the variable $x$.
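To see the definition at work, the difference quotient can be evaluated for small values of $h$. The sketch below is illustrative only: the helper names `difference_quotient` and `central_quotient` are assumptions, not notation from these notes. For $|x|$ at $x = 0$ the quotient depends on how $h$ is chosen, whereas for $x^2$ at $x = 3$ it settles towards a single value.

```python
def difference_quotient(f, x, h):
    """(f(x + h) - f(x)) / h, the quotient whose limit defines f'(x)."""
    return (f(x + h) - f(x)) / h

def central_quotient(f, x, h):
    """Slope of the chord from x - h to x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)

def square(x):
    return x * x

for h in (0.1, 0.01, 0.001):
    # |x| at 0: the quotient is +1 for h > 0 and -1 for h < 0, so no limit exists.
    print(h, difference_quotient(abs, 0.0, h), difference_quotient(abs, 0.0, -h))
    # x^2 at 3: both one-sided quotients approach the same value, 6.
    print(h, difference_quotient(square, 3.0, h), difference_quotient(square, 3.0, -h))
```

The three candidate slopes for $|x|$ at zero (forward, backward and central) remain $1$, $-1$ and $0$ however small $h$ is taken.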
Above, we have demonstrated that not all continuous functions are differentiable. However, if a function is differentiable, then it is also continuous. This fact is straightforward to demonstrate using the limit laws. If $f$ is differentiable at $x$, then setting $y = x + h$, we know that the following limit exists
$$\lim_{y\to x}\frac{f(y) - f(x)}{y - x} = f'(x).$$
Now, $\lim_{y\to x}\,(y - x) = 0$, and then, using the limit law for a product we know the following
Example 2.1 (Housebuilding costs). Suppose the cost, $C$, (in dollars) of building a house of area, $A$ m$^2$, is given by $C = f(A)$. What is the practical interpretation of $\frac{dC}{dA} = f'(A)$? What are its units?
$\frac{dC}{dA}$ is a cost divided by an area, so its units are dollars per m$^2$. Recalling the definition of the derivative, we can think of it as the ratio of the change in cost, $\Delta C$, to a small increase, $\Delta A$, in the area of the house, so $\frac{dC}{dA}$ is the additional cost per square metre. Hence, if you are planning to build a house of roughly $A$ m$^2$, then $f'(A)$ is the approximate cost per m$^2$ of the additional expense involved in building a slightly larger house. It is called the marginal cost. We would expect the marginal cost to be lower than the average cost per square metre, since once you have already hired the necessary equipment, builders, etc. to build a fairly large house, the cost of adding a little extra space is likely to be small.
$$\frac{d^2 f}{dx^2} = f''(x) = \frac{d}{dx}\left(\frac{df}{dx}\right).$$
Note that the phrase ‘Provided that $f'(x)$ is a differentiable function’ is very important: $f'(x)$ may not be, even if $f(x)$ was itself differentiable. For example, suppose $f(x) = x^{3/2}$. Then $f'(x) = \frac{3}{2}\sqrt{x}$ and $f''(x) = \frac{3}{4\sqrt{x}}$. Hence we can see that although $f(x)$ is differentiable at $0$, $f'(x)$ is not, since $f''(x)$ is undefined there.
What information does the second derivative convey? As usual, that depends on the problem being studied. Suppose that the position of a vehicle at time $t$ is given by a function $x(t)$. The first derivative is the rate of change of position of the vehicle - i.e. its velocity, $v(t)$. Hence we would write
$$\frac{dx}{dt} = x'(t) = v(t).$$
The second derivative of $x$, $\frac{d^2x}{dt^2} = x''(t) = \frac{dv}{dt} = v'(t)$, is the rate of change of velocity. But this is just the acceleration of the vehicle, $a(t)$.
What does the second derivative tell us about the shape of the graph of a function, $y = f(x)$? Remember that if $\frac{df}{dx} = f'(x) > 0$ on an interval, then $f(x)$ is increasing on that interval. If $\frac{df}{dx} = f'(x) < 0$ on an interval, then $f(x)$ is decreasing on that interval. Since $\frac{d^2 f}{dx^2} = f''(x)$ is the derivative of $f'$, then:
• if $\frac{d^2 f}{dx^2} = f''(x) > 0$ on an interval, then $f'(x)$ is increasing on that interval;
• if $\frac{d^2 f}{dx^2} = f''(x) < 0$ on an interval, then $f'(x)$ is decreasing on that interval.
What precisely does this tell us? We illustrate the two types of behaviour in the diagram below.
• If $f''(x) > 0$, then $f'(x)$ is increasing; this means the curve is bending upwards.
• If $f''(x) < 0$, then $f'(x)$ is decreasing; this means the curve is bending downwards.
In fact, we can usefully describe what the second derivative tells us in terms of the concavity of
the graph, which is defined as follows.
Definition 2.2 (Concavity). We say a function f (x) is concave up at a point x if the tangent
line to the graph at that point lies below the graph in the region close to the point.
A function f (x) is concave down at a point x if the tangent line to the graph at that point lies
above the graph in the region close to the point.
Where concavity changes from up to down or vice versa, the tangent line must cross the graph. Such a point is called a point of inflection.
[Figure: two curves, one labelled ‘Concave down’ and one labelled ‘Concave up’.]
• if $f''(x) > 0$, then $f'(x)$ is increasing and the curve is concave up;
• if $f''(x) < 0$, then $f'(x)$ is decreasing and the curve is concave down.
We have to be a little careful, as the converse is not true. A curve can be concave up (or down) everywhere and have $f''(x) = 0$ at a point. For example, the curve $f(x) = x^4$ is concave up everywhere, but has $f''(0) = 0$. Thus, if we know a function is concave up on an interval, the most we can say is that $f''(x) \geq 0$ on that interval. Similarly, if a function is concave down, the most we can say is that $f''(x) \leq 0$.
2.2 Rules for differentiation
Lecture 7
In principle, we could use the definition to calculate the derivative of any function of interest. However, the procedure would be long-winded and tiresome. In order to save time in calculations, we need to know the derivatives of a small repertoire of common functions. You will probably have already learned a number of these at school. Henceforth, it will be assumed that you are familiar with the following:
$$\frac{d}{dx}x^r = r x^{r-1} \quad (r \neq 0)$$
$$\frac{d}{dx}\sin x = \cos x$$
$$\frac{d}{dx}\cos x = -\sin x$$
$$\frac{d}{dx}\tan x = \sec^2 x$$
$$\frac{d}{dx}e^x = e^x$$
$$\frac{d}{dx}\ln x = \frac{1}{x}.$$
Using these derivatives and the general rules below, we are able to calculate the derivatives of
many more complex functions.
Let $u$ and $v$ be differentiable functions, and $c_1$ and $c_2$ be fixed real numbers. Then:
$$\frac{d}{dx}(c_1 u + c_2 v) = c_1\frac{du}{dx} + c_2\frac{dv}{dx}$$
$$\frac{d}{dx}(uv) = \frac{du}{dx}\,v + u\,\frac{dv}{dx}$$
We can derive this result from the definition of the derivative as follows:
$$\begin{aligned}
\frac{d}{dx}\bigl(u(x)v(x)\bigr) &= \lim_{h\to 0}\frac{u(x+h)v(x+h) - u(x)v(x)}{h}\\
&= \lim_{h\to 0}\frac{u(x+h)v(x+h) - u(x)v(x+h) + u(x)v(x+h) - u(x)v(x)}{h}\\
&= \lim_{h\to 0}\frac{u(x+h)v(x+h) - u(x)v(x+h)}{h} + \lim_{h\to 0}\frac{u(x)v(x+h) - u(x)v(x)}{h}\\
&= \lim_{h\to 0}\left[\frac{u(x+h) - u(x)}{h}\,v(x+h)\right] + \lim_{h\to 0}\left[u(x)\,\frac{v(x+h) - v(x)}{h}\right]\\
&= \lim_{h\to 0} v(x+h)\,\lim_{h\to 0}\frac{u(x+h) - u(x)}{h} + \lim_{h\to 0} u(x)\,\lim_{h\to 0}\frac{v(x+h) - v(x)}{h}\\
&= \frac{du}{dx}\,v(x) + u(x)\,\frac{dv}{dx}.
\end{aligned}$$
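$$\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\,\dfrac{du}{dx} - u\,\dfrac{dv}{dx}}{v^2}$$
We require $v(x) \neq 0$ to ensure the quotient is differentiable. We can then derive this result using the Product Rule. Let $q(x) = \dfrac{u(x)}{v(x)}$; then $u(x) = q(x)v(x)$. Applying the Product Rule to $u(x)$ we have
$$\frac{du}{dx} = q\,\frac{dv}{dx} + v\,\frac{dq}{dx}.$$
Rearranging gives
$$\frac{dq}{dx} = \frac{1}{v}\left(\frac{du}{dx} - q\,\frac{dv}{dx}\right) = \frac{1}{v}\left(\frac{du}{dx} - \frac{u}{v}\frac{dv}{dx}\right) = \frac{1}{v^2}\left(v\,\frac{du}{dx} - u\,\frac{dv}{dx}\right).$$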
$$\frac{d}{dx}\left(-5x^3 + 3x^2 - x + 2\right) = -15x^2 + 6x - 1.$$
Example 2.3. Newton’s Second Law of motion states that the rate of change of momentum of a body is equal to the force, $F$, acting on it. The momentum of a body of constant mass $m$, moving in a straight line with velocity $v(t)$, is given by $mv$. Hence, for such a body, combining Newton’s Second Law with the linear combination rule for derivatives tells us that
$$F = \frac{d}{dt}\bigl(m v(t)\bigr) = m\frac{dv}{dt} = ma,$$
since the mass ($m$) is constant, and the rate of change of velocity, $\frac{dv}{dt}$, is the acceleration, $a$. Note
that this familiar formula is only applicable to bodies of constant mass. This assumption does not
apply to e.g. rockets, where the fuel is rapidly burned up, changing the mass of the vehicle. We
shall consider this example in more detail later in the course.
Example 2.4. Using the product rule (2.2)
$$\frac{d}{dx}\left[(\sin x)\sqrt{x}\right] = \cos x\,\sqrt{x} + \sin x\,\frac{1}{2\sqrt{x}}.$$
$$\frac{d}{dx}\left(\frac{5x^4 + x^2}{x^3 - 4x + 3}\right) = \frac{(20x^3 + 2x)(x^3 - 4x + 3) - (5x^4 + x^2)(3x^2 - 4)}{(x^3 - 4x + 3)^2}.$$
$$V(r(t)) = \frac{4}{3}\pi\left(3 + \frac{t}{2}\right)^3.$$
The rules we have learned so far are not so helpful in this case. One approach is to multiply out the brackets, which gives
$$V = \frac{4}{3}\pi\left(27 + \frac{27t}{2} + \frac{9t^2}{4} + \frac{t^3}{8}\right).$$
Now, we can find the derivative of $V$ with respect to $t$ using the linear combination rule
$$\frac{dV}{dt} = \pi\left(18 + 6t + \frac{t^2}{2}\right).$$
However, multiplying out brackets is a rather tedious process. Luckily, here we only had a cubic to contend with; in some problems the power might be much higher. What would we do if we had, say, $\left(3 + \frac{t}{2}\right)^{13}$? The calculation then would be much more tedious, and the likelihood of making
a mistake, greater. Fortunately, there is a very useful property of derivatives which helps us in
cases like these, where we are interested in differentiating a function which can be written as a
composition of functions.
Property 2.4 (The Chain Rule). Suppose that $y$ is a function of $u$, which is itself a function of $x$ - i.e. $y = y(u)$, where $u = u(x)$. Then, the derivative of $y$ with respect to $x$ is given by
$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}.$$
In alternative notation, let $f$ and $g$ be differentiable functions. Then
$$\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x).$$
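For the case of our balloon example, we have $V = V(r) = \frac{4}{3}\pi r^3$ and $r = r(t) = 3 + \frac{t}{2}$, so
$$\frac{dV}{dr} = 4\pi r^2, \qquad \frac{dr}{dt} = \frac{1}{2}.$$
Hence, on using the Chain Rule, we find
$$\frac{dV}{dt} = \frac{dV}{dr}\frac{dr}{dt} = 4\pi r^2\cdot\frac{1}{2} = 2\pi\left(3 + \frac{t}{2}\right)^2 = 2\pi\left(9 + 3t + \frac{t^2}{4}\right) = \pi\left(18 + 6t + \frac{t^2}{2}\right).$$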
This is, of course, the same answer as we previously obtained. However, here there was no need for
us to multiply out the brackets (we only did so to demonstrate the fact that the two answers were
the same).
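The agreement between the two routes can also be checked numerically. The following is a rough sketch under stated assumptions: the function names, the step size `h` and the sample time `t = 2` are all illustrative choices, not part of the original notes. It compares the Chain Rule result, the expanded formula and a numerical estimate of $\frac{dV}{dt}$.

```python
import math

def radius(t):
    return 3 + t / 2

def volume(r):
    return 4 / 3 * math.pi * r ** 3

def dV_dt_chain(t):
    """Chain Rule: dV/dt = (dV/dr) * (dr/dt)."""
    dV_dr = 4 * math.pi * radius(t) ** 2
    dr_dt = 1 / 2
    return dV_dr * dr_dt

def dV_dt_expanded(t):
    """Result of multiplying out the brackets first."""
    return math.pi * (18 + 6 * t + t ** 2 / 2)

def dV_dt_numeric(t, h=1e-6):
    """Central-difference estimate of the derivative of V(r(t))."""
    return (volume(radius(t + h)) - volume(radius(t - h))) / (2 * h)

t = 2.0
print(dV_dt_chain(t), dV_dt_expanded(t), dV_dt_numeric(t))
```

At $t = 2$ the radius is $4$, so both formulas give $32\pi$, and the numerical estimate agrees to within rounding error.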
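Examples:
1. $\dfrac{d}{dx}\sin\sqrt{x} = (\cos\sqrt{x})\cdot\dfrac{1}{2\sqrt{x}}$.
2. $\dfrac{d}{dx}\sqrt{1 + x^3} = \dfrac{1}{2\sqrt{1 + x^3}}\cdot 3x^2$.
3. $$\begin{aligned}
\frac{d}{dx}\sqrt[3]{\frac{1+x}{1-x}} &= \frac{d}{dx}\left(\frac{1+x}{1-x}\right)^{1/3}\\
&= \frac{1}{3}\left(\frac{1+x}{1-x}\right)^{-2/3}\cdot\frac{1\cdot(1-x) - (1+x)(-1)}{(1-x)^2}\\
&= \frac{1}{3}\left(\frac{1+x}{1-x}\right)^{-2/3}\cdot\frac{2}{(1-x)^2}.
\end{aligned}$$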
2.4 Implicit differentiation
Lecture 8
[Figure: a circle drawn on $x$ and $y$ axes, with the origin marked $0$.]
How steep is the tangent to a circle at some point (x, y) on the circle?
As a starting point, we could begin by considering how we would answer this question for another sort of curve, rather than a circle. For example, what about the curve $y = x^2$? In that case, the solution would be easy. We know that the gradient of the tangent to the curve is given by $\frac{dy}{dx} = 2x$; the steepness is just $\left|\frac{dy}{dx}\right| = |2x|$. Hence, the key part of the solution is calculating $\frac{dy}{dx}$, which is straightforward provided we know the equation of the curve $y = f(x)$. However, in the case of a circle of radius $a$ (centred at the origin) the equation of the curve is given by
$$x^2 + y^2 = a^2,$$
rather than specifying $y$ explicitly as a function of $x$. One way we might try to get around the problem is by rearranging the equation, to get
$$y = \sqrt{a^2 - x^2},$$
but this introduces another problem. For a circle, y can take values between −a and a, but by
choosing to take the positive square root, I can only obtain the positive values. Hence, I only have
the upper half of the circle (I would need two equations to get the complete curve).
Instead, our lives are much easier if we simply differentiate the whole of the original equation with respect to $x$. We can do this using the Chain Rule - we just need to remember that $y$ depends on $x$ (i.e. $y = y(x)$). Hence,
$$\frac{d}{dx}(x^2 + y^2) = 2x + 2y\frac{dy}{dx} = \frac{d(a^2)}{dx} = 0.$$
Rearranging, we obtain
$$\frac{dy}{dx} = \frac{-2x}{2y} = -\frac{x}{y}.$$
Using this formula, we can find the gradient of the tangent to the circle at any point (x, y) on the
circle, and hence we will know the steepness.
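As a quick sanity check of the formula $\frac{dy}{dx} = -\frac{x}{y}$, the sketch below compares it with a numerical slope on the upper and lower halves of a circle. It is illustrative only: the radius $a = 5$, the point $x = 3$ and the helper names are assumptions chosen so that the arithmetic is easy to follow.

```python
import math

def slope_implicit(x, y):
    """Gradient of the tangent to x^2 + y^2 = a^2 at (x, y), valid for y != 0."""
    return -x / y

def upper_half(x, a):
    return math.sqrt(a * a - x * x)

def lower_half(x, a):
    return -math.sqrt(a * a - x * x)

def numeric_slope(curve, x, a, h=1e-6):
    """Central-difference estimate of dy/dx along one explicit half of the circle."""
    return (curve(x + h, a) - curve(x - h, a)) / (2 * h)

a, x0 = 5.0, 3.0
for curve in (upper_half, lower_half):
    y0 = curve(x0, a)
    print(y0, slope_implicit(x0, y0), numeric_slope(curve, x0, a))
```

The single implicit formula handles both halves of the circle: at $(3, 4)$ the slope is $-0.75$, and at $(3, -4)$ it is $+0.75$, without needing two separate explicit equations.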
Let $f$ be a one-to-one function which is differentiable on some domain of interest, with inverse $f^{-1}$. Recall that if $y = f^{-1}(x)$, then we must have $x = f(y)$, and $f(y) = f(f^{-1}(x)) = x$. In order to ensure $f^{-1}$ is differentiable at $x$, we require $f'(y) \neq 0$ (where $y = f^{-1}(x)$) or equivalently, $f'(f^{-1}(x)) \neq 0$. The reason for this will become clear shortly.
Using implicit differentiation on this last equation, we see that
$$\frac{d f(y)}{dx} = \frac{df}{dy}\frac{dy}{dx} = \frac{dx}{dx} = 1$$
and so
$$\frac{dy}{dx} = \frac{1}{df/dy}.$$
But since $f(y) = x$, $\frac{df}{dy} = \frac{dx}{dy}$, and thus
$$\frac{dy}{dx} = \frac{1}{dx/dy}.$$
Now we can see why we needed to be a little careful about the differentiability of $f^{-1}$ earlier. If $\frac{dx}{dy} = 0$, then $\frac{dy}{dx}$ will be undefined (and hence $f^{-1}$ is not differentiable at that point).
Note that when we calculate $\frac{dx}{dy}$, we will usually obtain it as a function of $y$, and hence the right hand side of the equation above will be a function of $y$ too. If we want to find $\frac{dy}{dx}$ as a function of $x$, we must use the fact that $y = f^{-1}(x)$ to express the right-hand side in terms of $x$. This is perhaps clearer when we write the result in the alternative notation
$$\frac{d f^{-1}}{dx} = \frac{1}{f'(y)} = \frac{1}{f'(f^{-1}(x))} \quad (\text{for } f'(f^{-1}(x)) \neq 0).$$
Before we begin, we need to think carefully about what we are trying to calculate, and state it
clearly: there is scope for us to get confused by a poor choice of notation here.
We want to calculate the derivative of $g^{-1}$. Hence, to get things set up in the same form as we have above, we need to let $y = g^{-1}(x)$. Then, $x = g(y)$, and since $g(x) = x^2$, $x \geq 0$, we have $g(y) = y^2$, $y \geq 0$ (simply changing the symbol from $x$ to $y$). Then
$$x = g(y) = y^2 \;\Rightarrow\; \frac{dx}{dy} = 2y,$$
so
$$\frac{d}{dx} g^{-1}(x) = \frac{dy}{dx} = \frac{1}{dx/dy} = \frac{1}{2y}.$$
Now we need to express the last term as a function of $x$. We do this using the inverse function $y = g^{-1}(x) = \sqrt{x}$, $(x \geq 0)$ and so obtain
$$\frac{dy}{dx} = \frac{1}{2\sqrt{x}} = \frac{1}{2}x^{-1/2}.$$
Hence
$$(g^{-1})'(4) = \frac{1}{2\sqrt{4}} = \frac{1}{4}.$$
Example 2.8. Let $f(x) = \dfrac{1 + 3x}{5 - 2x}$, find $f^{-1}(x)$. Hence find $\dfrac{d}{dx} f^{-1}(x)$ by using the formula for the derivative of an inverse, and also by direct differentiation.
Be careful to identify when the roles of $x$ and $y$ swap. From $y = \frac{1+3x}{5-2x}$,
$$5y - 2xy = 1 + 3x$$
$$5y - 1 = x(3 + 2y)$$
$$x = \frac{5y - 1}{3 + 2y},$$
thus
$$f^{-1}(x) = \frac{5x - 1}{2x + 3}.$$
Derivative: $y = f^{-1}(x)$ so $x = f(y) = \frac{1+3y}{5-2y}$, and
$$\begin{aligned}
\frac{dy}{dx} = \frac{1}{dx/dy} &= \frac{1}{\dfrac{3(5-2y) + 2(1+3y)}{(5-2y)^2}}\\
&= \frac{(5 - 2y)^2}{17}\\
&= \frac{\left(5 - 2\,\dfrac{5x-1}{2x+3}\right)^2}{17}\\
&= \frac{\bigl(5(2x+3) - 2(5x-1)\bigr)^2}{17(2x+3)^2}\\
&= \frac{17^2}{17(2x+3)^2} = \frac{17}{(2x+3)^2}.
\end{aligned}$$
Check (Easier!)
$$\frac{d f^{-1}(x)}{dx} = \frac{d}{dx}\left(\frac{5x - 1}{3 + 2x}\right) = \frac{5(3 + 2x) - 2(5x - 1)}{(2x + 3)^2} = \frac{17}{(2x + 3)^2}.$$
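The two calculations in Example 2.8 can be compared numerically. This is a small sketch rather than part of the original example: the function names and the sample points are assumptions made for illustration. It evaluates $\frac{1}{f'(f^{-1}(x))}$ and the directly differentiated formula side by side.

```python
def f(x):
    return (1 + 3 * x) / (5 - 2 * x)

def f_inv(x):
    return (5 * x - 1) / (2 * x + 3)

def f_prime(y):
    """Quotient rule applied to f: the numerator simplifies to 17."""
    return 17 / (5 - 2 * y) ** 2

def f_inv_prime_formula(x):
    """Derivative of the inverse via 1 / f'(f^{-1}(x))."""
    return 1 / f_prime(f_inv(x))

def f_inv_prime_direct(x):
    """Derivative obtained by differentiating f^{-1} directly."""
    return 17 / (2 * x + 3) ** 2

for x in (0.0, 1.0, 2.5):
    print(x, f(f_inv(x)), f_inv_prime_formula(x), f_inv_prime_direct(x))
```

At $x = 1$, for instance, $f^{-1}(1) = 0.8$ and both routes give $\frac{17}{25} = 0.68$.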
2.4.2 Derivatives of inverse trigonometric functions
We can use the results we have just established to find the derivatives of inverse trigonometric
functions.
We now need to express $\cos y$ in terms of $x$. Recall $\cos^2 y = 1 - \sin^2 y = 1 - x^2$ as $x = \sin y$, so $\cos y = \pm\sqrt{1 - x^2}$. Which sign $\pm$ do we use?
Recall for $y = \arcsin x$, $-1 \leq x \leq 1$ and $-\frac{\pi}{2} \leq y \leq \frac{\pi}{2}$. For $-\frac{\pi}{2} \leq y \leq \frac{\pi}{2}$, $\cos y \geq 0$. Thus $\cos y = +\sqrt{1 - x^2}$, and hence $\frac{d}{dx}\arcsin x = 1/\sqrt{1 - x^2}$.
Similarly, we can show that $\frac{d}{dx}\arctan x = \frac{1}{1 + x^2}$.
Recall $y = \arctan x$ iff $x = \tan y$ for which we know $dx/dy = \sec^2 y$, then
$$\frac{dy}{dx} = \frac{1}{dx/dy} = \frac{1}{\sec^2 y} = \cos^2 y.$$
$$\cos^2 y + \sin^2 y = 1$$
$$1 + \frac{\sin^2 y}{\cos^2 y} = \frac{1}{\cos^2 y}$$
$$1 + \tan^2 y = \sec^2 y$$
$$\frac{dy}{dx} = \frac{1}{1 + x^2} \quad \text{as } x = \tan y.$$
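Both formulas can be checked against a numerical derivative of Python's built-in `math.asin` and `math.atan`. The sketch below is illustrative: the helper `central_difference`, the step size and the sample points are assumptions, not part of the original notes.

```python
import math

def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) from values either side of x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def arcsin_derivative(x):
    return 1 / math.sqrt(1 - x ** 2)   # valid for -1 < x < 1

def arctan_derivative(x):
    return 1 / (1 + x ** 2)

for x in (-0.5, 0.0, 0.5):
    print(x,
          arcsin_derivative(x), central_difference(math.asin, x),
          arctan_derivative(x), central_difference(math.atan, x))
```

Note that the arcsine formula is only evaluated strictly inside $(-1, 1)$, since $\sqrt{1 - x^2}$ vanishes at the end points.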
Example 2.10 (A falling ladder). A ladder of length 3 m stands on flat ground, leaning against a vertical wall. The bottom of the ladder is at $(x(t), 0)$, and the top is at $(0, y(t))$. At $t = 0$ the ladder begins to slip, with the bottom moving horizontally outwards at $0.1\ \mathrm{m\,s^{-1}}$. Assuming the top moves only in the vertical direction, how fast is the top of the ladder slipping down when the foot is 1 m from the wall? How does the speed at which the top falls change as the ladder slips further?
[Figure: Sketch of the ladder problem, with the top of the ladder at $(0, y(t))$ on the $y$ axis and the foot at $(x(t), 0)$ on the $x$ axis.]
$$x^2 + y^2 = 9,$$
by Pythagoras’ theorem. Differentiating with respect to time (using the chain rule) we find
$$2x\frac{dx}{dt} + 2y\frac{dy}{dt} = 0.$$
Re-arranging we find
$$\frac{dy}{dt} = -\frac{x}{y}\frac{dx}{dt} = -(0.1)\frac{x}{y}. \qquad (2.1)$$
When $x = 1$, $y = \sqrt{9 - 1} = \sqrt{8}$ and so when the foot of the ladder is 1 m from the wall, the top is moving with velocity
$$\frac{dy}{dt} = -(0.1)\frac{1}{\sqrt{8}}\ \mathrm{m\,s^{-1}} \approx -0.035\ \mathrm{m\,s^{-1}}.$$
Note that the minus sign indicates movement in the downward (negative-$y$) direction.
As the ladder slips away from the wall, $x(t)$ will increase, and, correspondingly, $y(t)$ must decrease. Looking at equation (2.1), we can then see that, the further the foot of the ladder moves from the wall, the faster the top of it is dropping.
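Equation (2.1) is easy to tabulate. The following is a minimal sketch, assuming the ladder length and foot speed given in the example; the constant and function names are illustrative and not part of the original notes.

```python
import math

LADDER_LENGTH = 3.0   # metres, from the example
FOOT_SPEED = 0.1      # metres per second, from the example

def top_velocity(x):
    """dy/dt from equation (2.1), for a foot position 0 <= x < LADDER_LENGTH."""
    y = math.sqrt(LADDER_LENGTH ** 2 - x ** 2)   # Pythagoras' theorem
    return -FOOT_SPEED * x / y

for x in (0.5, 1.0, 2.0, 2.9):
    print(x, top_velocity(x))
```

The printed velocities become more negative as $x$ increases, matching the observation that the top drops faster the further the foot is from the wall. The formula breaks down as $x \to 3$, where $y \to 0$.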
Example 2.11 (Piston motion). In a steam engine, the high pressure steam drives a piston back and forth. This motion is translated into the rotation of a wheel by a connecting rod. (A similar principle operates in a petrol or diesel engine, except that in that case, the movement of the piston is caused by the expanding exhaust gases from the ignition of the fuel, rather than by steam.)
[Figures: Sir Winston Dugan (SAR Class 620); model stationary steam engine at Speyer, Germany.]
We consider a simplified version of this situation. Suppose the piston is linked to a wheel of
radius R by a straight, rigid rod of length, L. Let x be the horizontal distance between the centre
of the wheel, and the point where the piston is joined to the rod, as shown in the sketch below.
This changes with time as the piston moves back and forth.
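[Figure: sketch of the wheel and connecting rod, labelled with the radius $R$, the rod length $L$, the angle $\theta$ and the horizontal distance $x$.]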
How is the rate of rotation of the wheel related to how fast the piston is moving?
From the diagram, we can see that the speed of the piston is just the rate of change of $x(t)$. Using the cosine rule, we have
$$L^2 = R^2 + x^2 - 2Rx\cos\theta.$$
Implicit differentiation of the above, using the product rule gives:
$$2x\frac{dx}{dt} - 2R\cos\theta\,\frac{dx}{dt} + 2Rx\sin\theta\,\frac{d\theta}{dt} = 0.$$
Rearranging we have
$$(R\cos\theta - x)\frac{dx}{dt} = Rx\sin\theta\,\frac{d\theta}{dt},$$
so
$$\frac{dx}{dt} = \frac{Rx\sin\theta}{R\cos\theta - x}\,\frac{d\theta}{dt}.$$
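One way to test this formula is to solve the cosine rule for $x$ explicitly and differentiate numerically. The sketch below makes several assumptions that are not in the original notes: the values $R = 1$ and $L = 3$ are hypothetical, the rod is taken to be longer than the wheel radius, and the positive root of the quadratic in $x$ is chosen, giving $x = R\cos\theta + \sqrt{L^2 - R^2\sin^2\theta}$.

```python
import math

R = 1.0   # hypothetical wheel radius
L = 3.0   # hypothetical rod length, assumed greater than R

def piston_position(theta):
    """Positive solution of L^2 = R^2 + x^2 - 2 R x cos(theta) for x."""
    return R * math.cos(theta) + math.sqrt(L ** 2 - (R * math.sin(theta)) ** 2)

def dx_dtheta_formula(theta):
    """Coefficient of d(theta)/dt in the implicit-differentiation result."""
    x = piston_position(theta)
    return R * x * math.sin(theta) / (R * math.cos(theta) - x)

def dx_dtheta_numeric(theta, h=1e-6):
    """Central-difference estimate of dx/d(theta) from the explicit position."""
    return (piston_position(theta + h) - piston_position(theta - h)) / (2 * h)

for theta in (0.3, 1.0, 2.0):
    print(theta, dx_dtheta_formula(theta), dx_dtheta_numeric(theta))
```

Multiplying either value by the rotation rate $\frac{d\theta}{dt}$ gives the piston velocity $\frac{dx}{dt}$; the implicit route avoids having to solve for $x$ first.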
2.6 Maxima and minima of functions
Lecture 10
Example 2.12 (Projectile motion). Consider the height of a ball, or similar object, thrown upwards with speed, $V$, and acted upon only by gravity (we neglect air resistance, etc.). The height $h$ of the ball at time $t$ is then given by
$$h = h_0 + Vt - \frac{1}{2}gt^2,$$
where $h_0$ is the height above ground from which the object is thrown, and $g \approx 9.8\ \mathrm{m\,s^{-2}}$ is the acceleration due to gravity. What is the maximum height reached by the ball?
There are various different ways we might approach this problem. We can see from the formula
that the quantity we need to maximise is $Vt - \frac{1}{2}gt^2$ (since $h_0$ just adds a constant to this). Hence,
let us take $h_0 = 0$ for the moment, to simplify things. For small values of $t$, the first term will
dominate, but for larger t, the second term becomes significant, and will eventually cancel it out.
Thus we expect a ‘middling’ value of t will give the greatest height. If we knew the value of V
numerically, one ‘low-tech’ method we might try would simply be to try different values of t in the
formula. We could then progressively ‘narrow down’ the possible value, until we find t to whatever
accuracy we might require. But this would be extremely time consuming and tedious.
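[Figure: graph of $h$ against $t$, with the values $\frac{V}{g}$ and $\frac{2V}{g}$ marked on the $t$ axis and $\frac{V^2}{2g}$ marked on the $h$ axis.]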
A better method would be to plot the graph of $h$ against $t$, as above. Then we can see that the maximum height is achieved at a time halfway between when the ball is thrown up, and when it returns to the ground. The first of these times is $t = 0$; the second is $t = \frac{2V}{g}$. Hence the time at which the maximum height is reached is $t^* = \frac{V}{g}$. The height is
$$h_{\max} = h(t^*) = \frac{V^2}{2g}.$$
This method worked even when we did not know the numerical value of V , but we relied on the
symmetry of the parabola (the fact that the maximum occurred at a value of t half way between
the two roots of h(t) = 0). In other problems, we might want to find the maximum of a function
that does not have such a helpful symmetry; what would we do then?
Briefly thinking about the physics of the problem, and the graph of the function $h(t)$ suggests another way forward. Initially, when the ball is thrown, it is moving upwards, so the function $h(t)$ is increasing. This means $\frac{dh}{dt} > 0$. Later, the ball will be moving downwards, so $h(t)$ will be decreasing, and $\frac{dh}{dt} < 0$. When the ball reaches its maximum height, it changes from moving upwards to moving downwards, so for an instant its velocity is zero. Hence $\frac{dh}{dt} = 0$. This observation provides us with another way of finding the maximum height. We can easily calculate
$$\frac{dh}{dt} = V - gt.$$
The derivative $\frac{dh}{dt}$ will be zero when $V - gt = 0$, which implies $t^* = \frac{V}{g}$, as we found previously.
This method, however, did not require us to use any properties of the graph of h(t) to determine
t∗ . Instead, it relies on the simple observation, that as the graph of a function goes through
the maximum value, the function goes from increasing to decreasing. Hence the derivative of the
function changes from being positive to being negative. The transition will occur at the maximum
point itself, so the derivative there will be zero.
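The projectile calculation can be reproduced for a particular launch speed. The sketch below is illustrative: the launch speed $V = 14.7\ \mathrm{m\,s^{-1}}$ is a hypothetical value chosen so that $t^* = 1.5$ s, and the function names are assumptions rather than notation from these notes.

```python
G = 9.8  # acceleration due to gravity in m/s^2, as in the example

def height(t, V, h0=0.0):
    """h = h0 + V t - (1/2) g t^2."""
    return h0 + V * t - 0.5 * G * t ** 2

def velocity(t, V):
    """dh/dt = V - g t."""
    return V - G * t

V = 14.7                    # hypothetical launch speed
t_star = V / G              # time at which dh/dt = 0
h_max = height(t_star, V)   # should equal V^2 / (2 g)

print(t_star, h_max, V ** 2 / (2 * G))
# The derivative changes sign from positive to negative across t_star.
print(velocity(t_star - 0.1, V), velocity(t_star, V), velocity(t_star + 0.1, V))
```

The sign change of `velocity` either side of `t_star` is the numerical counterpart of the function going from increasing to decreasing at its maximum.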
[Figure: A function $f(x)$ with a minimum at $x = 0$.]
Thus, at a maximum or minimum point, the derivative of the function will be zero. This suggests some interesting questions about the property we used to find the maximum in the last example. For instance, can we always find the maximum or minimum value of a function $f$ simply by finding the roots of $f'(x) = 0$? What happens if $f'(x)$ has multiple zeros?
[Figure: Graph of $f(x) = x(x - 1)(x - 2)(x - 4)$.]
Consider the function $f(x) = x(x - 1)(x - 2)(x - 4)$ for $-1 \leq x \leq 5$. We can see that the graph has three points where the tangent to the curve would be parallel to the $x$ axis (i.e. where the derivative $\frac{df}{dx} = 0$). This is what we expect: since $f(x)$ is a quartic, its derivative will be a cubic, and a cubic equation has at most three roots. Now, there can be at most one minimum and one maximum value, so clearly at least one of the three roots cannot correspond to the maximum or minimum value. Equally, we see that the function takes its largest value at $x = 5$, where $\frac{df}{dx} \neq 0$.
We need to be a little more precise with our definitions if we want to understand the relationship
between points where the derivative is zero, and the maximum and minimum values of the function.
Firstly, we introduce the idea of local maxima and minima, where we consider the values of the
function only in some small neighbourhood of the point of interest.
Definition 2.3. A function f has a local maximum at a point x if f (x) is greater than or equal
to the values of f for all points near x. A function f has a local minimum at a point x if f (x)
is less than or equal to the values of f for all points near x. If a point is either a local maximum
or a local minimum we call it a local extremum of the function.
We can now see that the three points where the derivative of $f(x) = x(x - 1)(x - 2)(x - 4)$ is zero are all local extrema. The end points $x = -1$ and $x = 5$ are also local extrema.
Definition 2.4. A critical point of a function, $f$, is a point $x$ in the domain of $f$ where either
$$\frac{df}{dx} = 0 \quad \text{or} \quad \frac{df}{dx} \text{ is undefined.}$$
Note: the definition tells us that for $x$ to be a critical point, $f(x)$ must be defined even if $\frac{df}{dx}$ is undefined. This means that $x = 0$ is a critical point of the absolute value function, $|x|$. However, $x = 0$ is not a critical point of $f(x) = x^{-1}$, even though $f'(x)$ is undefined there, because $f(x)$ itself is undefined at $x = 0$.
Property 2.5 (Local extrema at interior points are critical points). Suppose that $f$ is a non-constant function defined on an interval, and that it has a local extremum at $x$ (where $x$ is not one of the end points of the interval). If $f$ is differentiable at $x$, then $f'(x) = 0$. Thus, if $x$ is a local extremum of $f$, it is also a critical point of $f$.
This property is straightforward to prove. First, consider the case where f is not differentiable
at x. This means $f'(x)$ is undefined, and hence, from the definition, x is a critical point of f .
Now, consider the case where f is differentiable at x, and has a local maximum there. Then,
for any sufficiently small |h| the existence of a local maximum at x means $f(x+h) \le f(x)$. If h > 0
this means that
\[ \frac{f(x+h) - f(x)}{h} \le 0 \]
and if h < 0 this means that
\[ \frac{f(x+h) - f(x)}{h} \ge 0. \]
Using these inequalities and the fact that f is differentiable at x we have, letting h tend to zero through positive values,
\[ f'(x) = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h} \le 0 \]
and, letting h tend to zero through negative values,
\[ f'(x) = \lim_{h \to 0^-} \frac{f(x+h) - f(x)}{h} \ge 0. \]
Since the limit is unique, we must obtain the same answer whether we approach x from the left
or the right (i.e. with h positive or negative). Hence $f'(x) = 0$. The proof for a minimum point
follows in the same manner. Thus, if f has a local extremum at x, then x is a critical point.
Note that the converse of this result is not true: a point x may be a critical point of f without being
a local extremum. For example, if $f(x) = x^3$, then $f'(0) = 0$, but zero is not a local extremum.
Property 2.6 (The second derivative test for local maxima and minima). Let f be a
twice differentiable function defined on some interval containing the point x.
If $\frac{df}{dx} = 0$ and $\frac{d^2 f}{dx^2} < 0$ at x, then f has a local maximum at x.
If $\frac{df}{dx} = 0$ and $\frac{d^2 f}{dx^2} > 0$ at x, then f has a local minimum at x.
If $\frac{df}{dx} = 0$ and $\frac{d^2 f}{dx^2} = 0$ at x, then the test provides no information.
In the last case, we would need to use some other method e.g. considering the graph of the function,
to decide what type of point we were dealing with. One possibility is that x is a point of inflection.
Definition 2.5 (Global extrema). Let x0 be a point in the domain D of a function f . Then
we say,
• f has a global minimum at x0 if f (x0 ) is less than or equal to all other values of f (x) for
x∈D
• f has a global maximum at x0 if f (x0 ) is greater than or equal to all other values of f (x)
for x ∈ D
Note: There may be more than one global minimum or maximum. This might seem odd compared
to our everyday use of the terms, but it is correct in mathematical terminology, owing to the ‘or
equal to’ in the definition.
In order to find the global maximum or global minimum, intuitively, we would just check the
values of the function at all the local maxima or minima, and choose whichever of those that gives
the greatest or least value of the function. This idea is essentially sound, but there are two ways
things could go wrong.
The global maximum or minimum of a function f may not exist if, for example:
• The function is defined over an open interval (a, b). Consider f (x) = x defined on x ∈ (0, 1);
the maximum value of f would occur for the largest value x, but there is no largest value of
x in (0, 1).
• The function is defined over an unbounded interval such as [0, ∞). Again, consider f (x) = x.
Although there is now a global minimum at x = 0, there is no global maximum, as f (x)
increases without bound as x → ∞.
For continuous functions defined on closed and bounded intervals, our intuitive idea is correct.
Property 2.7 (The Extreme Value Theorem). Let f be a continuous function defined on a
closed, bounded interval [a, b]. Then, f has a global maximum and a global minimum on [a, b].
We can use the Extreme Value Theorem to find the global maximum and minimum of continuous
functions on closed, bounded intervals by exploiting the connections we have established between
critical points and local extrema. The steps are:
• Find the critical points of f by solving $\frac{df}{dx} = 0$ and finding any points where $f'(x)$ is undefined.
• Evaluate f at every critical point and the end points of the interval, a and b.
• Compare the values of f (x) thus obtained; the largest value gives the global maximum, the
smallest the global minimum.
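The three steps above can be sketched in a few lines of Python. This is only an illustration, not part of the course procedure: it reuses the function f(x) = x(x − 1)(x − 2)(x − 3) on [−1, 5] discussed earlier, and the critical points are entered by hand (they come from factorising the derivative, which is an extra calculation not shown in the notes). The helper name `global_extrema` is made up for this sketch.

```python
import math

def f(x):
    # Example function from the text: f(x) = x(x - 1)(x - 2)(x - 3)
    return x * (x - 1) * (x - 2) * (x - 3)

def global_extrema(func, critical_points, a, b):
    """Return ((max value, x), (min value, x)) of func on [a, b].

    Assumes func is continuous on [a, b] and that critical_points
    lists every critical point of func (hypothetical helper).
    """
    # Step 2: candidates are the end points and the interior critical points.
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = [(func(x), x) for x in candidates]
    # Step 3: compare the values obtained.
    return max(values), min(values)

# Step 1 (done by hand): f'(x) = (2x - 3)(2x^2 - 6x + 2),
# which is zero at x = 3/2 and x = (3 +/- sqrt(5))/2.
critical = [1.5, (3 - math.sqrt(5)) / 2, (3 + math.sqrt(5)) / 2]

(max_val, max_x), (min_val, min_x) = global_extrema(f, critical, -1, 5)
print("global maximum:", max_val, "at x =", max_x)
print("global minimum:", min_val, "at x =", min_x)
```

The global maximum is found at the end point x = 5, in agreement with the earlier observation that the largest value occurs where the derivative is not zero.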
2.7 Optimisation
We have now demonstrated the way to find the smallest and largest values a function can take. In
practical problems, such values are often important, e.g. to minimise the amount of fuel used by a
vehicle, or to maximise the amount of a product of a chemical reaction. This type of problem is
illustrated in the example below.
Example 2.13. A soft drink can is approximately cylindrical, and is required to hold 330 cm³
(equivalently, ml) of liquid. As a can manufacturer, you aim to minimise costs by using the
smallest amount of metal possible. Assuming the cost of materials used in each can is proportional to
the can’s surface area, what dimensions should the can be made to, in order to minimise the cost
of producing it?
A cylindrical can basically consists of two circular pieces for the top and bottom, and a large
rectangular piece which is curved around to form the sides. Briefly thinking about the problem,
we can see that if we try to save material on the sides, by making it short, we will have to make
the discs for the top and bottom very large to accommodate the liquid. Similarly, if we try to save
material from the top and bottom by making them smaller, the can will need to be tall to hold the
drink. So, there is a trade-off that we need to make.
Let us now consider a cylindrical can of radius r and height h. The volume of the can is given
by $V = \pi r^2 h$, and its surface area is the sum of the two end pieces and the curved piece that forms
the sides. The areas of the top and bottom are each $\pi r^2$. The piece that forms the sides of the can
is a rectangle, with one side of length h, and the other side of length $2\pi r$ (the circumference of the
circles forming the top and bottom, around which it is wrapped); hence its area is $2\pi r h$. Therefore,
the total surface area of the can is $A = 2\pi r h + 2\pi r^2$. We wish to minimise this value, subject to the
constraint that $V = \pi r^2 h = 330$ cm³ (so the can contains the required amount of liquid). However,
we only know how to find the maxima and minima of functions of a single variable; here A depends
on both r and h. How can we simplify things?
Fortunately, the constraint that we require the can to contain a certain amount of liquid comes
to our rescue. We can rearrange the volume equation to get h in terms of r. We find
\[ h = \frac{330}{\pi r^2}. \]
Then, we substitute this expression for h into our equation for A, which yields
\[ A = \frac{660}{r} + 2\pi r^2. \]
We can find the value of r which minimises A by computing the derivative of A and setting it equal
to zero. Thus
\[ \frac{dA}{dr} = -\frac{660}{r^2} + 4\pi r = 0. \]
We rearrange this equation to get the value of r:
\[ r = \left(\frac{330}{2\pi}\right)^{1/3} \approx 3.7\ \text{cm}. \]
Note that the corresponding height is
\[ h = \frac{330}{\pi r^2} = 2\left(\frac{330}{2\pi}\right)^{1/3} = 2r \approx 7.5\ \text{cm}. \]
The final stage is to check our answer really does give a minimum of the area, A, rather than a
maximum. The second derivative of A is
\[ \frac{d^2 A}{dr^2} = \frac{1320}{r^3} + 4\pi, \]
which is positive for all r > 0. Hence, r ≈ 3.7 corresponds to a minimum.
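As a quick numerical sanity check of this example, the short Python sketch below evaluates the formulas for r and h and compares the surface area at the optimal radius with the area at slightly smaller and larger radii. The variable names and the step of 0.1 cm are arbitrary choices made for illustration.

```python
import math

VOLUME = 330.0  # required volume in cm^3, from the example

def surface_area(r):
    # A(r) = 660/r + 2*pi*r^2, obtained by substituting h = 330/(pi r^2)
    return 2 * VOLUME / r + 2 * math.pi * r**2

# Radius from setting dA/dr = 0, and the corresponding height
r_opt = (VOLUME / (2 * math.pi)) ** (1 / 3)
h_opt = VOLUME / (math.pi * r_opt**2)

print("r =", round(r_opt, 1), "cm, h =", round(h_opt, 1), "cm")

# Radii a little either side of r_opt should need more metal
print(surface_area(r_opt) < surface_area(r_opt - 0.1))
print(surface_area(r_opt) < surface_area(r_opt + 0.1))
```

The computed height equals twice the radius (up to rounding error), as the algebra predicts.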
In fact, if we look up product information on the web, it appears 330ml cans (the European
standard size) are made to a diameter of 6.6cm and height 11.5cm which means they are taller and
thinner than the dimensions we have calculated here. Why might there be such a difference?
Lecture 12
The above example was fairly straightforward. However, practical situations can lead to a wide
variety of optimisation problems. Some will require more careful thought before we can simply
apply our procedure for finding extrema of differentiable functions. The following fairly general
guidelines, although they may not all be relevant in every problem, will often help.
3. Identify the quantities on which the dependent variable depends. Write down the relations
between these variables.
4. Select one of the quantities from step 3 and express the dependent variable as a function of
this variable – the independent variable – alone. Use the physical constraints of the problem
to fix the domain of the function. (Note: frequently one variable may be a better choice than
others, so a little thought should be put into this step).
5. The problem should now be converted into a mathematical one of finding the global extremum
of a certain function over an interval.
The next example illustrates the more complicated situations we might encounter in real life
problems.
Example 2.14. Two corridors, which meet at right angles, have widths a and b m. Find the length
of the longest pole which will go around the corner, assuming the pole remains horizontal at all
times.
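[Figure: two corridors of widths a and b meeting at right angles, with a pole made up of segments of lengths l1 and l2 touching the corner; θ and φ are the angles between the pole and the walls.]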
Consider the diagram above. For a given angle θ between the pole and the wall, the longest
pole which can fit in the angle between the corridors will be just touching the walls at each side,
and at the corner, as illustrated.
The length of this longest pole is
l = l1 + l2 .
Since $\phi = \frac{\pi}{2} - \theta$, simple trigonometry gives l1 and l2 in terms of a, b and θ.
Hence, as a function of the angle θ, the length of the longest pole that can fit in the corner is
\[ l(\theta) = \frac{a}{\sin\theta} + \frac{b}{\cos\theta}. \]
Now, as we turn the pole through the corner, the angle θ will go from 0 to π/2. In order to
find the maximum length of pole that will be able to make the turn, we need to find the minimum
value of l(θ) (the maximum length of pole that fits in the corridor for a given θ) for $\theta \in (0, \frac{\pi}{2})$
(l is undefined at the end points themselves). This
pole will be able to fit in the corner for all angles between 0 and π/2, and hence can traverse the
corner successfully.
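Differentiating l with respect to θ we obtain
\[ l'(\theta) = -\frac{a\cos\theta}{\sin^2\theta} + \frac{b\sin\theta}{\cos^2\theta}. \]
l is minimised when $l'(\theta) = 0$, i.e.
\[ a\cos^3\theta = b\sin^3\theta \quad\Rightarrow\quad \tan^3\theta = \frac{a}{b}. \]
Equivalently, we can write this as
\[ \theta = \tan^{-1}\left[\left(\frac{a}{b}\right)^{1/3}\right]. \]
Let us call this value $\theta^*$. Then, the length of the pole is given by
\[ l = \frac{a}{\sin\theta^*} + \frac{b}{\cos\theta^*}. \]
Since l is large and positive for θ close to 0 and π/2, and there are no other critical points, we can
see this is indeed a global minimum. (Alternatively, the second derivative test can be used.) To
express l more compactly, let $\alpha = \tan\theta^* = \left(\frac{a}{b}\right)^{1/3}$. Then, by drawing the appropriate triangle, we
see that $\sin\theta^* = \frac{\alpha}{\sqrt{1+\alpha^2}}$ and $\cos\theta^* = \frac{1}{\sqrt{1+\alpha^2}}$. Substituting this into the formula for l we get
\[ l = \frac{a\sqrt{1+\alpha^2}}{\alpha} + b\sqrt{1+\alpha^2}. \]
Since $\sqrt{1+\alpha^2} = \sqrt{1 + \left(\frac{a}{b}\right)^{2/3}} = \frac{1}{b^{1/3}}\left(a^{2/3} + b^{2/3}\right)^{1/2}$ we have
\[ l = \frac{a\, b^{1/3}}{a^{1/3}\, b^{1/3}}\left(a^{2/3} + b^{2/3}\right)^{1/2} + \frac{b}{b^{1/3}}\left(a^{2/3} + b^{2/3}\right)^{1/2}. \]
Hence finally,
\[ l = \left(a^{2/3} + b^{2/3}\right)^{1/2}\left(a^{2/3} + b^{2/3}\right) = \left(a^{2/3} + b^{2/3}\right)^{3/2}. \]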
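The closed-form answer can be checked numerically. The Python sketch below uses hypothetical corridor widths (a = 1.5 m and b = 2 m, chosen only for illustration) and compares three quantities: l evaluated at θ*, the closed-form expression, and the smallest value of l found by a brute-force scan over angles strictly between 0 and π/2.

```python
import math

def pole_length(theta, a, b):
    # l(theta) = a/sin(theta) + b/cos(theta), valid for 0 < theta < pi/2
    return a / math.sin(theta) + b / math.cos(theta)

a, b = 1.5, 2.0  # hypothetical corridor widths in metres

# Critical angle from tan^3(theta) = a/b
theta_star = math.atan((a / b) ** (1 / 3))

# Closed-form result derived above
closed_form = (a ** (2 / 3) + b ** (2 / 3)) ** 1.5

# Brute-force scan of interior angles (end points excluded, where l blows up)
n = 10_000
scan_min = min(pole_length(k * (math.pi / 2) / n, a, b) for k in range(1, n))

print(pole_length(theta_star, a, b), closed_form, scan_min)
```

All three printed values should agree to several decimal places; the scan is slightly larger only because it samples a finite set of angles.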
[Figure: sketch of the typical shape of the cost function, C(q), plotted against q.]
At the simplest level, to continue and expand, a business needs to bring in more revenue from
selling its wares than it expends in producing them. Consider a very simple case where a company
makes only a single product. We introduce the cost function, C(q), which gives the total cost
of producing a quantity q of some good. (For simplicity, we can think of q as being the number
of ‘widgets’ the company produces, though of course, what the company produces need not be a
physical object.) We expect the cost function to look something like the sketch above. In particular,
since the more goods that are made, the greater the cost, we expect C to be an increasing function
of q. The intercept on the vertical axis represents the fixed costs, which are incurred before anything
can be produced. These might include the costs of buildings and machinery. The cost function
increases rapidly at first, but then more slowly, as producing larger quantities of something is
usually more efficient than producing small quantities (so-called economies of scale). However, at
very high levels of production, the costs start increasing more rapidly again as shortages of resources
like staff or raw materials begin to push up their costs.
[Figure: sketch of the typical shape of the revenue function, R(q), plotted against q.]
The income that comes from selling the goods produced by the business is given by the revenue
function, R(q), where q is again the quantity of goods. If the price per item is p, and the quantity
of goods sold is q then, R = pq. If the price of the goods does not change with quantity, then
the graph of R against q will be a straight line. However, in reality we would expect that, when
the quantity of goods sold becomes very large, the market for that particular product will become
saturated, causing the price to drop. This would give a graph somewhat like that illustrated above.
The profit, π(q), that the business makes is the difference between the revenue and the cost, i.e.
\[ \pi(q) = R(q) - C(q). \]
If this is negative, the business makes a loss. (Note that the profit is often denoted π(q) since π is
the Greek letter corresponding to p, and p has already been used to denote price; it has nothing to do
with the constant π ≈ 3.14.) Hence, if we plot R and C on a graph, the business makes a profit
where R is above C.
Marginal analysis
As the owner of a profitable business, you might be tempted to go one stage further and ask, ‘Could
I refine what I am doing to make the business more profitable?’. Many economic decisions are based
on an analysis of how costs and revenues would change, if a small change was made to what the
business is currently doing. This is called ‘marginal analysis’; the additional costs involved are the
marginal costs, and the associated revenues are marginal revenues.
Suppose we are running a large bakery, and need to decide if we should increase our production
of bread from the current 1000 loaves per day. Assuming we make our decisions solely on financial
grounds, the logical way to do this would be to consider the extra costs of baking more bread and
compare this to the additional revenue we get from selling more. If the revenue increase is greater
than the cost increase, we should increase production.
If we were to increase our production by one loaf, then the increase in costs would be
\[ \text{Increase in cost} = C(1001) - C(1000) = \frac{C(1001) - C(1000)}{1001 - 1000}. \]
Note from the final term on the RHS that this is just the average rate of change of cost between 1000
and 1001 loaves, or approximately the derivative of C(q) at q = 1000, which is the instantaneous
rate of change of cost at q = 1000. Since we are interested in the effect of ‘small’ changes to the
business, many economists define the marginal cost, MC, as the instantaneous rate of change of
costs, i.e.
\[ MC = \frac{dC}{dq}. \]
Similarly, the additional revenue we would receive from increasing our bread production by one
loaf would be
\[ \text{Increase in revenue} = R(1001) - R(1000) = \frac{R(1001) - R(1000)}{1001 - 1000}. \]
Hence we similarly define the marginal revenue, MR, as
\[ MR = \frac{dR}{dq}. \tag{2.2} \]
If the marginal revenue exceeds the marginal cost, then producing the one extra loaf increases our
profits, and we should make the change. If the marginal cost exceeds the marginal revenue, baking
the extra loaf would reduce our profits.
Of course, what we really care about as company directors is profit: this is what we really want
to maximise. From our previous discussions of extrema, we know the maximum profit will occur
either at an end point (q = 0 or q = qmax , the maximum amount of goods that can be produced)
or at a critical point of π(q), i.e. a point where
\[ \frac{d\pi}{dq} = 0. \]
But
\[ \pi(q) = R(q) - C(q), \]
so
\[ \frac{d\pi}{dq} = \frac{dR}{dq} - \frac{dC}{dq} = MR - MC = 0. \]
Thus, the maximum profit can occur when the marginal cost equals the marginal revenue (MC = MR).
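To make the marginal analysis concrete, here is a small Python sketch. The cost and revenue functions, the value of q_max and all helper names are invented purely for illustration (they do not come from the notes). The code compares the cost of one extra unit with the derivative, locates the point where MR = MC by bisection, and then compares the profit there with the profit at the two end points.

```python
def cost(q):
    # Hypothetical cost function: fixed costs of 100 plus production costs
    return 100 + 2 * q + 0.01 * q**2

def revenue(q):
    # Hypothetical revenue function: the price falls as quantity rises
    return 10 * q - 0.01 * q**2

def profit(q):
    return revenue(q) - cost(q)

def marginal(func, q, dq=1e-3):
    # Central-difference estimate of the derivative d(func)/dq
    return (func(q + dq) - func(q - dq)) / (2 * dq)

# The cost of one extra unit is close to the marginal cost
print(cost(101) - cost(100), marginal(cost, 100))

# Find q with MR = MC by bisection; MR - MC is decreasing for these functions
q_max = 500.0  # hypothetical production limit
lo, hi = 0.0, q_max
for _ in range(60):
    mid = (lo + hi) / 2
    if marginal(revenue, mid) - marginal(cost, mid) > 0:
        lo = mid
    else:
        hi = mid
q_star = (lo + hi) / 2

# Maximum profit: compare the critical point with the end points
best_q = max([0.0, q_star, q_max], key=profit)
print(q_star, best_q, profit(best_q))
```

For these particular functions MR = 10 − 0.02q and MC = 2 + 0.02q, so the two are equal at q = 200, which is what the bisection should recover.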
2.9 Summary of learning outcomes
Now that we have reached the end of this chapter, you should be able to:
• Explain the concept of the derivative of a function in terms of the rate of change, or
geometrically, the slope of the tangent line
• Give examples of functions which are, and which are not, differentiable
• Interpret the meaning of the derivative in the context of a practical problem, including
specifying its units
• Recall the rules of differentiation (linear combination rule, Product Rule, Quotient Rule and
Chain Rule)
• Differentiate implicitly, and thus differentiate inverse functions (including inverse
trigonometric functions)
• Use implicit differentiation to find relationships between related rates of change (related rates)
and apply this knowledge to solve practical problems
• Use the properties of critical points and local extrema to find global maxima and minima of
differentiable functions.
Chapter 3
Integration
Lecture 13
We started this course by considering functional relationships between variables. In the last chapter
we saw how, once we have such a functional relationship, we are also able to obtain information
about the rates of change of the variables by differentiating. For example, if we know the velocity
of a vehicle, v, as a function of time, t, we can determine its acceleration $a = \frac{dv}{dt}$ by differentiation.
However, there are many applications where we would want to reverse this process - e.g. we might
know the acceleration of a vehicle from Newton’s Second Law (F = ma), but need to know its
velocity. In such a case, we need to use integration.
You will have already come across the concept of integration at school, or in earlier courses. It
will probably have been explained to you that integration is the inverse operation of differentiation
(i.e. if we take a function f (x), integrate it with respect to x and then differentiate the result, we get
back our original function, f (x)). You might also have learned that integration gives the area under
a curve y = f (x). Some of you may have been told that ‘integration is a process of summation’. All
of these things are true, but at first sight it is not obvious that they should correspond to the same
mathematical operation. Thus, to begin this section on integration, we look at some examples to
revisit these ideas, and define precisely what we mean by terms which you may already have heard,
such as definite and indefinite integral.
If the velocity was constant over the five seconds, then we would easily calculate the displacement
as the product of velocity and time. However, the velocity is increasing.
[Figure: graph of velocity v (vertical axis, from 14 to 19) against time t (horizontal axis, from 0 to 5).]
Intuitively, we can work out an estimate of the displacement by assuming that over a short in-
terval (say, one second), the velocity will be roughly constant. Then, by multiplying the velocity by
that time, we have the distance moved by the car over that one second. Adding up all the contribu-
tions from each of the first five seconds will give an estimate of how far the car travelled in that time.
Taking the maximum possible velocity over the five one-second time intervals gives an upper
bound (overestimate) for the displacement:
U5 = 15 · 1 + 16 · 1 + 17 · 1 + 18 · 1 + 19 · 1
= 85 m
(The distance travelled at 15 m/sec for 1 sec is the area of the rectangle and is 15 × 1 = 15 m.)
Similarly, taking the minimum velocity on each time interval gives a lower bound (underesti-
mate):
L5 = 14 · 1 + 15 · 1 + 16 · 1 + 17 · 1 + 18 · 1
= 80 m.
We know the car travelled between 80 m and 85 m from the starting point, a difference of 5 m. If
we estimate the displacement as the midpoint between U5 = 85 and L5 = 80, namely 82.5 m, then
we know that:
\[ x = 82.5 \underbrace{\pm\, 2.5}_{\text{possible error}}\ \text{m}. \]
Now, suppose that in this case, we know the functional relationship between v and t for any
value of t in [0, 5], rather than just knowing the velocity at one second intervals. We can quickly
check that the function v(t) = 14 + t is consistent with the values in the table. Now that we
know the value of v(t) for any t ∈ [0, 5], we could produce more accurate estimates of the distance
travelled by dividing the time into more than 5 intervals, since as the time intervals get smaller, the
speed of the car will change less over that time, and our approximation that the speed is constant
over the interval comes closer to being true.
If we plot the function on a graph, we can see that, geometrically, what we have done when we
were estimating the distance travelled was to estimate the area under the line v(t) by adding up
the area of rectangles. Since the function is a straight line, we can calculate this area under the
graph (which represents the distance) precisely, as we know how to calculate areas of rectangles
and triangles. We obtain: x = 14 × 5 + 0.5 × 5 × 5 = 82.5. Note that this is consistent with our
previous estimates.
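The effect of using more, shorter time intervals can be seen with a short Python sketch. It relies on the fact that v(t) = 14 + t is increasing, so on each interval the minimum velocity occurs at the left end and the maximum at the right end. The function name `lower_upper` and the particular values of n are choices made for this illustration.

```python
def v(t):
    # Velocity from the example: v(t) = 14 + t (m/sec), increasing on [0, 5]
    return 14 + t

def lower_upper(n, total_time=5.0):
    """Lower and upper estimates of the displacement using n equal intervals."""
    dt = total_time / n
    # v is increasing, so its minimum on each interval is at the left end
    # and its maximum is at the right end.
    lower = sum(v(i * dt) * dt for i in range(n))
    upper = sum(v((i + 1) * dt) * dt for i in range(n))
    return lower, upper

for n in (5, 50, 500):
    print(n, lower_upper(n))
```

With n = 5 this reproduces L5 = 80 and U5 = 85; as n grows, both estimates close in on 82.5 from either side.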
Now let us consider the function $G(t) = 14t + \frac{1}{2}t^2$; differentiating gives $\frac{dG}{dt} = 14 + t = v(t)$.
Hence,
\[ x = \int_0^5 v(t)\,dt = \left[14t + \tfrac{1}{2}t^2\right]_0^5 = G(5) - G(0) = 14 \times 5 + 0.5 \times 5 \times 5 = 82.5. \]
This simple example is important because it demonstrates the connection we discussed between
sums (we added up the distances travelled by the car over each small time interval to estimate
the displacement), finding the area under a graph (in this case, a velocity-time graph) and the
‘inverse operation’ of differentiation (which gave us the function G(t)). In the final stage, we used
the notation for a definite integral, which you are probably already familiar with, although we have
not yet precisely defined what it means. However, having demonstrated the connection between
the three concepts, we now need to decide how to define precisely what integration is.
In some ways, it would be convenient to define G(t) as ‘the integral’ of v(t). In the past, you
have probably been used to thinking of integration in this way - the problem of finding the integral
of a certain function v(t) simply came down to finding some function G(t), such that $G'(t) = v(t)$.
(We call G(t) an antiderivative for v(t); we will define this term precisely later.) For our example,
it was easy to compute a suitable G(t), since v(t) was a linear function. But what if v(t) were
a more complicated function, say, $v(t) = \sqrt{1 + t^3 + t^4}$? Could you find a suitable G(t) in that
case?∗ If you cannot, does that mean we are unable to calculate the distance travelled from t = 0
to t = 5 for this velocity function? Of course not! We can use the summation procedure we used
earlier to find upper and lower bounds on the distance travelled, and by making the time intervals
we consider small enough, we can determine the answer to any desired degree of accuracy.
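As an illustration of this point, the Python sketch below applies the summation procedure to $v(t) = \sqrt{1 + t^3 + t^4}$ on [0, 5]. It assumes only that this v is increasing for t ≥ 0 (so the minimum on each interval is at the left end and the maximum at the right end); the function name `bounds` and the values of n are arbitrary choices for the sketch.

```python
import math

def v(t):
    # The more complicated velocity function from the text
    return math.sqrt(1 + t**3 + t**4)

def bounds(n, a=0.0, b=5.0):
    """Lower and upper bounds on the distance travelled, using n intervals."""
    dt = (b - a) / n
    # v is increasing on [0, 5]: minimum at the left end of each interval,
    # maximum at the right end.
    lower = sum(v(a + i * dt) for i in range(n)) * dt
    upper = sum(v(a + (i + 1) * dt) for i in range(n)) * dt
    return lower, upper

for n in (10, 100, 1000, 10000):
    lower, upper = bounds(n)
    print(n, lower, upper, upper - lower)
```

The gap between the bounds is (v(5) − v(0))·5/n, so it can be made as small as we like by increasing n, even though no antiderivative was used.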
As there are many more functions on which we can use the summation procedure than for
which we can find antiderivatives it makes sense to use this procedure to define integration, since
it gives a more widely-applicable definition. This is important, because there are many practical
problems where we need to be able to calculate integrals (like the distance in this example), but for
which there is no known antiderivative. In the next section, we will use the summation procedure
to give a general definition of a definite integral. Then, we will show that the familiar method you
have used for computing them (i.e. finding an antiderivative) gives a result consistent with this
definition. But, before we can do all that, we need to introduce some notation.
U5 = 15 · 1 + 16 · 1 + 17 · 1 + 18 · 1 + 19 · 1
= 85 m
If we want to calculate areas using this method we potentially need to evaluate sums with large
numbers of terms, which it would be tedious to write out in full. Summation notation provides
a convenient and concise way to express sums of this type. You have probably already met this
notation at school when you studied statistics (e.g. in calculations of mean and variance). However,
we will briefly revise it here for the sake of completeness.
∗
Before you waste too much effort trying to find one, it is worth noting that an explicit formula for G(t) does not
exist for this v(t)!
If $a_1, a_2, \ldots, a_n$ are real numbers, the sum of $a_1, \ldots, a_n$ is written
\[ \sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n. \]
“i” is called the “index” of the sum. The symbol $\sum$ is a capital sigma (a Greek letter approximating
to s); this tells us we need to form a sum. The terms of the sum appear to its right. The value
of i at the bottom of the $\sum$ symbol tells us what value of i to use in the first term of the sum.
Then, we add one to i to get the second term, and so on, until i reaches the value at the top of
the $\sum$ symbol, which will give the last term in the sum. The letter i is a “dummy index” as we
can change it without altering the sum:
\[ \sum_{i=1}^{n} a_i = \sum_{j=1}^{n} a_j = \sum_{k=1}^{n} a_k = a_1 + a_2 + \cdots + a_n. \]
The value of $a_i$ will be a function of i, $a_i = f(i)$, so the sum is $\sum_{i=1}^{n} f(i) = f(1) + f(2) + \cdots + f(n)$.
Hence, we can see that using summation notation, our sum from earlier can be written
\[ U_5 = \sum_{i=1}^{5} v(i), \]
where v(i) = 14 + i.
Examples
1. (a) $\sum_{i=1}^{4} i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 30$
(b) $\sum_{i=3}^{6} i = 3 + 4 + 5 + 6 = 18$
(c) $\sum_{j=0}^{3} 2^j = 2^0 + 2^1 + 2^2 + 2^3 = 1 + 2 + 4 + 8 = 15$
(d) $\sum_{i=1}^{4} 2 = 2 + 2 + 2 + 2 = 8$
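Summation notation translates directly into code. The Python sketch below defines a small helper, `sigma` (a name invented for this illustration), which adds up f(i) for i running from the lower value to the upper value inclusive, and uses it to reproduce the four examples above and the sum U5.

```python
def sigma(f, lower, upper):
    """Sum of f(i) for i = lower, lower + 1, ..., upper (both ends included)."""
    return sum(f(i) for i in range(lower, upper + 1))

example_a = sigma(lambda i: i**2, 1, 4)    # 1^2 + 2^2 + 3^2 + 4^2
example_b = sigma(lambda i: i, 3, 6)       # 3 + 4 + 5 + 6
example_c = sigma(lambda j: 2**j, 0, 3)    # 2^0 + 2^1 + 2^2 + 2^3
example_d = sigma(lambda i: 2, 1, 4)       # the constant 2, added four times
u5 = sigma(lambda i: 14 + i, 1, 5)         # the upper sum U5 from earlier

print(example_a, example_b, example_c, example_d, u5)
```

Note that the name of the index inside each `lambda` is irrelevant, which is exactly the "dummy index" property.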
Properties of $\sum$:
1. $\sum_{i=m}^{n} c\,a_i = c \sum_{i=m}^{n} a_i$, for c a constant;
2. $\sum_{i=m}^{n} (a_i + b_i) = \sum_{i=m}^{n} a_i + \sum_{i=m}^{n} b_i$;
3. $\sum_{i=m}^{n} (a_i - b_i) = \sum_{i=m}^{n} a_i - \sum_{i=m}^{n} b_i$.
These can each be verified by writing out the sums on each side.
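Derivation of formula
\[ \sum_{i=1}^{n} i = 1 + 2 + \cdots + n = n + (n - 1) + \cdots + 1. \]
Adding the two ways of writing the sum term by term:
\[ 2\sum_{i=1}^{n} i = \underbrace{(n + 1) + (n + 1) + \cdots + (n + 1)}_{n \text{ terms}} = n(n + 1), \]
so
\[ \sum_{i=1}^{n} i = \frac{n(n + 1)}{2}. \]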
4. $\sum_{i=1}^{n} i^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} = \frac{n}{6}(2n^2 + 3n + 1) = \frac{n}{6}(2n + 1)(n + 1)$
Proof
We can show this formula is correct using proof by induction. We begin by checking it is
correct for n = 1: in this case, the sum is 1, and the formula gives
\[ \frac{1}{6}(2 + 1)(2) = \frac{6}{6} = 1. \]
Hence, the formula is true for n = 1. We now assume that it is correct for n = k, and consider
what happens when n = k + 1. We have
\begin{align*}
\sum_{i=1}^{k+1} i^2 &= \sum_{i=1}^{k} i^2 + (k + 1)^2 = \frac{k}{6}(2k + 1)(k + 1) + (k + 1)^2 \\
&= \frac{(k + 1)}{6}\left[k(2k + 1) + 6(k + 1)\right] = \frac{(k + 1)}{6}\left[2k^2 + k + 6k + 6\right] = \frac{(k + 1)}{6}\left[2k^2 + 7k + 6\right] \\
&= \frac{(k + 1)}{6}(2k + 3)(k + 2) = \frac{(k + 1)}{6}\left(2[k + 1] + 1\right)\left([k + 1] + 1\right).
\end{align*}
Hence, if the formula is true for n = k it is also true for n = k + 1. Since we have already
shown it is true for n = 1, this completes our proof by induction.
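\begin{align*}
\sum_{i=3}^{10} (i + 2)^2 &= \sum_{j=5}^{12} j^2 \quad \text{where } j = i + 2 \\
&= \sum_{j=1}^{12} j^2 - \sum_{j=1}^{4} j^2 \\
&= \frac{12}{6} \cdot 25 \cdot 13 - \frac{4}{6} \cdot 9 \cdot 5 = 620.
\end{align*}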
Lecture 14
I is a unique real number which depends on the function f being integrated (called the integrand ),
and on the values of a and b.
The answer to this question is, ‘No’; not every function is integrable. Some functions are not
integrable for any values of a and b; in other cases, a function might be integrable on some intervals,
but not others. If the function is continuous on [a, b], it can be shown to be integrable. However,
continuity of f is not a requirement; for example, we can integrate the Heaviside function H(x)
between −1 and 1, and obtain
\[ \int_{-1}^{1} H(x)\,dx = 1. \]
We will see why this is true shortly.
Note that, for a definite integral, the variable, x, with respect to which we are integrating, is a
‘dummy variable’, much like the index i was a ‘dummy index’ when we used summation notation;
we can replace x with any other symbol without changing the result, I. Thus
\[ I = \int_a^b f(x)\,dx = \int_a^b f(t)\,dt = \int_a^b f(\theta)\,d\theta. \]
In order to define the definite integral using sums, we need to follow these steps:
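1. We divide the interval [a, b] into n equal subintervals. Each subinterval will have width
∆x = (b − a)/n. We denote the end points of the subintervals by $x_0, x_1, \ldots, x_n$, where
$x_i = a + i\Delta x$ (so that $x_0 = a$ and $x_n = b$).
2. On the ith subinterval, $[x_{i-1}, x_i]$, we let $m_i$ and $M_i$ denote the minimum and maximum
values of f respectively.
If the function f is continuous on [a, b], then $m_i$ and $M_i$ are guaranteed to exist on each
subinterval. If f is not continuous, either or both of them may fail to exist on one or more
subintervals.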
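3. Then, we define the lower and upper sums, as we did for the distance-finding example:
\begin{align*}
\text{Lower sum } L_n &= m_1\Delta x + m_2\Delta x + \cdots + m_n\Delta x = \sum_{i=1}^{n} m_i\Delta x = \left(\sum_{i=1}^{n} m_i\right)\Delta x \\
\text{and Upper sum } U_n &= M_1\Delta x + M_2\Delta x + \cdots + M_n\Delta x = \sum_{i=1}^{n} M_i\Delta x = \left(\sum_{i=1}^{n} M_i\right)\Delta x.
\end{align*}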
Note that Ln ≤ Un for all values of n (since each term in the lower sum is less than or equal
to the equivalent term in the upper sum). In fact, Lm ≤ Un for any numbers m and n. We
can understand this intuitively if we imagine the graph of a continuous function, f (x), and
visualise the rectangles representing each of the terms in the sum (as we drew for the distance
travelled example). The lower sum is always less than the area below the curve, no matter
how many subintervals are used, because all the rectangles lie below the curve, whereas the
upper sum is always greater than the area below the curve.
Definition 3.1. The definite integral, I, of a function, f , from a to b is defined to be the unique
real number which satisfies
\[ L_n \le I \le U_n \quad \text{for all } n = 1, 2, 3, \ldots \]
where such a number exists. We write this as:
\[ I = \int_a^b f(x)\,dx. \]
Note that the symbol $\int$ we use to denote an integral is actually an elongated letter s, chosen
because of the close connection between integrals and sums.
Riemann sums. An alternative way of defining the definite integral, which you might meet in
textbooks, is based on Riemann sums (named after German mathematician Bernhard Riemann).
Instead of ‘trapping’ the value of I between the upper and lower sums, we approximate it by
introducing
\[ S_n = \sum_{i=1}^{n} f(x_i^*)\Delta x, \]
where $x_i^*$ is any value in the ith interval (i.e. $x_{i-1} < x_i^* < x_i$). We then define
\[ I_{\text{Riemann}} = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\Delta x. \]
Now, if we assume that f is a continuous function on [a, b], we can show that this definition
is equivalent to Definition 3.1. Note that for any n, $L_n \le S_n \le U_n$. As n → ∞, $x_i \to x_{i+1}$,
so $\lim_{n \to \infty} f(x_i^*) = f(x_i)$; similarly, $\lim_{n \to \infty} m_i = \lim_{n \to \infty} M_i = f(x_i)$ (the latter two equalities follow
from the fact that $M_i$ and $m_i$ are the images under f of two points in the ith interval). Hence
$\lim_{n \to \infty} (L_n - S_n) = \lim_{n \to \infty} (U_n - S_n) = 0$ and so, using the inequality in Definition 3.1, $\lim_{n \to \infty} (I - S_n) = 0$.
Hence,
\[ I = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\Delta x = I_{\text{Riemann}}. \]
What functions can we integrate? We have not clearly stated what types of functions are
‘integrable’, except for saying that they are those functions for which the definite integral exists (a
rather circular definition!). A precise definition is outside the scope of this course. However, it can
be shown from the definition that if f (x) is a continuous function on [a, b], then it is integrable.
Continuity of f on [a, b] ensures that the values mi and Mi exist for every sub-interval in [a, b] (by
the Extreme Value Theorem), which is necessary for our definition of I to make sense. However,
continuity is not a requirement. We say that a function f is bounded on the interval [a, b] if there
is some real number M such that for all x ∈ [a, b], −M < f (x) < M . As long as the function is
bounded on [a, b], it can have finitely many jump discontinuities and the definite integral will still
exist.
For example, the Heaviside function, H(x), is integrable over any finite interval [a, b] where
a, b ∈ ℝ, with
\[ I = \int_a^b H(x)\,dx = \begin{cases} b - a & \text{if } a, b \ge 0 \\ b & \text{if } a < 0 \le b \\ 0 & \text{if } a, b < 0 \end{cases} \]
If a, b ≥ 0, then we have $m_i = M_i = 1$ on every subinterval, and so
\[ L_n = U_n = \frac{(b - a)}{n}\sum_{i=1}^{n} 1 = b - a. \]
Hence we must have I = b − a. Similarly, if a, b < 0, $m_i = M_i = 0$ on every subinterval, and
\[ L_n = U_n = \frac{(b - a)}{n}\sum_{i=1}^{n} 0 = 0, \]
so I = 0. If a < 0 ≤ b, we split the interval [a, b] into [a, 0] and
[0, b]. Let [a, 0] be divided into $n_1$ subintervals, each of width $-a/n_1$ (which is positive, since a < 0).
On each subinterval we have $m_i = 0$, and for
$i < n_1$, $M_i = 0$ too. However, on the last subinterval, we have $M_{n_1} = 1$. We similarly subdivide
[0, b] into $n_2$ subintervals, each of width $b/n_2$, on each of which we have $m_i = M_i = 1$. Set $n = n_1 + n_2$; then,
adding the contributions from both [a, 0] and [0, b], we have
\[ L_n = \frac{-a}{n_1}\sum_{i=1}^{n_1} 0 + \frac{b}{n_2}\sum_{i=1}^{n_2} 1 = b \]
and
\[ U_n = \frac{-a}{n_1}\sum_{i=1}^{n_1 - 1} 0 + \frac{-a}{n_1} + \frac{b}{n_2}\sum_{i=1}^{n_2} 1 = -\frac{a}{n_1} + b. \]
As we take more sub-intervals, $n_1$ and $n_2$ get larger, so $U_n$ approaches b while $L_n = b$, and
so we must have I = b. Hence the class of integrable functions is larger than the class of continuous
functions (which in turn is larger than the class of differentiable functions).
However, not all functions are integrable. For example, the function $f(x) = x^{-1}$ is not
integrable over [0, 1] because it is undefined at 0. The Dirichlet function is not integrable over any
finite interval, [a, b], despite the fact that it is defined for all real numbers, and 0 ≤ D(x) ≤ 1. We
can show this as follows. Suppose that we divide [a, b] into n subintervals. Now, $m_i = 0$ for all i,
since there will be an irrational number in $[x_{i-1}, x_i]$. Similarly, $M_i = 1$ for all i, since there will be
a rational number in $[x_{i-1}, x_i]$. Thus $L_n = 0$ and
\[ U_n = \Delta x \sum_{i=1}^{n} 1 = \frac{(b - a)}{n}\sum_{i=1}^{n} 1 = (b - a) \]
for all values of n. This means that there is no unique real number I with $L_n \le I \le U_n$ for all n (any
number between 0 and b − a would be equally good!).
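The lower and upper sums for the Heaviside function are easy to compute exactly. The Python sketch below does so with exact fractions, using n equal subintervals of [a, b] (rather than splitting at 0 as in the argument above). It assumes the convention H(0) = 1, which is what the calculation above implies, and it uses the fact that H never decreases, so its minimum on each subinterval is at the left end and its maximum at the right end. The function names are chosen for this illustration.

```python
from fractions import Fraction

def heaviside(x):
    # Convention assumed here: H(x) = 1 for x >= 0 and H(x) = 0 for x < 0
    return 1 if x >= 0 else 0

def lower_upper(a, b, n):
    """Exact lower and upper sums of H over [a, b] with n equal subintervals."""
    a, b = Fraction(a), Fraction(b)
    dx = (b - a) / n
    ends = [a + i * dx for i in range(n + 1)]
    # H is non-decreasing: minimum at the left end of each subinterval,
    # maximum at the right end.
    lower = sum(heaviside(ends[i]) for i in range(n)) * dx
    upper = sum(heaviside(ends[i + 1]) for i in range(n)) * dx
    return lower, upper

for n in (10, 100, 1000):
    print(n, lower_upper(-1, 1, n))
```

On [−1, 1] the lower sum stays at 1 while the upper sum exceeds 1 by the width of a single subinterval, so the only number trapped between them for every n is I = 1.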
2. $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$.
4. The average value, $\bar{f}$, of a function $f$ over the interval $[a, b]$ is defined to be
\[
\bar{f} = \frac{1}{(b-a)}\int_a^b f(x)\,dx.
\]
To see why this definition makes sense, consider the following. We know that the average
(mean) of n numbers is simply the sum of those numbers, divided by n. Suppose we divide
our interval [a, b] into n subintervals, each of length ∆x = (b − a)/n. Now, let x∗i be a number
in the ith subinterval (we could choose the mid-point, for example). Then, we could calculate
the (approximate) average value of f over [a, b] as:
\[
\bar{f} \approx \frac{f(x_1^*) + f(x_2^*) + f(x_3^*) + \cdots + f(x_n^*)}{n}.
\]
But, we have $n = (b - a)/\Delta x$. Substituting this in, we get
\[
\begin{aligned}
\frac{f(x_1^*) + f(x_2^*) + f(x_3^*) + \cdots + f(x_n^*)}{n}
&= \frac{f(x_1^*) + f(x_2^*) + f(x_3^*) + \cdots + f(x_n^*)}{(b-a)/\Delta x} \\
&= \frac{\left(f(x_1^*) + f(x_2^*) + f(x_3^*) + \cdots + f(x_n^*)\right)\Delta x}{(b-a)} \\
&= \frac{f(x_1^*)\Delta x + f(x_2^*)\Delta x + f(x_3^*)\Delta x + \cdots + f(x_n^*)\Delta x}{(b-a)} \\
&= \frac{1}{(b-a)}\sum_{i=1}^{n} f(x_i^*)\Delta x.
\end{aligned}
\]
Now, as $n$ gets larger and larger (i.e. we are evaluating $f$ at more and more points in the
interval), our approximation of the average value will improve. Thus, we let
\[
\bar{f} = \frac{1}{(b-a)}\lim_{n\to\infty}\sum_{i=1}^{n} f(x_i^*)\Delta x.
\]
But, comparing the RHS of the expression above to the definition of the definite integral
using Riemann sums, we can see that
\[
\bar{f} = \frac{1}{(b-a)}\int_a^b f(x)\,dx.
\]
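The limiting process above can be imitated numerically. The following is a minimal Python sketch, assuming midpoints are used for the sample points $x_i^*$ (as the text suggests is one possible choice); the function name `average_value` and the two test functions are illustrative choices, not part of the notes.

```python
import math

def average_value(f, a, b, n=100_000):
    """Approximate the average of f over [a, b] using n midpoint samples."""
    dx = (b - a) / n
    # Sum f(x_i*) * dx with x_i* the midpoint of the i-th subinterval.
    riemann_sum = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
    return riemann_sum / (b - a)

# Average of x^2 over [0, 1] (exact value 1/3).
avg_square = average_value(lambda x: x * x, 0.0, 1.0)
# Average of sin x over [0, pi] (exact value 2/pi).
avg_sine = average_value(math.sin, 0.0, math.pi)

print(avg_square, avg_sine)
```

Increasing `n` should bring both printed values closer to the exact averages, in line with the limit taken in the derivation.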
On each subinterval $\left[\frac{i-1}{n}, \frac{i}{n}\right]$ let $m_i$ and $M_i$ respectively denote the minimum value and maximum
value of $y = f(x)$ on that interval.
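[Figure: graph of $y = x^2$ for $0 \le x \le 1$, with the $x$-axis marked at $\frac{1}{n}, \frac{2}{n}, \frac{3}{n}, \dots, \frac{n-1}{n}, 1$.]
Adding the areas of all rectangles of height $M_i$ and base $\frac{1}{n}$ gives the upper sum.
\[
\begin{aligned}
U_n &= \frac{1}{n}\cdot\left(\frac{1}{n}\right)^2 + \frac{1}{n}\cdot\left(\frac{2}{n}\right)^2 + \cdots + \frac{1}{n}(1)^2 \\
&= \frac{1}{n}\left[\left(\frac{1}{n}\right)^2 + \left(\frac{2}{n}\right)^2 + \left(\frac{3}{n}\right)^2 + \cdots + \left(\frac{n}{n}\right)^2\right] \\
&= \frac{1}{n^3}\left[1^2 + 2^2 + 3^2 + \cdots + n^2\right]
\end{aligned}
\]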
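[Figure: graph of $y = x^2$ for $0 \le x \le 1$, with the $x$-axis marked at $\frac{1}{n}, \frac{2}{n}, \frac{3}{n}, \dots, \frac{n-1}{n}, 1$.]
Similarly, adding the areas of all rectangles of height $m_i$ and base $\frac{1}{n}$ gives the lower sum.
Calculating the lower sum
\[
\begin{aligned}
L_n &= \frac{1}{n}\cdot 0^2 + \frac{1}{n}\left(\frac{1}{n}\right)^2 + \cdots + \frac{1}{n}\left(\frac{n-1}{n}\right)^2 \\
&= \frac{1}{n}\left[0^2 + \left(\frac{1}{n}\right)^2 + \cdots + \left(\frac{n-1}{n}\right)^2\right] \\
&= \frac{1}{n^3}\left[0^2 + 1^2 + \cdots + (n-1)^2\right]
\end{aligned}
\]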
\[
\underbrace{L_n}_{\text{area of rectangles below the curve}} \;\le\; \underbrace{A}_{\text{area under curve}} \;\le\; \underbrace{U_n}_{\text{area of rectangles above curve}}
\]
As n → ∞, that is, the width of the rectangles gets smaller and smaller, we expect the upper
and lower sums to get closer together and better approximate the area.
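Thus, using the formula $1^2 + 2^2 + \cdots + n^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}$ for the sum of the first $n$ squares,
\[
U_n = \left(\frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}\right)\frac{1}{n^3} = \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2}.
\]
We apply the same formula to $L_n$:
\[
\begin{aligned}
0^2 + 1^2 + 2^2 + \cdots + (n-1)^2 &= 1^2 + 2^2 + \cdots + (n-1)^2 + n^2 - n^2 \\
&= \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} - n^2 = \frac{n^3}{3} - \frac{n^2}{2} + \frac{n}{6},
\end{aligned}
\]
so that
\[
L_n = \left(\frac{n^3}{3} - \frac{n^2}{2} + \frac{n}{6}\right)\frac{1}{n^3} = \frac{1}{3} - \frac{1}{2n} + \frac{1}{6n^2}.
\]
Since $L_n \le A \le U_n$, we have
\[
\frac{1}{3} - \frac{1}{2n} + \frac{1}{6n^2} \le A \le \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2}.
\]
As $n \to \infty$, $L_n \to \frac{1}{3}$ and $U_n \to \frac{1}{3}$, so $A = \frac{1}{3}$.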
This result is consistent with what we have found numerically; as we use larger and larger values
of n our numerical approximation of the integral becomes closer and closer to the exact value we
have just found.
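As a hedged illustration of this squeeze, the Python sketch below evaluates the upper and lower sums for $y = x^2$ on $[0, 1]$ exactly (using rational arithmetic) and compares them with the closed forms $\frac{1}{3} \pm \frac{1}{2n} + \frac{1}{6n^2}$. The function names are illustrative only.

```python
from fractions import Fraction

def upper_sum(n):
    """U_n for y = x^2 on [0, 1]: rectangle heights taken at right endpoints i/n."""
    return sum(Fraction(i, n) ** 2 for i in range(1, n + 1)) / n

def lower_sum(n):
    """L_n for y = x^2 on [0, 1]: rectangle heights taken at left endpoints (i-1)/n."""
    return sum(Fraction(i, n) ** 2 for i in range(0, n)) / n

def closed_form(n, sign):
    """1/3 + sign/(2n) + 1/(6n^2), with sign = +1 for U_n and -1 for L_n."""
    return Fraction(1, 3) + Fraction(sign, 2 * n) + Fraction(1, 6 * n * n)

for n in (10, 100, 1000):
    L, U = lower_sum(n), upper_sum(n)
    agrees = (U == closed_form(n, 1)) and (L == closed_form(n, -1))
    print(n, float(L), float(U), agrees)
```

Both columns close in on $\frac{1}{3}$ from either side as $n$ grows, and the gap $U_n - L_n = \frac{1}{n}$ shrinks to zero.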
3.3.2 Definite integrals and areas
In our earlier examples, we considered functions which take only positive values. The definition of
the definite integral above does not require the values of f (x) to be positive; nothing is changed if
either or both of the maximum and minimum values (Mi and mi ) are negative on any sub-interval.
In many applications, such as finding the displacement of a vehicle from the velocity, the fact that
some terms in the sum would be negative makes sense: if the velocity was negative for some part
of the time, the vehicle would be moving backwards, and hence it would not end up so far from
the starting point. The negative terms in the sum would cancel out some of the positive ones, so
the final answer would be smaller. However, we do need to think a little more carefully about our
interpretation of the integral as the area under the relevant graph. If some of the Mi or mi are
negative, then the corresponding value Mi ∆x or mi ∆x must be also; but areas cannot be negative.
We can resolve this problem by interpreting the definite integral as representing the ‘signed
area’: i.e. over regions where f (x) > 0, the definite integral gives the area between the curve and
the $x$-axis; where $f(x) < 0$, the definite integral gives minus the area. Thus, if we wish to use the
integral to compute areas when $f(x) \le 0$, $a \le x \le b$, then the (positive) area between $y = f(x)$ and
the $x$-axis is given by $-\int_a^b f(x)\,dx$.
Example 3.3. Find the area between y = x and the x-axis for −2 ≤ x ≤ 5.
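[Figure: graph of $y = x$ against $x$, with $-2$ and $5$ marked on the $x$-axis.]
\[
\text{Area} = -\int_{-2}^{0} x\,dx + \int_0^5 x\,dx = \frac{1}{2}\times 2\times 2 + \frac{1}{2}(5\times 5) = \frac{29}{2} \quad \text{(using areas of triangles).}
\]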
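The distinction between the signed area and the geometric area in Example 3.3 can be seen numerically. This is a minimal Python sketch, assuming a midpoint Riemann sum as the approximation (the helper name `riemann_sum` is an illustrative choice); integrating $x$ gives the signed area $\frac{21}{2}$, while integrating $|x|$ gives the area $\frac{29}{2}$ found above.

```python
def riemann_sum(f, a, b, n=70_000):
    """Midpoint Riemann sum of f over [a, b] using n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Signed area: the part below the x-axis counts negatively.
signed_area = riemann_sum(lambda x: x, -2.0, 5.0)
# Geometric area: integrate |x| so that every contribution is positive.
total_area = riemann_sum(lambda x: abs(x), -2.0, 5.0)

print(f"signed area = {signed_area:.4f}, area = {total_area:.4f}")
```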
However, this difficulty only really arises if we become too wedded to the idea that a definite
integral gives ‘the area under the curve’. Whilst these area-finding problems are common in school
textbooks, in most real-life applications, calculating the definite integral of a function does not
yield an area. For example, if the integrand is a velocity (in m s$^{-1}$), the integral with respect to
time (measured in seconds) gives a displacement (in m). If the integrand $f(x)$ represents the force
(in N) acting in the $x$-direction on a body at position $x$ (measured in metres from an origin), then
$\int_a^b f(x)\,dx$ gives the work done (in J) in moving the body from $x = a$ to $x = b$. We can represent
quantities like v(t) or f (x) using graphs, and then the area between the curve and the horizontal
axis gives a helpful visual representation of the integral. But they are not areas, as is clear from
the units.
You will find it much easier to understand some of the concepts in later courses if you keep in
mind how integrals are really defined: as a limit of a sum, the terms of which consist of the value
of a function over some small region, multiplied by the size of that region. In the examples we have
seen so far the region is an interval, and so the ‘size’ of the region is simply its length. However,
this quite general way of thinking about integration allows us to extend the idea of integration on
part of the real line, as we have been doing here, to integrating along an arbitrary curve, or over
an area or volume.
For example, consider a cube with sides of length $a$ m, composed of a material that has density $\rho$
kg m$^{-3}$. If the density, $\rho$, is constant, then we can calculate the mass of the cube, $M$, by multiplying
the density by the cube’s volume: $M = \rho a^3$ kg. But what if the density of the cube varies with
position (e.g. the material is heavier near the bottom than near the top)? In that case, we could find
the mass of the cube by subdividing it into smaller cubes, and finding the minimum and maximum
values of the density in each small cube. Then, upper and lower bounds for the mass of each small
cube could be calculated as the product of the maximum (or minimum) density and the volume of
the small cube. On summing up the masses of each small cube, we would obtain upper and lower
bounds for the mass of the large cube, M . The smaller we make the little cubes, the better our
estimate of M would become. Notice that our process here is basically the same as that we used
to define the definite integral of a function f (x) on the interval [a, b]. The difference here is that
our function, ρ (the density) could potentially depend on x, y and z, and instead of sub-intervals
of [a, b], we sum up the contribution from sub-volumes (small cubes) of our region 0 ≤ x ≤ a,
0 ≤ y ≤ a, 0 ≤ z ≤ a. In the limit that the volume of the small cubes tends to zero, we would
obtain the mass, M as a volume integral, written as
\[
M = \int_0^a\!\int_0^a\!\int_0^a \rho(x, y, z)\,dx\,dy\,dz.
\]
These kinds of extensions are outside the scope of this course. However, they help to illustrate
why we have defined integration using sums, which may seem rather convoluted at first.
We did this by finding a function $G(t)$ that satisfied $G'(t) = v(t)$; for this example, $G(t) = 14t + \frac{1}{2}t^2$
is such a function. Then,
\[
x = \int_0^5 v(t)\,dt = [G(t)]_0^5 = G(5) - G(0).
\]
This method is how you have been used to calculating integrals at school. Similarly, if you had
been asked to find the area in the last example without being told which method to use, you would
probably have gone straight ahead and calculated it using
\[
A = \int_0^1 x^2\,dx = \left[\frac{x^3}{3}\right]_0^1 = \frac{1}{3}(1^3 - 0^3) = \frac{1}{3}.
\]
We will now show that this method of finding a function whose derivative is the integrand gives
an answer consistent with the summation method we have used to define integration. But first, we
need to introduce the following definition.
If G(x) is an antiderivative for f (x), we note from the rules of differentiation that G(x) + c
(where c is a constant - i.e. any real number) is also an antiderivative for f (x), because
\[
\frac{d}{dx}(G + c) = \frac{dG}{dx} = f(x).
\]
In fact, all antiderivatives for a function f (x) on an interval can differ from each other only by a
constant. We can demonstrate this fact as follows. Suppose that we have two antiderivatives for
f (x) on our interval [a, b], G1 (x) and G2 (x). Then
\[
\frac{d}{dx}(G_1 - G_2) = \frac{dG_1}{dx} - \frac{dG_2}{dx} = f(x) - f(x) = 0.
\]
Since the derivative of $G_1 - G_2$ is zero, we must have $G_1(x) - G_2(x) = c$, where $c$ is a constant,
and thus $G_1(x) = G_2(x) + c$.
Now, consider a function f (t) which is continuous on the interval t ∈ [a, b], and let x be some
value in this interval. For any x, we can calculate the definite integral of f from a to x using
Definition 3.1. The answer will depend on the particular value of x chosen. Hence we can define a
function, G(x), given by
\[
G(x) = \int_a^x f(t)\,dt.
\]
As we discussed earlier, the definite integral $G(x)$ can be represented by the (signed) area between the
graph y = f (t) and the horizontal axis, for t between a and x, as shaded in pink on the diagram
below.
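[Figure: graph of $y = f(t)$ against $t$, with $a$, $x$ and $b$ marked on the $t$-axis and the region between $t = a$ and $t = x$ shaded.]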
Then, the change in G as t increases from x to x + ∆x is represented by the area of the region
shown in pink in the diagram below. Since the function f is continuous, it is approximately constant
over the small interval [x, x + ∆x]. Thus the area of the shaded region can be approximated as a
rectangle of height f (x) and width ∆x.
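[Figure: graph of $y = f(t)$ against $t$, with $a$ and $b$ marked on the $t$-axis and a shaded strip of width $\Delta x$.]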
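Thus $G(x + \Delta x) - G(x) \approx f(x)\Delta x$, and on dividing by $\Delta x$ and taking the limit $\Delta x \to 0$ we have
\[
\lim_{\Delta x \to 0}\frac{G(x + \Delta x) - G(x)}{\Delta x} = f(x).
\]
But, if we look back at Definition 2.1, we can see that the left hand side of this equation is precisely
the definition of the derivative of $G(x)$. Hence
\[
\frac{dG}{dx} = f(x).
\]
Since $\frac{dG}{dx} = f(x)$, the function $G(x)$ satisfies the definition of an antiderivative of $f(x)$. The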
following theorem summarises what we have just shown.
Theorem 3.1 (The First Fundamental Theorem of Calculus). Let $f(t)$ be a continuous function
on $[a, b]$, and let $a < x < b$. Then
\[
G(x) = \int_a^x f(t)\,dt
\]
is an antiderivative for $f$ on $(a, b)$ - i.e.
\[
\frac{dG}{dx} = f(x).
\]
We now need to show that antiderivatives provide a handy way to calculate definite integrals.
Let
\[
G_1(x) = \int_a^x f(t)\,dt,
\]
where $a < x < b$. Then, as we have just demonstrated, $\frac{dG_1}{dx} = f(x)$. We note that from the way we
have defined $G_1$,
\[
\int_a^b f(t)\,dt = G_1(b).
\]
Now, as we saw earlier,
\[
\int_a^a f(t)\,dt = G_1(a) = 0,
\]
so we can equally well write $\int_a^b f(t)\,dt = G_1(b) - G_1(a)$.
But suppose we have another antiderivative for f (x), G2 (x) - can we also use this to calculate
the required definite integral?
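As we stated earlier, antiderivatives for the same function can differ only by a constant. Hence
\[
G_1(x) = G_2(x) + c,
\]
where $c$ is a constant, and so $G_2(b) - G_2(a) = (G_1(b) - c) - (G_1(a) - c) = G_1(b) - G_1(a) = \int_a^b f(t)\,dt$.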
We have thus shown that we can calculate definite integrals using any antiderivative for our function
f , which is summarised by the following theorem.
Theorem 3.2 (The Second Fundamental Theorem of Calculus). Let $f$ be a continuous function
on $[a, b]$ and $G$ any antiderivative of $f$ on $[a, b]$. Then
\[
\int_a^b f(t)\,dt = G(b) - G(a) = [G(x)]_a^b.
\]
This extremely powerful result gives us a much quicker and easier way of calculating integrals
when an algebraic antiderivative exists. It is clear from our manipulations with sums earlier in
this section that this is the method we would ideally use whenever possible. But it is not always
possible to find such an antiderivative: for example, the integrands $f(t) = \sqrt{1 + t^3 + t^4}$ and
$f(t) = \frac{1}{\sin t + t}$ do not have one. In that case, we would use numerical methods, which are based on the
definition of the integral. These will be covered in more detail in later courses.
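The agreement between the two routes to a definite integral (a limit of sums, and $G(b) - G(a)$) can be checked numerically. The Python sketch below is an illustration under stated assumptions: it uses a midpoint Riemann sum, takes $v(t) = 14 + t$ (the derivative of the antiderivative $G(t) = 14t + \frac{1}{2}t^2$ quoted earlier), and also repeats the $x^2$ example; the helper names are hypothetical.

```python
def riemann_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b] using n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def v(t):
    """Velocity whose antiderivative is G(t) = 14 t + t^2 / 2."""
    return 14.0 + t

def G(t):
    return 14.0 * t + 0.5 * t * t

# Displacement over 0 <= t <= 5, computed two ways.
displacement_sum = riemann_sum(v, 0.0, 5.0)
displacement_ftc = G(5.0) - G(0.0)

# Area under y = x^2 on [0, 1], computed two ways (antiderivative x^3 / 3).
area_sum = riemann_sum(lambda x: x * x, 0.0, 1.0)
area_ftc = 1.0 ** 3 / 3 - 0.0 ** 3 / 3

print(displacement_sum, displacement_ftc)
print(area_sum, area_ftc)
```

The same `riemann_sum` helper is also the simplest kind of numerical method one could fall back on when no algebraic antiderivative is available.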
We can use the Fundamental Theorems of Calculus, possibly in combination with the rules of
differentiation we learnt in the previous chapter to calculate the derivatives of some integrals.
Example 3.4. Find $\frac{dF}{dx}$ if $F(x) = \int_x^0 \cos t^2\,dt$.
If the limits on the integral defining $F(x)$ were the opposite way around, this would be a trivial
application of the First Fundamental Theorem of Calculus (FFTC), since if $G(x) = \int_a^x f(t)\,dt$,
then $\frac{dG}{dx} = f(x)$.
Here, we need to swap the order of the limits, and recalling the properties of definite integrals,
we have
\[
F(x) = \int_x^0 \cos t^2\,dt = -\int_0^x \cos t^2\,dt = \int_0^x (-\cos t^2)\,dt.
\]
Then, the integrand is $f(t) = -\cos t^2$, and using the FFTC, we have
\[
\frac{dF}{dx} = f(x) = -\cos x^2.
\]
Example 3.5. Find $\frac{d}{dx}\int_{1/2}^{x^4} \sec t\,dt$.
In this example, we again need to find the derivative of an integral, so we expect to use the
FFTC. However, the upper limit on the integral causes us a problem: if it was $x$, things would be
straightforward, but instead we have $x^4$ here. Let us define
\[
G(x) = \int_{1/2}^{x^4} \sec t\,dt.
\]
Hence we are trying to find $\frac{dG}{dx}$.
We need to get the integral into a form where we can use the FFTC. We can do this by making
a substitution, $u = x^4$, so
\[
G(u) = \int_{1/2}^{u} \sec t\,dt.
\]
Then, we know that
\[
\frac{dG}{du} = \sec u,
\]
by the FFTC.
In order to obtain the derivative with respect to $x$, we now use the Chain Rule
\[
\frac{dG}{dx} = \frac{dG}{du}\frac{du}{dx} = \sec u\,\frac{d}{dx}(x^4) = 4x^3 \sec x^4,
\]
where we substituted $u = x^4$ in the final step.
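The result of Example 3.5 can be sanity-checked numerically. The Python sketch below is only an illustration: it approximates $G(x)$ with a midpoint Riemann sum, differentiates it with a central difference, and compares the result with $4x^3 \sec x^4$ at the (arbitrarily chosen) point $x = 1$, where $x^4 < \frac{\pi}{2}$ so that $\sec t$ stays finite on the range of integration. All helper names are hypothetical.

```python
import math

def sec(t):
    return 1.0 / math.cos(t)

def integral(f, a, b, n=20_000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def G(x):
    """G(x) = integral of sec t from t = 1/2 to t = x^4."""
    return integral(sec, 0.5, x ** 4)

def central_difference(F, x, h=1e-4):
    """Numerical approximation to dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

x0 = 1.0
numerical_derivative = central_difference(G, x0)
chain_rule_formula = 4 * x0 ** 3 * sec(x0 ** 4)

print(numerical_derivative, chain_rule_formula)
```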
This denotes the collection of all antiderivatives of f (x) (whether known algebraically or not).
Thus if $G'(x) = f(x)$, that is, $G$ is an antiderivative, then
\[
\int f(x)\,dx = G(x) + c,
\]
where $c$ is an arbitrary constant.
Knowing the indefinite integrals of these basic functions is essential to being able to calculate the
more complicated integrals you will encounter in future courses, and your professional life after
university. You should ensure you learn them.
Now we know the antiderivatives of some basic functions, we can compute integrals of them
quickly and easily. For example, if we needed to calculate $\int_0^1 x^{23}\,dx$, we can use the fact that we
know $\int x^{23}\,dx = \frac{x^{24}}{24} + C$. Then, putting in the limits on the integral we obtain
\[
\int_0^1 x^{23}\,dx = \left[\frac{x^{24}}{24}\right]_0^1 = \frac{1}{24} - 0 = \frac{1}{24}.
\]
Notice that in the above we did not bother to include the constant of integration, since when we
evaluate a definite integral it cancels between the two terms.
Thus, if we can ‘spot’ the function $u(x)$, we can turn the complicated looking integral $\int f(u(x))\,\frac{du}{dx}\,dx$
into the much simpler-looking one $\int f(u)\,du$. Essentially, we are using the Chain Rule in reverse
to calculate the antiderivative we are looking for.
The notation in equation (3.1) makes it look as if we can treat dx and du as entities separate
from the rest of the notation for an integral (although we have not given them any independent
meaning) and ‘cancel’ the factor of dx, so that
\[
\frac{du}{dx}\,dx = u'(x)\,dx = du.
\]
This expression provides a convenient way of ‘converting’ an integral with respect to x into one
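with respect to $u$ in calculations, as we will illustrate with the example below.
Now let us consider again the problem of finding $\int_0^1 (2x + 7)^{23}\,dx$. A first step would be to find
the indefinite integral $\int (2x + 7)^{23}\,dx$. The integrand can be written as a function, $f(u) = u^{23}$,
of $u = 2x + 7$. Since $\frac{du}{dx} = 2$, we have $dx = \frac{1}{2}\,du$, and so $\int (2x + 7)^{23}\,dx = \frac{1}{2}\int u^{23}\,du = \frac{1}{48}u^{24} + c$.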
Now, we can substitute u = 2x + 7 back into our answer, to get the integral in terms of x if we
want to do so:
\[
\int (2x + 7)^{23}\,dx = \frac{1}{48}(2x + 7)^{24} + c.
\]
In this particular example, we wanted to calculate a definite integral, so we need to include the
limits, which are x = 0 and x = 1. There are two ways we can handle this.
2. Alternatively, we can find the indefinite integral, and substitute for $u$ to get the answer in
terms of $x$ (as we did above). Then we evaluate the result at the relevant values of $x$ - i.e.
\[
\int (2x+7)^{23}\,dx = \frac{1}{48}(2x+7)^{24} + c
\;\Rightarrow\;
\int_0^1 (2x+7)^{23}\,dx = \left[\frac{1}{48}(2x+7)^{24}\right]_0^1 = \frac{1}{48}\left(9^{24} - 7^{24}\right).
\]
Note: if the integrand can be seen to be a function of $u$ multiplied by something that is not $\frac{du}{dx}$,
but only differs from it by a constant factor (as in the example above), integration by substitution
can still be used.
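The value $\frac{1}{48}(9^{24} - 7^{24})$ is too large to check by hand, but it can be compared against a direct Riemann sum. The Python sketch below is a hedged illustration: it evaluates the antiderivative exactly with rational arithmetic and compares it with a midpoint sum in floating point, reporting the relative error. The helper names are illustrative.

```python
from fractions import Fraction

def antiderivative(x):
    """(2x + 7)^24 / 48, evaluated exactly for integer x."""
    return Fraction((2 * x + 7) ** 24, 48)

def riemann_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Exact value from the antiderivative: (9^24 - 7^24) / 48.
exact = antiderivative(1) - antiderivative(0)

# Direct numerical approximation of the original integral in x.
numeric = riemann_sum(lambda x: (2 * x + 7) ** 23, 0.0, 1.0)

relative_error = abs(numeric - float(exact)) / float(exact)
print(float(exact), numeric, relative_error)
```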
Example 3.6. Find $\int \sin^3 x \cos x\,dx$.
Here, we note that the integrand is a function of $\sin x$ (specifically $(\sin x)^3$) multiplied by $\cos x$
(which is the derivative of $\sin x$). Hence, it is in the classic form for using integration by substitution,
with $u = \sin x$. Noting that $\frac{du}{dx} = \cos x$ so $du = \cos x\,dx$, we have
\[
\int \sin^3 x \cos x\,dx = \int u^3\,du = \frac{u^4}{4} + c = \frac{\sin^4 x}{4} + c.
\]
Example 3.7. Find $\int \frac{(\ln x)^6}{x}\,dx$.
This time, the integrand can be recognised as a function of $\ln x$, multiplied by $\frac{1}{x}$, which is the
derivative of $\ln x$. Hence, it is again in the classic form for using integration by substitution. We
set $u = \ln x$, which implies $du = \frac{1}{x}\,dx$, and hence
\[
\int \frac{(\ln x)^6}{x}\,dx = \int u^6\,du = \frac{u^7}{7} + c = \frac{(\ln x)^7}{7} + c.
\]
Even if the integrand is not in the classic form for using integration by substitution, it is sometimes
still helpful to make a substitution if the resulting expression is easier to manipulate, as the following
example illustrates.
Example 3.8. Find $\int_0^3 x^2 (1 + x)^{\frac{1}{2}}\,dx$.
Here, we can see that the integrand involves a function of $1 + x$, but the factor multiplying it
is $x^2$, which is not the derivative of $1 + x$. However, setting $u = 1 + x$ would make the square root
term appear simpler, and so this substitution seems like it might be worth trying. Setting $u = 1 + x$
means $x = u - 1$ and $du = dx$. We also note that the limits on the integral, $x = 0$ and $x = 3$,
correspond to the values $u = 1$ and $u = 4$, respectively. Hence,
\[
\int_0^3 x^2 (1 + x)^{\frac{1}{2}}\,dx = \int_1^4 (u - 1)^2 u^{\frac{1}{2}}\,du = \int_1^4 (u^2 - 2u + 1)u^{\frac{1}{2}}\,du = \int_1^4 \left(u^{\frac{5}{2}} - 2u^{\frac{3}{2}} + u^{\frac{1}{2}}\right)du.
\]
The final expression on the right-hand side above can be integrated term by term to give
\[
\int_0^3 x^2 (1 + x)^{\frac{1}{2}}\,dx = \left[\frac{2}{7}u^{\frac{7}{2}} - \frac{4}{5}u^{\frac{5}{2}} + \frac{2}{3}u^{\frac{3}{2}}\right]_1^4 = \frac{1696}{105}.
\]
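A quick numerical check of Example 3.8 is sketched below in Python, assuming a midpoint Riemann sum as the approximation. It evaluates the integral both in its original form (in $x$, over $[0, 3]$) and after the substitution (in $u$, over $[1, 4]$), which illustrates that changing the variable and the limits together leaves the value unchanged. The helper names are illustrative.

```python
import math

def riemann_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

exact = 1696 / 105

# Original integral: x^2 * sqrt(1 + x) for x from 0 to 3.
in_x = riemann_sum(lambda x: x ** 2 * math.sqrt(1 + x), 0.0, 3.0)
# After substituting u = 1 + x: (u - 1)^2 * sqrt(u) for u from 1 to 4.
in_u = riemann_sum(lambda u: (u - 1) ** 2 * math.sqrt(u), 1.0, 4.0)

print(exact, in_x, in_u)
```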
two parts, at least one of which we are able to integrate using the rules we have learned earlier.
Let u(x) and v(x) be two functions. Then, the product rule tells us that
\[
\frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx}.
\]
Integrating both sides with respect to $x$ gives
\[
uv = \int u\frac{dv}{dx}\,dx + \int v\frac{du}{dx}\,dx.
\]
Rearrangement of the above equation gives us the standard formula for integration by parts:
\[
\int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx.
\]
Note that when using this formula in calculations, you only add the constant of integration
when the very last indefinite integral disappears.
Integration by parts is very useful for integrals where the integrand can be written as the
product of two functions, in particular for integrals of the form
\[
\int x^n \cdot F(x)\,dx,
\]
where $F(x)$ is a trigonometric function, an exponential function, etc. In this case take $u = x^n$, $\frac{dv}{dx} = F(x)$.
However, these are not the only kind of integrals for which it can be helpful.
We will now look at some examples of how this technique can be used in practice.
Example 3.9. Evaluate the integral $I_1 = \int x e^x\,dx$.
Our first task is to get our integral into the same form as the LHS of the formula. We can see
that the integrand can be broken down into a product of $x$ and $e^x$, but we need to choose which
term to identify with $u$, and which with $\frac{dv}{dx}$. There is no rule which will always tell us the best way
to do this. We could use trial and error, but often a little thought can save us unnecessary work.
If there was one term which we could not integrate that would be the obvious candidate for u, but
in this case it is equally easy to integrate x and ex . However, we want the integral on the RHS of
the formula to be one we are able to recognise. Hence we choose
\[
u(x) = x \;\Rightarrow\; \frac{du}{dx} = 1, \qquad \frac{dv}{dx} = e^x \;\Rightarrow\; v = e^x.
\]
Substituting the above into the formula then gives
\[
I_1 = uv - \int v\frac{du}{dx}\,dx = xe^x - \int e^x(1)\,dx = xe^x - e^x + c = (x - 1)e^x + c,
\]
where we have included the constant of integration, c, as the final step.
• N.B. It is always good practice to check your answer by differentiating.
• Exercise: Why would it have been a bad idea to choose $u = e^x$, $\frac{dv}{dx} = x$?
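The advice to check an answer by differentiating can also be followed numerically. The Python sketch below is an illustration, assuming a central-difference approximation to the derivative; it confirms that the derivative of $(x - 1)e^x$ agrees with the integrand $xe^x$ at a few arbitrarily chosen sample points. The helper names are hypothetical.

```python
import math

def integrand(x):
    return x * math.exp(x)

def candidate_antiderivative(x):
    """Result of Example 3.9 with the constant of integration set to zero."""
    return (x - 1) * math.exp(x)

def central_difference(F, x, h=1e-5):
    """Numerical approximation to dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

sample_points = (-1.0, 0.0, 0.5, 2.0)
errors = [
    abs(central_difference(candidate_antiderivative, x) - integrand(x))
    for x in sample_points
]
print(max(errors))  # should be very small if the antiderivative is correct
```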
For some integrals, a useful trick is to choose one of the terms in the product to be 1. This
allows us to evaluate integrals we might not be able to do using other methods.
Example 3.10. Evaluate the integral $I_2 = \int \ln x\,dx$.
The first step is to re-write the integral as $I_2 = \int (1)\ln x\,dx$. Since we obviously do not know
the integral of $\ln x$, we set $u = \ln x$, in which case we must choose $\frac{dv}{dx} = 1$. We thus have
\[
\frac{du}{dx} = \frac{1}{x}, \qquad v = x.
\]
Example 3.11. Evaluate the integral $I_3 = \int x^2 e^x\,dx$.
This is similar to the first example, and since differentiating $x^2$ leads to a simpler function, we
choose
\[
u(x) = x^2 \;\Rightarrow\; \frac{du}{dx} = 2x, \qquad \frac{dv}{dx} = e^x \;\Rightarrow\; v = e^x.
\]
Substituting the above into the formula then gives
\[
I_3 = uv - \int v\frac{du}{dx}\,dx = x^2 e^x - \int e^x(2x)\,dx = x^2 e^x - 2\int xe^x\,dx.
\]
Sometimes by repeating the integration by parts procedure we end up with the original expression
again, and after a little algebra, this enables us to evaluate the integral.
Example 3.12. For the integral $\int \cosh x \sin x\,dx$, let
\[
u = \cosh x, \quad \frac{dv}{dx} = \sin x, \qquad \frac{du}{dx} = \sinh x, \quad v = -\cos x.
\]
Then
\[
\begin{aligned}
I = \int \cosh x \sin x\,dx &= uv - \int v\frac{du}{dx}\,dx \\
&= \cosh x(-\cos x) - \int (-\cos x)\sinh x\,dx \\
&= -\cosh x \cos x + \int \cos x \sinh x\,dx.
\end{aligned}
\]
Now repeat the process, pushing on in the same ‘direction’: we differentiated cosh to sinh, so must
differentiate this sinh; conversely, we integrated sin to get cos, so we must integrate cos.
\[
u = \sinh x, \quad \frac{dv}{dx} = \cos x, \qquad \frac{du}{dx} = \cosh x, \quad v = \sin x,
\]
\[
I = -\cos x \cosh x + \sinh x \sin x - \int \cosh x \sin x\,dx.
\]
Remember the rule: only add the integration constant when the last integral sign disappears.
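Since the last integral in Example 3.12 is $I$ itself, the final line reads $I = -\cos x\cosh x + \sinh x\sin x - I$, and solving for $I$ suggests $I = \frac{1}{2}(\sinh x\sin x - \cosh x\cos x) + c$. The Python sketch below is a hedged check of that candidate (the rearrangement is carried out here rather than quoted from the notes): it differentiates the candidate with a central difference and compares the result with the integrand $\cosh x \sin x$ at a few arbitrary points.

```python
import math

def integrand(x):
    return math.cosh(x) * math.sin(x)

def candidate_antiderivative(x):
    """(sinh x sin x - cosh x cos x) / 2, obtained by solving for I."""
    return 0.5 * (math.sinh(x) * math.sin(x) - math.cosh(x) * math.cos(x))

def central_difference(F, x, h=1e-5):
    """Numerical approximation to dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

sample_points = (-1.0, 0.0, 0.7, 2.0)
errors = [
    abs(central_difference(candidate_antiderivative, x) - integrand(x))
    for x in sample_points
]
print(max(errors))  # small values indicate the candidate differentiates to the integrand
```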
Lecture 19
Earlier in the course, we noted that Newton’s Second Law (the rate of change of momentum of
a body is equal to the force acting on it), when applied to a body of constant mass, m, leads
to the well-known equation F = ma. Whilst this equation can be used to solve a wide variety of
problems, there are important examples where the mass of the body in motion is not constant. One
of these is the motion of a rocket, which emits hot gases (from burning fuel) to produce thrust. The
fuel represents a significant proportion of the initial mass of the rocket (e.g. for the space shuttle,
around 80%), almost all of which is used in reaching orbit.
We begin by deriving an equation of motion for the rocket. We let the rocket’s mass be m(t)
and its velocity v(t). For simplicity, we assume it will move in a straight line, which we take as the
positive x direction. The force acting on the rocket is F (t). The rocket emits exhaust gases at a
rate |ṁ|, (where, because this reduces the rocket’s mass, we expect ṁ < 0), with constant velocity
U relative to the rocket. Now let us consider what happens to the rocket in the small interval of
time between t and t + δt. There will be a small change in mass of the rocket, δm (where we expect
δm < 0), and similarly, a small change in the rocket’s velocity, δv. A small amount of exhaust gas
(of mass −δm) will also be emitted. Then, the change in momentum of the system (rocket plus
gas) is
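\[
\underbrace{[(m + \delta m)(v + \delta v) - mv]}_{\text{change in rocket momentum}} + \underbrace{[-\delta m(v - U)]}_{\text{momentum of gas}} = mv + m\delta v + v\delta m + \delta m\delta v - mv - v\delta m + U\delta m = F\delta t.
\]
After a little tidying up, we obtain
\[
m\delta v + \delta m\delta v + U\delta m = F\delta t.
\]
On dividing through by $\delta t$ and taking the limit $\delta t \to 0$ we find
\[
m\frac{dv}{dt} = F - U\frac{dm}{dt}. \tag{3.2}
\]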
Note that the term involving a product of the two small quantities δm and δv vanishes as δt → 0.
Now consider the particular case of a rocket firework that lifts off vertically upwards, so the
force acting upon it is gravity (for now, we will neglect other forces such as air resistance that may
be present). Thus F = m(t)g (directed vertically downwards), where g is the acceleration due to
gravity (approximately 9.81 m s$^{-2}$). The initial mass of the rocket is $m_0$ kg, but it burns fuel at a
constant rate of $\alpha$ kg s$^{-1}$, so its mass after $t$ seconds is $m(t) = m_0 - \alpha t$ kg. The exhaust gases are
emitted at a constant speed, $U$ m s$^{-1}$, relative to the rocket. Substituting this information into our
rocket equation above, the upward velocity, $v$ (in m s$^{-1}$) of the rocket obeys
\[
(m_0 - \alpha t)\frac{dv}{dt} = -(m_0 - \alpha t)g - U(-\alpha),
\]
where we have a minus sign in front of the force term because it is directed downwards (in the
opposite direction to v). Dividing through by m0 − αt yields
\[
\frac{dv}{dt} = -g + \frac{\alpha U}{m_0 - \alpha t}.
\]
Note that this equation can only be valid for times $0 < t < \frac{m_0}{\alpha}$, at most. The reason for this is
that when all the fuel is burned, the rocket will no longer experience a thrust, and the forces on
it will change. This will happen before $m(t)$ reaches zero, which occurs at $t = \frac{m_0}{\alpha}$.
1. Assuming that at t = 0 the rocket is stationary, what is the vertical velocity, v, of the rocket
at time t?
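To answer this question, we need to integrate the expression for $\frac{dv}{dt}$. The first term on the RHS
poses no challenge, since $g$ is a constant. The second term is more tricky. However, if we
look at it for a moment or so we see it is a function of $m_0 - \alpha t$ multiplied by a constant multiple of the derivative
of $m_0 - \alpha t$. Hence, it is the type of integral that integration by substitution can be used to
calculate. We make the substitution $w = m_0 - \alpha t$ (so that $dw = -\alpha\,dt$), and proceed as follows:
\[
\begin{aligned}
v &= -gt + \int \frac{\alpha U}{m_0 - \alpha t}\,dt = -gt + \int \frac{\alpha U}{w}\left(\frac{-1}{\alpha}\right)dw = -gt - U\int \frac{1}{w}\,dw \\
&= -gt - U\ln w + c = -gt - U\ln(m_0 - \alpha t) + c.
\end{aligned}
\]
Since the rocket is stationary at $t = 0$, we must have $c = U\ln m_0$, and hence $v = -gt - U\ln\left(1 - \frac{\alpha t}{m_0}\right)$.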
2. Now find the height, $x$, of the rocket at time, $t$. (Note: $\frac{dx}{dt} = v$.)
We need to compute
\[
x(t) = \int \left[-gt - U\ln\left(1 - \frac{\alpha t}{m_0}\right)\right]dt = -\frac{gt^2}{2} - U\int \ln\left(1 - \frac{\alpha t}{m_0}\right)dt.
\]
Once again, integrating the first term is straightforward. To calculate the integral of the
second term, we start by making the substitution, $z = 1 - \frac{\alpha t}{m_0}$, in order to simplify the
argument of the logarithm. We note that the substitution gives $\frac{dt}{dz} = -\frac{m_0}{\alpha}$. Hence
\[
\int \ln\left(1 - \frac{\alpha t}{m_0}\right)dt = \int \ln z\left(-\frac{m_0}{\alpha}\right)dz = -\frac{m_0}{\alpha}\int \ln z\,dz.
\]
We can now use integration by parts (as in the example from lectures) to calculate this
integral. However, we need to be careful with our notation. Usually, we call the two functions
in the integration by parts formula u and v but in this problem we have introduced U and v as
velocities, so there is scope for confusion. Instead, we will call the functions in the integration
by parts formula $f$ and $g$. We set $f(z) = \ln z$ and $g'(z) = 1$. Hence $f'(z) = \frac{1}{z}$ and $g(z) = z$,
and we have
\[
\begin{aligned}
\int \ln z\,dz &= f(z)g(z) - \int g(z)f'(z)\,dz = z\ln z - \int z\,\frac{1}{z}\,dz = z\ln z - z + c \\
&= \left(1 - \frac{\alpha t}{m_0}\right)\ln\left(1 - \frac{\alpha t}{m_0}\right) - \left(1 - \frac{\alpha t}{m_0}\right) + c,
\end{aligned}
\]
where $c$ is a constant.
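Putting all of the above together yields:
\[
\begin{aligned}
x(t) &= -\frac{gt^2}{2} - U\int \ln\left(1 - \frac{\alpha t}{m_0}\right)dt = -\frac{gt^2}{2} + \frac{Um_0}{\alpha}\int \ln z\,dz \\
&= -\frac{gt^2}{2} + \frac{Um_0}{\alpha}\left(1 - \frac{\alpha t}{m_0}\right)\left[\ln\left(1 - \frac{\alpha t}{m_0}\right) - 1\right] + K,
\end{aligned}
\]
where $K$ is a constant. If the firework starts at $x = 0$ when $t = 0$, we must have $K = \frac{Um_0}{\alpha}$.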
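Now, recall that we set $m(t) = m_0 - \alpha t$ and hence the fraction of the original rocket mass
remaining at time $t$ is $\frac{m(t)}{m_0} = \frac{m_0 - \alpha t}{m_0} = 1 - \frac{\alpha t}{m_0}$. Hence, rewriting the equation above gives
\[
\begin{aligned}
x(t) &= -\frac{gt^2}{2} + \frac{Um(t)}{\alpha}\left[\ln\left(\frac{m(t)}{m_0}\right) - 1\right] + \frac{Um_0}{\alpha} \\
&= -\frac{gt^2}{2} + \frac{Um_0}{\alpha}\left\{\frac{m(t)}{m_0}\left[\ln\left(\frac{m(t)}{m_0}\right) - 1\right] + 1\right\}.
\end{aligned}
\]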
The plot below illustrates the rocket’s trajectory for g = 10, U = 20, α = m0 = 1.
[Figure: plot of $x(t)$ against $t$ for $0 \le t \le 1$.]
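The closed-form height can be cross-checked against a direct numerical integration of the velocity. The Python sketch below is an illustration under the parameter values quoted for the plot ($g = 10$, $U = 20$, $\alpha = m_0 = 1$); it uses the velocity $v = -gt - U\ln(1 - \alpha t/m_0)$ that appears as the integrand for $x(t)$, a midpoint Riemann sum, and an arbitrarily chosen time $t = 0.5$. The function names are hypothetical.

```python
import math

# Parameter values quoted for the plot of the trajectory.
g, U, alpha, m0 = 10.0, 20.0, 1.0, 1.0

def velocity(t):
    """v(t) = -g t - U ln(1 - alpha t / m0)."""
    return -g * t - U * math.log(1 - alpha * t / m0)

def height(t):
    """Closed-form x(t), written in terms of the mass fraction m(t) / m0."""
    fraction = 1 - alpha * t / m0
    return -g * t ** 2 / 2 + (U * m0 / alpha) * (fraction * (math.log(fraction) - 1) + 1)

def riemann_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt

t_check = 0.5
height_numeric = riemann_sum(velocity, 0.0, t_check)
height_closed = height(t_check)

print(height_numeric, height_closed)
```

The two printed values should agree closely, which supports both the substitution and the integration by parts steps used to obtain $x(t)$.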
If the firework consists almost entirely of fuel (so the mass of the containing shell is negligible),
what height does the rocket attain when all the fuel is used up?
As this example very nicely demonstrates, in real-life problems we frequently need to use a
combination of the techniques we have learned in order to calculate the quantities we are interested
in. Unlike the problems you see at school, they almost never involve the application of just one
part of your knowledge!
Consider the problem of finding the area of an ellipse which is given by the equation
\[
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.
\]
On drawing a diagram, we can see that the area of interest is four times that in the first quadrant
(where x and y are both positive). Recalling that the area beneath a curve y = f (x) (where
f (x) > 0) is given by the integral, the area we wish to find is
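\[
A = 4\int_0^a y(x)\,dx.
\]
Solving the equation of the ellipse for $y$ in the first quadrant gives $y(x) = \frac{b}{a}\sqrt{a^2 - x^2}$, so the integrand involves $\sqrt{a^2 - x^2}$.
If we think about the properties of trigonometric functions, then we can recall a property which
will come to our rescue here: the fact that, for any $\theta$,
\[
\cos^2\theta = 1 - \sin^2\theta.
\]
On multiplying by $a^2$, we have
\[
a^2\cos^2\theta = a^2 - a^2\sin^2\theta.
\]
Comparison with our integrand suggests we try the substitution $x = a\sin\theta$. Then
\[
a^2 - x^2 = a^2 - a^2\sin^2\theta = a^2\cos^2\theta.
\]
Therefore
\[
\sqrt{a^2 - x^2} = a\cos\theta,
\]
where we have taken the positive square root, since $y(x)$ is positive over the given range. We note
that the limits $x = 0$ and $x = a$ correspond to $\theta = 0$ and $\theta = \frac{\pi}{2}$ respectively, and that $\cos\theta \ge 0$ for
$0 \le \theta \le \frac{\pi}{2}$ as required. Since $\frac{dx}{d\theta} = a\cos\theta$, our substitution gives
\[
A = 4b\int_0^{\frac{\pi}{2}} \cos\theta\cdot(a\cos\theta)\,d\theta = 4ab\int_0^{\frac{\pi}{2}} \cos^2\theta\,d\theta.
\]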
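Now, $\cos^2\theta$ is not something we can integrate straight away, but once again the properties of
trigonometric functions come to the rescue. Recall the double angle identity
\[
\cos 2\theta = 2\cos^2\theta - 1.
\]
Thus
\[
A = 4ab\int_0^{\frac{\pi}{2}} \cos^2\theta\,d\theta = 4ab\int_0^{\frac{\pi}{2}} \frac{1}{2}(\cos 2\theta + 1)\,d\theta = 2ab\int_0^{\frac{\pi}{2}} (\cos 2\theta + 1)\,d\theta.
\]
Finally, we note that $\frac{d}{d\theta}\left(\frac{1}{2}\sin 2\theta\right) = \cos 2\theta$ and hence
\[
A = 2ab\int_0^{\frac{\pi}{2}} (\cos 2\theta + 1)\,d\theta = 2ab\left[\frac{1}{2}\sin 2\theta + \theta\right]_0^{\frac{\pi}{2}} = 2ab\left(\frac{\pi}{2} - 0\right) = \pi ab.
\]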
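The result $A = \pi ab$ can be checked numerically in both the original and the substituted variables. The Python sketch below is an illustration only: the semi-axes $a = 3$, $b = 2$ are hypothetical values, the integrals are approximated with midpoint Riemann sums, and the first-quadrant curve is taken as $y = \frac{b}{a}\sqrt{a^2 - x^2}$.

```python
import math

def riemann_sum(f, lower, upper, n=200_000):
    """Midpoint Riemann sum of f over [lower, upper]."""
    step = (upper - lower) / n
    return sum(f(lower + (i + 0.5) * step) for i in range(n)) * step

# Hypothetical semi-axes, chosen only for illustration.
semi_a, semi_b = 3.0, 2.0

# Original form: 4 * integral of y(x) from 0 to a.
area_in_x = 4 * riemann_sum(
    lambda x: (semi_b / semi_a) * math.sqrt(semi_a ** 2 - x ** 2), 0.0, semi_a
)
# After x = a sin(theta): 4ab * integral of cos^2(theta) from 0 to pi/2.
area_in_theta = 4 * semi_a * semi_b * riemann_sum(
    lambda theta: math.cos(theta) ** 2, 0.0, math.pi / 2
)

print(area_in_x, area_in_theta, math.pi * semi_a * semi_b)
```

The sum in $x$ converges more slowly because the integrand has a vertical tangent at $x = a$; the substituted integrand is smooth, which is one practical benefit of the trigonometric substitution.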
This example is an illustration of the fact that integrands involving $\sqrt{a^2 \pm x^2}$, $\sqrt{x^2 \pm a^2}$ can often
be evaluated by an appropriate trigonometric substitution. The strategy is to make a substitution of
the type $x = a\sin\theta$, $x = a\sec\theta$ or something similar to simplify the expression using trigonometric
identities such as
\[
\sin^2\theta + \cos^2\theta = 1, \qquad \sec^2\theta = 1 + \tan^2\theta.
\]
Often the result can then be expressed as an integral of trigonometric functions of the type already
dealt with above. We have already demonstrated the sin substitution; the next example uses a tan
substitution.
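Example 3.13. Find $I = \int \frac{1}{x^2\sqrt{x^2 + 9}}\,dx$.
This time, we have a factor of the form $x^2 + a^2$ in our integrand; this suggests we should try to
exploit the identity $1 + \tan^2\theta = \sec^2\theta$ to simplify it. If we set $x = 3\tan\theta$ then $\frac{dx}{d\theta} = 3\sec^2\theta$ and
$x^2 + 9 = 9\tan^2\theta + 9 = 9\sec^2\theta$.
[Figure: right-angled triangle illustrating $x = 3\tan\theta$, with the angle $\theta$ and a side of length 3 marked.]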
Note: The hyperbolic functions can also be useful for simplifying integrals of this form, as
they have similar properties to the trigonometric functions. For example, we can use the identity
$\cosh^2 t - \sinh^2 t = 1$ to find $I = \int \frac{1}{(x^2 + 4)^{\frac{3}{2}}}\,dx$ by setting $x = 2\sinh t$. Then we have $(x^2 + 4)^{\frac{3}{2}} = 8\cosh^3 t$,
and $dx = 2\cosh t\,dt$. The rest of the calculation is left as an exercise.
Whilst both trigonometric and hyperbolic substitutions will work, sometimes one method will
produce a solution more quickly than the other. Unfortunately, there is no clear rule for which will
work best in any particular situation.
3.9 Application: population growth
Lecture 20
Suppose we are interested in knowing the population of bacteria being grown in a laboratory
experiment. Let the number of bacteria at time t be given by b(t). At time t = 0, b0 bacteria
are placed in a dish which contains a medium to supply them with necessary nutrients. Early on
in the experiment, when there is only a small number of bacteria present, space and nutrients are
plentiful and the cells reproduce rapidly by division. At this stage, we expect the rate of increase
of the bacteria will be proportional to the population. However, when the bacteria become more
numerous, overcrowding becomes an issue and nutrients begin to become more scarce, so the rate
of population increase slows. Finally, when the bacteria are consuming the nutrients as fast as they
are replenished, the population reaches the carrying capacity of the environment, and population
growth stops. This scenario can be modelled by the logistic equation:
\[
\frac{db}{dt} = rb\left(1 - \frac{b}{K}\right), \tag{3.3}
\]
where r is the maximum reproduction rate of the bacteria, and K is the carrying capacity of the
dish.
We can find the population b(t) using the method of separation of variables. We rearrange
equation (3.3) so that all the terms involving b are on one side, and all those involving t on the
other
\[
\frac{1}{rb\left(1 - \frac{b}{K}\right)}\,db = \frac{K}{rb(K - b)}\,db = dt.
\]
We then integrate this equation
\[
\int \frac{K}{rb(K - b)}\,db = \int 1\,dt.
\]
The integral on the right hand side is easy to compute. Using the fact that $b = b_0$ at $t = 0$ to
specify the limits on the integral, we find that the time $t$ at which the population reaches $b$ bacteria
is given by
\[
t = \frac{K}{r}\int_{b_0}^{b} \frac{1}{x(K - x)}\,dx. \tag{3.4}
\]
You will not need to recall the method of separation of variables for this course, as it will be covered
in detail in Maths IB. For now, you can just accept that we need to calculate the integral in
equation (3.4).
The integral on the RHS of equation (3.4) is not one that we can do by substitution; nor will
integration by parts help. What can we do?
Consider adding two fractions, $\frac{A}{x}$ and $\frac{B}{K-x}$, where A and B are real numbers. Then
\[ \frac{A}{x} + \frac{B}{K - x} = \frac{A(K - x)}{x(K - x)} + \frac{Bx}{x(K - x)} = \frac{A(K - x) + Bx}{x(K - x)}. \tag{3.5} \]
The terms on the left hand side of the equation would be straightforward to integrate, and the
denominator on the far right hand side is the same as that of our integrand. If we can find
numbers A and B such that the numerator A(K − x) + Bx = 1, then we have a way of calculating
the integral in equation (3.4).
To find the numbers A and B such that A(K − x) + Bx = 1, there are two ways we can proceed.
• We can compare coefficients of the same powers of x on both sides of the equation. These
must be equal. In our example, the coefficient of $x^0$ (i.e. the constant term) is AK on the LHS,
and 1 on the RHS. Hence AK = 1, and so A = 1/K. Comparing coefficients of x gives
B − A = 0, so B = A = 1/K.
• Since the relationship A(K −x)+Bx = 1 must hold for any value of x, we are free to substitute
particular values of x which make the equation simpler. In this case, if we choose x = 0, the
second term in our equation vanishes, and we are left with AK = 1. Hence A = 1/K. If
we set x = K, then the first term in the equation will vanish. Then, we have BK = 1, so
B = 1/K.
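Of course, both methods give the same answer. Substituting A = B = 1/K into equation (3.5),
equation (3.4) becomes
\[ t = \frac{K}{r}\int_{b_0}^{b} \frac{1}{x(K - x)}\,dx = \frac{1}{r}\int_{b_0}^{b} \left(\frac{1}{x} + \frac{1}{K - x}\right)dx. \tag{3.6} \]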
The integral of the first term is straightforward, and the second term can be integrated easily
using the substitution u = K − x:
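\[
t = \frac{1}{r}\left[\ln x\right]_{b_0}^{b} + \frac{1}{r}\int_{K-b_0}^{K-b} \frac{-1}{u}\,du
  = \frac{1}{r}\left(\ln b - \ln b_0 - \left[\ln u\right]_{K-b_0}^{K-b}\right)
  = \frac{1}{r}\left(\ln b - \ln b_0 - \ln(K - b) + \ln(K - b_0)\right).
\]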
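As a sanity check on this calculation, the integral in equation (3.4) can be approximated numerically and compared with the logarithmic expression just obtained. The Python sketch below is not part of the original notes: the function names are our own, Simpson's rule is simply one convenient choice of quadrature, and the parameter values b0 = 1, r = 1, K = 10 are chosen for illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def time_to_reach_numeric(b, b0, r, K):
    """Equation (3.4): t = (K/r) * integral from b0 to b of 1/(x(K - x)) dx."""
    return (K / r) * simpson(lambda x: 1.0 / (x * (K - x)), b0, b)

def time_to_reach_exact(b, b0, r, K):
    """The result of the partial fractions calculation."""
    return (math.log(b) - math.log(b0) - math.log(K - b) + math.log(K - b0)) / r

# Illustrative parameter values (an assumption made for this sketch).
b0, r, K = 1.0, 1.0, 10.0
for b in (2.0, 5.0, 9.0):
    print(b, time_to_reach_numeric(b, b0, r, K), time_to_reach_exact(b, b0, r, K))
```

For each target population b between b0 and K, the two columns should agree to many decimal places.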
A little algebra can now be used to obtain an equation for the population b at time t:
\[ rt = \ln\left(\frac{b(K - b_0)}{b_0(K - b)}\right) \quad\Rightarrow\quad \frac{b}{K - b} = \frac{b_0}{K - b_0}\,e^{rt}. \]
One final rearrangement gives
\[ \left(1 + \frac{b_0}{K - b_0}\,e^{rt}\right) b = \frac{K b_0}{K - b_0}\,e^{rt} \quad\Rightarrow\quad b = \frac{K}{1 + \left(\frac{K}{b_0} - 1\right)e^{-rt}}. \]
Now let us consider the behaviour of the function b(t). We will assume that K > b0 (so that the
carrying capacity - the maximum number of bacteria the environment can sustain - is larger than
the initial population) and note that r > 0 (so bacteria increase in number). Then, the exponential
term in the denominator is positive but decreasing, and hence the function itself is always increasing.
As t increases, the exponential term decays rapidly at first, and so the population will initially grow
quickly. However, at later times this term will be negligibly small, and hence the population will
stop growing and approach the carrying capacity, K. This behaviour is illustrated by the graph
below.
[Figure: the bacteria population, b(t), plotted against t, where b0 = 1, r = 1, K = 10.]
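The shape of the population curve can also be explored numerically. The following Python sketch (our own illustration, not part of the original notes) evaluates the solution b(t) derived from the logistic equation and, as an independent check, integrates equation (3.3) directly with the classical fourth-order Runge-Kutta method. The function names and the step count are assumptions made for this example; the parameter values are those quoted for the graph.

```python
import math

def logistic_exact(t, b0, r, K):
    """Closed-form solution b(t) = K / (1 + (K/b0 - 1) e^{-rt})."""
    return K / (1.0 + (K / b0 - 1.0) * math.exp(-r * t))

def logistic_rk4(t_end, b0, r, K, steps=2000):
    """Integrate db/dt = r b (1 - b/K) from t = 0 to t_end with RK4."""
    def rate(b):
        return r * b * (1.0 - b / K)

    h = t_end / steps
    b = b0
    for _ in range(steps):
        k1 = rate(b)
        k2 = rate(b + 0.5 * h * k1)
        k3 = rate(b + 0.5 * h * k2)
        k4 = rate(b + h * k3)
        b += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return b

b0, r, K = 1.0, 1.0, 10.0
for t in (0, 2, 4, 6, 8, 10):
    print(t, round(logistic_exact(t, b0, r, K), 4), round(logistic_rk4(t, b0, r, K), 4))
```

The printed values start at b0, rise quickly at first, and then level off just below the carrying capacity K, in agreement with the discussion above.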
1. If the denominator (bottom line) of the fraction is a product of distinct linear factors, then
we write the function as a sum of fractions with the linear factors being the denominators of
each fraction (this is what we did in the population growth example).
Example 3.14. If we want to expand the function $\frac{1}{(x+1)(x+2)}$ into partial fractions, we
write:
\[ \frac{1}{(x + 1)(x + 2)} = \frac{A}{x + 1} + \frac{B}{x + 2}, \]
where A and B are real numbers, which are as yet unknown. To find A and B we re-write
the partial fractions, putting them back over a common denominator:
\[ \frac{A}{x + 1} + \frac{B}{x + 2} = \frac{A(x + 2) + B(x + 1)}{(x + 1)(x + 2)} = \frac{1}{(x + 1)(x + 2)}. \]
Then, since the denominators of the middle and right-hand terms of the equation are equal,
the numerators must be also, i.e. 1 = A(x + 2) + B(x + 1). As in our earlier example, there
are two ways to proceed at this point. The first is to compare coefficients of the same power
of x on both sides, which would give two simultaneous equations for A and B:
\[ 1 = 2A + B, \qquad A + B = 0; \]
solving these gives A = 1, B = −1. Alternatively, since 1 = A(x + 2) + B(x + 1) is an identity
(it holds for all values of x) we can substitute appropriate values of x to determine A and B.
2. If the denominator of the fraction contains one or more repeated linear factors, we must in-
clude terms of all powers up to the multiplicity of the factor in our partial fractions.
Example 3.15. If we want to expand the function $\frac{x^2 + 2x - 1}{(x+1)^3}$ into partial fractions, we
write
\[ \frac{x^2 + 2x - 1}{(x + 1)^3} = \frac{A}{x + 1} + \frac{B}{(x + 1)^2} + \frac{C}{(x + 1)^3} = \frac{A(x + 1)^2 + B(x + 1) + C}{(x + 1)^3}. \]
Then, comparing the numerators we have
\[ x^2 + 2x - 1 = A(x + 1)^2 + B(x + 1) + C. \]
We can quickly find A, B and C using a combination of the two methods described in the
previous example:
• Substitute x = −1 =⇒ C = −2.
• Compare the coefficient of $x^2$ =⇒ A = 1.
• Compare the coefficient of $x^0$ =⇒ −1 = A + B + C =⇒ B = 0.
Hence
\[ \frac{x^2 + 2x - 1}{(x + 1)^3} = \frac{1}{x + 1} - \frac{2}{(x + 1)^3}. \]
3. If the denominator of the fraction contains an irreducible quadratic (one which cannot be
factorised into real linear factors) then we must allow the numerator of the corresponding
partial fraction to be a linear function.
Example 3.16. To expand the function $\frac{x}{(x-1)(x^2+2x+2)}$ into partial fractions, we write:
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{A}{x - 1} + \frac{Bx + C}{x^2 + 2x + 2} = \frac{A(x^2 + 2x + 2) + (Bx + C)(x - 1)}{(x - 1)(x^2 + 2x + 2)}. \]
As before, we can find the values of A, B and C by either equating coefficients, or using the
substitution method. We choose the latter here. Setting x = 1 =⇒ A = 1/5. Then, we choose two
convenient values of x, say x = 0 and x = −1, to obtain two linear equations to determine B and C:
\[ x = 0 \implies 0 = 2A - C \implies C = \frac{2}{5}; \]
\[ x = -1 \implies -1 = A + 2B - 2C \implies B = -\frac{1}{5}. \]
Thus
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{1}{5}\left(\frac{1}{x - 1} - \frac{x}{x^2 + 2x + 2} + \frac{2}{x^2 + 2x + 2}\right). \]
We will not consider cases involving repeated irreducible quadratic factors in this course.
(Note: a quadratic equation can always be factored into complex linear factors, and it is not
uncommon to see students do this when tackling partial fraction problems in assignments or exams.
Although technically correct, the problem is that, since the original function is clearly real-valued,
its partial fraction decomposition (and any resulting integral) should be clearly real-valued too.
This approach usually results in a function with complex terms (which, in fact, cancel out), but
unless it is clearly written as a real-valued function it will not receive full marks.)
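The three worked examples can be checked with exact rational arithmetic. The Python sketch below is our own illustration rather than part of the original notes. It implements the substitution method for distinct linear factors (the helper name cover_up is hypothetical) and then confirms, at a handful of sample points, that the expansions found in Examples 3.15 and 3.16 agree with the original functions.

```python
from fractions import Fraction

def cover_up(numerator, roots):
    """Coefficients A_i in N(x) / prod(x - r_i) = sum of A_i / (x - r_i).

    Assumes the roots r_i are distinct. Substituting x = r_i into
    N(x) = sum_i A_i * prod_{j != i} (x - r_j) removes every term but the i-th.
    """
    coefficients = []
    for i, ri in enumerate(roots):
        product = Fraction(1)
        for j, rj in enumerate(roots):
            if j != i:
                product *= ri - rj
        coefficients.append(Fraction(numerator(ri)) / product)
    return coefficients

# Example 3.14: 1 / ((x + 1)(x + 2)) has linear factors with roots -1 and -2.
A, B = cover_up(lambda x: 1, [Fraction(-1), Fraction(-2)])
print(A, B)  # 1 -1

# Sample points, avoiding x = -2, -1 and 1 where a denominator vanishes.
points = [Fraction(n, 3) for n in range(-10, 11) if n not in (-6, -3, 3)]

def agrees(lhs, rhs):
    """True if the two expressions are exactly equal at every sample point."""
    return all(lhs(x) == rhs(x) for x in points)

# Example 3.15: repeated linear factor.
ok_315 = agrees(lambda x: (x**2 + 2*x - 1) / (x + 1)**3,
                lambda x: 1 / (x + 1) - 2 / (x + 1)**3)

# Example 3.16: irreducible quadratic factor.
q = lambda x: x**2 + 2*x + 2
ok_316 = agrees(lambda x: x / ((x - 1) * q(x)),
                lambda x: Fraction(1, 5) * (1 / (x - 1) - x / q(x) + 2 / q(x)))

print(ok_315, ok_316)  # True True
```

Agreement at finitely many points is a check, not a proof, but since both sides are rational functions of low degree it is a strong one.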
\[ \int \frac{x^2 + 1}{x - 3}\,dx = \int \left((x + 3) + \frac{10}{x - 3}\right)dx = \frac{x^2}{2} + 3x + 10\ln|x - 3| + C. \]
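The long division step used here, which rewrites (x² + 1)/(x − 3) as (x + 3) + 10/(x − 3), can be reproduced mechanically. The following Python sketch is our own illustration (the function name poly_divmod and the convention of listing coefficients from the highest power down are assumptions made for this example).

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Polynomial long division with coefficients listed highest power first.

    Returns (quotient, remainder) as lists of Fractions, so that
    num = quotient * den + remainder with deg(remainder) < deg(den).
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]          # next coefficient of the quotient
        quotient.append(factor)
        for i in range(len(den)):
            num[i] -= factor * den[i]     # subtract factor * den, aligned at the top
        num.pop(0)                        # the leading coefficient is now zero
    return quotient, num

# (x^2 + 1) divided by (x - 3)
quotient, remainder = poly_divmod([1, 0, 1], [1, -3])
print([int(c) for c in quotient], [int(c) for c in remainder])  # [1, 3] [10]
```

The quotient [1, 3] represents x + 3 and the remainder [10] represents the constant 10, matching the integrand above.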
1. As the degree of the denominator is five and is greater than the degree of the numerator, no
long division is necessary.
• Substitute x = 0 =⇒ −6 = −3C =⇒ C = 2.
• Substitute x = 3 =⇒ 162 − 162 − 81 − 3 − 6 = 90A =⇒ −90 = 90A =⇒ A = −1.
• As there is nothing else to conveniently substitute, compare coefficients:
$x^1$ =⇒ −1 = −3B + C =⇒ B = 1.
$x^2$ =⇒ −9 = A + B − 3C − 3E = −1 + 1 − 6 − 3E =⇒ E = 1.
$x^4$ =⇒ 2 = A + B + D = −1 + 1 + D =⇒ D = 2.
4. Therefore
3.12 Improper integrals
Lecture 22 Present value of a continuous income stream
Suppose you place an amount of money, P dollars in a bank account, with a rate of interest r%.
Let t be the time in years since the money was deposited. Then, provided the interest is added at
intervals much shorter than a year (e.g. daily or weekly), the amount of money in the bank, A(t),
is well-approximated by
\[ A(t) = P e^{rt}. \tag{3.7} \]
Suppose that, having enjoyed your time studying mathematics so much, you decide you would like
to set up a mathematics PhD scholarship to help educate future generations of lecturers. You
would like to be able to offer a scholarship of $27,000 every year into the future. How much money
do you need to give to the university?
We can re-arrange equation (3.7) to determine the amount we need to place in the account in
order to have A dollars, after t years:
\[ P = A(t)\,e^{-rt}. \tag{3.8} \]
We call this the present value of year t (we must pay P dollars now to get A dollars in a future
year, t). If we want to award a scholarship worth A dollars (a constant amount) every year for n
years we need to find the sum
\[ P = \sum_{t=0}^{n} A e^{-rt} \approx \int_0^n A e^{-rt}\,dt. \]
(The approximation follows from the definition of the integral in terms of a sum: think of plotting
the graph of $Ae^{-rt}$ against t; the sum approximates the area between the curve and the horizontal
axis.) But what if you wanted the scholarship to continue ‘in perpetuity’ (i.e. forever into the
future)? This would mean taking the limit n → ∞, and so we would need to calculate
\[ \int_0^\infty A e^{-rt}\,dt. \]
This kind of integral, which is performed over an unbounded domain, is called an improper integral
of the first kind. It is improper, because when we defined definite integrals, we considered only
bounded intervals. Thus we need to take care here; it is not obvious that a sensible answer will
exist. It is quite plausible that if we wanted the scholarship to continue forever, we would require
an infinite amount of money.
We proceed in the intuitive way, by calculating the amount of money required for the scholarship
to run for n years, and then taking the limit n → ∞. From earlier
\[ P \approx \int_0^n A e^{-rt}\,dt = \left[-\frac{A}{r}\,e^{-rt}\right]_0^n = -\frac{A}{r}\left(e^{-rn} - 1\right) = \frac{A}{r}\left(1 - e^{-rn}\right). \]
In the limit n → ∞, we see that $e^{-rn}$ → 0, and so we have
\[ \lim_{n \to \infty} \int_0^n A e^{-rt}\,dt = \frac{A}{r}. \]
Thus, if the interest rate is 3% (assumed to remain constant) and we want to offer a scholarship of
$27,000 every year in perpetuity, we need to give a sum of
\[ P = \frac{27{,}000}{0.03} = \$900{,}000. \]
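To see the limiting procedure at work, the amount needed for a scholarship lasting n years can be tabulated for increasing n. The Python sketch below is our own illustration (the function names are hypothetical); it uses the formula (A/r)(1 − e^{−rn}) obtained above, with A = 27,000 and r = 0.03, and also evaluates the discrete sum that the integral approximates.

```python
import math

def pv_sum(A, r, n):
    """Discrete sum of the present values A e^{-rt} for t = 0, 1, ..., n."""
    return sum(A * math.exp(-r * t) for t in range(n + 1))

def pv_integral(A, r, n):
    """Integral of A e^{-rt} from 0 to n, i.e. (A/r)(1 - e^{-rn})."""
    return (A / r) * (1.0 - math.exp(-r * n))

def pv_perpetuity(A, r):
    """Limit of pv_integral as n tends to infinity."""
    return A / r

A, r = 27000.0, 0.03
for n in (10, 50, 100, 500):
    print(n, round(pv_sum(A, r, n), 2), round(pv_integral(A, r, n), 2))
print("limit:", pv_perpetuity(A, r))
```

The integral column increases with n but never exceeds A/r, which is why a finite sum of money suffices. The sum and the integral differ somewhat, as expected of an approximation.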
3.12.1 Improper integrals of the first kind
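More generally, we define improper integrals of the first kind using the limiting procedure we saw
in the above example: for a function f that is integrable on [a, n] for every n > a,
\[ \int_a^\infty f(x)\,dx = \lim_{n \to \infty} \int_a^n f(x)\,dx, \]
where this limit exists. If the limit exists, we say the integral converges. Otherwise, we say it
diverges.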
Improper integrals play an extremely important role in probability theory. You are already
familiar with discrete random variables (e.g. the number of heads which appear when a coin is
tossed ten times) where the outcome can only take certain discrete values. The concept of a
continuous random variable is similar, but the outcome can be any real number within a certain
range e.g. the number of hours a light bulb operates before failing. For a continuous random
variable X the probability that X lies in the range [a, b] is defined in terms of the integral of a
probability density function f (x). We write
\[ P(a \le X \le b) = \int_a^b f(x)\,dx. \]
Of course, the functional form of the probability density f (x) will depend on the application.
Example 3.19. The lifetime, T , of a certain type of light bulb is a continuous random variable with
a probability density which follows the exponential distribution. The probability density function
for this distribution is given by
\[ f(x) = \frac{1}{\mu}\,e^{-x/\mu}, \qquad x \ge 0, \]
where µ is the mean of the distribution. (Exercise: check that P (0 ≤ T < ∞) = 1.)
The mean lifetime of this type of bulb is found to be 1 year. What is the probability that a
bulb will last for more than 5 years?
We need to calculate P (T > 5) = P (5 ≤ T < ∞). Using the formula above
\[ P(T > 5) = P(5 \le T < \infty) = \int_5^\infty e^{-x}\,dx = \left[-e^{-x}\right]_5^\infty = 0 - (-e^{-5}) = e^{-5} \approx 0.007. \]
Note that
\[ P(T > 5) = 1 - P(T < 5) = 1 - \int_0^5 e^{-x}\,dx = 1 - \left[-e^{-x}\right]_0^5 = 1 - \left(-e^{-5} - (-1)\right) = e^{-5}. \]
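Both routes to P(T > 5) can be confirmed numerically by replacing the infinite upper limit with a large finite one and watching the result settle down. The Python sketch below is our own illustration: the function names, the midpoint rule, and the choice of upper limits are assumptions made for this example.

```python
import math

def exp_pdf(x, mu):
    """Exponential probability density with mean mu, for x >= 0."""
    return math.exp(-x / mu) / mu

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def tail_probability(t, mu, upper):
    """Approximate P(t <= T <= upper); for large upper this approaches P(T > t)."""
    return midpoint(lambda x: exp_pdf(x, mu), t, upper)

mu = 1.0  # mean lifetime of one year, as in Example 3.19
for upper in (10.0, 20.0, 40.0):
    print(upper, tail_probability(5.0, mu, upper))
print("exact:", math.exp(-5.0))

# The complementary route: 1 - P(T < 5).
print(1.0 - midpoint(lambda x: exp_pdf(x, mu), 0.0, 5.0))
```

As the upper limit grows, the approximations approach e^{-5}, and the complementary calculation gives the same value.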
\[ \lim_{a \to 0} \int_a^1 \frac{1}{x^2}\,dx = \lim_{a \to 0} \left[-\frac{1}{x}\right]_a^1 = \lim_{a \to 0} \left(-1 + \frac{1}{a}\right) \]
then we can see that the required limiting value is undefined.
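A quick tabulation makes the failure of this limit concrete. The short Python sketch below (our own illustration, with a hypothetical function name) evaluates the expression −1 + 1/a obtained above for smaller and smaller positive values of a.

```python
def integral_from_a_to_1(a):
    """Value of the integral of 1/x^2 over [a, 1] for 0 < a <= 1, namely -1 + 1/a."""
    return -1.0 + 1.0 / a

for a in (1e-1, 1e-2, 1e-4, 1e-8):
    print(a, integral_from_a_to_1(a))
```

The values grow without bound as a approaches 0 from above, so no finite limiting value exists.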
In the above two cases, the value for which the function is undefined occurs at an end point.
How would we deal with an integral like $\int_0^6 \frac{1}{(x-4)^{2/3}}\,dx$?
In this case, the problem is at x = 4, so we split the integral into two before taking the limit:
\[ \int_0^6 \frac{1}{(x - 4)^{2/3}}\,dx = \lim_{c \to 4} \int_0^c \frac{1}{(x - 4)^{2/3}}\,dx + \lim_{c \to 4} \int_c^6 \frac{1}{(x - 4)^{2/3}}\,dx. \]
Now, we have
\[ \lim_{c \to 4} \int_0^c \frac{1}{(x - 4)^{2/3}}\,dx = \lim_{c \to 4} \left[3(x - 4)^{1/3}\right]_0^c = \lim_{c \to 4} \left(3(c - 4)^{1/3} - 3(-4)^{1/3}\right) = 3(4)^{1/3}, \]
and similarly,
\[ \lim_{c \to 4} \int_c^6 \frac{1}{(x - 4)^{2/3}}\,dx = \lim_{c \to 4} \left[3(x - 4)^{1/3}\right]_c^6 = \lim_{c \to 4} \left(3(2)^{1/3} - 3(c - 4)^{1/3}\right) = 3(2)^{1/3}. \]
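The two limits can be illustrated by cutting out a small interval of half-width eps around the problem point x = 4 and letting eps shrink. The Python sketch below is our own illustration (the function names are hypothetical). It relies on the antiderivative 3(x − 4)^{1/3} found above and uses the real cube root, which is negative for negative arguments.

```python
import math

def cbrt(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def antiderivative(x):
    """An antiderivative of (x - 4)^(-2/3), namely 3 (x - 4)^(1/3)."""
    return 3.0 * cbrt(x - 4.0)

def truncated_integral(eps):
    """Integral over [0, 4 - eps] plus integral over [4 + eps, 6]."""
    left = antiderivative(4.0 - eps) - antiderivative(0.0)
    right = antiderivative(6.0) - antiderivative(4.0 + eps)
    return left + right

limit = 3.0 * cbrt(4.0) + 3.0 * cbrt(2.0)
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(eps, truncated_integral(eps), limit)
```

The truncated values approach 3(4)^{1/3} + 3(2)^{1/3} as eps tends to 0, so this improper integral converges even though the integrand is unbounded near x = 4.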
• Understand how and why the definite integral is defined in terms of a sum
• State the First and Second Fundamental Theorems of Calculus, and apply them to solve
problems
• Apply the techniques of integration by substitution and integration by parts to calculate the
integrals of complicated functions
• Use long division and partial fractions expansions to calculate integrals of rational functions
• Calculate improper integrals of the first and second kind (or recognise that they do not exist)
You should also look back at the summaries of learning outcomes for the previous two chapters:
this will help you prepare for the final written examination.
Chapter 4
This section collects together some useful notation, for easy reference.
4.2 Notation
iff “if and only if”
=⇒ “implies”
⇐⇒ “is equivalent to”
∃, ∄ “there exists”, “there does not exist”
∀ “for all” (or “for every”)
≡ “is identically equal to”
∴ “therefore”
4.3 Sets
A set is a collection of objects called elements. The notation {x | . . .} or {x : . . .} is read “the set
of objects x such that . . .”.
x ∈ A the object x is an element of the set A
∅ the empty set, that is, the set with no elements at all
A ⊂ B the set A is contained in B, that is, every element of A is also an element of B. This
does not exclude the possibility that A = B.
A ∪ B the union of the sets A and B: A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B the intersection of the sets A and B: A ∩ B = {x | x ∈ A and x ∈ B}
A\B the difference of the sets A and B: A\B = {x | x ∈ A but x ∉ B}
N the set of natural numbers
Q the set of rational numbers
R the set of real numbers
C the set of complex numbers.
R² {(x, y) | x, y ∈ R}.