Lectures Compressed (3985) MAT135
Calculus I and II
Essentials
ISBN-13: 978-1-9994190-7-3
ISBN-10: 1-9994190-7-3
1st edition © 2022 Dmitriy Panchenko
Acknowledgement
This text grows out of the author’s experience teaching MAT135 & MAT136 at the
University of Toronto during the 2021-2022 academic year. This course in its current form
was developed between 2017 and 2021 under the direction of Sarah Mayes-Tang, with
input from many members of a large teaching team, notably including Bernardo
Galvão-Sousa, who also coordinated the course in the 2021-2022 academic year. The material in these
notes reflects the structure of the course as designed by Professor Mayes-Tang, for example
through the emphasis on a flipped classroom and in the selection of topics. The presentation
is intended to complement the treatment of these topics as found in the standard MAT135
& MAT136 course material.
I want to thank the entire MAT135 & MAT136 teaching teams from the previous several
years, and also MAT187 from last year, who created many of the exercises. I also want to
thank all the students in these classes, whose feedback was very important to me, and whose
positive energy made the classes a real pleasure to teach.
Contents
1 Functions
1.1 Introduction
1.2 Linear functions
1.3 Exponential functions
1.4 Logarithmic functions
1.5 Logarithmic scales
1.6 Trigonometric functions
1.7 Polynomials and rational functions
1.8 Inverse functions
1.9 Limits and continuity
2 Derivatives
2.1 Practical interpretation of derivatives
2.2 Formal definition of derivative
2.3 Derivatives and graphs
2.4 Differentiation rules
2.5 First applications: old and new
2.6 Critical points
2.7 Optimization problems
2.8 Parametric families of functions
2.9 Related rates
3 Integrals
3.1 Definite integrals: the case of velocity
3.2 Definite integrals: general case
3.3 Fundamental Theorem of Calculus
3.4 Application of FTC: differential equations
3.5 Techniques of integration: substitution rule
3.6 Techniques of integration: integration by parts
3.7 Approximating integrals using Taylor polynomials
3.8 CAS: computer algebra systems
3.9 Improper integrals
3.10 Slicing problems: geometry
3.11 Slicing problems: densities
1.1 Introduction
In this chapter we will study various basic functions that appear frequently in
Calculus, such as linear, exponential, logarithmic, power, polynomial, rational,
and trigonometric functions. Of course, we will often combine these functions by
adding, subtracting, multiplying, dividing, taking compositions and inverses. The
functions themselves, but also various properties of functions and operations we
can do with functions, can be:
• described in words, either in plain English or using mathematical terminology;
• expressed with mathematical formulas and notation;
• depicted and observed in figures via their graphs;
• represented by tables of values.
For this reason, learning Calculus is a lot like learning a new language: even if
you feel that you understand a new concept, it is important to be able to express it
in different ways, to translate it between words, formulas, and graphs, and sometimes to
recognize it from a table. Let us show an example of how the same information can
be expressed or observed in these different ways.
Example 1. Suppose that a function T =
f (t) describes temperature T changing over
time, and suppose that (in plain English)
the temperature is growing but the growth
is slowing down. Then we can express this
using mathematical terminology by saying
that the function is increasing and concave
down. We can also observe this behaviour
from the graph of y = f (t), or use formulas
and write that its first derivative is positive,
f ′ (t) > 0, and second derivative is negative,
f ′′ (t) < 0 (as we will learn later). Finally, if
we are given a table of temperature values at a few equally spaced points in time,
for example,
t 0 1 2 3 4 5
T 0 3.75 7 9.75 12 13.75
we can also see that the values are increasing, but the gaps between two consecutive
temperatures are decreasing: 3.75, 3.25, 2.75, 2.25, 1.75. Of course, in this case we
do not know what happens for all times t but, from what we can see in the table,
we can guess that the temperature is growing but it is growing slower and slower,
so the function T = f (t) is probably increasing and concave down.
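The same reasoning can be sketched in Python as a quick check on the table: positive first differences suggest the function is increasing, and negative second differences suggest the growth is slowing down. This is only a heuristic on a few sampled points, not a proof, and the variable names are illustrative.

```python
# Temperature table from Example 1 (equally spaced times t = 0, 1, ..., 5)
t = [0, 1, 2, 3, 4, 5]
T = [0, 3.75, 7, 9.75, 12, 13.75]

# First differences: positive gaps suggest the function is increasing
gaps = [T[i + 1] - T[i] for i in range(len(T) - 1)]
# Second differences: negative values mean the gaps shrink (suggests concave down)
second = [gaps[i + 1] - gaps[i] for i in range(len(gaps) - 1)]

increasing = all(g > 0 for g in gaps)
concave_down = all(s < 0 for s in second)
print(gaps)                      # [3.75, 3.25, 2.75, 2.25, 1.75]
print(increasing, concave_down)  # True True
```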
Domain and range. We will discuss terminology associated with functions
throughout this chapter, but here let us briefly discuss the domain and range of
a function y = f(x). Generally speaking, the domain of a function is the set of all
possible inputs x that we are allowed to plug into the function f, and the range is
the set of all possible outputs y. For example, $y = \sqrt{x}$ has the domain [0, ∞) because
we are only allowed to plug nonnegative values 0 ≤ x < ∞ into the square root, and the
range is also [0, ∞). However, in some cases the domain may be narrower for
various reasons. For example, if in a given problem we are only interested in the
function $y = \sqrt{x}$ on the interval [1, 2], for the purpose of that problem the domain
will be [1, 2] and the range will be $[1, \sqrt{2}]$. In applied problems the domain may be
limited by the physical constraints of the problem.
Exercise 2. A ball is tossed straight up with an initial speed of 10 m/s from an initial
height of 2 m above the ground. The height of the ball at time t is given by
$h = -5t^2 + 10t + 2$ meters. What are the domain and range of the function that
describes the height of the ball until the time it hits the ground? Hint: Recall the
quadratic formula or see Section 1.7.
A linear function is a function of the form
y = b + m · x
where the constant b is the y-intercept and the constant m is the slope, x is the independent
variable (input of the function) and y is the dependent variable (output of the
function). Linear functions may be the simplest functions in Calculus, but they play
a fundamental role because the notion of a derivative f ′ (a) of a function y = f (x) at
a point x = a will be based on approximating f (x) at that point by linear functions.
Let us take a look at a graph of a linear function and consider any two points
(x1 , y1 ) and (x2 , y2 ) on this graph. We can see that:
• y1 = b + m · x1 – first point.
• y2 = b + m · x2 – second point.
• ∆x = x2 − x1 is called run.
• ∆y = y2 − y1 is called rise.
• m = tan(θ) = ∆y/∆x is called slope.
• b = b + m · 0 is y-intercept.
The notation ∆ is used often in Calculus and means increment. For example,
above ∆x is the increment x2 − x1 of the variable x, and ∆y is the increment y2 − y1 of
the variable y. The slope m = ∆y/∆x represents how much the output variable changes,
∆y, relative to how much the input variable changes, ∆x. We can think of it as a
∆y, relative to how much the input variable changes, ∆x. We can think of it as a
rate of change of y with respect to x, and for linear functions it is always the same
no matter what the interval [x1 , x2 ] is.
Example 1. If a car drives with a constant speed of 60 km/h, what is the distance
d it covers in t hours? If we think of the distance as a function of time, what is the
meaning of its slope?
Solution: Since Distance = Speed × Time when the speed is constant, in this case,
Distance d = 60 · t, measured in km/h × h = km. It is a linear function with
y-intercept 0 and slope 60, so the meaning of the slope m = ∆d/∆t is speed. Constant
speed means that distance is a linear function of time.
Example 2. What is the linear function whose graph passes through points (1, 3)
and (5, 2)?
Solution: Since ∆x = 5 − 1 = 4 and ∆y = 2 − 3 = −1, the slope is m = ∆y/∆x = −1/4 =
−0.25. To find the intercept, we can use either of the two points, for example the
first one: 3 = b + m · 1 = b − 0.25 · 1 = b − 0.25, so b = 3 + 0.25 = 3.25. The linear
function is y = 3.25 − 0.25x.
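Example 2 can be sketched in Python as follows. This is a minimal sketch that assumes integer (or rational) coordinates and x1 ≠ x2. The helper `line_through` is a hypothetical name, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (b, m) for the line y = b + m*x through p1 and p2 (assumes x1 != x2)."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)   # slope = rise / run
    b = y1 - m * x1                  # solve y1 = b + m*x1 for the intercept
    return b, m

b, m = line_through((1, 3), (5, 2))
print(float(b), float(m))  # 3.25 -0.25
```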
Exercise 2. What is the linear function whose graph passes through points (4, 1)
and (0, 3)?
Line from slope and one point. In Example 2 above, once we found the
slope m, we computed the intercept b by plugging in the coordinates of one point.
Actually, if we know the slope m and a point (x0, y0) on the graph of a linear function,
we can write the formula for this linear function directly:
y = y0 + m · (x − x0 ).
Example 3. What is the linear function whose slope is −0.25 and whose graph
passes through the point (1, 3)?
Solution: The function is y = 3 − 0.25(x − 1). Of course, we can multiply out the
second term and rewrite this as y = 3 − 0.25(−1) − 0.25x = 3.25 − 0.25x.
Exercise 3. What is the linear function whose slope is 7/11 and whose graph passes
through the point (2, 1)?
Tables and trend lines. If we are given a table of (x, y) values, it is easy to check
if they all lie on a graph of a linear function y = b + m · x. We only need to check
that all slopes between two consecutive values of x are equal. This is especially
easy if all increments ∆x are the same, in which case we only need to check that all
increments ∆y are also the same.
Example 4. Do all the points in the table lie on the graph of a linear function? If
yes, which one?
x 0 2 4 6 8 10
y 0 3 6 9 12 15
Solution: We can see that all increments ∆x between consecutive points are equal
to 2, so we only need to check that all ∆y are also the same. Indeed, all ∆y are equal
to 3, so the points lie on the graph of one linear function. Its slope is ∆y/∆x = 3/2 = 1.5,
and since it passes through the point (0, 0), the formula is y = 0 + 1.5(x − 0) = 1.5x.
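The table test from Example 4 translates into a short Python sketch. It computes the slope between each pair of consecutive points and accepts the data as linear only if all slopes agree. The function name and the tolerance `1e-12`, which absorbs floating-point rounding, are assumptions of this sketch.

```python
def linear_fit_if_exact(xs, ys, tol=1e-12):
    """Return (b, m) if all consecutive slopes agree (points on y = b + m*x), else None."""
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    if any(abs(s - slopes[0]) > tol for s in slopes):
        return None          # slopes differ, so the points are not on one line
    m = slopes[0]
    b = ys[0] - m * xs[0]    # intercept from the first point
    return b, m

print(linear_fit_if_exact([0, 2, 4, 6, 8, 10], [0, 3, 6, 9, 12, 15]))  # (0.0, 1.5)
```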
Exercise 4. Do all the points in the table lie on the graph of a linear function? If
yes, which one?
x −2 0 3 6 8 11
y −3 1 7 13 17 23
Example 5. Sometimes the points might not lie exactly on a straight line but pretty
close to one, such as the points in the following table
x 0 2 4 6 8 10
y 0 3.4 5.5 8.8 12.6 15.1
shown as blue dots in the figure below. One way to find the trend line (shown in
red in the figure below) is to use the so-called least squares regression, which can
be solved using optimization techniques studied later in this course. For now, we
will simply mention how to find this line using Google Sheets. The equation for
the line is shown at the top of the chart, in this case y = 1.52x − 0.0333.
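For readers who want to see what the spreadsheet computes, here is a minimal Python sketch of ordinary least squares for a line. It uses the standard closed-form formulas m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − m·x̄. On the data of Example 5 it reproduces the trend line y = 1.52x − 0.0333 quoted above. We assume, without checking Google Sheets internals, that the chart uses this same fit.

```python
def least_squares_line(xs, ys):
    """Ordinary least squares fit of y = b + m*x; returns (b, m)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx              # slope
    b = y_bar - m * x_bar      # the fitted line passes through (x_bar, y_bar)
    return b, m

b, m = least_squares_line([0, 2, 4, 6, 8, 10], [0, 3.4, 5.5, 8.8, 12.6, 15.1])
print(round(m, 4), round(b, 4))  # 1.52 -0.0333
```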
Exercise 5. Find the trend line (least squares regression) for the following data
points:
x −2 0 3 6 8 11
y −3.3 1.1 7.5 12.4 16.6 23.4
Answer to Exercise 2. y = 3 − (1/2)x.
Answer to Exercise 3. y = 1 + (7/11)(x − 2) = −3/11 + (7/11)x.
Here we assume that all the terms make sense, i.e. we do not divide by zero, etc.
$$P = P_0 \cdot a^t.$$
$$P = P_0 \cdot a^t = P_0 \cdot (1 + r)^t.$$
In the case of exponential growth, when a > 1, the constant r = a − 1 > 0 is called
the (exponential) growth rate. In the case of exponential decay, when 0 < a < 1,
the constant −r = 1 − a > 0 is called the (exponential) decay rate.
Exercise 3. Write down an exponential function with the growth rate 0.01 and
initial value 1.5. Write down an exponential function with the decay rate 0.04 and
initial value 2.
Example 4. We deposit D dollars into a savings account with an annual interest rate of 2%.
How much money is in the account after t years?
Solution: If we start with D dollars, in one year we accumulate interest 0.02D, so
the total will be D + 0.02D = D · 1.02. After two years the total will be D · 1.02 · 1.02 =
$D(1.02)^2$, after three years $D(1.02)^3$, and after t years $D(1.02)^t$. Actually, a typical
savings account accumulates interest continuously, so if we close the account after
t years, where t is not necessarily an integer, we will have $D(1.02)^t$ dollars.
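The year-by-year reasoning of Example 4 and the closed formula can be compared in a short Python sketch. The deposit D = 1000 is an illustrative value, not from the text.

```python
def balance(D, t, rate=0.02):
    """Account balance after t years at annual rate `rate`: D * (1 + rate)**t."""
    return D * (1 + rate) ** t

# Multiplying by 1.02 once per year matches the formula for integer t
total = 1000.0                 # illustrative deposit D = 1000
for _ in range(3):
    total *= 1.02
print(round(total, 2), round(balance(1000, 3), 2))  # 1061.21 1061.21

# The formula also makes sense for non-integer t, e.g. closing the account after 2.5 years
print(round(balance(1000, 2.5), 2))
```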
Exercise 4. Suppose a radioactive material decays at the rate of 2.5% per year.
What percent of the original will remain after 100 years?
Example 5. Suppose that annual sales at a bakery are growing at 2% per year. How
can we model the annual sales?
Solution: If the sales during the current year total A dollars, time t = 1 denotes next
year, t = 2 is the year after next, etc., then as in the previous example the sales
during year t will be $A(1.02)^t$. If one prefers to denote the current year by t = 1 instead
of t = 0, then we need to shift time by 1 so that the sales during year t will be
$A(1.02)^{t-1}$.
Also, the difference from the previous example is that, for non-integer t, this
formula does not have a particular meaning; for example, $A(1.02)^{2.5}$ does not
directly represent anything related to sales at t = 2.5 years. Instead of annual sales,
we could model the rate of sales using an exponential function, but this will be studied
much later because, in this case, sales within any interval of time would be computed
using integrals.
Exercise 5. Suppose that annual sales at a bakery are decreasing at 1% per year.
How can we model the annual sales?
$$a = e^{\kappa} \implies \kappa = \ln(a).$$
$$P = P_0 \cdot a^t = P_0 \cdot (e^{\kappa})^t = P_0 \cdot e^{\kappa t}.$$
In the case of exponential growth, when a > 1, the constant κ = ln(a) > 0 is called
the continuous growth rate. In the case of exponential decay, when 0 < a < 1, the
constant −κ = − ln(a) > 0 is called the continuous decay rate.
Example 6. Find the continuous growth/decay rate of the functions $y = 2 \cdot 1.1^x$ and
$y = 3 \cdot 0.98^x$.
Solution: In the first case, κ = ln(1.1) = 0.0953 . . . is the continuous growth rate,
and we can write the function $y = 2 \cdot 1.1^x$ as $y = 2 \cdot e^{0.0953x}$. In the second case,
κ = ln(0.98) = −0.0202 . . ., so the continuous decay rate is 0.0202, and we can
write the function $y = 3 \cdot 0.98^x$ as $y = 3 \cdot e^{-0.0202x}$. Notice that the decay rate itself
is positive, but the fact that the function is decreasing (decay) instead of increasing
(growth) is expressed by the minus sign in the exponent $e^{-0.0202x}$.
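A quick numerical check of Example 6 in Python: κ = ln(a) gives the continuous rate, and the two forms a^x and e^{κx} agree. The helper name `continuous_rate` and the test point x = 7 are arbitrary choices of this sketch.

```python
import math

def continuous_rate(a):
    """Continuous rate kappa = ln(a) of y = y0 * a**x (kappa > 0: growth, kappa < 0: decay)."""
    return math.log(a)

k1 = continuous_rate(1.1)    # about 0.0953, continuous growth rate
k2 = continuous_rate(0.98)   # about -0.0202, so the continuous decay rate is 0.0202
# Both forms describe the same function, checked here at x = 7
print(math.isclose(2 * 1.1**7, 2 * math.exp(k1 * 7)))  # True
```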
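because, in this case, $P(H)$ is equal to $P_0 \cdot \left(\frac{1}{2}\right)^{H/H} = \frac{P_0}{2}$. We can also find this
formula by taking an exponential function $P = P_0 \cdot a^t$, making sure that $P(H)$ is
equal to $\frac{P_0}{2}$, and solving for $a$:
$$P(H) = P_0 \cdot a^H = \frac{P_0}{2} \implies a^H = \frac{1}{2} \implies a = \left(\frac{1}{2}\right)^{1/H}.$$
Then $P = P_0 \cdot a^t = P_0 \cdot \left(\frac{1}{2}\right)^{t/H}$, which matches the formula above.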
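The half-life formula can be sketched in Python as follows. The initial amount 80 and the half-life H = 10 are illustrative values, and `amount_left` is a hypothetical helper name.

```python
def amount_left(P0, t, H):
    """Amount remaining after time t when the half-life is H: P0 * (1/2)**(t/H)."""
    return P0 * 0.5 ** (t / H)

a = 0.5 ** (1 / 10)             # the base a = (1/2)**(1/H) for H = 10
print(amount_left(80, 10, 10))  # 40.0: half remains after one half-life
print(amount_left(80, 20, 10))  # 20.0: a quarter remains after two half-lives
```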
$$\ln(a^b) = b \ln(a).$$
Exercise 8. Find the exponential function $y = y_0 \cdot a^x$ that passes through the points
(2, 5) and (6, 1).
We can also use the above property to check if the points in a table correspond
to some exponential function.
x 0 2 4 6 8
y 4 6 9 13.5 20.25
Solution: We see that the increments of x in the table are all equal to h = 2, so the
ratio $\frac{y(x+2)}{y(x)}$ should be the same for all x, equal to $a^2$ if $y = y_0 \cdot a^x$. Indeed,
$$\frac{6}{4} = \frac{9}{6} = \frac{13.5}{9} = \frac{20.25}{13.5} = 1.5,$$
so the values in the table correspond to an exponential function with $a^2 = 1.5$, or
$a = \sqrt{1.5}$. We can find $y_0$ using any point in the table, for example the first one,
$4 = y(0) = y_0 \cdot a^0 = y_0$, so $y = 4 \cdot (1.5)^{x/2}$.
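The ratio test used above can be sketched in Python, assuming equally spaced x values and nonzero y values. The function `exponential_from_table` and its tolerance are hypothetical choices for this illustration.

```python
import math

def exponential_from_table(xs, ys, tol=1e-9):
    """If x-steps are equal and ratios y(x+h)/y(x) are constant, return (y0, a) with y = y0 * a**x."""
    h = xs[1] - xs[0]
    if any(abs((xs[i + 1] - xs[i]) - h) > tol for i in range(len(xs) - 1)):
        raise ValueError("x-increments must be equal")
    ratios = [ys[i + 1] / ys[i] for i in range(len(ys) - 1)]
    if any(abs(r - ratios[0]) > tol for r in ratios):
        return None              # ratios differ, so the data are not exponential
    a = ratios[0] ** (1 / h)     # the common ratio equals a**h
    y0 = ys[0] / a ** xs[0]      # solve y(x_0) = y0 * a**x_0 for y0
    return y0, a

y0, a = exponential_from_table([0, 2, 4, 6, 8], [4, 6, 9, 13.5, 20.25])
print(y0, a, math.sqrt(1.5))
```

Applied to the table of Exercise 9, the same function returns None, because the ratios there are not all equal.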
x −5 −2 1 4 7
y 8 12 18 28 40.5
Later on we will discuss the so-called logarithmic scale, which will allow us
to observe more easily whether the points lie on the graph of some exponential
function, even when the increments of x are not equal.
Exercise 10. The world population in 1950 was 2.5 billion and in 1987 it was 5
billion. What was the average growth rate during this period?
Answer to Exercise 5. If the sales during the current year total A dollars, time t = 1
denotes next year, t = 2 is the year after next, etc., then the sales during year t will
be $A(0.99)^t$.
Answer to Exercise 9. No, because all increments of x are equal to 3, but the ratio
$\frac{12}{8} = 1.5$ is different from $\frac{28}{18} = 1.555\ldots$.
Answer to Exercise 10. $r = 2^{1/37} - 1 \approx 0.01891$, or about 1.89%. For more about
this example, see youtu.be/9_VJ2PvZBuo.
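The computation in this answer can be checked in Python. The sketch simply solves 5 = 2.5 · (1 + r)^37 for r.

```python
# Average annual growth rate r with P(1987) = P(1950) * (1 + r)**37
P1950, P1987 = 2.5, 5.0        # world population in billions
years = 1987 - 1950
r = (P1987 / P1950) ** (1 / years) - 1
print(f"{r:.5f}")  # 0.01891
```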
Answer to Exercise 11. There are two ways to answer this question. First, we
showed in the previous example that the top asymptote in the figure is y = a, so the
function takes values between 0 and a. If we want the function to be equal to 0.5 at
some time t, we must have a > 0.5. Another way to solve this is to set $a e^{-b e^{-ct}} = 0.5$
and start solving for t:
$$e^{-b e^{-ct}} = \frac{1}{2a} \implies -b e^{-ct} = \ln\left(\frac{1}{2a}\right) = -\ln(2a) \implies e^{-ct} = \frac{\ln(2a)}{b}.$$
Before we take logarithms of both sides again, we must notice that the exponential
$e^{-ct}$ on the left-hand side is always positive, so the right-hand side must also be
positive. Since we agreed that b > 0, the numerator ln(2a) must be positive. This
means that 2a should be bigger than 1 or, again, a > 0.5. So a must be bigger than
0.5; otherwise, such t does not exist. If a > 0.5, then we can take logarithms on both
sides of the last equation above to get that
$$t = -\frac{1}{c} \ln\left(\frac{\ln(2a)}{b}\right).$$
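The formula for t can be checked numerically with a short Python sketch. It assumes the function a·e^{−b·e^{−ct}} with b, c > 0, as in this answer. The constants a = 1, b = 2, c = 0.5 are illustrative only, and `time_to_reach_half` is a hypothetical name.

```python
import math

def time_to_reach_half(a, b, c):
    """Solve a*exp(-b*exp(-c*t)) = 0.5 for t (assumes b, c > 0); None if a <= 0.5."""
    if a <= 0.5:
        return None          # the function stays strictly below a <= 0.5, so no solution
    return -math.log(math.log(2 * a) / b) / c

a, b, c = 1.0, 2.0, 0.5      # illustrative constants
t = time_to_reach_half(a, b, c)
print(t, a * math.exp(-b * math.exp(-c * t)))  # the second value should be 0.5 up to rounding
```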
$$b = 10^a \iff a = \log(b).$$
$$b = e^a \iff a = \ln(b)$$
then log is replaced by the natural logarithm ln(b), again assuming that b > 0.
Remark. One can similarly define the logarithm with any positive base other than 1, but
we will only use log(x) and ln(x). In fact, most of the time we will use ln(x), and
log(x) might appear only occasionally. We mentioned in the previous section that the
exponential function $e^x$ with base e is the most commonly used exponential
function in Calculus because of its nice properties (which we will learn later on).
As a consequence, the natural logarithm ln(x) is the most commonly used logarithmic
function and, by default, ‘logarithm’ refers to ln(x).
Basic properties of logarithms. Let us first state some basic properties of
y = ln(x) and then discuss them with the help of its graph.
• The first property means that if (a, b) is on the graph of $y = e^x$ then (b, a) is on
the graph of y = ln(x) (the two coordinates are flipped). This is true because,
if (a, b) is on the graph of $y = e^x$, this means that $b = e^a$, which means that
a = ln(b), which means that (b, a) is on the graph of y = ln(x).
• The second property is true because $e^a$ takes only positive values, so we can
solve $e^a = x$ for a only if x is positive. This is a good time to mention the
terminology of the domain and range of a function y = f(x). Domain means the
set of all allowed inputs of a function, and range means the set of all possible
outputs. For example, the domain of $e^x$ is the set of all real numbers R = (−∞, ∞),
while the range is the set of all positive numbers (0, ∞). For ln(x), it is exactly
the opposite: the domain is (0, ∞) and the range is R.
• The third property we can see from the graph, but we can also think of it this
way. If x > 0 is small and $e^a = x$, then a = ln(x) must be a large negative number,
because the exponential growth function $e^a$ takes small values x when the input
a is approaching negative infinity.
• The fourth property is clear because $e^0 = 1$ implies that 0 = ln(1).
• The last property has two parts: the first is that ln(x) never stops growing (so it
does not have a horizontal asymptote!) and the second is that it grows slowly.
First, why will ln(x) eventually be equal to any large number we want, for
example, 100? That is because ln(x) = 100 means that $x = e^{100}$ and, in this case,
‘eventually’ simply means at $x = e^{100}$. This also illustrates how slowly the logarithm
grows. Even though it will reach 100, we have to ‘wait’ until $e^{100}$, which is
approximately equal to 26881171418161354484126255515800135873611118.
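To get a feel for how slowly the logarithm grows, one can compute e^k for a few values of k in Python. The computation is in floating point, so only the leading digits of these huge numbers are reliable.

```python
import math

big = math.exp(100)
print(f"{big:.6e}")        # 2.688117e+43
print(math.log(big))       # approximately 100
# How large x must be before ln(x) reaches k
for k in (10, 50, 100):
    print(k, f"{math.exp(k):.3e}")
```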
Exercise 1. What is the domain of y = ln(−x)? What does its graph look like?
Algebraic properties of logarithms. From the definition above, one can derive
several important algebraic properties of the logarithm:
$$\ln(ab) = \ln(a) + \ln(b), \qquad \ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b),$$
$$\ln(a^c) = c \cdot \ln(a), \qquad \ln(e^x) = x, \qquad e^{\ln x} = x.$$
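These identities can be spot-checked numerically in Python. This is a sanity check on arbitrary sample values, not a proof, and the values of a, b, c, x are chosen only for illustration.

```python
import math

a, b, c, x = 3.0, 7.0, 2.5, 1.7   # arbitrary positive sample values
checks = {
    "ln(ab) = ln a + ln b": math.isclose(math.log(a * b), math.log(a) + math.log(b)),
    "ln(a/b) = ln a - ln b": math.isclose(math.log(a / b), math.log(a) - math.log(b)),
    "ln(a^c) = c ln a": math.isclose(math.log(a ** c), c * math.log(a)),
    "ln(e^x) = x": math.isclose(math.log(math.exp(x)), x),
    "e^(ln x) = x": math.isclose(math.exp(math.log(x)), x),
}
for name, ok in checks.items():
    print(name, ok)
```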
Exercise 2. If the room temperature is 20°C and the temperature of a cup of coffee
is 80°C at time t = 0 then the coffee will cool down according to the formula
$T = 20 + 60 \cdot e^{-\kappa t}$ for some constant κ. At what time will the temperature reach
70°C? Your answer may depend on κ.
Logarithmic scales. There are many quantities that are conventionally mea-
sured on logarithmic scales when the original (more physical) measurement can
cover a wide range of values of very different orders of magnitude, from very
small to very large. We will give three examples below.
The Richter magnitude R of an earthquake is defined as
$$R = \log\left(\frac{A}{A_0}\right).$$
This means that if the Richter magnitude R increases by 1, the amplitude A (or the
strength of the earthquake) increases 10 times. For example, a magnitude 9 earthquake
would correspond to $A = 10^6$ mm = 1 km, which means that in practice the
measurements are not as simple as the definition suggests.
$$\text{pH} = \log\left(\frac{1}{H^+}\right) = -\log H^+$$
Example 6. If your earphones can output 110 dB and your friend’s earphones can
output 100 dB, how much more damage can you do to your ears? Is it only 10%
more?
Solution: If p1 is the maximum sound pressure of your earphones and p2 is the
maximum sound pressure of your friend’s earphones, then the above formula gives
$$110 = 20 \cdot \log\left(\frac{p_1}{p_0}\right), \qquad 100 = 20 \cdot \log\left(\frac{p_2}{p_0}\right).$$
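From here we can solve it in two ways. First, we can subtract the two equations
and use properties of the logarithm,
$$110 - 100 = 20 \cdot \log\left(\frac{p_1}{p_0}\right) - 20 \cdot \log\left(\frac{p_2}{p_0}\right) = 20 \cdot \log\left(\frac{p_1}{p_2}\right),$$
which implies that $\log\left(\frac{p_1}{p_2}\right) = \frac{10}{20} = 0.5$, so $\frac{p_1}{p_2} = \sqrt{10} \approx 3.16$. This means that your
earphones are 3.16 times as noisy in terms of sound pressure. Again, it is not just
10% more, because the dB measurement is on the logarithmic scale. Another way
is first to solve the above pressure level equation for p,
$$p = p_0 \cdot 10^{L/20},$$
and then compute $\frac{p_1}{p_2} = \frac{p_0 \cdot 10^{110/20}}{p_0 \cdot 10^{100/20}} = 10^{0.5} \approx 3.16$.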
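The computation of Example 6 can be sketched in Python using the relation L = 20·log(p/p0) from the equations of this example. The helper `pressure_ratio` is a hypothetical name, and p0 cancels out of the ratio.

```python
import math

def pressure_ratio(L1, L2):
    """Ratio p1/p2 of sound pressures for levels L1, L2 in dB, using L = 20*log10(p/p0)."""
    return 10 ** ((L1 - L2) / 20)

r = pressure_ratio(110, 100)
print(round(r, 2), math.isclose(r, math.sqrt(10)))  # 3.16 True
```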
Exercise 6. If the sound pressure level of a jackhammer is 100 dB and of the jet
engine is 140 dB, how much louder is the jet engine in terms of sound pressure?
Answer to Exercise 1. Because we can only plug in positive numbers into loga-
rithm, −x must be positive, so −x > 0, or x < 0. The domain is all negative num-
bers, (−∞, 0). The graph will be the same as ln(x) flipped around the y-axis. It is
always the case that the graph of y = f (−x) is the graph of y = f (x) flipped around the
y-axis.
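$$\frac{1}{100} \ln\left(\frac{N}{100 - N}\right) = \frac{t}{50} + \frac{1}{100} \ln\left(\frac{1}{99}\right) \implies \ln\left(\frac{N}{100 - N}\right) = 2t + \ln\left(\frac{1}{99}\right)$$
$$\implies \frac{N}{100 - N} = \frac{1}{99} e^{2t} \implies 99N = 100e^{2t} - Ne^{2t}$$
$$\implies N(99 + e^{2t}) = 100e^{2t} \implies N = \frac{100e^{2t}}{99 + e^{2t}}.$$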
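As a sanity check on the algebra, the following Python sketch plugs the final formula for N back into the starting equation at a few sample times. The times are arbitrary.

```python
import math

def N(t):
    """Final formula of the derivation: N = 100 e^{2t} / (99 + e^{2t})."""
    e = math.exp(2 * t)
    return 100 * e / (99 + e)

# Check the starting equation ln(N/(100 - N)) = 2t + ln(1/99) at a few times
for t in (0.0, 0.5, 2.0):
    lhs = math.log(N(t) / (100 - N(t)))
    rhs = 2 * t + math.log(1 / 99)
    print(t, round(N(t), 4), math.isclose(lhs, rhs))
```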
The exponential trend line $A = 1.98e^{1.1t}$ looks like a good fit, but how could we
expect this just by looking at data points? One way is to transform the dependent
variable, in this case the area A, into log(A) or ln(A). That is because if $A = A_0 \cdot a^t$
then, taking logarithms,
$$\log(A) = \log(A_0) + t \cdot \log(a),$$
so log(A) is a linear function of t with slope log(a).
t 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
A 2.5 2.6 3.9 5.3 5.9 7.0 9.4 12.1 14.4 17.6
log(A) 0.39 0.42 0.59 0.73 0.77 0.84 0.97 1.08 1.16 1.24
Although base 10 is commonly used in log-scale plots, the same would hold if we
used base e and natural logarithm:
$$\ln(A) = \ln(A_0) + t \cdot \ln(a).$$
Notice that in both cases if the slope on the log-scale is positive, m > 0, then we
have an exponential growth on the original scale and, if the slope on the log-scale
is negative, m < 0, then we have an exponential decay on the original scale.
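The log transform can be sketched in Python: fit a straight line to ln(A) versus t by least squares, then exponentiate the intercept. On the rounded table values this gives roughly A ≈ 1.92e^{1.12t}, close to but not exactly the trend line 1.98e^{1.1t} quoted above. We assume the gap comes from the rounding of the table and possibly from details of how the spreadsheet fits, which we have not checked. The helper `fit_line` is a hypothetical name.

```python
import math

# Rounded table values from the text
t = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
A = [2.5, 2.6, 3.9, 5.3, 5.9, 7.0, 9.4, 12.1, 14.4, 17.6]

def fit_line(xs, ys):
    """Least squares line y = b + m*x; returns (b, m)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return y_bar - m * x_bar, m

# ln(A) = ln(A0) + kappa * t is linear in t, so fit a line to the pairs (t, ln A)
b, kappa = fit_line(t, [math.log(v) for v in A])
A0 = math.exp(b)   # undo the logarithm on the intercept
print(f"A ≈ {A0:.2f} * e^({kappa:.2f} t)")
```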
Example 2. Find y as a function of x if (a) log(y) = 2 + 2x , (b) ln(y) = 1 − 2x. What
is the continuous growth/decay rate in both cases?
We can see that the power function $y = 2.9 \cdot x^{3.03}$ (shown in red) is a good fit, and
it can be found in the same way as above, only selecting Power Series instead of
Exponential under the trend line type.
How can we guess that a power function might be a good fit? This can be done
by transforming both variables x and y into log(x) and log(y). The reason is that
if $y = c \cdot x^{\kappa}$ then, taking logarithms,
$$\log(y) = \log(c) + \kappa \cdot \log(x),$$
so log(y) is a linear function of log(x) with slope κ.
log(x) -1.00 -0.70 -0.52 -0.40 -0.30 -0.22 -0.15 -0.10 -0.05 0.00
log(y) -2.46 -1.68 -1.06 -0.75 -0.42 -0.23 0.02 0.20 0.34 0.46
Although base 10 is commonly used in log-scale plots, the same would hold if we
used base e and natural logarithm:
$$\ln(y) = \ln(c) + \kappa \cdot \ln(x).$$
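In the same spirit, a power function can be estimated with a least squares line through the (log(x), log(y)) pairs in the table above. With the two-decimal table values this gives roughly y ≈ 2.91·x^{2.97}, close to the fitted y = 2.9·x^{3.03}. The small difference presumably reflects the rounding of the table. The helper `fit_line` is a hypothetical name.

```python
# Table values from the text: base-10 logarithms of x and y
log_x = [-1.00, -0.70, -0.52, -0.40, -0.30, -0.22, -0.15, -0.10, -0.05, 0.00]
log_y = [-2.46, -1.68, -1.06, -0.75, -0.42, -0.23, 0.02, 0.20, 0.34, 0.46]

def fit_line(xs, ys):
    """Least squares line y = b + m*x; returns (b, m)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return y_bar - m * x_bar, m

# log(y) = log(c) + k * log(x) is linear in log(x)
b, k = fit_line(log_x, log_y)
c = 10 ** b   # undo the base-10 logarithm on the intercept
print(f"y ≈ {c:.2f} * x^{k:.2f}")
```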
Exercise 6. The figure below is from a New York Times article about bear mar-
kets.3 On the figure it says that “the vertical scale is adjusted so that percentage
changes are comparable”. What is the vertical scale, and why does it make per-
centage changes comparable?
3 https://www.nytimes.com/2022/06/13/business/bear-market-timeline-stocks.html
• Many quantities take values on vastly different scales. For example, for several
decades the Dow Jones was below or around 100, while in the last two decades
it was around or above 10000. In addition to examples from the last section
(strength of earthquakes, acidity or alkalinity of a solution in water, sound pres-
sure), other examples include mass and size of plants, or luminosities of stars4 .
In the first figure above we can see that, if we plot values of different magnitude
on the same graph, we can barely distinguish the values that are relatively small.
For example, we can barely see what is going on with the Dow Jones between
1900 and 1980. Logarithm grows slowly, so it can make very small and very
large values comparable to each other.
• As we discussed in this section, log-scale transforms an exponential function
into a linear one. If we observe a linear trend on the log-scale, it suggests an
exponential trend on the original scale. Although the Dow Jones index has large
fluctuations, we can see that long term it follows a roughly linear trend on the
log-scale.
• As we saw in the last exercise, log-scale makes relative changes comparable.
For example, when we look at the Dow Jones on the original scale, we might
think that the financial crisis of 2007–2008 was the worst stock market crash,
while on the log-scale we can clearly see that the Wall Street Crash of 1929 was
much worse.
• The same considerations apply to the log-log-scale, except that it turns a power
trend into a linear one.
Answer to Exercise 1. The exponential trend line is $A = 1.44e^{1.33t}$.
Answer to Exercise 6. This is a log-scale. In other words, the y-axis reflects the
logarithm of the S&P 500 price P(t). If the stock market changed by 100 · r%
between time t and t + ∆t then the price P(t + ∆t) = P(t) · (1 + r). After taking
logarithms, we get log P(t + ∆t) − log P(t) = log(1 + r), and since the graph shows
y(t) = log P(t), this means that y(t + ∆t) − y(t) = log(1 + r). In other words, the
same increment on the log-scale corresponds to the same percentage change. That
is why log-scale makes percentage changes comparable no matter what the actual
value is.
4 https://en.wikipedia.org/wiki/Hertzsprung-Russell_diagram
for some constants a, b, c and d. Let us break this down into steps. At the same time,
you can visualize each step by dragging the sliders in this app: www.geogebra.org/m/csuqnyhc.
Example 2. If we shift the graph of y = f (x) up by 1 and to the left by 2, and then
stretch it both horizontally and vertically 2 times, what function will this graph
correspond to?
Solution: y = 1 + f(x) shifts up by 1, then y = 1 + f(x + 2) shifts to the left by 2,
then y = 2(1 + f(x + 2)) stretches vertically 2 times and, finally, $y = 2\left(1 + f\left(\frac{x}{2} + 2\right)\right)$
stretches horizontally 2 times (shrinking by a factor of 1/2 is stretching by a factor
of 2). Notice: compared to Example 1 we simply changed the order, but we got a
very different answer.
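The claim that the order of transformations matters can be checked numerically in Python. The helper names below are hypothetical, f = exp is an arbitrary sample function, and h uses one alternative order chosen for illustration (not necessarily the order of Example 1).

```python
import math

# Hypothetical helpers: each returns a new function built from f
def shift_up(f, k):
    return lambda x: f(x) + k

def shift_left(f, k):
    return lambda x: f(x + k)

def stretch_vertically(f, s):
    return lambda x: s * f(x)

def stretch_horizontally(f, s):
    return lambda x: f(x / s)

f = math.exp  # arbitrary sample function, used only for the check

# Example 2 order: up 1, left 2, vertical stretch 2, horizontal stretch 2
g = stretch_horizontally(stretch_vertically(shift_left(shift_up(f, 1), 2), 2), 2)

def formula(x):
    return 2 * (1 + f(x / 2 + 2))

# The same four steps in another order: stretches first, then the shifts
h = shift_left(shift_up(stretch_horizontally(stretch_vertically(f, 2), 2), 1), 2)

for x in (-1.0, 0.0, 1.5):
    print(math.isclose(g(x), formula(x)), math.isclose(g(x), h(x)))  # True False
```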
Exercise 2. If we shift the graph of y = f (x) down by 1 and to the right by 7, and
then stretch it vertically 2 times and shrink it horizontally 3 times, what function
will this graph correspond to?
Example 3. If the graph of y = f (x) is given
by the solid green curve, what is the function
whose graph is given by the dashed blue curve?
Hint: Use the grey dotted curve as a guideline.
Solution: The dashed curve looks like the dotted
curve shifted by 2 to the right, and the dotted curve
looks like the solid curve shrunk vertically by a
factor of 2. So we shrink first, y = 0.5 f (x), and
then shift to the right, y = 0.5 f (x − 2).
Sine and cosine. Let us recall the graphs of functions sin(x) and cos(x), and
recall some of their basic properties.
• Both fluctuate between −1 and +1. We say
that their amplitude is equal to 1.
• Both are periodic functions with the period
2π. This means that sin(x + 2π) = sin(x) and
cos(x + 2π) = cos(x), and the graphs of these
functions repeat the same pattern every 2π.
• Many natural phenomena are (approxi-
mately) periodic, for example, daylight
hours, average monthly temperatures, ocean
tides, circadian rhythms, heartbeat, etc. Cosine and sine can be used to describe
(or model) some of them.
Just like sine and cosine, a function y = f (x) is said to be periodic with the
period T if f (x + T ) = f (x) for all x, in which case the graph of this function
repeats every T (units of x).
Compared with the above general transformation $y = b + a f\big(c(x - d)\big)$, when
dealing with sin(x) and cos(x) we will write this transformation as
$$y = b + a \cdot \cos\left(\frac{2\pi}{T}(x - d)\right)$$
by replacing the constant c with 2π/T, for some constant T > 0 (and, similarly, for
sin(x)). This means that we shrink the graph of cos(x) horizontally by a factor 2π/T,
which is the same as stretching it by a factor T/(2π), which means that the period 2π
becomes T. In other words, we express the horizontal stretch factor in terms of the
period T, which has an important practical meaning and which is often easier to
see visually. Let us look at the graph of this function.
• Vertical stretch factor a is called ampli-
tude.
• Vertical shift b is also called average, since
the function fluctuates around this level.
• Horizontal shift d is also called phase shift.
• T is the period.
• 2π/T is called angular frequency and 1/T is called frequency.
Example 4. What is the amplitude, average, phase shift, period, and frequency of
$y = 2\big(0.5 + 1.5\cos(3x + 1)\big)$?
Solution: If we rewrite it as $y = 1 + 3\cos\big(3(x + \frac{1}{3})\big)$, we see that the amplitude is 3,
the average is 1, and the phase shift is $-\frac{1}{3}$ (minus because $+\frac{1}{3}$ means that we shift to the
left). Since $\frac{2\pi}{T} = 3$, the period is $T = \frac{2\pi}{3}$, and the frequency is $\frac{1}{T} = \frac{3}{2\pi}$.
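The bookkeeping in Example 4 can be automated. The following is a minimal Python sketch (the function name `cosine_parameters` is hypothetical) that assumes the function has already been written as $y = b + a\cos(cx + e)$ with a > 0 and c > 0; rewriting cx + e as c(x − d) gives the phase shift d = −e/c.

```python
import math

def cosine_parameters(a, b, c, e):
    """Parameters of y = b + a*cos(c*x + e), assuming a > 0 and c > 0.

    Writing c*x + e = c*(x - d) gives the phase shift d = -e/c,
    and c = 2*pi/T gives the period T = 2*pi/c.
    """
    period = 2 * math.pi / c
    return {
        "amplitude": a,
        "average": b,
        "phase_shift": -e / c,
        "period": period,
        "frequency": 1 / period,
    }

# Example 4: y = 2(0.5 + 1.5 cos(3x + 1)) = 1 + 3 cos(3x + 1)
params = cosine_parameters(a=3, b=1, c=3, e=1)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

Multiplying out the outer factor 2 first (to get 1 + 3cos(3x + 1)) is the step that the sketch leaves to the reader.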
Exercise 4. What is the amplitude, average, phase shift, period, and frequency of
$y = -3 + 4\cos\big(\frac{x-2}{3}\big)$?
Solution: We see that the minimum is 0.2 and maximum is 2.2, so the average
is 1.2. Then the amplitude is the difference between the maximum and average,
2.2 − 1.2 = 1. The function looks like cosine shifted to the right by 0.4, so phase
shift is 0.4. We can also see that the period is 1, for example by looking at the
difference between consecutive peaks 1.4 − 0.4 = 1. Therefore, the function is y =
1.2 + cos(2π(x − 0.4)).
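The rule used in this solution (average = midpoint of maximum and minimum, amplitude = maximum minus average) can be checked on sampled values. This is a sketch, assuming the samples cover at least one full period; here we sample the model found above, so we should recover average 1.2 and amplitude 1.

```python
import math

def average_and_amplitude(samples):
    """Estimate average and amplitude of a sinusoid from sampled y-values.

    average = (max + min) / 2 and amplitude = max - average;
    this assumes the samples cover at least one full period.
    """
    y_max, y_min = max(samples), min(samples)
    average = (y_max + y_min) / 2
    return average, y_max - average

# Sample y = 1.2 + cos(2*pi*(x - 0.4)) on one period, 0 <= x <= 1
xs = [i / 1000 for i in range(1001)]
ys = [1.2 + math.cos(2 * math.pi * (x - 0.4)) for x in xs]
avg, amp = average_and_amplitude(ys)
print(round(avg, 3), round(amp, 3))
```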
Varying average and amplitude. The figure above⁵ depicts the monthly number of
sunspots over 400 years of sunspot observations, which follow an approximately
11-year solar cycle. We can think of T = 11 as the period of this cycle, but the
amplitude and average vary over time. Let us look at a couple of simple
examples of functions of this form:
$$y = b(x) + a(x) \cdot \cos\Big(\frac{2\pi}{T}(x - d)\Big),$$
where b = b(x) is the average and a = a(x) is the amplitude, both varying with x.
Example 6. Write down a possible model for the function below on the left.
Solution: The distance between consecutive peaks appears to be about 10, so we
can take the period of one cycle to be T = 10. The curve is bounded above by the
line that passes through (0, 0) and (90, 60), so its slope is $\frac{60}{90} = \frac{2}{3}$ and the line is
$y = \frac{2x}{3}$. The curve is bounded below by the x-axis y = 0. That means that the average
(line through the middle) is $b(x) = \frac{x}{3}$, and the amplitude is $a(x) = \frac{2x}{3} - \frac{x}{3} = \frac{x}{3}$. As
a result, we can guess that the curve is
$$y = \frac{x}{3} + \frac{x}{3}\cos\Big(\frac{2\pi}{10}x\Big).$$
It appears that the first
peak is shifted from zero, so there might be a phase shift, but this is just an artifact
of scaling cosine by a varying amplitude which is 0 at x = 0. The peaks away from
zero appear to be near 10, 20, 30, etc., so there is no phase shift.
5 Robert A. Rohde, commons.wikimedia.org/wiki/File:Sunspot_Numbers.png
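As a quick sanity check of the model in Example 6, the sketch below (the name `sunspot_like_model` is hypothetical) evaluates $y = \frac{x}{3} + \frac{x}{3}\cos(\frac{2\pi}{10}x)$ and compares it with the upper line $y = \frac{2x}{3}$. The curve should stay between 0 and $\frac{2x}{3}$, touching the upper line at x = 10, 20, 30, … and the x-axis at x = 5, 15, 25, ….

```python
import math

def sunspot_like_model(x, period=10.0):
    """Example 6 model: average b(x) = x/3, amplitude a(x) = x/3, no phase shift."""
    average = x / 3
    amplitude = x / 3
    return average + amplitude * math.cos(2 * math.pi * x / period)

# Compare the model with the upper bounding line y = 2x/3
for x in [0, 5, 10, 25, 30, 90]:
    print(x, round(sunspot_like_model(x), 3), round(2 * x / 3, 3))
```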
Exercise 6. Write down a possible model for the function above on the right.
Exercise 7. The diameter of a Ferris Wheel is 30 meters, and at the lowest point it
is 5 meters above the ground. It takes the Ferris Wheel 2 minutes to complete one
revolution. What is the height y = h(t) of a rider starting at time t = 0 at the highest
point?
$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_{n-1} x^{n-1} + a_n x^n$$
Exercise 1. For the polynomials (c) and (d) in the above figure, determine the
smallest possible degree n, whether degree n is even or odd, and the sign of the
coefficient an .
for some constant c. If we also know the value of p(x) at some point x other than
the roots, we can use it to determine c.
Example 2. Determine the cubic polynomial
(a) in the figure.
Solution: We can see from the graph that the
roots are −1, 1 and 3, so p(x) = c(x + 1)(x −
1)(x − 3). We can also see from the graph that
p(0) = 6, so 6 = c(0 + 1)(0 − 1)(0 − 3) = 3c,
and c = 2, so p(x) = 2(x + 1)(x − 1)(x − 3).
If we need to, we can multiply this out to get
$p(x) = 6 - 2x - 6x^2 + 2x^3$.
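The steps of Example 2 (build c(x + 1)(x − 1)(x − 3) from the roots, use p(0) = 6 to find c, then multiply out) can be sketched in a few lines of Python. The helpers `poly_from_roots` and `evaluate` are hypothetical names; coefficients are stored lowest degree first, $[a_0, a_1, \dots, a_n]$.

```python
def poly_from_roots(roots):
    """Coefficients [a0, a1, ..., an] of (x - r1)(x - r2)...(x - rn), lowest degree first."""
    coeffs = [1]
    for r in roots:
        # Multiply the current polynomial p(x) by (x - r) = x*p(x) - r*p(x).
        shifted = [0] + coeffs                      # x * p(x)
        scaled = [-r * c for c in coeffs] + [0]     # -r * p(x)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def evaluate(coeffs, x):
    """Value of a0 + a1*x + ... + an*x^n."""
    return sum(a * x**k for k, a in enumerate(coeffs))

base = poly_from_roots([-1, 1, 3])   # (x + 1)(x - 1)(x - 3)
c = 6 / evaluate(base, 0)            # the graph gives p(0) = 6
p = [c * a for a in base]
print(c, p)                          # p lists 6, -2, -6, 2, i.e. 6 - 2x - 6x^2 + 2x^3
```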
Exercise 2. Determine the cubic polynomial
(b) in the above figure.
Example 3. In the figure, the graphs of two
polynomials are visible in a limited region.
The dashed red line is a cubic polynomial with
negative leading coefficient, $a_3 < 0$. How
many zeros does this polynomial have, and
what can we say about their location?
Solution: We can see two roots, $x_1 = -3$ and
$x_2 = +3$. Since $a_3 < 0$ and the leading term
$a_3 x^3$ dominates, the polynomial must go to −∞
when x goes to +∞, so the graph must start
decreasing eventually. The graph can change
direction at most 2 times (a polynomial of degree n has at most n − 1 turning points), which means that it must start decreasing somewhere on
the right of the observed region, and so there will be another root $x_3 > 7$.
Exercise 3. In the above figure, solid blue line is a quartic polynomial (of degree
4) with negative leading coefficient, a4 < 0. How many zeros does this polynomial
have, and what can we say about their location?
Quadratic polynomials. Polynomials of
degree 2, $y = ax^2 + bx + c$, appear frequently
in Calculus, so let us recall their basic properties. Their graphs are given by parabolas,
which open upwards when a > 0 and open
downwards when a < 0. If the discriminant
$D = b^2 - 4ac$ is nonnegative, D ≥ 0, then this
polynomial has roots
$$x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}.$$
If the discriminant is equal to zero, D = 0, then the two roots coincide. If D < 0
then there are no roots and the parabola lies entirely above or below the x-axis. The
extreme point of the parabola (minimum or maximum) is called the vertex and
its coordinates are $x = -\frac{b}{2a}$ and $y = c - \frac{b^2}{4a}$. Later we will learn how to find this
extreme point by setting the derivative to zero (since the slope of the parabola is
zero at that point), but one can also find it using simple algebra, by completing the
square.
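The formulas above translate directly into code. The sketch below (the name `quadratic_roots_and_vertex` is hypothetical) uses the quadratic formula and the vertex formula rather than completing the square, and assumes a ≠ 0; it returns an empty list of roots when D < 0.

```python
import math

def quadratic_roots_and_vertex(a, b, c):
    """Real roots (sorted) and vertex of y = a x^2 + b x + c, assuming a != 0."""
    D = b * b - 4 * a * c                         # discriminant
    vertex = (-b / (2 * a), c - b * b / (4 * a))
    if D < 0:
        return [], vertex                         # no real roots
    sqrt_D = math.sqrt(D)
    # A set removes the duplicate root when D = 0.
    roots = sorted({(-b - sqrt_D) / (2 * a), (-b + sqrt_D) / (2 * a)})
    return roots, vertex

# y = 2x^2 - 2x - 4, the parabola of Example 4 below
print(quadratic_roots_and_vertex(2, -2, -4))
```

For y = 2x² − 2x − 4 this gives roots −1 and 2 and vertex (0.5, −4.5), which matches what completing the square produces.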
Example 4. Find the roots (if they exist) and vertex of $y = 2x^2 - 2x - 4$ by completing the square.
Solution: First factor out the leading coefficient, $y = 2(x^2 - x - 2)$. Next, to complete
the square for $x^2 - x - 2$, we want to create something that looks like $(x \pm r)^2 =
x^2 \pm 2rx + r^2$. In this case, we want −x to look like −2rx, so we must take r = 0.5.
Then we add and subtract $r^2 = 0.5^2$ and rewrite
$$x^2 - x - 2 = (x^2 - x + 0.25) - 0.25 - 2 = (x - 0.5)^2 - 2.25.$$
This finishes completing the square: $y = 2(x - 0.5)^2 - 4.5$. The vertex is at the point
(0.5, −4.5) since the parabola opens upwards and will have a minimum when the
term $(x - 0.5)^2$ is as small as possible, i.e. when x = 0.5. Another way to see this
is by noticing that $y = 2(x - 0.5)^2 - 4.5$ is obtained from $y = x^2$ by stretching it
vertically 2 times, then shifting to the right by 0.5 and down by 4.5. We can also
find the roots once we have completed the square:
$$2(x - 0.5)^2 - 4.5 = 0 \iff (x - 0.5)^2 = 2.25 \iff x - 0.5 = \pm 1.5,$$
so the roots are x = −1 and x = 2.
$$y = \frac{p(x)}{q(x)}$$
where both the numerator p(x) and denominator q(x) are polynomials. Such functions are undefined whenever q(x) = 0, so the roots of the denominator are not in
the domain. Also, these roots are often vertical asymptotes, for example, when the
numerator p(x) ≠ 0 at the same point, so the function will approach −∞ or +∞
as the variable x approaches the root of q(x). Rational functions can sometimes
have horizontal asymptotes as x approaches −∞ and +∞, and we will see in the
examples how these asymptotes can be determined.
$$\frac{2x^2 - 4x - 96}{x^2 - x - 30} = \frac{\frac{2x^2 - 4x - 96}{x^2}}{\frac{x^2 - x - 30}{x^2}} = \frac{2 - \frac{4}{x} - \frac{96}{x^2}}{1 - \frac{1}{x} - \frac{30}{x^2}} \to \frac{2 - 0 - 0}{1 - 0 - 0} = 2$$
The calculation of the horizontal asymptote in the above example can be used
to show the following.
• If the degrees of p(x) and q(x) are equal then the horizontal asymptote is the
ratio of their leading coefficients.
• If the degree of p(x) is smaller than the degree of q(x) then the horizontal asymp-
tote is y = 0.
• If the degree of p(x) is bigger than the degree of q(x) then there is no horizontal
asymptote at infinity.
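The three rules above depend only on the degrees and leading coefficients, so they fit in a short function. This is a sketch with a hypothetical name, assuming the polynomials are given as coefficient lists (lowest degree first) whose last entries are nonzero.

```python
def horizontal_asymptote(p, q):
    """Horizontal asymptote at infinity of p(x)/q(x).

    p and q are coefficient lists, lowest degree first, with nonzero last entries.
    Returns the asymptote value, or None if there is no horizontal asymptote.
    """
    deg_p, deg_q = len(p) - 1, len(q) - 1
    if deg_p < deg_q:
        return 0.0             # denominator grows faster
    if deg_p == deg_q:
        return p[-1] / q[-1]   # ratio of leading coefficients
    return None                # numerator grows faster

# (2x^2 - 4x - 96) / (x^2 - x - 30)
print(horizontal_asymptote([-96, -4, 2], [-30, -1, 1]))
```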
Exercise 6. Match the dashed green curve to one of the rational functions in the
above exercise.
Answer to Exercise 1. (c) Degree n must be odd, because p(x) goes to +∞
when x goes to +∞ and to −∞ when x goes to −∞. Coefficient $a_n$ must be positive,
again, because $a_n x^n$ dominates and becomes positive when x goes to +∞. The graph
does not change direction, so the smallest possible degree is n = 1, but because the
graph is not linear, the smallest possible degree is n = 3.
(d) Degree n must be even, because p(x) goes to +∞ both when x goes to +∞
and when x goes to −∞. Coefficient $a_n$ must be positive, because $a_n x^n$ dominates and becomes
positive when x goes to +∞. The graph changes direction 2 times, so the smallest
possible degree is n = 2, but because the graph does not look like a parabola, the
smallest possible degree is n = 4.
Answer to Exercise 3. We can see one root, $x_1 = 6$. Since $a_4 < 0$ and the leading
term $a_4 x^4$ dominates, the polynomial must go to −∞ when x goes to −∞, so the graph
must start decreasing eventually on the left. The graph can change direction at most
3 times (a quartic has at most 3 turning points), and from what we can see there will be one more root somewhere to the
left of −6, $x_2 < -6$.
Answer to Exercise 4. $y = 2(x + 1)^2 - 8$. Roots are −3 and 1. Vertex is (−1, −8).
Answer to Exercise 6. $y = \frac{x}{(x+2)(x-3)}$. The vertical asymptotes are the same, so the
denominator should be the same. The horizontal asymptote is zero, so the degree of the
numerator should be smaller, and our only choice given was x. Another way to see
this is to notice that the function changes sign as we cross x = 0. If the numerator
were $x^2$, it would not change sign at x = 0, so the behaviour would be like in the
previous example, where the blue curve does not change sign at x = 0.
$$y = e^x \iff x = \ln(y).$$
words, if we can solve the equation b = f(x) for unknown x and the solution is
unique, it is denoted $x = f^{-1}(b)$. Suppose that such a solution $x = f^{-1}(b)$ is unique
for every point b in the range of f. If so, the function f is called invertible and
$x = f^{-1}(y)$ is called the inverse function of y = f(x). Notice how the output y of
f becomes the input of its inverse $f^{-1}$ and the input x of f becomes the output of
$f^{-1}$. This means that:
• The domain of f is the range of $f^{-1}$ and the range of f is the domain of $f^{-1}$.
Range and domain switch when taking an inverse.
To summarize in plain English, a function y = f (x) is invertible if, for any possible
output y in the range of f , we can determine exactly what the input x was.
Warning! The superscript −1 in $f^{-1}$ is just notation for the inverse, and it does
not mean the reciprocal $\frac{1}{f}$. For example, in Example 1 above, $f(x) = \sqrt{x} + 1$ and
$f^{-1}(y) = (y - 1)^2$, not $\frac{1}{\sqrt{x} + 1}$.
Exercise 2. After taking 100 mg of aspirin, the amount of aspirin in a patient's body
is $m = f(t) = 100 \cdot (0.5)^{t/20}$ mg, where time t is measured in minutes. A therapeutic
effect of aspirin becomes negligible after 4 half-lives, so we only consider this
function on the domain between t = 0 and 4 half-lives. Is this function invertible?
What are the domain and range of $f^{-1}$? What is the meaning of f(60), and what are
the units of 60? What is the meaning of $f^{-1}(25)$, and what are the units of 25?
Horizontal line test. If we are given the graph of a function y = f (x), we can
see that f is invertible if every horizontal line y = b intersects the graph not more
than once. If it does not intersect then b is not in the range of f . If it intersects
more than once, such b is the output f (x) of more than one input x, so we cannot
determine f −1 (b), and f is not invertible.
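A computer cannot check every horizontal line, but for a function on an interval we can at least test strict monotonicity on a fine grid, which is the most common reason the horizontal line test is passed. The sketch below (hypothetical name) is only a heuristic: a grid can suggest, but never prove, that f is strictly monotone.

```python
def passes_horizontal_line_test_on_grid(f, a, b, n=10_000):
    """Heuristic: are the sampled values of f strictly monotone on [a, b]?

    Strict monotonicity on [a, b] implies the horizontal line test,
    but checking finitely many points can only suggest it, not prove it.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    diffs = [y2 - y1 for y1, y2 in zip(ys, ys[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)

print(passes_horizontal_line_test_on_grid(lambda x: x**2, -2, 2))  # False
print(passes_horizontal_line_test_on_grid(lambda x: x**2, 0, 2))   # True
```

Restricting $y = x^2$ from [−2, 2] to [0, 2] makes it invertible, which is the same idea as restricting the domain of sine or tangent.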
The domain of arcsine is [−1, 1] and the range is $[-\frac{\pi}{2}, \frac{\pi}{2}]$. As we can see here, the
domain is very important when deciding whether the function is invertible.
Cancelling inverses. The above example shows that if f is invertible then the
inverses cancel each other,
$$f^{-1}\big(f(x)\big) = x \quad \text{and} \quad f\big(f^{-1}(y)\big) = y,$$
but x must be in the domain of f and y must be in the range of f that appear in the
definition of the inverse $f^{-1}$.
Monotonic functions. The most common
reason for a function to be invertible is that it
is monotone, which means that it is strictly increasing or strictly decreasing, as in the above
examples. A couple of points to keep in mind.
• The function (a) in the figure is increasing,
but not strictly increasing, because it is equal
to 0.2 on the interval 0 ≤ x ≤ 0.4. It does not
pass the horizontal line test and is not invertible, so the 'strictly' part is important.
pass the horizontal line test. However, if we limit the domain to $(-\frac{\pi}{2}, \frac{\pi}{2})$, tangent is
strictly increasing there, so we can define the inverse function, called $x = \tan^{-1}(y)$
or $x = \arctan(y)$. If we plot it on the same x-y plane, i.e. we plot $y = \tan^{-1}(x)$,
we can see that, indeed, the graph is a mirror image of the original graph across
the diagonal. The domain of $\tan^{-1}(x)$ is the entire real line, −∞ < x < ∞, and the
range is $(-\frac{\pi}{2}, \frac{\pi}{2})$. Since y = tan(x) has vertical asymptotes at $x = \frac{\pi}{2}$ and $x = -\frac{\pi}{2}$,
$y = \tan^{-1}(x)$ has a horizontal asymptote $y = \frac{\pi}{2}$ as x approaches +∞ and a horizontal
asymptote $y = -\frac{\pi}{2}$ as x approaches −∞. These functions will arise naturally in
applications, but they are also good examples to keep in mind whenever we need a
function with vertical or horizontal asymptotes.
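Python's `math.atan` computes the inverse of tangent restricted to $(-\frac{\pi}{2}, \frac{\pi}{2})$, so it can be used to see both the horizontal asymptotes and the caveat about cancelling inverses. A small sketch:

```python
import math

# atan(x) approaches pi/2 as x -> +inf and -pi/2 as x -> -inf.
for x in [1, 10, 1_000, 1_000_000]:
    print(x, math.atan(x), math.atan(-x))
print("pi/2 =", math.pi / 2)

# tan(atan(y)) = y for every y, but atan(tan(x)) = x only for -pi/2 < x < pi/2.
print(math.tan(math.atan(5.0)))   # approximately 5.0
print(math.atan(math.tan(3.0)))   # approximately 3.0 - pi, not 3.0
```

The last line returns about −0.1416 because 3.0 lies outside the restricted domain, and tan(3.0) = tan(3.0 − π).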
Answer to Exercise 2. The function f(t) is an exponential decay with a half-life
of 20 minutes, so the domain is [0, 80] and the range is [6.25, 100], because f(0) =
100 and f(80) = 6.25. It is invertible and, in fact, we can find the inverse $f^{-1}$
explicitly by solving $m = 100 \cdot (0.5)^{t/20}$ for t:
$$t = \frac{20 \ln\frac{m}{100}}{\ln 0.5} = \frac{20 \ln\frac{100}{m}}{\ln 2}.$$
The domain of the inverse is [6.25, 100] and the range is [0, 80]. The meaning of f(60) is the
amount of aspirin (in mg) left in the body after 60 minutes. The meaning of $f^{-1}(25)$ is the
number of minutes until there is only 25 mg left in a patient's body.
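The function and its inverse from this answer can be coded directly, which also makes the units visible (minutes in, mg out for f; mg in, minutes out for $f^{-1}$). A sketch with hypothetical function names, valid on the restricted domain 0 ≤ t ≤ 80:

```python
import math

def aspirin(t):
    """Amount of aspirin (mg) after t minutes: f(t) = 100 * 0.5**(t/20), for 0 <= t <= 80."""
    return 100 * 0.5 ** (t / 20)

def aspirin_inverse(m):
    """Minutes until m mg remain: t = 20 ln(100/m) / ln 2, for 6.25 <= m <= 100."""
    return 20 * math.log(100 / m) / math.log(2)

print(aspirin(60))                     # mg left after 60 minutes
print(aspirin_inverse(25))             # minutes until 25 mg are left
print(aspirin_inverse(aspirin(37.0)))  # cancelling inverses: recovers 37 up to rounding
```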
$$\lim_{x \to +\infty} f(x) = 2.$$
Before we summarize all the definitions and notation, let us demonstrate them on
a specific example.
Example 1. For the function in the figure:
• What are the left and right limits of f(x)
at x = 0? Does the limit of f(x) at x = 0
exist? Is the function continuous at x = 0?
• What are the left and right limits of f(x) at x = −2? Does the limit of f(x) at
x = −2 exist? Is the function continuous at x = −2?
Solution: We see that as x approaches 0 from the right side (i.e. x > 0 is getting
close to 0), the value of the function f(x) approaches 8. In this case we say: the
right limit of f(x) at x = 0 is equal to 8, which is expressed using mathematical
notation as:
$$\lim_{x \to 0^+} f(x) = 8.$$
Here, the notation $x \to 0^+$ means that x goes to 0 from the right. Similarly, we see that
as x approaches 0 from the left side (i.e. x < 0 is getting close to 0), the value of
the function f(x) also approaches 8. In this case we say: the left limit of f(x) at
x = 0 is equal to 8, and we write:
$$\lim_{x \to 0^-} f(x) = 8.$$
Here, the notation $x \to 0^-$ means that x goes to 0 from the left. When the function
approaches the same value from both sides, we say that the limit exists and, in this
case, we write:
$$\lim_{x \to 0} f(x) = 8.$$
Here, the notation $x \to 0$ means that x goes to 0 from both sides. From the graph we
can see that f(0) = 5, indicated by the solid dot at (0, 5), so the limit of f(x) at
x = 0 is not equal to f(0),
$$\lim_{x \to 0} f(x) = 8 \neq 5 = f(0).$$
In this case, we say that the function f(x) is discontinuous (or not continuous) at
x = 0.
In the second case of x = 2, we see that
and we see that f(2) = 6, indicated by the solid dot at (2, 6). Since the function
approaches different values from the left and right sides, we say that the limit does
not exist. In this case, we again say that the function f(x) is discontinuous at x = 2.
In the third case of x = 3, we see that
$$\lim_{x \to 3} f(x) = 2,$$
because the function approaches the same value 2 from both sides, so the limit
exists and is equal to 2. However, the function f (x) is undefined at x = 3, because
the white open dot at (3, 2) indicates that the value of f (3) is not 2, and there is
no solid dot anywhere on the line x = 3, which is a way to indicate that x = 3 is
not in the domain of f . In this case, again, the function f (x) is discontinuous at
x = 3 simply because we cannot compare the limit with any value f (3). Whenever
a point x = a is not in the domain of f (x), the function cannot be continuous at that
point.
Finally, in the last case of x = −2, we see that
$$\lim_{x \to -2} f(x) = 6,$$
because the function approaches the same value 6 from both sides, so the limit
exists and is equal to 6. The value of the function at that point is f (−2) = 6, so
• What are the left and right limits of f(x) at x = 2? Does the limit of f(x) at x = 2 exist?
Is the function continuous at x = 2?
• What are the left and right limits of f(x) at x = −1? Does the limit of f(x) at
x = −1 exist? Is the function continuous at x = −1?
Solution: $f(x) = \frac{1}{x-2}$ is continuous for all −1 ≤ x ≤ 1, because we divide by zero
only when x = 2, which is not on the interval [−1, 1]. $g(x) = x + \frac{2}{x}$ is not continuous
for all −1 ≤ x ≤ 1, because we divide by 0 when x = 0, so 0 is not in the domain
of this function.
Exercise 2. Are $f(x) = \frac{1}{2x-1}$ and $g(x) = \frac{1}{\cos(x)}$ continuous for all 0 ≤ x ≤ 1?
3 3
13x − x , x < 0
13x − x ,
x<0
• g(x) = x, 0≤x<1 • q(x) = κ, 0≤x<1
−2x + 3, 1 ≤ x −2x + 3, 1≤x
The list notation means that each function is defined by different formulas on three
different intervals. For example, if we want to find f (3), we see that 2 < 3, so x = 3
belongs to the last interval and f (3) = −2 · 3 + 8 = 2.
Example 3. Find where the function f (x) is discontinuous and explain why. For
which value of the constant κ will the function p(x) be continuous?
Solution: On each interval the function is a polynomial, so it is continuous. We
need to check if the function value jumps when the interval changes. The first time the
interval changes is at x = −2. When x is approaching −2 from the left, where x < −2,
the function is $-\frac{x^2}{2} + 8$, so it approaches $-\frac{(-2)^2}{2} + 8 = 6$. We can write this using
formulas instead of words:
$$\lim_{x \to -2^-} f(x) = \lim_{x \to -2^-} \Big(-\frac{x^2}{2} + 8\Big) = -\frac{(-2)^2}{2} + 8 = 6.$$
When x is approaching −2 from the right, x now belongs to the second interval
−2 < x ≤ 2 where the function is constant, 6, so it approaches 6:
$$\lim_{x \to -2^+} f(x) = \lim_{x \to -2^+} 6 = 6.$$
The left and right limits are the same, 6, which means that the limit exists and is
equal to 6, and the function at x = −2 is also $f(-2) = -\frac{(-2)^2}{2} + 8 = 6$ (we are using
the first formula because x = −2 belongs to the first interval x ≤ −2), so the function is
continuous at x = −2.
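The second time the interval changes is at x = 2. Again, let us compute the left and
right limits using the corresponding formulas from the list:
$$\lim_{x \to 2^-} f(x) = \lim_{x \to 2^-} 6 = 6, \qquad \lim_{x \to 2^+} f(x) = \lim_{x \to 2^+} (-2x + 8) = -2 \cdot 2 + 8 = 4.$$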
Since the two limits are not equal, the limit does not exist, and the function is
discontinuous at x = 2. Everywhere else the function is continuous.
To find the constant κ which ensures that p(x) is continuous at x = 2, let us
compute the left and right limits using the corresponding formulas from the list:
The limits must be equal, so we must have that 6 = κ · 2 + 8. Solving for κ we get
that κ = −1. For κ = −1 the limit exists and is equal to 6, and p(2) = 6,
so now the function p(x) is also continuous at x = 2.
Exercise 3. Find where the function g(x) is discontinuous and explain why. For
which value of the constant κ will the function q(x) be continuous?
Example 4. The price of mailing a letter by Canada Post in 2022 is (where ‘up to’
means ‘including’):
• $1.07 up to 30 g,
• $1.30 over 30 g up to 50 g,
• $1.94 over 50 g up to 100 g,
• $3.19 over 100 g up to 200 g,
• $4.44 over 200 g up to 300 g,
• $5.09 over 300 g up to 400 g,
• $5.47 over 400 g up to 500 g.
At which points is the price as a function of weight discontinuous? What are the
right and left limits at those points?
Solution: The domain of the price p = p(w) of a letter as a function of weight is
0 < w ≤ 500. Beyond that weight it is no longer considered a letter. The first jump
(discontinuity) is at 30g, where the left limit is $1.07 and the right limit is $1.30.
Other discontinuities are similar, at 50g, 100g, etc.
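A step function like this price is easy to code with Python's standard `bisect` module, and evaluating it just below and just above each weight limit shows the left and right limits at every jump. This is a sketch; the names `LIMITS`, `PRICES` and `price` are hypothetical, and the table is copied from Example 4.

```python
import bisect

# Upper weight limits (g) and prices ($) from Example 4; "up to" includes the limit.
LIMITS = [30, 50, 100, 200, 300, 400, 500]
PRICES = [1.07, 1.30, 1.94, 3.19, 4.44, 5.09, 5.47]

def price(w):
    """Price of a letter of weight w grams, for 0 < w <= 500."""
    if not 0 < w <= 500:
        raise ValueError("not a letter: weight must satisfy 0 < w <= 500")
    # bisect_left returns the index of the first limit >= w, so w = 30 gets $1.07.
    return PRICES[bisect.bisect_left(LIMITS, w)]

# Approximate the left and right limits at each jump by weights just below and above.
eps = 1e-9
for w in LIMITS[:-1]:
    print(f"{w} g: left {price(w - eps):.2f}, right {price(w + eps):.2f}")
```

Using `bisect_left` (rather than `bisect_right`) is what makes each limit belong to the cheaper bracket, matching "up to" meaning "including".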
Exercise 4. 50 mg of a drug is injected into a patient at a constant rate for 12
seconds. After that the plasma concentration of the drug decreases exponentially
with a half-life of 10 minutes. Express the quantity q = q(t) of the drug in the
patient’s body as a continuous function of time t, measured in minutes. Make sure
it is continuous at 12 seconds.
Example 5. What are the left and right limits of $f(x) = \frac{|x|}{x}$ at x = 0? Does the limit
at x = 0 exist? Is the function continuous at this point?
Solution: If x < 0 then $f(x) = \frac{|x|}{x} = -1$. For example, if x = −1 then $\frac{|-1|}{-1} = -1$.
Similarly, if x > 0 then $f(x) = \frac{|x|}{x} = 1$. This means that the left limit $\lim_{x \to 0^-} f(x) =
-1$ and the right limit $\lim_{x \to 0^+} f(x) = 1$. We can conclude that the limit $\lim_{x \to 0} f(x)$
does not exist, and the function is not continuous. Also, x = 0 is not in the domain
since we cannot divide by zero, so the function cannot be continuous even if the
limit existed.
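Evaluating $\frac{|x|}{x}$ at points closer and closer to 0 from each side illustrates the two one-sided limits numerically. A small sketch (the name `abs_over_x` is hypothetical); note that calling it at x = 0 itself would raise `ZeroDivisionError`, just as 0 is not in the domain.

```python
def abs_over_x(x):
    """|x| / x; undefined (ZeroDivisionError) at x = 0."""
    return abs(x) / x

for h in [0.1, 0.001, 1e-8]:
    print(f"f(-{h}) = {abs_over_x(-h)},  f({h}) = {abs_over_x(h)}")
```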
Exercise 5. What are the left and right limits of $f(x) = \frac{|x-2|}{x-2}$ at x = 2? Does the
limit at x = 2 exist? Is the function continuous at this point?
Example 6. Consider the rational function $f(x) = \frac{x^2 - 3x + 2}{x^2 + x - 2}$. Compute $\lim_{x \to 1} f(x)$. Is
the function continuous at x = 1?
Solution: At x = 1 both the numerator and denominator are zero, so the function
is undefined at x = 1, which means it cannot be continuous whether or not the limit
exists. Finding the roots of the quadratic polynomials, we can write
$$\frac{x^2 - 3x + 2}{x^2 + x - 2} = \frac{(x - 1)(x - 2)}{(x - 1)(x + 2)}.$$
When we take the limit x → 1, we consider x that approach 1 but are not equal to
1, so x − 1 ≠ 0 and we can cancel it:
$$\lim_{x \to 1} f(x) = \lim_{x \to 1} \frac{x - 2}{x + 2} = \frac{1 - 2}{1 + 2} = -\frac{1}{3}.$$
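To double-check the cancellation in Example 6 numerically, we can evaluate the original rational function at points near (but not equal to) 1 and compare with the simplified formula $\frac{x-2}{x+2}$ at x = 1. This is a sketch; `f` would raise `ZeroDivisionError` at exactly x = 1, which reflects that 1 is not in its domain.

```python
def f(x):
    return (x**2 - 3*x + 2) / (x**2 + x - 2)   # undefined at x = 1

def simplified(x):
    return (x - 2) / (x + 2)   # after cancelling the common factor (x - 1)

for h in [0.1, 0.01, 0.0001]:
    print(h, f(1 - h), f(1 + h), simplified(1))
```

Both columns of values of f approach −1/3 ≈ −0.3333 from either side.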
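In the next two problems we will use the following two functions.
$$f(x) = \begin{cases} -\frac{x^2}{2} + 8, & 0 \le x \le 2, \\ -2x + 8, & 2 < x \le 4, \end{cases} \qquad g(x) = \begin{cases} \sin(x), & 0 \le x \le \frac{\pi}{2}, \\ x - \frac{1}{2}, & \frac{\pi}{2} < x \le 2. \end{cases}$$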
Example 8. Can we apply the IVT to the function f(x) on the interval [0, 4]? If not,
find a value y between f(0) and f(4) such that y = f(x) has no solution x ∈ [0, 4].
Solution: We can check that $\lim_{x \to 2^-} f(x) = 6$ and $\lim_{x \to 2^+} f(x) = 4$, so the function is not
continuous at x = 2, and the IVT cannot be applied. On the first interval [0, 2], the
parabola $-\frac{x^2}{2} + 8$ is decreasing, and on the second interval [2, 4], the linear function
−2x + 8 is also decreasing, so when the function f(x) jumps from 6 to 4 as we
cross x = 2, it skips all the values in between. This means that, for example, f(x) = 5 has no
solutions for 0 ≤ x ≤ 4. The graph of this function is in Example 1 above.
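The IVT is the idea behind the bisection method (a standard numerical technique, not described in these notes): if a continuous g changes sign on [a, b], repeatedly halving the interval while keeping a sign change traps a root. The sketch below (hypothetical names) first solves f(x) = 7 on [0, 2], where f is continuous, and then tries f(x) = 5 on [0, 4]. There the sign change is still present, but because f is not continuous the method converges to the jump at x = 2, where f is about 4, not 5.

```python
def bisect_root(g, a, b, tol=1e-12):
    """Find x in [a, b] with g(x) = 0, assuming g is continuous and g(a), g(b) differ in sign.

    The IVT guarantees a root; each step keeps the half-interval with a sign change.
    """
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        gm = g(m)
        if ga * gm <= 0:
            b = m            # sign change in [a, m]
        else:
            a, ga = m, gm    # sign change in [m, b]
    return (a + b) / 2

def f_piecewise(x):
    """f(x) from the two-function display above, for 0 <= x <= 4."""
    return -x**2 / 2 + 8 if x <= 2 else -2 * x + 8

# f is continuous on [0, 2] with f(0) = 8 and f(2) = 6, so f(x) = 7 has a solution.
x = bisect_root(lambda t: f_piecewise(t) - 7, 0, 2)
print(x)                                   # close to sqrt(2)

# On [0, 4] f is not continuous: the "root" found is the jump x = 2, where f is not 5.
x_jump = bisect_root(lambda t: f_piecewise(t) - 5, 0, 4)
print(x_jump, f_piecewise(x_jump))
```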
Exercise 8. Can we apply the IVT to the function g(x) on the interval [0, 2]? If not,
find a value y between g(0) and g(2) such that y = g(x) has no solution x ∈ [0, 2].
$$\lim_{x \to \infty} \frac{2x^2 - 4x - 96}{x^2 - x - 30} = \lim_{x \to \infty} \frac{2 - \frac{4}{x} - \frac{96}{x^2}}{1 - \frac{1}{x} - \frac{30}{x^2}} = \frac{2 - 0 - 0}{1 - 0 - 0} = 2,$$
and the main idea in this calculation was to take the leading term in the denominator
(here $x^2$) and divide both the numerator and denominator by it. The reason why this
idea worked was because we could easily compare which function grows faster, x
or $x^2$, by dividing and cancelling. For example, $\frac{x}{x^2} = \frac{1}{x} \to 0$ means that $x^2$ grows
faster than x.
To generalize this idea, let us compare how various functions grow at infinity.
We will say that a function g(x) grows faster than f(x) when x goes to infinity, or
f(x) grows slower than g(x), if
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0.$$
We can also express this more compactly by saying that g(x) dominates f(x) and
write $f(x) \ll g(x)$ as x → ∞.
how to compare the growth of basic functions, we can consider more complicated
examples.
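Example 9. Compute the limit $\lim_{x \to +\infty} \frac{7^x + x^2}{5 \cdot 7^x + \ln(x)}$.
Solution: The fastest growing function in the denominator is $7^x$ so, if we divide
both the numerator and denominator by it, we get that
$$\lim_{x \to +\infty} \frac{7^x + x^2}{5 \cdot 7^x + \ln(x)} = \lim_{x \to +\infty} \frac{\frac{7^x + x^2}{7^x}}{\frac{5 \cdot 7^x + \ln(x)}{7^x}} = \lim_{x \to +\infty} \frac{1 + \frac{x^2}{7^x}}{5 + \frac{\ln(x)}{7^x}} = \frac{1 + 0}{5 + 0} = 0.2.$$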
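A numerical table is not a proof, but it is a useful check on a limit computed by dividing by the dominant term. The sketch below evaluates the ratio from Example 9 for increasing x; the values should settle at 0.2. (For much larger x, `7**x` would overflow when converted to a float, so we stay at moderate values.)

```python
import math

def ratio(x):
    return (7**x + x**2) / (5 * 7**x + math.log(x))

for x in [1, 5, 10, 20, 40]:
    print(x, ratio(x))
```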
Exercise 9. Compute the limit $\lim_{x \to +\infty} \frac{\ln(x) + \sqrt{x}}{x - \sqrt{x}}$.
Example 10. For which values of κ > 0 does the limit $\lim_{x \to +\infty} \frac{7^x + x^2}{e^{\kappa x} + \ln(x)}$ exist?
Solution: In words, the fastest growing term in the numerator is $7^x$ and in the denominator it is $e^{\kappa x}$, so the limit will exist if the denominator grows at least as fast,
that is, if $e^{\kappa} \ge 7$ or κ ≥ ln(7). To make this explanation more precise, let us divide both
the numerator and denominator by $e^{\kappa x}$,
$$\lim_{x \to +\infty} \frac{7^x + x^2}{e^{\kappa x} + \ln(x)} = \lim_{x \to +\infty} \frac{\frac{7^x}{e^{\kappa x}} + \frac{x^2}{e^{\kappa x}}}{1 + \frac{\ln(x)}{e^{\kappa x}}} = \lim_{x \to +\infty} \frac{\big(\frac{7}{e^{\kappa}}\big)^x + 0}{1 + 0} = \lim_{x \to +\infty} \Big(\frac{7}{e^{\kappa}}\Big)^x.$$
This limit can exist only if $\frac{7}{e^{\kappa}} \le 1$; otherwise, it will grow exponentially. If $\frac{7}{e^{\kappa}} =
1$ then the limit is 1, and if $\frac{7}{e^{\kappa}} < 1$ then the limit is zero, because it will decay
exponentially.
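The three cases of Example 10 can be seen numerically by evaluating the ratio for one κ below ln(7), κ = ln(7) itself, and one κ above it. This sketch should show values that blow up for κ = 1.5, stay near 1 for κ = ln(7) ≈ 1.9459, and decay towards 0 for κ = 2.

```python
import math

def ratio(x, kappa):
    return (7**x + x**2) / (math.exp(kappa * x) + math.log(x))

for kappa in [1.5, math.log(7), 2.0]:
    print(f"kappa = {kappa:.4f}:", [ratio(x, kappa) for x in (10, 50, 100)])
```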
Exercise 10. For which values of κ > 0 does the limit $\lim_{x \to +\infty} \frac{x^{\kappa} + x^2}{2x^2 - \ln(x)}$ exist?
Some famous limits. We conclude this section with a list of a few famous limits
that, in particular, will be useful when we study the derivatives of exponential and
trigonometric functions:
$$\lim_{n \to \infty} \Big(1 + \frac{x}{n}\Big)^n = e^x, \qquad \lim_{x \to 0} \frac{e^x - 1}{x} = 1, \qquad \lim_{x \to 0} \frac{\sin x}{x} = 1, \qquad \lim_{x \to 0} \frac{\cos x - 1}{x} = 0.$$
There is no need to memorize these limits at this point but, if you have time, you
can learn more about them in the videos in the footnotes.⁶ The first limit is, in fact,
the definition of Euler's number e and the exponential function $e^x$. The second limit
is a consequence of the first one, and it will be used to compute the derivative
of $e^x$. The last two limits are relatively easy consequences of the definition of sine
and cosine, and will also be used to compute the derivatives of sin x and cos x.
6 https://youtu.be/sbLWLvSfvwk, https://youtu.be/IX1cZHz-bc0, https://youtu.be/dLXal60n3JQ.
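The four famous limits can be checked numerically with Python's `math` module. This is only a plausibility check, not a proof: the first loop should approach e ≈ 2.71828 (here with x = 1), and the three columns in the second loop should approach 1, 1 and 0.

```python
import math

x = 1.0
for n in [10, 1_000, 100_000]:
    print(n, (1 + x / n) ** n, math.exp(x))

for h in [0.1, 0.001, 1e-5]:
    print(h, (math.exp(h) - 1) / h, math.sin(h) / h, (math.cos(h) - 1) / h)
```

For much smaller h, floating-point cancellation in `math.exp(h) - 1` starts to dominate; `math.expm1(h)` computes $e^h - 1$ accurately in that regime.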
Answer to Exercise 10. κ ≤ 2. If κ = 2 then the limit is 1, if κ < 2 then the limit
is 0.5.
Chapter 2
Derivatives
$$y = b + m \cdot x$$
$$m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}.$$
Notice that for a linear function the ratio $\frac{\Delta y}{\Delta x}$ is the same constant m no matter what
the two points $(x_1, y_1)$ and $(x_2, y_2)$ are and, when we learn a general definition of
the derivative, we will see that, for a linear function, the slope m also happens to
be its derivative.
How can we think about this quantity m? If the input x changes by Δx and the
output of our linear function changes by Δy, the ratio $\frac{\Delta y}{\Delta x}$ tells us how the output
changes relative to the input, so it has the meaning of the rate of change of y with
respect to x. For example:
• In the car example above, if the time changes by Δt = 0.2 hours then the distance
changes by Δd = 12 km, and the ratio $\frac{12}{0.2} = 60$ km/h is the change of distance
relative to time, better known as speed.
• In the photosynthesis example, if the time changes by Δt = 0.2 hours then the
amount of glucose and oxygen changes by Δd = 6 µmol, and the ratio $\frac{6}{0.2} = 30$
µmol/h is the change of glucose and oxygen relative to time, which is the rate
of photosynthesis.
Let us rephrase the same thing in a different way. What does it mean that the
derivative (or slope) of a linear function is equal to m? It means that:
The derivative 60 km/h means that, for example, between time 0.1 and 0.3 hours,
the distance will change by 60 · 0.2 = 12 km. The derivative 30 µmol/h means that
between time 0.1 and 0.3 hours, the amount of glucose and oxygen produced will
change by 30 · 0.2 = 6 µmol.
Derivative of a general function. Next, let us look at an informal definition of
the derivative for a general function y = f (x).
If the function is not linear then its slope may be constantly changing and the
ratio $\frac{\Delta y}{\Delta x}$ can depend on the points $(x_1, y_1)$ and $(x_2, y_2)$. However, if we zoom in very
close to a particular point (a, f(a)) on the graph of the function (figure above on
the right), the graph looks almost linear and we can draw a so-called tangent line
that passes through the same point (a, f(a)) and has the same slope at that point.
Of course, this zooming in procedure is a very informal definition of this slope, and
we will make it more formal and precise later on, but for now:
• the slope of the tangent line at the point (a, f(a)) is called the derivative of the
function y = f(x) at the point x = a and it is denoted f′(a).
By analogy with the linear functions, what does it mean that the derivative of a
function y = f (x) at a point x = a is equal to f ′ (a)? It means that:
By contrast with linear functions, here the change of y is only approximately
equal (≈) to f′(a)Δx, not exactly, and only if Δx is small. Of course, we can rephrase
the above statement slightly depending on the setting of the problem. For example,
for the sake of clarity we will always select some specific small increment ∆x.
Exercise 1. Suppose that the average monthly sales S at a bakery, in dollars, are a
function S = f (A) of its monthly spending on advertisement A, also in dollars. If
f ′ (100) = 4.5, what are the units and practical meaning of this derivative?
Example 2. In the figure¹ and table below we see the average finish time in the 2009
New York marathon by age group and sex. Let T = f(A) be the average finish time
for men of age A. In each age group, take the middle age (for example, in the 45-49
age group the middle age is 47) and suppose that the average finish time for men
of that age is the same as for the group. For example, f(47) = 4 h 13 min, ignoring
seconds. Estimate the derivative f′(52), give its units and describe its meaning.
The table of values is:
A (years)   22    27    32    37    42    47    52    57    62    67
T (h:min)   4:12  4:06  4:08  4:10  4:09  4:13  4:22  4:36  4:47  5:12
1 https://www.runtri.com/2010/11/new-york-city-marathon-average-finish.html
Indeed, the slope of this line is f ′ (a) and, if we plug in x = a into this formula, the
second term becomes zero and we get y = f (a), so the line passes through the point
(a, f (a)). We can also write this equation in terms of the increments ∆y = y − f (a)
and ∆x = x − a: ∆y = f ′ (a) · ∆x.
2 https://www.cleanmpg.com//community/index.php?media/35360/full
Example 3. We will later learn that the derivative of the function $y = \sqrt{x}$ at any
positive value x > 0 is equal to $\frac{1}{2\sqrt{x}}$. What is its tangent line at x = 121? Using the
tangent line, approximate $\sqrt{132}$ and compare with the actual value.
Solution: The function at x = 121 is $\sqrt{121} = 11$ and the derivative at x = 121 is
$\frac{1}{2\sqrt{121}} = \frac{1}{22}$, so the tangent line is $y = 11 + \frac{1}{22}(x - 121)$. When x = 132, the tangent
line gives $y = 11 + \frac{11}{22} = 11.5$. The actual value is $\sqrt{132} = 11.4891\ldots$, and we
see that the function and the tangent line are quite close in this case even when the
increment Δx = 132 − 121 = 11 is not very small.
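The tangent line $y = f(a) + f'(a)(x - a)$ used in Example 3 can be built as a function in Python. This is a sketch with a hypothetical helper `tangent_line`; the derivative is passed in by hand (here $\frac{1}{2\sqrt{x}}$, as stated in the example), since computing derivatives comes later.

```python
import math

def tangent_line(f, df, a):
    """Return the tangent line L(x) = f(a) + f'(a)(x - a) as a function."""
    fa, slope = f(a), df(a)
    return lambda x: fa + slope * (x - a)

L = tangent_line(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 121)
print(L(132), math.sqrt(132))   # tangent-line estimate vs. actual value
```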
Exercise 3. We will later learn that the derivative of the function y = cos x is equal
to −sin(x). What is its tangent line at $x = \frac{\pi}{4}$? Using the tangent line, approximate
$\cos(\frac{\pi}{5})$ and compare with the actual value. Recall that $\cos(\frac{\pi}{4}) = \sin(\frac{\pi}{4}) = \frac{1}{\sqrt{2}}$.
Estimating derivatives from tables. In the two examples above about New
York marathon and Porsche Taycan we actually estimated the derivative using
nearby values before interpreting its practical meaning. Right now we will do a
couple of similar examples, but we will spell out a bit more explicitly how we can
estimate the derivatives in such cases.
Example 4. In the table below we see the average finish time (in hours) in 2009
New York marathon by age group among women.
A 22 27 32 37 42 47 52 57 62 67
T 4.62 4.53 4.6 4.67 4.65 4.75 4.88 5.15 5.47 5.57
If T = f (A) is the average finish time for women of age A, estimate f ′ (22), f ′ (67)
and f ′ (52).
Solution: In the figure on the right, we
plot the data points from the table and
a dotted curve that interpolates smoothly
between those points and could hypothetically represent the graph of the function
T = f(A). If we knew this function, we
could find the slope of the tangent line
at any age A, which would give us the
derivative f′(A). The problem is that we
do not know this function, so we have
to use the values given in the table. Normally, if we could zoom in on the actual function, we could estimate the slope of
the tangent line by the ratio $\frac{\Delta y}{\Delta x}$ using nearby points. Right now, the closest points
are the neighbouring points in the table, so we will use those values.
For example, to estimate the derivative f′(22), we can use the points (22, 4.62)
and (27, 4.53), for which Δx = 27 − 22 = 5 years and Δy = 4.53 − 4.62 = −0.09
hours. (Notice that we subtract the values in the same order.) As a result, $\frac{-0.09}{5} =
-0.018$ hour/year is our estimate for the derivative f′(22). We could also change
the units from hours to minutes, using that 1 hour = 60 minutes, to get −0.018
hour/year = −60 · 0.018 min/year = −1.08 min/year.
Similarly, we can estimate f ′ (67) using the points (62, 5.47) and (67, 5.57), in
which case we get 5.57−5.47
67−62 = 0.02 hour/year = 1.2 min/year.
Finally, to estimate $f'(52)$, we have several choices. We can use the point to the right to estimate $f'(52) \approx \frac{5.15 - 4.88}{57 - 52} = 0.054$ hour/year $= 3.24$ min/year. We can use the point to the left to estimate $f'(52) \approx \frac{4.75 - 4.88}{47 - 52} = 0.026$ hour/year $= 1.56$ min/year. Or we can take the average of the two estimates, which is $\frac{3.24 + 1.56}{2} = 2.4$ min/year. Taking the average is not strictly necessary, but it often gives a better estimate. When the increments $\Delta x$ to the left and right of our point are the same, averaging the two values is the same as computing the slope between the two neighbouring points, in this case $\frac{5.15 - 4.75}{57 - 47} = 0.04$ hour/year $= 2.4$ min/year.
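The three kinds of estimates used above (a right neighbour, a left neighbour, and the slope between both neighbours) can be sketched in Python as follows. This is only an illustration: the names `AGES`, `TIMES`, `slope` and `estimate_derivative` are hypothetical, and the data is simply the table from Example 4.

```python
# Data from Example 4: age group A (years) and average finish time T (hours)
AGES = [22, 27, 32, 37, 42, 47, 52, 57, 62, 67]
TIMES = [4.62, 4.53, 4.60, 4.67, 4.65, 4.75, 4.88, 5.15, 5.47, 5.57]


def slope(i, j):
    """Slope (hour/year) of the line through table points i and j."""
    return (TIMES[j] - TIMES[i]) / (AGES[j] - AGES[i])


def estimate_derivative(i):
    """Estimate f'(AGES[i]) from the table.

    At the two ends only one neighbour exists, so a one-sided slope is used;
    in the interior the slope between both neighbours is used, which (for
    equally spaced ages) equals the average of the left and right slopes.
    """
    if i == 0:
        return slope(0, 1)
    if i == len(AGES) - 1:
        return slope(i - 1, i)
    return slope(i - 1, i + 1)


for age in (22, 67, 52):
    d = estimate_derivative(AGES.index(age))
    # multiplying by 60 converts hour/year into min/year
    print(f"f'({age}) ~ {d:.3f} hour/year = {60 * d:.2f} min/year")
```

The same `slope` helper also reproduces the one-sided estimates at age 52, for example `60 * slope(6, 7)` for the point to the right.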
Exercise 4. The table below shows the energy economy E, in miles/kWh, at various speeds S, in mph, for the 2022 Audi GT RS. If E = f(S), estimate the derivatives $f'(50)$, $f'(60)$, and $f'(75)$.
Speed S (mph):           50    55    60    65    70    75
Economy E (miles/kWh):  3.65  3.40  3.30  2.95  2.70  2.30
• If the function is not continuous at a point x = a, then the derivative $f'(a)$ does not exist, as in the figure on the left at x = 0, where the function has a jump.
• If the function has a corner (also called a kink) at x = a, then the derivative $f'(a)$ does not exist, as in the case of y = |x| in the middle figure, because the slope on the right of x = 0 is different from the slope on the left of x = 0.
• A less common example is in the figure on the right, where the function $y = x \sin(1/x)$ keeps oscillating between the two lines y = x and y = −x as it approaches x = 0, so, again, there is no tangent line at x = 0.
Exercise. Draw two examples of graphs where derivatives do not exist, and give one example of a formula y = f(x) for which a derivative does not exist. Specify at which points x the derivative is not defined and explain why.
Derivatives of inverse functions. Suppose that a function y = f(x) is invertible, so $x = f^{-1}(y)$. In the figure on the right, we flipped the figure at the beginning of this section across the diagonal, so now it shows the inverse function $x = f^{-1}(y)$. Notice that the x-axis and y-axis switched, b = f(a) is now on the horizontal axis, and the slope of the tangent line at y = b is the derivative of the inverse function $f^{-1}$ at the point y = b, which is $(f^{-1})'(b)$. We will study how to calculate derivatives later on, including derivatives of inverse functions, but for now let us practice interpreting the meaning of this derivative $(f^{-1})'(b)$. Since the roles of x and y switch for the inverse function, we can say that $(f^{-1})'(b) \approx \Delta x/\Delta y$ near y = b: it tells us approximately how much the input x changes per unit increase of the output y.
Example 6. Let T = f(A) be the average finish time for women of age A in the 2009 New York Marathon, which appeared in Examples 2 and 4 above. Up to about age 40 this function is not monotone, so it is not invertible, but if we restrict the domain to ages 40 and above, then it looks increasing and invertible. Estimate the derivative $(f^{-1})'(4.88)$ and give its units.
Solution: Since $A = f^{-1}(T)$, the units of the derivative are the units of $\Delta A/\Delta T$, so year/hour. From the table we see that f(52) = 4.88, so $f^{-1}(4.88) = 52$, and we can use the nearby points (4.75, 47) and (5.15, 57) to estimate the slope. Notice how we changed the roles of A and T and now write the average time T first, because it is the input of the inverse function. Since the increments between those two points are $\Delta A = 57 - 47 = 10$ years and $\Delta T = 5.15 - 4.75 = 0.4$ hours, our estimate of the derivative is
$$(f^{-1})'(4.88) \approx \frac{\Delta A}{\Delta T} = \frac{10}{0.4} = 25 \text{ year/hour}.$$
In Example 4 we found that $f'(52) \approx 0.04$ hour/year, and 25 is exactly its reciprocal, $1/0.04 = 25$. This is not surprising, because for the inverse function we used $\Delta A/\Delta T$ instead of $\Delta T/\Delta A$ for the original function T = f(A). Remember this example when studying later on how to compute derivatives of inverse functions, which will be based on the formula
$$b = f(a) \implies (f^{-1})'(b) = \frac{1}{f'(a)}.$$
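A quick numerical check of this reciprocal relationship, using the same two table points, might look like the sketch below. The variable names are illustrative only; the numbers come from the table in Example 4.

```python
# Two points from the table in Example 4, written as (age A, time T)
A_LO, T_LO = 47, 4.75
A_HI, T_HI = 57, 5.15

dA = A_HI - A_LO          # increment of age, in years
dT = T_HI - T_LO          # increment of time, in hours

f_prime = dT / dA          # estimate of f'(52), in hour/year
inv_prime = dA / dT        # estimate of (f^-1)'(4.88), in year/hour

print(f"f'(52) ~ {f_prime:.3f} hour/year")
print(f"(f^-1)'(4.88) ~ {inv_prime:.1f} year/hour")
# the two estimates use the same increments, so they are reciprocals
print(f"product = {f_prime * inv_prime:.6f}")
```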
Exercise 6. Let E = f(S) be the energy economy function from Exercise 4 above. Estimate the derivative $(f^{-1})'(3.30)$ and give its units. How does it relate to $f'(60)$ from Exercise 4?
Answer to Exercise 2. It looks like the slope of the line connecting the values at 55 mph and 65 mph would be a good approximation for the slope of the function at 60 mph. Since $\Delta S = 10$ mph and $\Delta E = 3.36 - 4.02 = -0.66$ miles/kWh, the slope of this line is $\Delta E/\Delta S = -0.066$ (miles/kWh)/mph. Since mph is miles/hour, we can cancel miles in the units and use h/kWh as the units of this derivative. However, when interpreting its meaning we will keep using the increments of the original variables S and E, which have units mph and miles/kWh, so from this point of view there is no need to simplify the units. Again, the above calculation was only an approximation of the derivative, but if indeed $f'(60) = -0.066$ (miles/kWh)/mph, then its meaning is the following: driving the car at 61 mph decreases the energy economy by approximately 0.066 miles/kWh compared to driving it at 60 mph.
Secant lines and average rate of change. When we estimated the slope $f'(a)$ of the tangent line, we used the ratio $\Delta y/\Delta x$ of the increments of the input and output of our function y = f(x), but we said that this approximation works well only if we zoom in, which means that the two points should be pretty close to each other. It turns out that this ratio $\Delta y/\Delta x$ has an important meaning and a special name even if the two points are not close to each other.
In the figure on the right we pick two points (a, f(a)) and (b, f(b)) on the graph of the function y = f(x) and draw a line through those points. This line is called a secant line, and its slope
$$\frac{\Delta y}{\Delta x} = \frac{f(b) - f(a)}{b - a}$$
is called the average rate of change of f on the interval [a, b].
Example 1. What is the average rate of change of cos(x) on the interval [0, π]?
Solution: Because cos(0) = 1 and cos(π) = −1, the average rate of change equals
$$\frac{\cos(\pi) - \cos(0)}{\pi - 0} = \frac{-1 - 1}{\pi - 0} = -\frac{2}{\pi} = -0.6366\ldots.$$
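The average rate of change is just the slope of a secant line, so it takes one line of code. The sketch below (the helper name is hypothetical) reproduces the value in Example 1.

```python
import math


def average_rate_of_change(f, a, b):
    """Slope of the secant line through (a, f(a)) and (b, f(b))."""
    return (f(b) - f(a)) / (b - a)


r = average_rate_of_change(math.cos, 0, math.pi)
print(r, -2 / math.pi)   # both are -0.6366...
```

The same helper accepts any function of one variable, for instance `math.exp` on an interval [a, a + 1].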
Exercise 1. What is the average rate of change of $e^x$ on the interval [a, a + 1]?
Example 2. Draw a graph of any concave down function on some interval [a, b] and compare the derivatives $f'(a)$, $f'(b)$ at the endpoints and the average rate of change $\frac{f(b) - f(a)}{b - a}$.
Solution: One example of a graph of a concave down function is in the figure on the right. We can see that, as we move left to right, the slope of the tangent line is getting smaller and smaller. This means that $f'(a) > f'(b)$, and the average rate of change is somewhere in between:
$$f'(a) > \frac{f(b) - f(a)}{b - a} > f'(b).$$
$$f'(a) = \lim_{b \to a} \frac{f(b) - f(a)}{b - a}.$$
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
This is an algebraic definition of the derivative that translates the above geometric
definition into formulas.
For example, what is the derivative of a constant function y = f (x) = c? Since
its graph is a horizontal line, the slope is equal to 0 everywhere, so f ′ (a) = 0. Now
we can also see this using the algebraic definition,
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} 0 = 0.$$
$$\lim_{h \to 0} \frac{e^{2+h} - e^2}{h} = \lim_{h \to 0} \frac{e^2 e^h - e^2}{h} = \lim_{h \to 0} \frac{e^2 (e^h - 1)}{h} = e^2 \cdot \lim_{h \to 0} \frac{e^h - 1}{h}.$$
In the last step, we took the factor $e^2$ outside of the limit, because it is just a constant that does not depend on h. Here, we practiced using the definition of the derivative and simplified it a little, but we will come back to the last limit in a moment.
Exercise 3. If we know that f (1) = 0, write down and simplify the definition of
the derivative of y = f (cos(x)) at x = 0.
$$\frac{(1 + h)^2 - 1^2}{h} = \frac{(1 + 2h + h^2) - 1}{h} = \frac{2h + h^2}{h} = 2 + h.$$
When h gets small, this slope approaches 2 because $2 + h \to 2 + 0 = 2$, so the derivative of $y = x^2$ at x = 1 is 2.
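The same computation can be checked numerically: the sketch below evaluates the difference quotient for a few shrinking values of h, including a negative one. The helper names are hypothetical; only the formula comes from the text.

```python
def difference_quotient(f, a, h):
    """Slope (f(a + h) - f(a)) / h of the secant line from a to a + h."""
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


# For f(x) = x^2 at a = 1 the quotient simplifies to 2 + h, so it tends to 2.
for h in (0.1, 0.01, 0.001, -0.001):
    print(f"h = {h:>7}: {difference_quotient(square, 1, h):.6f}")
```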
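Solution: We will use a special trick of multiplying and dividing by the so-called conjugate $\sqrt{9 + h} + \sqrt{9}$, and then use the identity $(a - b)(a + b) = a^2 - b^2$ to simplify the numerator, $(\sqrt{9 + h} - \sqrt{9})(\sqrt{9 + h} + \sqrt{9}) = (\sqrt{9 + h})^2 - (\sqrt{9})^2 = (9 + h) - 9 = h$:
$$\frac{\sqrt{9 + h} - \sqrt{9}}{h} = \frac{\sqrt{9 + h} - \sqrt{9}}{h} \cdot \frac{\sqrt{9 + h} + \sqrt{9}}{\sqrt{9 + h} + \sqrt{9}} = \frac{h}{h(\sqrt{9 + h} + \sqrt{9})} = \frac{1}{\sqrt{9 + h} + \sqrt{9}}.$$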
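Both sides of the conjugate identity can be evaluated side by side. The sketch below (hypothetical function names) shows that they agree and approach 1/6 = 0.1666... as h shrinks. As a side observation about computer arithmetic, not something the text relies on: for very tiny h the original form subtracts two nearly equal floating-point numbers and may lose digits, while the rationalized form avoids that subtraction.

```python
import math


def naive_quotient(h):
    """(sqrt(9 + h) - sqrt(9)) / h, computed exactly as written."""
    return (math.sqrt(9 + h) - math.sqrt(9)) / h


def conjugate_quotient(h):
    """The same expression after multiplying by the conjugate."""
    return 1 / (math.sqrt(9 + h) + math.sqrt(9))


for h in (1e-2, 1e-6, 1e-12):
    print(f"h = {h:g}: naive {naive_quotient(h):.12f}, "
          f"conjugate {conjugate_quotient(h):.12f}")
```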
Exercise 5. Write down and compute the derivative of $y = \frac{1}{x}$ at x = 3.
Derivative as a function. If we can compute the derivative f ′ (a) for all points
x = a where the derivative exists, then we can think of the derivative as a new
function y = f ′ (x). In the examples and exercises above, instead of choosing some
specific value of a to compute the derivative f ′ (a), such as a = 2, 1, 9 or 3, we
could have chosen an arbitrary x and the same calculations would have given us
f ′ (x). Let us see how this works on a couple of examples.
$$(e^x)' = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} \frac{e^x e^h - e^x}{h} = \lim_{h \to 0} \frac{e^x (e^h - 1)}{h} = e^x \cdot \lim_{h \to 0} \frac{e^h - 1}{h}.$$
Let us plug in smaller and smaller values of h to see what number $\frac{e^h - 1}{h}$ approaches. For example, $\frac{e^{0.001} - 1}{0.001} = 1.0005\ldots$, $\frac{e^{0.0001} - 1}{0.0001} = 1.00005\ldots$, $\frac{e^{0.00001} - 1}{0.00001} = 1.000005\ldots$. We can see that this gets closer and closer to 1, so
$$\lim_{h \to 0} \frac{e^h - 1}{h} = 1$$
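These values are easy to reproduce. The sketch below prints the quotient for the same three values of h, once computed literally and once with Python's `math.expm1`, a standard-library function designed to compute $e^h - 1$ accurately even when h is tiny. The function names are hypothetical.

```python
import math


def exp_quotient_direct(h):
    """(e^h - 1) / h computed literally."""
    return (math.exp(h) - 1) / h


def exp_quotient(h):
    """(e^h - 1) / h using math.expm1 for the numerator."""
    return math.expm1(h) / h


for h in (1e-3, 1e-4, 1e-5):
    print(f"h = {h:g}: {exp_quotient_direct(h):.9f}  {exp_quotient(h):.9f}")
```

Both columns agree to many digits here and approach 1, in line with the limit above.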
and this shows what we wanted, $(e^x)' = e^x$. From now on we no longer need to calculate the derivative of $e^x$ at a specific point a, since we have a formula that works for all x.
Comment. Although we could see using a calculator that the above limit $\lim_{h \to 0} \frac{e^h - 1}{h}$ was equal to 1, the truth is that Euler's number e = 2.718281828... is actually chosen in such a way that this limit is 1. If you recall, this limit was mentioned at the end of Section 1.9 as one of the famous limits. You can learn more in the footnote links³ about where Euler's number e comes from and how its definition implies the above limit. In Chapter 1, we also mentioned that e is a very special base of an exponential function, and what makes it special is exactly that the derivative of the function $y = e^x$ is the function $e^x$ itself.
3 https://youtu.be/sbLWLvSfvwk, https://youtu.be/IX1cZHz-bc0.
$$(x^2)' = 2x, \quad (x^3)' = 3x^2, \quad (\sqrt{x})' = \frac{1}{2\sqrt{x}} \quad \text{and} \quad \left(\frac{1}{x}\right)' = -\frac{1}{x^2}.$$
All the functions in Exercise 6 are power functions of the form $y = x^n$ for $n = 2, 3, \frac{1}{2}$ and $-1$, and all the derivatives are given by the following power rule:
$$(x^n)' = n x^{n-1}.$$
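One way to gain confidence in the power rule before it is proved in general is to compare it with a numerical slope. The sketch below uses the slope between the two neighbouring points x − h and x + h (a symmetric difference quotient, chosen for this illustration) at the sample point x = 2; the helper name and the step size are assumptions.

```python
def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h), an estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 2.0
for n in (2, 3, 0.5, -1):
    numeric = central_difference(lambda x: x ** n, x0)
    power_rule = n * x0 ** (n - 1)
    print(f"n = {n}: numeric {numeric:.8f}, power rule {power_rule:.8f}")
```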
Example 7. Using the power rule, compute the derivative of $\sqrt{x}$. What is the domain of this derivative function?
Solution: Using the power rule with $n = \frac{1}{2}$,
$$(\sqrt{x})' = \left(x^{1/2}\right)' = \frac{1}{2} x^{\frac{1}{2} - 1} = \frac{1}{2} x^{-\frac{1}{2}} = \frac{1}{2\sqrt{x}}.$$
Of course, the power rule applies only where the function and the derivative are
well defined, so the domain of this derivative is x > 0.
Exercise 7. Using the power rule, compute the derivative of $\frac{1}{x^2}$. What is the domain of this derivative function?
Example 8. Given any power function $y = cx^n$, show that near any point x in its domain,
$$\frac{\Delta y}{y} \approx n \frac{\Delta x}{x}.$$
If n = 2 and x changes by 1%, by what percentage approximately does y change?
Solution: If we move $\Delta x$ and y to the opposite sides of the equation, what we want to show is that
$$\frac{\Delta y}{\Delta x} \approx n \frac{y}{x} = n \frac{c x^n}{x} = c n x^{n-1}.$$
But $c n x^{n-1}$ is the derivative of $y = cx^n$, which by the definition of the derivative can be approximated by $\frac{\Delta y}{\Delta x}$. This means that the above equation is just a rephrasing of the usual meaning of the derivative in the case of power functions. If n = 2 and x changes by 1%, then $\Delta x = 0.01x$, so $\frac{\Delta x}{x} = 0.01$ and $\frac{\Delta y}{y} \approx 2 \frac{\Delta x}{x} = 0.02$. So the above equation shows that, for a power function $y = cx^2$, if the input x changes by 1%, then the output changes by approximately 2%. Of course, we can replace 2 by any other power n.
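To see how good this approximation is, the sketch below compares the exact relative change of $y = cx^n$ with the estimate $n\,\Delta x/x$ for a 1% change in x. The helper name and the sample values x = 5 and c = 1 are arbitrary choices for this illustration; the exact relative change does not depend on x or c.

```python
def relative_change(n, x, pct, c=1.0):
    """Exact relative change (y_new - y_old) / y_old of y = c * x**n
    when x grows by the fraction pct (for example 0.01 for 1%)."""
    y_old = c * x ** n
    y_new = c * (x * (1 + pct)) ** n
    return (y_new - y_old) / y_old


# The estimate n * (dx/x) from Example 8, compared with the exact change
for n in (2, 3, 0.5):
    exact = relative_change(n, 5.0, 0.01)
    print(f"n = {n}: exact {exact:.5%}, estimate {n * 0.01:.5%}")
```

For n = 2 the exact change is $1.01^2 - 1 = 0.0201$, that is 2.01%, very close to the estimate of 2%.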
$$(x^n)' = n x^{n-1}, \quad (e^x)' = e^x, \quad (\ln x)' = \frac{1}{x},$$
$$(\sin x)' = \cos x, \quad (\cos x)' = -\sin x.$$
We have already explained above the formula $(e^x)' = e^x$, and have checked several special cases of the power rule $(x^n)' = n x^{n-1}$. The derivatives of sine and cosine follow from some trigonometric identities and the famous limits mentioned at the end of Section 1.9; if you are interested, you can learn more about it in the footnote link.⁵ The derivative of ln x and the general case of the power rule (for arbitrary power n) will be explained later when we discuss derivatives of inverse functions.
The famous limits at the end of Section 1.9 were used to compute the derivatives of $e^x$, sin x and cos x, but once we know these derivatives we can reinterpret those limits as derivatives of these functions at zero.
Example 9. Compute the limit $\lim_{h \to 0} \frac{e^h - 1}{h}$ using that $(e^x)' = e^x$.
Solution: Since $\frac{e^h - 1}{h} = \frac{e^{0+h} - e^0}{h}$, the limit $\lim_{h \to 0} \frac{e^{0+h} - e^0}{h}$ is the definition of the derivative of $e^x$ at x = 0. Since $(e^x)' = e^x$, the derivative at x = 0 is equal to $e^0 = 1$, so the limit is 1.
Exercise 9. Compute the limit $\lim_{h \to 0} \frac{\sin h}{h}$ using that $(\sin x)' = \cos x$, and the limit $\lim_{h \to 0} \frac{\ln(1 + h)}{h}$ using that $(\ln x)' = \frac{1}{x}$.
$$\big(f(x) + g(x)\big)' = f'(x) + g'(x), \quad \big(f(x) - g(x)\big)' = f'(x) - g'(x).$$
4 To differentiate a function means to take its derivative, and differentiation means taking a derivative.
5 https://youtu.be/buqwRTJcEmw.
If the function f (x) changes by 5 between x = 0 and x = 1 then the function 3 f (x)
will change by 3 · 5 = 15. This means that if we multiply our function by a constant
c then the increment ∆y will be multiplied by c, which implies that
$$\big(c f(x)\big)' = c f'(x).$$
The above two rules together are called the linearity of differentiation. They can
also be called the sum rule, difference rule and constant multiple rule.
Example 10. Compute the derivative of $f(x) = \frac{x(2x + 7\sqrt{x}) - 1}{x^{5/2}}$.
Solution: First, using that $x^a x^b = x^{a+b}$ and $\frac{x^a}{x^b} = x^{a-b}$, we can simplify,
$$\frac{x(2x + 7\sqrt{x}) - 1}{x^{5/2}} = \frac{2x^2 + 7x^{3/2} - 1}{x^{5/2}} = 2x^{2 - 5/2} + 7x^{3/2 - 5/2} - x^{-5/2} = 2x^{-1/2} + 7x^{-1} - x^{-5/2}.$$
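Then, using the above two rules and the power rule,
$$f'(x) = 2 \cdot \left(-\tfrac{1}{2}\right) x^{-3/2} + 7 \cdot (-1)\, x^{-2} - \left(-\tfrac{5}{2}\right) x^{-7/2} = -x^{-3/2} - 7x^{-2} + \tfrac{5}{2} x^{-7/2}.$$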
Example 11. Compute the derivative of $f(x) = \cos x + 2 \ln \frac{1}{\sqrt{x}} - 3e^x$.
Solution: First, let us simplify
$$2 \ln \frac{1}{\sqrt{x}} = 2 \ln(x^{-1/2}) = 2 \cdot \left(-\tfrac{1}{2}\right) \ln(x) = -\ln(x)$$
and then use the rules of differentiation,
$$\big(\cos x - \ln(x) - 3e^x\big)' = (\cos x)' - (\ln(x))' - 3(e^x)' = -\sin x - \frac{1}{x} - 3e^x.$$
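Since both derivatives above were found only after simplifying the functions, it is reassuring to check them numerically. The sketch below compares, at a few points x > 0, the derivative $-x^{-3/2} - 7x^{-2} + \frac{5}{2}x^{-7/2}$ of the simplified function from Example 10 and the derivative $-\sin x - \frac{1}{x} - 3e^x$ from Example 11 with a symmetric difference quotient. The step size h = 10⁻⁵ and the helper names are assumptions of this illustration.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient, an estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Example 10: the original (unsimplified) function and the derivative
def f10(x):
    return (x * (2 * x + 7 * math.sqrt(x)) - 1) / x ** 2.5


def f10_prime(x):
    return -x ** -1.5 - 7 * x ** -2 + 2.5 * x ** -3.5


# Example 11: the original function and the derivative
def f11(x):
    return math.cos(x) + 2 * math.log(1 / math.sqrt(x)) - 3 * math.exp(x)


def f11_prime(x):
    return -math.sin(x) - 1 / x - 3 * math.exp(x)


for x in (0.5, 1.0, 2.0):
    print(x, central_difference(f10, x), f10_prime(x))
    print(x, central_difference(f11, x), f11_prime(x))
```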
Common notation for derivatives. Given a function y = f(x), its derivative can be written in a number of ways, for example,
$$f'(x), \quad \frac{df}{dx}, \quad \frac{d}{dx} f(x), \quad y'(x), \quad \frac{dy}{dx}.$$
The last two, $y'(x)$ and $\frac{dy}{dx}$, can be used for any function, but it must be clear from the context which specific function f(x) we are talking about. If we want to write a derivative at some specific point x = a, then we can use the following notation:
$$f'(a), \quad \frac{df}{dx}\Big|_{x=a}, \quad \frac{d}{dx} f(x)\Big|_{x=a}, \quad y'(a), \quad \frac{dy}{dx}\Big|_{x=a}.$$
For example, $\frac{d}{dx}(1 + x^2 + \cos x)\big|_{x=1}$ means that we first want to compute the derivative of the function $y = 1 + x^2 + \cos x$ and then plug in x = 1. We could also write $y'(1)$, since we know what the function is. However, we cannot write $(1 + 1^2 + \cos 1)'$, because it looks like we are taking the derivative of the constant $1 + 1^2 + \cos 1 = 2.5403\ldots$, which is zero.
$$f^{(n)}(x), \quad \frac{d^n f}{dx^n}, \quad \frac{d^n}{dx^n} f(x), \quad y^{(n)}(x), \quad \frac{d^n y}{dx^n}.$$
As before, the last two can be used for any function, but it must be clear from the
context which specific function f (x) we are talking about. If we want to write a
derivative at some specific point x = a then we can use the following notation:
$$f^{(n)}(a), \quad \frac{d^n f}{dx^n}\Big|_{x=a}, \quad \frac{d^n}{dx^n} f(x)\Big|_{x=a}, \quad y^{(n)}(a), \quad \frac{d^n y}{dx^n}\Big|_{x=a}.$$
For the first three derivatives, when n = 1, 2 or 3, instead of writing $f^{(n)}$ we write $f', f'', f'''$, and instead of writing $y^{(n)}$ we write $y', y'', y'''$. Of course, the linearity rules of differentiation apply to higher order derivatives, because they apply at each step,
$$\frac{d^n}{dx^n}\big(f(x) \pm g(x)\big) = f^{(n)}(x) \pm g^{(n)}(x), \quad \frac{d^n}{dx^n}\big(c f(x)\big) = c f^{(n)}(x).$$
$$\frac{d^8}{dx^8}(x^7 - 4x^6 + x^5 - 2x^2 + 1) = \frac{d^8}{dx^8} x^7 - 4 \frac{d^8}{dx^8} x^6 + \frac{d^8}{dx^8} x^5 - 2 \frac{d^8}{dx^8} x^2 + \frac{d^8}{dx^8} 1.$$
First of all, the derivative of the constant 1 is 0, so all higher derivatives of a constant will be zero. Every time we take a derivative of a power function, by the power rule, the power will decrease by 1. For example, the derivative of $x^2$ will become 2x, then 2, then 0, so the third and higher derivatives of $x^2$ will be zero. For the same reason, if we take the derivative of $x^7$ eight times, it will also become zero. So the answer is zero.
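The argument above can be made concrete by storing a polynomial as its list of coefficients and applying the power rule term by term. In the sketch below (hypothetical helper names), each differentiation shortens the list by one, so after eight derivatives nothing is left of a degree-7 polynomial; an empty list stands for the zero polynomial.

```python
def derivative(coeffs):
    """Derivative of a0 + a1*x + a2*x^2 + ..., given as [a0, a1, a2, ...].

    Applies the power rule (x^k)' = k*x^(k-1) term by term; the constant
    term disappears. An empty list stands for the zero polynomial.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]


def nth_derivative(coeffs, n):
    """Apply `derivative` n times."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs


# 1 - 2x^2 + x^5 - 4x^6 + x^7, coefficients of x^0, x^1, ..., x^7
p = [1, 0, -2, 0, 0, 1, -4, 1]
for n in (1, 7, 8):
    print(n, nth_derivative(p, n))
```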
Exercise 12. Compute the fourth derivative of cos x. Can you think of any other
function that has the same fourth derivative as cos x?
Answer to Exercise 1. $\frac{e^{a+1} - e^a}{(a + 1) - a} = \frac{e^a e^1 - e^a}{1} = \frac{e^a (e - 1)}{1} = (e - 1)e^a$.
Answer to Exercise 2. The left figure: 0 < f ′ (4) < f (3) − f (2) < f ′ (1), because
the slope is positive and decreasing as we move left to right, and because f (3) −
f (2) is the average rate of change on the interval [2, 3]. The right figure: f ′ (1) <
f (3) − f (2) < f ′ (4) < 0, because the slope is negative and increasing as we move
left to right, and because f (3) − f (2) is the average rate of change on the interval
[2, 3].
Answer to Exercise 3. $\lim_{h \to 0} \frac{f(\cos(0 + h)) - f(\cos(0))}{h} = \lim_{h \to 0} \frac{f(\cos(h))}{h}$, because the second term is $f(\cos(0)) = f(1) = 0$.
Answer to Exercise 4. Before taking the limit, let us first simplify the slope of the secant line,
$$\frac{\frac{1}{3 + h} - \frac{1}{3}}{h} = \frac{3 - (3 + h)}{3(3 + h)\,h} = \frac{-h}{3(3 + h)\,h} = -\frac{1}{3(3 + h)} \to -\frac{1}{3(3 + 0)} = -\frac{1}{9}.$$
Answer to Exercise 6. We will not repeat all the calculations and only show the case of $\sqrt{x}$:
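$$\frac{\sqrt{x + h} - \sqrt{x}}{h} = \frac{\sqrt{x + h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x + h} + \sqrt{x}}{\sqrt{x + h} + \sqrt{x}} = \frac{(x + h) - x}{h(\sqrt{x + h} + \sqrt{x})} = \frac{1}{\sqrt{x + h} + \sqrt{x}}.$$
When we take the limit h → 0, we get $\frac{1}{\sqrt{x + 0} + \sqrt{x}} = \frac{1}{2\sqrt{x}}$. Of course, this only works when x > 0, so the derivative $(\sqrt{x})'$ exists only when x > 0.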
Answer to Exercise 7. Using the power rule with n = −2, $\left(\frac{1}{x^2}\right)' = (x^{-2})' = -2x^{-2-1} = -2x^{-3} = -\frac{2}{x^3}$. The function and derivative are defined when $x \neq 0$.
Answer to Exercise 10. First we simplify the function as $x^{-2} - x^{-1} + 2x^{-7/2}$ and then take the derivative,
$$-2x^{-3} + x^{-2} - 7x^{-9/2}.$$
Answer to Exercise 12. Consecutive derivatives of cos x will be −sin x, −cos x, sin x and cos x. So the fourth derivative of cos x is cos x itself. Any function of the form $\cos x + ax^3 + bx^2 + cx + d$ will also have the fourth derivative equal to cos x, because all the power functions will disappear after taking four derivatives, just like in the previous example.
To help us talk about these four cases, let us imagine that the function y = f(t) describes the position (or coordinate) y of a car moving along a straight line as a function of time t. The straight line has a positive and a negative direction, so the car can move forward or backward. The derivative $f'(t)$ represents the velocity at time t, which can be positive (if the car is moving forward) or negative (if the car is moving backward). The absolute value of the velocity, $|f'(t)|$, is called speed. The second derivative $f''(t)$ represents acceleration.
(a) The graph is increasing and concave up. If $f'(t) > 0$, then the velocity is positive and the car is moving in the positive direction. If $f''(t) > 0$, then the velocity is increasing and, in this case, the car is moving faster and faster. The fact that the slope is increasing means that the graph is concave up.
(b) The graph is increasing and concave down. If $f'(t) > 0$, then the velocity is positive and the car is moving in the positive direction. If $f''(t) < 0$, then the velocity is decreasing and, in this case, the car is moving slower and slower. The fact that the slope is decreasing means that the graph is concave down.
6 https://youtu.be/tCs5DK951Js.
(c) The graph is decreasing and concave up. If $f'(t) < 0$, then the velocity is negative and the car is moving in the negative direction. If $f''(t) > 0$, then the velocity is increasing and, again, increasing slope means that the graph is concave up. However, in this case, the speed is decreasing, so the car is moving slower and slower. That is because, if the velocity increases from −3 to −1, then the speed decreases from 3 to 1.
(d) The graph is decreasing and concave down. If $f'(t) < 0$, then the velocity is negative and the car is moving in the negative direction. If $f''(t) < 0$, then the velocity is decreasing and, again, decreasing slope means that the graph is concave down. However, in this case, the speed is increasing, so the car is moving faster and faster. That is because, if the velocity decreases from −1 to −3, then the speed increases from 1 to 3.⁷
Example 1. A cup of hot coffee left at room temperature will cool down, but the
rate of cooling will slow down. Translate this into a statement about derivatives
and graph of some function.
Solution: The function here is the temperature T (t) of a cup of coffee as a function
of time. Cooling down means that T ′ (t) < 0, and the rate of cooling slowing down
means that T ′′ (t) > 0. The graph of T (t) will be decreasing and concave up.
Exercise 1. Between January and May 2021, the decline in Lake Mead water levels accelerated. Translate this into a statement about derivatives and the graph of some function.
The next example and exercise will refer to the following two figures.
Example 2. In the figure on the left, given the graph of y = f (x) (solid black line),
determine which curve is the graph of its derivative f ′ (x), (a), (b), or (c).
Solution: The function is decreasing up to about x = −4.5 and right after that it
starts increasing. This means that the derivative should be negative up to −4.5, so
7 https://youtu.be/6NaUJ6OGcLU.
the graph should be below the x-axis, and right after −4.5 it should become posi-
tive, so the graph should be above the x-axis. The only graph with such behaviour
is (b). Similar behaviour happens at x = 0 and about x = 4.5, where the function
f (x) changes direction and (b) changes sign by crossing the x-axis. Such points are
called critical points.
Notice also that when (b) is decreasing (between about −2.5 and 2.5), the func-
tion f (x) is concave down (the slope is decreasing), and when (b) is increasing,
the function f (x) is concave up (the slope is increasing). These points where the
derivative changes direction (here about −2.5 and 2.5) and the original function
changes concavity are called inflection points.
Exercise 2. In the figure on the right, given the graph of y = f (x) (solid black line),
determine which curve is the graph of its derivative f ′ (x), (a), (b), or (c). Where
are the critical points, and inflection points of f (x)?
The next two examples will refer to the following two figures. Notice that the
solid black line is the graph of the derivative y = f ′ (x), not the original function.
Example 3. In the figure on the left, given the graph of y = f ′ (x) (solid black line),
determine which curve is the graph of f (x), (a), (b), or (c).
Solution: The derivative is positive between about −4 and 2 and negative outside
of that interval, so the function f (x) should be increasing between −4 and 2 and
decreasing outside. The answer is (b). The derivative changes direction at about
x = −1.2, so the function (b) has an inflection point there, where it switches from
concave up to concave down.
Exercise 3. In the figure on the right, given the graph of y = f ′ (x) (solid black
line), determine which curve is the graph of f (x), (a), (b), or (c).
Example 4. Search “pole vault” on YouTube and watch some videos. Which of the two graphs below more accurately describes the horizontal position x = x(t) of the vaulter as a function of time t?
Exercise 6. Suppose that two functions y = f (x) and y = g(x) are equal at x = a,
i.e. f (a) = g(a), and f ′ (a) < g′ (a). Which function is bigger immediately to the
left of x = a?
Example 8. In the figure above on the left, determine which graph corresponds to
f (x), f ′ (x), and f ′′ (x). Where are the inflection points of f (x), and what happens
to f ′ (x) and f ′′ (x) at those points?
Solution: We can see that at the points x where the dotted blue curve (a) crosses the
x-axis, the solid green curve (b) changes direction from increasing to decreasing or
vice versa. This means that (a) is the derivative of (b). Similarly, where the solid
green curve (b) crosses the x-axis, the dashed red curve (c) changes direction, so (b)
is the derivative of (c). This means that (c) is the graph of $y = f(x)$, (b) is the graph of $y = f'(x)$, and (a) is the graph of $y = f''(x)$. Inflection points are where $f'(x)$
changes direction, which is at around x = 0, 0.75, 1.85, 3.2. At inflection points
f ′ (x) changes direction, and f ′′ (x) crosses the x-axis.
Exercise 8. In the figure above on the right, determine which graph corresponds to
f (x), f ′ (x), and f ′′ (x). Where are the inflection points of f (x), and what happens
to f ′ (x) and f ′′ (x) at those points? What happens at x = 0?
Answer to Exercise 1. The function here is the water level h(t) of Lake Mead as a function of time. The water level declining means that $h'(t) < 0$, and the decline accelerating means that $h''(t) < 0$. The graph of h(t) will be decreasing and concave down. Of course, water levels might fluctuate slightly, so we should talk about averages over a certain period of time.
Answer to Exercise 2. The answer is (b). Critical points are about −4.8, −0.4, 4.2
where the function f (x) changes direction and f ′ (x) crosses the x-axis, and inflec-
tion points are about −3.5 and 1.5, where the derivative f ′ (x) changes direction.
Answer to Exercise 3. The answer is (b).
Answer to Exercise 4. https://youtu.be/aewfFlVg-MU
Answer to Exercise 5. You can check one by one that the graph below satisfies all the above properties. Notice that $\frac{f(1) - f(-1)}{1 - (-1)}$ is the slope of the line connecting the points (−1, f(−1)) and (1, f(1)), which is bigger than $f'(0) = 0$ in the figure. Also, although f(x) has a vertical asymptote, the dot at (2, 2) indicates that we chose f(2) to be equal to 2; this is a legal move, although the function will be discontinuous at x = 2. f(x) is continuous but not differentiable at x = 3 because it has a corner there.
Answer to Exercise 9. Possible sketches are in the figures above. The functions
y = f (x) could be shifted vertically, because adding a constant y = f (x) + c does
not affect the derivative, ( f (x) + c)′ = f ′ (x). In the second figure, the function f (x)
has a corner at x = 0, so $f'(0)$ is undefined. The derivative jumps from a positive to a negative value, so the increasing function suddenly becomes decreasing (like a ball bouncing off a wall and suddenly changing direction). The derivative is increasing on
both sides of x = 0, so the function is concave up on both sides. The function is
increasing exactly when f ′ (x) > 0, i.e. the graph of y = f ′ (x) is above the x-axis.
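• Product rule: the derivative of the product f(x)g(x) of two functions is
$$\big(f(x)g(x)\big)' = f'(x)g(x) + f(x)g'(x).$$
• Quotient rule: the derivative of the ratio $\frac{f(x)}{g(x)}$ of two functions is
$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$
• Chain rule: the derivative of the composition f(g(x)) is
$$\big(f(g(x))\big)' = f'(g(x))\,g'(x).$$
• Inverse function rule: the derivative of the inverse function $f^{-1}$ is
$$\big(f^{-1}(x)\big)' = \frac{1}{f'(f^{-1}(x))}.$$
A more convenient way to phrase the inverse function rule is that, if b = f(a), then
$$(f^{-1})'(b) = \frac{1}{f'(a)}.$$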
Of course, in all these rules we assume that everything on the right-hand side of each equation is well defined; for example, we never divide by zero.
Where the formulas come from. Below we will focus on learning how to use these rules, but if you are interested in where they come from, all the rules can be derived by simple manipulations from the algebraic definition of the derivative. You can learn more about the derivation of the chain rule⁹ and the inverse function rule¹⁰ in the footnote links. Here we will only show how to derive the product rule.
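We want to see what happens to the ratio $\Delta y/\Delta x$ when the increment $\Delta x = h$ gets smaller and smaller and y = f(x)g(x). If $\Delta f = f(x + h) - f(x)$ and $\Delta g = g(x + h) - g(x)$ are the increments of f and g, then
$$\Delta y = \Delta(fg) = f(x + h)g(x + h) - f(x)g(x) = \big(f(x) + \Delta f\big)\big(g(x) + \Delta g\big) - f(x)g(x) = \Delta f \cdot g(x) + f(x) \cdot \Delta g + \Delta f \cdot \Delta g.$$
Dividing by h and letting h → 0, we have $\Delta f/h \to f'(x)$, $\Delta g/h \to g'(x)$ and $\Delta f \cdot \Delta g/h = (\Delta f/h)\,\Delta g \to f'(x) \cdot 0 = 0$, because $\Delta g \to 0$. This gives $\big(f(x)g(x)\big)' = f'(x)g(x) + f(x)g'(x)$.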
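The algebraic identity above can also be watched numerically. In the sketch below (an illustration only; the choice f = sin, g = exp and x = 1 is arbitrary, and the helper name is hypothetical), the first two pieces divided by h approach $f'(x)g(x) + f(x)g'(x)$, while the last piece divided by h shrinks toward zero.

```python
import math


def product_increment_terms(f, g, x, h):
    """Return the three pieces of Δ(fg) = Δf·g(x) + f(x)·Δg + Δf·Δg for step h."""
    df = f(x + h) - f(x)
    dg = g(x + h) - g(x)
    return df * g(x), f(x) * dg, df * dg


x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    t1, t2, t3 = product_increment_terms(math.sin, math.exp, x, h)
    print(f"h = {h:g}: (t1 + t2)/h = {(t1 + t2) / h:.6f}, t3/h = {t3 / h:.2e}")

# Product rule prediction for (sin x * e^x)' at x = 1
print(math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x))
```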
Chain rule. We will start with the chain rule, because it is the most basic building block, and because it will give us a much richer collection of functions to play with when we use the product rule and quotient rule. The chain rule is sometimes called the outside-inside rule, because when we compute the derivative of the composition f(g(x)), we first take the derivative of the outside function, f′, plug in the inside function g(x) to get f′(g(x)), and then multiply by the derivative of the inside function, g′(x).
Example 1. State what the outside function f(x) and inside function g(x) are, and compute the derivative using the chain rule.
(a) $e^{2x+5}$          (d) $\ln\sqrt{x}$
(b) $\cos(x^2)$         (e) $\sqrt{e^{x^2} + \cos(x)}$
(c) $\cos^2(x)$         (f) $\frac{e^{\cos(x)}}{e^{2x}}$
Solution: (a) In $e^{2x+5}$, the outside function is $f(x) = e^x$ and the inside function is g(x) = 2x + 5, so $e^{2x+5} = f(g(x))$. Because $f'(x) = e^x$ and $g'(x) = 2$, the chain rule $f'(g(x))g'(x)$ gives $e^{2x+5} \cdot 2 = 2e^{2x+5}$.
(b) In $\cos(x^2)$, the outside function is f(x) = cos x and the inside function is $g(x) = x^2$, so $\cos(x^2) = f(g(x))$. Because $f'(x) = -\sin x$ and $g'(x) = 2x$, the chain rule $f'(g(x))g'(x)$ gives $-\sin(x^2) \cdot 2x = -2x\sin(x^2)$.
(c) In $\cos^2(x) = (\cos x)^2$, the outside function is $f(x) = x^2$ and the inside function is g(x) = cos x, so $\cos^2(x) = f(g(x))$. Because $f'(x) = 2x$ and $g'(x) = -\sin x$, the chain rule $f'(g(x))g'(x)$ gives $2\cos x \cdot (-\sin x) = -2\cos x \sin x$.
(d) In $\ln\sqrt{x}$, the outside function is f(x) = ln x and the inside function is $g(x) = \sqrt{x} = x^{1/2}$, so $\ln\sqrt{x} = f(g(x))$. Because $f'(x) = \frac{1}{x}$ and $g'(x) = \frac{1}{2\sqrt{x}}$, the chain rule $f'(g(x))g'(x)$ gives $\frac{1}{\sqrt{x}} \cdot \frac{1}{2\sqrt{x}} = \frac{1}{2x}$. There is a much easier way to compute this derivative if we first simplify the function $\ln\sqrt{x} = \ln(x^{1/2}) = \frac{1}{2}\ln x$; then the derivative is immediately $\frac{1}{2x}$.
(e) In $\sqrt{e^{x^2} + \cos(x)}$, the outside function is $f(x) = \sqrt{x} = x^{1/2}$ and the inside function is $g(x) = e^{x^2} + \cos(x)$. The derivative of the outside function is $f'(x) = \frac{1}{2\sqrt{x}}$. The derivative of the inside function, $(e^{x^2} + \cos(x))' = (e^{x^2})' - \sin x$, requires us to use the chain rule one more time to compute $(e^{x^2})' = e^{x^2} \cdot (x^2)' = e^{x^2} \cdot 2x = 2xe^{x^2}$. Finally, we get that
$$f'(g(x))g'(x) = \frac{2xe^{x^2} - \sin x}{2\sqrt{e^{x^2} + \cos(x)}}.$$
(f) This problem might look like we need to use the quotient rule, but, in fact, we can simplify the function as $e^{\cos(x) - 2x}$ and apply the chain rule. The outside function is $f(x) = e^x$ and the inside function is $g(x) = \cos(x) - 2x$. Because $f'(x) = e^x$ and $g'(x) = -\sin(x) - 2 = -(\sin(x) + 2)$, the chain rule $f'(g(x))g'(x)$ gives $-e^{\cos(x) - 2x}(\sin(x) + 2)$.
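A numerical spot check is a useful habit after a chain-rule computation. The sketch below compares the derivatives found in parts (a), (b), (e) and (f) with a symmetric difference quotient; the dictionary `CASES`, the step size and the sample point are assumptions of this illustration, not part of the text.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient, an estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (function, derivative found with the chain rule) for parts (a), (b), (e), (f)
CASES = {
    "a": (lambda x: math.exp(2 * x + 5),
          lambda x: 2 * math.exp(2 * x + 5)),
    "b": (lambda x: math.cos(x ** 2),
          lambda x: -2 * x * math.sin(x ** 2)),
    "e": (lambda x: math.sqrt(math.exp(x ** 2) + math.cos(x)),
          lambda x: (2 * x * math.exp(x ** 2) - math.sin(x))
          / (2 * math.sqrt(math.exp(x ** 2) + math.cos(x)))),
    "f": (lambda x: math.exp(math.cos(x)) / math.exp(2 * x),
          lambda x: -math.exp(math.cos(x) - 2 * x) * (math.sin(x) + 2)),
}

x0 = 0.7
for part, (func, deriv) in CASES.items():
    print(part, central_difference(func, x0), deriv(x0))
```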
The case when the inside function is linear, as in (a) above, is so common that it is worth stating it explicitly as a special case of the chain rule:
$$\big(f(mx + b)\big)' = m f'(mx + b).$$
$$(a^x)' = \ln(a)\, a^x.$$
x        0    1    2    3
g(x)     0    3    0.5  1
g′(x)    2    0.5  −1   0
so $f'(g(1))g'(1) = f'(3) \cdot 0.5$. From the graph we see that $f'(3) = -2$, because between x = 2 and x = 4 the graph is a line connecting the points (2, 4) and (4, 0), so it has slope −2. This gives $\frac{d}{dx} f(g(x))\big|_{x=1} = f'(g(1))g'(1) = -2 \cdot 0.5 = -1$.
Exercise 2. In the setting of the previous problem, compute $\frac{d}{dx} g(f(x))\big|_{x=1}$.
Example 3. If the slope of f (x) is always positive and the slope of g(x) is always
negative, are the following functions increasing or decreasing?
Product and quotient rule. Now, we will add the product and quotient rules
into the mix.
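$$\frac{d}{dx} f(x)g(x)\Big|_{x=2} \quad \text{and} \quad \frac{d}{dx} \frac{f(x)}{g(x)}\Big|_{x=2}$$
$$\frac{d}{dx} f(x)g(x)\Big|_{x=2.3} \quad \text{and} \quad \frac{d}{dx} \frac{f(x)}{g(x)}\Big|_{x=2.3}$$
given the table of values:
x        1     2     3    4
f(x)     1.5   0.5   0    0.3
g(x)     −1    0.25  0    −0.35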
(b) We first rewrite $\tan(x) = \frac{\sin(x)}{\cos(x)}$ and then use the quotient rule,
$$\left(\frac{\sin(x)}{\cos(x)}\right)' = \frac{(\sin(x))'\cos(x) - \sin(x)(\cos(x))'}{\cos^2(x)} = \frac{\cos(x)\cos(x) + \sin(x)\sin(x)}{\cos^2(x)} = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x).$$
In the second term, we need to use the product rule again to compute $(e^x\sin(x))' = (e^x)'\sin(x) + e^x(\sin(x))' = e^x\sin(x) + e^x\cos(x)$, and then plug in above to get the final answer,
$$\big(x^2 e^x \sin(x)\big)' = 2xe^x\sin(x) + x^2e^x\sin(x) + x^2e^x\cos(x) = xe^x\big(2\sin(x) + x\sin(x) + x\cos(x)\big).$$
In the last problem, when we computed the derivative of the product of three functions, we used the product rule twice; the two steps can be combined into one easy-to-remember formula:
$$\big(f(x)g(x)h(x)\big)' = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x).$$
The same rule works with four or more factors: we differentiate each factor in turn, keeping the other factors unchanged, and then add up all the terms.
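Because this rule amounts to "differentiate one factor at a time and add", it translates directly into a loop over the factors. The sketch below (hypothetical helper names) applies it to $x^2 e^x \sin(x)$ from the last problem and compares the result with a numerical slope; the same `product_rule` function accepts four or more factors.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient, an estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def product_rule(fs, dfs, x):
    """Derivative of fs[0](x) * fs[1](x) * ... at x.

    Each term differentiates one factor (using dfs[i]) and keeps the others.
    """
    total = 0.0
    for i in range(len(fs)):
        term = dfs[i](x)
        for j, f in enumerate(fs):
            if j != i:
                term *= f(x)
        total += term
    return total


factors = [lambda x: x ** 2, math.exp, math.sin]
derivs = [lambda x: 2 * x, math.exp, math.cos]


def y(x):
    return x ** 2 * math.exp(x) * math.sin(x)


x0 = 1.3
print(product_rule(factors, derivs, x0), central_difference(y, x0))
```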
Exercise 5. Compute the derivatives of the following functions.
Inverse function rule. An explanation of the inverse function rule can be found in the footnote link¹¹, but the basic idea is quite simple and we have already seen it in Example 6 in Section 2.1. Basically, in the inverse function the roles of the variables $x$ and $y$ are switched, so the derivative is approximated by $\frac{\Delta x}{\Delta y}$ instead of $\frac{\Delta y}{\Delta x}$. The only subtle point is that if $b = f(a)$, then the same increments $\Delta x$ and $\Delta y$ are used at $x = a$ for $f$ and at $y = b$ for $f^{-1}$; that is why the derivative of the inverse function at $b$ is the reciprocal of the derivative of the original function at $a$. Before we look at examples of using the inverse function rule, let us first review the practical meaning of the derivative of an inverse function.
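To see the reciprocal relationship numerically, the sketch below uses $f(x) = x^3 + x$, an increasing function chosen only for illustration (it does not appear in the text). It inverts $f$ by bisection and compares $\frac{\Delta x}{\Delta y}$ for $f^{-1}$ near $b = f(1) = 2$ with $\frac{1}{f'(1)} = \frac{1}{4}$.

```python
def f(x):
    # Illustrative increasing function (our choice, not from the text).
    return x ** 3 + x

def f_prime(x):
    return 3 * x ** 2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, tol=1e-12):
    """Solve f(x) = y by bisection; valid because f is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = 1.0
b = f(a)  # b = 2
dy = 1e-5
# Increment of the output (x) over increment of the input (y) for the inverse.
numerical = (f_inverse(b + dy) - f_inverse(b - dy)) / (2 * dy)
print(numerical, 1 / f_prime(a))
```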
11 https://youtu.be/y9jzS-sUeM8
• The input of $S$ and $S'$ should be the price $p$, in this case $p = 40$, which eliminates (c).
• The input of $S^{-1}$ and $(S^{-1})'$ should be the number $N$ of tortes sold, in this case $N = 150$, which eliminates (a).
• The derivative $S'(40)$ can be approximated by $\frac{\Delta N}{\Delta p}$ (increment of the output over increment of the input). In our case, $\Delta N = 160 - 150 = 10$ and $\Delta p = 35 - 40 = -5$, so $S'(40) \approx \frac{10}{-5} = -2$. So (b) is not correct.
• The derivative $(S^{-1})'(150)$ can be approximated by $\frac{\Delta p}{\Delta N}$ (again, increment of the output over increment of the input), so $(S^{-1})'(150) \approx \frac{-5}{10} = -0.5$. So (d) is the correct answer.
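The two difference quotients in the bullets above can be reproduced in a few lines; the variable names are ours, and the data are the two price/sales pairs from the example.

```python
# Data from the example: price p (dollars) and number N of tortes sold.
p1, N1 = 40, 150
p2, N2 = 35, 160

dN = N2 - N1  # 10
dp = p2 - p1  # -5

S_prime_40 = dN / dp       # N = S(p): output N over input p
S_inv_prime_150 = dp / dN  # p = S^{-1}(N): output p over input N
print(S_prime_40, S_inv_prime_150)
```

The two estimates are reciprocals of each other, which is exactly the inverse function rule.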
Exercise 6. The total cost of owning a car depends on the APR (annual percentage rate) of the auto loan. Suppose that when the APR is 4%, the total cost is \$30,000, and lowering the APR to 3.5% will decrease the total cost to \$29,500. If $c = f(r)$ is the total cost $c$ at the rate $r$, what formula does the above information correspond to?
Next, let us practice the inverse function rule: if $b = f(a)$, then $(f^{-1})'(b) = \frac{1}{f'(a)}$.
x        0      1     2      3
g(x)     3      2     1.5    1.25
g′(x)   −1.5   −1    −0.5   −0.25
Solution: (a) The inverse function rule tells us that $(g^{-1})'(3) = \frac{1}{g'(a)}$, where $a$ is such that $3 = g(a)$. Looking at the table, we see that $g(0) = 3$, so $a = 0$. This means that $(g^{-1})'(3) = \frac{1}{g'(0)} = \frac{1}{-1.5} = -\frac{2}{3}$.
(b) First, by the chain rule, $\frac{d}{dx} g^{-1}(f(x))\big|_{x=1} = (g^{-1})'(f(1))\,f'(1)$. Looking at the graph, we see that $f(1) = 2$ and $f'(1) = 2$, so $\frac{d}{dx} g^{-1}(f(x))\big|_{x=1} = 2(g^{-1})'(2) = \frac{2}{g'(a)}$, where $2 = g(a)$. Looking at the table, $g(1) = 2$, so $a = 1$ and $\frac{2}{g'(1)} = \frac{2}{-1} = -2$.
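The table lookup in this solution can be written as a short sketch. The dictionary `g_table` holds the table above, the values $f(1) = 2$ and $f'(1) = 2$ are the ones read from the graph in the text, and the helper name `g_inverse_prime` is hypothetical.

```python
# Table from the example: x -> (g(x), g'(x)).
g_table = {0: (3, -1.5), 1: (2, -1), 2: (1.5, -0.5), 3: (1.25, -0.25)}

def g_inverse_prime(b):
    """(g^{-1})'(b) = 1 / g'(a), where a is the tabulated point with g(a) = b."""
    for a, (g_a, g_prime_a) in g_table.items():
        if g_a == b:
            return 1 / g_prime_a
    raise ValueError(f"{b} is not a tabulated value of g")

# (a) (g^{-1})'(3)
part_a = g_inverse_prime(3)

# (b) chain rule with f(1) = 2 and f'(1) = 2 read from the graph
f_1, f_prime_1 = 2, 2
part_b = g_inverse_prime(f_1) * f_prime_1
print(part_a, part_b)
```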
Exercise 7. Given the functions in Example 7, compute (a) $(g^{-1})'(1.5)$ and (b) $\frac{d}{dx} g^{-1}(f(x))\big|_{x=3}$.
When we discussed inverse functions, we defined three classic ones: $\ln(x)$ as the inverse of $e^x$, $\arcsin(x)$ as the inverse of $\sin(x)$ on $[-\frac{\pi}{2}, \frac{\pi}{2}]$, and $\arctan(x)$ as the inverse of $\tan(x)$ on $(-\frac{\pi}{2}, \frac{\pi}{2})$. One can use the inverse function rule to show that:
$$\bigl(\ln(x)\bigr)' = \frac{1}{x}, \qquad \bigl(\arctan(x)\bigr)' = \frac{1}{1+x^2}, \qquad \bigl(\arcsin(x)\bigr)' = \frac{1}{\sqrt{1-x^2}}.$$
The first one is explained in the footnote link¹², so let us show the second one here:
$$\bigl(\arctan(x)\bigr)' = \frac{1}{1+x^2}.$$
Now, let us prove the formula. Recall that we already proved that $\tan'(x) = \sec^2(x)$. If $b = \tan(a)$ for some $a \in (-\frac{\pi}{2}, \frac{\pi}{2})$, then
$$\arctan'(b) = \frac{1}{\tan'(a)} = \frac{1}{\sec^2(a)}.$$
However, the answer should be in terms of $b$, so we need to express $\sec^2(a)$ in terms of $b = \tan(a)$, which can be done by finding the relationship between $\sec(a)$ and $\tan(a)$ among the Pythagorean trigonometric identities¹³: $\sec^2(a) = 1 + \tan^2(a) = 1 + b^2$, so $\arctan'(b) = \frac{1}{1+b^2}$, which is exactly what we wanted.
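As a sanity check (not a proof), the three derivative formulas can be compared with symmetric difference quotients at a few sample points chosen by us; `central_difference` is our own helper.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (name, function, claimed derivative, sample point)
checks = [
    ("ln", math.log, lambda x: 1 / x, 2.0),
    ("arctan", math.atan, lambda x: 1 / (1 + x ** 2), 0.7),
    ("arcsin", math.asin, lambda x: 1 / math.sqrt(1 - x ** 2), 0.3),
]
for name, func, formula, x in checks:
    print(f"{name}'({x}): numerical {central_difference(func, x):.8f}, formula {formula(x):.8f}")
```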
$$f'(x) = f(x)\cdot\bigl(\ln f(x)\bigr)'$$
whenever $f(x) > 0$. The main point of this rule is that sometimes it is easier to calculate the derivative of the logarithm $\ln f(x)$ of a function $f(x)$ than to calculate $f'(x)$ directly. We need $f(x)$ to be positive because we are only allowed to plug positive values into $\ln(x)$. When $f(x)$ can be negative (but is nonzero), we can use the more general rule
$$f'(x) = f(x)\cdot\bigl(\ln|f(x)|\bigr)'.$$
The logarithmic differentiation rule can be used, for example, to prove the power rule $(x^n)' = nx^{n-1}$ for all $n$. If you recall, we only checked this rule in a few special cases, but for general $n$ it can be obtained by logarithmic differentiation. For the explanation of the logarithmic differentiation rule and the demonstration of the power rule, see the footnote link.¹⁴
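For $f(x) = x^n$ with $x > 0$ we have $\ln f(x) = n\ln(x)$, so $(\ln f(x))' = \frac{n}{x}$ and $f'(x) = x^n\cdot\frac{n}{x} = nx^{n-1}$. The sketch below (our own, with arbitrary sample exponents including non-integer ones) checks this numerically; it illustrates the idea and is not the argument given in the linked video.

```python
import math

def log_derivative_power(x, n):
    """For f(x) = x**n with x > 0: ln f(x) = n ln x, so (ln f)'(x) = n / x
    and f'(x) = f(x) * (ln f)'(x) = x**n * n / x."""
    return x ** n * (n / x)

def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.7
for n in [2, -1, 0.5, math.pi]:
    by_log = log_derivative_power(x, n)
    power_rule = n * x ** (n - 1)
    numeric = central_difference(lambda t: t ** n, x)
    print(n, by_log, power_rule, numeric)
```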
13 https://en.wikipedia.org/wiki/Pythagorean_trigonometric_identity
14 https://youtu.be/hwrTON7VAGw
Answer to Exercise 1.
(a) $f(x) = 2^x$, $g(x) = \sqrt{x}$, $(f(g(x)))' = \ln(2)\,2^{\sqrt{x}}\,\frac{1}{2\sqrt{x}}$.
(b) $f(x) = \ln(x)$, $g(x) = x^2$, $(f(g(x)))' = \frac{2x}{x^2} = \frac{2}{x}$.
(c) $f(x) = \sqrt{x}$, $g(x) = \sin(x)$, $(f(g(x)))' = \frac{\cos(x)}{2\sqrt{\sin(x)}}$.
(d) $f(x) = \ln(x)$, $g(x) = \cos(x)$, $(f(g(x)))' = -\frac{\sin(x)}{\cos(x)} = -\tan(x)$.
(e) Since $2^{3x} = (2^3)^x = 8^x$ and $3^{2x} = (3^2)^x = 9^x$, we can simplify $\sqrt{2^{3x}3^{2x}} = \sqrt{8^x 9^x} = \sqrt{(8\cdot 9)^x} = \sqrt{72^x}$ and $\bigl(\sqrt{2^{3x}3^{2x}}\bigr)^7 = \bigl(\sqrt{72^x}\bigr)^7 = (72^x)^{7/2} = (72^{7/2})^x$. This means that we can apply the rule $(a^x)' = \ln(a)a^x$ with $a = 72^{7/2}$ to get the derivative $\ln(72^{7/2})(72^{7/2})^x = \frac{7}{2}\ln(72)(72^{7/2})^x$. It is important to try to simplify, if possible, before taking derivatives.
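A numerical check of part (e), assuming (our reading of the answer) that the function in part (e) is $\bigl(\sqrt{2^{3x}3^{2x}}\bigr)^7$: the derivative of the unsimplified expression, estimated by a symmetric difference quotient, should match $\frac{7}{2}\ln(72)(72^{7/2})^x$.

```python
import math

def h(x):
    # Unsimplified expression: (sqrt(2^{3x} 3^{2x}))^7
    return math.sqrt(2 ** (3 * x) * 3 ** (2 * x)) ** 7

def h_prime(x):
    # Simplified: h(x) = (72^{7/2})^x, so h'(x) = (7/2) ln(72) (72^{7/2})^x
    return 3.5 * math.log(72) * (72 ** 3.5) ** x

x = 0.1
numeric = (h(x + 1e-6) - h(x - 1e-6)) / 2e-6
print(numeric, h_prime(x))
```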
(f) This might look like we need the quotient rule. However, we can rewrite
Answer to Exercise 2. $\frac{d}{dx} g(f(x))\big|_{x=1} = g'(f(1))f'(1) = g'(2)f'(1) = -1\cdot 2 = -2$.
Answer to Exercise 3. Notice that time changed from minutes to seconds and quantity changed from kilograms to grams. First, the quantity (in kg) of the chemical produced up to time $t$ seconds will be equal to $f(\frac{t}{60})$, because $t$ sec $= \frac{t}{60}$ min. The rate of production is its derivative and, using the chain rule, $\bigl(f(\frac{t}{60})\bigr)' = f'(\frac{t}{60})\frac{1}{60}$, which is measured in kg/sec. Since we want g/sec, we translate
$$f'\Bigl(\frac{t}{60}\Bigr)\frac{1}{60}\ \text{kg/sec} = f'\Bigl(\frac{t}{60}\Bigr)\frac{1}{60}\times 1000\ \text{g/sec} = f'\Bigl(\frac{t}{60}\Bigr)\frac{100}{6}\ \text{g/sec}.$$
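To check the unit conversion, the sketch below uses a hypothetical production curve $f(m) = 0.3m^2$ kg after $m$ minutes (the exercise does not specify $f$). It compares the chain-rule rate $f'(\frac{t}{60})\frac{100}{6}$ g/sec with a difference quotient of the amount in grams after $t$ seconds.

```python
def f(m):
    """Hypothetical production curve: kilograms produced after m minutes."""
    return 0.3 * m ** 2

def f_prime(m):
    return 0.6 * m

def grams_after_seconds(t):
    # t seconds = t/60 minutes, and 1 kg = 1000 g
    return 1000 * f(t / 60)

t = 90.0  # seconds
chain_rule_rate = f_prime(t / 60) * 100 / 6  # g/sec, formula from the answer
numeric_rate = (grams_after_seconds(t + 1e-3) - grams_after_seconds(t - 1e-3)) / 2e-3
print(chain_rule_rate, numeric_rate)
```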
Answer to Exercise 4. We could estimate the first derivative of the product in two ways. First, since $x = 2.3$ is between 2 and 3, we could estimate it by $\frac{\Delta y}{\Delta x}$ between these two points with $y = f(x)g(x)$, so
To estimate f (2.3) we could use the straight line connecting (2, 0.5) and (3, 0).
We already computed its slope, m = −0.5, and it passes through the point (3, 0),
so the line is y = 0 − 0.5(x − 3) and f (2.3) ≈ −0.5(2.3 − 3) = 0.35. Similarly,
to estimate g(2.3) we could use the straight line connecting (2, 0.25) and (3, 0),
which is y = −0.25(x − 3), so g(2.3) ≈ −0.25(2.3 − 3) = 0.175. Finally, plugging
in all the estimates,
Answer to Exercise 6. (b) $(f^{-1})'(30{,}000) \approx \frac{\Delta r}{\Delta c} = \frac{-0.5}{-500} = 0.001$ % per dollar.
y = f (a) + f ′ (a)(x − a)
for the tangent line to y = f (x) at x = a. In addition to being called the tangent line,
this linear function is sometimes also called:
• local linearization of f (x) near x = a;
• best linear approximation to f (x) near x = a.
The names reflect that f (x) is well approximated by its tangent line locally near
x = a, i.e. f (x) ≈ f (a) + f ′ (a)(x − a) near x = a. It is a good idea to remember a
few special cases:
Shapes of graphs. We know that the sign of f ′ (x) determines whether the func-
tion y = f (x) is increasing or decreasing, and the sign of f ′′ (x) determines whether
the function is concave up or concave down. Let us now combine this information
with explicit calculations of derivatives.
Example 4. Find where the function $y = 2x^3 - 3x^2 - 12x + 1$ is increasing, decreasing, concave up, and concave down. Sketch its graph.
Solution: The first derivative is $y'(x) = 6x^2 - 6x - 12 = 6(x^2 - x - 2)$. We can check that $x^2 - x - 2 = 0$ when $x = -1$ and $x = 2$. Between $-1$ and $2$ (for example, at $x = 0$) $x^2 - x - 2$ is negative, and it is positive outside of $[-1, 2]$. This means that the function $2x^3 - 3x^2 - 12x + 1$ is decreasing on $(-1, 2)$ and increasing on $(-\infty, -1)$ and $(2, \infty)$. The second derivative is $y''(x) = 6(2x - 1)$; it is equal to zero at $x = 0.5$, negative to the left and positive to the right of $0.5$. So the function is concave down on $(-\infty, 0.5)$ and concave up on $(0.5, \infty)$. In particular, $x = 0.5$ is an inflection point. With this information, we can sketch the general shape of this function as in the figure.
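The sign analysis can be reproduced by evaluating $y'$ and $y''$ at one test point in each interval, as in the solution; the test points are our choice.

```python
def y_prime(x):
    return 6 * x ** 2 - 6 * x - 12  # = 6(x + 1)(x - 2)

def y_second(x):
    return 12 * x - 6  # = 6(2x - 1)

# One test point in each interval between the zeros of y' and of y''.
for x in [-2, 0, 3]:
    trend = "increasing" if y_prime(x) > 0 else "decreasing"
    print(f"x = {x}: y' = {y_prime(x)} -> {trend}")
for x in [0, 1]:
    shape = "concave up" if y_second(x) > 0 else "concave down"
    print(f"x = {x}: y'' = {y_second(x)} -> {shape}")
```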
Exercise 4. Find where the function $\ln(1 + x^2)$ is increasing, decreasing, concave up, and concave down. Sketch its graph.
$$\sin(x + y) - \cos(xy) + 1 = 0.$$
Can we find the derivative y′ (2), which is the slope of the tangent line in the figure,
without knowing this function? The answer is yes, using implicit differentiation.
Example 5. Compute the derivative $y'(2)$ at the point $A = (2, 2.383\ldots)$ in the figure.
Solution: Implicit differentiation means that we differentiate the above equation $\sin(x + y(x)) - \cos(xy(x)) + 1 = 0$, pretending that we know $y(x)$ and using the chain rule, and then solve for $y'(x)$ at the end. If this equation is true, then
$$\bigl(\sin(x + y(x)) - \cos(xy(x)) + 1\bigr)' = (0)' = 0$$
is also true. First, we use the chain rule,
$$\cos(x + y(x))\bigl(x + y(x)\bigr)' + \sin(xy(x))\bigl(xy(x)\bigr)' = 0.$$
Then we use whatever rule is necessary for the remaining derivatives, in this case the sum rule and the product rule,
$$\cos(x + y(x))\bigl(1 + y'(x)\bigr) + \sin(xy(x))\bigl(y(x) + xy'(x)\bigr) = 0.$$
Notice that, at this step, we simply write $y'(x)$ for the derivative of $y(x)$ formally, without knowing what it is. However, the good news is that we can now solve the above equation for $y'(x)$ by multiplying out, collecting all the terms with $y'(x)$ together, and moving all the other terms to the other side of the equation,
$$y'(x) = -\frac{\cos(x + y(x)) + \sin(xy(x))\,y(x)}{\cos(x + y(x)) + \sin(xy(x))\,x}.$$
Finally, since we are interested in the point $A = (2, 2.383\ldots)$, this means that $x = 2$ and $y(2) = 2.383\ldots$, so we can plug these values into the formula,
$$y'(2) = -\frac{\cos(2 + 2.383) + \sin(2\cdot 2.383)\cdot 2.383}{\cos(2 + 2.383) + \sin(2\cdot 2.383)\cdot 2} = -1.1648\ldots.$$
The calculation we just did will be a bit cleaner if we write $y$ instead of $y(x)$ and $y'$ instead of $y'(x)$, keeping in mind that $y$ and $y'$ depend on $x$:
$$y' = -\frac{\cos(x + y) + \sin(xy)\,y}{\cos(x + y) + \sin(xy)\,x},$$
and then plug in $x = 2$ and $y = 2.383$. Also, we could plug in the values $x = 2$ and $y = 2.383$ before solving for $y'$, which would actually make solving for $y'$ much easier. Make sure to take advantage of this in the next two exercises.
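The sketch below checks Example 5 numerically. It finds $y(2)$ on the curve by bisection, assuming (as the figure suggests) that the relevant root lies between 2.3 and 2.5, where the left-hand side changes sign. It then evaluates the formula for $y'$ from the text and compares it with a difference quotient of the numerically solved $y(x)$. The helper names are ours.

```python
import math

def F(x, y):
    return math.sin(x + y) - math.cos(x * y) + 1

def y_prime(x, y):
    # Formula obtained by implicit differentiation in the text.
    return -(math.cos(x + y) + math.sin(x * y) * y) / (math.cos(x + y) + math.sin(x * y) * x)

def solve_y(x, lo, hi, tol=1e-12):
    """Bisection for F(x, y) = 0 on [lo, hi]; assumes F changes sign there."""
    f_lo = F(x, lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = F(x, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

y2 = solve_y(2.0, 2.3, 2.5)  # the point A = (2, 2.383...)
print(y2, y_prime(2.0, y2))

# Independent check: solve for y at x = 2 +/- h and take a difference quotient.
h = 1e-5
numeric = (solve_y(2 + h, 2.3, 2.5) - solve_y(2 - h, 2.3, 2.5)) / (2 * h)
print(numeric)
```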
Exercise 6. Find the tangent line to the curve $x^2 + 2xy - y^3 = 7$ at $(2, 1)$.
Answer to Exercise 1. Because $\frac{d}{dx}\ln(1 + x)\big|_{x=0} = \frac{1}{1+x}\big|_{x=0} = 1$ and $\ln(1 + 0) = 0$, the best linear approximation of $\ln(1 + x)$ near $x = 0$ is $0 + 1\cdot(x - 0) = x$, so $\ln(1 + x) \approx x$ there. Because $\frac{d}{dx}(1 + x)^\kappa\big|_{x=0} = \kappa(1 + x)^{\kappa-1}\big|_{x=0} = \kappa$ and $(1 + 0)^\kappa = 1$, the best linear approximation of $(1 + x)^\kappa$ near $x = 0$ is $1 + \kappa\cdot(x - 0) = 1 + \kappa x$, and $(1 + x)^\kappa \approx 1 + \kappa x$ there.
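A numerical look at these two approximations (the exponent $\kappa = 0.5$ is an arbitrary choice): when $x$ shrinks by a factor of 10, the errors shrink by roughly a factor of 100, as expected for a tangent line approximation whose error is of order $x^2$.

```python
import math

kappa = 0.5  # sample exponent: (1 + x)^(1/2) = sqrt(1 + x)
for x in [0.1, 0.01, 0.001]:
    err_ln = abs(math.log(1 + x) - x)
    err_pow = abs((1 + x) ** kappa - (1 + kappa * x))
    print(f"x = {x}: |ln(1+x) - x| = {err_ln:.2e}, |(1+x)^k - (1+kx)| = {err_pow:.2e}")
```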
undefined, we cannot use the SDT. The meaning of the FDT is simple: if the function is increasing up to $x = a$ and decreasing after it, then the point is a local maximum (like point B in the above figure). In the case of the SDT, if $f''(a) < 0$ then $f'(x)$ is decreasing at $x = a$ and, because $f'(a) = 0$, the derivative changes from positive to negative, so the point again must be a local maximum. If $f''(a) = 0$ then the SDT is inconclusive and we need to use the FDT. Similar reasoning works for local minima.
Example 1. Find all critical points and determine which ones are local minima or maxima for (a) $f(x) = 3x^5 - 5x^3$, (b) $f(x) = e^{-x^2}$.
Solution: (a) Since $f'(x) = 15x^4 - 15x^2$, it is defined everywhere. To find critical points we need to solve $15x^4 - 15x^2 = 0$, or $x^2(x^2 - 1) = x^2(x - 1)(x + 1) = 0$. The solutions are $x = -1, 0, 1$, so the function has three critical points where the tangent line is horizontal. To decide which ones are local minima or maxima, let us start with the first derivative test. By checking the sign of the derivative on each interval,
derivative test, because $x = 0$ is not inside the domain. The domain of the function $ax + b\sqrt{x}$ is $x \ge 0$, so $x = 0$ is the left endpoint. Because the function is increasing
to the right of it, x = 0 is a local minimum. As we will discuss in the next section,
endpoints are often local or global minima or maxima even if they are not critical
points, so they require special attention.
Exercise 2. If $a > 0$ and $b > 0$ are positive constants, find all critical points of $f(t) = ae^t + be^{-t}$ and determine which ones are local minima or maxima.
In the next four problems we will refer to the following figures.
Example 3. In the figure above on the left, we see a graph of the derivative f ′ (x) of
some continuous function f (x). List all critical points of f (x) and determine which
ones are local minima or maxima. Also identify all inflection points of f(x).
Solution: Critical points are where the derivative is zero, x = −6, −1, 4 and 9, and
where it is undefined, x = −2. By the first derivative test, local maxima are where
the derivative changes from positive to negative, which happens at x = −2 and 9.
Local minima are where the derivative changes from negative to positive, which
happens at x = −6 and −1. Critical point x = 4 is neither a local minimum nor a
maximum, because the derivative is positive on both sides, so the function f(x)
increases before and after x = 4. Notice that the local maximum x = −2 is a corner
(kink) of f (x) because the slope f ′ (x) jumps suddenly from +2 to −2. Finally,
inflection points of f (x) are where the derivative f ′ (x) changes direction from in-
creasing to decreasing or vice versa, so x = −4.4, −2, 0.6, 4, 7.6 (some values are
approximate because it is hard to see exactly from the figure).
Exercise 3. In the figure above on the right, we see a graph of the derivative f ′ (x) of
some continuous function f (x). List all critical points of f (x) and determine which
ones are local minima or maxima. Also identify all inflection points of f(x).
Example 4. In the figure above on the left, we see a graph of the derivative f ′ (x)
of some continuous function f (x). On the interval [0, 8], where does the function
f (x) grow most rapidly and decay most rapidly?
Solution: That the function f(x) grows most rapidly means that its derivative is as large as possible on the interval [0, 8], which happens at about x = 7.6. That f(x) decays most rapidly means that its derivative is as small as possible on [0, 8], which happens at x = 4. Notice that these points are inflection points of the original function, because f′(x) changes direction there.
Exercise 4. In the figure above on the right, we see a graph of the derivative f ′ (x)
of some continuous function f (x). On the interval [−2, 8], where does the function
f (x) grow most rapidly and decay most rapidly?
Example 5. Suppose that f (x) has a continuous derivative and we know its values
in the following table:
x        0     1      2      3     4    5     6     7
f′(x)    1    −0.5   −0.25   0.5   1    1.5   0.5  −1
Estimate the coordinates of the critical points of f (x) on the interval [0, 7] and
determine which ones are local minima or maxima.
Solution: We see that the derivative changes sign (so crosses 0) somewhere between 0 and 1, between 2 and 3, and between 6 and 7. If we connect two neighbouring points in the table by a line, between 0 and 1 the slope is $\frac{-0.5 - 1}{1 - 0} = -1.5$, so the line is $y = 1 - 1.5(x - 0)$. We want to know where it crosses zero, so we solve the equation $1 - 1.5(x - 0) = 0$ and get $x = \frac{1}{1.5} = \frac{2}{3}$. This point is a local maximum, because $f'(x)$ changes from positive to negative, so the function changes from increasing to decreasing. Between 2 and 3 the slope is $0.75$, so the line is $y = -0.25 + 0.75(x - 2)$. Solving $-0.25 + 0.75(x - 2) = 0$ we get $x = 2\frac{1}{3}$. This point is a local minimum, because $f'(x)$ changes from negative to positive. Finally, between 6 and 7 the slope is $-1.5$, so the line is $y = 0.5 - 1.5(x - 6)$. Solving $0.5 - 1.5(x - 6) = 0$ we get $x = 6\frac{1}{3}$. This point is a local maximum, because $f'(x)$ changes from positive to negative. Of course, all critical points are only estimates, because we do not know the function exactly.
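The interpolation procedure of this solution can be automated: for each pair of neighbouring table entries where $f'$ changes sign, find the zero of the connecting line and classify it by the first derivative test. This is a sketch; it ignores table entries where $f'$ is exactly zero, which do not occur here.

```python
# Table from Example 5: x -> f'(x).
xs = [0, 1, 2, 3, 4, 5, 6, 7]
fp = [1, -0.5, -0.25, 0.5, 1, 1.5, 0.5, -1]

critical = []
for (x0, d0), (x1, d1) in zip(zip(xs, fp), zip(xs[1:], fp[1:])):
    if d0 * d1 < 0:  # f' changes sign between x0 and x1
        slope = (d1 - d0) / (x1 - x0)
        root = x0 - d0 / slope  # zero of the line d0 + slope * (x - x0)
        kind = "local max" if d0 > 0 else "local min"
        critical.append((root, kind))

for root, kind in critical:
    print(f"x ~ {root:.4f}: {kind}")
```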
Exercise 5. Suppose that f (x) has a continuous derivative and we know its values
in the following table:
Estimate the coordinates of the critical points of f (x) on the interval [0, 7] and
determine which ones are local minima or maxima.
Example 6. Given the graphs of functions f (x) (solid green curve) and g(x)
(dashed blue curve) below, find the critical points of f (g(x)).
Bonus. If we were asked to determine local maxima and minima, we could use
the second derivative test. The second derivative is equal to
$$\bigl(f(g(x))\bigr)'' = \bigl(f'(g(x))g'(x)\bigr)' = f''(g(x))\bigl(g'(x)\bigr)^2 + f'(g(x))g''(x).$$
because f ′′ (2) > 0 and (g′ (−3))2 > 0. So x = −3 is a local minimum. Similarly,
we can check that x = 3 is also a local minimum.
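The second-derivative chain rule formula can be spot-checked with concrete functions. The choices $f(u) = \sin(u)$ and $g(x) = x^2 + 1$ below are hypothetical (the functions in Example 6 are only given by graphs), and the second derivative is estimated by the symmetric quotient $\frac{F(x+h) - 2F(x) + F(x-h)}{h^2}$.

```python
import math

# Hypothetical choices for illustration: f(u) = sin(u), g(x) = x^2 + 1.
f, f1, f2 = math.sin, math.cos, (lambda u: -math.sin(u))
g = lambda x: x ** 2 + 1
g1 = lambda x: 2 * x
g2 = lambda x: 2.0

def second_derivative_of_composition(x):
    # (f(g(x)))'' = f''(g(x)) (g'(x))^2 + f'(g(x)) g''(x)
    return f2(g(x)) * g1(x) ** 2 + f1(g(x)) * g2(x)

def numeric_second_derivative(F, x, h=1e-4):
    return (F(x + h) - 2 * F(x) + F(x - h)) / h ** 2

x = 0.8
print(second_derivative_of_composition(x),
      numeric_second_derivative(lambda t: f(g(t)), x))
```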
Exercise 6. Given the graphs of functions f (x) and g(x) in the above example, find
the critical points of g( f (x)).
Example 7. After a dog jumps into a pool, a beach ball starts
floating up and down on the waves and its height is given by
Answer to Exercise 1. (a) x = 3 is the only critical point, and it is a local minimum.
(b) The critical points are −3, 0, 3; −3 and 3 are local minima, and 0 is a local
maximum.
Answer to Exercise 3. Critical points are x = −6, −3, −2, 4, 8. Local maxima are
x = −6, −2, local minima are x = −3, 4. Point x = 8 is neither. Inflection points
are x = −4.5, −3, 0.5, 5.5, 8.
Answer to Exercise 4. That the function f(x) grows most rapidly means that its derivative is as large as possible on the interval [−2, 8], which happens at about x = 5.5. That f(x) decays most rapidly means that its derivative is as small as possible on [−2, 8], which happens at x = 0.5.
Answer to Exercise 5. Between 0.5 and 1 the slope is $\frac{-1 - 0.5}{1 - 0.5} = -3$, so the line is $y = 0.5 - 3(x - 0.5)$. Solving $0.5 - 3(x - 0.5) = 0$ we get $x = \frac{2}{3}$. This point is a local maximum, because $f'(x)$ changes from positive to negative. Between 2.5 and 3 the slope is $\frac{2 + 1}{3 - 2.5} = 6$, so the line is $y = -1 + 6(x - 2.5)$. Solving $-1 + 6(x - 2.5) = 0$ we get $x = 2\frac{2}{3}$. This point is a local minimum, because $f'(x)$ changes from negative to positive.
Answer to Exercise 6. Using the chain rule, (g( f (x)))′ = g′ ( f (x)) f ′ (x). This
derivative is equal to zero if either f ′ (x) = 0 or g′ ( f (x)) = 0. From the graph
of y = f (x) we see that f ′ (x) = 0 at x = −2 and 2, where its slope is zero.
From the graph of y = g(x) we see that g′ (x) = 0 at x = 0. This means that
g′ ( f (x)) = 0 when f (x) = 0, which happens at x = −3.5, 0, 3.5. So (g( f (x)))′ = 0
at x = −3.5, −2, 0, 2, 3.5.
$$f'(x) = \frac{(x - 1)^3}{2} - 2(x - 1) = 0$$
when $x - 1 = 0$, i.e. $x = 1$, or $\frac{(x-1)^2}{2} - 2 = 0$. We can rewrite this as $(x - 1)^2 = 4$, or $x - 1 = \pm 2$, so $x = -1$ and $x = 3$. The point $x = 3$ is outside of the interval $[-2, 2]$, so $x = -1$ and $x = 1$ are critical points inside
(−2, 2). We could check whether they are local minima or maxima, but this is not
necessary because we are looking for global min and max. We can simply plug in
these critical points and the endpoints into f (x) and compare the values:
In the next few problems we will have two variables, but they will be related
through some given constraint and, as a result, we will be able to eliminate one
variable by expressing it in terms of the other and then optimize as usual. The next
two problems will refer to the following figures.
$$P = 2x + 2\pi\frac{y}{2} = 2x + \pi y, \qquad A = xy + \pi\Bigl(\frac{y}{2}\Bigr)^2 = xy + \frac{\pi y^2}{4}.$$
The perimeter is 100 meters, so $2x + \pi y = 100$. This means that one of the variables is completely determined by the other, for example, $x = 50 - \frac{\pi y}{2}$, and we can write
the area in terms of y only,
The domain is $[0, 1]$ because the height $h$ cannot be bigger than 1, so we want to maximize $\pi(h - h^3)$ on the interval $[0, 1]$. Since $V'(h) = \pi(1 - 3h^2) = 0$ when $h^2 = \frac{1}{3}$, or $h = \frac{1}{\sqrt{3}}$, this is the only critical point in the domain. We can see that the volume $V$ is zero at the endpoints $h = 0$ and $h = 1$, and $V \approx 1.2092$ when $h = \frac{1}{\sqrt{3}}$, so the cylinder with the largest volume has height $h = \frac{1}{\sqrt{3}}$ and radius $r = \sqrt{\frac{2}{3}}$.
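A numerical check of this optimization. The sketch assumes $V(h) = \pi r^2 h$ with $r^2 = 1 - h^2$, which gives the $\pi(h - h^3)$ above and is consistent with the radius $r = \sqrt{2/3}$ stated in the solution; it compares the closed-form answer with a brute-force search over a grid of heights.

```python
import math

def V(h):
    # Volume pi * r^2 * h with r^2 = 1 - h^2 (inferred from V = pi (h - h^3)).
    return math.pi * (h - h ** 3)

h_star = 1 / math.sqrt(3)
r_star = math.sqrt(1 - h_star ** 2)  # should equal sqrt(2/3)
print(V(0), V(1), V(h_star), r_star)

# Brute-force check on a fine grid of [0, 1]: (volume, height) with the largest volume.
grid_best = max((V(i / 10000), i / 10000) for i in range(10001))
print(grid_best)
```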
Exercise 4. We want to make one round enclosure and one square enclosure using ℓ meters of fence in total. What part of the fence should we spend on each enclosure to maximize the total area? What if we wanted to minimize the total area?
In the next example, we will encounter a situation when the domain is not a
finite closed interval [a, b], and so we have to argue a bit more carefully instead of
just comparing critical points and the endpoints.
Example 5. Suppose that, in a family of similar drugs, the price of a drug with a half-life of $h$ hours in a patient's body is $h^2$ dollars per mg. If we have \$100 to spend, which drug should we buy if our goal is to maximize the amount of drug remaining in a patient's body 2 hours after administering it?
Solution: \$100 is the price of $\frac{100}{h^2}$ mg of drug with half-life $h$ hours. Half-life $h$ means that the amount of drug in a patient's body is decaying exponentially and the amount remaining after $t$ hours is proportional to $2^{-t/h}$. In our case, the amount left will be $\frac{100}{h^2}2^{-t/h}$ after $t$ hours, and after 2 hours it will be
$$a = \frac{100}{h^2}\,2^{-\frac{2}{h}}.$$
We want to maximize this over $h > 0$. We can of course take the derivative to find critical points, but there is one useful trick that can simplify the calculations. We see that we divide by $h$ everywhere, so if we rename $\frac{1}{h}$ as $x$ then $a = 100x^2 2^{-2x}$. Let us maximize this function over $x > 0$ and then find the optimal $h = \frac{1}{x}$. First, $a'(x) = 200x\,2^{-2x} - 200\ln(2)\,x^2 2^{-2x} = 200x\,2^{-2x}\bigl(1 - \ln(2)x\bigr) = 0$ when $x = 0$ or $x = \frac{1}{\ln(2)} = 1.44$. If we plug these points into $a(x) = 100x^2 2^{-2x}$, we get $a(0) = 0$ and $a(1.44) = 28.17$. Does this mean that the global maximum is at $x = \frac{1}{\ln(2)}$? Since our domain here is all $x \in (0, \infty)$, which is not a finite closed interval, what do we do about the endpoints? We already checked what happens to $a(x)$ at $x = 0$, $a(0) = 0$, but what about $x = \infty$?
Of course, we can graph the function and see that $x = \frac{1}{\ln(2)}$ is the global maximum. However, without a graphing calculator, there are several ways we can proceed. One way to decide if $x = 1.44$ is a global maximum is to look at the derivative $a'(x)$. We see that $a'(x) < 0$ when $x > \frac{1}{\ln(2)}$
$$a(x) = 100x^2 2^{-2x} = \frac{100x^2}{2^{2x}} \to 0$$
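The conclusion of Example 5 can be spot-checked numerically; this does not replace the argument about $x \to \infty$. The sketch evaluates $a(x) = 100x^2 2^{-2x}$ at the critical point, confirms that the original formula in $h$ gives the same value at $h = \ln(2) \approx 0.69$ hours, and samples $a(x)$ on a grid and for large $x$.

```python
import math

def amount(h):
    """mg left after 2 hours when $100 buys 100/h^2 mg of a drug with half-life h hours."""
    return 100 / h ** 2 * 2 ** (-2 / h)

def a(x):
    """The same quantity after the substitution x = 1/h."""
    return 100 * x ** 2 * 2 ** (-2 * x)

x_star = 1 / math.log(2)  # critical point, about 1.44
h_star = 1 / x_star       # optimal half-life ln(2), about 0.69 hours
print(round(x_star, 4), round(a(x_star), 4), round(h_star, 4), round(amount(h_star), 4))

# a(x) becomes tiny for large x: the exponential 2^{2x} beats x^2.
print([a(x) for x in (5, 10, 20, 40)])
```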
If the domain is an infinite or open interval, for example [0, ∞) or (0, 1), then a
function might not have a global maximum or minimum, as we will see in the next
two problems.
Example 7. Give an example of a continuous function that (a) does not have a
global minimum on [0, ∞), (b) does not have a global maximum on (−2, 2).
Solution: (a) For example, an exponential decay function $y = e^{-x}$ does not have a global minimum on $[0, \infty)$, because it is decreasing and approaching 0 as $x \to \infty$, but it never actually reaches 0, so there is no point $x$ where $e^{-x}$ takes the smallest value.
(b) For example, $y = x^2$ does not have a global maximum on $(-2, 2)$, because it is approaching 4 as $x$ approaches $-2$ or $2$, but it never actually reaches 4 because $-2$ and $2$ are not in the domain, so there is no point $x$ in $(-2, 2)$ where $x^2$ takes the largest value.
Exercise 7. Sketch a graph of a differentiable function y = f (x) on the open interval
(−4, 4) such that
• f(x) has at least one local min on (−4, 4)
• f(x) has at least one local max on (−4, 4)
• f(x) has a global max on (−4, 4)
• f(x) does not have a global min on (−4, 4)
• f(x) has a critical point at x = 3 which is not a local max or min
• f(x) has an inflection point at x = −2
The next two problems will refer to the following two figures.
Once we know that $f'(x) \ge 0$, this means that $f(x)$ is increasing, so its global maximum on the interval $[-4, 4]$ is at $x = 4$, and its global minimum is at $x = -4$, as in the figure.
Exercise 8. In the figure above on the right we are given the graph of the second
derivative f ′′ (x) of some function y = f (x) on the interval [−4, 4]. If f ′ (1) = 0,
where is the global maximum and minimum of f (x) on this interval?
$$y'(x) = \frac{5x^4(1 - x)^2 - 2x^5(1 - x)}{1/168} = \frac{x^4(1 - x)}{1/168}\bigl(5(1 - x) - 2x\bigr) = 0$$
when $x = 0$, $x = 1$, or when $5(1 - x) - 2x = 0$, i.e. $x = \frac{5}{7} = 0.714$. At the endpoints $y(0) = y(1) = 0$, while $y(0.714) = 2.5499$, so the most likely grade is $x = 0.714$, or about 71. By the way, if you were wondering, the constant $\frac{1}{168}$ was chosen in such a way that the area under the curve is equal to 1, representing all students.
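A quick check of this example, assuming (as the constant $\frac{1}{168}$ suggests) that the curve is $y = \frac{x^5(1-x)^2}{1/168} = 168x^5(1-x)^2$ on $[0, 1]$. The sketch evaluates the peak at $x = \frac{5}{7}$ and approximates the area under the curve with a midpoint sum.

```python
def y(x):
    # Assumed curve: x^5 (1 - x)^2 divided by the constant 1/168.
    return 168 * x ** 5 * (1 - x) ** 2

x_star = 5 / 7
print(x_star, y(x_star))

# Midpoint rule for the area under the curve on [0, 1].
n = 100_000
area = sum(y((i + 0.5) / n) for i in range(n)) / n
print(area)
```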
Answer to Exercise 3. The perimeter $P$ and the area $A$ of the region are
$$P = 2x + y + \pi\frac{y}{2} = 2x + \frac{2+\pi}{2}y, \qquad A = xy + \frac{1}{2}\pi\Bigl(\frac{y}{2}\Bigr)^2 = xy + \frac{\pi y^2}{8}.$$
The perimeter is 100 meters, so $2x + \frac{2+\pi}{2}y = 100$, and $x = 50 - \frac{2+\pi}{4}y$, so we can write the area in terms of $y$ as
$$A = \Bigl(50 - \frac{2+\pi}{4}y\Bigr)y + \frac{\pi y^2}{8} = 50y - \frac{4+\pi}{8}y^2.$$
Since $A'(y) = 50 - \frac{4+\pi}{4}y = 0$ when $y = \frac{200}{4+\pi} = 28.0049$, this is the only critical point. Since $2x + \frac{2+\pi}{2}y = 100$, $\frac{2+\pi}{2}y$ must be between 0 and 100, so $y \in [0, \frac{200}{2+\pi}] = [0, 38.8984]$, and the critical point $28.0049$ is inside this domain. It remains to compare the values: $A = 0$ at $y = 0$, $A = 700.1239$ at $y = 28.0049$, and $A = 594.1889$ at $y = 38.8984$. The maximal area is $A = 700.1239$ when $y = 28.0049$ and $x = 14.0024$.
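The numbers in this answer can be reproduced from $A(y) = 50y - \frac{4+\pi}{8}y^2$ on $[0, \frac{200}{2+\pi}]$; the sketch below only evaluates the formulas derived above.

```python
import math

def area(y):
    # A(y) = 50y - (4 + pi)/8 * y^2, valid for 0 <= y <= 200/(2 + pi)
    return 50 * y - (4 + math.pi) / 8 * y ** 2

y_crit = 200 / (4 + math.pi)           # critical point
y_max = 200 / (2 + math.pi)            # right endpoint of the domain
x_crit = 50 - (2 + math.pi) / 4 * y_crit
print(round(y_crit, 4), round(x_crit, 4))
print(area(0), round(area(y_crit), 4), round(area(y_max), 4))
```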
Answer to Exercise 5. If the distance between A and C is $x$, then the swimming distance between the swimmer and C is $\sqrt{20^2 + x^2}$ and the running distance between C and B is $60 - x$. Since the swimming speed is 1 m/s and the running speed is 4 m/s, the total time to reach her umbrella is
$$T = \frac{\sqrt{20^2 + x^2}}{1} + \frac{60 - x}{4} = \sqrt{400 + x^2} + \frac{60 - x}{4}.$$
We want to minimize this for $x$ between 0 and 60. Let us find critical points:
$$T'(x) = \frac{2x}{2\sqrt{400 + x^2}} - \frac{1}{4} = \frac{x}{\sqrt{400 + x^2}} - \frac{1}{4} = 0$$
when $4x = \sqrt{400 + x^2}$ and, squaring both sides,
$$16x^2 = 400 + x^2 \implies x^2 = \frac{400}{15} \implies x = \sqrt{\frac{400}{15}} = 5.16.$$
This critical value is in the domain $[0, 60]$ and, plugging it in, we get $T = 34.36$ seconds. Then we check the endpoints, $T(0) = 35$ and $T(60) = 63.24$, and we see that the optimal point C is at the distance 5.16 meters from A.
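A sketch that evaluates $T$ at the critical point and at the two endpoints and picks the smallest, mirroring the comparison in the answer. For reference, at the critical point $T$ simplifies to $15 + 5\sqrt{15} \approx 34.36$ seconds.

```python
import math

def T(x):
    # swim sqrt(20^2 + x^2) m at 1 m/s, then run (60 - x) m at 4 m/s
    return math.sqrt(400 + x ** 2) / 1 + (60 - x) / 4

x_crit = math.sqrt(400 / 15)
candidates = {0: T(0), x_crit: T(x_crit), 60: T(60)}
best = min(candidates, key=candidates.get)
print(round(x_crit, 2), {round(k, 2): round(v, 2) for k, v in candidates.items()})
print("optimal x:", round(best, 2))
```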
$$\theta'(d) = \frac{1}{1 + (\frac{92}{d})^2}\Bigl(-\frac{92}{d^2}\Bigr) - \frac{1}{1 + (\frac{46}{d})^2}\Bigl(-\frac{46}{d^2}\Bigr) = \frac{46}{d^2 + 46^2} - \frac{92}{d^2 + 92^2}.$$
If we set this equal to zero and solve for $d$, we get $d^2 = 92\cdot 46$, so $d = 65.05$. To check that this is the global maximum, we can notice that $\theta(d)$ approaches 0 both when $d \to 0$ and when $d \to \infty$. This can be seen from the figure, or checked using that $\arctan(0) = 0$ and $\arctan(x) \to \frac{\pi}{2}$ as $x \to \infty$. The best viewing angle of the Statue of Liberty is at the distance of $\approx 65$ meters.
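A numerical check, assuming the viewing angle is $\theta(d) = \arctan(\frac{92}{d}) - \arctan(\frac{46}{d})$. This is our reading of the derivative above, since the definition of $\theta$ is not part of this excerpt. The derivative should vanish at $d = \sqrt{92\cdot 46}$, and $\theta$ should be small for very small and very large $d$.

```python
import math

def theta(d):
    # Assumed viewing angle: arctan(92/d) - arctan(46/d)
    return math.atan(92 / d) - math.atan(46 / d)

def theta_prime(d):
    return 46 / (d ** 2 + 46 ** 2) - 92 / (d ** 2 + 92 ** 2)

d_star = math.sqrt(92 * 46)
print(round(d_star, 2), theta_prime(d_star))
print([round(theta(d), 4) for d in [1, 30, d_star, 150, 10_000]])
```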
Answer to Exercise 8. Because f ′′ (x) in the figure is positive for x < 1, f ′ (x) is
increasing for x < 1, and because f ′′ (x) is negative for x > 1, f ′ (x) is decreasing
for x > 1. This means that x = 1 is the global maximum of f ′ (x). We are given that
f ′ (1) = 0, so f ′ (x) is never positive. Once we know that f ′ (x) ≤ 0, this means that
f (x) is decreasing, so its global maximum on the interval [−4, 4] is at x = −4, and
its global minimum is at x = 4.
2. logistic curve $y = \frac{c}{1 + e^{-b(t-a)}}$;
3. exponential with a limit curve $y = c(1 - e^{-bt})$;
(a) the number of bacteria in a Petri dish grows quickly from the beginning, but then
runs out of food and dies out quickly;
(b) the number of bacteria in a Petri dish grows faster and faster initially and then
stabilizes;
(c) the number of bacteria in a Petri dish grows quickly from the beginning until it
stabilizes;
(d) the number of bacteria in a Petri dish grows faster and faster, but then runs out
of food and dies out over time;
(e) the number of bacteria in a Petri dish grows quickly from the beginning, but then
runs out of food and dies out over time.
Bell curve. First, let us introduce the so-called bell curve given by
$$y = ce^{-\frac{(x-a)^2}{2b^2}}$$
where $a \in \mathbb{R}$ is any real number, and $b > 0$, $c > 0$ are any positive numbers. Various
features of these curves are summarized in the figure below, and we will check
some of them in the next two problems. This family of curves is most famously
used to describe (or model) the distribution of many quantities occurring in nature,
physical experiments, finance, etc.16 Here, we are simply interested in its shape
16 https://en.wikipedia.org/wiki/Normal_distribution#Occurrence_and_applications
and basic properties. In the first example we will see how we can obtain all bell
curves by rescaling one of them.
Example 1. Given a standard bell curve $y = e^{-\frac{x^2}{2}}$ with parameters $a = 0$, $b = 1$ and $c = 1$, if we stretch its graph horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, what is the function corresponding to the resulting curve?
Solution: Recall that stretching a graph of $f(x)$ horizontally $b$ times corresponds to $f(\frac{x}{b})$, then stretching the result vertically $c$ times corresponds to $cf(\frac{x}{b})$ and, finally, shifting to the right by $a$ corresponds to $cf(\frac{x-a}{b})$. When $f(x) = e^{-x^2/2}$, we will get $y = ce^{-(x-a)^2/(2b^2)}$, which is the general bell curve above.
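A numerical confirmation that rescaling the standard bell curve gives the general formula; the parameter values $a = 1.5$, $b = 0.8$, $c = 3$ are arbitrary choices for the check.

```python
import math

def f(x):
    return math.exp(-x ** 2 / 2)  # standard bell curve (a = 0, b = 1, c = 1)

def rescaled(x, a, b, c):
    return c * f((x - a) / b)  # stretch by b horizontally, by c vertically, shift by a

def bell(x, a, b, c):
    return c * math.exp(-(x - a) ** 2 / (2 * b ** 2))

a, b, c = 1.5, 0.8, 3.0  # arbitrary sample parameters
xs = [-2 + 0.25 * i for i in range(25)]
print(max(abs(rescaled(x, a, b, c) - bell(x, a, b, c)) for x in xs))
print(rescaled(a, a, b, c))  # the peak value c at x = a
```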
Exercise 1. Show that $y = e^{-\frac{x^2}{2}}$ has the global maximum $y = 1$ at $x = 0$ and two inflection points at $x = -1$, $x = +1$. Explain how this, together with the previous example, implies the location of the maximum ($y = c$ at $x = a$) and inflection points ($x = a \pm b$) for the general bell curve above.
Logistic curve. Next, let us consider the so-called logistic function given by
$$y = \frac{c}{1 + e^{-b(x-a)}} = \frac{c}{1 + \kappa e^{-bx}}$$
where parameter $a$ is any real number, $b > 0$, $c > 0$, $\kappa > 0$ are any positive numbers, and where parameters $a$ and $\kappa$ are interchangeable and related by $\kappa = e^{ba}$ or $a = \frac{1}{b}\ln(\kappa)$. Logistic curves have many applications, for example, in modelling various growth processes.¹⁷
The two formulas above are just slightly different representations of the same function because $e^{-b(x-a)} = e^{ba}e^{-bx} = \kappa e^{-bx}$.
Exercise 2. Show that $y = \frac{1}{1+e^{-x}}$ is increasing and has one inflection point at $x = 0$, where $y(0) = \frac{1}{2}$. Discuss how this, together with the previous example, explains the shape of the general logistic curve above.
Exponential with a limit. Next, we will introduce the exponential with a limit
function given by
y = c(1 − e−bx )
for x ≥ 0, where b > 0 and c > 0 are any positive numbers. Notice that here our
domain starts at x = 0 and we do not shift the function horizontally, so we do not
have a parameter a as in the above two families. We think of x = 0 as the starting
point of the process, although we could, of course, introduce a horizontal shift if
we wanted to.
$$y = cxe^{-bx}$$
for $x \ge 0$, where $b > 0$, $c > 0$ are any positive numbers. Similarly to the exponential with a limit, the domain here starts at zero and the function grows quickly from the beginning, but eventually starts decreasing. The function has a horizontal asymptote $y = 0$ as $x \to +\infty$ because $cxe^{-bx} = \frac{cx}{e^{bx}}$ and exponential growth $e^{bx}$ dominates the power function $x$ at infinity. The constant $c$ here does not quite play the same role as before (stretching vertically by $c$) for the reason explained in the next example.
$$y = ce^{-e^{-b(x-a)}} = ce^{-\kappa e^{-bx}}$$
where $\kappa > 0$, $b > 0$, $c > 0$ are any positive numbers, $a \in \mathbb{R}$ is any real number, and where $\kappa = e^{ba}$, or $a = \frac{1}{b}\ln(\kappa)$. Gompertz functions are used to model growth of tumours, adoption of technology (e.g. cellphones), etc.¹⁸ As in the case of the logistic function, the two formulas above are just different representations of the same function, where the parameter $a$ can be replaced by $\kappa$ or vice versa. All Gompertz functions are rescalings of one of them.
Example 6. Given a standard Gompertz curve $y = e^{-e^{-x}}$ with parameters $a = 0$, $b = 1$ and $c = 1$, if we shrink its graph horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, what is the function corresponding to the resulting curve? What are the horizontal asymptotes of $y = e^{-e^{-x}}$?
Solution: Shrinking a graph of $f(x)$ horizontally $b$ times corresponds to $f(bx)$, then stretching the result vertically $c$ times corresponds to $cf(bx)$ and, finally, shifting to the right by $a$ corresponds to $cf(b(x - a))$. When $f(x) = e^{-e^{-x}}$, we will get $y = ce^{-e^{-b(x-a)}}$, which is the general Gompertz curve above. Because $e^{-x} \to 0$ as $x \to \infty$, we see that $e^{-e^{-x}} \to e^{-0} = 1$, and because $e^{-x} \to \infty$ as $x \to -\infty$, we see that $e^{-e^{-x}} \to e^{-\infty} = 0$. This matches the horizontal asymptotes of the general Gompertz function in the figure above after vertical rescaling by $c$.
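A similar check for the Gompertz family, with arbitrary sample parameters: the two formulas agree when $\kappa = e^{ba}$, the general curve equals $cf(b(x - a))$ for the standard curve $f$, and the standard curve is close to 1 far to the right and close to 0 far to the left.

```python
import math

def gompertz(x, a=0.0, b=1.0, c=1.0):
    """General Gompertz curve c * exp(-exp(-b (x - a))); defaults give the standard curve."""
    return c * math.exp(-math.exp(-b * (x - a)))

def gompertz_kappa(x, kappa, b, c):
    """Same family written with kappa: c * exp(-kappa * exp(-b x))."""
    return c * math.exp(-kappa * math.exp(-b * x))

a, b, c = 2.0, 0.7, 5.0  # arbitrary sample parameters
kappa = math.exp(b * a)
xs = [-3 + 0.5 * i for i in range(20)]

# The two formulas agree when kappa = e^{ba}.
form_gap = max(abs(gompertz(x, a, b, c) - gompertz_kappa(x, kappa, b, c)) for x in xs)
# Rescaling the standard curve f: c * f(b (x - a)).
rescale_gap = max(abs(gompertz(x, a, b, c) - c * gompertz(b * (x - a))) for x in xs)
print(form_gap, rescale_gap)

# Standard curve: near 1 far to the right, near 0 far to the left.
print(gompertz(30), gompertz(-5))
```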
18 https://en.wikipedia.org/wiki/Gompertz_function#Example_uses
$$y = ce^{-e^{b(x-a)}} = ce^{-\kappa e^{bx}}$$
where $\kappa > 0$, $b > 0$, $c > 0$ are any positive numbers, $a \in \mathbb{R}$ is any real number, and where $\kappa = e^{-ba}$, or $a = -\frac{1}{b}\ln(\kappa)$. Changing $b$ to $-b$ flips the Gompertz growth functions horizontally and turns them into Gompertz decay functions. Most famously, these functions can be used to model human survival chances with age, based on the empirical Gompertz law of mortality.¹⁹ We will not discuss this further here, but you can learn more about it in the video in the footnote link.²⁰ Similarly, by changing $b$ to $-b$, logistic growth above can be turned into logistic decay.
$$f''(x) = e^{-\frac{x^2}{2}}(-x)^2 + e^{-\frac{x^2}{2}}(-1) = e^{-\frac{x^2}{2}}(x^2 - 1),$$
we can see that $f''(x) = 0$ when $e^{-x} = 1$, or $x = 0$; it is positive for $x < 0$ and negative for $x > 0$, so the function is concave up for $x < 0$ and concave down for $x > 0$. This means that $x = 0$ is the only inflection point. If we shrink the graph of $y = \frac{1}{1 + e^{-x}}$ horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, the inflection point will move to $x = a$, where the value will be $y = \frac{c}{2}$, just like in the figure of the general logistic curve.
Answer to Exercise 3. Since $T(0) = k + c(1 - e^{-b \cdot 0}) = k$ and at time $t = 0$ the pie is at room temperature 20°C, this means that $k = 20$. The limit of $T = k + c(1 - e^{-bt})$ as $t \to \infty$ is $k + c(1 - 0) = k + c$, which should be the oven temperature, so $k + c = 200$ and $c = 200 - k = 200 - 20 = 180$. So $T = 20 + 180(1 - e^{-bt})$. The derivative $T'(0) = 180b$ is exactly the initial rate of increase of the pie temperature, so $180b = 18$ and $b = 0.1$. We finally get that $T = 20 + 180(1 - e^{-0.1t})$.
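A short Python sketch can confirm that the fitted formula satisfies the three conditions used above: starting temperature 20°C, limiting temperature 200°C and initial rate 18. The helper names are hypothetical, and the derivative is only approximated by a central difference, not computed symbolically.

```python
import math

def pie_temperature(t, k=20.0, c=180.0, b=0.1):
    """T(t) = k + c(1 - e^{-bt}), with the constants found in the answer."""
    return k + c * (1.0 - math.exp(-b * t))

def derivative(f, t, h=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

print(pie_temperature(0))                # 20.0 (room temperature)
print(pie_temperature(100))              # close to the oven temperature 200
print(derivative(pie_temperature, 0.0))  # close to the initial rate 18
```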
$T(t) = T(y(t)).$
In other words, we plug $y(t)$ into $T(y)$ to get $T(y(t))$, the temperature as a function of time. Sometimes the relationship is given simply by a composition of functions. Since we want to find $T'(t)$, we differentiate the key equation, in this case using the chain rule: $T'(t) = (T(y(t)))' = T'(y(t))\,y'(t) = 0.007$ °C/m × 900 m/min = 6.3 °C/min.
2.9 Related rates 125
$$T'(y(t)) = \frac{T'(t)}{y'(t)} = \frac{2}{400}\ ^\circ\text{C/m} = 5\ ^\circ\text{C/km}.$$
If, instead, we knew how fast the temperature outside is changing, $T'(t)$, and how temperature changes with altitude, $T'(y)$, we could find how fast the airplane is climbing at that moment by solving for $y'(t) = \frac{T'(t)}{T'(y(t))}$. This is an example of implicit differentiation, which will also be useful in the next exercise.
Exercise 2. Bread dough is rising in the oven during
the first few minutes of baking. Its shape is (roughly)
hemispherical and, at the moment when the radius
is $r = 10$ cm, its volume is increasing at the rate of
200 cm³/min. How fast is the radius changing at the
same moment?
Another way is to use implicit differentiation. Taking the derivative of the equation $x(t)^2 + y(t)^2 = 1.69$, we get $2x(t)x'(t) + 2y(t)y'(t) = 0$, and then, solving for $y'(t)$,
$$y'(t) = -\frac{x(t)\,x'(t)}{y(t)}.$$
We know that $x(t) = 1$ and $x'(t) = 0.5$, and we can find $y(t)$ from the key equation again, $y(t) = \sqrt{1.69 - x(t)^2} = \sqrt{1.69 - 1^2} \approx 0.83$, so $y'(t) = -\frac{0.5}{0.83} \approx -0.6$ m/s.
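To see that implicit and explicit differentiation agree here, the following hedged Python sketch computes $y'(t)$ both ways at $x = 1$, $x' = 0.5$. The names are illustrative, and the explicit route approximates $dy/dx$ with a central difference rather than an exact formula.

```python
import math

L_SQUARED = 1.69  # key equation: x^2 + y^2 = 1.69

def y_of_x(x):
    """Solve the key equation for y > 0."""
    return math.sqrt(L_SQUARED - x**2)

x, dx_dt = 1.0, 0.5
y = y_of_x(x)

# Implicit differentiation: 2x x' + 2y y' = 0  =>  y' = -x x' / y
dy_dt_implicit = -x * dx_dt / y

# Explicit route: y'(t) = (dy/dx) * x'(t), with dy/dx from a central difference
h = 1e-6
dy_dx = (y_of_x(x + h) - y_of_x(x - h)) / (2 * h)
dy_dt_explicit = dy_dx * dx_dt

print(round(y, 2), round(dy_dt_implicit, 2))  # 0.83 -0.6
```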
and is approaching the statue at a speed of 5 m/s, how fast does the angle of view θ of the statue change?
Answer to Exercise 1. The variables are radius $r$, area $A$ and time $t$. We are given that $r'(t) = 1$ mm/day when $r(t) = 5$ mm. We want to know $A'(t)$. The key equation relating the radius and area is $A = \pi r^2$, so $A(t) = \pi (r(t))^2$. Differentiating this equation, we get $A'(t) = 2\pi r(t) \cdot r'(t) = 2\pi \cdot 5 \cdot 1 = 10\pi$ mm²/day.
Answer to Exercise 2. The variables are radius $r$, volume $V$ and time $t$. We are given that $V'(t) = 200$ cm³/min when $r(t) = 10$ cm. We want to know $r'(t)$. The key equation relating the radius and volume of half of a sphere is $V = \frac{2\pi}{3} r^3$, so $V(t) = \frac{2\pi}{3}(r(t))^3$. To find $r'(t)$, we can solve for $r(t)$ first and then take the derivative, or we can use implicit differentiation. Let us use implicit differentiation. Differentiating the equation $V(t) = \frac{2\pi}{3}(r(t))^3$, we get $V'(t) = 2\pi (r(t))^2 \cdot r'(t)$. Solving for $r'(t)$, we get
$$r'(t) = \frac{V'(t)}{2\pi (r(t))^2} = \frac{200}{2\pi (10)^2} \approx 0.3183 \text{ cm/min}.$$
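A minimal Python check of this answer, assuming the hemisphere model $V = \frac{2\pi}{3}r^3$ from the text: it computes $r'(t)$ from the formula and then confirms, with a finite-difference $dV/dr$ (our own cross-check, not part of the original solution), that the chain rule gives back $V'(t) = 200$.

```python
import math

def radius_rate(dV_dt, r):
    """r'(t) for a hemisphere V = (2*pi/3) r^3, from V'(t) = 2*pi*r^2 * r'(t)."""
    return dV_dt / (2 * math.pi * r**2)

def hemisphere_volume(r):
    return 2 * math.pi / 3 * r**3

rate = radius_rate(200.0, 10.0)
# Cross-check with the chain rule: V'(t) = (dV/dr) * r'(t), dV/dr by central difference
h = 1e-6
dV_dr = (hemisphere_volume(10.0 + h) - hemisphere_volume(10.0 - h)) / (2 * h)
print(round(rate, 4), round(dV_dr * rate, 3))  # 0.3183 200.0
```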
Answer to Exercise 3. The variables are $x$, $y$, $\theta$ and time $t$. We are given that $y'(t) = 400$ m/min and $\theta'(t) = -1.8$ rad/min when $y = 550$ and $\theta = \frac{\pi}{6}$. We want to find $x'(t)$ at the same moment. The relationship between the variables from the right triangle is $\tan(\theta) = \frac{y}{x}$. Solving for $x$, we get $x(t) = y(t)\cot(\theta(t))$, and taking the derivative (also recall or check that $(\cot(x))' = -\csc^2(x) = -\frac{1}{\sin^2(x)}$),
Answer to Exercise 4. We are given that $d'(t) = -5$ m/s (minus sign because the boat is moving toward the statue, so the distance is decreasing) when $d(t) = 100$ m, and we want to know $\theta'(t)$, so we need to find the equation relating $d$ and $\theta$. Let $\alpha$ be the angle between the horizontal line and the line of sight to the bottom of the Statue of Liberty, and let $\beta$ be the angle between the horizontal line and the line of sight to the top of the Statue of Liberty, as in the figure. Then $\theta = \beta - \alpha$. From the right triangles, we see that
$$\tan(\alpha) = \frac{46}{d} \quad\text{and}\quad \tan(\beta) = \frac{92}{d},$$
so $\alpha = \arctan\left(\frac{46}{d}\right)$, $\beta = \arctan\left(\frac{92}{d}\right)$, and $\theta(t) = \arctan\left(\frac{92}{d(t)}\right) - \arctan\left(\frac{46}{d(t)}\right)$. This is our key equation. Differentiating it, we get
$$\begin{aligned}
\theta'(t) &= \frac{1}{1 + \left(\frac{92}{d(t)}\right)^2}\left(-\frac{92}{d(t)^2}\right)d'(t) - \frac{1}{1 + \left(\frac{46}{d(t)}\right)^2}\left(-\frac{46}{d(t)^2}\right)d'(t)\\
&= \left(\frac{46}{d(t)^2 + 46^2} - \frac{92}{d(t)^2 + 92^2}\right)d'(t)\\
&= \left(\frac{46}{100^2 + 46^2} - \frac{92}{100^2 + 92^2}\right)(-5) \approx 0.0059 \text{ rad/s}.
\end{aligned}$$
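As a check on the algebra in the last display, the sketch below evaluates the simplified formula for $\theta'(t)$ at $d = 100$, $d' = -5$ and compares it with a finite-difference estimate obtained by moving the boat a tiny step in time along $d(t) = 100 - 5t$. The function names are ours; the heights 46 and 92 are the values used in the answer.

```python
import math

def view_angle(d):
    """theta = arctan(92/d) - arctan(46/d), with d the distance in meters."""
    return math.atan(92 / d) - math.atan(46 / d)

def view_angle_rate(d, dd_dt):
    """theta'(t) = (46/(d^2 + 46^2) - 92/(d^2 + 92^2)) * d'(t)."""
    return (46 / (d**2 + 46**2) - 92 / (d**2 + 92**2)) * dd_dt

rate = view_angle_rate(100.0, -5.0)

# Numerical derivative of theta(d(t)) at t = 0, where d(t) = 100 - 5t
h = 1e-6
numeric = (view_angle(100.0 - 5 * h) - view_angle(100.0 + 5 * h)) / (2 * h)
print(round(rate, 4), round(numeric, 4))  # 0.0059 0.0059
```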
Chapter 3
Integrals
129
130 3 Integrals
D(b) − D(a) = A1 − A2 .
Example 1. A car drives in a ‘positive direction’ on some road at a speed of 20 mph between 2 p.m. and 5 p.m., then turns around and drives in the opposite direction at a speed of 40 mph between 5 p.m. and 9 p.m., and then turns around again and drives in the original direction at a speed of 50 mph between 9 p.m. and 11 p.m. What is the total distance travelled by the car, and what is the change in its position along this road? Draw the graph of the velocity and check that the calculation matches the above formula in terms of areas.
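Because every piece of the velocity graph in Example 1 is a rectangle, the computation reduces to adding signed and unsigned products of velocity and time. A minimal Python sketch, assuming a 24-hour clock and taking the original direction as positive (the list `legs` is our own encoding of the data):

```python
# Each tuple: (start hour, end hour, velocity in mph); 2 p.m. = 14, and so on.
legs = [(14, 17, 20), (17, 21, -40), (21, 23, 50)]

# Position change: signed areas v * duration, i.e. A1 - A2 + A3
displacement = sum(v * (end - start) for start, end, v in legs)
# Total distance: unsigned areas |v| * duration, i.e. A1 + A2 + A3
distance = sum(abs(v) * (end - start) for start, end, v in legs)
print(distance, displacement)  # 320 0
```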
3.1 Definite integrals: the case of velocity 131
Exercise 1. The table below gives the velocity of the car driving along some road
during different time intervals. Compute the total distance travelled and the change
in position.
t 1 to 3 p.m. 3 to 5 p.m. 5 to 7 p.m. 7 to 9 p.m.
v(t) 60 mph −55 mph 50 mph −60 mph
Sketch the graph of the velocity and check that various distances are exactly the
areas in your figure.
For the future, it is convenient to modify the formula Distance = Speed × Time slightly and rewrite it as
Position Change = Velocity × Time,
where we simply take into account the positive or negative direction. When the position $D(t)$ is decreasing, Position Change = −Distance and Velocity = −Speed, so it is still the same formula, just with a minus sign on both sides. It is more convenient because we do not need to mention the direction explicitly, since it is reflected by the plus or minus sign.
Riemann sum approximation. In the two problems above, the time interval
[a, b] was divided into several subintervals where the velocity was constant. If the
velocity is not constant, we can still use the same calculation to approximate the
distance travelled by the car and its change in position. To do that, we can divide the
time interval [a, b] into many small subintervals and, because the velocity cannot
change much over a very short period of time, the velocity is almost constant on
each subinterval. This means that if we measure the velocity at any particular time
on a small subinterval, we will get an approximation
∆D ≈ Velocity × ∆t
In the figures above we divided the interval [a, b] into 20 subintervals and then
made two possible choices of velocity on each subinterval: at the left endpoint or
at the right endpoint. The sum corresponding to the left endpoint is called the left
Riemann sum, and to the right endpoint is called the right Riemann sum.
Example 2. The following table shows the velocity v(t) of a bowling ball (in meters per second) at time t (in seconds), from the moment it was released until it hit the pins.
t 0 0.5 1 1.5 2 2.5
v(t) 8.6 7.7 7.15 6.8 6.6 6.5
Estimate the length of the bowling lane from above and below.
Solution: Because of friction, the bowling ball is
slowing down, which is also reflected in the above
table, because the values v(t) are decreasing. As a
result, the speed is the highest at the beginning of
each interval (left endpoint), and it is lowest at the
end of each interval (right endpoint). This means
that the left Riemann sum, in this case correspond-
ing to 5 intervals of length ∆t = 0.5 each,
8.6 × 0.5 + 7.7 × 0.5 + 7.15 × 0.5 + 6.8 × 0.5 + 6.6 × 0.5 = 18.425,
overestimates the total distance travelled by the ball. The right Riemann sum
7.7 × 0.5 + 7.15 × 0.5 + 6.8 × 0.5 + 6.6 × 0.5 + 6.5 × 0.5 = 17.375
underestimates the total distance travelled by the ball. We conclude that the length
of the lane is between 17.375 and 18.425 meters. Notice how in the figure above,
because the function is decreasing, the green rectangles with height given by the
left endpoints are above the graph of y = v(t) and the blue rectangles with height
given by the right endpoints are below the graph, so the area below the graph is,
indeed, in between the right and left Riemann sums.
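The two sums above can be reproduced with a small Python helper. The name `left_right_sums` is hypothetical; it accepts unequal spacing even though the table here is equally spaced.

```python
times = [0, 0.5, 1, 1.5, 2, 2.5]
velocities = [8.6, 7.7, 7.15, 6.8, 6.6, 6.5]

def left_right_sums(ts, vs):
    """Left and right Riemann sums for tabulated values v(t_0), ..., v(t_n)."""
    left = sum(vs[i] * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))
    right = sum(vs[i + 1] * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))
    return left, right

left, right = left_right_sums(times, velocities)
print(round(left, 3), round(right, 3))  # 18.425 17.375
```

Since the velocity is decreasing, the left sum is the upper estimate and the right sum is the lower one, exactly as in the solution.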
Exercise 2. The following table shows the speed v(t) of the bowling ball (in meters
per second) at time t (in seconds), between the moment it was dropped from the
roof of a building until it hit the ground.
t 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
v(t) 0 2.45 4.9 7.35 9.8 12.25 14.7 17.15 19.6
(a) Estimate the height of the building from above and below using four equal subin-
tervals.
(b) Estimate the height of the building from above and below using eight equal
subintervals.
(c) If the speed is v(t) = 9.8t, plot its graph and deduce what the exact height of the
building is using the area formula.
Example 3. Given the following table of velocity v(t) (in meters per second) at time t (in seconds) between t = 0 and t = 3,
t 0 0.5 1 1.5 2 2.5 3
v(t) 1.6 0.7 0.15 −0.2 −0.4 −0.5 −0.55
which of the following expressions is not a Riemann sum estimate of the position change D(3) − D(0), and why?
(a) (1.6 + 0.15 − 0.4) × 1
(b) (0.7 + 0.15 − 0.2 − 0.4 − 0.5 − 0.55) × 0.5
(c) (1.6 + 0.7 + 0.15 − 0.2 − 0.4 − 0.5 − 0.55) × 0.5
(d) (1.6 + 0.7 + 0.15 − 0.2 − 0.4 − 0.5) × 0.5
Solution: (a) Notice that here we multiply the velocity values by 1, not 0.5, so
the increment of time is ∆t = 1, which means that this sum could correspond to a
Riemann sum with 3 subintervals on the interval [0, 3]. This means that we need
to look at the values of v(t) at t = 0, 1, 2 and 3, and we can recognize that (1.6 +
0.15 − 0.4) × 1 is a left Riemann sum v(0) × ∆t + v(1) × ∆t + v(2) × ∆t.
(b) This is a right Riemann sum with ∆t = 0.5 and 6 subintervals. The number
of terms matches the number of subintervals.
(d) This is a left Riemann sum with ∆t = 0.5 and 6 subintervals.
(c) Here, it looks like ∆t = 0.5, but this is not a Riemann sum with ∆t = 0.5 and
6 subintervals, because the number of terms is 7 and it does not match the number
of subintervals. By using all the values in the table we are overcounting, or more
precisely, we are counting one of the intervals twice, at the left and right endpoints.
Exercise 3. Given velocity $v(t) = e^{-t}$ between $t = 0$ and $t = 4$, which of the following is not a Riemann sum estimate of the position change, and why?
(a) $(e^{-2} + e^{-4}) \times 2$
(b) $(e^{-1} + e^{-2} + e^{-3} + e^{-4}) \times 1$
(c) $(1 + e^{-0.5} + e^{-1} + e^{-1.5} + e^{-2} + e^{-2.5} + e^{-3} + e^{-3.5}) \times 0.5$
(d) $(e^{0} + e^{-1} + e^{-2} + e^{-3} + e^{-4}) \times 1$
Constant acceleration. Let us now consider several examples with constant
acceleration, which makes the areas particularly easy to compute.
Example 4. An object is moving along a straight line
with initial velocity 4 m/s and acceleration −1 m/s².
What is the total distance travelled and position
change at time t = 9 s?
Solution: Acceleration is the derivative of velocity,
so constant acceleration means that velocity v(t) is a
linear function with the slope equal to acceleration.
In our case, the slope is −1 and initial velocity is
v(0) = 4, so the function is v(t) = 4 − t.
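The remaining step is to compare the two triangles under the graph of $v(t) = 4 - t$: one above the axis on $[0, 4]$ and one below it on $[4, 9]$. A hedged Python sketch of that computation follows, with a midpoint Riemann sum (our own helper, not from the text) as an independent cross-check.

```python
def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dt = (b - a) / n
    return sum(f(a + (i + 0.5) * dt) for i in range(n)) * dt

def v(t):
    return 4 - t  # velocity in m/s: initial value 4, slope (acceleration) -1

A1 = 0.5 * 4 * 4  # triangle above the t-axis on [0, 4]
A2 = 0.5 * 5 * 5  # triangle below the t-axis on [4, 9]
print("displacement:", A1 - A2, "distance:", A1 + A2)

# Cross-check with Riemann sums of the velocity v and of the speed |v|
print(midpoint_sum(v, 0, 9, 900), midpoint_sum(lambda t: abs(v(t)), 0, 9, 900))
```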
Exercise 4. A baseball is thrown from height 0 directly upwards with speed 29.4
m/s. What is the height of the baseball at time t = 5 seconds?
$$\text{Slope} = -\frac{v_0}{t_0} = -9.8, \qquad \text{Area} = \frac{1}{2}v_0 t_0 = 78.4.$$
From the first equation, we get $v_0 = 9.8t_0$ and, plugging this into the second equation, we get $\frac{1}{2} \cdot 9.8\,t_0^2 = 78.4$, or $t_0^2 = 16$, or $t_0 = 4$. Then $v_0 = 9.8t_0 = 9.8 \times 4 = 39.2$ m/s.
Exercise 5. After spotting a police officer in a school zone with a 10 m/s speed
limit, a surprised driver slams on the brakes and comes to a complete stop. The
police officer was not equipped with a speed radar to see how fast the driver was
going. However, by looking at the tire skid marks the police officer determined that
(a) It took the driver 35 m to come to a complete stop.
(b) The driver was braking (decelerating) at 7 m/s².
Does the officer have enough information to issue a speeding ticket? Hint: Sketch
the graph of velocity and express unknown initial velocity v0 and time to stop t0 in
terms of given information.
Notation and terminology. The change of position D(b) − D(a) is often called
total displacement, which is A1 −A2 as opposed to total distance travelled A1 +A2 .
Now that we understand that the total displacement D(b) − D(a) can be computed
from velocity v(t) = D′ (t) using the areas of the Riemann sum approximations, it
is time to introduce important notation and terminology.
First of all, the difference of areas A1 − A2 above and below the x-axis on the interval $[a, b]$ is called the definite integral of $v(t)$ on the interval $[a, b]$, and it is denoted
$$\int_a^b v(t)\,dt.$$
A general Riemann sum approximating this integral is denoted
$$\sum_{i=1}^{n} v(t_i^*)\,\Delta t,$$
where $v(t)$ can be replaced by a specific formula, and where the meaning of each symbol is as follows:
• The symbol Σ (sigma) represents the sum.
• If we divide the interval $[a, b]$ into $n$ small subintervals, the letter $i$ represents the index enumerating these subintervals from 1 to $n$. For example, $i = 3$ means that we are looking at the third subinterval starting from $t = a$. Other letters can be used instead of $i$, for example, $k$, $\ell$, $m$, etc. Below Σ we write $i = 1$ to indicate that the first interval index is 1, and above Σ we write $n$ to indicate that the last interval index is $n$. Sometimes 1 and $n$ can change depending on the context.
• The factor $\Delta t$ represents the increment of time $t$, which is also the width of the subintervals.
• The points $t_i^*$ represent some specific choice of a point inside subinterval $i$. For example, it could be the left endpoint or the right endpoint in the case of the left or right Riemann sums. Because $v(t_i^*)$ is the ±height of rectangle $i$ and $\Delta t$ is its width, $v(t_i^*)\Delta t$ is exactly the ±area of this rectangle, and the notation $\sum_{i=1}^{n}$ means that we are adding up these terms, just like we did in the above problems.
Example 6. Given a function v(t) = cos(t):
(a) How do we denote its definite integral on the interval [0, π]?
(b) What is the definition of the definite integral on the interval [0, π]? Using the
definition, what is this integral?
(c) What is another meaning of the definite integral on the interval [0, π]?
(d) How can we compute the definite integral on the interval [0, π]?
(e) How do we denote a general Riemann sum for v(t) on the interval [0, π]?
(f) Write down the right Riemann sum on the interval [0, π] with four subintervals.
Solution: (a) The definite integral of $y = \cos(t)$ on the interval $[0, \pi]$ is denoted $\int_0^\pi \cos(t)\,dt$.
(b) Its definition is the difference A1 − A2 of areas above and below the x-axis on this interval, as in the figure. In the case of cosine, these two areas are the same, so they cancel out and $\int_0^\pi \cos(t)\,dt = 0$.
(c) Another meaning of this definite integral is the total displacement $D(\pi) - D(0)$ of an object moving along a straight line with velocity $v(t) = \cos(t)$. In other words, if $D'(t) = \cos(t)$ then $D(\pi) - D(0) = \int_0^\pi \cos(t)\,dt = 0$.
(d) Above, we noticed that this definite integral is zero because the areas A1 and A2 cancel out, but we can also compute this integral by approximating it with Riemann sums.
(e) A general Riemann sum in this case is $\sum_{i=1}^n \cos(t_i^*)\,\Delta t$. We can be a bit more specific and replace $\Delta t$ with $\frac{\pi}{n}$, because the interval has length $\pi$ and we divide it into $n$ equal subintervals: $\sum_{i=1}^n \cos(t_i^*)\frac{\pi}{n}$.
(f) If we divide $[0, \pi]$ into four subintervals, then the right endpoints will be $t_1^* = \frac{\pi}{4}$, $t_2^* = \frac{\pi}{2}$, $t_3^* = \frac{3\pi}{4}$ and $t_4^* = \pi$, so the right Riemann sum will be
$$\left(\cos\left(\tfrac{\pi}{4}\right) + \cos\left(\tfrac{\pi}{2}\right) + \cos\left(\tfrac{3\pi}{4}\right) + \cos(\pi)\right)\frac{\pi}{4} = -\frac{\pi}{4} \approx -0.7854.$$
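To see how the right Riemann sums from part (f) behave as the number of subintervals grows, here is a minimal Python sketch (the helper name is ours). By the symmetry of cosine about $\pi/2$, the right sum with $n$ subintervals works out to $-\pi/n$, so it shrinks toward the value 0 found in part (b).

```python
import math

def right_riemann_sum(f, a, b, n):
    """Sum of f(t_i*) * dt, where t_i* is the right endpoint of subinterval i."""
    dt = (b - a) / n
    return sum(f(a + i * dt) for i in range(1, n + 1)) * dt

print(round(right_riemann_sum(math.cos, 0, math.pi, 4), 4))  # -0.7854
for n in (10, 100, 1000):
    # Each value equals -pi/n up to rounding, so it approaches 0
    print(n, right_riemann_sum(math.cos, 0, math.pi, n))
```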
Exercise 6. Given a function v(t) = 1 − |t|:
(a) How do we denote its definite integral on the interval [−1, 1]?
(b) What is the definition of the definite integral on the interval [−1, 1]? Using the
definition, what is this integral?
(c) What is another meaning of the definite integral on the interval [−1, 1]?
(d) How can we compute the definite integral on the interval [−1, 1]?
(e) How do we denote a general Riemann sum for v(t) on the interval [−1, 1]?
(f) Write down the left Riemann sum on the interval [−1, 1] with four subintervals.
Example 7. A cat is chasing a mouse, both running along a straight wall. The
mouse never stops running but can reverse direction. At time t ≥ 0, the velocity
of the mouse is v(t) and the distance travelled by the mouse is d(t). What is the
relationship between v(t) and d(t)? Which one of the functions v(t) and d(t) is
invertible?
Solution: The total distance travelled is the sum of areas A1 + A2. When the velocity is negative, if we flip its graph around the x-axis, it becomes the speed $|v(t)|$, but the area A2 stays the same, so the area below the graph of the speed $|v(t)|$ is exactly A1 + A2. This means that we can express the total distance travelled at time $t$ as
$$d(t) = \int_0^t |v(s)|\,ds.$$
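A numerical illustration of the difference between displacement and distance, under the assumption (ours, for illustration only) that the velocity is $v(t) = \cos(t)$ on $[0, \pi]$ as in Example 6: a Riemann sum of $v$ approximates A1 − A2, while a Riemann sum of $|v|$ approximates A1 + A2. The helper `midpoint_sum` is hypothetical.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt

v = math.cos  # illustrative velocity v(t) = cos(t) on [0, pi]
displacement = midpoint_sum(v, 0, math.pi)                # approximates A1 - A2 = 0
distance = midpoint_sum(lambda s: abs(v(s)), 0, math.pi)  # approximates A1 + A2 = 2
print(displacement, distance)
```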
Exercise 7. A cat is chasing a mouse, both running along a straight wall. The mouse never stops running but can reverse direction. At time $t$, the velocity of the mouse is $v_m(t)$ and the velocity of the cat is $v_c(t)$. Also, at time $t$, the distance travelled by the mouse is $d_m(t)$ and the distance travelled by the cat is $d_c(t)$. Write down the formulas for the total displacement and the total distance travelled by the cat in terms of the total distance $x$ travelled by the mouse.
Taking a limit. We saw that as we increase the number of subintervals $n$ and the width of the rectangles gets smaller and smaller, Riemann sums get closer and closer to the difference of areas A1 − A2, which we called the definite integral and denoted $\int_a^b v(t)\,dt$. This statement can be written using the limit $n \to \infty$ notation:
$$\lim_{n\to\infty} \sum_{i=1}^n v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt.$$
In particular, this explains the notation $\int_a^b v(t)\,dt$, which resembles the Riemann sum, with the sum Σ now replaced by the integral sign ∫ and the interval indices replaced by the endpoints $a$ and $b$. Since the definite integral is also the change of position $D(b) - D(a)$ on the interval $[a, b]$, a more complete summary of this section is the following formula: if $v(t) = D'(t)$ then
$$\lim_{n\to\infty} \sum_{i=1}^n v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt = D(b) - D(a).$$
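The limit can be watched numerically. The sketch below (helper names are ours) uses the falling-ball speed $v(t) = 9.8t$ on $[0, 2]$ from Exercise 2, whose exact integral is the triangle area 19.6; the left sums approach it from below and the right sums from above as $n$ grows.

```python
def riemann_sum(f, a, b, n, rule="left"):
    """Left or right Riemann sum of f on [a, b] with n equal subintervals."""
    dt = (b - a) / n
    first = 0 if rule == "left" else 1  # index of the first sample point
    return sum(f(a + i * dt) for i in range(first, first + n)) * dt

def v(t):
    return 9.8 * t  # speed of a dropped ball in m/s

exact = 0.5 * 2 * 19.6  # area of the triangle under v(t) on [0, 2]
for n in (4, 8, 80, 800):
    left = riemann_sum(v, 0, 2, n, "left")
    right = riemann_sum(v, 0, 2, n, "right")
    print(n, round(left, 4), round(right, 4), "exact:", exact)
```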
Answer to Exercise 2. The speed is increasing, so the left Riemann sums will underestimate and the right Riemann sums will overestimate the distance travelled by the bowling ball, i.e. the height of the building. (a) The left Riemann sum is $(0 + 4.9 + 9.8 + 14.7) \times 0.5 = 14.7$ and the right Riemann sum is $(4.9 + 9.8 + 14.7 + 19.6) \times 0.5 = 24.5$, so the height is between 14.7 and 24.5 meters.
(b) The left Riemann sum is $(0 + 2.45 + 4.9 + 7.35 + 9.8 + 12.25 + 14.7 + 17.15) \times 0.25 = 17.15$ and the right Riemann sum is $(2.45 + 4.9 + 7.35 + 9.8 + 12.25 + 14.7 + 17.15 + 19.6) \times 0.25 = 22.05$, so the height is between 17.15 and 22.05 meters. Notice how the estimates improved compared to part (a) when we increased the number of intervals. Compare the figures below with 4 and 8 intervals.
(c) If $v(t) = 9.8t$ is linear, the region under its graph is a triangle with base 2 and height 19.6, so its area is $\frac{1}{2} \cdot 2 \cdot 19.6 = 19.6$. This means that the exact height of the building is 19.6 meters.
From the first equation, we get $v_0 = 7t_0$ and, plugging this into the second equation, we get $(7t_0)t_0 = 70$, or $t_0^2 = 10$, or $t_0 = \sqrt{10}$. Then $v_0 = 7t_0 = 7\sqrt{10} \approx 22.14$ m/s, which is above the speed limit.
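The two conditions (slope $-v_0/t_0 = -7$ and area $\frac{1}{2}v_0 t_0 = 35$) can also be combined into the closed form $v_0 = \sqrt{2 \cdot 7 \cdot 35}$. A hedged Python check of the numbers, with variable names of our own choosing:

```python
import math

a = 7.0   # deceleration in m/s^2 (the slope of v(t) is -a)
d = 35.0  # stopping distance in m (the area of the triangle under v(t))

# v0 = a * t0 and v0 * t0 / 2 = d  =>  t0 = sqrt(2d / a), v0 = a * t0
t0 = math.sqrt(2 * d / a)
v0 = a * t0
print(round(t0, 4), round(v0, 2), v0 > 10)  # 3.1623 22.14 True
```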
Answer to Exercise 6. (a) The definite integral of $y = 1 - |t|$ on the interval $[-1, 1]$ is denoted $\int_{-1}^1 (1 - |t|)\,dt$.
(b) Its definition is the difference A1 − A2 of areas above and below the x-axis on this interval, as in the figure. In the case of $1 - |t|$, A1 = ½ · 1 · 2 = 1 and A2 = 0, so $\int_{-1}^1 (1 - |t|)\,dt = 1$.
(c) Another meaning of this definite integral is the total displacement $D(1) - D(-1)$ of an object moving along a straight line with velocity $v(t) = 1 - |t|$. In other words, if $D'(t) = 1 - |t|$ then $D(1) - D(-1) = \int_{-1}^1 (1 - |t|)\,dt = 1$.
(d) Above, we noticed that this definite integral is 1, but we can also compute this integral by approximating it with Riemann sums.
(e) A general Riemann sum in this case is $\sum_{i=1}^n (1 - |t_i^*|)\,\Delta t$. We can be more specific and replace $\Delta t$ with $\frac{2}{n}$, because the interval has length 2 and we divide it into $n$ equal subintervals: $\sum_{i=1}^n (1 - |t_i^*|)\frac{2}{n}$.
(f) If we divide $[-1, 1]$ into four subintervals, then the left endpoints will be $t_1^* = -1$, $t_2^* = -0.5$, $t_3^* = 0$ and $t_4^* = 0.5$, so the left Riemann sum will be
$$\frac{2}{4}\Big((1 - |-1|) + (1 - |-0.5|) + (1 - |0|) + (1 - |0.5|)\Big) = 1.$$
Answer to Exercise 7. Since the distance $x$ travelled by the mouse is $d_m(t)$, we can solve $x = d_m(t)$ for $t$ as $t = d_m^{-1}(x)$, where we assume that the function is invertible because the mouse never stops running. The distance travelled by the cat at that time is $d_c(t) = d_c(d_m^{-1}(x))$. The total displacement at time $t$ is
$$\int_0^t v_c(s)\,ds = \int_0^{d_m^{-1}(x)} v_c(s)\,ds.$$
$$D(b) - D(a) = \lim_{n\to\infty} \sum_{i=1}^n v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt.$$
we can see that the analogue of the above formula in the general case will be
$$F(b) - F(a) = \lim_{n\to\infty} \sum_{i=1}^n f(x_i^*)\,\Delta x = \int_a^b f(x)\,dx.$$
Of course, the key step is $\Delta F \approx f(x_i^*)\,\Delta x$, which is true because the rate of change on a small interval should be almost constant, so the rate of change $f(x_i^*)$ at any point $x_i^*$ on this interval can be approximated by the average rate of change $\frac{\Delta F}{\Delta x}$. Let us add that when we write a definite integral $\int_a^b f(x)\,dx$, the endpoints $a$ and $b$ are also called the limits of integration and the function $f(x)$ is called the integrand.
1 https://youtu.be/DcgiBBhGreY.
3.2 Definite integrals: general case 143
is how much more Car 1 travelled compared to Car 2. If this is negative, it means
that Car 1 travelled less. Notice that we do not need to know A3 and A4 .
Another way to get the same answer is to divide the interval at the point where the two velocities are equal, near 3.8. Up to that point, the velocity of Car 2 is bigger and the extra distance it travels compared to Car 1 is exactly the area A1 between the two graphs. If this is not clear, notice that on this interval Car 1 travels A3 and Car 2 travels A1 + A3, so it travels extra distance A1. Similarly, on the second part of the interval the velocity of Car 1 is bigger and it travels extra distance A2. Combining the two intervals, we get that Car 1 travels A2 − A1 extra distance compared to Car 2. Again, if this is negative, it means that Car 1 travelled less.
Since Car 1 was 10 meters ahead at time t = 1 min and it travelled extra A2 − A1
between 1 and 7 min, does this mean that it is ahead by 10 + A2 − A1 ? The answer
is no! We need to pay attention to the units of the variables and ‘area’ in the figure.
Time on the x-axis has minutes as its units, while the velocity on the y-axis has
m/s as its units, so the units of area or distance v(t)∆t are m/s × min = m/s × 60s
= 60m. So one unit of ‘area’ in this figure is equal to 60 meters, which means
that the extra distance travelled by Car 1 is (A2 − A1 ) × 60m once we take units
into account. At time t = 7 minutes Car 1 is 10 + 60(A2 − A1) meters ahead. If this number is negative, being ahead by a negative amount means that Car 1 is actually behind.
Age (years) 6 7 8 9 10 11 12 13 14 15 16 17
Boys (cm/year) 6.70 6.18 5.90 5.60 5.51 5.68 6.54 7.64 6.73 4.46 2.58 1.10
Girls (cm/year) 6.70 6.27 6.00 5.98 6.33 6.68 6.04 4.29 2.42 1.13 0.64 0.33
Example 3. If a function $f(x)$ is even, and we know that $\int_{-3}^7 f(x)\,dx = 5$ and $\int_3^7 f(x)\,dx = 1$, what is the integral $\int_0^7 f(x)\,dx$?
Solution: By the above property,
$$\int_{-3}^7 f(x)\,dx = \int_{-3}^3 f(x)\,dx + \int_3^7 f(x)\,dx,$$
so $\int_{-3}^3 f(x)\,dx = \int_{-3}^7 f(x)\,dx - \int_3^7 f(x)\,dx = 5 - 1 = 4$. Because $f(x)$ is even, as we can see in the figure, the integral $\int_{-3}^3 f(x)\,dx$ consists of two equal parts on the subintervals $[-3, 0]$ and $[0, 3]$, so $\int_0^3 f(x)\,dx = \frac{4}{2} = 2$. Therefore, $\int_0^7 f(x)\,dx = \int_0^3 f(x)\,dx + \int_3^7 f(x)\,dx = 2 + 1 = 3$.
matter which number is bigger, $a$ or $b$. This way we don’t need to worry which number is bigger when we use this formula. But if we swap $a$ and $b$, the right hand side will become $F(a) - F(b) = -(F(b) - F(a))$, so only the sign will change. That is why we agree that $\int_a^b f(x)\,dx$ will only change the sign if we swap $a$ and $b$.
• Another reason is that we want the formula $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$ to be true no matter what the numbers $a$, $b$ and $c$ are. If we take $b$ equal to $a$, then this formula becomes
$$\int_a^c f(x)\,dx + \int_c^a f(x)\,dx = \int_a^a f(x)\,dx = 0.$$
The integral on the right hand side is equal to 0, because the area from $a$ to $a$ is zero, so this formula will be true if we agree that $\int_c^a f(x)\,dx = -\int_a^c f(x)\,dx$.
Example 4. If $\int_{-1}^3 f(x)\,dx = 4$ and $\int_2^{-1} f(x)\,dx = 2$, what is $\int_2^3 f(x)\,dx$?
Solution: Let us start with the usual property $\int_{-1}^3 f(x)\,dx = \int_{-1}^2 f(x)\,dx + \int_2^3 f(x)\,dx$. We know that $\int_{-1}^3 f(x)\,dx = 4$ and we want to find $\int_2^3 f(x)\,dx$, so we need $\int_{-1}^2 f(x)\,dx$. But, by our convention, $\int_{-1}^2 f(x)\,dx = -\int_2^{-1} f(x)\,dx = -2$, so we can write the above property as $4 = -2 + \int_2^3 f(x)\,dx$, and $\int_2^3 f(x)\,dx = 4 + 2 = 6$.
Exercise 4. If $\int_{-2}^{-1} f(x)\,dx = 4$, $\int_{-2}^{2} f(x)\,dx = 6$ and $\int_2^1 f(x)\,dx = -2$, what is $\int_{-1}^1 f(x)\,dx$?
Two more properties of definite integrals are that the integral of a sum is equal to the sum of the integrals,
$$\int_a^b \big(f(x) + g(x)\big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,$$
and
$$\int_a^b \kappa f(x)\,dx = \kappa \int_a^b f(x)\,dx,$$
i.e. we can take a constant factor $\kappa$ outside of the integral. The first property is true because we can add the areas vertically, and the second property is true because stretching a function $\kappa$ times vertically multiplies the area by $\kappa$. The two properties together are called the linearity of the integral.
Example 5. If $\int_1^3 f(x)\,dx = -2$ and $\int_1^3 g(x)\,dx = -1$, what is $\int_1^3 (5f(x) - 7g(x))\,dx$?
Solution: By the linearity of the integral,
$$\int_1^3 (5f(x) - 7g(x))\,dx = 5\int_1^3 f(x)\,dx - 7\int_1^3 g(x)\,dx = 5(-2) - 7(-1) = -3.$$
Exercise 5. If $\int_a^b (f(x) + g(x))\,dx = -2$ and $\int_a^b (f(x) - g(x))\,dx = 4$, what is $\int_a^b g(x)\,dx$?
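If one function is bigger than another, then its definite integral is also bigger, which is obvious by looking at the areas: if $f(x) \le g(x)$ for all $x$ in $[a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$. This is called the monotonicity of the integral. Notice that this is only true if $a \le b$! Unlike with the other properties, if we swap $a$ and $b$, the minus sign will also flip the inequality. For example, if $2 = \int_1^2 f(x)\,dx \le \int_1^2 g(x)\,dx = 7$, then flipping the limits gives $\int_2^1 f(x)\,dx = -2 \ge -7 = \int_2^1 g(x)\,dx$.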
Exercise 6. Order the following definite integrals from the smallest to the largest without any calculations first. After that, find their values.
(a) $\int_{-1}^1 1\,dx$ (b) $\int_{-1}^1 (1 - |x|)\,dx$ (c) $\int_{-1}^1 \sqrt{1 - x^2}\,dx$.
Solution: We should recognize that the second number is the right Riemann sum of $\int_0^{\pi/2} \cos(x)\,dx$ on the interval $[0, \frac{\pi}{2}]$ with $n = 4$ subintervals, each of length $\frac{\pi}{8}$. Since $\cos(x)$ is decreasing on this interval, we already discussed in the last section that the right Riemann sum underestimates the integral, so it is smaller. Of course, this is the same idea as monotonicity, because the rectangles with heights at the right endpoints will be below the graph of the function when it is decreasing.
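A quick Python check of this comparison (a sketch; the variable names are ours). For reference, the integral itself equals $\sin(\pi/2) - \sin(0) = 1$, a value that uses the Fundamental Theorem of Calculus from the next section.

```python
import math

n = 4
dx = (math.pi / 2) / n  # each subinterval has length pi/8
right_sum = sum(math.cos(i * dx) for i in range(1, n + 1)) * dx
exact = 1.0  # integral of cos(x) on [0, pi/2]
print(round(right_sum, 4), right_sum < exact)  # 0.7908 True
```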
Example 8. The definite integral $\int_{-1}^1 f(x)\,dx$ corresponding to the left figure above is approximately equal to which of the following?
Answer to Exercise 1. Since the quantity we are talking about is the average daily
profit (in dollars) and the rate f (x) is given in dollars per customer, the variable
x must be the number of customers, so the unit of 40.5 and 60.5 is the number
of customers. It might look strange to consider half a customer but, because we
are talking about the average daily profit over a long period of time, x can also
be thought of as the average daily number of customers, so the limits 40.5 and
60.5 make sense. If $P(x)$ is the average daily profit when $x$ is the average number of customers, the rate $f(x)$ is its derivative $P'(x)$, and the meaning of the definite integral $\int_{40.5}^{60.5} f(x)\,dx$ is the extra average daily profit $P(60.5) - P(40.5)$ when the average number of customers increases from 40.5 to 60.5. It means that the units of 150 are dollars.
Answer to Exercise 2. The difference between the average heights of boys and girls at age 18 is 1 + A2 − A1, similarly to Example 2, because the units of area are cm/year × year = cm. Using the values in the table, $(6.70 + 6.27 + 6.00 + 5.98 + 6.33 + 6.68 + 6.04 + 4.29 + 2.42 + 1.13 + 0.64 + 0.33) \times 1 = 52.81$ is the left Riemann sum for the growth of girls on the interval $[6, 18]$, and $(6.70 + 6.18 + 5.90 + 5.60 + 5.51 + 5.68 + 6.54 + 7.64 + 6.73 + 4.46 + 2.58 + 1.10) \times 1 = 64.62$ is the left Riemann sum for the growth of boys on the interval $[6, 18]$. Although age 18 is not in the table, using the twelve values from age 6 to 17 estimates growth over the twelve-year period from 6 to 18. The estimate of the difference of average heights at age 18 is $1 + 64.62 - 52.81 = 12.81$ cm.
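The two left Riemann sums can be reproduced with a few lines of Python. The lists copy the growth-rate table (ages 6 to 17, one-year steps); the 1 cm starting difference is the value used in this answer.

```python
boys = [6.70, 6.18, 5.90, 5.60, 5.51, 5.68, 6.54, 7.64, 6.73, 4.46, 2.58, 1.10]
girls = [6.70, 6.27, 6.00, 5.98, 6.33, 6.68, 6.04, 4.29, 2.42, 1.13, 0.64, 0.33]

dt = 1  # one year per subinterval
boys_growth = sum(rate * dt for rate in boys)    # left sum on [6, 18], in cm
girls_growth = sum(rate * dt for rate in girls)  # left sum on [6, 18], in cm
difference_at_18 = 1 + boys_growth - girls_growth
print(round(boys_growth, 2), round(girls_growth, 2), round(difference_at_18, 2))
# 64.62 52.81 12.81
```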
so $\int_{-3}^0 f(x)\,dx = \int_{-3}^7 f(x)\,dx - \int_0^7 f(x)\,dx = 5 - 7 = -2$. Because $f(x)$ is odd, as we can see in the figure, the integral $\int_0^3 f(x)\,dx$ differs from $\int_{-3}^0 f(x)\,dx$ only by a sign, so it is equal to 2.
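Answer to Exercise 4. Here we divide the interval $[-2, 2]$ into $[-2, -1]$, $[-1, 1]$ and $[1, 2]$, so
$$\int_{-2}^2 f(x)\,dx = \int_{-2}^{-1} f(x)\,dx + \int_{-1}^1 f(x)\,dx + \int_1^2 f(x)\,dx.$$
Since $\int_1^2 f(x)\,dx = -\int_2^1 f(x)\,dx = 2$, this gives $6 = 4 + \int_{-1}^1 f(x)\,dx + 2$, so $\int_{-1}^1 f(x)\,dx = 0$.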
Solving these two equations for $p$ and $q$, we get that $p = 1$ and $q = -3$, so $\int_a^b g(x)\,dx = -3$.
so, by monotonicity, the integrals are arranged in the same order, (b) ≤ (c) ≤ (a). Using geometry, their values are: (a) $2 \times 1 = 2$, (b) $\frac{1}{2} \times 2 \times 1 = 1$ and (c) $\frac{\pi}{2}$.
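The geometric values can be confirmed with a midpoint Riemann sum in Python. This is a sketch: `midpoint_sum` is our own helper, and $n = 100{,}000$ is an arbitrary choice that is large enough for several correct decimals.

```python
import math

def midpoint_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

integrals = {
    "(a)": midpoint_sum(lambda x: 1.0, -1, 1),                  # rectangle, area 2
    "(b)": midpoint_sum(lambda x: 1 - abs(x), -1, 1),           # triangle, area 1
    "(c)": midpoint_sum(lambda x: math.sqrt(1 - x**2), -1, 1),  # half disk, area pi/2
}
for label, value in sorted(integrals.items(), key=lambda item: item[1]):
    print(label, round(value, 4))  # listed from smallest to largest
```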
Answer to Exercise 7. The sum is the right Riemann sum of $\int_0^{\pi/2} \sin(x)\,dx$ on the interval $[0, \frac{\pi}{2}]$ with $n = 4$ subintervals. Since $\sin(x)$ is increasing on this interval, the right Riemann sum overestimates the integral, so it is bigger.
Answer to Exercise 8. It looks like the curve closely traces the linear function $1 + x$ on the interval $[-1, 0]$ and the constant function 1 on the interval $[0, 1]$, so the best guess would be $\frac{1}{2} + 1 = 1.5$, i.e. (b).
This statement is called the Fundamental Theorem of Calculus (FTC for short). The notation in the middle, $F(x)\big|_a^b$, is another way to write $F(b) - F(a)$, which will be very convenient when using this formula. So far we have mostly used this formula to find the change $F(b) - F(a)$ of some quantity $F(x)$ by either computing the definite integral $\int_a^b f(x)\,dx$ using areas (for simple enough graphs of $f(x)$) or approximating it by Riemann sums. Below we will start discussing a different way to use the FTC, by systematically guessing what the function $F(x)$ could be if we know its derivative $f(x)$. But first let us solve a couple of simple problems using the FTC, emphasizing the relationship $f(x) = F'(x)$.
(a) $f(a)$
(b) $\int_a^b f(x)\,dx$
(c) $\frac{1}{b - a}\int_a^b f(x)\,dx$
3.3 Fundamental Theorem of Calculus 151
The first term is just $704 \times 630 = 443520$, the area of a rectangle. To compute the second and third integrals, we would like to use the FTC and write them as $F(315) - F(-315)$. Let us start with the second term. Can we guess what function $F(x)$ has the derivative $F'(x) = e^{0.0093x}$? We know that $(e^x)' = e^x$, so we can try $e^{0.0093x}$. Its derivative $(e^{0.0093x})' = 0.0093e^{0.0093x}$ is almost what we want, but it has an extra factor 0.0093 because of the chain rule. This is not a big problem, because we can divide by 0.0093 and take $F(x) = \frac{1}{0.0093}e^{0.0093x}$. Now we can see that $F'(x) = e^{0.0093x}$, as we wanted. Therefore, by the FTC,
$T = 100 - 80e^{-t}$
The reason we put $|x|$ inside the logarithm in the second integral is that $\frac{1}{x}$ is defined for negative $x$, while $\ln(x)$ is defined only for positive $x$ and, for negative $x$, an antiderivative of $\frac{1}{x}$ is $\ln(-x)$, which is the same as $\ln|x|$. All the other antiderivatives can be checked by taking derivatives. We can also combine these examples with the following simple rule:
$$\text{If } \int f(x)\,dx = F(x) + C \text{ then } \int f(b + mx)\,dx = \frac{1}{m} F(b + mx) + C.$$
We have already used this in the above example when we guessed that an antiderivative of $e^{mx}$ is $\frac{1}{m} e^{mx}$. Here we can similarly check that, if $F'(x) = f(x)$, then
$$\Big(\frac{1}{m} F(b + mx)\Big)' = \frac{1}{m}\big(F(b + mx)\big)' = \frac{1}{m} F'(b + mx)\, m = f(b + mx).$$
The middle step uses the chain rule, so indeed $\frac{1}{m} F(b + mx)$ is an antiderivative of $f(b + mx)$. This rule is the simplest case of the so-called integration by substitution, which we will study later. Another obvious rule is the linearity of indefinite integrals: an antiderivative of a sum is the sum of antiderivatives.
$$\int \Big(3 + 4t + \frac{1}{t}\Big)\,dt = \int 3\,dt + 4\int t\,dt + \int \frac{1}{t}\,dt = 3t + 4\,\frac{t^2}{2} + \ln|t| + C.$$
Notice that we do not need to add $+C$ to each indefinite integral separately, because all those unknown constants can be combined into one constant. Of course, we can simplify the answer as $3t + 2t^2 + \ln|t| + C$.
(b) Using linearity and the above $(b + mx)$ rule:
$$\int \big(6e^{-z/2} - e^{3z}\big)\,dz = 6\int e^{-z/2}\,dz - \int e^{3z}\,dz = \frac{6e^{-z/2}}{-1/2} - \frac{e^{3z}}{3} + C,$$
which can be simplified as $-12e^{-z/2} - \frac{e^{3z}}{3} + C$.
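(c) By linearity, using the above list and the $(b + mx)$ rule:
$$\int \Big(\sin(2x + 1) - \frac{4}{x^2}\Big)\,dx = \int \sin(2x + 1)\,dx - 4\int x^{-2}\,dx = -\frac{\cos(2x + 1)}{2} - \frac{4x^{-2+1}}{-2+1} + C,$$
which can be simplified as $-\frac{\cos(2x + 1)}{2} + \frac{4}{x} + C$.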
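One way to double-check antiderivatives like those in (a)-(c) is to differentiate them numerically and compare the result with the integrand. The Python sketch below does this with a central difference. The sample points are our own choice. They include negative values, where $\ln|t|$ must still work. The helper name `derivative` is also ours and does not come from a library.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (integrand f, claimed antiderivative F) for parts (a)-(c), in simplified form
checks = {
    "(a)": (lambda t: 3 + 4 * t + 1 / t,
            lambda t: 3 * t + 2 * t**2 + math.log(abs(t))),
    "(b)": (lambda z: 6 * math.exp(-z / 2) - math.exp(3 * z),
            lambda z: -12 * math.exp(-z / 2) - math.exp(3 * z) / 3),
    "(c)": (lambda x: math.sin(2 * x + 1) - 4 / x**2,
            lambda x: -math.cos(2 * x + 1) / 2 + 4 / x),
}

# Negative sample points are included on purpose: ln|t| must work there too.
points = (-2.0, -0.7, 0.5, 1.3)
results = {}
for name, (f, F) in checks.items():
    results[name] = max(abs(derivative(F, x) - f(x)) for x in points)
    print(name, f"largest mismatch = {results[name]:.1e}")
```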
Notice that when using the FTC we did not write +C in (sin(x) +C), because the
constant will cancel out anyway: (F(b) +C) − (F(a) +C) = F(b) − F(a).
To find $A_2$, note that $-A_2$ is the definite integral on $[\pi/2, 2.5]$, so
$$-A_2 = \int_{\pi/2}^{2.5} \cos(x)\,dx = \sin(x)\Big|_{\pi/2}^{2.5} = \sin(2.5) - \sin(\pi/2) = -0.4015.$$
Exercise 4. Find the total area between the graph of $y = \frac{1}{2}x(x - 1)(x - 3)$ and the $x$-axis on the interval $[0, 2]$.
Initial Value Problem. Given the rate of change $f(x)$ of some quantity $F(x)$, we sometimes want to find $F(x)$ given the additional information that $F(x_0) = y_0$. This is called the initial value problem. If we can find any antiderivative $G(x)$ of $f(x)$, then $G(x) + C$ is also an antiderivative. So, if we want an antiderivative that passes through the point $(x_0, y_0)$, we can plug the point in, $y_0 = G(x_0) + C$, and find the constant $C = y_0 - G(x_0)$.
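The recipe $C = y_0 - G(x_0)$ fits in a few lines of Python. In this minimal sketch, the rate $f(x) = 2x$, the particular antiderivative $G(x) = x^2$, the point $(x_0, y_0) = (1, 5)$, and the helper name `fit_constant` are all hypothetical choices made only for illustration.

```python
def G(x):
    # One particular antiderivative of the hypothetical rate f(x) = 2x.
    return x ** 2

def fit_constant(G, x0, y0):
    """Return C and the antiderivative G(x) + C that passes through (x0, y0)."""
    C = y0 - G(x0)
    return C, (lambda x: G(x) + C)

C, F = fit_constant(G, x0=1.0, y0=5.0)
print(C)        # 4.0
print(F(1.0))   # 5.0, the required initial value
print(F(3.0))   # 13.0
```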
FTC-2: Reconstruction Theorem. Another way to solve the initial value problem $F'(x) = f(x)$, $F(x_0) = y_0$, is to take the statement $\int_a^b f(x)\,dx = F(b) - F(a)$ of the FTC and replace $a$ by $x_0$ and $b$ by $x$:
$$\int_{x_0}^{x} f(t)\,dt = F(x) - F(x_0), \quad \text{i.e.} \quad F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt.$$
The FTC rewritten in this form is known as the second form of the FTC or the Reconstruction Theorem. In this statement, we are thinking of the upper limit $b$ as an independent variable $x$, and the right hand side $y_0 + \int_{x_0}^{x} f(t)\,dt$ gives us a specific
Solution: First of all, notice that here we departed from the convention of calling the derivative $f(x)$ and the antiderivative $F(x)$. Instead, here $f'(x)$ is the derivative of $f(x)$, and $f(x)$ is the antiderivative of $f'(x)$ such that $f(1) = 4$. We see the following:
- $f'(x) = -2$ on the interval $[-4, -2)$, which means that $f(x)$ has slope $-2$ there;
- $f'(x) = 3$ on the interval $(-2, 1)$, where the slope of $f(x)$ is $3$;
- $f'(x) = -1$ on the interval $(1, 4)$, where the slope of $f(x)$ is $-1$.
In the middle figure above we sketch a continuous function $f(x)$ with these slopes on the three intervals. We specified that $f(x)$ is continuous, so it does not have jumps at $x = -2$ and $x = 1$, but the derivative is not defined at those points. We labelled this graph $y = f(x) + C$ in the middle figure, because it can be shifted vertically by any constant $C$ if we only know its derivative $f'(x)$.

However, because we are given that $f(1) = 4$, we can now fix the position of the function $f(x)$, as depicted in the right figure above. Here each piece of the function is linear, so we could easily find its formula if we wanted to:
- $f(x) = -x + 5$ on $[1, 4]$;
- $f(x) = 3x + 1$ on $[-2, 1]$;
- $f(x) = -2x - 9$ on $[-4, -2]$.
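The Reconstruction Theorem gives a mechanical way to rebuild this $f(x)$ from its slopes. The Python sketch below starts from $f(1) = 4$ and integrates the piecewise-constant $f'(x)$ from $x_0 = 1$ with a midpoint sum. It then compares the result with the piecewise formulas above. The function names and the number of subintervals are illustrative choices.

```python
def fprime(t):
    """Piecewise-constant derivative from the example (undefined at -2 and 1)."""
    if t < -2:
        return -2.0
    if t < 1:
        return 3.0
    return -1.0

def reconstruct(x, x0=1.0, y0=4.0, n=20_000):
    """Reconstruction Theorem: f(x) = y0 + integral of f' from x0 to x (midpoint sum)."""
    dx = (x - x0) / n
    return y0 + sum(fprime(x0 + (i + 0.5) * dx) for i in range(n)) * dx

def f_formula(x):
    """Piecewise formulas for f(x) stated in the text."""
    if x <= -2:
        return -2 * x - 9
    if x <= 1:
        return 3 * x + 1
    return -x + 5

for x in (-4.0, -3.0, -2.0, 0.0, 1.0, 2.5, 4.0):
    print(f"x = {x:5.1f}: reconstructed {reconstruct(x):8.4f}, formula {f_formula(x):8.4f}")
```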
Exercise 7. Given $f(x)$ in the figure on the right, which function below could be its antiderivative $F(x)$ such that $F(0) = 1$?
[Figure: candidate graphs (a), (b), (c), (d)]
The FTC tells us that, if $f(x) = F'(x)$, then $\int_a^x f(t)\,dt = F(x) - F(a)$. As above, we replaced the upper limit $b$ by a variable $x$ because, for example, we want to compute this definite integral for any upper limit $x$. If we take the derivative of both sides with respect to $x$, we get $\frac{d}{dx}\int_a^x f(t)\,dt = \big(F(x) - F(a)\big)' = F'(x) = f(x)$. This gives us yet another consequence of the FTC:
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$
This formula is very useful. Even if we do not know what the integral $\int_a^x f(t)\,dt$ is, we know that its derivative is $f(x)$, and this lets us deduce some of the properties of this integral.
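This consequence of the FTC can be checked numerically even when no formula for the integral is available. The Python sketch below uses $f(t) = e^{-t^2}$, our own example, chosen because its antiderivative is not an elementary function. It approximates $A(x) = \int_0^x f(t)\,dt$ with composite Simpson's rule (a helper we define ourselves). It then compares a difference quotient of $A$ with $f(x)$.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def f(t):
    # Example integrand with no elementary antiderivative
    return math.exp(-t**2)

def A(x, a=0.0):
    """Accumulation function A(x) = integral of f from a to x."""
    return simpson(f, a, x)

h = 1e-3
for x in (0.5, 1.0, 2.0):
    dA = (A(x + h) - A(x - h)) / (2 * h)  # numerical d/dx of the integral
    print(f"x = {x}: dA/dx ~ {dA:.6f}, f(x) = {f(x):.6f}")
```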
The next two problems refer to the following figures.
Example 8. Suppose that $h(x) = \int_0^x f(t)\,dt$ with $f(t)$ in the left figure above.
(a) What are $h(0)$ and $h'(1)$?
(b) On what interval is $h(x)$ concave up?
(c) Where are the global minimum and maximum of $h(x)$ on $[0, 2]$?
(b) Since $h'(x) = f(x)$ and $h(x)$ is concave up where its derivative is increasing, this happens on the interval $[0, 0.5]$, where $f(x)$ is increasing.
(c) In the figure, $h'(x) = f(x)$ is positive on $[0, 1]$ and negative on $[1, 2]$, so $h(x)$ is increasing on $[0, 1]$ and decreasing on $[1, 2]$. This means that the maximum is at $x = 1$ and the minimum is at one of the endpoints, $x = 0$ or $x = 2$. By the FTC, $h(x)$ increases on $[0, 1]$ by the area above the $x$-axis in the figure and decreases on $[1, 2]$ by the area below the $x$-axis. Since the area below the $x$-axis is bigger, $h(x)$ decreases more on $[1, 2]$ than it increases on $[0, 1]$. Therefore the minimum is at $x = 2$.
Exercise 8. Suppose that $h(x) = \int_0^x f(t)\,dt$ with $f(t)$ in the right figure above.
(a) What are $h(0)$ and $h'(-1)$?
(b) On what interval is $h(x)$ concave up?
(c) Where are the global minimum and maximum of $h(x)$ on $[-1, 1]$?
First of all, in this formula we replaced both the lower limit $a$ and the upper limit $b$ by functions $a(x)$ and $b(x)$. It might look intimidating, but all we did was apply the chain rule. Indeed, the FTC tells us that $\int_{a(x)}^{b(x)} f(t)\,dt = F(b(x)) - F(a(x))$. When we differentiate this equation, we simply apply the chain rule twice:
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = \big(F(b(x)) - F(a(x))\big)' = F'\big(b(x)\big)\,b'(x) - F'\big(a(x)\big)\,a'(x) = f\big(b(x)\big)\,b'(x) - f\big(a(x)\big)\,a'(x).$$
Example 9. Compute the derivative of $\int_{2x}^{x^2} e^{-t^4}\,dt$.
Solution: By the above formula,
$$\frac{d}{dx}\int_{2x}^{x^2} e^{-t^4}\,dt = e^{-(x^2)^4}(x^2)' - e^{-(2x)^4}(2x)' = 2x\,e^{-x^8} - 2e^{-16x^4}.$$
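The following Python sketch is our own numerical sanity check of Example 9. It approximates $H(x) = \int_{2x}^{x^2} e^{-t^4}\,dt$ with Simpson's rule (a helper we define), takes a central difference in $x$, and compares the result with the formula $2x\,e^{-x^8} - 2e^{-16x^4}$. The sample points and step sizes are arbitrary choices.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def f(t):
    return math.exp(-t**4)

def H(x):
    """H(x) = integral of e^{-t^4} from 2x to x^2 (Simpson's rule)."""
    return simpson(f, 2 * x, x**2)

def H_prime_formula(x):
    """Derivative predicted by the chain-rule version of the FTC."""
    return 2 * x * math.exp(-x**8) - 2 * math.exp(-16 * x**4)

h = 1e-4
for x in (0.3, 0.8, 1.5):
    numeric = (H(x + h) - H(x - h)) / (2 * h)
    print(f"x = {x}: numeric {numeric:.6f}, formula {H_prime_formula(x):.6f}")
```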
Exercise 9. Compute the derivative of $\int_{\cos(x)}^{\sin(x)} \ln(1 + t^2)\,dt$.
A playlist with an overview of the FTC can be found in the footnote link.4
You can also play around with the following Geogebra example to make sure you
understand how the integral changes as a function of the upper limit.5
4 https://www.youtube.com/playlist?list=PLYxPH73Uem-QmJI2fdsCtRww-oYMIXUOx.
5 https://www.geogebra.org/m/fa2w8qjy
Answer to Exercise 3.
(a) $\int \Big(\sqrt{3u + 1} - \frac{2}{2u + 1}\Big)\,du = \frac{2}{9}(3u + 1)^{3/2} - \ln|2u + 1| + C$.
(b) $\int \big(1 - 6e^{-s/3}\big)\,ds = s + 18e^{-s/3} + C$.
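$$\frac{1}{2}x(x - 1)(x - 3) = \frac{x^3}{2} - 2x^2 + \frac{3x}{2}.$$
Its indefinite integral is
$$\int \Big(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\Big)\,dx = \frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4} + C,$$
so
$$A_1 = \int_0^1 \Big(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\Big)\,dx = \Big(\frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4}\Big)\Big|_0^1 = \frac{1}{8} - \frac{2}{3} + \frac{3}{4} = \frac{5}{24},$$
$$-A_2 = \int_1^2 \Big(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\Big)\,dx = \Big(\frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4}\Big)\Big|_1^2 = -\frac{13}{24},$$
and the total area is $A_1 + A_2 = \frac{5}{24} + \frac{13}{24} = \frac{3}{4} = 0.75$.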
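Python's `fractions` module can reproduce the signed-area bookkeeping in this answer exactly. The sketch below does that and then cross-checks the result against a floating-point midpoint sum of $|y|$. It is illustrative only, and the function names are ours.

```python
from fractions import Fraction

def y(x):
    return x * (x - 1) * (x - 3) / 2

def Y(x):
    # Antiderivative x^4/8 - 2x^3/3 + 3x^2/4 (the +C cancels in differences)
    return x**4 / 8 - 2 * x**3 / 3 + 3 * x**2 / 4

zero, one, two = Fraction(0), Fraction(1), Fraction(2)
A1 = Y(one) - Y(zero)        # y >= 0 on [0, 1]
A2 = -(Y(two) - Y(one))      # y <= 0 on [1, 2], so flip the sign
total = A1 + A2
print(A1, A2, total)         # 5/24 13/24 3/4

# Floating-point cross-check: midpoint sum of |y| over [0, 2].
n = 100_000
dx = 2 / n
numeric = sum(abs(y((i + 0.5) * dx)) for i in range(n)) * dx
print(numeric)
```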
Answer to Exercise 7. First of all, we can eliminate (b), because the derivative $F'(0) = f(0)$ is defined at $x = 0$, so the antiderivative cannot have a jump there. Next, we can eliminate (d), because that function is not equal to $1$ at $x = 0$, but we must have $F(0) = 1$. The difference between (a) and (c) is the slope between $x = -4$ and $x = -2$. Since $f(x) = -2$ on that interval, the slope should be $-2$, which means that (a) could be the answer. Why did we say “could be”? In this problem we did not ask for $F(x)$ to be continuous, and the derivative $f(x)$ is not defined at $x = -2$. So, if $F(x)$ were allowed to have a jump, the linear piece on the interval $[-4, -2]$ could also be shifted vertically. If we asked for a continuous antiderivative, then (a) would be the answer.
Answer to Exercise 9.
$$\frac{d}{dx}\int_{\cos(x)}^{\sin(x)} \ln(1 + t^2)\,dt = \ln\big(1 + \sin^2(x)\big)\cos(x) - \ln\big(1 + \cos^2(x)\big)\big(-\sin(x)\big) = \ln\big(1 + \sin^2(x)\big)\cos(x) + \ln\big(1 + \cos^2(x)\big)\sin(x).$$
3.4 Application of FTC: differential equations 161
$$\frac{dy}{dx} = f(x, y).$$
$$\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0$$
is called an initial value problem (IVP). This equation is like a puzzle where an unknown function $y = y(x)$ can appear on both sides of the equation. On the left hand side its derivative $\frac{dy}{dx} = y'(x)$ appears, and on the right hand side $y(x)$ appears inside some formula $f(x, y) = f(x, y(x))$ that can also depend on $x$. Here are some examples of such equations:
$$\frac{dy}{dx} = xy, \quad \frac{dy}{dx} = \frac{x}{y}, \quad \frac{dy}{dx} = x^2 + y^2, \quad \frac{dy}{dx} = 1 + y^2, \quad \frac{dy}{dx} = \cos(x).$$
To solve such an equation means to find functions $y = y(x)$ that satisfy it. In other words, when we plug $y(x)$ into both sides of the equation, we get an equality.
the right hand side does not have $y$ in it, so the equation simply tells us that the derivative of $y(x)$ is equal to $\cos(x)$. In other words, $y(x)$ is an antiderivative of $\cos(x)$, so $y(x) = \sin(x) + C$. Now consider a differential equation of this easier form:
$$\frac{dy}{dx} = f(x).$$
If we can find one particular antiderivative $F(x)$ of $f(x)$, then $y = F(x) + C$ is called a general solution of this equation. Such an equation, together with additional information about an initial value,
$$\frac{dy}{dx} = f(x), \quad y(x_0) = y_0,$$
is also called an initial value problem (IVP), and $y(x) = y_0 + \int_{x_0}^{x} f(t)\,dt$ is called the solution of this initial value problem.
$$\frac{dA}{dt} = 1600\,t\,e^{-4t}, \quad A(0) = 50.$$
Let us show that $A(t) = 150 - 100(1 + 4t)e^{-4t}$ is the solution of this IVP. The initial value matches: $A(0) = 150 - 100(1 + 0)e^0 = 50$. By the product rule,
$$\frac{dA}{dt} = 0 - 100(4)e^{-4t} - 100(1 + 4t)e^{-4t}(-4) = 1600\,t\,e^{-4t},$$
so the differential equation is also satisfied. Of course, this solution comes from the Reconstruction Theorem, $A(t) = 50 + 1600\int_0^t s\,e^{-4s}\,ds$. Later we will learn how to find this integral using the integration by parts technique. Then we will be able to solve this problem without somebody giving us the answer to check.
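The Reconstruction Theorem formula $A(t) = 50 + 1600\int_0^t s\,e^{-4s}\,ds$ can already be evaluated numerically, before we learn integration by parts. The Python sketch below does this with Simpson's rule (our own helper) and compares the result with the closed form $A(t) = 150 - 100(1 + 4t)e^{-4t}$. The sample times and the number of subintervals are arbitrary.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def A_closed(t):
    return 150 - 100 * (1 + 4 * t) * math.exp(-4 * t)

def A_reconstructed(t):
    # Reconstruction Theorem: A(t) = A(0) + integral of dA/ds from 0 to t
    return 50 + simpson(lambda s: 1600 * s * math.exp(-4 * s), 0.0, t)

for t in (0.0, 0.25, 1.0, 3.0):
    print(t, A_closed(t), A_reconstructed(t))
```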
Motion with constant acceleration. In the first section of this chapter we solved several problems about linear motion with constant acceleration, in which case the velocity $v(t)$ was a linear function of time. We used the area under the graph of velocity to determine the position change, or displacement. Now we can also find a general formula for the position $y(t)$ using the FTC.
Solution: (a) The acceleration is due to gravity, so it is $-g = -10$ m/s². The minus sign appears because we take upward as the positive direction. Acceleration is the derivative of velocity, so $\frac{dv}{dt} = -g$, and the initial velocity is $v(0) = 5$. We can solve this initial value problem using the FTC:
$$v(t) = 5 + \int_0^t (-g)\,ds = 5 - gt = 5 - 10t.$$
Next, vertical velocity is the derivative of height, so $\frac{dy}{dt} = 5 - gt$, and the initial height is $y(0) = 1$. We can again solve this initial value problem using FTC-2:
$$y(t) = 1 + \int_0^t (5 - gs)\,ds = 1 + 5t - g\,\frac{t^2}{2} = 1 + 5t - 5t^2.$$
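The Python sketch below shows FTC-2 at work step by step. It rebuilds the height by adding up small trapezoids under the velocity graph $v(t) = 5 - 10t$ and compares the running total with the formula $y(t) = 1 + 5t - 5t^2$. The final time $t = 0.8$ s and the number of steps are arbitrary choices. Because $v(t)$ is linear, each trapezoid is exact, so any remaining difference is rounding error.

```python
g = 10.0              # m/s^2, as in the text
v0, y0 = 5.0, 1.0     # initial velocity (m/s) and initial height (m)

def v(t):
    return v0 - g * t                  # solution of dv/dt = -g, v(0) = v0

def y(t):
    return y0 + v0 * t - g * t**2 / 2  # solution of dy/dt = v(t), y(0) = y0

# FTC-2 by hand: y(T) = y0 + (area under v on [0, T]), summed trapezoid by trapezoid.
T, n = 0.8, 8000
dt = T / n
height = y0
for k in range(n):
    t0, t1 = k * dt, (k + 1) * dt
    height += (v(t0) + v(t1)) / 2 * dt

print(height, y(T))            # both approximately 1.8
print(v0 / g, y(v0 / g))       # time of the highest point and the maximum height
```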
Exercise 3. A coin is tossed straight up into the air from a height of 1 meter, and it reaches a maximum height of 3 meters. What was the original velocity $v_0$? Assume that $g = 10$ m/s².
The last equality, $-|\cos(x)| = \cos(x)$, holds because we restrict ourselves to the interval $\frac{\pi}{2} \le x \le \frac{3\pi}{2}$, where $\cos(x) \le 0$.
Answer to Exercise 2. (a) The notation $\frac{dy}{dx}$ indicates that $y = y(x)$ is a function of $x$, so $x$ is the independent variable. This means that the differential equation $\frac{dy}{dx} = x^2 + 4x^3$ tells us that the derivative $y'(x)$ is precisely $x^2 + 4x^3$, so $y(x)$ must be an antiderivative of $x^2 + 4x^3$. A general antiderivative of $x^2 + 4x^3$ is $y(x) = \frac{x^3}{3} + x^4 + C$, which can also be called the general solution of the above differential equation. We are given the initial value $y(0) = 5$, so we can plug it in: $y(0) = \frac{0^3}{3} + 0^4 + C = C = 5$. Hence $C = 5$ and $y(x) = \frac{x^3}{3} + x^4 + 5$.
(b) Here, $t$ is the independent variable, and $x = x(t)$ is an antiderivative of $-10t + 3$. A general antiderivative is $-5t^2 + 3t + C$, and $x(0) = 25 = C$, so $C = 25$ and $x(t) = -5t^2 + 3t + 25$.
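The maximum height is reached when $v(t) = v_0 - 10t = 0$, i.e. at $t = \frac{v_0}{10}$, so
$$3 = 1 + v_0\Big(\frac{v_0}{10}\Big) - 5\Big(\frac{v_0}{10}\Big)^2 = 1 + \frac{v_0^2}{10} - \frac{v_0^2}{20} = 1 + \frac{v_0^2}{20}.$$
Solving for $v_0$, we get $v_0^2 = 40$, or $v_0 = \sqrt{40} \approx 6.32$ m/s.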
6 Image by Bill Abbott, https://www.flickr.com/photos/wbaiv/51673118672/
$$x(t) = a\,\frac{t^2}{2} = \frac{90}{65}\cdot\frac{t^2}{2} = \frac{9t^2}{13}.$$
Then at the lift-off time $t = 65$ the distance is $x(65) = \frac{9(65)^2}{13} = 2925$ meters, so the runway should be at least this long.
If the lift-off speed were given as 200 mph, the units of speed (miles per hour) and time (seconds) would not match. We would first have to convert 200 mph to distance per second, for example, 89.4 m/s or 293.3 ft/s.
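The runway arithmetic and the unit conversion can be sketched in Python as follows. The acceleration $a = 90/65$ m/s² and the lift-off time of 65 s come from the text. The conversion factors (1 mile = 1609.344 m = 5280 ft) are the standard ones. The variable names are ours.

```python
a = 90 / 65          # acceleration in m/s^2, as in the text
t_liftoff = 65.0     # lift-off time in s

def x(t):
    """Distance covered from rest with constant acceleration a: x(t) = a t^2 / 2."""
    return a * t**2 / 2

runway = x(t_liftoff)
print(f"minimum runway length ~ {runway:.1f} m")
print(f"lift-off speed ~ {a * t_liftoff:.1f} m/s")

# If the speed were given in mph, convert it before mixing it with seconds.
METERS_PER_MILE = 1609.344
FEET_PER_MILE = 5280
mph = 200
speed_ms = mph * METERS_PER_MILE / 3600
speed_fts = mph * FEET_PER_MILE / 3600
print(f"{mph} mph ~ {speed_ms:.1f} m/s ~ {speed_fts:.1f} ft/s")
```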
In the case of definite integrals, this formula looks like this: if $F'(x) = f(x)$, then
$$\int_a^b f\big(g(x)\big)\,g'(x)\,dx = F\big(g(x)\big)\Big|_{x=a}^{x=b} = F\big(g(b)\big) - F\big(g(a)\big).$$
How do we know that these formulas are applicable in a particular problem? The key is to recognize the presence of some function $g(x)$ and its derivative $g'(x)$ inside the integral. This takes some practice but, luckily, there are several typical patterns.
Power substitution. Our first example will be when the function g(x) is of the
following type:
Solution: Once we choose the right substitution, the solution is usually relatively short. However, in this first example let us explain how things work step by step.
• First, we need to guess what the function $g(x)$ could be and call it by a new name, for example, $u$, $w$, $y$, etc. We write
$$u = g(x), \qquad du = g'(x)\,dx.$$
In this problem, $\frac{du}{dx} = (2\sqrt{x} + 1)' = \frac{1}{\sqrt{x}}$, so $du = \frac{1}{\sqrt{x}}\,dx$. We can see the presence of $\frac{1}{\sqrt{x}}$ in the integral, which is an indication that we are on the right track.
• We replace all the appearances of $g(x)$ by $u$ and replace $g'(x)\,dx$ by $du$:
$$\int f\big(\underbrace{g(x)}_{u}\big)\,\underbrace{g'(x)\,dx}_{du} = \int f(u)\,du.$$
In the case of the definite integral, we also replace the limits: $x = a$ becomes $u = g(a)$ and $x = b$ becomes $u = g(b)$:
$$\int_a^b f\big(\underbrace{g(x)}_{u}\big)\,\underbrace{g'(x)\,dx}_{du} = \int_{g(a)}^{g(b)} f(u)\,du.$$
Here, $x = 1$ becomes $u = 2\sqrt{1} + 1 = 3$ and $x = 4$ becomes $u = 2\sqrt{4} + 1 = 5$:
$$\int_1^4 \cos\big(\underbrace{2\sqrt{x} + 1}_{u}\big)\cdot\frac{1}{3}\cdot\underbrace{\frac{1}{\sqrt{x}}\,dx}_{du} = \int_3^5 \frac{1}{3}\cos(u)\,du.$$
It is important that, after we make this substitution, no $x$ is left in the integral. Everything is now in terms of the new variable $u$.
• Now that the integral has been simplified, we can hopefully find the antiderivative $F(u)$ of $f(u)$:
$$\int f(u)\,du = F(u) + C.$$
In the case of the definite integral, we can also apply the FTC:
$$\int_{g(a)}^{g(b)} f(u)\,du = F(u)\Big|_{u=g(a)}^{u=g(b)} = F\big(g(b)\big) - F\big(g(a)\big).$$
Here,
$$\int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C$$
and, in the case of the definite integral,
$$\int_3^5 \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u)\Big|_{u=3}^{u=5} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3).$$
• The definite integral has already been computed in the last step. In the case of the indefinite integral, however, it is very important to substitute $u = g(x)$ back,
$$\int f\big(g(x)\big)\,g'(x)\,dx = F(u) + C = F\big(g(x)\big) + C,$$
because our original integral $\int f(g(x))\,g'(x)\,dx$ was in terms of the $x$ variable, so
Comment. In the definite integral, a common mistake is not to change the limits:
$$\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int_1^4 \frac{1}{3}\cos(u)\,du = \ldots$$
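As a numerical check of the worked substitution example (our addition), the Python sketch below evaluates $\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx$ with Simpson's rule, using a helper we define ourselves. It compares the result with two candidates. The first is $\frac{1}{3}\sin(5) - \frac{1}{3}\sin(3)$, obtained with the new limits. The second is the value we would get by mistakenly keeping the old limits $1$ and $4$.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return math.cos(2 * math.sqrt(x) + 1) / (3 * math.sqrt(x))

numeric = simpson(integrand, 1.0, 4.0)
correct = (math.sin(5) - math.sin(3)) / 3    # new limits u = 3 to u = 5
wrong = (math.sin(4) - math.sin(1)) / 3      # old x-limits 1 to 4 kept by mistake
print(f"Simpson: {numeric:.8f}")
print(f"with u-limits 3..5: {correct:.8f}")
print(f"with x-limits 1..4 (wrong): {wrong:.8f}")
```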
It remains to estimate the integral $\int_{-1}^{1} f(u)\,du$, which is the area under its graph on the interval $[-1, 1]$. We do not have enough information to compute this area exactly. Looking at the graph, however, we see that $f(x)$ approximately follows the straight line $y = 1 + x$ between $x = -1$ and $x = 0$, and is approximately constant, $y = 1$, between $x = 0$ and $x = 1$. This means that the area is approximately $\frac{1}{2} + 1 = \frac{3}{2}$, and the original integral is $\int_0^1 f(2x^2 - 1)\,x\,dx \approx \frac{1}{4} \times \frac{3}{2} = \frac{3}{8}$.
Exercise 2. Estimate the integral $\int_0^1 x\,f(3x^2 - 1)\,dx$ given the following table:
x −1 −0.5 0 0.5 1 1.5
f (x) 0 -1.75 1 3.75 5 3.75
Example 3. Compute the integral $\int_0^{\pi/2} \sin^2(x)\cos(x)\,dx$.
Solution: Here we see $\sin(x)$ squared (i.e. it is inside the square function) and we also see its derivative $\cos(x)$, so it is a good idea to make the substitution $u = \sin(x)$. Then $\frac{du}{dx} = \cos(x)$ and $du = \cos(x)\,dx$. With this substitution,
$$\int \sin^2(x)\cos(x)\,dx = \int u^2\,du = \frac{u^3}{3} + C = \frac{\sin^3(x)}{3} + C.$$
Since we already found the indefinite integral, for the definite integral we can skip the intermediate substitution steps and use the FTC directly:
$$\int_0^{\pi/2} \sin^2(x)\cos(x)\,dx = \int_0^1 u^2\,du = \frac{u^3}{3}\Big|_{u=0}^{u=1} = \frac{1}{3} - 0 = \frac{1}{3}.$$
Exercise 3. Compute the integral $\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx$.
Example 4. Compute the indefinite integral $\int \frac{e^x - e^{-x}}{e^x + e^{-x}}\,dx$.
Solution: If we make the substitution $u = e^x + e^{-x}$, then $\frac{du}{dx} = (e^x + e^{-x})' = e^x - e^{-x}$ and $du = (e^x - e^{-x})\,dx$, so
$$\int \frac{e^x - e^{-x}}{e^x + e^{-x}}\,dx = \int \frac{1}{u}\,du = \ln|u| + C = \ln(e^x + e^{-x}) + C.$$
We do not need to write $|e^x + e^{-x}|$ because $e^x + e^{-x}$ is always positive.
Exercise 4. Compute the indefinite integral $\int \frac{e^{2x}}{4 + e^{2x}}\,dx$.
Exercise 5. Compute the integral $\int_0^1 \cos\big(e^{h(x)}\big)\,e^{h(x)}\,h'(x)\,dx$ given the table:
Exercise 6. What differentiation technique does the substitution rule come from?
(a) power rule (b) product rule (c) chain rule (d) quotient rule
3.5 Techniques of integration: substitution rule 171
$$\int_0^1 x\,e^{-2x^2 + 3}\,dx = -\frac{1}{4}\int_3^1 e^u\,du = -\frac{e^u}{4}\Big|_{u=3}^{u=1} = -\frac{e}{4} + \frac{e^3}{4}.$$
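It remains to estimate the integral $\int_{-1}^{2} f(u)\,du$. Given the table, we can use the left Riemann sum with $n = 6$ subintervals on $[-1, 2]$, each of length $0.5$:
$$\int_{-1}^{2} f(u)\,du \approx (0 - 1.75 + 1 + 3.75 + 5 + 3.75) \times 0.5 = 5.875.$$
Since $u = 3x^2 - 1$ gives $x\,dx = \frac{1}{6}\,du$, the original integral is approximately $\frac{5.875}{6} \approx 0.979$.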
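The table computation in this answer is easy to script. The Python sketch below forms the left Riemann sum from the tabulated values. It then applies the factor $\frac{1}{6}$ that comes from $x\,dx = \frac{1}{6}\,du$. The variable names are illustrative.

```python
# Tabulated values f(u) at the left endpoints u = -1, -0.5, ..., 1.5 (from the exercise)
u_values = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
f_values = [0.0, -1.75, 1.0, 3.75, 5.0, 3.75]

du = 0.5
# Left Riemann sum of f on [-1, 2]: each table value times the width 0.5
left_sum = sum(f_values) * du

# u = 3x^2 - 1 gives du = 6x dx, so the original integral is (1/6) of the u-integral.
estimate = left_sum / 6

for u, fu in zip(u_values, f_values):
    print(f"[{u:4.1f}, {u + du:4.1f}]: height {fu:5.2f}")
print(left_sum, round(estimate, 4))
```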
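With the substitution $u = \cos(x)$, $du = -\sin(x)\,dx$, we get
$$\int \frac{\sin(x)}{\cos^2(x)}\,dx = -\int \frac{1}{u^2}\,du = \frac{1}{u} + C = \frac{1}{\cos(x)} + C.$$
Using the FTC:
$$\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx = \frac{1}{\cos(x)}\Big|_{x=0}^{x=\pi/4} = \sqrt{2} - 1.$$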
Answer to Exercise 4. If we make the substitution $u = 4 + e^{2x}$, then $\frac{du}{dx} = 2e^{2x}$ and $e^{2x}\,dx = \frac{1}{2}\,du$, so
$$\int \frac{e^{2x}}{4 + e^{2x}}\,dx = \frac{1}{2}\int \frac{1}{u}\,du = \frac{1}{2}\ln|u| + C = \frac{1}{2}\ln(4 + e^{2x}) + C.$$
Answer to Exercise 5. We can substitute $u = h(x)$ first, but this will require us to do another substitution later on (try it). Instead, we make the substitution $u = e^{h(x)}$. Then $\frac{du}{dx} = e^{h(x)}h'(x)$ and $du = e^{h(x)}h'(x)\,dx$. Also, from the table, $x = 0$ is replaced by $u = e^{h(0)} = e^{\ln(\pi/4)} = \frac{\pi}{4}$, and $x = 1$ is replaced by $u = e^{h(1)} = e^{\ln(\pi/2)} = \frac{\pi}{2}$, so
$$\int_0^1 \cos\big(e^{h(x)}\big)\,e^{h(x)}\,h'(x)\,dx = \int_{\pi/4}^{\pi/2} \cos(u)\,du = \sin(u)\Big|_{u=\pi/4}^{u=\pi/2} = \sin(\pi/2) - \sin(\pi/4) = 1 - \frac{1}{\sqrt{2}}.$$
Answer to Exercise 6. Substitution rule is the reverse of the chain rule, so (c).
3.6 Techniques of integration: integration by parts 173
The left hand side is equal to $u(x)v(x)$, because the antiderivative of a derivative is the function itself. If we replace the left hand side by $u(x)v(x)$ and move the last integral $\int u(x)v'(x)\,dx$ to the other side of the equation, we get
$$\int u'(x)v(x)\,dx = u(x)v(x) - \int u(x)v'(x)\,dx.$$
This formula is called integration by parts. Swapping the roles of $u$ and $v$, its definite integral version can be written as
$$\int_a^b u(x)v'(x)\,dx = u(x)v(x)\Big|_{x=a}^{x=b} - \int_a^b u'(x)v(x)\,dx.$$
This formula relates one integral, $\int_a^b u(x)v'(x)\,dx$, to another integral, $\int_a^b u'(x)v(x)\,dx$. The idea is that in some cases the second integral is much easier to compute than the first one. How do we know that these formulas are applicable in a given problem? As usual, it takes some practice, but we will only focus on a few common examples of the following form (possibly with some constants):
u  | $x^n$ | $\ln(x)$
u′ | $nx^{n-1}$ | $\frac{1}{x}$
v′ | $\sin(x)$, $\cos(x)$, $e^x$ | $1$, $x^n$
v  | $-\cos(x)$, $\sin(x)$, $e^x$ | $x$, $\frac{x^{n+1}}{n+1}$
We will see that in all these cases $u'(x)v(x)$ is simpler than $u(x)v'(x)$.
Solution: According to the above table, we take $u(x) = x$ and $v'(x) = \cos(2x)$. To use the integration by parts formula, we need to find $u'(x)$ and $v(x)$. First, $u'(x) = 1$. To find $v(x)$, we need an antiderivative of $v'(x) = \cos(2x)$. Let us recall the following substitution formula, which will help us many times below:
$$\text{if an antiderivative of } f(x) \text{ is } F(x), \text{ then an antiderivative of } f(kx) \text{ is } \frac{1}{k}F(kx).$$
$$\int x\cos(2x)\,dx = \frac{x}{2}\sin(2x) + \frac{1}{4}\cos(2x) + C.$$
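Once we have the indefinite integral, we can use the FTC to compute the definite integral:
$$\int_0^{\pi/4} x\cos(2x)\,dx = \Big(\frac{x}{2}\sin(2x) + \frac{1}{4}\cos(2x)\Big)\Big|_{x=0}^{x=\pi/4} = \frac{\pi}{8} - \frac{1}{4}.$$
Alternatively, we can apply the definite integral version of the formula directly:
$$\int_0^{\pi/4} x\cos(2x)\,dx = \frac{x}{2}\sin(2x)\Big|_{x=0}^{x=\pi/4} - \int_0^{\pi/4} \frac{1}{2}\sin(2x)\,dx = \frac{\pi}{8} - \int_0^{\pi/4} \frac{1}{2}\sin(2x)\,dx = \frac{\pi}{8} + \frac{1}{4}\cos(2x)\Big|_{x=0}^{x=\pi/4} = \frac{\pi}{8} - \frac{1}{4}.$$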
When using integration by parts, it is often easier to find the indefinite integral first
and plug in the limits at the very end.
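The Python sketch below confirms the integration by parts result for $\int_0^{\pi/4} x\cos(2x)\,dx$. It compares $F(\pi/4) - F(0)$, using the antiderivative found above, with a Simpson's rule approximation and with $\frac{\pi}{8} - \frac{1}{4}$. The helper `simpson` is our own and is not a library function.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """Antiderivative from integration by parts: (x/2) sin(2x) + (1/4) cos(2x)."""
    return x / 2 * math.sin(2 * x) + math.cos(2 * x) / 4

b = math.pi / 4
by_parts = F(b) - F(0.0)
numeric = simpson(lambda x: x * math.cos(2 * x), 0.0, b)
print(by_parts, numeric, math.pi / 8 - 1 / 4)
```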
Solution: According to the above table, we take $u(t) = t$ and $v'(t) = e^{2t}$. Then $u'(t) = 1$ and $v(t) = \frac{1}{2}e^{2t}$ and, using integration by parts,
$$\int t\,e^{2t}\,dt = \frac{t}{2}e^{2t} - \int \frac{1}{2}e^{2t}\,dt = \frac{t}{2}e^{2t} - \frac{1}{4}e^{2t} + C.$$
Once we have the indefinite integral, we can use the FTC to compute the definite integral:
$$\int_0^1 t\,e^{2t}\,dt = \Big(\frac{t}{2}e^{2t} - \frac{1}{4}e^{2t}\Big)\Big|_{t=0}^{t=1} = \frac{e^2}{4} + \frac{1}{4} = 2.0973\ldots$$
Solution: In this integral we do not have a product of two functions as in typical integration by parts examples, but we can nevertheless use integration by parts. If we take $u(x) = \ln(x)$, then what is $v'(x)$? In this case it is simply $v'(x) = 1$, so $u(x)v'(x) = \ln(x) \times 1 = \ln(x)$. Then $u'(x) = \frac{1}{x}$ and $v(x) = x$, and the integration by parts formula gives
$$\int \ln(x)\,dx = \ln(x)\,x - \int x\,\frac{1}{x}\,dx = x\ln(x) - \int 1\,dx = x\ln(x) - x + C.$$
Comment. We did not include any constant inside the logarithm $\ln(x)$ in the above problems. The reason is that $\ln(kx) = \ln(k) + \ln(x)$ (for $k > 0$), so the constant factor $k$ can be split off into a separate term.
Example 4. Find $\int_0^2 f(x)g'(x)\,dx$ given that $\int_0^2 f'(x)g(x)\,dx = -4$ and
Using the table and the given integral, this is equal to $2(-1) - 1(0.5) - (-4) = 1.5$.
Exercise 4. Find $\int_1^3 x\,f''(x)\,dx$ given the following table:
x | f(x) | f′(x)
1 | 1 | −1.5
3 | 2 | 1
Example 5. Compute $\int \sin(x)\cos(x)\,dx$ using:
(a) substitution (b) integration by parts (c) trig identity
Solution: (a) Let us take $u = \sin(x)$. Then $\frac{du}{dx} = \cos(x)$, $du = \cos(x)\,dx$, and
$$\int \sin(x)\cos(x)\,dx = \int u\,du = \frac{u^2}{2} + C = \frac{\sin^2(x)}{2} + C.$$
(b) If we take $u(x) = \sin(x)$, $v'(x) = \cos(x)$, $u'(x) = \cos(x)$, $v(x) = \sin(x)$, then the integration by parts formula gives
$$\int \sin(x)\cos(x)\,dx = \sin^2(x) - \int \cos(x)\sin(x)\,dx.$$
We see the same indefinite integral $\int \sin(x)\cos(x)\,dx$ on both sides of the equation, so solving for it we get $\int \sin(x)\cos(x)\,dx = \frac{1}{2}\sin^2(x)$. This gives one antiderivative. The general antiderivative is $\int \sin(x)\cos(x)\,dx = \frac{1}{2}\sin^2(x) + C$, as in (a).
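Different methods can produce antiderivatives that look different but differ only by a constant. The Python sketch below compares $\frac{1}{2}\sin^2(x)$ from (a) and (b) with two other antiderivatives of $\sin(x)\cos(x)$. The first is $-\frac{1}{2}\cos^2(x)$. The second is $-\frac{1}{4}\cos(2x)$, which is the kind of answer a trig identity such as $\sin(x)\cos(x) = \frac{1}{2}\sin(2x)$ leads to. These two candidates are our own choices. The sketch checks that their differences from $\frac{1}{2}\sin^2(x)$ are constant and that all three have derivative $\sin(x)\cos(x)$.

```python
import math

candidates = {
    "sin^2(x)/2": lambda x: math.sin(x) ** 2 / 2,
    "-cos^2(x)/2": lambda x: -math.cos(x) ** 2 / 2,
    "-cos(2x)/4": lambda x: -math.cos(2 * x) / 4,
}

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

xs = [0.1 * k for k in range(-20, 21)]
base = candidates["sin^2(x)/2"]
for name, F in candidates.items():
    diffs = [base(x) - F(x) for x in xs]
    slope_err = max(abs(derivative(F, x) - math.sin(x) * math.cos(x)) for x in xs)
    print(f"{name:12s} difference ~ {diffs[0]:.6f}, spread {max(diffs) - min(diffs):.1e}, "
          f"derivative error {slope_err:.1e}")
```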
Exercise 5. Find the mistake in the following argument, which claims to show that $0 = 1$.
Proof: If we take $u(x) = \sin(x)$, $v'(x) = \cos(x)$, $u'(x) = \cos(x)$, $v(x) = \sin(x)$, then the integration by parts formula gives
$$\int \sin(x)\cos(x)\,dx = \sin^2(x) - \int \cos(x)\sin(x)\,dx.$$
Next, let us integrate the last integral, $\int \cos(x)\sin(x)\,dx$, by parts, taking $u(x) = \cos(x)$, $v'(x) = \sin(x)$, $u'(x) = -\sin(x)$, $v(x) = -\cos(x)$. This gives $\int \cos(x)\sin(x)\,dx = -\cos^2(x) - \int \sin(x)\cos(x)\,dx$. If we plug this into the first integration by parts step above, we get
$$\int \sin(x)\cos(x)\,dx = \sin^2(x) - \Big(-\cos^2(x) - \int \sin(x)\cos(x)\,dx\Big),$$
$$\int \sin(x)\cos(x)\,dx = \underbrace{\sin^2(x) + \cos^2(x)}_{1} + \int \sin(x)\cos(x)\,dx.$$
If we cancel the integrals $\int \sin(x)\cos(x)\,dx$ on the left and right hand sides, we get $0 = 1$. Where is the mistake?
Answer to Exercise 1. Take $u(x) = x$ and $v'(x) = \sin(3x)$, so that $u'(x) = 1$ and $v(x) = -\frac{1}{3}\cos(3x)$. Integration by parts gives
$$\int x\sin(3x)\,dx = -\frac{x}{3}\cos(3x) + \int \frac{1}{3}\cos(3x)\,dx = -\frac{x}{3}\cos(3x) + \frac{1}{9}\sin(3x) + C.$$
The definite integral is
$$\int_0^{\pi/3} x\sin(3x)\,dx = \Big(-\frac{x}{3}\cos(3x) + \frac{1}{9}\sin(3x)\Big)\Big|_{x=0}^{x=\pi/3} = \frac{\pi}{9}.$$
Answer to Exercise 2. According to the above table, we take $u(t) = t$ and $v'(t) = e^{-t/4}$. Then $u'(t) = 1$ and $v(t) = -4e^{-t/4}$ and, using integration by parts,
$$\int t\,e^{-t/4}\,dt = -4t\,e^{-t/4} + \int 4e^{-t/4}\,dt = -4t\,e^{-t/4} - 16e^{-t/4} + C.$$
Once we have the indefinite integral, we can use the FTC to compute the definite integral:
$$\int_0^4 t\,e^{-t/4}\,dt = \big(-4t\,e^{-t/4} - 16e^{-t/4}\big)\Big|_{t=0}^{t=4} = -\frac{32}{e} + 16 = 4.2279\ldots$$
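$$\int x^2\ln(x)\,dx = \frac{x^3}{3}\ln(x) - \int \frac{x^3}{3}\cdot\frac{1}{x}\,dx = \frac{x^3}{3}\ln(x) - \int \frac{x^2}{3}\,dx = \frac{x^3}{3}\ln(x) - \frac{x^3}{9} + C.$$
The definite integral is
$$\int_1^2 x^2\ln(x)\,dx = \Big(\frac{x^3}{3}\ln(x) - \frac{x^3}{9}\Big)\Big|_{x=1}^{x=2} = \frac{8}{3}\ln(2) - \frac{7}{9}.$$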
Answer to Exercise 4. Take $u(x) = x$ and $v'(x) = f''(x)$. Then $u'(x) = 1$ and $v(x) = f'(x)$. Using the integration by parts formula,
$$\int x\,f''(x)\,dx = x\,f'(x) - \int f'(x)\,dx = x\,f'(x) - f(x) + C.$$
In the second equality we used the fact that an antiderivative of $f'(x)$ is $f(x)$. Then the definite integral is
$$\int_1^3 x\,f''(x)\,dx = \big(x\,f'(x) - f(x)\big)\Big|_{x=1}^{x=3} = \big(3f'(3) - f(3)\big) - \big(f'(1) - f(1)\big) = (3 - 2) - (-1.5 - 1) = 3.5.$$
The mistake was in cancelling the two integrals. Recall that indefinite integrals, a.k.a. antiderivatives, are defined only up to a constant $+C$. The above equation simply says that adding $+1$ to any antiderivative of $\sin(x)\cos(x)$ gives another antiderivative of $\sin(x)\cos(x)$. Whenever we say that the indefinite integral $\int f(x)\,dx$ is equal to something, there is always a hidden $+C$ in the statement.
$$P_2(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2$$
Solution: We want the first derivatives to agree, $P_2'(a) = f'(a)$, and the second derivatives to agree, $P_2''(a) = f''(a)$, at $x = a$. Let us compute the first two derivatives of the above parabola $P_2(x)$:
$$P_2'(x) = f'(a) + \frac{f''(a)}{2}\cdot 2(x - a) = f'(a) + f''(a)(x - a), \qquad P_2''(x) = f''(a).$$
We can see that $P_2'(a) = f'(a) + f''(a)(a - a) = f'(a)$ and $P_2''(a) = f''(a)$. Notice how the $2$ in the denominator cancelled the $2$ that came from the derivative of $(x - a)^2$. This explains why we divided by $2$ in the last term of $P_2(x)$: otherwise the derivatives would not match.
Example 2. Compute and graph the first and second degree Taylor polynomials centered at $x = 0$ for the functions $y = e^x$ and $y = \cos(x)$.
Solution: If $f(x) = e^x$, then $f'(x) = e^x$ and $f''(x) = e^x$, so $f(0) = f'(0) = f''(0) = 1$. By definition, $P_1(x) = 1 + x$ and $P_2(x) = 1 + x + \frac{x^2}{2}$. The graph is in the left figure below.
If $f(x) = \cos(x)$, then $f'(x) = -\sin(x)$ and $f''(x) = -\cos(x)$, so $f(0) = 1$, $f'(0) = 0$, and $f''(0) = -1$. By definition, $P_1(x) = 1$ and $P_2(x) = 1 - \frac{x^2}{2}$. The graph is in the right figure below.
[Figure panels (a), (b), (c), (d)]
where $h$ is the increment $\Delta x$, in this case $h = 0.25$. Before we explain this formula, let us use it in this problem:
$$f''(0.5) \approx \frac{f(0.75) - 2f(0.5) + f(0.25)}{0.25^2} = \frac{0.75 - 2\cdot 1 + 1.75}{0.25^2} = 8,$$
so our estimate of the last coefficient is $r = \frac{f''(0.5)}{2} \approx \frac{8}{2} = 4$. We estimate that the Taylor polynomial is $P_2(x) \approx 1 - 2(x - 0.5) + 4(x - 0.5)^2$.
Finally, here is the reason behind the above estimate of the second derivative $f''(x)$. The quotient $\frac{f(x+h) - f(x)}{h}$ can be viewed as an estimate of $f'(x + \frac{h}{2})$, and $\frac{f(x) - f(x-h)}{h}$ can be viewed as an estimate of $f'(x - \frac{h}{2})$. The above formula then becomes
$$f''(x) \approx \frac{f'(x + \frac{h}{2}) - f'(x - \frac{h}{2})}{h},$$
which is exactly how we would estimate the second derivative $f''(x)$ if we knew the values of the first derivative.
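The second-difference formula is a one-liner in code. The Python sketch below applies it to the three table values used above ($h = 0.25$, centered at $0.5$). It also rewrites the formula as a difference of two first-derivative estimates, as in the explanation. The function name `second_difference` and the extra sanity check with $e^x$ are our own additions.

```python
import math

def second_difference(f_minus, f_center, f_plus, h):
    """Estimate f''(x) from f(x - h), f(x), f(x + h)."""
    return (f_plus - 2 * f_center + f_minus) / h**2

# Table values used above: f(0.25) = 1.75, f(0.5) = 1, f(0.75) = 0.75, with h = 0.25
h = 0.25
f2 = second_difference(1.75, 1.0, 0.75, h)
r = f2 / 2
print(f2, r)  # 8.0 4.0

# The same number as a difference of two slope estimates, as in the explanation
slope_right = (0.75 - 1.0) / h   # estimates f'(0.5 + h/2)
slope_left = (1.0 - 1.75) / h    # estimates f'(0.5 - h/2)
print((slope_right - slope_left) / h)

# Sanity check on a function with known f''(0) = 1: f(x) = e^x
check = second_difference(math.exp(-h), 1.0, math.exp(h), h)
print(check)
```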
Exercise 3. Estimate the coefficients of the second degree Taylor polynomial $P_2(x) = p + q(x - 2) + r(x - 2)^2$ centered at $x = 2$ for the function $f(x)$, given the values in the table:
x 1.5 1.75 2 2.25 2.5
f (x) 1.75 2 1.75 1 -0.25
$$P_n(a) = f(a), \quad P_n'(a) = f'(a), \quad P_n''(a) = f''(a), \quad \ldots, \quad P_n^{(n)}(a) = f^{(n)}(a).$$
The following figures show several Taylor polynomials centered at $x = 0$ for three functions: $e^x$, $\cos(x)$, and $\sin(x)$. We can see that, as the degree $n$ increases, the approximations get better and better on wider and wider intervals.
Let us now consider several classic examples of Taylor polynomials. First, let us take a look at the exponential function $f(x) = e^x$. All the derivatives of $e^x$ are equal to $e^x$, so $f^{(n)}(x) = e^x$ and $f^{(n)}(0) = e^0 = 1$. The Taylor polynomial of $e^x$ of degree $n$ at $x = 0$ is therefore
$$P_n(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!}.$$
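The formula for $P_n(x)$ translates directly into a short loop. The Python sketch below is our own illustration, and `taylor_exp` is a made-up name. It evaluates $P_n(1)$ for several degrees and shows the error $|e - P_n(1)|$ shrinking as $n$ grows.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x centered at 0: sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

for n in (1, 2, 4, 8, 12):
    approx = taylor_exp(1.0, n)
    print(f"n = {n:2d}: P_n(1) = {approx:.10f}, error = {abs(math.e - approx):.2e}")
```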
we see that the fourth derivative is equal to the original function $\cos(x)$, so after that the same pattern of $\cos(x)$, $-\sin(x)$, $-\cos(x)$, $\sin(x)$ keeps repeating. When we plug in $x = 0$, we see that $\cos(0) = 1$, $-\sin(0) = 0$, $-\cos(0) = -1$, $\sin(0) = 0$, so the derivatives at $x = 0$ follow a repeating pattern of $1, 0, -1, 0$, etc. That means that the Taylor polynomials of $\cos(x)$ at $x = 0$ follow the pattern
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \frac{x^{10}}{10!} + \ldots$$
Notice that all odd powers of $x$ are missing because of the coefficients $\pm\sin(0) = 0$. We can stop at any degree to get the Taylor polynomial $P_n(x)$ of that degree $n$:
$$P_1(x) = 1, \quad P_2(x) = 1 - \frac{x^2}{2!}, \quad P_3(x) = 1 - \frac{x^2}{2!}, \quad P_4(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!}, \quad \ldots$$
Notice how $P_3(x)$ is equal to $P_2(x)$. Again, that is because the coefficient in front of $x^3$ is zero. That is why in the above figure we plotted only the even degree Taylor polynomials $P_2(x)$, $P_4(x)$ and $P_6(x)$.
3.7 Approximating integrals using Taylor polynomials 185
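Solution: By definition,
$$P_5(x) = -2 + 2x - \frac{1}{2!}x^2 + \frac{2}{3!}x^3 - \frac{3}{4!}x^4 + \frac{12}{5!}x^5.$$
We can simplify the last three coefficients,
$$\frac{2}{3!} = \frac{2}{1\cdot 2\cdot 3} = \frac{1}{3}; \qquad -\frac{3}{4!} = -\frac{3}{1\cdot 2\cdot 3\cdot 4} = -\frac{1}{8}; \qquad \frac{12}{5!} = \frac{3\cdot 4}{1\cdot 2\cdot 3\cdot 4\cdot 5} = \frac{1}{10},$$
so the simplified form of the Taylor polynomial is
$$P_5(x) = -2 + 2x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{8} + \frac{x^5}{10}.$$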
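The rule that the coefficient of $x^k$ is $\frac{f^{(k)}(0)}{k!}$ can be checked with exact fractions. The Python sketch below reads the derivative values $f^{(k)}(0) = -2, 2, -1, 2, -3, 12$ off the unsimplified $P_5(x)$ above and reproduces the simplified coefficients. It is illustrative only, and the variable names are ours.

```python
from fractions import Fraction
from math import factorial

# f^(k)(0) for k = 0..5, read off the unsimplified P5(x) above
derivatives_at_0 = [-2, 2, -1, 2, -3, 12]

# Taylor coefficient of x^k is f^(k)(0) / k!
coeffs = [Fraction(d, factorial(k)) for k, d in enumerate(derivatives_at_0)]

terms = [f"({c})*x^{k}" for k, c in enumerate(coeffs)]
print(" + ".join(terms))
```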
Exercise 5. Find $f^{(5)}(-1)$ and $f^{(7)}(-1)$ given the Taylor polynomial of $f(x)$ of degree 7 centered at $x = -1$: $P_7(x) = -2 + (x + 1) - 3(x + 1)^3 + (x + 1)^4 - \frac{(x + 1)^7}{6!}$.
$$\cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!}$$
$$\cos(2x^2) \approx 1 - \frac{(2x^2)^2}{2!} + \frac{(2x^2)^4}{4!} = 1 - 2x^4 + \frac{2}{3}x^8.$$
In fact, this approximation is very good on the interval $[-1, 1]$, as we can see in the figure above, where $\cos(2x^2)$ is the black solid line and $1 - 2x^4 + \frac{2}{3}x^8$ is the blue dashed line. Integrating this approximation gives
$$\int_{-1}^{1} \cos(2x^2)\,dx \approx \int_{-1}^{1} \Big(1 - 2x^4 + \frac{2}{3}x^8\Big)\,dx = \Big(x - \frac{2x^5}{5} + \frac{2x^9}{27}\Big)\Big|_{x=-1}^{x=1} = \Big(1 - \frac{2}{5} + \frac{2}{27}\Big) - \Big(-1 + \frac{2}{5} - \frac{2}{27}\Big) = 1.3481\ldots$$
One can check with a computer or a graphical calculator that the original integral is $\int_{-1}^{1} \cos(2x^2)\,dx = 1.3351\ldots$, so the approximation we obtained is pretty good. To make it even better, we could have used a Taylor polynomial with a few more terms, which would give a better approximation of our function.
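The comparison in this example can be reproduced with a few lines of Python. The sketch below integrates the Taylor approximation exactly through its antiderivative. It integrates $\cos(2x^2)$ itself with Simpson's rule (our own helper), which stands in for "a computer or a graphical calculator". It also adds one more Taylor term, $-\frac{(2x^2)^6}{6!} = -\frac{4}{45}x^{12}$, to show the improvement mentioned above. This is an illustrative sketch, not part of the original solution.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def antiderivative(x):
    # Antiderivative of the Taylor approximation 1 - 2x^4 + (2/3)x^8
    return x - 2 * x**5 / 5 + 2 * x**9 / 27

def antiderivative_more(x):
    # One more Taylor term of cos(u) with u = 2x^2: -u^6/6! = -(4/45) x^12
    return antiderivative(x) - 4 * x**13 / 585

taylor_integral = antiderivative(1) - antiderivative(-1)
more_terms = antiderivative_more(1) - antiderivative_more(-1)
numeric = simpson(lambda x: math.cos(2 * x**2), -1.0, 1.0)
print(f"three-term Taylor: {taylor_integral:.6f}")
print(f"four-term Taylor:  {more_terms:.6f}")
print(f"Simpson:           {numeric:.6f}")
```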
Exercise 6. Write down a Taylor polynomial of $e^{-x^2}$ at $x = 0$ with four non-zero terms and use it to approximate $\int_0^1 e^{-x^2}\,dx$.
$$\sin(x) \approx x - \frac{x^3}{3!} = x - \frac{x^3}{6}.$$
When we divide both sides by $x$, we get
$$\frac{\sin(x)}{x} \approx 1 - \frac{x^2}{6}.$$
In the figure above, $\frac{\sin(x)}{x}$ is the black solid line and $1 - \frac{x^2}{6}$ is the blue dashed line, and the approximation looks pretty good. Integrating this approximation gives
$$\int_0^1 \frac{\sin(x)}{x}\,dx \approx \int_0^1 \Big(1 - \frac{x^2}{6}\Big)\,dx = \Big(x - \frac{x^3}{18}\Big)\Big|_{x=0}^{x=1} = 1 - \frac{1}{18} - 0 = \frac{17}{18} = 0.9444\ldots$$
One can check with a computer or a graphical calculator that the original integral is $\int_0^1 \frac{\sin(x)}{x}\,dx = 0.9460\ldots$, so the approximation is pretty good. There is one subtle point in this problem: $\frac{\sin(x)}{x}$ is not defined at $x = 0$, because we would divide by $0$. However, according to the above figure, $\frac{\sin(x)}{x}$ approaches $1$ as $x$ approaches $0$, so we implicitly assumed that the function we integrate is equal to $1$ at $x = 0$.
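The Python sketch below reproduces this comparison. It defines $\frac{\sin(x)}{x}$ with the value $1$ at $x = 0$, as the text assumes, and integrates it with Simpson's rule (our own helper). It then compares the result with the Taylor estimate $\frac{17}{18}$. It also shows that adding the next term, $\frac{x^4}{120}$, improves the estimate. The extra term is our addition for illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return total * h / 3

def sinc(x):
    """sin(x)/x, extended by its limit 1 at x = 0 (the convention assumed in the text)."""
    return 1.0 if x == 0 else math.sin(x) / x

taylor_integral = 1 - 1 / 18             # from sin(x)/x ~ 1 - x^2/6
better = 1 - 1 / 18 + 1 / 600            # adding the next term x^4/120, integrated
numeric = simpson(sinc, 0.0, 1.0)
print(taylor_integral, better, numeric)
```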
Answer to Exercise 1. There are two ways to solve this problem. We know that $P_2(x)$ and $f(x)$ have the same value and the same first two derivatives at $x = 0$ (so in this case $a = 0$). We can therefore compute $P_2'(x) = 2 - 2x$ and $P_2''(x) = -2$ and plug in $x = 0$ to get $f(0) = P_2(0) = -3$, $f'(0) = P_2'(0) = 2$, and $f''(0) = P_2''(0) = -2$.
A better way, which requires no calculations, is to compare the definition $P_2(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2$ centered at $x = 0$ with the formula given to us, $P_2(x) = -3 + 2x - 1x^2$, and make sure that all the coefficients match:
$$f(0) = -3, \quad f'(0) = 2, \quad \frac{f''(0)}{2} = -1.$$
This immediately gives us $f(0) = -3$, $f'(0) = 2$, and $f''(0) = -2$.
Answer to Exercise 2. (a) $p > 0$ because $p = f(0) > 0$; $q > 0$ because $q = f'(0) > 0$, since the slope is positive; $r < 0$ because $r = f''(0)/2 < 0$, since the function is concave down at $x = 0$. (b) $p > 0$, $q < 0$, $r < 0$. (c) $p > 0$, $q > 0$, $r > 0$. (d) $p > 0$, $q < 0$, $r > 0$.
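Answer to Exercise 3. $p = f(2) = 1.75$, $q = f'(2) \approx \frac{f(2.25) - f(1.75)}{0.5} = \frac{1 - 2}{0.5} = -2$, and $r = \frac{f''(2)}{2}$, where $f''(2) \approx \frac{f(2.25) - 2f(2) + f(1.75)}{0.25^2} = \frac{1 - 3.5 + 2}{0.0625} = -8$, so $r \approx -4$.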
we see that the fourth derivative is equal to the original function sin(x), so after that
the same pattern of sin(x), cos(x), − sin(x), − cos(x) will keep repeating. When we
plug in x = 0, we see that sin(0) = 0, cos(0) = 1, − sin(0) = 0, − cos(0) = −1, so
the derivatives at x = 0 will follow a repeating pattern of 0, 1, 0, −1 etc. That means
that Taylor polynomials will have a pattern
$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \cdots$$
Notice that all even powers of $x$ are missing because of the coefficients $\pm\sin(0) = 0$. We can stop at any degree to get the Taylor polynomial $P_n(x)$ of that degree $n$. For example,
$$P_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}, \qquad P_6(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}.$$
Notice how $P_6(x)$ is equal to $P_5(x)$. Again, that is because the coefficient in front of $x^6$ is zero.
$$\frac{f^{(7)}(-1)}{7!} = -\frac{1}{6!} \implies f^{(7)}(-1) = -\frac{7!}{6!} = -7.$$
$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!},$$
and replace $x$ by $-x^2$,
$$e^{-x^2} \approx 1 + (-x^2) + \frac{(-x^2)^2}{2!} + \frac{(-x^2)^3}{3!} = 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}.$$
In the figure above, $e^{-x^2}$ is the black solid line and $1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}$ is the blue dashed line. Integrating this approximation gives
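$$\int_0^1 e^{-x^2}\,dx \approx \int_0^1 \left(1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}\right) dx = \left[x - \frac{x^3}{3} + \frac{x^5}{10} - \frac{x^7}{42}\right]_{x=0}^{x=1} = 1 - \frac{1}{3} + \frac{1}{10} - \frac{1}{42} - 0 = 0.7428\ldots$$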
One can check using a computer or a graphical calculator that the original integral is $\int_0^1 e^{-x^2}\,dx = 0.7468\ldots$, so the approximation we obtained is pretty good.
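The term-by-term integration above can be checked exactly with Python's `fractions` module. This sketch assumes the pattern used above: the $k$-th non-zero term of the Taylor polynomial of $e^{-x^2}$ is $(-1)^k x^{2k}/k!$, which integrates over $[0, 1]$ to $(-1)^k/\big(k!\,(2k+1)\big)$. The helper name is ours.

```python
import math
from fractions import Fraction


def taylor_integral_exp_neg_x2(terms):
    """Exact integral over [0, 1] of the first `terms` non-zero Taylor terms of e^(-x^2)."""
    return sum(
        Fraction((-1) ** k, math.factorial(k) * (2 * k + 1)) for k in range(terms)
    )


four_terms = taylor_integral_exp_neg_x2(4)       # 1 - 1/3 + 1/10 - 1/42
print(four_terms, f"{float(four_terms):.6f}")    # 26/35 0.742857

# Using more terms moves the estimate toward the value 0.7468... quoted above
print(f"{float(taylor_integral_exp_neg_x2(10)):.6f}")
```

With four terms the exact value is $\frac{26}{35}$, which matches the decimal $0.7428\ldots$ in the text.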
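$$\cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} = 1 - \frac{x^2}{2} + \frac{x^4}{24}.$$
When we subtract both sides from 1 and divide by $x^2$, we get
$$\frac{1 - \cos(x)}{x^2} \approx \frac{1}{2} - \frac{x^2}{24}.$$
In the figure above, $\frac{1 - \cos(x)}{x^2}$ is the black solid line and $\frac{1}{2} - \frac{x^2}{24}$ is the blue dashed line. Integrating this approximation gives
$$\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx \approx \int_0^1 \left(\frac{1}{2} - \frac{x^2}{24}\right) dx = \left[\frac{x}{2} - \frac{x^3}{72}\right]_{x=0}^{x=1} = \frac{1}{2} - \frac{1}{72} - 0 = 0.4861\ldots$$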
One can check using a computer or a graphical calculator that the original integral is $\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx = 0.4863\ldots$. Again, there is one subtle point in this problem: $\frac{1 - \cos(x)}{x^2}$ is not defined at $x = 0$ because we divide by 0. However, according to the approximation $\frac{1 - \cos(x)}{x^2} \approx \frac{1}{2} - \frac{x^2}{24}$, this function approaches $\frac{1}{2}$ as $x$ approaches 0, so we can implicitly assume that the function we integrate is equal to $\frac{1}{2}$ at $x = 0$.
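A numerical check of this example is slightly more delicate, because computing $1 - \cos(x)$ directly loses floating-point precision for small $x$. The Python sketch below (composite Simpson's rule, our choice of method for illustration) uses the identity $1 - \cos(x) = 2\sin^2(x/2)$ and assigns the value $\frac{1}{2}$ at $x = 0$, as discussed above.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def one_minus_cos_over_x2(x):
    if x == 0:
        return 0.5                      # limiting value at x = 0
    s = math.sin(x / 2)
    return 2 * s * s / (x * x)          # equals (1 - cos x)/x^2, but numerically stable


taylor_estimate = 1 / 2 - 1 / 72        # integral of 1/2 - x^2/24 over [0, 1]
numeric_value = simpson(one_minus_cos_over_x2, 0.0, 1.0)
print(f"{taylor_estimate:.6f} {numeric_value:.6f}")   # 0.486111 0.486385
```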
Answer to Exercise 8. If $A(t)$ is the area at time $t$ then $f(t) = A'(t)$ and, by the FTC,
$$A(1) = A(0) + \int_0^1 f(t)\,dt = \int_0^1 f(t)\,dt,$$
because the area was 0 at $t = 0$. Given $f(1) = 2$, $f'(1) = -0.6$, and $f''(1) = 0.18$, we can estimate $f(t)$ using the second degree Taylor polynomial centered at $t = 1$,
190 3 Integrals
$$f(t) \approx P_2(t) = f(1) + f'(1)(t - 1) + \frac{f''(1)}{2}(t - 1)^2 = 2 - 0.6(t - 1) + 0.09(t - 1)^2$$
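The rest of this answer is cut off in this excerpt. For reference, here is a small exact computation in Python (with `fractions`) of $\int_0^1 P_2(t)\,dt$, which by the reasoning above is the Taylor estimate of $A(1)$. The substitution $u = t - 1$ and the helper name are our choices.

```python
from fractions import Fraction

f1 = Fraction(2)            # f(1)
df1 = Fraction(-6, 10)      # f'(1)
d2f1 = Fraction(18, 100)    # f''(1)


def antiderivative_P2(t):
    """An antiderivative of P2(t) = f1 + df1 (t - 1) + (d2f1 / 2) (t - 1)^2."""
    u = t - 1
    return f1 * u + df1 * u**2 / 2 + (d2f1 / 2) * u**3 / 3


area_estimate = antiderivative_P2(Fraction(1)) - antiderivative_P2(Fraction(0))
print(area_estimate, float(area_estimate))   # 233/100 2.33
```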
Indefinite integrals. To find an indefinite integral $\int f(x)\,dx$ of some function $f(x)$, one can simply enter “integral of f(x)” into the input bar. Depending on the function $f(x)$, the output may vary, as we will see in the examples below. In the example on the right, the function $f(x)$ is $e^x\cos(x)$, and the first line of the output is the answer
$$\int e^x\cos(x)\,dx = \frac{e^x}{2}\big(\cos(x) + \sin(x)\big) + C.$$
In this particular case, below some plots of $\frac{e^x}{2}\big(\cos(x) + \sin(x)\big)$, it also gives alternative forms of the integral, for example,
$$\frac{e^x}{\sqrt{2}}\sin\left(x + \frac{\pi}{4}\right) + C,$$
which is just another way to rewrite this function using trigonometric identities.
Below that, the Taylor polynomial of degree 4 is given,
$$\frac{1}{2} + x + \frac{x^2}{2} - \frac{x^4}{12}.$$
It says “series expansion of the integral at x = 0” because Taylor polynomials give rise to the so-called Taylor series that will be discussed in a later chapter. As we can see, the output contains a lot of useful information without us even asking for it. If we are simply looking for an antiderivative, it is useful to take a look at the alternative forms of the integral.
$$\frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big) \approx 1.3780$$
One can click on the answer to see a more accurate decimal approximation 1.378024613547.... The exact answer $\frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big)$ actually comes from an application of the FTC using the indefinite integral found in the example above:
$$\left[\frac{e^x}{2}\big(\cos(x) + \sin(x)\big)\right]_{x=0}^{x=1} = \frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big).$$
This indefinite integral is also given in the output, if you scroll down.
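To see that the closed form and the decimal value agree, one can compare both with a numerical integral of $e^x\cos(x)$ over $[0, 1]$. A minimal Python sketch, again using composite Simpson's rule as our own choice of method:

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# FTC with the antiderivative (e^x / 2)(cos x + sin x)
closed_form = 0.5 * (math.e * (math.sin(1) + math.cos(1)) - 1)
numeric_value = simpson(lambda x: math.exp(x) * math.cos(x), 0.0, 1.0)
print(closed_form, numeric_value)
```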
$$\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx.$$
Give the answer up to ten digits.
The good thing about definite integrals is that they can be estimated using Riemann sums (and other techniques of numerical integration) even if an antiderivative cannot be found explicitly, in which case we cannot apply the FTC. In the example on the right, Wolfram Alpha was unable to find the indefinite integral of $\sqrt{x + 1}\,e^{-x^2}$ in terms of standard mathematical functions, but it had no problem computing the definite integral from 0 to 1 with high accuracy. We will discuss below how to use specific numerical methods, such as the familiar left and right Riemann sums.
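Left and right Riemann sums are easy to compute directly. Here is a minimal Python sketch for $\int_0^1 \sqrt{x + 1}\,e^{-x^2}\,dx$ (the function names are ours). Because the two sums share all sample points except the first and the last, in exact arithmetic they differ by $\big(f(0) - f(1)\big)\Delta x$, so the gap between them shrinks like $1/n$.

```python
import math


def integrand(x):
    return math.sqrt(x + 1) * math.exp(-x * x)


def left_riemann(f, a, b, n):
    """Left Riemann sum: sample f at the left endpoint of each of n subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx


def right_riemann(f, a, b, n):
    """Right Riemann sum: sample f at the right endpoint of each of n subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 1) * dx) for i in range(n)) * dx


for n in (10, 100, 1000):
    left = left_riemann(integrand, 0, 1, n)
    right = right_riemann(integrand, 0, 1, n)
    print(n, f"{left:.6f}", f"{right:.6f}")
```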
Exercise 3. Find the definite integral $\int_0^2 \cos(x^4)\sin(x^3)\,dx$. What about indefinite
In the case of A2 and A3 that border the vertical asymptote, we will integrate
up to some point b before the vertical asymptote or from some point c after the
3.9 Improper integrals 197
vertical asymptote and then let those points $b$ and $c$ get closer and closer to this vertical asymptote from the corresponding side:
$$A_2 = \lim_{b \to 3^-} \int_2^b f(x)\,dx, \qquad A_3 = \lim_{c \to 3^+} \int_c^4 f(x)\,dx.$$
Notice how we wrote b → 3− and c → 3+, which is the notation for the left
and right limits. This notation is very important because it indicates that we
approach the vertical asymptote from a specific side without crossing it.
• If at least one of these pieces is infinite, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$ diverges. If all of these pieces are finite, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$ converges and is equal to $A_1 + A_2 + A_3 + A_4$. Of course, if the function is negative on some interval, the corresponding piece could have a minus sign.
Before we consider more concrete examples, let us practice the above definition
first.
Solution: We divide the interval $[0, \infty)$ into three “problematic” regions, $[0, 1]$, $[1, 2]$ and $[2, \infty)$, where the choice of 2 can be replaced by any other point bigger than 1. Then the definite integral on each piece is defined as
$$\lim_{a \to 1^-} \int_0^a f(x)\,dx, \qquad \lim_{b \to 1^+} \int_b^2 f(x)\,dx, \qquad \lim_{c \to +\infty} \int_2^c f(x)\,dx.$$
The integral $\int_0^\infty f(x)\,dx$ diverges if at least one of these limits is not finite. If all three are well defined and finite then we just add them up,
$$\int_0^\infty f(x)\,dx = \lim_{a \to 1^-} \int_0^a f(x)\,dx + \lim_{b \to 1^+} \int_b^2 f(x)\,dx + \lim_{c \to +\infty} \int_2^c f(x)\,dx.$$
Simple power functions. Our main concrete family of examples will be the power functions
$$f(x) = \frac{1}{x^p} \quad \text{where } p > 0.$$
These functions have a vertical asymptote at $x = 0$ and a horizontal asymptote $y = 0$ as $x \to +\infty$ (see figures below), so we will consider separately the case of the vertical asymptote on a finite interval $[0, 1]$ and the case of the infinite interval $[1, \infty)$.
Example 2. Consider the improper integral $\int_0^1 \frac{1}{x^p}\,dx$. Determine for which $p > 0$ it converges and for which $p > 0$ it diverges.
Solution: By the definition above, we need to find the limit
$$\int_0^1 \frac{1}{x^p}\,dx = \lim_{a \to 0^+} \int_a^1 \frac{1}{x^p}\,dx$$
and determine when this limit is finite and when it is infinite. Let us start with the case $p = 1$:
$$\int_a^1 \frac{1}{x}\,dx = \Big[\ln(x)\Big]_{x=a}^{x=1} = \ln(1) - \ln(a) = -\ln(a).$$
We know that $\ln(a) \to -\infty$ when $a$ approaches 0 from the right, so $-\ln(a) \to +\infty$. The area is infinite and the integral diverges when $p = 1$.
When $p$ is not equal to 1, we can use the power rule:
$$\int_a^1 \frac{1}{x^p}\,dx = \int_a^1 x^{-p}\,dx = \left[\frac{x^{-p+1}}{-p+1}\right]_{x=a}^{x=1} = \frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1}.$$
To see what happens when $a \to 0^+$, we have to separate into two cases: $p > 1$ and $p < 1$. When $p > 1$ then $p - 1$ is positive and
$$\frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1} = \frac{1}{-(p-1)} - \frac{a^{-(p-1)}}{-(p-1)} = -\frac{1}{p-1} + \frac{1}{(p-1)a^{p-1}} \to +\infty$$
as $a \to 0^+$, because $a^{p-1} \to 0$ in the denominator. This means that the integral diverges when $p > 1$. Actually, we can see this without integrating because the function $\frac{1}{x^p}$ is bigger than $\frac{1}{x}$ between $x = 0$ and $x = 1$ (blue line is above the green line in the left figure) so the area will be bigger. We already computed that the area is infinite when $p = 1$, so the area must also be infinite when $p > 1$.
Finally, when $p < 1$ then $1 - p$ is positive and
$$\frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1} = \frac{1}{1-p} - \frac{a^{1-p}}{1-p} \to \frac{1}{1-p} - 0 = \frac{1}{1-p}$$
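as $a \to 0^+$, because $a^{1-p} \to 0$ in the numerator. So the area is finite in this case and is equal to $\frac{1}{1-p}$. To summarize: $\int_0^1 \frac{1}{x^p}\,dx$ converges and equals $\frac{1}{1-p}$ when $0 < p < 1$, and diverges when $p \ge 1$.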
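The case analysis can also be seen numerically. This Python sketch evaluates the exact formulas derived above for $\int_a^1 x^{-p}\,dx$ (the helper name is ours) and lets $a$ shrink toward 0: the values settle down for $p < 1$ and grow without bound for $p \ge 1$.

```python
import math


def integral_a_to_1(p, a):
    """Exact value of the integral of x**(-p) over [a, 1], for 0 < a < 1 and p > 0."""
    if p == 1:
        return -math.log(a)
    return (1 - a ** (1 - p)) / (1 - p)


for p in (0.5, 1, 2):
    values = [integral_a_to_1(p, a) for a in (1e-2, 1e-4, 1e-8)]
    print(p, [round(v, 4) for v in values])   # for p = 0.5 the values approach 1/(1 - p) = 2
```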
Comparison of integrals. In the above example, we could conclude without any calculations that the integral $\int_0^1 \frac{1}{x^p}\,dx$ diverges (or is infinite) when $p > 1$ because, in this case, $\frac{1}{x^p}$ is bigger than $\frac{1}{x}$ on this interval, so the area will also be bigger. If we want to know whether an integral converges or diverges, we can often compare it to simpler integrals, or to integrals that we have already computed.
• If area A1 is finite and A2 is smaller then it is also finite.
• If area A1 is finite and A2 is bigger then we cannot tell if it is finite or infinite.
• If area A1 is infinite and A2 is smaller then we cannot tell if it is finite or infinite.
• If area A1 is infinite and A2 is bigger then it is also infinite.
Below we will take a look at several concrete examples of comparison with the p-integrals $\int_1^\infty \frac{1}{x^p}\,dx$ or $\int_0^1 \frac{1}{x^p}\,dx$ that we computed above, but first let us take a look at some comparisons by looking at the graphs of functions.
(b) Near the vertical asymptote $x = 0$, the function $h(x)$ is below $k(x)$, so its area is smaller. However, because $\int_1^\infty k(x)\,dx = \infty$ and the area below $k(x)$ is infinite, this
computed above.
Example 4. Does the improper integral $\int_0^5 \frac{3 - 2\cos(x)}{x^2}\,dx$ converge or diverge?
Example 5. Does the improper integral $\int_{10}^\infty \frac{2}{\sqrt{x}(\sqrt{x} - 1)}\,dx$ converge or diverge?
if all the limits exist and are finite. Here 1.5 can be replaced by any point in between 1 and 2, and 3 can be replaced by any point bigger than 2. If even one of these four limits is not finite then the integral diverges.
We know that $\ln(a) \to \infty$ as $a \to \infty$, so the area is again infinite and the integral diverges when $p = 1$.
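When $p$ is not equal to 1, we can again use the power rule:
$$\int_1^a \frac{1}{x^p}\,dx = \int_1^a x^{-p}\,dx = \left[\frac{x^{-p+1}}{-p+1}\right]_{x=1}^{x=a} = \frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1}.$$
When $p > 1$ then $p - 1$ is positive and
$$\frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1} = \frac{a^{-(p-1)}}{-(p-1)} - \frac{1}{-(p-1)} = -\frac{1}{(p-1)a^{p-1}} + \frac{1}{p-1} \to \frac{1}{p-1}$$
$$\frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1} = \frac{a^{1-p}}{1-p} - \frac{1}{1-p} \to \infty$$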
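The same kind of numerical experiment works on the infinite interval. This Python sketch evaluates the exact formulas above for $\int_1^a x^{-p}\,dx$ (the helper name is ours) as $a$ grows: the values settle down to $\frac{1}{p-1}$ for $p > 1$ and grow without bound for $p \le 1$.

```python
import math


def integral_1_to_a(p, a):
    """Exact value of the integral of x**(-p) over [1, a], for a > 1 and p > 0."""
    if p == 1:
        return math.log(a)
    return (a ** (1 - p) - 1) / (1 - p)


for p in (0.5, 1, 2):
    values = [integral_1_to_a(p, a) for a in (1e2, 1e4, 1e8)]
    print(p, [round(v, 4) for v in values])   # for p = 2 the values approach 1/(p - 1) = 1
```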
Answer to Exercise 3. (a) False, (b) True, (c) False, (d) True. Since the functions
are below the x-axis, all integrals will have a negative sign, but we still compare
the areas between these functions and the x-axis.
this gives us the comparison we want, so $\int_{10}^\infty \frac{2}{\sqrt{x}(\sqrt{x} - 1)}\,dx$ also diverges.
We denote the volume of one slice by ∆V to emphasize that this volume represents
a small increment of volume when we add another slice of small width ∆x. We can
remember this formula more informally as ∆V ≈ A(x)∆x.
3.10 Slicing problems: geometry 205
Step 5. The approximation will get better and better when our slices get thinner and thinner or, in other words, when the number of slices $n$ gets bigger and bigger. Using the language of limits,
$$V = \lim_{n \to \infty} \sum_{i=1}^{n} A(x_i^*)\,\Delta x = \int_a^b A(x)\,dx.$$
Where did the last integral come from? It appears because we recognized that the
sum in the middle is the Riemann sum corresponding to the function A(x) on
the interval [a, b]. This is how integrals appear in applications where the total
Quantity (in this case Volume) can be approximated by a sum of small pieces that
looks like a Riemann sum of some integral.
Comment. In the problems where we will use the above formula $V = \int_a^b A(x)\,dx$,
it will be easy to compute A(x) because the cross-sections will be simple, such as
circles, or rectangles, or triangles. However, it is very important to write down all 5
steps each time, because in applications in the next section the formula will not be
applicable directly. Instead, Volume will be replaced by some other Quantity and
the crucial Step 3 above will be replaced by a different calculation specific to each
problem. Of course, the formula ∆V ≈ A(x)∆x will typically be used as a building
block in the calculations involving volumes.
Step 4. The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^{n} 9\pi e^{-x_i^*}\,\Delta x$.
Step 5. The approximation will get better as the number of slices gets bigger, so
$$V = \lim_{n \to \infty} \sum_{i=1}^{n} 9\pi e^{-x_i^*}\,\Delta x = \int_0^5 9\pi e^{-x}\,dx.$$
The last integral appears because the sum in the middle is the Riemann sum corre-
sponding to the function 9πe−x on the interval [0, 5]. The interval [0, 5] is implicit
in the notation of the sum, but we should remember that in the first step we were
slicing this interval [0, 5], so the Riemann sum is defined on this interval.
In this particular case we can compute the integral using the FTC,
$$\int_0^5 9\pi e^{-x}\,dx = \Big[-9\pi e^{-x}\Big]_{x=0}^{x=5} = -9\pi e^{-5} - (-9\pi e^{-0}) = -9\pi e^{-5} + 9\pi,$$
so $V = -9\pi e^{-5} + 9\pi \approx 28.0838\ldots$.
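Step 5 can be watched in action by computing the sum $\sum A(x_i^*)\,\Delta x$ for more and more slices. In this minimal Python sketch we take each $x_i^*$ to be the midpoint of its slice (one allowed choice among many); the sums approach the exact value $9\pi(1 - e^{-5})$ found with the FTC.

```python
import math


def cross_section_area(x):
    # A(x) = 9*pi*e^(-x), the slice area from this example
    return 9 * math.pi * math.exp(-x)


def slice_volume_sum(A, a, b, n):
    """Sum of A(x_i*) * dx over n slices of [a, b], with x_i* the midpoint of slice i."""
    dx = (b - a) / n
    return sum(A(a + (i + 0.5) * dx) for i in range(n)) * dx


exact = 9 * math.pi * (1 - math.exp(-5))   # about 28.0838
for n in (5, 50, 500):
    print(n, slice_volume_sum(cross_section_area, 0, 5, n))
```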
Step 4. The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^{n} \pi r(h_i^*)^2\,\Delta h$.
Step 5. The approximation will get better as the number of slices gets bigger, so
$$V = \lim_{n \to \infty} \sum_{i=1}^{n} \pi r(h_i^*)^2\,\Delta h = \int_0^H \pi r(h)^2\,dh.$$
The last integral appears because the sum in the middle is the Riemann sum corre-
sponding to the function πr(h)2 on the interval [0, H].
Exercise 2. A vase of height H has inner radius r(h) and
outer radius R(h) at height h. Set up an integral for the
volume of the sidewall of the vase.
We can remember this formula more informally as $\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x$.
Step 4. The total length is the sum of lengths of $n$ small arcs, so
$$L = \sum_{i=1}^{n} \Delta L_i \approx \sum_{i=1}^{n} \sqrt{1 + f'(x_i^*)^2}\,\Delta x.$$
Step 5. The approximation will get better and better when our arcs get smaller and smaller or, in other words, when the number of slices $n$ gets bigger and bigger. Using the language of limits,
$$L = \lim_{n \to \infty} \sum_{i=1}^{n} \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \int_a^b \sqrt{1 + f'(x)^2}\,dx.$$
Example 3. The main span of the Brooklyn Bridge is approximately 480 meters long. If we place the origin in the center of the bridge, the shape of the main cable can be approximated by the graph of $y = 0.0008x^2$, as shown in the figure. Derive the formula for the length of this cable using the slicing method and evaluate it in Wolfram Alpha.
Solution: Step 1. We slice the cable vertically into n small arcs along the x-axis on
the interval [−240, 240]. Since we placed the origin in the middle of the bridge, the
main span of the bridge is between the coordinates −240 and 240.
Step 2. Arc number i is between points xi and xi+1 , and the width of one slice is
∆x. If ∆x is small then the slope does not change much between xi and xi+1 .
Step 3. Because $f'(x) = 0.0016x$, from the right triangle calculation, the length $\Delta L_i$ of the arc is approximately equal to
$$\Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x.$$
Step 5. The approximation will get better when the number of slices gets bigger, so
$$L = \lim_{n \to \infty} \sum_{i=1}^{n} \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x = \int_{-240}^{240} \sqrt{1 + (0.0016x)^2}\,dx.$$
Evaluating this integral in Wolfram Alpha gives $L = 491.5$ meters. One can actually find the antiderivative and apply the FTC using a special substitution, but this is more advanced material, beyond what we have studied so far.
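If Wolfram Alpha is not at hand, the same number can be reproduced with a Riemann sum for the integral from Step 5. A minimal Python sketch (the function name and the choice of midpoints as sample points are ours):

```python
import math


def arc_length(df, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of sqrt(1 + f'(x)^2) over [a, b]."""
    dx = (b - a) / n
    return sum(math.sqrt(1 + df(a + (i + 0.5) * dx) ** 2) for i in range(n)) * dx


# f(x) = 0.0008 x^2, so f'(x) = 0.0016 x, on the main span [-240, 240]
cable_length = arc_length(lambda x: 0.0016 * x, -240, 240)
print(round(cable_length, 1))   # 491.5
```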
Exercise 3. Compute the length of the graph of $y = x^{3/2}$ on the interval $[0, 1]$ by first setting up the integral using the slicing method and then evaluating it.
Answer to Exercise 2. We could subtract the inner volume from the outer volume to get the volume of the vase itself:
$$V = \int_0^H \pi R(h)^2\,dh - \int_0^H \pi r(h)^2\,dh = \int_0^H \pi\big(R(h)^2 - r(h)^2\big)\,dh.$$
However, to practice the slicing method it is better to follow the usual slicing steps as in the previous example. The only difference here is that the slice is a ring (a disk with a hole) between two circles of radius $R(h)$ and $r(h)$, so the area of one slice is $A(h) = \pi R(h)^2 - \pi r(h)^2 = \pi\big(R(h)^2 - r(h)^2\big)$.
Step 5. The approximation will get better when the number of slices gets bigger, so
$$L = \lim_{n \to \infty} \sum_{i=1}^{n} \sqrt{1 + (9/4)x_i^*}\,\Delta x = \int_0^1 \sqrt{1 + (9/4)x}\,dx.$$
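For this integral the FTC does apply: an antiderivative of $\sqrt{1 + \frac{9}{4}x}$ is $\frac{8}{27}\left(1 + \frac{9}{4}x\right)^{3/2}$, so the length is $\frac{13\sqrt{13} - 8}{27} \approx 1.4397$. This short Python sketch (our own check, not part of the original solution) compares that value with a midpoint Riemann sum.

```python
import math

n = 1000
dx = 1 / n
# Midpoint Riemann sum for the integral of sqrt(1 + 9x/4) over [0, 1]
riemann = sum(math.sqrt(1 + 9 * ((i + 0.5) * dx) / 4) for i in range(n)) * dx
# FTC with the antiderivative (8/27)(1 + 9x/4)^(3/2)
exact = (13 * math.sqrt(13) - 8) / 27

print(f"{riemann:.6f} {exact:.6f}")
```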
Linear densities. If $Q$ denotes some Quantity of interest then the general idea will be to use formulas of the form:
$$\text{Quantity} = \frac{\text{Quantity}}{\text{Length}} \times \text{Length} \quad\text{or}\quad \Delta Q = \frac{\Delta Q}{\Delta L} \times \Delta L.$$
The density $\frac{\Delta Q}{\Delta L}$ will be given to us either explicitly or implicitly, which might require some calculation of the quantity per unit of length locally on a given slice.
Step 2. If the interval number i between points xi and xi+1 is small enough then
the speed is almost constant on this interval and is approximately equal to v(xi∗ ),
for any point xi∗ between xi and xi+1 .
Step 3. Since the speed is almost constant, we can apply the above formula to write the increment of time between points $x_i$ and $x_{i+1}$ as
$$\Delta T_i \approx \frac{\Delta x}{v(x_i^*)}.$$
because the sum in the middle is the Riemann sum corresponding to the function $\frac{1}{v(x)}$ on the interval $[0, 1.4]$.
Since the car can stop at a traffic light, the speed $v(x)$ can approach 0 and $\frac{1}{v(x)}$ can have a vertical asymptote at that point. As a result, the answer could be an improper integral. It is a subtle point, but in the above calculation we should avoid points where $v(x_i^*)$ is equal to zero. Obviously, this improper integral would be convergent because the time cannot be infinite. In a problem like this, for simplicity, a Calculus class would usually assume that $v(x)$ is always positive.
Example 2. In the setting of the previous problem, the density of plants is changing with altitude $y$, so the amount of food the goat consumes along the way is $A(y)$ kg/km. What is the total amount of food the goat eats between $x = a$ and $x = b$?
Solution: Although the word “density” was not mentioned explicitly in the problem, notice how the units kg/km tell us that $A(y)$ is actually the density of the amount of food per unit of distance. If this density were constant, we could simply multiply it by the length of the path to get the total amount of food. However, since $A(y)$ changes with altitude, we need to use the slicing method.
3.11 Slicing problems: densities 213
Step 1. We slice the path into n small subintervals of width ∆x along the x-axis
on the interval [a, b] km.
Step 2. If the interval number i between points xi and xi+1 is small then the
altitude does not change much and so the density of food is almost constant on this
interval and is approximately equal to
$$A(y) = A(f(x_i^*))$$
for any point xi∗ between xi and xi+1 . Although A(y) depends on the altitude y, this
altitude should be evaluated at a point xi∗ and the quantity should be expressed in
terms of x when we slice along the x-axis.
Step 3. We need to multiply the density of food in kg/km by the distance in km to get the amount of food in kg. As a result, the amount of food between points $x_i$ and $x_{i+1}$ is
$$\Delta F_i \approx A(f(x_i^*))\,\Delta L_i \approx A(f(x_i^*))\sqrt{1 + f'(x_i^*)^2}\,\Delta x.$$
Step 5. The approximation will get better as the number of slices gets bigger, so
$$F = \lim_{n \to \infty} \sum_{i=1}^{n} \Delta F_i = \lim_{n \to \infty} \sum_{i=1}^{n} A(f(x_i^*))\sqrt{1 + f'(x_i^*)^2}\,\Delta x = \int_a^b A(f(x))\sqrt{1 + f'(x)^2}\,dx$$
because the sum in the middle is the Riemann sum corresponding to the function $A(f(x))\sqrt{1 + f'(x)^2}$ on the interval $[a, b]$.
where slope means the slope of the graph of y = f (x). Find the total volume of air
the hiker breathes between x = a and x = b. Pay attention to units!
Volume densities. Next, we will consider quantities distributed over some volume, so we will be using:
$$\text{Quantity} = \frac{\text{Quantity}}{\text{Volume}} \times \text{Volume} \quad\text{or}\quad \Delta Q = \frac{\Delta Q}{\Delta V} \times \Delta V.$$
W = Force × Distance = mg × h
The units will be J (joule), because all the units were consistent.
Step 4. The total work is the sum of ∆Wi , so
$$W = \sum_{i=1}^{n} \Delta W_i \approx \sum_{i=1}^{n} 0.75g\pi\rho\, r(h_i^*)^2 h_i^*\,\Delta h.$$
Step 5. The approximation will get better as the number of slices gets bigger, so
$$W = \lim_{n \to \infty} \sum_{i=1}^{n} 0.75g\pi\rho\, r(h_i^*)^2 h_i^*\,\Delta h = \int_0^H 0.75g\pi\rho\, r(h)^2 h\,dh \quad \text{(in J)}$$
because the sum in the middle is the Riemann sum corresponding to the function $0.75g\pi\rho\, r(h)^2 h$ on the interval $[0, H]$.
H = cρTV
where c is a constant called the specific heat in J/(kg · K) that depends on the
material, ρ is the mass density in kg/m3 , V is the volume in m3 , and T is the
temperature in K.
Consider a bar of length L meters made of aluminium with constant density ρA and
specific heat cA , and varying radius r(x) (in meters) and temperature T (x) (in K).
What is the total heat stored in the bar?
Step 3. Using the formula mass = density × volume when the density is constant:
$$\Delta m_i \approx C(d_i^*) \cdot 2\sqrt{R^2 - (d_i^*)^2} \cdot L\,\Delta d.$$
The units will be grams, because all the dimensions are given in cm and the density is given in g/cm³.
Step 4. The total mass is the sum of $\Delta m_i$, so
$$m = \sum_{i=1}^{n} \Delta m_i \approx \sum_{i=1}^{n} (2L)C(d_i^*)\sqrt{R^2 - (d_i^*)^2}\,\Delta d.$$
Step 5. The approximation will get better as the number of slices gets bigger, so
$$m = \lim_{n \to \infty} \sum_{i=1}^{n} (2L)C(d_i^*)\sqrt{R^2 - (d_i^*)^2}\,\Delta d = \int_0^R (2L)C(x)\sqrt{R^2 - x^2}\,dx \text{ grams}$$
because the sum in the middle is the Riemann sum corresponding to the function $(2L)C(x)\sqrt{R^2 - x^2}$ on the interval $[0, R]$. The reason we replaced the depth variable $d$ by the variable $x$ in the integral is that $dd$ would look confusing in place of $dx$.
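One way to sanity-check a slicing formula is to try a case where the answer is known. If the density $C$ is constant, the formula should give $C$ times $\int_0^R 2L\sqrt{R^2 - x^2}\,dx = \frac{\pi R^2 L}{2}$. The Python sketch below uses hypothetical values (constant density 2 g/cm³, $R = 5$ cm, $L = 20$ cm) that are not taken from the problem.

```python
import math


def mass_from_slices(C, R, L, n=100_000):
    """Midpoint Riemann sum for the integral of 2L C(x) sqrt(R^2 - x^2) over [0, R]."""
    dx = R / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        total += 2 * L * C(x) * math.sqrt(R * R - x * x)
    return total * dx


# Hypothetical inputs, for illustration only
R, L, density = 5.0, 20.0, 2.0
m = mass_from_slices(lambda x: density, R, L)
print(m, density * math.pi * R**2 * L / 2)   # the two numbers should nearly agree
```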
Area densities. Finally, we will consider quantities distributed over some area, so we will be using:
$$\text{Quantity} = \frac{\text{Quantity}}{\text{Area}} \times \text{Area} \quad\text{or}\quad \Delta Q = \frac{\Delta Q}{\Delta A} \times \Delta A.$$
8 https://bit.ly/3QFCMNA
Step 4. The total population is the sum of $\Delta P_i$, so $P \approx \sum_{i=1}^{n} d(r_i^*)\,2\pi r_i^*\,\Delta r$.
Step 5. The approximation will get better as the number of slices gets bigger, so
$$P = \lim_{n \to \infty} \sum_{i=1}^{n} d(r_i^*)\,2\pi r_i^*\,\Delta r = \int_0^R 2\pi d(r)\,r\,dr \text{ people}$$
because the sum in the middle is the Riemann sum corresponding to the function $2\pi d(r)r$ on the interval $[0, R]$.
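As with the other slicing formulas, a constant density is a useful sanity check: with $d(r) = d_0$ the integral $\int_0^R 2\pi d_0\, r\,dr$ should equal $\pi R^2 d_0$, the density times the area of the disk. The Python sketch below uses hypothetical numbers (not from the text) and also evaluates a hypothetical density $d(r) = 1000e^{-r}$ that decreases away from the centre.

```python
import math


def population(d, R, n=10_000):
    """Midpoint Riemann sum for the integral of 2*pi*d(r)*r over [0, R]."""
    dr = R / n
    return sum(2 * math.pi * d((i + 0.5) * dr) * (i + 0.5) * dr for i in range(n)) * dr


# Hypothetical: 1000 people per km^2 everywhere in a disk of radius 3 km
print(population(lambda r: 1000.0, 3.0), math.pi * 3.0**2 * 1000.0)

# Hypothetical density that decreases away from the centre
print(population(lambda r: 1000.0 * math.exp(-r), 3.0))
```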
Exercise 5. Trees transform carbon dioxide (CO2 )
and water into glucose and oxygen using sunlight.
This process is called photosynthesis. According to a
paper9 , during 1 hour the leaves of the plant Plantago
Asiatica produce glucose and oxygen at a rate
$$r(T) = 0.36\big(11 + 0.95(T - 10) - 0.025(T - 10)^2\big)$$
Answer to Exercise 1. Step 1. We slice the path into n small subintervals of width
∆x along the x-axis on the interval [a, b] km.
Step 2. If the interval number i between points xi and xi+1 is small enough then
the altitude does not change much and so the speed is almost constant on this
interval and is approximately equal to
v(y) = v( f (xi∗ ))
for any point xi∗ between xi and xi+1 . Although speed v(y) depends on the altitude y,
this altitude should be evaluated at a point xi∗ and the quantity should be expressed
in terms of x when we slice along the x-axis.
Step 3. Since the speed is almost constant, we can write the increment of time between points $x_i$ and $x_{i+1}$ as
$$\Delta T_i = \frac{\text{distance}}{\text{speed}} \approx \frac{\Delta L_i}{v(f(x_i^*))} \approx \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(f(x_i^*))}.$$
Notice that, compared to the previous example, here the distance $\Delta L$ is not the horizontal increment $\Delta x$, but the increment along the mountain path $y = f(x)$, which we found in the last section: $\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x$. This calculation is the main step and the biggest difference from the previous example.
Step 4. The total time is the sum of $\Delta T_i$, so $T \approx \sum_{i=1}^{n} \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(f(x_i^*))}$.
Step 5. The approximation will get better as the number of slices gets bigger, so
$$T = \lim_{n \to \infty} \sum_{i=1}^{n} \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(f(x_i^*))} = \int_a^b \frac{\sqrt{1 + f'(x)^2}}{v(f(x))}\,dx$$
because the sum in the middle is the Riemann sum corresponding to the function $\frac{\sqrt{1 + f'(x)^2}}{v(f(x))}$ on the interval $[a, b]$.
Answer to Exercise 2. Step 1. We slice the path into n small subintervals of width
∆x along the x-axis on the interval [a, b] km.
Step 2. If the interval number i between points xi and xi+1 is small enough then
the speed does not change much and is approximately equal to v(xi∗ ) km/h for
any point xi∗ between xi and xi+1 . The altitude also does not change much, so the
breathing rate is approximately equal to 15 + 100|slope| = 15 + 100| f ′ (xi∗ )| L/min,
because the slope of y = f (x) is f ′ (x).
Step 3. How many litres of air does the hiker breathe between points $x_i$ and $x_{i+1}$? Here the units can help us. To get litres we need to multiply L/min by min, so we need to multiply the rate $15 + 100|f'(x_i^*)|$ L/min by the time $\Delta T$ in minutes. We can compute the time as in the previous problems:
$$\Delta T_i = \frac{\text{distance}}{\text{speed}} \approx \frac{\Delta L_i}{v(x_i^*)} \approx \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)}.$$
The only subtle issue is that time is in hours, because the distance was given in km and speed was given in km/h, so when we multiply the breathing rate by time,
$$\big(15 + 100|f'(x_i^*)|\big) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)},$$
the units are L/min × hour = L/min × 60 min = 60 L. So the amount of air is:
$$\Delta A_i \approx \big(15 + 100|f'(x_i^*)|\big) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)} \times 60 \text{ litres}.$$
Step 4. The total amount of air is the sum of $\Delta A_i$, so
$$A \approx \sum_{i=1}^{n} 60\big(15 + 100|f'(x_i^*)|\big) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)}.$$
Step 5. The approximation will get better as the number of slices gets bigger, so
$$A = \lim_{n \to \infty} \sum_{i=1}^{n} 60\big(15 + 100|f'(x_i^*)|\big) \times \frac{\sqrt{1 + f'(x_i^*)^2}\,\Delta x}{v(x_i^*)} = \int_a^b 60\big(15 + 100|f'(x)|\big) \times \frac{\sqrt{1 + f'(x)^2}}{v(x)}\,dx \text{ litres}$$
because the sum in the middle is the Riemann sum corresponding to the function $60\big(15 + 100|f'(x)|\big)\frac{\sqrt{1 + f'(x)^2}}{v(x)}$ on the interval $[a, b]$.
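The factor 60 is easy to get wrong, so a quick sanity check helps. The Python sketch below evaluates the final integral with a midpoint Riemann sum for a hypothetical flat trail ($f(x) = 0$) walked at a constant 4 km/h for 2 km; these inputs are ours, not from the exercise. The hike then takes half an hour at 15 L/min, so the answer should be 450 litres.

```python
import math


def air_litres(df, v, a, b, n=10_000):
    """Midpoint Riemann sum for the integral of 60 (15 + 100|f'|) sqrt(1 + f'^2) / v over [a, b].

    x is in km and v in km/h, so the time on each slice is in hours; the factor 60
    converts (L/min) x (hours) into litres.
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        slope = df(x)
        rate = 15 + 100 * abs(slope)                   # breathing rate in L/min
        hours = math.sqrt(1 + slope**2) * dx / v(x)    # time spent on this slice
        total += 60 * rate * hours
    return total


# Hypothetical flat 2 km trail walked at a constant 4 km/h
print(air_litres(lambda x: 0.0, lambda x: 4.0, 0.0, 2.0))   # close to 450
```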
Answer to Exercise 3. Step 1. We slice the bar vertically into n narrow disks of
width ∆x along the x-axis on the interval [0, L].
Step 2. If the interval number i between points xi and xi+1 is small enough then
the radius of the cross-section and temperature do not change much and are ap-
proximately equal to r(xi∗ ) and T (xi∗ ) for any point xi∗ between xi and xi+1 .
Step 3. As we saw in the last section, for a body of revolution the cross-section is a circle, so its area is approximately $\pi r(x_i^*)^2$ and, as a result, the volume is $\Delta V_i \approx \pi r(x_i^*)^2\,\Delta x$. Using the formula stated in the problem, the heat stored in one slice is
$$\Delta H_i \approx c_A \rho_A T(x_i^*)\,\pi r(x_i^*)^2\,\Delta x.$$
The units will be J (joule), because all the units were consistent.
Step 4. The total heat is the sum of $\Delta H_i$, so $H \approx \sum_{i=1}^{n} \pi c_A \rho_A T(x_i^*) r(x_i^*)^2\,\Delta x$.
Step 5. The approximation will get better as the number of slices gets bigger, so
$$H = \lim_{n \to \infty} \sum_{i=1}^{n} \pi c_A \rho_A T(x_i^*) r(x_i^*)^2\,\Delta x = \int_0^L \pi c_A \rho_A T(x) r(x)^2\,dx \text{ J}$$
because the sum in the middle is the Riemann sum corresponding to the function $\pi c_A \rho_A T(x) r(x)^2$ on the interval $[0, L]$.
Answer to Exercise 4. Step 1. We slice the Atlantic Ocean into $n$ slices of thickness $\Delta d$ between depth 0 and 5 km.
Step 2. When $\Delta d$ is small, the oxygen concentration is almost constant on any given slice and is approximately equal to $C(d_i^*)$ for any point $d_i^*$ on interval number $i$ between $d_i$ and $d_{i+1}$. The area of the slice is approximately $A(d_i^*)$, so its volume is $\Delta V_i \approx A(d_i^*)\,\Delta d$.
Step 3. Using the formula Mass = Concentration × Volume when concentration is constant, we get that $\Delta m_i \approx C(d_i^*)A(d_i^*)\,\Delta d$. The area was given in millions of km², depth in km, and concentration in gram/litre. There are $10^{12}$ litres in a cubic kilometer, so when we multiply the units we will get $10^{6} \cdot 10^{12} = 10^{18}$ grams. Since 1 tonne is $10^{6}$ grams, this is $10^{12}$ tonnes, so the answer will be in units of $10^{12}$ tonnes (millions of millions of tonnes).
Step 4. The total mass is the sum of $\Delta m_i$, so
$$m = \sum_{i=1}^{n} \Delta m_i \approx \sum_{i=1}^{n} C(d_i^*)A(d_i^*)\,\Delta d.$$
Step 5. The approximation will get better as the number of slices gets bigger, so
$$m = \lim_{n \to \infty} \sum_{i=1}^{n} C(d_i^*)A(d_i^*)\,\Delta d = \int_0^5 C(x)A(x)\,dx \quad \text{(in units of } 10^{12} \text{ tonnes)}$$
because the sum in the middle is the Riemann sum corresponding to the function $C(x)A(x)$ on the interval $[0, 5]$ km.
Answer to Exercise 5. The rate of photosynthesis $r(T)$ will depend on the coordinate $x$ on the leaf, because the temperature $T = T(x)$ varies with $x$. Let us record that
$$r(T(x)) = 0.36\big(11 + 1.9\sin(x^2/2) - 0.1\sin^2(x^2/2)\big).$$
Step 1. We slice the leaf into n slices of width ∆x between x = 0 and x = 10 cm.
Step 2. When ∆x is small, the rate of photosynthesis is almost constant on one
slice and is approximately equal to r(T (xi∗ )) for any point xi∗ on the interval number
i between xi and xi+1 . The height of the slice is approximately f (xi∗ ) − g(xi∗ ), so its
area is ∆Ai ≈ ( f (xi∗ ) − g(xi∗ ))∆x.
Step 3. Notice that the rate of photosynthesis was given per unit of area, so we can use Amount = Rate × Area when the rate is constant. As a result, the amount of glucose and oxygen produced during 1 hour in one slice is $\Delta A_i \approx r(T(x_i^*))\big(f(x_i^*) - g(x_i^*)\big)\Delta x$. The area in the figure is in cm², and the rate is in µmol/cm², so the amount here is in µmol.
Step 4. The total amount is the sum of $\Delta A_i$, so
$$A = \sum_{i=1}^{n} \Delta A_i \approx \sum_{i=1}^{n} r(T(x_i^*))\big(f(x_i^*) - g(x_i^*)\big)\Delta x.$$
Step 5. The approximation will get better as the number of slices gets bigger, so
$$A = \lim_{n \to \infty} \sum_{i=1}^{n} r(T(x_i^*))\big(f(x_i^*) - g(x_i^*)\big)\Delta x = \int_0^{10} r(T(x))\big(f(x) - g(x)\big)\,dx \ \mu\text{mol}$$
because the sum in the middle is the Riemann sum corresponding to the function $r(T(x))\big(f(x) - g(x)\big)$ on the interval $[0, 10]$ cm. At this stage we can replace $r(T(x))$ by the specific formula above:
$$A = \int_0^{10} 0.36\big(11 + 1.9\sin(x^2/2) - 0.1\sin^2(x^2/2)\big)\big(f(x) - g(x)\big)\,dx \ \mu\text{mol}.$$
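The formula for $r(T(x))$ above is the original $r(T)$ with $T - 10$ replaced by $2\sin(x^2/2)$, since $0.95 \cdot 2 = 1.9$ and $0.025 \cdot 2^2 = 0.1$. The temperature profile itself is not shown in this excerpt, so the Python sketch below assumes $T(x) = 10 + 2\sin(x^2/2)$, inferred from that expansion, and simply checks numerically that the two forms agree on $[0, 10]$.

```python
import math


def r_of_T(T):
    """Photosynthesis rate r(T) from the exercise."""
    return 0.36 * (11 + 0.95 * (T - 10) - 0.025 * (T - 10) ** 2)


def T_assumed(x):
    # ASSUMPTION: inferred from the expanded formula, not stated in this excerpt
    return 10 + 2 * math.sin(x * x / 2)


def r_expanded(x):
    s = math.sin(x * x / 2)
    return 0.36 * (11 + 1.9 * s - 0.1 * s * s)


xs = [i / 100 for i in range(1001)]   # grid on [0, 10] cm
max_gap = max(abs(r_of_T(T_assumed(x)) - r_expanded(x)) for x in xs)
print(max_gap)   # only floating-point rounding
```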
Chapter 4
Differential equations
$$y' = f(x, y).$$
$$y'(x) = f\big(x, y(x)\big)$$
but we will often write it simply as y′ = f (x, y), keeping in mind that y here actually
means a function y(x). This function is usually unknown to us and we want to solve
this equation to find y(x). A function y(x) is a solution of the equation y′ = f (x, y)
if it satisfies this equation, which means that if we plug it into the two sides of the
equation we get equality. As we will see in the examples, an equation like this will
have many solutions depending on the starting point y(x0 ) = y0 , which is called the
initial condition.
• In this section we will focus on understanding some qualitative behaviour of solutions of some equations.
• In the next section we will see how we can approximate solutions using Taylor polynomials and the so-called Euler’s method.
• In the section after that we will learn how to find solutions for the so-called separable equations.
• After that, we will do some modelling using differential equations and consider examples of systems of equations.
Let us introduce some additional definitions in the context of an applied example.
224 4 Differential equations
$$y' = \kappa(20 - y)$$
Solution: (a) One of the things that distinguishes these solutions is the initial temperature $y(0)$ at time $t = 0$ or, in other words, the initial condition. We see that the solutions corresponding to temperatures 30, 50 and 70°C are decreasing over time to 20°C, while solutions corresponding to temperatures 10 and −10°C are increasing to 20°C.
(b) To verify that y(t) = 20 is a solution of y′ = κ(20 − y), we check that y′ (t) =
(20)′ = 0 and κ(20 − y(t)) = κ(20 − 20) = 0, so the two sides are indeed equal.
(c) It is sensible to call y(t) = 20 an equilibrium solution, because it reflects that
coffee at room temperature will stay at room temperature, so its temperature is in
a state of rest or balance.
(d) The equilibrium solution $y(t) = 20$ is a stable equilibrium because solutions that start close to 20°C do not move away and always stay close, in fact, getting closer and closer to 20°C over time. We will see examples below where a solution starting close to an equilibrium moves away from it over time. Such an equilibrium will be called an unstable equilibrium. If we are only given the equation $y' = \kappa(20 - y)$ and do not see the figure, how can we check that $y(t) = 20$ is a stable equilibrium? We can use the following diagram:
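[Sign diagram for y′ = 0.2(20 − y): an arrow representing the y-axis with the equilibrium 20 marked; the sign of y′ is + to the left of 20 and − to the right of 20.]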
Since the derivative $y' = 0.2(20 - y)$ depends only on $y$, we draw an arrow representing the $y$-axis and mark 20 on this axis, which is our equilibrium point. Then we check the sign of the derivative to the left and right of 20. For example, if we plug in $y = 10$, we get $0.2(20 - 10) = 2$, which is positive and, if we plug
4.1 Differential equations: qualitative analysis 225
Comment. In the above example, the differential equation was of the type
$$y' = f(y).$$
In other words, the right-hand side depends only on $y$ and does not depend on
the independent variable x or t explicitly. Such equations are called autonomous
equations. Otherwise, the equation is non-autonomous. For example, y′ = y + t
is non-autonomous, because of the term +t. We will discuss autonomous vs non-
autonomous equations in the examples below. To find equilibrium solutions of an
autonomous equation, we simply find all the points where f (y) = 0. For example,
in the above example 0.2(20−y) = 0 when y = 20, so this was the only equilibrium
solution.
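The sign test described above is mechanical enough to automate. The Python sketch below is our own illustration (the function name `classify_equilibria`, the scanning grid and the small offset `eps` are assumptions, not part of the course): it locates the zeros of f(y) where f changes sign and then reads off the signs of f just below and just above each zero, exactly as in the arrow diagram.

```python
def classify_equilibria(f, y_min, y_max, n=1000, eps=1e-6):
    """Find the equilibria of the autonomous equation y' = f(y) in [y_min, y_max]
    and classify each one by the sign of f just below and just above it.

    Only zeros at which f changes sign (or which land exactly on a grid point)
    are detected; a zero where f touches 0 without changing sign may be missed."""
    grid = [y_min + (y_max - y_min) * k / n for k in range(n + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if f(a) == 0:
            roots.append(a)                  # equilibrium exactly on the grid
        elif f(a) * f(b) < 0:
            for _ in range(60):              # bisection on the sign change
                m = (a + b) / 2
                if f(a) * f(m) <= 0:
                    b = m
                else:
                    a = m
            roots.append((a + b) / 2)
    if f(grid[-1]) == 0:
        roots.append(grid[-1])
    result = []
    for r in roots:
        below, above = f(r - eps), f(r + eps)
        if below > 0 > above:
            kind = "stable"        # + then -: arrows point towards r
        elif below < 0 < above:
            kind = "unstable"      # - then +: arrows point away from r
        else:
            kind = "semi-stable"   # the sign of f does not change at r
        result.append((round(r, 6), kind))
    return result

print(classify_equilibria(lambda y: 0.02 * (20 - y), 0, 40))   # coffee example
print(classify_equilibria(lambda y: y * (y - 1), -2, 3))        # y' = y(y - 1)
```

For the coffee equation it reports a single stable equilibrium at 20, and for y′ = y(y − 1) it reports 0 as stable and 1 as unstable, in agreement with the sign diagrams in this section.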
Here Min x, Min y, Max x, Max y describe the sides of the rectangle where we want to draw the slope field, the number n is the number of points in the grid in both the horizontal and vertical directions, and the length multiplier a controls the length of each segment, which can be adjusted for visual impact. The slope field in the above example was constructed using SlopeField(y(1 − y), 60, 0.75, 0, −5, 10, 5). We can also draw a solution starting at a point (x₀, y₀) using the commands
Here End x is the point up to which we want the solution to be drawn, and Step is the size of the steps in which the solution is drawn. The smaller the step, the smoother the solution looks, although any sufficiently small step, such as 0.1, will already look perfectly smooth.
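To see what a slope-field command computes, here is a minimal Python sketch. It is our own illustration, not Geogebra’s implementation: the function name `slope_field_segments` is hypothetical, its arguments only loosely mirror n, a, Min x, Min y, Max x, Max y above, and the rule used to scale the segment length is an assumption. At each grid point (x, y) it places a short segment centred at that point with direction (1, f(x, y)), so that the segment has slope f(x, y).

```python
import math

def slope_field_segments(f, n, a, x_min, y_min, x_max, y_max):
    """Hypothetical helper: segments of the slope field of y' = f(x, y).

    Uses an n-by-n grid on [x_min, x_max] x [y_min, y_max]. Each segment is
    centred at a grid point, points in the direction (1, f(x, y)), and has
    length a times the smaller grid spacing (an assumed scaling rule)."""
    dx = (x_max - x_min) / (n - 1)
    dy = (y_max - y_min) / (n - 1)
    half = a * min(dx, dy) / 2
    segments = []
    for i in range(n):
        for j in range(n):
            x, y = x_min + i * dx, y_min + j * dy
            slope = f(x, y)
            norm = math.hypot(1.0, slope)        # length of the vector (1, slope)
            ux, uy = 1.0 / norm, slope / norm    # unit vector with that slope
            segments.append(((x - half * ux, y - half * uy),
                             (x + half * ux, y + half * uy)))
    return segments

# Slope field of y' = y(1 - y) on [0, 10] x [-5, 5] with an 11-by-11 grid.
segs = slope_field_segments(lambda x, y: y * (1 - y), 11, 0.75, 0, -5, 10, 5)
print(len(segs))
```

The segments centred on the lines y = 0 and y = 1 come out horizontal, which is how the two equilibrium solutions of y′ = y(1 − y) show up in the picture.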
Exercise 2. Consider a differential equation y′ = y(y − 1).
(a) Is this an autonomous equation?
(b) Find all equilibrium solutions. Verify that they are indeed solutions.
(c) Are they stable or unstable?
(d) Roughly sketch some examples of solutions. Draw the slope field along the line
y = 2. Draw the slope field in Geogebra.
the slope field changes along horizontal lines, in contrast to the examples above, where the slope field stayed the same along any horizontal line. This is because the equation is non-autonomous: the slope y′ now changes with t even if y stays constant.
Example 3. Consider the slope field in the left figure above corresponding to the
differential equation y′ = ty(1 − y).
(a) By looking at the slope field, find all equilibrium solutions. Check that they are
indeed solutions by plugging them into the equation.
(b) Sketch some solutions following along the slope field.
(c) If the initial condition y(0) is positive, what is the limit $\lim_{t\to\infty} y(t)$?
[Figure: four slope fields, labelled 1, 2, 3 and 4]
Answer to Exercise 2. (a) This is an autonomous equation, because the right hand
side y(y − 1) depends only on y.
(b) The right hand side y(y − 1) is equal to 0 when y = 0 or y = 1, so there are two equilibrium solutions. To check that y(t) = 1 is a solution, we plug it into y′ = y(y − 1) and get (1)′ = 1(1 − 1), which holds because both sides are zero. Similarly, y(t) = 0 is a solution because (0)′ = 0(0 − 1).
(c) To decide if these equilibria are stable or unstable, we draw a diagram:
[Sign diagram for y′ = y(y − 1) on the y-axis: + for y < 0, − for 0 < y < 1, + for y > 1]
From this diagram we see that y = 1 is an unstable equilibrium because solutions
that start nearby are moving away from it, and y = 0 is a stable equilibrium because
solutions are moving towards it.
(d) In the figure on the right, we sketched the two equilibrium solutions y = 0 and y = 1, and three more solutions: above, below, and in between these equilibria. According to the above diagram, the solutions above and below are increasing because y′ > 0 there, and the solution in between is decreasing because y′ < 0 there. We drew the slope field in Geogebra, but the slope field around the level y = 2 could be drawn by hand by computing y′ = y(y − 1) = 2(2 − 1) = 2 and drawing a bunch of line segments with slope 2 along the horizontal line y = 2. In the figure, the region near y = 2 is emphasized by the red dashed lines.
Answer to Exercise 3. (a) By looking at the slope field, we can see that the slope is horizontal at y = −2, −1, 0, 1, 2 and 3, so these look like equilibrium solutions. For example, if we plug y(t) = 1 into the equation y′ = sin(πt) sin(πy), we get (1)′ = sin(πt) sin(π), which holds because both sides are equal to zero. Without looking at the slope field, we see that y′ = sin(πt) sin(πy) = 0 when sin(πy) = 0, which holds when πy is of the form 0, ±π, ±2π, etc. In other words, y can be any integer 0, ±1, ±2, etc.
(b) We sketched the equilibrium solutions and one non-equilibrium solution flowing along the slope field in the figure. We see that solutions can now fluctuate, alternately decreasing and increasing.
(c) For the same reason, it looks like the limit $\lim_{t\to\infty} y(t)$ does not exist, unless we start exactly at one of the equilibrium solutions.
Answer to Exercise 4. From the previous example, we know that 2 and 3 are autonomous equations, so they must be (a) or (c). We can see from the figure that in 2 the slope is decreasing as y increases and in 3 the slope is increasing as y increases, so we can match 2(c) and 3(a). We could also match them by noticing that 3 has a horizontal slope line, aka an equilibrium solution, below the x-axis, while the right hand side 1/(2y + 1) of (c) can never be equal to zero.
Next, we can notice that the slope field 4 is the same along vertical lines. This indicates that it corresponds to an equation of the form y′ = f(t), because on a vertical line t is a constant, t = c. This allows us to match 4(b), because (b) y′ = (2t + 1)² is the only equation whose right hand side depends only on t. This leaves 1(d).
Another way we could match 4(b) and 1(d) is to check when the slope is zero in (b) and (d). In (b), the slope is zero when y′ = (2t + 1)² = 0, so t = −0.5. In (d), the slope is zero when y′ = y − t = 0, so y = t. We can see that in slope field 4 the slopes are zero around t = −0.5, and in slope field 1 the slopes are zero around the diagonal line y = t.
Example 1. The above figure comes from the example of a cup of coffee cooling down at room temperature. The starting temperature is y(0) = 70°C and the differential equation is y′ = 0.02(20 − y). Describe the first four steps of Euler’s method with the step h = 20 minutes. If the actual solution at time t = 80 is equal to 30.09°C, does the Euler method underestimate or overestimate it?
Solution: In the above figure, Euler’s method steps are given by the solid lines and the actual solution is the dashed curve. In this problem, f(t, y) = 0.02(20 − y).
4.2 Differential equations: approximations 233
Step 1. Our starting point is (0, 70) and the slope at this point is f (0, 70) =
0.02(20 − 70) = −1. This means that on the first interval from t = 0 to t = 20, the
tangent line is y = 70 − t. If we move along this line, at time t = 20 we end up at
y = 70 − 20 = 50.
Step 2. After the first step we are at the point (20, 50). The slope at this point
is f (20, 50) = 0.02(20 − 50) = −0.6. This means that on the second interval from
t = 20 to t = 40, the tangent line is y = 50 − 0.6(t − 20). If we move along this
line, at time t = 40 we end up at y = 50 − 0.6(40 − 20) = 38.
Step 3. After the second step we are at the point (40, 38). The slope at this point
is f (40, 38) = 0.02(20 − 38) = −0.36. This means that on the third interval from
t = 40 to t = 60, the tangent line is y = 38 − 0.36(t − 40). If we move along this
line, at time t = 60 we end up at y = 38 − 0.36(60 − 40) = 30.8.
Step 4. After the third step we are at the point (60, 30.8). The slope at this point
is f (60, 30.8) = 0.02(20 − 30.8) = −0.216. This means that on the fourth interval
from t = 60 to t = 80, the tangent line is y = 30.8 − 0.216(t − 60). If we move
along this line, at time t = 80 we end up at y = 30.8 − 0.216(80 − 60) = 26.48.
Since the actual solution at time t = 80 is equal to 30.09, Euler’s method under-
estimated it, which agrees with the figure above.
Euler’s method y′ = 0.02(20 − y), y(0) = 70, from 0 to 80, stepsize = 0.1
We can, of course, change the equation, the interval and the step size.
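Euler’s method is also easy to code. The Python sketch below (the function names `euler` and `coffee` are our own) repeats the four steps of Example 1 and compares the result with the exact solution y = 20 + 50e^(−0.02t) of y′ = 0.02(20 − y), y(0) = 70, which can be checked by differentiating.

```python
import math

def euler(f, t0, y0, h, steps):
    """Euler's method for y' = f(t, y), y(t0) = y0, with step size h.
    Returns the list of points (t_k, y_k) for k = 0, 1, ..., steps."""
    points = [(t0, y0)]
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)   # follow the tangent line with slope f(t, y) for time h
        t = t + h
        points.append((t, y))
    return points

def coffee(t, y):
    """Right hand side of y' = 0.02(20 - y) from Example 1."""
    return 0.02 * (20 - y)

for t, y in euler(coffee, 0, 70, 20, 4):
    print(t, round(y, 2))

exact_80 = 20 + 50 * math.exp(-0.02 * 80)   # exact solution at t = 80
print(round(exact_80, 2))
```

The printed values 70, 50, 38, 30.8 and 26.48 are the points reached in Steps 1-4, and the exact value at t = 80 is about 30.09, so the last Euler value lies below it.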
Comment. In the above two problems we saw that Euler’s method underestimated
or overestimated the actual solution. The reason was quite simple.
• In the region where solutions are concave up, the tangent lines are below, so
moving along the tangent lines will underestimate the actual solutions.
• In the region where solutions are concave down, the tangent lines are above, so
moving along the tangent lines will overestimate the actual solutions.
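For an autonomous equation y′ = f(y), the chain rule gives y′′ = f′(y)y′ = f′(y)f(y), so the concavity of solutions can be read off from the equation itself. For the coffee equation of Example 1, this gives (a worked illustration we add here):
$$y' = 0.02(20-y) \implies y'' = -0.02\,y' = -0.02\cdot 0.02\,(20-y) = 0.0004\,(y-20).$$
Hence solutions above 20°C, such as the one with y(0) = 70 in Example 1, are concave up, and Euler’s method underestimates them, while solutions below 20°C are concave down, and Euler’s method overestimates them.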
$$P_2(x) = y(a) + y'(a)(x-a) + \frac{y''(a)}{2}(x-a)^2$$
and it only remains to compute y′′ (a). Again, because y′ (x) = f (x, y(x)), we can
differentiate this equation to find y′′ (x) = (y′ (x))′ . Let us illustrate this on a specific
example.
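To find y′′(1/4), we differentiate the equation using the chain rule and keeping in mind that y is a function of t:
$$y''(t) = \big(y'(t)\big)' = \big(\sin(\pi(t+y))\big)' = \cos(\pi(t+y))\,\big(\pi(t+y)\big)' = \cos(\pi(t+y))\,\pi(1+y').$$
We now plug in t = 1/4 and use that y = 1/2 and y′ = 1/√2 at t = 1/4:
$$y''\Big(\frac14\Big) = \cos\Big(\pi\Big(\frac14+\frac12\Big)\Big)\,\pi\Big(1+\frac{1}{\sqrt2}\Big) = \cos\Big(\frac{3\pi}{4}\Big)\,\pi\Big(1+\frac{1}{\sqrt2}\Big) = -\frac{1}{\sqrt2}\,\pi\Big(1+\frac{1}{\sqrt2}\Big) = -\pi\Big(\frac{1}{\sqrt2}+\frac12\Big).$$
As a result, the Taylor polynomial of degree 2 centred at t = 1/4 is
$$P_2(t) = \frac12 + \frac{1}{\sqrt2}\Big(t-\frac14\Big) - \frac{\pi}{2}\Big(\frac{1}{\sqrt2}+\frac12\Big)\Big(t-\frac14\Big)^2.$$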
In the above figure, this Taylor polynomial is graphed as a dashed green curve, while the actual solution is graphed as a solid blue line, both for t ≥ 1/4. We can see that the approximation works well only if we are not far from the starting time t = 1/4.
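To see numerically how good P₂ is near t = 1/4, the Python sketch below recomputes y′(1/4) and y′′(1/4) from the formulas above and compares P₂(t) with an accurate numerical solution of y′ = sin(π(t + y)), y(1/4) = 1/2. The numerical solver is the classical Runge-Kutta method (RK4), a standard method that these notes do not discuss and that we chose only for this check; the names `rk4` and `P2` are ours.

```python
import math

def f(t, y):
    """Right hand side of y' = sin(pi (t + y))."""
    return math.sin(math.pi * (t + y))

def rk4(f, t0, y0, t1, n=200):
    """Classical Runge-Kutta approximation of y(t1) for y' = f(t, y), y(t0) = y0.
    Also works for t1 < t0 (the step h is then negative)."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

t0, y0 = 0.25, 0.5
d1 = f(t0, y0)                                            # y'(1/4) = 1/sqrt(2)
d2 = math.cos(math.pi * (t0 + y0)) * math.pi * (1 + d1)   # y''(1/4) from the chain rule

def P2(t):
    """Taylor polynomial of degree 2 centred at t = 1/4."""
    return y0 + d1 * (t - t0) + d2 / 2 * (t - t0) ** 2

for t in (0.3, 0.5, 1.0):
    print(t, P2(t), rk4(f, t0, y0, t))
```

With these settings, P₂ and the numerical solution agree to within about 10⁻³ up to t = 0.5, but at t = 1 they differ by roughly 0.3.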
Comment. Here we only compute second degree Taylor polynomials but, once we have computed the second derivative y′′(x), we can differentiate it once again to find y′′′(x) and the third degree Taylor polynomial. We can repeat this process to find Taylor polynomials of any degree. In another direction, we could use the Taylor polynomial of degree 2 on each short interval of length h, which would give us a more accurate version of Euler’s method.
Exercise 3. If y(x) is the solution of the differential equation y′ = sin(x²y) with the initial condition y(1) = π, find its Taylor polynomial of degree 2 centred at x = 1.
y′ = f (y)
Answer to Exercise 1. Step 1. Our starting point is (0, 0) and the slope at this point
is f (0, 0) = 0.02(20 − 0) = 0.4. This means that on the first interval from t = 0 to
t = 5, the tangent line is y = 0.4t. If we move along this line, at time t = 5 we end
up at y = 2.
Step 2. After the first step we are at the point (5, 2). The slope at this point is
f (5, 2) = 0.02(20 − 2) = 0.36. This means that on the second interval from t = 5
to t = 10, the tangent line is y = 2 + 0.36(t − 5). If we move along this line, at time
t = 10 we end up at y = 2 + 0.36(10 − 5) = 3.8.
Step 3. After the second step we are at the point (10, 3.8). The slope at this point is f(10, 3.8) = 0.02(20 − 3.8) = 0.324. This means that on the third interval from t = 10 to t = 15, the tangent line is y = 3.8 + 0.324(t − 10). If we move along this line, at time t = 15 we end up at y = 3.8 + 0.324(15 − 10) = 5.42.
Step 4. After the third step we are at the point (15, 5.42). The slope at this point
is f (15, 5.42) = 0.02(20 − 5.42) = 0.2916. This means that on the fourth interval
from t = 15 to t = 20, the tangent line is y = 5.42 + 0.2916(t − 15). If we move
along this line, at time t = 20 we end up at y = 5.42 + 0.2916(20 − 15) = 6.878.
Since the actual solution at time t = 20 is 6.5935, Euler’s method overestimated
it. Using the following command in Wolfram Alpha
we get that Euler’s method with the step size h = 0.1 gives 6.5989, which is a much
better approximation.
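The Python sketch below (our own; the function name `euler_final` is hypothetical) repeats this comparison for several step sizes, using the exact solution y = 20(1 − e^(−0.02t)) of y′ = 0.02(20 − y), y(0) = 0, which can be checked by differentiating. The error stays positive, since this solution is concave down, and it shrinks roughly in proportion to h.

```python
import math

def euler_final(f, t0, y0, t_end, h):
    """Euler's method for y' = f(t, y), y(t0) = y0; returns the approximation of y(t_end)."""
    steps = round((t_end - t0) / h)
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y

def f(t, y):
    """Right hand side of y' = 0.02(20 - y) in Exercise 1."""
    return 0.02 * (20 - y)

exact = 20 * (1 - math.exp(-0.02 * 20))   # exact solution at t = 20
for h in (5, 1, 0.1, 0.01):
    approx = euler_final(f, 0, 0, 20, h)
    print(f"h = {h:<5} Euler: {approx:.4f}  error: {approx - exact:+.5f}")
```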
Answer to Exercise 2. In the region above the equilibrium solution y = 6, the slope field suggests that the solutions are concave up everywhere, so the solution looks concave up there. As a result, Euler’s method will underestimate this solution in this region.
Answer to Exercise 3. We are given that y(1) = π and, using the differential equation, we can compute y′(1) = sin(1²·π) = sin(π) = 0. To find y′′(1), we differentiate the equation using the chain rule and keeping in mind that y is a function of x:
$$y''(x) = \big(y'(x)\big)' = \big(\sin(x^2y)\big)' = \cos(x^2y)\,\big(x^2y\big)' = \cos(x^2y)\,\big(2xy + x^2y'\big).$$
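Plugging in x = 1, y(1) = π and y′(1) = 0 finishes the computation; we include this last step as a worked check:
$$y''(1) = \cos(1^2\cdot\pi)\,\big(2\cdot 1\cdot\pi + 1^2\cdot 0\big) = \cos(\pi)\cdot 2\pi = -2\pi,$$
$$P_2(x) = y(1) + y'(1)(x-1) + \frac{y''(1)}{2}(x-1)^2 = \pi - \pi(x-1)^2.$$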
$$y' = f(x)g(y) \quad\text{or}\quad y' = \frac{f(x)}{g(y)} \quad\text{or}\quad y' = \frac{g(y)}{f(x)}.$$
In other words, on the right hand side the variables x and y are separated into a product or ratio of two functions, one of which depends only on x and the other only on y. Examples of such differential equations are
$$y' = xy,\qquad y' = \frac{x}{1+y},\qquad y' = e^{x+y} = e^x\cdot e^y,$$
$$y' = 0.5y,\qquad y' = \frac{x(1-y)}{y(1+x)} = \frac{x}{1+x}\cdot\frac{1-y}{y}.$$
$$\frac{dy}{dx} = xy.$$
Step 2. We move all the terms containing the variable y to one side and all the terms containing the variable x to the other side:
$$\frac{dy}{y} = x\,dx.$$
Here we treat dy and dx formally as numbers, so we can move them around using the rules of algebra.
Step 3. We now integrate both sides formally and find antiderivatives:
$$\int\frac{dy}{y} = \int x\,dx \implies \ln|y| = \frac{x^2}{2} + C.$$
absolute value |y|. For example, the initial condition is y(0) = −2, so if we forgot the absolute value and wrote ln(y) = x²/2 + C, then plugging in x = 0 would give ln(−2), which is undefined because we cannot plug negative values into the logarithm.
Step 4 (with initial condition). First, we plug in the initial condition y(0) = −2:
$$\ln|-2| = \frac{0^2}{2} + C.$$
This gives us that the constant C = ln 2, so the equation is
$$\ln|y| = \frac{x^2}{2} + \ln 2.$$
Then we solve for y if possible. In some cases, the equation may be too complicated to solve for y, so we can leave it as is and think of y = y(x) as an implicit solution of this equation. In this particular problem, we can solve for y:
$$\ln|y| = \frac{x^2}{2} + \ln 2 \implies |y| = e^{x^2/2}e^{\ln 2} = 2e^{x^2/2}.$$
Since y(0) = −2 is negative and |y| = 2e^{x²/2} never vanishes, the solution stays negative, so |y| = −y:
$$-y = e^{x^2/2}e^{\ln 2} = 2e^{x^2/2} \implies y = -2e^{x^2/2}.$$
We found the solution $y = -2e^{x^2/2}$.
Step 4 (without initial condition). If we do not have the initial condition, then we leave the constant C in Step 3 undetermined. Again, we can try to solve for y if possible, or leave it as is and think of y = y(x) as an implicit solution. In this particular problem, we can solve for y:
$$\ln|y| = \frac{x^2}{2} + C \implies |y| = e^Ce^{x^2/2} \implies y = (\pm e^C)e^{x^2/2} \implies y = Be^{x^2/2}.$$
In the last step we renamed ±e^C as another undetermined constant B to keep the expression simple. If we have the initial condition, then we can determine B at this step. For example, we know that y(0) = −2, so −2 = Be⁰ = B, so B = −2 and $y = -2e^{x^2/2}$ again.
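As a quick sanity check of the answer, the Python sketch below (our own; the helper names `y` and `residual` are hypothetical) verifies numerically that y = −2e^(x²/2) satisfies y(0) = −2 and y′ = xy on −2 ≤ x ≤ 2, approximating y′ by a central difference quotient.

```python
import math

def y(x):
    """Solution found above: y = -2 e^(x^2/2)."""
    return -2 * math.exp(x ** 2 / 2)

def residual(x, h=1e-5):
    """y'(x) - x*y(x), with y'(x) approximated by a central difference quotient."""
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    return dydx - x * y(x)

print(y(0))                                                 # initial condition
print(max(abs(residual(k / 10)) for k in range(-20, 21)))   # close to 0
```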
Comment. The first three steps above are easy to follow in practice, but they hide an implicit integration by substitution. To make sense of these steps, we could have rewritten the original equation y′ = xy as
$$y'(x) = x\,y(x) \quad\text{or}\quad \frac{y'(x)}{y(x)} = x$$
and, since both sides are functions of x, their indefinite integrals are the same:
$$\int\frac{y'(x)}{y(x)}\,dx = \int x\,dx.$$
On the left hand side, if we make the substitution y = y(x), we can rewrite it as $\int\frac{dy}{y}$, so we arrive at Step 3 above in a way that makes more sense. Of course, in practice we can skip this substitution step and follow the above formal steps.
$$\frac{d\theta}{dt} = -\frac{gt}{d}\cos^2(\theta)$$
$$\frac{dy}{dt} = \kappa y(100-y)$$
(a) If between time t and t + ∆t the water level changed by ∆h, using that the cross-section area of the container is A, what is the volume ∆V of water that escaped during this time? Warning: the water level is decreasing, so the change ∆h is negative!
(b) By Torricelli’s law, at time t the water was flowing out at the speed $v = \sqrt{2gh(t)}$. Using that the hole has area a and that the speed does not change much over a very short period ∆t, what is the approximate volume ∆V of the water jet that escaped during this time?
(c) Set the volumes ∆V in (a) and (b) equal to each other and derive a differential equation for h(t) by taking the limit ∆t → 0.
(d) Solve this equation with the initial condition h(0) = H. How long will it take for the container to empty?
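$$\frac{d\theta}{dt} = -\frac{gt}{d}\cos^2(\theta) \implies \frac{d\theta}{\cos^2(\theta)} = \sec^2(\theta)\,d\theta = -\frac{gt}{d}\,dt \implies \int\sec^2(\theta)\,d\theta = -\int\frac{gt}{d}\,dt \implies \tan(\theta) = -\frac{gt^2}{2d} + C.$$
Plugging in the initial condition θ(0) = π/4, we get $\tan\big(\frac{\pi}{4}\big) = -\frac{g\cdot 0^2}{2d} + C$, so 1 = C and $\tan(\theta) = 1 - \frac{gt^2}{2d}$. We can solve this for θ by taking the inverse tangent: $\theta = \tan^{-1}\big(1 - \frac{gt^2}{2d}\big)$.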
Using the hint $\frac{1}{y(100-y)} = \frac{1}{100}\Big(\frac{1}{y} + \frac{1}{100-y}\Big)$, we can easily find the integral on the left hand side, so
$$\frac{1}{100}\big(\ln|y| - \ln|100-y|\big) = \kappa t + C \quad\text{or}\quad \ln|y| - \ln|100-y| = 100\kappa t + 100C.$$
Plugging in the initial condition y(0) = 1, we get that 100C = −ln 99, so the equation can be rewritten as
$$\ln|y| - \ln|100-y| = \ln\frac{|y|}{|100-y|} = 100\kappa t - \ln 99.$$
Exponentiating both sides, we get
$$\frac{|y|}{|100-y|} = \frac{1}{99}e^{100\kappa t}.$$
Because the number of people who know the rumour is always between 0 and 100, both y and 100 − y are positive, so we can forget about the absolute values and write the equation as $\frac{y}{100-y} = \frac{1}{99}e^{100\kappa t}$. It remains to solve it for y:
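Multiplying out and collecting the y terms gives 99y = 100e^(100κt) − ye^(100κt), so y(t) = 100e^(100κt)/(99 + e^(100κt)) = 100/(1 + 99e^(−100κt)) (our own algebra for this last step). The Python sketch below checks this formula numerically against the differential equation; κ = 0.001 is a hypothetical value chosen only for illustration, since the exercise keeps κ symbolic.

```python
import math

KAPPA = 0.001  # hypothetical value of kappa, chosen only for illustration

def y(t):
    """Closed form y(t) = 100 / (1 + 99 e^(-100 kappa t)) with y(0) = 1."""
    return 100 / (1 + 99 * math.exp(-100 * KAPPA * t))

def residual(t, h=1e-4):
    """y'(t) - kappa*y*(100 - y), with y'(t) approximated by a central difference."""
    dydt = (y(t + h) - y(t - h)) / (2 * h)
    return dydt - KAPPA * y(t) * (100 - y(t))

print(y(0))                                           # initial condition
print(max(abs(residual(t)) for t in range(0, 101)))   # close to 0
print(y(200))                                         # close to 100 for large t
```

As t grows, y(t) approaches 100: eventually everyone hears the rumour.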
Answer to Exercise 4. (a) Since ∆h is negative, −∆h is the height by which the water level decreased. Since the cross-section area is A, the volume of water that escaped is Area × Height = A × (−∆h) = −A∆h.
(b) If we imagine the water jet flowing from the hole as a small cylinder, its cross-section is the area of the hole a. During the short time period ∆t, the speed is almost constant, v(t), so the jet moved by Height = Speed × Time ≈ v(t)∆t = $\sqrt{2gh(t)}\,\Delta t$, where in the last step we used Torricelli’s law. Multiplying this by the area a gives us the volume of the water jet, $\Delta V \approx a\sqrt{2gh(t)}\,\Delta t$.
(c) The volumes we found in (a) and (b) should be equal, so $-A\Delta h \approx a\sqrt{2gh(t)}\,\Delta t$. Dividing both sides by ∆t, we get $-A\frac{\Delta h}{\Delta t} \approx a\sqrt{2gh(t)}$. When we take the limit ∆t → 0, the approximation gets better and better and ∆h/∆t becomes h′(t), so we finally obtain the differential equation
$$-A\,h'(t) = a\sqrt{2g\,h(t)}.$$
for some constants a > 0 and c > 0. The population of rabbits would grow at a rate
proportional to the number of rabbits, and the population of foxes would die out at
a rate proportional to the number of foxes. We know that these correspond to exponential growth and exponential decay, respectively.
Since foxes and rabbits share the same territory, it is reasonable to assume that
Indeed, imagine that a rabbit encounters 1 fox per month. If the population of
foxes doubles then it is reasonable that the number of encounters per month would
also double to 2. So the number of encounters of one rabbit with foxes per month
is proportional to y, and the total number of encounters is proportional to xy. The
Lotka-Volterra model modifies the above equations to account for these encounters:
for some positive constants a, b, c and d. The second term −bxy in the rate x′ (t) of
the rabbit population change is there because some proportion of the encounters
will lead to rabbits being killed and eaten. The second term +dxy in the rate y′ (t)
of the fox population change is there because those encounters yield food for foxes
so they allow them to survive and procreate. Simply put, encounters are bad for
rabbits and good for foxes.
In the rest of this section we will analyze this model and, for simplicity, take all the coefficients equal to 1, a = b = c = d = 1, so
$$x'(t) = x - xy, \qquad y'(t) = -y + xy.$$
We will break the analysis of the model into two exercises with relatively simple
steps. In the exercises, we always assume that the derivatives x′ (t) and y′ (t) satisfy
the above Lotka-Volterra equations. In the first exercise, we will draw the pair of
populations x(t) and y(t) as a point (x(t), y(t)) on the xy-plane, and will try to
imagine what kind of trajectory this point (x(t), y(t)) will follow over time. Before
we begin, let us give a simple example.
so the coordinate x(t) is decreasing and y(t) is increasing with time. This means
that the ant is moving in the north-west direction, as indicated by the arrow in the
first quadrant. Similarly, we can check that in the second quadrant it is moving in
the south-west direction, in the third quadrant in the south-east direction, and in
the fourth quadrant in the north-east direction. The equations do not immediately
tell us that the ant is walking on a circle, but we get a general idea that the ant is
moving counterclockwise around the origin.
We will now apply a similar analysis to the fox and rabbit populations using the
Lotka-Volterra equations. After that we will see how we could figure out that the
ant is moving along a circle using only the equations x′ (t) = −y and y′ (t) = x.
4.4 Lotka-Volterra predator-prey model 247
x − xy = 0 and − y + xy = 0.
Next, we will consider a different question. Both populations x(t) and y(t) de-
pend on time t, but
In other words, can we eliminate time t and find y = y(x) as a function of x? Before
we address this question, let us make a useful observation.
Taking the derivative of both sides, by the chain rule, y′(t) = y′(x)x′(t), so
$$y'(x) = \frac{y'(t)}{x'(t)}.$$
$$\frac{y'(t)}{x'(t)} \approx \frac{\Delta y/\Delta t}{\Delta x/\Delta t} = \frac{\Delta y}{\Delta x} \approx y'(x).$$
If we know that the coordinate x(t) moves at a rate x′(t) and the coordinate y(t) moves at a rate y′(t), then the pair moves along the curve with the slope $y'(x) = \frac{y'(t)}{x'(t)}$. This is how the slope field was graphed in the previous exercise, using the formula for y′(x) that will be computed in the next exercise.
find an equation for y′ (x) and solve it to show that the ant is moving along a circle.
Solution: Using the chain rule formula in the comment above,
$$y'(x) = \frac{y'(t)}{x'(t)} = -\frac{x}{y},$$
where in the last step we used the given equations x′(t) = −y and y′(t) = x. This is a separable equation, so we can solve it following the standard steps:
$$\frac{dy}{dx} = -\frac{x}{y} \implies y\,dy = -x\,dx \implies \int y\,dy = -\int x\,dx \implies \frac{y^2}{2} = -\frac{x^2}{2} + C.$$
If we rewrite this as x² + y² = 2C, we can see that this is the equation of a circle centred at the origin. The constant C can be found if we are given the initial condition (x(0), y(0)) or any point on the trajectory.
Find a differential equation for y′ (x) of the form y′ (x) = f (x, y).
(b) Solve the equation you found in (a). The answer can be written as an implicit
function.
(b) The directions of how the pair of populations (x(t), y(t)) changes over time
are described by the arrows in the left figure above. Indeed, x′ (t) = x−xy = x(1−y)
is positive when y < 1 and negative when y > 1, so x(t) is increasing (moving right,
or east) below the line y = 1, and decreasing above this line (moving left, or west).
This makes sense, because this tells us that the population of rabbits increases
when there are not too many foxes around, and it decreases when the population of
foxes exceeds a certain threshold. Similarly, y′ (t) = −y+xy = y(−1+x) is positive
when x > 1 and negative when x < 1, so y(t) is increasing (moving up, or north)
to the right of the line x = 1 and decreasing to the left of this line (moving down,
or south). Over time, the point (x(t), y(t)) will be moving around the equilibrium
point (1, 1). Again, this makes sense, because this tells us that the population of
foxes decreases when there are not enough rabbits around, and it increases when
the population of rabbits exceeds a certain threshold.
In the figure above on the right, we show the slope field y′ (x) and one trajectory
starting at a point (0.5, 0.5). We will discuss the slope field in the next example,
and the trajectory can be graphed in Geogebra using the command:
Here −y + xy and x − xy are the formulas for y′(t) and x′(t), respectively, (0.5, 0.5) is the initial position at time t = 0, 10 is the final time up to which the equation is solved numerically (you can try other values), and 0.01 is the step size. We will find the formula for this trajectory in the next exercise.
Answer to Exercise 2. Using the chain rule as in the previous example,
$$y'(x) = \frac{y'(t)}{x'(t)} = \frac{-y + xy}{x - xy} = \frac{y(-1+x)}{x(1-y)},$$
where in the middle step we applied the Lotka-Volterra equations. This gives us a separable differential equation
$$y'(x) = \frac{x-1}{x}\cdot\frac{y}{1-y}.$$
We can solve it using the standard steps:
$$\frac{dy}{dx} = \frac{x-1}{x}\cdot\frac{y}{1-y} \implies \frac{1-y}{y}\,dy = \frac{x-1}{x}\,dx \implies \int\Big(\frac1y - 1\Big)dy = \int\Big(1 - \frac1x\Big)dx \implies \ln|y| - y = x - \ln|x| + C.$$
Since the populations are positive, we can forget about the absolute values and
write the last equation as ln(y) − y = x − ln(x) +C. We cannot solve this for y, so
we leave it as an implicit solution. The constant C can be found if we are given the
initial condition (x(0), y(0)) or any point on the trajectory.
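The implicit solution says that along a trajectory the quantity ln(y) − y − x + ln(x) stays equal to the constant C. The Python sketch below is our own numerical check: it integrates the Lotka-Volterra equations with a = b = c = d = 1 from (0.5, 0.5) up to t = 10, the same setting as the Geogebra trajectory above, using the classical Runge-Kutta method (a standard method that these notes do not discuss), and records how far this quantity drifts from its initial value.

```python
import math

def lotka_volterra(x, y):
    """Rates x'(t) = x - xy and y'(t) = -y + xy (all coefficients equal to 1)."""
    return x - x * y, -y + x * y

def rk4_step(x, y, h):
    """One step of the classical Runge-Kutta method for the system above."""
    k1 = lotka_volterra(x, y)
    k2 = lotka_volterra(x + h / 2 * k1[0], y + h / 2 * k1[1])
    k3 = lotka_volterra(x + h / 2 * k2[0], y + h / 2 * k2[1])
    k4 = lotka_volterra(x + h * k3[0], y + h * k3[1])
    return (x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def conserved(x, y):
    """ln(y) - y - x + ln(x); by the implicit solution it equals C on a trajectory."""
    return math.log(y) - y - x + math.log(x)

x, y, h = 0.5, 0.5, 0.01      # start at (0.5, 0.5) with step size 0.01
C0 = conserved(x, y)
drift = 0.0
for _ in range(1000):         # 1000 steps of size 0.01 reach t = 10
    x, y = rk4_step(x, y, h)
    drift = max(drift, abs(conserved(x, y) - C0))
print(C0, drift)
```

The drift stays tiny, which is consistent with the trajectory being a closed curve around the equilibrium (1, 1) given by the implicit equation.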
Answer to Exercise 3. 1(d) – cannot tell which one is which. 2(c) – x is trees and y
is owls. 3(a) – x is polar bears and y is seals. 4(b) – cannot tell which one is which.
4.5 The SIR model 251
Then we can model the rates of change of the three groups by the equations:
• Constant a > 0 is the rate of infection (more precisely, the rate of potential
transmissions by an infected individual), defined as the average number of con-
tacts of one person per unit of time, multiplied by the probability of disease
transmission in a contact between a susceptible and an infectious subject.
• Constant b > 0 is the rate of recovery, which is defined as b = 1/D, where D is the average time period during which an individual is infectious.
The term aSI represents new infections (per unit of time) resulting from interactions
between susceptible and infectious individuals, so it is added to the infected group
and subtracted from the susceptible group. The term bI represents newly recovered
individuals (per unit of time), so it is subtracted from the infected and added to the
recovered group. Since the term R(t) does not appear in the first two equations, we
can for now forget about the last equation R′ (t) = bI and first focus on analyzing
and solving the system of the first two equations:
$$R_0 = \frac{a}{b} = aD$$
which is called the basic reproduction number. Since it is the rate a of potential
transmissions by an infectious individual per unit of time multiplied by the number
of days D an individual is typically infected, R0 represents the average number
of new cases generated by one infected individual, assuming that everyone else is
susceptible. When the proportion of susceptible in the population is S, the actual
average number of new cases generated by one infected individual is R0 S.
Notice that S′ (t) = −aSI is negative, so the number of susceptible individuals
is always decreasing (obviously). In the first exercise we will analyze when the
infectious subpopulation is increasing and decreasing.
Exercise 1. In the first quadrant S > 0, I > 0 on the SI-plane (with S on the x-
axis and I on the y-axis) find the regions where I ′ (t) is positive, negative, or zero.
Express the regions in terms of the basic reproduction number R0 . In each region,
sketch in which direction the pair (S(t), I(t)) is moving as time t increases. Explain
the behaviour in terms of the basic reproduction number R0 .
Exercise 2. Suppose that an individual is typically infected for D = 4 days, and the
basic reproduction number is R0 = 2.
(a) Write down the SIR model corresponding to these parameters.
(b) Find a differential equation for I ′ (S).
(c) Solve this equation with the initial conditions S(0) = 0.95 and I(0) = 0.05.
Spread of disease in time. If we want to see how the disease spreads over
time, we could try to find the function S = S(t), which tells us how the susceptible
population decreased as a function of time. Then 1−S(t) will give us the proportion
of the population infected up to time t. For example, in the setting of the previous
exercise, if we take the solution I = 0.5 ln S − S + 1.025 we found in part (c) and
plug it into the equation S′ (t) = −0.5SI found in part (a), we get
$$\frac{dS}{dt} = -0.5S(0.5\ln S - S + 1.025).$$
This is a separable equation, but we cannot solve it explicitly because we cannot integrate
$$\int\frac{dS}{S(0.5\ln S - S + 1.025)}$$
explicitly. However, we can solve it numerically,
for example, by using the following command in
Geogebra:
This produced the graph in the above figure. Here the initial condition is S(0) =
0.95, and we solve the equation up to time t = 30.
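As a stand-in for the Geogebra command, the Python sketch below solves the same initial value problem, S(0) = 0.95 up to t = 30, with the classical Runge-Kutta method. The method, the step size h = 0.01 and the names `dS_dt` and `rk4_scalar` are our own choices; Geogebra may use a different numerical scheme.

```python
import math

def dS_dt(S):
    """Right hand side dS/dt = -0.5 S (0.5 ln S - S + 1.025) from the text."""
    return -0.5 * S * (0.5 * math.log(S) - S + 1.025)

def rk4_scalar(g, S0, t_end, h):
    """Classical Runge-Kutta method for the autonomous equation S' = g(S).
    Returns the list of values S(0), S(h), S(2h), ..., S(t_end)."""
    values = [S0]
    S = S0
    for _ in range(round(t_end / h)):
        k1 = g(S)
        k2 = g(S + h / 2 * k1)
        k3 = g(S + h / 2 * k2)
        k4 = g(S + h * k3)
        S += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        values.append(S)
    return values

S = rk4_scalar(dS_dt, 0.95, 30, 0.01)
print(S[-1])        # proportion still susceptible at t = 30
print(1 - S[-1])    # proportion infected up to time t = 30
```

The computed S(t) decreases throughout and levels off just above the smaller root of 0.5 ln S − S + 1.025 = 0 (about 0.187), where the number of infectious individuals I drops to zero.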
In the figure below on the right, we show the slope field I ′ (S) and one trajectory
for the pair (S(t), I(t)). We will discuss the slope field in the next exercise, and the
trajectory can be graphed in Geogebra using the command:
We have to write x instead of S and y instead of I in axy − by, −axy, for Geogebra
to understand what we want.
Answer to Exercise 2. (a) $b = \frac{1}{D} = 0.25$ and $a = \frac{R_0}{D} = 0.5$, so the equations are
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.
Solution: Let us write the unknown solution as
$$y(x) = c_0 + c_1x + c_2x^2 + c_3x^3 + c_4x^4 + \dots$$
The initial condition gives us c₀ = y(0) = −1, so
$$y(x) = -1 + c_1x + c_2x^2 + c_3x^3 + c_4x^4 + \dots$$
Next, we want to plug in this expression into the equation y′ (x) + 4y(x) = 8, which
means that we need to take the derivative first:
Notice how the last term $4c_4x^4$ in the second line disappeared. That is because it had nothing to be matched with, so it was absorbed by the dots.
We wrote the left hand side of the equation y′ (x) + 4y(x) = 8 as a polynomial of
degree 3 plus some dots. Next, we need to write the right hand side as a polynomial
of degree 3 plus some dots. In the next example, this will require a little bit of work,
but in this example the right hand side is very simple, just a constant 8, so we can
formally write
$$8 = 8 + 0x + 0x^2 + 0x^3.$$
Because we want the two sides to be equal, we need to make sure that
$$(c_1 + 4c_0) + (2c_2 + 4c_1)x + (3c_3 + 4c_2)x^2 + (4c_4 + 4c_3)x^3 = 8 + 0x + 0x^2 + 0x^3.$$
This means that the coefficients in front of each power of x must be equal, so
The first equation gives us c1 = 12. Plugging it into the second equation gives us
that 2c2 + 4(12) = 0, so c2 = −24. Plugging it into the third equation gives us that
3c3 + 4(−24) = 0, so c3 = 32. Plugging it into the fourth equation gives us that
4c4 + 4(32) = 0, so c4 = −32. We found the Taylor polynomial approximation of
degree 4 for the solution y(x) of this equation:
$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!}$$
Replacing x with −4x:
4.6 Approximating solutions by Taylor polynomials 257
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.
$$y'(x) - 2y(x) = \frac{\sin(x)}{x}, \qquad y(0) = 0.$$
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.
Solution: We handle the left hand side exactly the same way as in the first example.
The initial condition gives us c₀ = y(0) = 0, so
$$\begin{aligned} y(x) &= c_1x + c_2x^2 + c_3x^3 + c_4x^4 + \dots\\ y'(x) &= c_1 + 2c_2x + 3c_3x^2 + 4c_4x^3 + \dots\\ -2y(x) &= -2c_1x - 2c_2x^2 - 2c_3x^3 - 2c_4x^4 + \dots\\ y'(x) - 2y(x) &= c_1 + (2c_2 - 2c_1)x + (3c_3 - 2c_2)x^2 + (4c_4 - 2c_3)x^3 + \dots \end{aligned}$$
What is different in this example is that the right hand side sin(x)/x is not a simple constant anymore and, to match the coefficients, we first need to find its Taylor polynomial. For this, we need to recall the pattern of Taylor polynomials for sin(x):
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$
$$\frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \dots = 1 + 0x - \frac{x^2}{3!} + 0x^3 + \frac{x^4}{5!} - \dots$$
Equating the coefficients of y′(x) − 2y(x) above with the coefficients of sin(x)/x:
$$c_1 = 1,\qquad 2c_2 - 2c_1 = 0,\qquad 3c_3 - 2c_2 = -\frac{1}{3!} = -\frac16,\qquad 4c_4 - 2c_3 = 0.$$
Solving them sequentially, we get c₁ = 1, c₂ = 1, c₃ = 11/18, and c₄ = 11/36. We found the Taylor polynomial approximation of degree 4:
$$x + x^2 + \frac{11}{18}x^3 + \frac{11}{36}x^4.$$
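The coefficient matching in these examples follows one pattern: for y′ + py = r(x) with a constant p, the coefficient of x^k gives (k + 1)c_{k+1} + pc_k = r_k, so the coefficients can be found one after another. The Python sketch below automates this with exact fractions; the function name `taylor_coefficients` is hypothetical, and it only covers equations of this form with a constant p.

```python
from fractions import Fraction

def taylor_coefficients(p, c0, rhs):
    """Hypothetical helper: Taylor coefficients c_0, ..., c_n of the solution of
    y' + p*y = r(x), y(0) = c0, for a constant p.

    rhs = [r_0, r_1, ..., r_{n-1}] are the Taylor coefficients of r(x) at x = 0.
    Matching the coefficients of x^k on both sides gives
        (k + 1) * c_{k+1} + p * c_k = r_k,
    which is solved for c_{k+1}, one k at a time, in exact rational arithmetic."""
    c = [Fraction(c0)]
    for k, r_k in enumerate(rhs):
        c.append((Fraction(r_k) - p * c[k]) / (k + 1))
    return c

# y' - 2y = sin(x)/x = 1 + 0x - x^2/6 + 0x^3 + ..., y(0) = 0
print([str(c) for c in taylor_coefficients(-2, 0, [1, 0, Fraction(-1, 6), 0])])
# ['0', '1', '1', '11/18', '11/36']
```

For the first example of this section, y′ + 4y = 8 with y(0) = −1, the call `taylor_coefficients(4, -1, [8, 0, 0, 0])` returns the coefficients −1, 12, −24, 32, −32 found there.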
Find the Taylor polynomial of degree 4 for y(x) centred at x = 0 using the method
of undetermined coefficients.
Answer to Exercise 1. The initial condition gives us c₀ = y(0) = 2, so
$$\begin{aligned} y(x) &= 2 + c_1x + c_2x^2 + c_3x^3 + c_4x^4 + \dots\\ y'(x) &= c_1 + 2c_2x + 3c_3x^2 + 4c_4x^3 + \dots\\ -3y(x) &= -6 - 3c_1x - 3c_2x^2 - 3c_3x^3 - 3c_4x^4 + \dots\\ y'(x) - 3y(x) &= (c_1 - 6) + (2c_2 - 3c_1)x + (3c_3 - 3c_2)x^2 + (4c_4 - 3c_3)x^3 + \dots \end{aligned}$$
$$\begin{aligned} y(x) &= c_1x + c_2x^2 + c_3x^3 + c_4x^4 + \dots\\ y'(x) &= c_1 + 2c_2x + 3c_3x^2 + 4c_4x^3 + \dots\\ y'(x) + y(x) &= c_1 + (2c_2 + c_1)x + (3c_3 + c_2)x^2 + (4c_4 + c_3)x^3 + \dots \end{aligned}$$
To handle the right hand side, recall the pattern of Taylor polynomials for cos(x):
$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$
$$x\cos(x) = x - \frac{x^3}{2!} + \frac{x^5}{4!} - \dots = 0 + x + 0x^2 - \frac{x^3}{2!} + 0x^4 + \frac{x^5}{4!} - \dots$$
Equating the coefficients of y′(x) + y(x) with the coefficients of x cos(x):
$$c_1 = 0,\qquad 2c_2 + c_1 = 1,\qquad 3c_3 + c_2 = 0,\qquad 4c_4 + c_3 = -\frac12.$$
Solving these equations sequentially, we get c₁ = 0, c₂ = 1/2, c₃ = −1/6, c₄ = −1/12, so the Taylor polynomial approximation of degree 4 is
$$\frac12x^2 - \frac16x^3 - \frac1{12}x^4.$$
Chapter 5
Taylor polynomials and series
The above table contains a list of several classic examples of Taylor series, as
well as the radius of convergence R for each of them. We will discuss the meaning
of what is written in the table in this section and subsequent sections.
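First of all, recall that in Section 3.7 we introduced and discussed Taylor polynomials of degree $n$ centered at $x = a$. These can be used to approximate a function $y = f(x)$ near a point $x = a$:
$$
P_n(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x-a)^n.
$$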
In this section, we will push this definition of a Taylor polynomial to the limit,
where it will become the Taylor series.
Example 1. Let us discuss the meaning of the Taylor series for $e^x$ centred at $a = 0$:
$$
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty}\frac{x^n}{n!} \qquad (R = \infty)
$$
[Figure: the interval from $a - R$ to $a + R$ around the centre $a$ on the $x$-axis]
or, in other words, for $x$ in between $a - R$ and $a + R$ for some number $R$, which is called the radius of convergence. (Sometimes things are a bit more subtle than this, but we will not encounter such unusual examples.) For example, the table above says that, in the case of the exponential function $e^x$, the radius of convergence is $R = \infty$. This means that for any $-\infty < x < \infty$, the Taylor polynomials $P_n(x)$ will eventually get closer and closer to $e^x$ if we keep increasing the degree $n$. In mathematical language, we say that $P_n(x)$ converges to $e^x$ for all $x$ and write
$$
e^x = \lim_{n\to\infty} P_n(x).
$$
$$
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty}\frac{x^n}{n!}.
$$
In the middle, the dots $\dots$ express that if we continue adding more and more terms to the Taylor polynomials, we will get closer and closer to $e^x$. On the right hand side, the more sophisticated notation $\sum_{n=0}^{\infty}$ also expresses that we keep adding the terms of degree $n$ indefinitely, up to infinite degree $n = \infty$. This infinite sum is called the Taylor series of $e^x$ centred at $a = 0$.
The notation $\sum_{n=0}^{\infty}$ is very important, and it is called the Sigma notation. The most important thing about this notation is that we can find the formula $\frac{x^n}{n!}$ that expresses term number $n$ of the Taylor series. In many applications we can simply write out a few terms at the beginning, but sometimes we will need the formula for the $n$th term, so it is important to remember those formulas. Let us check that this formula correctly encodes the terms of the series $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$. We need to recall the convention that $0! = 1$. Then,
$$
n = 0 \implies \frac{x^0}{0!} = 1,\qquad n = 1 \implies \frac{x^1}{1!} = x,\qquad n = 2 \implies \frac{x^2}{2!},\qquad \text{etc.}
$$
We see that the formula $\frac{x^n}{n!}$ matches the pattern of the series as the degree $n$ changes from 0, 1, 2, etc.
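The claim that $P_n(x)$ gets closer and closer to $e^x$ can be checked numerically. Below is a minimal Python sketch (the function name `exp_partial_sum` is our own). It evaluates $P_n(x) = \sum_{k=0}^{n}\frac{x^k}{k!}$ and compares the result with `math.exp` at $x = 3$.

```python
import math


def exp_partial_sum(x, n):
    """Degree-n Taylor polynomial of e**x at 0: the sum of x**k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


x = 3.0
for n in (2, 5, 10, 20):
    approx = exp_partial_sum(x, n)
    print(n, approx, abs(approx - math.exp(x)))  # the error shrinks as n grows
```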
Exercise 1. Discuss the meaning of the Taylor series for $\cos(x)$ and $\sin(x)$ centred at $a = 0$:
$$
\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}x^{2n} \qquad (R = \infty)
$$
$$
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}x^{2n+1} \qquad (R = \infty)
$$
What is the meaning of $R = \infty$? Check that the $\sum$-notation matches the pattern in each case. Does the index $n$ represent the degree of the Taylor polynomial in these formulas?
Example 2. Check that the series written above is the Taylor series centred at $a = 0$ of the function $f(x) = \frac{1}{1-x}$.
Solution: Let us take a few derivatives of $f(x)$:
$$
f'(x) = \frac{1}{(1-x)^2},\qquad f''(x) = \frac{1\cdot 2}{(1-x)^3},\qquad f'''(x) = \frac{1\cdot 2\cdot 3}{(1-x)^4},\qquad f^{(4)}(x) = \frac{1\cdot 2\cdot 3\cdot 4}{(1-x)^5},
$$
etc. We notice the pattern $f^{(n)}(x) = \frac{n!}{(1-x)^{n+1}}$, so $f^{(n)}(0) = n!$. By definition, the Taylor series is
$$
f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \dots = 1 + x + \frac{2!}{2!}x^2 + \frac{3!}{3!}x^3 + \dots,
$$
which is exactly $1 + x + x^2 + x^3 + \dots = \sum_{n=0}^{\infty}x^n$.
$$
\frac{1}{1-x} - (1 + x + x^2 + x^3 + \dots + x^n) = \frac{x^{n+1}}{1-x}.
$$
(c) What happens to the difference in part (b) when −1 < x < 1 and n goes to
infinity?
Taylor series for the logarithm. Next, we will discuss the series:
$$
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty}\frac{x^n}{n} \qquad (R = 1)
$$
Example 3. Show that $\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots$ for any $-1 < x < 1$.
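Solution: Let us take the formula from part (b) of the previous exercise, written in the variable $t$:
$$
\frac{1}{1-t} - (1 + t + t^2 + t^3 + \dots + t^n) = \frac{t^{n+1}}{1-t}.
$$
Integrating both sides from $0$ to $x$, and using $\int_0^x \frac{dt}{1-t} = -\ln(1-x)$, we get
$$
-\ln(1-x) - \left(x + \frac{x^2}{2} + \frac{x^3}{3} + \dots + \frac{x^{n+1}}{n+1}\right) = \int_0^x \frac{t^{n+1}}{1-t}\,dt.
$$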
Our goal is to show that, for $-1 < x < 1$, the two terms on the left hand side are close to each other:
$$
-\ln(1-x) \approx x + \frac{x^2}{2} + \frac{x^3}{3} + \dots + \frac{x^{n+1}}{n+1}.
$$
So all we need to show is that the integral $\int_0^x \frac{t^{n+1}}{1-t}\,dt$ is small when $n$ is large. For concreteness, take $x = 0.9$. Then for $0 \le t \le 0.9$ the numerator satisfies $t^{n+1} \le 0.9^n$. The denominator $1 - t \ge 0.1$ is not too small, so we do not divide by something close to zero. As a result, the function we integrate is quite small:
$$
\frac{t^{n+1}}{1-t} \le \frac{0.9^n}{0.1} \to 0 \quad \text{as } n \to \infty.
$$
The integral is therefore small, so the series does indeed approximate the logarithmic function $\ln(1-x)$.
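The estimate for $x = 0.9$ can also be checked numerically. In this Python sketch (the helper name `log_partial_sum` is ours), the actual error of the partial sum is compared with a bound. The bound is the bound $\frac{0.9^n}{0.1}$ on the integrand multiplied by the length $0.9$ of the interval of integration.

```python
import math


def log_partial_sum(x, n):
    """x + x**2/2 + ... + x**(n+1)/(n+1), the partial sum approximating -ln(1 - x)."""
    return sum(x**k / k for k in range(1, n + 2))


x = 0.9
for n in (5, 20, 50, 100):
    # error = -ln(1 - x) minus the partial sum, which equals the integral of t**(n+1)/(1-t)
    error = -math.log(1 - x) - log_partial_sum(x, n)
    # interval length 0.9 times the integrand bound 0.9**n / 0.1
    bound = x * x**n / (1 - x)
    print(n, error, bound)
```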
$$
\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}x^n.
$$
Answer to Exercise 1. In both cases, the series notation expresses that as we add more and more terms, the sum gets closer and closer to our function, $\cos(x)$ or $\sin(x)$. The fact that the radius of convergence $R$ is equal to $\infty$ means that this approximation works for all $x$ with $-\infty < x < \infty$. Of course, the further $x$ is from the centre $a = 0$, the more terms we might have to add before the approximation gets good.
Let us check the $\sum$-notation. In the case of $\cos(x)$, the general term is $\frac{(-1)^n}{(2n)!}x^{2n}$, so:
$$
n = 0 \implies \frac{(-1)^0}{0!}x^0 = 1,\qquad n = 1 \implies \frac{(-1)^1}{2!}x^2 = -\frac{x^2}{2!},
$$
$$
n = 2 \implies \frac{(-1)^2}{4!}x^4 = \frac{x^4}{4!},\qquad n = 3 \implies \frac{(-1)^3}{6!}x^6 = -\frac{x^6}{6!},
$$
etc. We see that the formula matches the pattern correctly. In the case of $\sin(x)$, the general term is $\frac{(-1)^n}{(2n+1)!}x^{2n+1}$, so:
$$
n = 0 \implies \frac{(-1)^0}{1!}x^1 = x,\qquad n = 1 \implies \frac{(-1)^1}{3!}x^3 = -\frac{x^3}{3!},
$$
$$
n = 2 \implies \frac{(-1)^2}{5!}x^5 = \frac{x^5}{5!},\qquad n = 3 \implies \frac{(-1)^3}{7!}x^7 = -\frac{x^7}{7!},
$$
etc. Again, we see that the formula matches the pattern correctly. In these two cases, the index $n$ does not represent the degree of the polynomial. It represents the term number, and the degree is either $2n$ in the case of cosine, or $2n + 1$ in the case of sine. Notice how the degree increases by 2 in both cases, which explains why we needed to multiply $n$ by 2 in these formulas.
Answer to Exercise 2. (a) If $|x| \ge 1$ then $|x^n| \ge 1$, so $x^n$ does not get small when $n$ gets large. When we keep adding numbers $x^n$ that do not get smaller and smaller, we cannot approach any limit. So the Taylor polynomials $P_n(x) = 1 + x + \dots + x^n$ will not converge to anything as the degree $n$ gets bigger. This explains why the geometric series does not converge outside of the interval $-1 < x < 1$.
(b) Writing the difference with the common denominator,
$$
\frac{1}{1-x} - (1 + x + x^2 + x^3 + \dots + x^n) = \frac{1 - (1-x)(1 + x + x^2 + x^3 + \dots + x^n)}{1-x}.
$$
Let us multiply out the second term $(1-x)(1 + x + x^2 + x^3 + \dots + x^n)$ in the numerator:
$$
\begin{aligned}
&(1 + x + x^2 + x^3 + \dots + x^n) - x(1 + x + x^2 + x^3 + \dots + x^n)\\
&= (1 + x + x^2 + x^3 + \dots + x^n) - (x + x^2 + x^3 + x^4 + \dots + x^n + x^{n+1})\\
&= 1 + (x + x^2 + x^3 + \dots + x^n) - (x + x^2 + x^3 + x^4 + \dots + x^n) - x^{n+1}.
\end{aligned}
$$
The two sums in parentheses cancel, so this is $1 - x^{n+1}$. The numerator is $1 - (1 - x^{n+1}) = x^{n+1}$, as promised.
(c) If $-1 < x < 1$, the absolute value satisfies $|x| < 1$, so $|x|^n \to 0$ as $n \to \infty$. This is because $a^n$ is a geometric decay function when the base satisfies $0 \le a < 1$. For example, $0.5^{10} = 0.0009765625$. This means that the difference we found in part (b),
$$
\frac{x^{n+1}}{1-x} \to 0,
$$
gets smaller and smaller as $n$ gets bigger, whenever $x$ is between $-1$ and $1$. This shows that the geometric series $1 + x + x^2 + x^3 + \dots$ converges to $\frac{1}{1-x}$ for $-1 < x < 1$, so the radius of convergence is $R = 1$, as promised.
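Parts (b) and (c) can be checked numerically with a short Python sketch (the helper names are ours, chosen for illustration). The difference $\frac{1}{1-x} - (1 + x + \dots + x^n)$ should agree with $\frac{x^{n+1}}{1-x}$ up to rounding error. Both should shrink as $n$ grows when $-1 < x < 1$.

```python
def geometric_partial_sum(x, n):
    """1 + x + x**2 + ... + x**n."""
    return sum(x**k for k in range(n + 1))


def remainder(x, n):
    """The difference found in part (b): x**(n+1) / (1 - x)."""
    return x ** (n + 1) / (1 - x)


x = 0.5
for n in (1, 5, 10, 20):
    diff = 1 / (1 - x) - geometric_partial_sum(x, n)
    print(n, diff, remainder(x, n))  # the two columns agree and tend to 0
```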
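$$
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty}\frac{x^n}{n}
$$
Replacing $x$ by $-x$:
$$
\ln(1-(-x)) = -(-x) - \frac{(-x)^2}{2} - \frac{(-x)^3}{3} - \dots = -\sum_{n=1}^{\infty}\frac{(-x)^n}{n},
$$
which simplifies to
$$
\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}x^n.
$$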
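Step 3. Finally, we divide both sides by $x$ and simplify. In the Taylor series, we can divide term by term, just like a regular sum:
$$
\frac{\sin(x^2)}{x} = \sum_{n=0}^{\infty}\frac{(-1)^n x^{4n+2}}{(2n+1)!\,x} = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}x^{4n+1}.
$$
This is the answer in the $\Sigma$-notation. Writing out the first few terms:
$$
\frac{\sin(x^2)}{x} = x - \frac{x^5}{3!} + \frac{x^9}{5!} - \dots .
$$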
Warning. We could divide each term by $x$ because the series for $\sin(x^2)$ had no constant term $c_0$, and every term had at least one power of $x$ that could be cancelled. If, for example, the series started with $1 + \dots$, dividing by $x$ would give $\frac{1}{x} + \dots$, which is not a Taylor series. Also, in Step 2 above, the expression we substitute for $\square$ should be something that gives us a Taylor series again at the end. As we will see from the examples below, it does not always have to be a power of $x$, but it should be simple enough.
Exercise 1. Find the Taylor series of $f(x) = xe^{-x^2}$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Example 2. Find the Taylor series of $f(x) = \frac{1}{1+2x^2}$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Solution: The function $\frac{1}{1+2x^2}$ looks similar to $\frac{1}{1-x}$, so we should start with the geometric series:
$$
\frac{1}{1-x} = \sum_{n=0}^{\infty}x^n \qquad\text{or}\qquad \frac{1}{1-\square} = \sum_{n=0}^{\infty}\square^n.
$$
In the denominator, our function $f(x)$ has $1 + 2x^2$, but we want to see something like $1 - \square$. In this case, we simply write $1 + 2x^2 = 1 - (-2x^2)$, which means that we should replace $\square$ with $-2x^2$:
$$
\frac{1}{1+2x^2} = \frac{1}{1-(-2x^2)} = \sum_{n=0}^{\infty}(-2x^2)^n = \sum_{n=0}^{\infty}(-1)^n 2^n x^{2n}.
$$
In this case we do not multiply by anything, so we can simply write out a few terms at the beginning of the series:
$$
\frac{1}{1+2x^2} = \sum_{n=0}^{\infty}(-1)^n 2^n x^{2n} = 1 - 2x^2 + 4x^4 - 8x^6 + 16x^8 - \dots .
$$
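How can we decide where this series converges, if we know that the original geometric series converges when $-1 < x < 1$, or $-1 < \square < 1$? Since we replaced $\square$ by $-2x^2$, the new series converges when $-1 < -2x^2 < 1$, or $-1 < 2x^2 < 1$. Since $2x^2 \ge 0$, this means $x^2 < \frac{1}{2}$, or $-\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}$, so the radius of convergence is $R = \frac{1}{\sqrt{2}}$.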
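To see the role of the interval of convergence, here is a small Python sketch (the function name is ours) that adds up the first terms of $\sum_{n=0}^{\infty}(-1)^n 2^n x^{2n}$. At $x = 0.3$ and $x = 0.6$, which satisfy $2x^2 < 1$, the partial sums approach $\frac{1}{1+2x^2}$. At $x = 0.8$, where $2x^2 = 1.28 > 1$, they grow without bound.

```python
def series_partial_sum(x, n_terms):
    """Sum of (-1)**n * 2**n * x**(2n) for n = 0..n_terms-1 (the series for 1/(1 + 2x^2))."""
    return sum((-1) ** n * 2**n * x ** (2 * n) for n in range(n_terms))


for x in (0.3, 0.6, 0.8):
    print(x, series_partial_sum(x, 60), 1 / (1 + 2 * x**2))
```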
Exercise 2. Find the Taylor series for $f(x) = \frac{1}{1+3(x-1)^2}$ centered around $x = 1$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
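Example 3. Find the Taylor series of $f(x) = \ln(2-x)$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?
Solution: The function $\ln(2-x)$ looks similar to $\ln(1-x)$, so we should start with the series:
$$
\ln(1-x) = -\sum_{n=1}^{\infty}\frac{x^n}{n} \qquad\text{or}\qquad \ln(1-\square) = -\sum_{n=1}^{\infty}\frac{\square^n}{n}.
$$
Since $2 - x = 2\left(1 - \frac{x}{2}\right)$, we have $\ln(2-x) = \ln 2 + \ln\left(1 - \frac{x}{2}\right)$, so we replace $\square$ by $\frac{x}{2}$:
$$
\ln(2-x) = \ln 2 - \sum_{n=1}^{\infty}\frac{x^n}{n\,2^n} = \ln(2) - \frac{x}{2} - \frac{x^2}{8} - \frac{x^3}{24} - \dots .
$$
From the last section, we know that the original series for $\ln(1-x)$ converges when $-1 < x < 1$, or $-1 < \square < 1$. Since we replaced $\square$ by $\frac{x}{2}$, the new series converges when $-1 < \frac{x}{2} < 1$, or $-2 < x < 2$. This means that the radius of convergence is $R = 2$.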
The next two problems are slightly trickier, because they require both a shift and a rescaling of the argument.
Example 4. Find the Taylor series of $f(x) = \frac{1}{x}$ centered at $x = 5$. Write the answer using the $\Sigma$-notation. Where does this series converge? What is its radius of convergence $R$?
$$
\frac{1}{x} = \frac{1}{5 + (x-5)} = \frac{1}{5\left(1 + \frac{x-5}{5}\right)} = \frac{1}{5}\cdot\frac{1}{1 + \frac{x-5}{5}} = \frac{1}{5}\cdot\frac{1}{1 - \left(-\frac{x-5}{5}\right)}.
$$
Now it looks like what we want. If we replace $\square$ in the geometric series above by $\left(-\frac{x-5}{5}\right)$, we get
$$
\frac{1}{5}\cdot\frac{1}{1 - \left(-\frac{x-5}{5}\right)} = \frac{1}{5}\sum_{n=0}^{\infty}\left(-\frac{x-5}{5}\right)^n = \sum_{n=0}^{\infty}\frac{(-1)^n}{5^{n+1}}(x-5)^n.
$$
This series is centered at $a = 5$ because all the powers are of the form $(x-5)^n$. It converges when $-1 < -\frac{x-5}{5} < 1$, or $0 < x < 10$, which means that the radius of convergence is $R = 5$.
Exercise 4. Find the Taylor series of f (x) = ln(x) centered at x = 10. Write the
answer using the Σ-notation. Where does this series converge? What is its radius
of convergence R?
we can multiply them out term by term, just like regular sums, and then collect the terms with the same powers. There is in fact a general formula for multiplying two Taylor series in the $\Sigma$-notation, but it is a bit too complicated for our purposes. We will stick with simpler examples where we only want to find a few terms of the product $f(x)g(x)$. Let us illustrate this with an example.
Example 5. Find the first few terms of the Taylor series of f (x)g(x) centered at
x = 0 if
Solution: One subtle point to remember when multiplying out the product of two series is that the terms $+\dots$ could contain powers of $x$ starting from $x^4$, $x^5$, etc., and the problem does not tell us exactly what those terms are. This has the following consequence.
Suppose we make the multiplication table for f (x)g(x) writing all the terms of
f (x) in the first row, all terms of g(x) in the first column, and their products in
other entries of the table:
In this table the terms + . . . could contain powers starting from x4 and, as we said,
we do not know what they are. This means that the terms written in purple in the
lower right corner of the table should not be collected, because they contain powers
x4 , x5 and x6 and they could potentially be modified by the missing terms + . . . .
This means that when we multiply out two series, we should completely ignore the
terms of the same degree as + . . . and our multiplication table could look like this:
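$$
\begin{array}{c|cccc}
 & 1 & -2x & x^2 & 7x^3\\ \hline
3 & 3 & -6x & 3x^2 & 21x^3\\
x & x & -2x^2 & x^3 & \\
x^2 & x^2 & -2x^3 & & \\
-4x^3 & -4x^3 & & &
\end{array}
$$
After simplifying, we see that the first few terms of the product are
$$
f(x)g(x) = 3 - 5x + 2x^2 + 16x^3 + \dots .
$$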
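The multiplication table can be mimicked in code. This Python sketch uses `truncated_product`, a hypothetical helper of our own. It multiplies two lists of coefficients and keeps only the powers up to a chosen degree, which is exactly the rule of ignoring everything the unknown $+\dots$ terms could affect. The lists below are read off the table, with $f(x)$ in the first row and $g(x)$ in the first column.

```python
def truncated_product(f, g, max_degree):
    """Coefficients of f(x)*g(x) up to x**max_degree.

    f and g are coefficient lists [c0, c1, ...]. Both must be known up to
    x**max_degree; higher (unknown) terms cannot affect these coefficients.
    """
    if len(f) <= max_degree or len(g) <= max_degree:
        raise ValueError("both series must be known up to x**max_degree")
    result = [0] * (max_degree + 1)
    for i, a in enumerate(f[: max_degree + 1]):
        # only products with total degree i + j <= max_degree are collected
        for j, b in enumerate(g[: max_degree + 1 - i]):
            result[i + j] += a * b
    return result


f = [1, -2, 1, 7]  # 1 - 2x + x^2 + 7x^3 + ...
g = [3, 1, 1, -4]  # 3 + x + x^2 - 4x^3 + ...
print(truncated_product(f, g, 3))  # [3, -5, 2, 16]
```

The same helper should also reproduce the answer to Exercise 5 when given the coefficient lists $1, 0, 2, 0, 4, 0, 8$ and $7, 0, -1, 0, 2, 0, 1$ (read off the table in that answer) with `max_degree = 6`.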
Exercise 5. Find the first few terms of the Taylor series of f (x)g(x) centered at
x = 0 if
Padé approximation. Next, we give one application of what we have learned so far in this section: finding some novel approximations of $e^x$ and $\cos(x)$ near $x = 0$, the so-called Padé approximation.
$$
\frac{1}{1+bx} = \frac{1}{1-(-bx)} = 1 + (-bx) + (-bx)^2 + \dots = 1 - bx + b^2x^2 + \dots .
$$
(b) Using what we found in part (a), we want to multiply out
$$
(1+ax)\cdot\frac{1}{1+bx} = (1+ax)\times(1 - bx + b^2x^2 + \dots) = 1 - bx + b^2x^2 + ax - abx^2 + ab^2x^3 + \dots .
$$
However, we should remember that the $+\dots$ term contains powers starting with $x^3$, so we must ignore the term $ab^2x^3$. The correct multiplication is:
$$
\begin{aligned}
(1+ax)\cdot\frac{1}{1+bx} &= (1+ax)\times(1 - bx + b^2x^2 + \dots)\\
&= 1 - bx + b^2x^2 + ax - abx^2 + \dots\\
&= 1 + (a-b)x + (b^2-ab)x^2 + \dots .
\end{aligned}
$$
Luckily, forgetting to ignore the term $ab^2x^3$ would not affect our next step, but the point is worth emphasizing once again.
(c) We want our series in part (b) to be a good approximation of $e^x$ near $x = 0$, which has the Taylor series $e^x = 1 + x + \frac{x^2}{2} + \dots$. For this purpose, we want the coefficients in part (b) to match $1 + x + \frac{x^2}{2} + \dots$, so
$$
a - b = 1 \quad\text{and}\quad b^2 - ab = \frac{1}{2}.
$$
The first equation gives $a = b + 1$. Plugging it into the second equation gives $b^2 - (b+1)b = \frac{1}{2}$, or $-b = \frac{1}{2}$, or $b = -\frac{1}{2}$. Then $a = b + 1 = -\frac{1}{2} + 1 = \frac{1}{2}$. The approximation we were looking for is
$$
e^x \approx \frac{1+ax}{1+bx} = \frac{1+0.5x}{1-0.5x}.
$$
This is the function graphed by the red dashed curve in the figure above.
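Here is a quick numerical comparison, as a Python sketch with function names of our own choosing. It compares $e^x$ with the approximation $\frac{1+0.5x}{1-0.5x}$ and with the degree-2 Taylor polynomial $1 + x + \frac{x^2}{2}$, which matches the same three coefficients.

```python
import math


def pade_exp(x):
    """(1 + a x) / (1 + b x) with a = 1/2 and b = -1/2."""
    return (1 + 0.5 * x) / (1 - 0.5 * x)


def taylor2_exp(x):
    """Degree-2 Taylor polynomial of e**x at 0."""
    return 1 + x + x**2 / 2


for x in (0.1, 0.5, 1.0):
    print(x, math.exp(x), pade_exp(x), taylor2_exp(x))
```

At these sample points, the rational approximation is closer to $e^x$ than the Taylor polynomial at $x = 0.1$ and $x = 0.5$, but not at $x = 1$. So neither approximation is better everywhere.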
$$
\frac{1+ax^2}{1+bx^2} = (1+ax^2)\cdot\frac{1}{1+bx^2}.
$$
(c) Make sure that the terms in part (b) match the first three terms of the Taylor
series for cos(x) to find a and b.
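Answer to Exercise 1. Because $e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}$, replacing $x$ with $-x^2$ gives
$$
e^{-x^2} = \sum_{n=0}^{\infty}\frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty}\frac{((-1)x^2)^n}{n!} = \sum_{n=0}^{\infty}\frac{(-1)^n x^{2n}}{n!}.
$$
Multiplying by $x$,
$$
xe^{-x^2} = \sum_{n=0}^{\infty}\frac{(-1)^n x^{2n}}{n!}\,x = \sum_{n=0}^{\infty}\frac{(-1)^n x^{2n+1}}{n!} = x - x^3 + \frac{x^5}{2!} - \frac{x^7}{3!} + \dots
$$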
The original series for ex converges everywhere, so x can be replaced by any num-
ber. This means that the new series also converges for all x and its radius of con-
vergence is R = ∞.
This series is centered at $a = 1$, because all the terms have powers of $(x-1)$. A few terms at the beginning of the series are
$$
1 - 3(x-1)^2 + 9(x-1)^4 - 27(x-1)^6 + \dots .
$$
Since we replaced $x$ by $-3(x-1)^2$, the new series converges when $-1 < -3(x-1)^2 < 1$, or $-1 < 3(x-1)^2 < 1$. Solving this for $x$:
$$
3(x-1)^2 < 1 \implies (x-1)^2 < \frac{1}{3} \implies |x-1| < \frac{1}{\sqrt{3}} \implies -\frac{1}{\sqrt{3}} < x - 1 < \frac{1}{\sqrt{3}}.
$$
We can also write this as $1 - \frac{1}{\sqrt{3}} < x < 1 + \frac{1}{\sqrt{3}}$, so the radius of convergence is $R = \frac{1}{\sqrt{3}}$.
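Answer to Exercise 3. We write $10 + x^2 = 10\left(1 + \frac{x^2}{10}\right) = 10\left(1 - \left(-\frac{x^2}{10}\right)\right)$, so that
$$
\ln(10 + x^2) = \ln\left(10\left(1 - \left(-\frac{x^2}{10}\right)\right)\right) = \ln 10 + \ln\left(1 - \left(-\frac{x^2}{10}\right)\right).
$$
Using the series for $\ln(1-x)$:
$$
\ln 10 + \ln\left(1 - \left(-\frac{x^2}{10}\right)\right) = \ln 10 - \sum_{n=1}^{\infty}\frac{(-x^2/10)^n}{n} = \ln 10 + \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n\,10^n}x^{2n}.
$$
Writing out the first few terms:
$$
\ln 10 + \frac{x^2}{10} - \frac{x^4}{200} + \frac{x^6}{3000} - \dots .
$$
The original series for $\ln(1-x)$ converges when $-1 < x < 1$, so the new series converges when $-1 < -\frac{x^2}{10} < 1$. Solving for $x$, we get $-\sqrt{10} < x < \sqrt{10}$, so the radius of convergence is $R = \sqrt{10}$.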
The original series for $\ln(1-x)$ converges when $-1 < x < 1$, so the new series converges when $-1 < -\frac{x-10}{10} < 1$. Solving for $x$, we get $0 < x < 20$, so the series is centered at $a = 10$ and the radius of convergence is $R = 10$.
Answer to Exercise 5. Because the $+\dots$ term could contain powers $x^7$ or above, we should ignore those powers when multiplying things out:
$$
\begin{array}{c|cccc}
 & 7 & -x^2 & 2x^4 & x^6\\ \hline
1 & 7 & -x^2 & 2x^4 & x^6\\
2x^2 & 14x^2 & -2x^4 & 4x^6 & \\
4x^4 & 28x^4 & -4x^6 & & \\
8x^6 & 56x^6 & & &
\end{array}
$$
Collecting the terms with the same powers, we see that $f(x)g(x) = 7 + 13x^2 + 28x^4 + 57x^6 + \dots$.
Answer to Exercise 6. (a) Using the geometric series $\frac{1}{1-x} = 1 + x + x^2 + \dots$, we get
$$
\frac{1}{1+bx^2} = \frac{1}{1-(-bx^2)} = 1 + (-bx^2) + (-bx^2)^2 + \dots = 1 - bx^2 + b^2x^4 + \dots .
$$
(b) Using what we found in part (a), we want to multiply out
$$
\begin{aligned}
(1+ax^2)\cdot\frac{1}{1+bx^2} &= (1+ax^2)\times(1 - bx^2 + b^2x^4 + \dots)\\
&= 1 - bx^2 + b^2x^4 + ax^2 - abx^4 + \dots\\
&= 1 + (a-b)x^2 + (b^2-ab)x^4 + \dots .
\end{aligned}
$$
We did not write the term $+ab^2x^6$ because it is absorbed by the dots $+\dots$.
(c) We want our series in part (b) to be a good approximation of $\cos(x)$ near $x = 0$, which has the Taylor series $\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$. For this purpose, we want the coefficients in part (b) to match $1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$, so
$$
a - b = -\frac{1}{2} \quad\text{and}\quad b^2 - ab = \frac{1}{4!} = \frac{1}{24}.
$$
The first equation gives $a = b - \frac{1}{2}$. Plugging it into the second equation gives $b^2 - \left(b - \frac{1}{2}\right)b = \frac{1}{24}$, or $\frac{b}{2} = \frac{1}{24}$, or $b = \frac{1}{12}$. Then $a = \frac{1}{12} - \frac{1}{2} = -\frac{5}{12}$. The approximation we were looking for is
$$
\cos(x) \approx \frac{1+ax^2}{1+bx^2} = \frac{1 - \frac{5}{12}x^2}{1 + \frac{1}{12}x^2}.
$$
This is the function graphed by the red dashed curve in the figure in the statement of the problem.
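The same kind of numerical check works for $\cos(x)$. This is again a Python sketch, with a function name of our own choosing. At $x = 0.5$ the approximation equals $\frac{1 - 5/48}{1 + 1/48} = \frac{43}{49}$.

```python
import math


def pade_cos(x):
    """(1 + a x^2) / (1 + b x^2) with a = -5/12 and b = 1/12."""
    return (1 - 5 * x**2 / 12) / (1 + x**2 / 12)


for x in (0.1, 0.5, 1.0):
    print(x, math.cos(x), pade_cos(x), abs(pade_cos(x) - math.cos(x)))
```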
$$
\rho := \lim_{n\to\infty}\frac{|a_{n+1}|}{|a_n|},
$$
where we ignore the signs of the terms by taking the absolute values $|a_{n+1}|$ and $|a_n|$.
• Then the Ratio Test tells us that: if $\rho > 1$, the series diverges; if $\rho < 1$, the series converges.
[Figure: the interval from $a - R$ to $a + R$ around the centre $a$ on the $x$-axis]
or, in other words, for x in between a − R and a + R for some number R which
is called the radius of convergence. In the examples below we will see that such
behaviour is, indeed, a consequence of the Ratio Test.
We stated in Section 5.1 that the radius of convergence of the Taylor series of the exponential function $e^x$ is $R = \infty$. In our first example, we will check this using the Ratio Test.
Example 1. Find the radius of convergence of the Taylor series:
$$
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty}\frac{x^n}{n!}.
$$
Solution: To use the Ratio Test, we consider two consecutive terms in the series,
$$
|a_n| = \frac{|x|^n}{n!} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{n+1}}{(n+1)!}
$$
(do not forget the absolute values!) and then compute their ratio,
$$
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{n+1}}{(n+1)!}\;}{\frac{|x|^n}{n!}} = \frac{|x|^{n+1}}{(n+1)!}\cdot\frac{n!}{|x|^n} = \frac{|x|^{n+1}}{|x|^n}\cdot\frac{n!}{(n+1)!} = |x|\,\frac{n!}{(n+1)!}.
$$
Notice how dividing $|x|^{n+1}$ by $|x|^n$ cancels $|x|^n$, so we are left with one power of $|x|$. This is a typical feature when applying the Ratio Test to Taylor series. We can also simplify the ratio of factorials,
$$
\frac{n!}{(n+1)!} = \frac{1\cdot 2\cdots(n-1)\cdot n}{1\cdot 2\cdots(n-1)\cdot n\cdot(n+1)} = \frac{1}{n+1},
$$
because we can cancel $1\cdot 2\cdots(n-1)\cdot n$ in the numerator and denominator. As a result, the ratio is $\frac{|x|}{n+1}$ and its limit is
$$
\lim_{n\to\infty}\frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty}\frac{|x|}{n+1} = 0.
$$
Since this limit is less than 1, the Ratio Test tells us that the series converges. The limit was 0 no matter what $x$ was, so this conclusion works for all $x$, which means that the radius of convergence is $R = \infty$.
Exercise 1. Find the radius of convergence of the Taylor series:
$$
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty}\frac{x^n}{n}.
$$
$$
\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}x^{2n}.
$$
Solution: To use the Ratio Test, we consider two consecutive terms in the series,
$$
|a_n| = \frac{|x|^{2n}}{(2n)!} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{2n+2}}{(2n+2)!}.
$$
Notice how dividing $|x|^{2n+2}$ by $|x|^{2n}$ cancels $|x|^{2n}$, so we are left with $|x|^2$, not $|x|$. Factorials here also simplify differently:
$$
\frac{(2n)!}{(2n+2)!} = \frac{1\cdot 2\cdots(2n-1)\cdot 2n}{1\cdot 2\cdots(2n-1)\cdot 2n\cdot(2n+1)(2n+2)} = \frac{1}{(2n+1)(2n+2)},
$$
because we can cancel $1\cdot 2\cdots(2n-1)\cdot 2n$ in the numerator and denominator. Because $2n$ increased by 2, there are two extra factors left in this case. As a result, the ratio is $\frac{|x|^2}{(2n+1)(2n+2)}$ and its limit is
$$
\lim_{n\to\infty}\frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty}\frac{|x|^2}{(2n+1)(2n+2)} = 0.
$$
Since this limit is less than 1, the Ratio Test tells us that the series converges for all $x$, which means that the radius of convergence is $R = \infty$.
$$
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}x^{2n+1}.
$$
In the examples above, we checked that the radius of convergence is what was
claimed in the table at the beginning of Section 5.1. Now, let us try some new
series.
$$
\sum_{n=1}^{\infty}\frac{(-2)^n}{n^2}(x+5)^{2n}.
$$
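To compute the limit, we divide the numerator and denominator by the highest power of $n$, which is $n^2$ in this case, so
$$
\begin{aligned}
\lim_{n\to\infty}\frac{|a_{n+1}|}{|a_n|} &= \lim_{n\to\infty}2|x+5|^2\frac{n^2}{(n+1)^2} = 2|x+5|^2\lim_{n\to\infty}\frac{1}{\left(\frac{n+1}{n}\right)^2}\\
&= 2|x+5|^2\lim_{n\to\infty}\frac{1}{\left(1+\frac{1}{n}\right)^2} = 2|x+5|^2\,\frac{1}{(1+0)^2} = 2|x+5|^2.
\end{aligned}
$$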
By the Ratio Test, the series converges when $2|x+5|^2 < 1$. Solving this for $x$,
$$
|x+5|^2 < \frac{1}{2} \implies |x+5| < \frac{1}{\sqrt{2}} \implies -\frac{1}{\sqrt{2}} < x + 5 < \frac{1}{\sqrt{2}} \implies -5 - \frac{1}{\sqrt{2}} < x < -5 + \frac{1}{\sqrt{2}}.
$$
This means that the centre is $a = -5$. This is as it should be, because the series is written in terms of powers $(x+5)^n = (x-(-5))^n$, which look like $(x-a)^n$ with $a = -5$. The radius of convergence is $R = \frac{1}{\sqrt{2}}$.
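The limit of the ratio can be watched numerically. In this Python sketch (the names `term` and `ratio` are ours), the ratio $\frac{|a_{n+1}|}{|a_n|}$ for the series $\sum\frac{(-2)^n}{n^2}(x+5)^{2n}$ is computed for increasing $n$. It should approach $2|x+5|^2$, which is $0.5$ at $x = -4.5$ (inside the interval of convergence) and $2$ at $x = -4$ (outside it). We keep $n$ moderate so that the floating-point terms do not underflow to zero.

```python
def term(n, x):
    """The n-th term (-2)**n / n**2 * (x + 5)**(2n) of the series."""
    return (-2) ** n / n**2 * (x + 5) ** (2 * n)


def ratio(n, x):
    """|a_{n+1}| / |a_n|, which should approach 2|x + 5|^2 as n grows."""
    return abs(term(n + 1, x)) / abs(term(n, x))


for x in (-4.5, -4.0):  # 2|x + 5|^2 equals 0.5 and 2, respectively
    print(x, [round(ratio(n, x), 4) for n in (1, 10, 100)])
```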
$$
\lim_{n\to\infty}\frac{P(n+1)}{P(n)} = 1.
$$
This is always true for any polynomial, because when we divide the numerator
and denominator by the highest power of n, it makes all the lower degree terms
disappear. If we are looking for a radius of convergence, for example, on a multiple
choice question on the exam where we do not need to show our work, we can
simply erase any polynomial factors from the beginning. For example, the series
$$
\sum_{n=1}^{\infty}\frac{(-2)^n(n^4 - 7n + 3)}{n^2 + 3n + 1}(x+5)^{2n} \qquad\text{and}\qquad \sum_{n=1}^{\infty}(-2)^n(x+5)^{2n}
$$
have the same interval and radius of convergence, because the factors n4 − 7n + 3
and n2 + 3n + 1 in the numerator and denominator of the first series will not affect
the limit. The second one is much easier to work with. Of course, other factors
involving factorials like n! and exponentials like 5n cannot be ignored.
Using symmetry. In the next two problems we will use the fact that the interval
of convergence must be symmetric around the center:
[Figure: the interval of convergence from $a - R$ to $a + R$, symmetric around the center $a$]
Exercise 5. Suppose we know that the Taylor series $\sum_{n=0}^{\infty}\frac{f^{(n)}(-2)}{n!}(x+2)^n$ converges at $x = 2$ but diverges at $x = 4$. What can we tell about the convergence of the series at $x = -9$, $x = -6$, $x = -5$, $x = -4$, $x = 1$ and $x = 3$?
The Ratio Test tells us that the series converges when this limit is smaller than 1,
i.e. |x| < 1, or −1 < x < 1. This means that the center is a = 0 and the radius of
convergence is R = 1.
Answer to Exercise 2. To use the Ratio Test, we consider two consecutive terms in the series,
$$
|a_n| = \frac{|x|^{2n+1}}{(2n+1)!} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{2n+3}}{(2n+3)!}.
$$
Factorials simplify to
$$
\frac{(2n+1)!}{(2n+3)!} = \frac{1\cdot 2\cdots 2n\cdot(2n+1)}{1\cdot 2\cdots 2n\cdot(2n+1)\cdot(2n+2)(2n+3)} = \frac{1}{(2n+2)(2n+3)}.
$$
As a result, the ratio is $\frac{|x|^2}{(2n+2)(2n+3)}$ and its limit is
$$
\lim_{n\to\infty}\frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty}\frac{|x|^2}{(2n+2)(2n+3)} = 0.
$$
Since this limit is less than 1, the Ratio Test tells us that the series converges for all $x$, which means that the radius of convergence is $R = \infty$.
From here, we can proceed in two ways. The fastest way is to remember that the geometric series $\sum_{n=0}^{\infty}\square^n$ converges when $-1 < \square < 1$. In this case $\square$ is $\frac{x-4}{5}$, so the above series converges when $-1 < \frac{x-4}{5} < 1$, or $-5 < x - 4 < 5$, or $-1 < x < 9$. So the center is the midpoint $a = 4$ and the radius of convergence is $R = 5$.
Another way is to use the Ratio Test. Two consecutive terms are
$$
|a_n| = \left|\frac{x-4}{5}\right|^n \quad\text{and}\quad |a_{n+1}| = \left|\frac{x-4}{5}\right|^{n+1},
$$
$$
c_n = \frac{f^{(n)}(a)}{n!} \qquad\text{or}\qquad f^{(n)}(a) = n!\,c_n.
$$
Of course, this is how the coefficients cn of the Taylor series are defined, but if we
can compute the Taylor series first, we can use the coefficients cn to compute the
derivatives.
Here, we simply applied the definition of the coefficients of the Taylor series. For example, $c_{99} = \frac{1}{(99!)^2}$, because 99 is odd.
Computing derivatives. In the next two problems, we will first compute the series using some basic transformations of classic series. We will also use a convenient notation for the derivative $f^{(n)}(a)$, namely,
$$
\left.\frac{d^n}{dx^n}f(x)\right|_{x=a}.
$$
The notation expresses that we compute the $n$th derivative $\frac{d^n}{dx^n}$ of the function $f(x)$ and then evaluate it at $x = a$.
Example 2. Compute
$$
\left.\frac{d^{20}}{dx^{20}}xe^{-x^2}\right|_{x=0} \quad\text{and}\quad \left.\frac{d^{21}}{dx^{21}}xe^{-x^2}\right|_{x=0}.
$$
Solution: First, we need to find the Taylor series for $xe^{-x^2}$ centered at 0. Starting with the exponential series $e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}$, we replace $x$ by $-x^2$:
$$
e^{-x^2} = \sum_{n=0}^{\infty}\frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!}x^{2n},
$$
$$
xe^{-x^2} = x\sum_{n=0}^{\infty}\frac{(-1)^n}{n!}x^{2n} = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!}x^{2n+1}.
$$
To find the derivative $\left.\frac{d^{20}}{dx^{20}}xe^{-x^2}\right|_{x=0}$, we need the coefficient $c_{20}$ in front of $x^{20}$ in this series. In other words, we need the power $2n + 1 = 20$, or $n = \frac{19}{2} = 9.5$. This is not an integer, so the series has no term with the power $x^{20}$. Another way to see that $2n + 1$ cannot be equal to 20 is that $2n + 1$ is always odd. Since the power $x^{20}$ is not in the series, the coefficient $c_{20} = 0$ and, as a result,
$$
\left.\frac{d^{20}}{dx^{20}}xe^{-x^2}\right|_{x=0} = 0.
$$
To find the derivative $\left.\frac{d^{21}}{dx^{21}}xe^{-x^2}\right|_{x=0}$, we need the coefficient $c_{21}$ in front of $x^{21}$ in this series. In other words, we need the power $2n + 1 = 21$, or $n = 10$. The coefficient $c_{21}$ in front of $x^{21}$ is $\frac{(-1)^{10}}{10!} = \frac{1}{10!}$, so the derivative is
$$
\left.\frac{d^{21}}{dx^{21}}xe^{-x^2}\right|_{x=0} = 21!\,c_{21} = 21!\,\frac{1}{10!} = \frac{21!}{10!} = 14079294028800.
$$
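The rule $f^{(n)}(0) = n!\,c_n$ is easy to apply in code. This Python sketch (helper names are hypothetical, chosen for illustration) reads off the coefficient of $x^k$ in $\sum\frac{(-1)^n}{n!}x^{2n+1}$ and multiplies it by $k!$, using exact integer and fraction arithmetic.

```python
from fractions import Fraction
from math import factorial


def coeff_x_exp_neg_x2(k):
    """Coefficient c_k of x**k in x*e^(-x^2) = sum of (-1)**n x**(2n+1) / n!."""
    if k % 2 == 0:  # only odd powers 2n + 1 appear in the series
        return Fraction(0)
    n = (k - 1) // 2
    return Fraction((-1) ** n, factorial(n))


def derivative_at_zero(k):
    """k-th derivative of x*e^(-x^2) at 0, using f^(k)(0) = k! * c_k."""
    return factorial(k) * coeff_x_exp_neg_x2(k)


print(derivative_at_zero(20), derivative_at_zero(21))  # 0 14079294028800
```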
Exercise 2. Compute
$$
\left.\frac{d^{11}}{dx^{11}}x\sin(x)\right|_{x=0} \quad\text{and}\quad \left.\frac{d^{12}}{dx^{12}}x\sin(x)\right|_{x=0}.
$$
Computing limits. In the next two problems, we will apply Taylor series to
compute some limits.
Solution: We cannot just plug in $x = 0$, because we would get $\frac{0}{0}$. Instead, we first simplify using Taylor series. As in the previous example, we start with the exponential series $e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}$ and replace $x$ by $-x^2$:
$$
e^{-x^2} = \sum_{n=0}^{\infty}\frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!}x^{2n} = 1 - x^2 + \frac{x^4}{2} + \dots .
$$
Moving $1 - x^2$ to the left hand side,
$$
e^{-x^2} - 1 + x^2 = \frac{x^4}{2} + \dots .
$$
The dots $\dots$ contain powers of at least $x^5$ (in this case, actually at least $x^6$), so after we divide both sides by $x^4$, we get
$$
\frac{e^{-x^2} - 1 + x^2}{x^4} = \frac{1}{2} + \dots,
$$
where the dots $\dots$ now contain at least one power of $x$, because $\frac{x^5}{x^4} = x$. When we let $x$ go to zero, all those terms disappear, and so
$$
\lim_{x\to 0}\frac{e^{-x^2} - 1 + x^2}{x^4} = \frac{1}{2}.
$$
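Plugging small values of $x$ into the original quotient gives a numerical sanity check of the limit. In this Python sketch (the function name is ours), we use `math.expm1(u)`, which computes $e^u - 1$ accurately for small $u$. Evaluating $e^{-x^2} - 1 + x^2$ directly would lose many digits to cancellation when $x$ is tiny. This choice is purely numerical and is not part of the Taylor series argument.

```python
import math


def quotient(x):
    """(e^(-x^2) - 1 + x^2) / x^4, using math.expm1 to reduce cancellation error."""
    return (math.expm1(-x**2) + x**2) / x**4


for x in (0.5, 0.1, 0.01, 0.001):
    print(x, quotient(x))  # the values approach 1/2
```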
$$
\lim_{x\to 0}\frac{\sin(x^3) - x^3}{x^9}.
$$
Comparing functions near a point. In the last two problems, we used the fact that the $\dots$ terms in the Taylor series disappear in the limit $x \to 0$ as long as they contain at least one power of $x$. Next, we will use a similar idea to compare two functions near $x = 0$. The next two problems refer to the following figures.
Solution: Recall that $\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$ and $\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$. Replacing $x$ by $x^2$ in $\sin(x)$,
$$
\sin(x^2) = x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \dots \quad\text{and}\quad 1 - \sin(x^2) = 1 - x^2 + \frac{x^6}{3!} - \frac{x^{10}}{5!} + \dots .
$$
Comparing $\cos(x)$ and $1 - \sin(x^2)$ is equivalent to comparing
$$
1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots \quad\text{and}\quad 1 - x^2 + \frac{x^6}{3!} - \frac{x^{10}}{5!} + \dots .
$$
First, we can cancel 1 on both sides, so we need to compare
$$
-\frac{x^2}{2!} + \frac{x^4}{4!} - \dots \quad\text{and}\quad -x^2 + \frac{x^6}{3!} - \frac{x^{10}}{5!} + \dots .
$$
Then we can divide both sides by $x^2$ and compare
$$
-\frac{1}{2!} + \frac{x^2}{4!} - \dots \quad\text{and}\quad -1 + \frac{x^4}{3!} - \frac{x^8}{5!} + \dots .
$$
Near $x = 0$, the terms that have at least one power of $x$ get smaller and smaller, so near 0 the main contribution is $-\frac{1}{2}$ on the left hand side and $-1$ on the right hand side. Since $-\frac{1}{2} > -1$, the left hand side is bigger near $x = 0$. As a result, we conclude that $\cos(x) > 1 - \sin(x^2)$ near $x = 0$, so the blue solid graph corresponds to $\cos(x)$ and the red dashed graph corresponds to $1 - \sin(x^2)$.
Example 5. Using the series $\ln(1+x) = \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}x^n$, compute the integral
$$
\int_1^{1.5}\ln(x)\,dx.
$$
Solution: The integral can actually be computed using integration by parts, but here we will use Taylor series instead. The function we integrate is $\ln(x)$, while the series is for $\ln(1+x)$. We could change variables either in the integral or in the series. Let us make the substitution $x = 1 + t$, $dx = dt$ in the integral and rewrite it as
$$
\int_1^{1.5}\ln(x)\,dx = \int_0^{0.5}\ln(1+t)\,dt.
$$
Recall that the above series has radius of convergence $R = 1$ and center $0$. So the interval of integration $[0, 0.5]$ lies inside the interval of convergence $(-1, 1)$, and we can integrate term by term:
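$$
\begin{aligned}
\int_0^{0.5}\ln(1+t)\,dt &= \int_0^{0.5}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}t^n\,dt\\
&= \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\int_0^{0.5}t^n\,dt\\
&= \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\cdot\left.\frac{t^{n+1}}{n+1}\right|_{t=0}^{t=0.5}\\
&= \sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\cdot\frac{0.5^{n+1}}{n+1}\\
&= \sum_{n=1}^{\infty}\frac{(-1)^{n+1}\,0.5^{n+1}}{n(n+1)}\\
&= \frac{0.5^2}{2} - \frac{0.5^3}{6} + \frac{0.5^4}{12} - \dots .
\end{aligned}
$$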
For example, if we sum the first three terms written above, we get 0.109375, while
the actual integral is 0.108198.
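Here is a Python sketch (names are ours) that compares partial sums of the series with the exact value $1.5\ln(1.5) - 0.5$. The exact value comes from the antiderivative $x\ln(x) - x$, found by integration by parts.

```python
import math


def integral_series(n_terms, upper=0.5):
    """Partial sum of sum_{n>=1} (-1)**(n+1) * upper**(n+1) / (n*(n+1))."""
    return sum((-1) ** (n + 1) * upper ** (n + 1) / (n * (n + 1))
               for n in range(1, n_terms + 1))


# [x ln x - x] evaluated from 1 to 1.5
exact = 1.5 * math.log(1.5) - 0.5
for n_terms in (1, 2, 3, 10):
    print(n_terms, integral_series(n_terms), exact)
```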
Shape of graphs. Let us recall how the coefficients in front of the powers $(x-a)$ and $(x-a)^2$ in the Taylor series correspond to the properties of the graph of a function $y = f(x)$.
Exercise 6. Which of the four functions in figures (a), (b), (c) and (d) has the Taylor series
1
f (x) = (x − 4) + (x − 4)2 + . . . .
2
[Figure: four graphs, labelled (a), (b), (c) and (d)]
Answer to Exercise 1. The center of the series is $a = -4$, so its coefficients let us compute derivatives at $x = -4$ using the formula $f^{(n)}(-4) = n!\,c_n$.
(a) True. Notice that the series starts with index $n = 20$. In other words, the lowest power is $(x+4)^{20}$. Since there is no term $(x+4)^{19}$, the coefficient in front of it is zero, $c_{19} = 0$, so the derivative $f^{(19)}(-4) = 0$.
(b) False. Using the above formula,
$$
f^{(22)}(-4) = 22!\,c_{22} = 22!\,\frac{5^{22}}{\sqrt{22+3}} = 22!\,\frac{5^{22}}{5} = 22!\,5^{21}.
$$
(c) True. It matches what we computed in part (b).
(d) False.
$$
f^{(25)}(-4) = 25!\,c_{25} = 25!\,\frac{5^{25}}{\sqrt{25+3}} = 25!\,\frac{5^{25}}{\sqrt{28}} \ne 5^{25}.
$$
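Answer to Exercise 2. We have $x\sin(x) = x\sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}x^{2n+1} = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}x^{2n+2}$. Because the powers $2n + 2$ are always even,
$$
\left.\frac{d^{11}}{dx^{11}}x\sin(x)\right|_{x=0} = 0.
$$
To find $\left.\frac{d^{12}}{dx^{12}}x\sin(x)\right|_{x=0}$, we need the coefficient $c_{12}$ in front of $x^{12}$. This happens when $2n + 2 = 12$, or $n = 5$, so the coefficient is $c_{12} = \frac{(-1)^5}{(2\cdot 5+1)!} = -\frac{1}{11!}$. The derivative is
$$
\left.\frac{d^{12}}{dx^{12}}x\sin(x)\right|_{x=0} = 12!\,c_{12} = 12!\left(-\frac{1}{11!}\right) = -12.
$$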
where the dots $\dots$ contain powers of at least $x^{10}$ (actually $x^{15}$). Subtracting $x^3$ and dividing by $x^9$, we get
$$
\frac{\sin(x^3) - x^3}{x^9} = -\frac{1}{3!} + \dots,
$$
where the dots contain at least one power of $x$. When $x$ goes to zero, those terms disappear and we get
$$
\lim_{x\to 0}\frac{\sin(x^3) - x^3}{x^9} = -\frac{1}{3!} = -\frac{1}{6}.
$$
Answer to Exercise 4. From the Taylor series for $\cos(x)$, we know that
$$
1 - \cos(x) = \frac{x^2}{2!} - \frac{x^4}{4!} + \dots = \frac{x^2}{2} - \frac{x^4}{24} + \dots .
$$
Next, let us find the first three terms of the series for $f(x) = \sqrt{1+x} = (1+x)^{1/2}$. We compute
$$
f'(x) = \frac{1}{2}(1+x)^{-1/2} = \frac{1}{2\sqrt{1+x}},\qquad f''(x) = -\frac{1}{2}\cdot\frac{1}{2}(1+x)^{-3/2} = -\frac{1}{4(1+x)^{3/2}},
$$
so $f(0) = 1$, $f'(0) = \frac{1}{2}$ and $f''(0) = -\frac{1}{4}$. This gives the first three terms of the Taylor series:
$$
\sqrt{1+x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \dots .
$$
Plugging in $x^2$ and then subtracting 1 gives:
$$
\sqrt{1+x^2} - 1 = 1 + \frac{x^2}{2} - \frac{x^4}{8} + \dots - 1 = \frac{x^2}{2} - \frac{x^4}{8} + \dots .
$$
So, comparing $1 - \cos(x)$ and $\sqrt{1+x^2} - 1$ is equivalent to comparing
$$
\frac{x^2}{2} - \frac{x^4}{24} + \dots \quad\text{and}\quad \frac{x^2}{2} - \frac{x^4}{8} + \dots .
$$
Cancelling $\frac{x^2}{2}$ on both sides and then dividing by $x^4$ leads to comparing
$$
-\frac{1}{24} + \dots \quad\text{and}\quad -\frac{1}{8} + \dots .
$$
On both sides the dots $\dots$ contain at least one power of $x$, so they become negligible near $x = 0$, and we compare $-\frac{1}{24} > -\frac{1}{8}$. This means that $1 - \cos(x) > \sqrt{1+x^2} - 1$ near $x = 0$, so the blue solid graph corresponds to $1 - \cos(x)$ and the red dashed graph corresponds to $\sqrt{1+x^2} - 1$.
Answer to Exercise 5. Since every term in the Taylor series for $\sin(t)$ has at least one power of $t$, we can divide the series by $t$ to get
$$
\frac{\sin(t)}{t} = \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}t^{2n}.
$$
The series for $\sin(t)$ has radius of convergence $R = \infty$, so we are allowed to integrate it term by term over any interval. Then
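$$
\begin{aligned}
\int_0^x\frac{\sin(t)}{t}\,dt &= \int_0^x\sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}t^{2n}\,dt\\
&= \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}\int_0^x t^{2n}\,dt\\
&= \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}\cdot\left.\frac{t^{2n+1}}{2n+1}\right|_{t=0}^{t=x}\\
&= \sum_{n=0}^{\infty}\frac{(-1)^n}{(2n+1)!}\cdot\frac{x^{2n+1}}{2n+1}\\
&= \sum_{n=0}^{\infty}\frac{(-1)^n x^{2n+1}}{(2n+1)!\,(2n+1)}.
\end{aligned}
$$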
In this case, the integral cannot be computed by finding an antiderivative in terms of elementary functions, so a series representation is a great alternative.
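There is no elementary antiderivative to compare against. Instead, this Python sketch (names are ours) compares partial sums of the series with composite Simpson's rule applied directly to $\frac{\sin(t)}{t}$. Simpson's rule is a swapped-in numerical method, used here only as an independent check.

```python
import math


def si_series(x, n_terms=30):
    """Partial sum of sum_{n>=0} (-1)**n x**(2n+1) / ((2n+1)! (2n+1))."""
    return sum((-1) ** n * x ** (2 * n + 1) / (math.factorial(2 * n + 1) * (2 * n + 1))
               for n in range(n_terms))


def si_simpson(x, steps=1000):
    """Composite Simpson's rule for the integral of sin(t)/t from 0 to x (steps must be even)."""
    def f(t):
        return 1.0 if t == 0 else math.sin(t) / t  # sin(t)/t -> 1 as t -> 0

    h = x / steps
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


for x in (0.5, 1.0, 2.0):
    print(x, si_series(x), si_simpson(x))
```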