Don’t Panic

Part 0

A guide to MATA21 Analysis in One Variable

Version: January 13, 2023

Jan-Fredrik Olsen
Contents

1 A crash course on mathematical proofs
1.1 The (starting) point of mathematical proofs
1.2 Proofs by chains of equalities and counter-examples
1.3 Proof by chains of implications and cases
1.4 Proof by contradiction and by contraposition
1.5 Proof by induction
1.6 The completeness axiom
1.7 Answers to selected exercises

8 The derivative
8.1 Computational rules for the derivative
8.2 Proof of the computational rules for the derivative
8.3 Differentiation formulas for elementary functions
8.4 Exam exercises
8.5 Answers to selected exercises

9 How to compare infinities

10 A crash course in Python
10.1 Introduction
10.2 How to compute and visualize sequences and sums
10.3 Some additional control statements in Python
10.4 Functions in Python
10.5 How numbers are represented in Python
10.6 Answers to selected exercises

Chapter 1

A crash course on mathematical proofs

This chapter is intended as an introduction to mathematical proofs. “Doing” mathematical proofs is quite similar to, say, playing a game of Carcassonne (or Monopoly, for that matter): we need some pieces to play with and a rulebook to tell us which moves we are allowed to make. Moreover, it is hopeless to expect to master the game simply by reading the rules!

Fig. 1. Carcassonne beats Monopoly!

The pieces we are going to use are the real numbers (and in some cases, the complex numbers). On these pieces we are going to apply the standard operations of arithmetic (i.e., addition, subtraction, multiplication and division) according to the rules of a rulebook that we introduce little by little throughout the chapter. We explain how to use these rules by presenting seven proof techniques.
In the final section of this chapter, we also study the completeness axiom, which opens the door to the study of the infinitely large and infinitely small.
The crash course consists of the following parts:

1.1: The (starting) point of mathematical proofs
1.2: Proof by chains of equalities and counter-examples
1.3: Proof by chains of implications and cases
1.4: Proof by contradiction and contraposition
1.5: Proof by induction
1.6: The completeness axiom


1.1 The (starting) point of mathematical proofs

In this section, we briefly discuss the point of proofs in mathematics, and formulate the
first part of the rulebook for the real numbers.

In the beginning, there was nothing

In these lecture notes, one of the main points of mathematical proofs is to understand why what we learned in high school mathematics is true. A second, and perhaps more important, goal is to systematically develop mathematical theorems that go far beyond what you learned in high school.
When building a mathematical theory, mathematicians can be guided by various
motivations. For instance, here are some criteria that mathematicians use when trying
to figure out whether a mathematical statement is interesting or not:

• Is it true? • Is it useful? • Is it beautiful?

While the last two criteria are subjective, and are judged very differently by mathematicians from different fields, they are still considered important. As the influential mathematician G. H. Hardy famously wrote in his short, and quite readable, book "A Mathematician's Apology":

The mathematician’s patterns, like the painter’s or the poet’s must be beautiful; the ideas, like the colours or the words, must fit together in a harmonious way. Beauty is the first test: there is no permanent place in the world for ugly mathematics.

Fig. 2. G. H. Hardy (1877–1947).

Indeed, as is probably the case for all authors of textbooks in mathematics, it is my sincere hope that you will be able to glimpse some of the beauty (and usefulness!) of mathematics while studying these lecture notes.
However, what sets mathematics apart from every other science is its unparalleled potential to uncover truth. Indeed, mathematics is probably the only human endeavour where "absolute" truth is part of the routine. In other sciences, "truth" at best means "highly probable". For instance, no one really knows how the biological and chemical processes in an organism work, or even that a dropped rock will fall every time (indeed, no one can really guarantee that gravity will still be around tomorrow).

The point is that mathematicians make it their business to know how everything in mathematics works, and how it is all connected. In a sense, we have no choice: this is exactly what is required to know that everything is true! Why is this important? Well, for instance, in the 1960s, the Higgs boson (also known as the God particle) was found as a solution to some set of equations. Since the mathematics can be trusted, physicists could be sure that the result was as good as the assumptions leading up to it. And, as we all know, 50 years and 5 billion dollars later, the particle was (with high probability) found in experiments!

Fig. 3. In fact, Homer Simpson discovered the Higgs boson independently in 1998! (Simon Singh has an excellent book on mathematics in The Simpsons.)

But how do mathematicians attain "absolute" truth? The answer is: by constructing mathematical proofs.

Moral definition A mathematical proof of a statement "if A then B" is a logically correct, step-by-step justification of the statement B that, at each step, only uses the statement A in combination with other statements that are already known to be true.

But what does it mean for a statement to be true? Well, here are some examples of
true statements:

Example 1.1 (Selection of true statements)

• Addition by zero changes nothing: $a + 0 = a$.
• Multiplication by zero kills every number: $a \cdot 0 = 0$.
• The conjugate rule: $(a - b)(a + b) = a^2 - b^2$.
• The division rule for fractions: $(a/C)/(b/D) = (aD)/(bC)$.
• The power rule: $(ab)^n = a^n b^n$.

While the above statements seem to be true, the real question is: how do we know this? Ideally, we would want to find a mathematical proof for each of these statements.
However, this poses a problem: to prove that a statement is true, we need to show that it
is the logical consequence of some other true statement. But what is our starting point?
If we make no initial assumption that some statements are to be considered true without
proof, then we have nothing to work with.
This might be shocking, but mathematics has the same fundamental problem as
physics and religion: how can something be created from nothing?
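At this stage we cannot yet prove the statements of Example 1.1, but we can test them. The following Python sketch evaluates both sides of each statement exactly for a few rational values, using the standard `fractions` module. Python is the language of Chapter 10, and the helper name `check_example_1_1` and the sample values are our own choices. Passing such a test is evidence, not a proof: it says nothing about the values we did not try, which is precisely why we need axioms and proofs.

```python
from fractions import Fraction as F

def check_example_1_1(samples):
    """Return True if every statement of Example 1.1 holds for all sampled tuples.

    Each sample is (a, b, C, D, n) with b, C, D nonzero and n a non-negative integer.
    Fractions give exact arithmetic, so rounding errors cannot hide a failure.
    """
    for a, b, C, D, n in samples:
        checks = [
            a + 0 == a,                               # addition by zero
            a * 0 == 0,                               # multiplication by zero
            (a - b) * (a + b) == a**2 - b**2,         # conjugate rule
            (a / C) / (b / D) == (a * D) / (b * C),   # division rule (needs b, C, D != 0)
            (a * b) ** n == a**n * b**n,              # power rule
        ]
        if not all(checks):
            return False
    return True

samples = [
    (F(3), F(-2), F(5), F(7), 3),
    (F(1, 2), F(2, 3), F(-4), F(9, 5), 0),
    (F(-7, 3), F(5, 11), F(2, 7), F(-1), 5),
]
print(check_example_1_1(samples))  # True
```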

Let there be light! The first part of the rulebook for the real numbers
Where physicists have their Big Bang, and priests have their moment of creation, mathematicians have axioms. These are an initial selection of mathematical statements that we choose to believe are true.

Fig. 4. God: "Adam, know ye that a + 0 = a." Adam: "Duh!"

In order to keep things friendly, we introduce the axioms in four batches that we can
think of as a "rulebook" for what we are allowed to do with real numbers. The first
batch is the longest, but also contains the most "obvious" axioms. Here goes:

The rulebook for R (part 1 of 4) There exists a unique set of numbers having the
properties listed in the four parts of this rulebook. We call this set of numbers the real
numbers and denote them by R.
We begin with two axioms that essentially tell us what we mean by equalities.

(E1) For all real numbers x we have x = x.
(E2) If x = y then we can replace x by y in all expressions.

Note that since we do not want to go too deeply into mathematical logic, we only give a rather informal formulation of rule E2, which is sometimes called the replacement rule.
Next, we give five axioms that tell us how addition works:

(A0) For all real numbers x, y their sum x + y is a real number.
(A1) For all real numbers x, y we have x + y = y + x.
(A2) For all real numbers x, y, z we have (x + y) + z = x + (y + z).
(A3) There exists a real number 0 so that for all x we have x + 0 = x.
(A4) For all real numbers x there exists a real number a so that x + a = 0.

The number a from axiom A4 is usually denoted −x and is called the additive inverse
of x. Moreover, subtraction is defined as the addition of additive inverses:

For all x,y we define x − y = x + (−y).

In particular, this means that all rules for subtraction follow from the rules of addition.
Next, here are five axioms that tell us how multiplication works:

(M0) For all real numbers x, y their product x · y is a real number.
(M1) For all real numbers x, y we have x · y = y · x.
(M2) For all real numbers x, y, z we have (x · y) · z = x · (y · z).
(M3) There exists a real number 1 ≠ 0 so that for all x we have x · 1 = x.
(M4) For all x ≠ 0 there exists a real number b so that x · b = 1.

The number b from M4 is usually denoted by 1/x or $x^{-1}$ and called the multiplicative inverse or the reciprocal of x. Moreover, we define the quotient x/y to mean

x/y = x · (1/y).

In particular, this means that all rules for division follow from the rules of multiplication.
Finally, we include one axiom that tells us how addition and multiplication interact:
(AM) For all x,y,z we have z · (x + y) = z · x + z · y.
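The rulebook treats only addition, multiplication and the two inverses from A4 and M4 as basic, and defines subtraction and division in terms of them. The sketch below illustrates that idea, assuming that exact rationals (`fractions.Fraction`) may stand in for the real numbers, since they satisfy every axiom listed so far. The hypothetical helpers `neg`, `recip`, `sub` and `div` build subtraction and division only from these basic operations and then compare the results with Python's own operators.

```python
from fractions import Fraction

# Modelling assumption: exact rationals stand in for the real numbers.
# Only +, * and the inverses from (A4) and (M4) are treated as given.

def neg(x: Fraction) -> Fraction:
    """The additive inverse -x promised by axiom (A4)."""
    return Fraction(-x.numerator, x.denominator)

def recip(x: Fraction) -> Fraction:
    """The reciprocal 1/x promised by axiom (M4); it only exists for x != 0."""
    if x == 0:
        raise ZeroDivisionError("(M4) gives no reciprocal for 0")
    return Fraction(x.denominator, x.numerator)

def sub(x: Fraction, y: Fraction) -> Fraction:
    """Subtraction defined as x - y = x + (-y)."""
    return x + neg(y)

def div(x: Fraction, y: Fraction) -> Fraction:
    """Division defined as x / y = x * (1/y)."""
    return x * recip(y)

x, y = Fraction(7, 4), Fraction(-2, 3)
assert x + neg(x) == 0 and x * recip(x) == 1   # the defining properties
print(sub(x, y), x - y)   # 29/12 29/12
print(div(x, y), x / y)   # -21/8 -21/8
```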

Fig. 5. Don’t worry too much if your attention drifts while trying to read these rules. Just like Carcassonne, to learn "the game of maths" you really need to start playing to get anywhere.

Exercise 1.2 The rulebook only seems to guarantee the existence of the numbers 0 and 1. Here we ask you to explore some immediate ways to extend the rulebook:
(a) Use the rulebook to suggest a definition for the positive integers 2, 3, 4, . . ..
(b) Similarly, suggest a definition for the negative integers −2, −3, −4, . . ..
(c) What does the rulebook say about the fractions c/c and c/1?
(d) Suggest a meaning for the symbols $a^2$ and $a^{-2}$.
(e) Suggest a meaning for $a^0$. (Hint: what should $a^2 \cdot a^{-2}$ be equal to?)

Remark 1.3 (Some additional notations) If you look in other textbooks, the axioms
may be formulated slightly differently. In particular, you may see specialised notations
being used. As an example, here is a “compact” way to formulate A4:

∀x ∈ R, ∃a ∈ R : x + a = 0.

Here, ∀ is an upside-down ‘A’ meaning “for all”, ∃ is a backwards ‘E’ meaning “there
exists” and ∈ is a variant of the Greek letter epsilon meaning “element of” or “belongs to”.
The colon “:” has several uses in mathematical notation. Here, it is supposed to mean
“such that”. We will occasionally use these notations throughout these lecture notes.

As you will see below, while referring to the axioms by labels such as M1 is efficient in computations, it can also be confusing, as we quickly forget which axiom a given label refers to. We have already given some names above (such as the name for the replacement rule). Here are some others that are often used, and that you should know:

Remark 1.4 (Commonly used names for some axioms)

• A1 is called the commutative law of addition.


• A2 is called the associative law of addition.
• M1 is called the commutative law of multiplication.
• M2 is called the associative law of multiplication.
• AM is called the distributive law.

Remark 1.5 (Are mathematical truths absolute?) From what we wrote above, you should suspect that there is something fishy when we claim that mathematicians are able to establish "absolute" truths. The problem is that all proofs have to start from some collection of axioms that we have to take on faith. That is, any mathematical statement can only be shown to be true relative to other mathematical statements. And some of these, namely the axioms, can never themselves be shown to be true.

1.2 Proofs by chains of equalities and counter-examples


In this section, we consider the proof strategies proof by chains of equalities and proof by
counter-example, and use them to establish some (well-known) formulas of arithmetic.

The most basic proof strategy: Chains of equalities


Since we have to take axioms on faith, we want to keep their number to a minimum. A
problem with this is that most mathematical statements are not in the rulebook, and
must be shown to be true using a mathematical proof. Here is a first example:

Example 1.6 (First example of a chain of equalities) According to axiom AM,


we have z(x + y) = zx + zy. Is it also true that (x + y)z = xz + yz? Of course, the
answer is yes. Here is how we use the rulebook to prove this:

1. By M1, we get (x + y)z = z(x + y).
2. By AM, we get z(x + y) = zx + zy.
3. By E2, we can combine 1 and 2, above, to get (x + y)z = zx + zy.
4. By M1, we get zx = xz.
5. By E2, we can combine 3 and 4, above, to get (x + y)z = xz + zy.
6. By M1, we get zy = yz.
7. By E2, we can combine 5 and 6, to get (x + y)z = xz + yz. Done!

In practice, we rarely bother to mention that we use axiom E2 to connect equalities as we did above (this gets tiring after a while). Instead, it is more usual to express the above argument as a chain of equalities, not mentioning the use of E2:

$$(x + y)z \overset{\text{(M1)}}{=} z(x + y) \overset{\text{(AM)}}{=} zx + zy \overset{\text{(M1)}}{=} xz + yz.$$
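Proof assistants check every link of such a chain mechanically. The example below is an illustration only: it assumes the Lean 4 theorem prover with the Mathlib library, which the course does not use, and writes Example 1.6 as a `calc` chain. Mathlib constructs ℝ instead of postulating it, so `mul_comm` and `mul_add` are theorems there rather than axioms, but they play the roles of M1 and AM. Mathlib also already contains this exact statement under the name `add_mul`.

```lean
import Mathlib

-- Example 1.6 as a chain of equalities; each link names the rule it uses.
example (x y z : ℝ) : (x + y) * z = x * z + y * z := by
  calc (x + y) * z = z * (x + y) := mul_comm (x + y) z      -- (M1)
    _ = z * x + z * y := mul_add z x y                      -- (AM)
    _ = x * z + y * z := by rw [mul_comm z x, mul_comm z y] -- (M1), twice
```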

Below, we explore several examples of the use of the above technique. While reading these examples, and trying to do the exercises, please keep in mind that the only way to become good at doing proofs is practice, practice and more practice. Moreover, it is important not to freak out if you do not see the complete solution strategy immediately. Try to move a couple of the pieces, and see where you end up!

Fig. 6. In chess, your moves should improve your position, little by little. The same is true for each "line" in a proof.

Examples on proofs by chains of equalities: properties of 0, 1 and −1


The rulebook, as formulated above, tells us almost nothing about the number 0 or about the minus sign. For instance, two facts we would want to use frequently, but which are not in the rulebook, are:
for all real numbers a we have a · 0 = 0 and (−1) · (−1) = 1.
It is actually good news that these facts are not in the rulebook, since this means that they aren’t just axioms that we must accept without explanation. Instead, these facts can be explained as logical consequences of the axioms in our rulebook. Such explanations, when each step is properly justified, are exactly what we mean by a proof. The formulas themselves are what we call propositions or theorems (the latter essentially means “important proposition”).

Fig. 7. The effect of multiplying 3 and −3 by (−1), respectively.

Let us now take a closer look at what we can prove about the number 0 from the rulebook. First, we note that when a mathematician says that a number with some property exists (as we do in axiom A3), he does not claim that there exists only one such number.

Example 1.7 (Uniqueness of additive identity) As an exercise in the use of the axioms in the rulebook, we now prove that there exists only one number 0 with the property that x + 0 = x for all numbers x.
How to do this? Well, suppose that someone came to you with the number $\mathbf{0}$ ("fat zero"), and claimed that this number also has the property that $x + \mathbf{0} = x$ for all x. "Oh my God!" your friend shouts, "I invented a new and cooler looking zero!" But since you are rather clever, you immediately come up with the following chain of equalities:

$$\mathbf{0} \overset{\text{prop. of usual zero}}{=} \mathbf{0} + 0 \overset{\text{(A1)}}{=} 0 + \mathbf{0} \overset{\text{prop. of fat zero}}{=} 0.$$

That is, "fat zero" is equal to the ordinary zero! Now, the real question is, how do you break the news to your friend?

In the above example, we used not only axioms from the rulebook, but also the assumption, or hypothesis, on the number $\mathbf{0}$. This is completely valid, and in fact, most mathematical statements are of the form: If such and such, then some conclusion holds.
Exercise 1.8 (a) Prove that the number 1, whose existence is given by axiom (M3), is the only number satisfying x · 1 = x for all real x.
(b) Prove that all numbers x have exactly one additive inverse.
(c) Prove that all numbers x ≠ 0 have exactly one multiplicative inverse.

Here is another property of the number 0.

Example 1.9 We prove that for all real numbers a, we have


a · 0 = 0.
While this property seems really obvious to us, note that it is not stated in the rulebook.
We must therefore find a proof of this property using only what we know to be true so
far. In particular, we will need to use that x + 0 = x for all real numbers x.
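Here is how to do this using a chain of equalities (notice how the initial idea is to add 0):

$$\begin{aligned}
a \cdot 0 &\overset{\text{(A3)}}{=} a \cdot 0 + 0 \\
&\overset{\text{(A4)}}{=} a \cdot 0 + \big(a \cdot 0 - (a \cdot 0)\big) \\
&\overset{\text{(A2)}}{=} \big(a \cdot 0 + a \cdot 0\big) - (a \cdot 0) \\
&\overset{\text{(AM)}}{=} a \cdot (0 + 0) - (a \cdot 0) \\
&\overset{\text{(A3)}}{=} a \cdot 0 - (a \cdot 0) \overset{\text{(A4)}}{=} 0.
\end{aligned}$$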
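A chain of equalities is only as strong as its weakest link. The Python sketch below writes each line of the chain in Example 1.9 as a function of a and evaluates neighbouring lines at sample values. The helper `check_chain` is our own and does not come from the notes. Such a test can expose a mistyped step, as the deliberately broken chain shows, but it cannot replace citing an axiom for every link.

```python
from fractions import Fraction

def check_chain(steps, samples):
    """Return the index i of the first link steps[i] = steps[i+1] that fails
    for some sample value, or None if every link holds on every sample."""
    for i in range(len(steps) - 1):
        for a in samples:
            if steps[i](a) != steps[i + 1](a):
                return i
    return None

# The expressions of Example 1.9, one function per line of the chain.
example_1_9 = [
    lambda a: a * 0,
    lambda a: a * 0 + 0,                   # (A3)
    lambda a: a * 0 + (a * 0 - a * 0),     # (A4)
    lambda a: (a * 0 + a * 0) - a * 0,     # (A2)
    lambda a: a * (0 + 0) - a * 0,         # (AM)
    lambda a: a * 0 - a * 0,               # (A3)
    lambda a: 0,                           # (A4)
]
samples = [Fraction(n, 7) for n in range(-10, 11)]
print(check_chain(example_1_9, samples))   # None

# Slip in a wrong line after the second one: the link with index 1 breaks.
broken = example_1_9[:2] + [lambda a: a * 1] + example_1_9[2:]
print(check_chain(broken, samples))        # 1
```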

We now turn to considering the properties of the minus sign.

Example 1.10 Here is a chain of equalities to prove that (−1) · a = (−a) holds for all real numbers a (as in Example 1.9, a crucial step is to add 0):

$$\begin{aligned}
(-1) \cdot a &= (-1) \cdot a + 0 = (-1) \cdot a + \big(a + (-a)\big) \\
&= \big((-1) \cdot a + a\big) + (-a) \\
&= \big((-1) \cdot a + 1 \cdot a\big) + (-a) \\
&= \big((-1) + 1\big) \cdot a + (-a) \\
&= 0 \cdot a + (-a) = 0 + (-a) = -a.
\end{aligned}$$

Exercise 1.11 (a) Justify the steps in the above example by referring to the rulebook, an assumption or a previously proved statement.
(b) Use a chain of equalities, similar to the one used in the previous example, to
prove that (−1) · (−1) = 1.
(c) Explain why it now follows that −(−1) = 1.

Examples on proofs by chains of equalities: multiplication formulas


In light of what we did above, we are now in a position to prove the conjugate rule listed in Example 1.1.

Example 1.12 We use a chain of equalities to prove that

$$(a + b)(a - b) = a^2 - b^2.$$

Here is how to do this:

$$\begin{aligned}
(a + b)(a - b) &= (a + b)\big(a + (-b)\big) \\
&= (a + b)a + (a + b)(-b) \\
&= a(a + b) + (-b)(a + b) \\
&= aa + ab + (-b)a + (-b)b = a^2 + ab - ab - b^2 = a^2 - b^2.
\end{aligned}$$

Exercise 1.13 Justify the steps in the above example by referring to the rulebook,
an assumption or previously proved statement.

Exercise 1.14 Formulate and prove formulas for how to "multiply out" the following
expressions.
(a) (a + b) · (c + d) (b) (a + b) · (a + b).
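Multiplying out is repeated use of the distributive law AM (and its mirror image from Example 1.6), followed by collecting like terms. The sketch below makes this mechanical. It uses our own representation of a polynomial in a and b: a dictionary mapping each exponent pair (i, j) to the coefficient of $a^i b^j$. Storing every term in the normal form $a^i b^j$ quietly uses the commutative and associative laws M1 and M2. The cancellation of ab − ab in Example 1.12 shows up as a coefficient 0 that gets dropped.

```python
from collections import defaultdict

def multiply(p, q):
    """Multiply polynomials in a, b given as {(i, j): coeff}, meaning sums of coeff * a**i * b**j."""
    result = defaultdict(int)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            # Distributive law: every term of p is multiplied by every term of q.
            result[(i1 + i2, j1 + j2)] += c1 * c2
    # Collect like terms and drop those that cancelled, such as ab - ab.
    return {k: c for k, c in result.items() if c != 0}

a_plus_b = {(1, 0): 1, (0, 1): 1}      # a + b
a_minus_b = {(1, 0): 1, (0, 1): -1}    # a + (-b)
print(multiply(a_plus_b, a_minus_b))   # {(2, 0): 1, (0, 2): -1}, i.e. a^2 - b^2
```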

Remark 1.15 (How detailed do my proofs need to be?) Notice that while the proof in Example 1.12 is perhaps not so complicated, we still write it out at a level of detail that we would not use in a normal computation. For instance, we are so used to multiplication of real numbers being commutative that we would never waste ink writing something like (a + b)a = a(a + b). Similarly, we rarely find it necessary to write something like (a − b) = (a + (−b)).
Indeed, to make proofs more readable, we often omit steps that we consider to be "routine". This is something every mathematician does, and so will we. The problem is to understand what we can safely assume to be "routine". There is no simple answer to this, and it is ultimately also a question of style. Some mathematicians simply write out more details than others. However, context matters. For instance, in this part of the chapter we are explicitly discussing the axioms from the first part of the rulebook. It therefore makes sense to be explicit about how we use these rules. But in the same way that it is natural, and necessary, for a 2-year-old to inform the world every time they go to the toilet, there is a point when such information just stops serving any purpose.

Examples on proofs by chains of equalities: Fractions


We now aim to prove the division rule for fractions listed in Example 1.1. To prepare for this, we begin with the following example that will be of use below.

Example 1.16 We prove that for all a, b ≠ 0, we have

$$\frac{1}{a} \cdot \frac{1}{b} = \frac{1}{ab}.$$

While this result may seem obvious, the rulebook for the real numbers says nothing about how to multiply together multiplicative inverses. Basically, all we know is that x/y = (1/y) · x and that (1/x) · x = 1. So, what to do? Well, how about multiplying by 1?

$$\begin{aligned}
\frac{1}{a} \cdot \frac{1}{b} &= \frac{1}{a} \cdot \frac{1}{b} \cdot 1 = \frac{1}{a} \cdot \frac{1}{b} \cdot \frac{ab}{ab} \\
&= \frac{1}{b} \cdot \frac{1}{a} \cdot ab \cdot \frac{1}{ab} \\
&= \frac{1}{b} \cdot \frac{a}{a} \cdot b \cdot \frac{1}{ab} \\
&= \frac{1}{b} \cdot 1 \cdot b \cdot \frac{1}{ab} \\
&= \frac{1}{b} \cdot b \cdot \frac{1}{ab} = \frac{b}{b} \cdot \frac{1}{ab} = 1 \cdot \frac{1}{ab} = \frac{1}{ab}.
\end{aligned}$$

(You are asked to justify each step in the exercises below.)

Exercise 1.17 Use the result of the above example to prove that for all real numbers a, b, C, D such that C, D ≠ 0, we have

$$\frac{a}{C} \cdot \frac{b}{D} = \frac{ab}{CD}.$$

Hint: By definition, a/C = a · (1/C).

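Before dealing with the division rule, let us mention that the result of the above example is useful when adding fractions. Notice how multiplication by 1 is a crucial step:

$$\begin{aligned}
\frac{1}{2} + \frac{1}{3} &= \frac{1}{2} \cdot 1 + \frac{1}{3} \cdot 1 \\
&= \frac{1}{2} \cdot \frac{3}{3} + \frac{1}{3} \cdot \frac{2}{2} \\
&= \frac{3}{6} + \frac{2}{6} = \frac{5}{6}.
\end{aligned}$$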

Exercise 1.18 (a) Justify how to do the last step of the above computation.
(b) Use the above as inspiration to prove that for all real numbers a, b, C, D with C, D ≠ 0, we have

$$\frac{a}{C} + \frac{b}{D} = \frac{aD + bC}{CD}.$$
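The "multiply by one" idea behind the computation of 1/2 + 1/3 can be expressed as a short Python sketch on integer numerators and denominators, using a hypothetical helper `add_fractions`. It rewrites a/C as (aD)/(CD) and b/D as (bC)/(DC) and adds the numerators. It does not reduce the result, so we compare values with Python's `Fraction` rather than comparing pairs. Checking examples is, of course, no substitute for the proof asked for in part (b).

```python
from fractions import Fraction

def add_fractions(a, C, b, D):
    """Return (numerator, denominator) of a/C + b/D over the common denominator C*D."""
    if C == 0 or D == 0:
        raise ZeroDivisionError("denominators must be nonzero")
    # Multiply a/C by D/D and b/D by C/C, then add the numerators over CD.
    return a * D + b * C, C * D

num, den = add_fractions(1, 2, 1, 3)
print(num, den)  # 5 6
assert Fraction(num, den) == Fraction(1, 2) + Fraction(1, 3)

num, den = add_fractions(3, 4, 5, 6)
print(num, den)  # 38 24 (equal to 19/12, but not in lowest terms)
```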
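What remains is to figure out what happens when we divide a fraction by a fraction. Again, multiplying by 1 is the key idea. For instance, we have

$$\frac{\left(\frac{1}{2}\right)}{\left(\frac{5}{3}\right)} = \frac{\left(\frac{1}{2}\right)}{\left(\frac{5}{3}\right)} \cdot 1 = \frac{\left(\frac{1}{2}\right)}{\left(\frac{5}{3}\right)} \cdot \frac{6}{6} = \frac{\left(\frac{1}{2}\right) \cdot 6}{\left(\frac{5}{3}\right) \cdot 6} = \frac{3}{10}.$$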
Exercise 1.19 Use the procedure from the above computation to prove the division
rule from Example 1.1:
(a/C)/(b/D) = (aD)/(bC).
In particular, point out what we need to assume about the real numbers a, b, C, D for
this rule to apply.
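The same trick for dividing fractions can be sketched in Python. In the computation of (1/2)/(5/3) we multiplied top and bottom by 6/6. The hypothetical helper `clear_inner_denominators` below does the same with the factor m = C·D in general, which turns (a/C)/(b/D) into a quotient of two integers. It assumes that b, C and D are nonzero, in line with what Exercise 1.19 asks you to identify.

```python
from fractions import Fraction

def clear_inner_denominators(a, C, b, D):
    """Rewrite (a/C)/(b/D) by multiplying top and bottom by m = C*D (that is, by m/m = 1).

    Returns integers (top, bottom) with (a/C)/(b/D) = top/bottom.
    Assumes b, C and D are nonzero integers.
    """
    m = C * D
    top = Fraction(a, C) * m      # (a/C) * CD = aD, an integer
    bottom = Fraction(b, D) * m   # (b/D) * CD = bC, an integer
    return int(top), int(bottom)

top, bottom = clear_inner_denominators(1, 2, 5, 3)
print(top, bottom)  # 3 10
assert Fraction(top, bottom) == Fraction(1, 2) / Fraction(5, 3)
```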
We now give an example where we show how "complex" fractions are simplified in
practice. Notice that the main idea is repeated use of the trick "multiply by one":

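Example 1.20 (Multiplication by one to simplify fractions) Let us simplify the expression

$$\frac{\left(\dfrac{1}{\frac{1}{x} + \frac{1}{y}}\right)}{\left(\dfrac{1}{x + y}\right)},$$

where we assume that x ≠ 0, y ≠ 0 and x + y ≠ 0, so that every fraction involved is defined.
First, we compute

$$\frac{1}{x} + \frac{1}{y} = \frac{1}{x} \cdot \frac{y}{y} + \frac{1}{y} \cdot \frac{x}{x} = \frac{y}{xy} + \frac{x}{xy} = \frac{x + y}{xy}.$$

Next,

$$\frac{1}{\frac{1}{x} + \frac{1}{y}} = \frac{1}{\left(\frac{x+y}{xy}\right)} = \frac{1}{\left(\frac{x+y}{xy}\right)} \cdot \frac{xy}{xy} = \frac{xy}{\left(\frac{(x+y)xy}{xy}\right)} = \frac{xy}{x + y}.$$

Finally,

$$\frac{\left(\dfrac{1}{\frac{1}{x} + \frac{1}{y}}\right)}{\left(\dfrac{1}{x+y}\right)} = \frac{\left(\dfrac{xy}{x+y}\right)}{\left(\dfrac{1}{x+y}\right)} = \frac{\left(\dfrac{xy}{x+y}\right)}{\left(\dfrac{1}{x+y}\right)} \cdot \frac{x+y}{x+y} = \frac{\left(\dfrac{xy}{x+y}\right) \cdot (x + y)}{\left(\dfrac{1}{x+y}\right) \cdot (x + y)} = \frac{xy}{1} = xy.$$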
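As a sanity check on the algebra of Example 1.20, the Python sketch below evaluates the original expression with exact rationals for many pairs (x, y) and compares it with the simplified answer xy. The function name `compound` is ours. Pairs with x + y = 0 are skipped because the expression is undefined there, and 0 is excluded from the sample values for the same reason.

```python
from fractions import Fraction
from itertools import product

def compound(x, y):
    """The expression of Example 1.20: (1 / (1/x + 1/y)) / (1 / (x + y))."""
    return (1 / (1 / x + 1 / y)) / (1 / (x + y))

values = [Fraction(n, 3) for n in range(-6, 7) if n != 0]
ok = all(
    compound(x, y) == x * y
    for x, y in product(values, repeat=2)
    if x + y != 0          # the expression is undefined when x + y = 0
)
print(ok)  # True
```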

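Exercise 1.21 Simplify the following expressions as much as you can by using the trick of multiplication by one.

$$\text{(a)}\ \frac{1}{1 - \frac{1}{1 + \frac{1}{x}}} \qquad \text{(b)}\ \frac{a - \frac{1}{a}}{1 - \frac{1}{a}} \qquad \text{(c)}\ \frac{u + 1}{\frac{1}{u - 1} + 1}$$

$$\text{(d)}\ \frac{a}{a - 1} - \frac{1}{1 - a} \qquad \text{(e)}\ 1 + \frac{4x}{(x - 1)^2} \qquad \text{(f)}\ \frac{a}{x - a} + \frac{x - a}{x + a} - \frac{x + a}{a}$$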

Proof strategy: Counter-examples


How do we prove that something is false? Let us consider an example (where we use the
notation discussed in Remark 1.3).

Example 1.22 Suppose that someone claims that the following formula is true:

$$\forall a, b \in \mathbb{R} : (a + b)^2 = a^2 + b^2. \qquad (\ast)$$

To prove that this statement is false, all we need to do is to find a counter-example. That is, a pair of numbers a, b so that the left and right-hand sides are not equal. In fact, if we plug, say, a = 1, b = 1, into the formula, we get
4 = 2.
Since this is clearly false, we conclude that the formula cannot hold for all a, b. Done!

However, if we replace formula (∗) in the above example by

$$\exists a, b \in \mathbb{R} : (a + b)^2 = a^2 + b^2, \qquad (\ast\ast)$$

then what we did above does not prove anything. In fact, statement (∗∗) is true.

Exercise 1.23 What would you have to do to prove (∗∗)? Do this.

Remark 1.24 Note that by multiplying out, we get $(a + b)^2 = a^2 + 2ab + b^2$. Now this (correct) formula certainly looks different from (∗). However, this is not enough to conclude that (∗) is false. Indeed, formulas may look different, but still be the same (of course, this is not the case here).
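Hunting for a counter-example by hand can be partly automated. In the sketch below, our own helper `find_counterexample` compares the two sides of a claimed formula on every pair drawn from a finite list of integer candidates and returns the first pair where they differ. A returned pair disproves a "for all" statement such as (∗). A result of None proves nothing, since only finitely many values were tried.

```python
from itertools import product

def find_counterexample(lhs, rhs, candidates):
    """Return the first (a, b) with a, b in candidates and lhs(a, b) != rhs(a, b), else None."""
    for a, b in product(candidates, repeat=2):
        if lhs(a, b) != rhs(a, b):
            return a, b
    return None

candidates = range(-3, 4)
# Claim (*): (a + b)^2 = a^2 + b^2 for all a, b.
print(find_counterexample(lambda a, b: (a + b) ** 2,
                          lambda a, b: a ** 2 + b ** 2, candidates))           # (-3, -3)
# The correct expansion from Remark 1.24 survives the search.
print(find_counterexample(lambda a, b: (a + b) ** 2,
                          lambda a, b: a ** 2 + 2 * a * b + b ** 2, candidates))  # None
```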

Exercise 1.25 Are the following formulas true? For each, first (i) try to find a counter-example to the formula. If you cannot do this, then (ii) try to find a chain of equalities that proves the formula.
(a) $\frac{1}{a(a+1)} = \frac{1}{a} - \frac{1}{a+1}$, ∀a ∈ R such that a ≠ −1 and a ≠ 0.
(b) $\frac{1}{a+b} = \frac{1}{a} + \frac{1}{b}$, ∀a, b ∈ R such that a ≠ 0, b ≠ 0, a + b ≠ 0.
(c) $(a + b)^2 = a^2 + b^2$, ∀a, b ∈ R.
(d) $1 + a + a^2 + a^3 + a^4 + a^5 = \frac{1 - a^6}{1 - a}$, ∀a ∈ R such that a ≠ 1.
(e) (a + b + c + d)e = ae + be + ce + de, ∀a, b, c, d, e ∈ R.

1.3 Proof by chains of implications and cases


In this section we consider the strategies proof by chains of implications and equivalences
as well as proof by cases. Our main focus will be to use these techniques to solve equations.

First: What is an equation?

On the right, you see an expression written by a student on some exam. To understand whether or not the expression makes sense, we need to understand what the student meant by it. Indeed, here are two fundamentally different ways of interpreting the expression:

Fig. 8. Is this correct?

1) As a formula/identity. This was essentially the topic of the previous section. Interpreted as a formula, we would typically claim that the expression in Figure 8 holds “for all” numbers x. To prove a formula, we often use a chain of equalities.

2) As an equation. This is the topic of this section. From this point of view, we ask: for which x is the left-hand side equal to the right-hand side? Does it hold for all x, some x or no x? To investigate this, we study how to solve equations. To this end, chains of equivalences and implications will come in handy.

At this point, we introduce the “curly bracket” notation for sets¹.

Example 1.26 (Curly bracket notation for sets) Anything written inside of curly brackets {. . .} should be read as “the set of”. For instance, the set of complex numbers is expressed as $\mathbb{C} = \{a + ib : a, b \in \mathbb{R} \text{ and } i^2 = -1\}$. Here, the “:” means “such that”.

This notation is often used in combination with the symbols ∀, ∃, ∈ (see Remark 1.3).

Exercise 1.27 Express the set Q using the above notation.


Remark: Although these symbols will be used throughout this course, you can probably
avoid using them. The main thing is to be able to read them.
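If you have met Python (see Chapter 10), its set comprehensions closely resemble this curly bracket notation, which may help when reading it. In the sketch below, a finite range of integers stands in for R. That assumption is forced on us because a computer cannot run through all real numbers.

```python
# {x ∈ S : x^2 = 1}, with S a finite stand-in for R
S = range(-5, 6)
solutions = {x for x in S if x**2 == 1}
print(solutions == {-1, 1})  # True

# {(a, b) : a, b ∈ S and a + b = 0}, a set of pairs
pairs = {(a, b) for a in S for b in S if a + b == 0}
print(len(pairs))  # 11
```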

Exercise 1.28 The existence of solutions to an equation may depend on the context.
Indeed, do you recognise the following sets?

(a) $\{x \in \mathbb{R} : x^2 + 1 = 0\}$ (b) $\{x \in \mathbb{C} : x^2 + 1 = 0\}$

Hint: Here you may need the symbol ∅ which denotes the set with no elements.
¹ A set is a collection of elements. For instance, in these lecture notes, we usually consider sets of real numbers such as the interval [1, 3].

What do we mean by implications and equivalences?


Let us first explain what we mean by equivalences and implications. Informally, we say that two statements are equivalent if they mean exactly the same thing, or rather, if they are always either both true or both false. Since we do not want to go too far into the field of logic, we illustrate what we mean in the following examples:

Example 1.29 For all x ∈ R, the following statements are all true:
(i) x = 1 is equivalent to x − 1 = 0.
(ii) x = 1 is equivalent to 2x = 2.
(iii) x = 1 is not equivalent to x − 2 = 0.
(iv) x = 1 is not equivalent to $x^2 = 1$.

If two statements are equivalent, we can denote this by using the symbol ⇐⇒ .

Example 1.30 The two first statements in the previous example can be written as:
(i) x = 1 ⇐⇒ x − 1 = 0
(ii) x = 1 ⇐⇒ 2x = 2

Note that while the statements x = 1 and $x^2 = 1$ from part (iv) of Example 1.29 are not equivalent, they are still related. We use the implication arrows =⇒ and ⇐= to indicate that the truth of one statement implies the truth of another statement. This allows us to express relations between statements that are weaker than equivalence.

Example 1.31 For all x ∈ R, the following statements hold:

(i) $x = 1 \Longrightarrow x^2 = 1$
(ii) $x = 1 \;\not\Longleftarrow\; x^2 = 1$.

Here, (i) expresses the fact that if x = 1 holds, then so does $x^2 = 1$, and (ii) expresses the fact that if $x^2 = 1$ holds for x, then it is not necessarily true that x = 1 (it could be that x = −1).
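Over a finite sample, an implication A =⇒ B can at least be tested: it fails exactly when some x makes A true and B false. The Python sketch below uses our own helper `implication_counterexample`. It finds no such x for (i), and finds the witness x = −1 for the reverse direction, matching (ii). Only the negative result is conclusive; direction (i) still needs the one-line argument that x = 1 gives x · x = 1 · 1 = 1.

```python
def implication_counterexample(sample, A, B):
    """Return the first x in sample with A(x) true and B(x) false, or None if A => B holds on the sample."""
    for x in sample:
        if A(x) and not B(x):
            return x
    return None

def is_one(x):
    return x == 1

def square_is_one(x):
    return x**2 == 1

sample = range(-10, 11)
print(implication_counterexample(sample, is_one, square_is_one))  # None: x = 1 => x^2 = 1 survives
print(implication_counterexample(sample, square_is_one, is_one))  # -1: x^2 = 1 but x != 1
```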

The relation between implications and equivalences is as follows:

Remark 1.32 Let A and B be two statements. Then A ⇐⇒ B means exactly that
we have both A =⇒ B and A ⇐= B.

Proof strategy: Proof by cases


We now illustrate the proof technique proof by cases by establishing the following fact,
which we shall need when solving quadratic equations.

Proposition 1.33 For all a,b ∈ R, we have ab = 0 ⇐⇒ a = 0 or b = 0.

Exercise 1.34 Before reading on, we ask you to observe that one half of the equivalence in the above proposition has already been proved. Which one?

Example 1.35 (Proof of Proposition 1.33 using "proof by cases")


We are now going to prove the remaining part of Proposition 1.33. That is, for all
a,b ∈ R, we want to show that ab = 0 =⇒ a = 0 or b = 0. The proof strategy is to
consider the two following "cases" separately:

• Case 1: When a = 0.
• Case 2: When a ≠ 0.
Note that together these two cases cover all possible situations. This means that if we are able to prove the desired conclusion in each case, then the proposition holds.
Here are the details for each case:
• Case 1: We suppose that ab = 0 and that a = 0. In this case, there is nothing to prove, as this means that the statement "a = 0 or b = 0" is true. Done!
• Case 2: We suppose that ab = 0 and that a ≠ 0. In this case, we know by axiom (M4) that $a^{-1}$ exists. But this means that we have the following chain of equalities:

$$b = 1 \cdot b = (a^{-1} a) \cdot b = a^{-1}(ab) = a^{-1} \cdot 0 = 0.$$

That is, if a ≠ 0, then we must have b = 0.
In conclusion, we have shown that in both cases we must have either a = 0 or b = 0.
Since the cases cover all possible situations, we are done!

Exercise 1.36 Prove that abc = 0 ⇐⇒ a = 0 or b = 0 or c = 0 in two ways:

(a) By doing a proof by cases.


(b) By applying Proposition 1.33.

Preparation for proof strategy: Proof by implications and equivalences


We now establish the following result, which we will use below to solve
equations using the proof strategy "chains of equivalences".

Proposition 1.37 (Operations preserving solutions of equations)


(i) For all numbers a, b, c we have
a = b ⇐⇒ a + c = b + c.
(ii) Moreover, if c ≠ 0, we have
a = b ⇐⇒ a · c = b · c.

Proof. We prove (i) (leaving (ii) as an exercise). First, we note that by Remark 1.32,
we can split the equivalence into the statements

a = b =⇒ a + c = b + c and a = b ⇐= a + c = b + c.

We can now consider how to prove these two statements one at a time:
Proof of " =⇒ " direction: We are given that a = b is true. But then we can use the
replacement rule (axiom (E₂)) to establish the (rather short) chain of equalities
a + c = b + c, and we are done.
Proof of " ⇐= " direction: We are now given that a + c = b + c. But this means we
can establish the following chain of equalities:

a = a + (c − c) = (a + c) − c

= (b + c) − c = b + (c − c) = b,

and again, we are done.
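As a quick sanity check (not a proof), both parts of Proposition 1.37 can be tested on a finite sample of rational numbers. The sketch below is a minimal illustration that assumes exact arithmetic via Python's standard `fractions.Fraction`. The helper names (`check_addition`, `check_multiplication`, `SAMPLES`) are hypothetical and exist only for this example. The last line shows why part (ii) needs the hypothesis c ≠ 0: with c = 0, the "⇐=" direction fails.

```python
from fractions import Fraction

# Hypothetical finite sample of rationals; checking it is a sanity check, not a proof.
SAMPLES = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]

def same_truth_value(p: bool, q: bool) -> bool:
    """True when the two statements are both true or both false (an equivalence)."""
    return p == q

def check_addition(samples) -> bool:
    # Proposition 1.37 (i): a = b  <=>  a + c = b + c, for every c in the sample.
    return all(
        same_truth_value(a == b, a + c == b + c)
        for a in samples for b in samples for c in samples
    )

def check_multiplication(samples, c) -> bool:
    # Proposition 1.37 (ii) for one fixed multiplier c: a = b  <=>  a * c = b * c.
    return all(same_truth_value(a == b, a * c == b * c) for a in samples for b in samples)

print(check_addition(SAMPLES))                     # True
print(check_multiplication(SAMPLES, Fraction(2)))  # True
print(check_multiplication(SAMPLES, Fraction(0)))  # False: 1 * 0 = 2 * 0 although 1 != 2
```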

Exercise 1.38 (a) Prove part (ii) of the above proposition.


(b) There exists a shorter (and more reasonable) proof of the direction " ⇐= " in
part (i) of the above proposition. Try to figure out what it is.
Remark: If you are confused by what we are asking for in part (b), postpone this problem
until you have read the next example and worked through the problems following it.

Exercise 1.39 (a) Prove that x² = 1 ⇐⇒ x = 1 or x = −1.
(b) More generally, prove that for all a ∈ R we have x² = a² ⇐⇒ x = a or x = −a.
Hint: x² = 1 ⇐⇒ x² − 1 = 0.

Example of proof by chains of equivalences: Linear equations


We now give an example of how to use what we have seen above to solve certain equations
using a chain of equivalences. Note that it makes perfect sense to call a correct solution
of an equation a proof.

Example 1.40 (Chain of equivalences) We wish to solve the linear equation


9x + 3 = 5x + 2.
That is, we seek to identify all x ∈ R that make the left-hand side equal to the right-hand
side. Since the expressions on each side of the equality are just numbers, we can use
Proposition 1.37 (repeatedly) to produce the following chain of equivalences:
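$$\begin{aligned}
9x + 3 = 5x + 2 &\overset{\text{add } (-3)}{\iff} 9x = 5x - 1 \\
&\overset{\text{add } (-5x)}{\iff} 4x = -1 \overset{\text{multiply by } 1/4}{\iff} x = -\tfrac{1}{4}.
\end{aligned}$$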

Since the first line is equivalent to the last, this chain of equivalences tells us exactly
which x solves the original equation, and we are done!
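The chain of equivalences above follows the same three steps for any equation of the form ax + b = cx + d. The sketch below mirrors those steps with exact fractions. The function name `solve_linear` and its argument order are illustrative assumptions, not notation from the notes. The final division is only allowed when a − c ≠ 0, which is exactly the hypothesis of Proposition 1.37 (ii).

```python
from fractions import Fraction

def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d exactly, following a chain of equivalences.

    Raises ValueError when a == c, because then multiplying by 1/(a - c)
    is not allowed (the equation has no solution or every x solves it).
    """
    a, b, c, d = map(Fraction, (a, b, c, d))
    # add (-b):    a*x + b = c*x + d  <=>  a*x = c*x + (d - b)
    # add (-c*x):                     <=>  (a - c)*x = d - b
    coeff, rhs = a - c, d - b
    if coeff == 0:
        raise ValueError("a == c: cannot multiply by 1/(a - c)")
    # multiply by 1/(a - c), which is an equivalence since a - c != 0
    return rhs / coeff

print(solve_linear(9, 3, 5, 2))  # -1/4
```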

For emphasis, we repeat that the computation in the above example qualifies as a
proof that 9x + 3 = 5x + 2 holds if and only if x = −1/4. Indeed, it is a logically correct,
and justified, argument connecting these two statements.
Exercise 1.41 Solve the following equations using chains of equivalences.
$$\text{(a) } 3x + 1 = 5x - 2 \qquad \text{(b) } 1 - 6(x - 2) = 3(x + 1) \qquad \text{(c) } 2x + \tfrac{1}{2}(x + 1) = \tfrac{15}{2} + 6x.$$

Remark 1.42 (Chains of equivalences versus chains of equalities) We now point
out that when two statements A and B are equations, then by A ⇐⇒ B, we mean
that A and B have exactly the same solutions. Similarly, by A =⇒ B we mean that all
solutions of A are also solutions of B, but not necessarily vice versa. In particular, this
means that we can think of the above solution strategy as a chain of equalities between
sets as follows:

{x ∈ R : 9x + 3 = 5x + 2} = {x ∈ R : 9x = 5x − 1}

= {x ∈ R : 4x = −1}

= {−1/4}.

In this way, we see that the solution strategies of chains of equivalences and chains of
equalities are closely connected.

Example of proof by chains of implications: Quadratic equations


We now use the techniques studied above to prove the pq-formula[2].

Proposition 1.43 (the pq-formula) The equation x² + px + q = 0 has exactly the
solutions
$$x = -\frac{p}{2} \pm \sqrt{\frac{p^2}{4} - q}.$$
In particular, these solutions only make sense (as real numbers) if p²/4 − q ≥ 0.
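The pq-formula translates directly into a few lines of code. The minimal sketch below assumes floating-point inputs and uses Python's `cmath.sqrt`, so that a negative value of p²/4 − q produces complex roots instead of an error, in the spirit of footnote 2. The name `pq_roots` is a hypothetical choice. Because floating-point arithmetic is involved, results should be compared with a tolerance rather than with ==.

```python
import cmath

def pq_roots(p, q):
    """Return the two roots of x**2 + p*x + q = 0 given by the pq-formula.

    cmath.sqrt accepts negative arguments, so when p**2/4 - q < 0 the two
    roots come out as complex numbers.
    """
    root = cmath.sqrt(p * p / 4 - q)
    return (-p / 2 + root, -p / 2 - root)

print(pq_roots(5, 4))  # the roots -1 and -4 (returned as complex numbers)
print(pq_roots(0, 1))  # the roots i and -i
```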

The following example contains all the secrets of how to prove the pq-formula. Notice
that the point is to combine Proposition 1.33 with a trick we have seen before: adding
zero! Also, notice that you will be asked to finish the job in the exercise following the
example.

Example 1.44 Let us solve the quadratic equation x² + 5x + 4 = 0.
If we disregard the pq-formula, the only quadratic equations we know how to solve
are those of the form x² = a (recall Exercise 1.39). So, our goal is to rewrite the above
equation in basically this form!
So, what to do? Well, the point is to adjust the constant in the expression x² + 5x + 4 =
0 so that the left-hand side can be written as (x + c)² for some number c. Here, the
correct thing to do turns out to be to add 25/4 to both sides and then tidy up a bit:

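$$\begin{aligned}
x^2 + 5x + 4 = 0 &\overset{\text{add } (+25/4)}{\iff} x^2 + 5x + \tfrac{25}{4} + 4 = \tfrac{25}{4} \\
&\overset{\text{add } (-4)}{\iff} x^2 + 5x + \tfrac{25}{4} = \tfrac{9}{4} \\
&\iff \left(x + \tfrac{5}{2}\right)^2 = \tfrac{9}{4}.
\end{aligned}$$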
How could we know that this would work? Well, (x + c)² = x² + 2cx + c². So, for x² + 5x
to match the first two terms of this expression, we need 2c = 5. Moreover, with c = 5/2,
we get that (x + 5/2)² = x² + 5x + 25/4. This is why we smuggled an extra 25/4 into
the left-hand side of the above expression.
How does this help? The big deal is that x now appears only once, inside a square, so we
can use the result of Exercise 1.39 to solve the equation:
$$\left(x + \tfrac{5}{2}\right)^2 = \tfrac{9}{4} \iff x + \tfrac{5}{2} = \pm\tfrac{3}{2} \iff x = -\tfrac{5}{2} \pm \tfrac{3}{2} \iff x = -1 \text{ or } x = -4.$$
[2] In order to keep the discussion simple, we assume for the moment that all real numbers have square
roots (real or imaginary). We will address this point later in the course.

Exercise 1.45 Repeat the steps from the above example to solve x² + px + q = 0.

We can push the insights from the above example a bit further by looking at basically
the same computations in a slightly different way. This gives us a way to factorise
second-degree expressions. The only additional ingredient that we will need is the formula
(a + b)(a − b) = a² − b².

Example 1.46 Let us factorise the quadratic polynomial x² + 3x + 2. Notice that we
are not trying to solve an equation here. Instead, we are looking for a way to rewrite
the polynomial in a different form.
We begin by completing the square. That is, by replacing the constant term, we try
to "absorb" the terms x² + 3x into an expression of the form (x + c)². We do this as
follows (where the trick is to add 0 in a useful way):
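$$\begin{aligned}
x^2 + 3x + 2 &= x^2 + 3x + \tfrac{9}{4} - \tfrac{9}{4} + 2 \\
&= \left(x^2 + 3x + \tfrac{9}{4}\right) + \left(-\tfrac{9}{4} + 2\right) \\
&= \left(x + \tfrac{3}{2}\right)^2 - \tfrac{1}{4}.
\end{aligned}$$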
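Next, we use the formula (a + b)(a − b) = a² − b² with a = (x + 3/2) and b = 1/2 to
obtain:
$$\begin{aligned}
x^2 + 3x + 2 &= \Big(\underbrace{x + \tfrac{3}{2}}_{=a}\Big)^2 - \Big(\underbrace{\tfrac{1}{2}}_{=b}\Big)^2 \\
&= \underbrace{\left(x + \tfrac{3}{2} + \tfrac{1}{2}\right)}_{=a+b}\underbrace{\left(x + \tfrac{3}{2} - \tfrac{1}{2}\right)}_{=a-b} = (x + 2)(x + 1).
\end{aligned}$$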

And we are done! (You should verify that this is correct by multiplying out the final
expression.)
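Completing the square is a purely mechanical step. Matching (x + c)² = x² + 2cx + c² against x² + px forces c = p/2, and the leftover constant is d = q − c². The sketch below computes the pair (c, d) with exact fractions. The function name `complete_square` is a hypothetical choice.

```python
from fractions import Fraction

def complete_square(p, q):
    """Return (c, d) such that x**2 + p*x + q == (x + c)**2 + d for every x.

    Matching (x + c)**2 = x**2 + 2*c*x + c**2 against x**2 + p*x forces
    c = p/2; the constant d = q - c**2 then accounts for the rest.
    """
    c = Fraction(p) / 2
    d = Fraction(q) - c * c
    return c, d

c, d = complete_square(3, 2)
print(c, d)  # 3/2 -1/4
```

For x² + 3x + 2 this reproduces the computation in Example 1.46. Since d = −1/4 = −(1/2)², the difference-of-squares step then gives (x + 2)(x + 1).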

Exercise 1.47 Use the factorisation found in Example 1.46 to solve x² + 3x + 2 = 0.


Exercise 1.48 Prove that x = a and x = b are all the solutions of the quadratic equation
x² + px + q = 0 if and only if x² + px + q = (x − a)(x − b).
Hint: Here you need to prove two "directions". One is almost trivial (as your first
step, you should try to figure out which it is). For the remaining direction, you can
use that by the pq-formula you have expressions for the solutions of the equation (that
is, for a and b). Do these match what you get if you complete the square of x² + px + q?
Finally, factorise like we did in Example 1.46.

A mathematical milestone: The fundamental theorem of algebra


We now point out that by combining what we have seen in the last few examples and
exercises, we obtain the following result.

Proposition 1.49 (Zeroes of quadratic polynomials) The equation x² + px + q = 0
always has two roots x = a and x = b. Moreover, we can always write x² + px + q =
(x − a)(x − b).

Observe: It may be that a = b, or that both a and b are complex numbers. Indeed,
x² + 2x + 1 = (x + 1)² and x² + 1 = (x − i)(x + i) are examples of such situations. In
the former case, we say that the root x = −1 is repeated, or that it has multiplicity 2.

Exercise 1.50 Suppose x² + px + q = 0 has a root of multiplicity two. (a) What
relation does this imply between the coefficients p and q? (b) Given that p, q are real
numbers, can a repeated root be non-real? If so, give an example; if not, give a proof.

But why stop at quadratic equations (also called second-degree equations) when one
can also study nth-degree equations
$$x^n + c_{n-1}x^{n-1} + c_{n-2}x^{n-2} + \cdots + c_2x^2 + c_1x + c_0 = 0?$$
(Here, we use the subscripted letters c₀, c₁, c₂, … to avoid running out of symbols. Note
that the expression on the left-hand side is called an nth-degree polynomial.)
One of the major achievements of early algebra, whose proof is beyond the scope of
this course, is the following extremely powerful result. It tells us that Proposition 1.49
has an extension to polynomials of all degrees.

Theorem 1.51 (The Fundamental Theorem of Algebra) The equation
$$x^n + c_{n-1}x^{n-1} + \cdots + c_1x + c_0 = 0$$
has exactly n solutions r₁, r₂, …, rₙ in the complex plane (counting repetitions). Moreover,
in terms of these zeroes, the following factorisation holds:
$$x^n + c_{n-1}x^{n-1} + \cdots + c_1x + c_0 = (x - r_1)(x - r_2)\cdots(x - r_n).$$
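The factorisation in Theorem 1.51 can always be multiplied back out mechanically. The sketch below covers only this easy direction, from roots to coefficients, and does not find roots; finding them is the hard part that the theorem does not address. It assumes that coefficients are stored as a Python list with the highest power first, and the name `expand_from_roots` is hypothetical.

```python
def expand_from_roots(roots):
    """Multiply out (x - r1)(x - r2)...(x - rn).

    Returns the coefficients [1, c_{n-1}, ..., c_1, c_0], highest power first.
    """
    coeffs = [1]  # the empty product is the constant polynomial 1
    for r in roots:
        # (x - r) * P(x) = x * P(x) - r * P(x)
        shifted = coeffs + [0]                  # coefficients of x * P(x)
        scaled = [0] + [r * a for a in coeffs]  # coefficients of r * P(x), aligned
        coeffs = [s - t for s, t in zip(shifted, scaled)]
    return coeffs

print(expand_from_roots([-1, -1]))  # [1, 2, 1], i.e. x^2 + 2x + 1 = (x + 1)^2
print(expand_from_roots([-1, -4]))  # [1, 5, 4], i.e. x^2 + 5x + 4
```

Passing the complex roots i and −i, written as `[1j, -1j]`, gives the coefficients of x² + 1 in the same way, up to complex zeros.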

A problem with the Fundamental Theorem of Algebra is that it does not tell us
what the zeroes of the equation are. It just tells us that they always exist (at least as
complex numbers), and that we can use them to factorise the polynomial. This should be
compared to the pq-formula, which tells us what the zeroes are! During the Renaissance,
finding pq-formulas for equations of degree higher than 2 was one of the main research
questions in mathematics. These efforts were successful for degrees 3 and 4 (although
the formulas are too complicated to be of any practical use). But, finally, in the 19th
century, it was proven that no such general formula (built from the coefficients using
arithmetic operations and roots) exists for degrees 5 and higher.

1.4 Proof by contradiction and by contraposition


In this section, we consider the indirect proof strategies proof by contradiction and proof
by contraposition, and apply these to the study of inequalities. Note that this requires
us to formulate part 2 of our rulebook for the real numbers.

Axioms for inequalities


First some good news: most of what we have done so far for equations also applies to
inequalities (but with some notable exceptions). Indeed, consider the following example:

Example 1.52 To solve the inequality 3x + 2 < 5, the procedure is essentially identical
to the one for equations:
3x + 2 < 5 ⇐⇒ 3x < 3
⇐⇒ x < 1

So, do the solution methods for equations and inequalities always behave in the same
way? Well, no. Consider the following exercise.

Exercise 1.53 Here are three suggested solutions of the inequality x − 2 > −8 − 2x.
Are any of them correct? If so, which one(s)?

Exercise 1.54 In Figure ?? of Appendix ??, we illustrate visually why it makes sense
for a < b to be equivalent to −b < −a when both a, b are positive. Verify that the
implication a < b =⇒ −a > −b also holds visually if:

(a) a and b start out on opposite sides of the origin, or
(b) a and b are both negative.

To figure out what we are allowed to do with inequalities, we need to expand our
rulebook! Here are the additional axioms that we need to deal with inequalities.

The rulebook for R (part 2 of 4) Inequalities are governed by the following axioms:

(I₁) For all x, y, z we have x < y =⇒ x + z < y + z.
(I₂) For all x, y and all c > 0 we have x < y =⇒ cx < cy.
(I₃) For all x, y, z we have x < y and y < z =⇒ x < z.
(I₄) For all x, y exactly one of x < y, x = y or y < x holds.
Moreover, by a ≤ b we mean that either a < b or a = b holds.

Apparently, we only need four additional axioms to deal with inequalities. While it
is nice that we do not need more, this also means that there is much for us to figure out
on our own. For instance:

1. What happens if we multiply an inequality by a negative number?
2. Do the statements of (I₁) and (I₂) also hold with ⇐⇒ instead of only =⇒ ?
3. Is it true that 1 > 0?

To answer these questions, we formulate the following result. It directly answers
questions 1 and 2 above (you will be asked to answer question 3 in the exercises below).

Proposition 1.55 The following properties hold for inequalities:

(i) For all x, y, z we have x < y ⇐⇒ x + z < y + z.
(ii) For all x, y and all c > 0 we have x < y ⇐⇒ cx < cy.
(iii) For all x, y and all c < 0 we have x < y ⇐⇒ cx > cy.
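In practice, Proposition 1.55 is what justifies each step when solving a linear inequality such as the one in Example 1.52. Part (i) lets us add a constant to both sides. Parts (ii) and (iii) tell us whether multiplying by 1/a keeps or reverses the direction of the inequality. The sketch below assumes a ≠ 0 and uses the fact that 1/a has the same sign as a (compare Exercise 1.57 (c)). The function name and its return format are hypothetical.

```python
from fractions import Fraction

def solve_linear_inequality(a, b, c):
    """Solve a*x + b < c for x, assuming a != 0.

    Returns (relation, bound), meaning "x < bound" or "x > bound".
    """
    a, b, c = map(Fraction, (a, b, c))
    if a == 0:
        raise ValueError("a must be nonzero")
    bound = (c - b) / a
    # Proposition 1.55 (i): adding -b keeps the direction, giving a*x < c - b.
    # Proposition 1.55 (ii)/(iii): multiplying by 1/a keeps the direction when
    # 1/a > 0 (that is, a > 0) and reverses it when 1/a < 0.
    return ("<", bound) if a > 0 else (">", bound)

print(solve_linear_inequality(3, 2, 5))   # ('<', Fraction(1, 1))
print(solve_linear_inequality(-2, 1, 7))  # ('>', Fraction(-3, 1))
```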

Exercise 1.56 Prove the ⇐= part of Proposition 1.55(i).


Exercise 1.57 Use what we have seen so far to do the following:

(a) Show that if x ≠ 0, then x² > 0.
(b) Show that 1 > 0 (hint: use (a)).
(c) Show that if x > 0 then 1/x > 0 (hint: use (a)).
(d) Show that if a, b > 0 then a < b ⇐⇒ a² < b².

Exercise 1.58 (a) Identify which parts of Proposition 1.55 we have not yet proved.
(b) Use what we have seen so far to prove the remaining parts.

Proof strategy: Proof by contradiction


We now introduce the technique "proof by contradiction". This requires us to first
explain what we mean by the logical negation of a statement.

Example 1.59 (Logical negations) Consider the statement

"x = 0 and y = 1".

Notice that this statement is false exactly if either x ≠ 0 or y ≠ 1. For this reason, we
say that "x ≠ 0 or y ≠ 1" is the logical negation of the statement "x = 0 and y = 1". If
we denote a statement by the symbol A, we can denote its negation by "not A" or ¬A.
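Example 1.59 is an instance of a general rule: "not (A and B)" has the same truth value as "(not A) or (not B)". The sketch below checks this in two ways. First, it uses concrete pairs (x, y) on a small integer grid, which is only a sanity check. Second, it runs through all four combinations of truth values for A and B, which does cover every case. The helper name `negation_claim_holds` is hypothetical.

```python
from itertools import product

def negation_claim_holds(x, y) -> bool:
    """Compare not("x = 0 and y = 1") with "x != 0 or y != 1" for one pair (x, y)."""
    return (not (x == 0 and y == 1)) == (x != 0 or y != 1)

# Concrete pairs on a small grid (a sanity check, not a proof).
print(all(negation_claim_holds(x, y) for x in range(-2, 3) for y in range(-2, 3)))  # True

# All truth values of A and B: not (A and B) equals (not A) or (not B).
print(all((not (A and B)) == ((not A) or (not B))
          for A, B in product([True, False], repeat=2)))  # True
```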

Exercise 1.60 Determine the logical negations of the following statements:
(a) x = 1 (b) a = 0 or b = 1 (c) a ∈ {0, 1, 2} (d) x > 3.

Exercise 1.61 Let A be a subset of R. Determine the logical negations of the following
statements:
(a) ∀x ∈ A, we have x ≥ 0 (b) ∃x ∈ A such that x ≥ 0.

We now give an example to illustrate what we mean by a "proof by contradiction".

Example 1.62 (A proof by contradiction) Could it be that there are real numbers
a such that 1/a = 0? Well, the answer is no, and our goal is to prove this. Since 1/a is not
defined for a = 0, the statement we seek to prove can be expressed as follows:

a ≠ 0 =⇒ 1/a ≠ 0.

To do this by a "proof by contradiction", the strategy is to start out by assuming that
both the original hypothesis and the logical negation of the desired conclusion hold at
the same time, and then derive something that is clearly absurd. We now go through
the details of this proof, and at the end, we try to explain why the strategy works.
Proof. Since the logical negation of "1/a ≠ 0" is "1/a = 0", the strategy just outlined
requires us to establish the following chain of implications

a ≠ 0 and 1/a = 0 =⇒ · · · =⇒ something that is clearly absurd. (∗)

Now, since a ≠ 0, we know by the axioms that the number 1/a exists,
and we have that a · (1/a) = 1. When combined with the assumption that 1/a = 0, we
get the following chain of equalities:

1 = a · (1/a) = a · 0 = 0,

which is clearly absurd, and so we are done!



When reading the above example, you might be annoyed and shout "Wait a minute!
How can establishing that 1 = 0 prove anything?" Well, the point is that one of the two
following possibilities must be true:
1. It is true that both a ≠ 0 and 1/a = 0 hold at the same time. But this implies that
we must also accept that 1 = 0. In particular, from this, it is possible to show that
all numbers are equal to 0, and so we would have to conclude that all of mathematics
is trivial.
2. It is not true that a ≠ 0 and 1/a = 0 hold at the same time. In particular, this
means that if a ≠ 0, then we must have 1/a ≠ 0, which is exactly the implication
we wanted to prove!
Now, if we believe that the rulebook for the real numbers contains only true statements,
we must reject the first possibility described above and accept the second. In other
words, deriving 1 = 0 from our assumptions means that we are done!
More generally, we can describe a proof by contradiction as follows. Suppose that A
and B are two mathematical statements, and we want to prove that
A =⇒ B.
To do this by a proof by contradiction, we would need to prove
A and "not B" =⇒ something clearly absurd.
Now, if we succeed in establishing a chain of implications proving that "A and not B"
implies something clearly false, then one of the following two possibilities has to hold:
1. A and "not B" are true at the same time. But then we have to accept that a valid
argument based on true assumptions can lead to false conclusions. In other words,
we have to accept that mathematics cannot be trusted.
2. A and "not B" cannot be true at the same time. In particular, this means that if A
holds, then "not B" is false. That is, it must be false that B is false, and so we
must conclude that B holds.
Since we believe that mathematics is to be trusted, we reject the first possibility, and
conclude that the second possibility must hold. That is, we conclude that A =⇒ B,
and the proof by contradiction is done!
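The argument above can be summarised in a truth table. The sketch reads "p =⇒ q" in the standard truth-table sense (false only when p is true and q is false), which is an assumption here since the notes do not define implication formally. Under that reading, "(A and not B) =⇒ something false" has exactly the same truth value as "A =⇒ B" in all four cases. The names `implies` and `ABSURD` are hypothetical.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # Truth-table reading of "p => q": false only when p is true and q is false.
    return (not p) or q

ABSURD = False  # stands for "something clearly absurd", such as the statement 1 = 0

rows = list(product([True, False], repeat=2))
same = all(implies(A, B) == implies(A and not B, ABSURD) for A, B in rows)
print(same)  # True
```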

Exercise 1.63 Do a proof by contradiction to show that 1 > 0.


Exercise 1.64 Do a proof by contradiction to show that x > 0 implies 1/x > 0.
Exercise 1.65 Formulate the argument of Example 1.7 as a proof by contradiction.
Exercise 1.66 Do a proof by contradiction to prove that ab = 0 =⇒ a = 0 or b = 0.
Exercise 1.67 Prove by contradiction that if a number a ≥ 0 has the property that
for all ε > 0 it holds that a < ε, then we must have a = 0.
Remark: This exercise may seem a bit odd, but we will actually need this result a couple
of times throughout the course (and it is a useful tool for research mathematicians).

Proof strategy: Proof by contraposition


A proof technique related to indirect proofs is “proof by contraposition”. If, for a state-
ment A, we let "not A" denote its logical negation, then the point is that the statements

A =⇒ B and not A ⇐= not B

are logically equivalent. Since we do not want to spend too much time on logic, here is
an example to convince you that this makes sense.

Example 1.68 Consider the statement “If Elias is in Lund, then he is in Sweden”. This
statement is in the form A =⇒ B. Now, taking negations, the statement corresponding
to "not B" =⇒ "not A" is “If Elias is not in Sweden, then he is not in Lund”. Notice
how these two statements say exactly the same thing!
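A truth table confirms that a statement and its contrapositive always agree, while the converse is a different statement. In the Elias example, "if he is in Sweden, then he is in Lund" is not equivalent to the original statement. As before, the sketch assumes the truth-table reading of "=⇒", and the helper `implies` is a hypothetical name.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # "p => q" fails only when p is true and q is false.
    return (not p) or q

rows = list(product([True, False], repeat=2))
# Contrapositive: (A => B) and (not B => not A) agree in every row.
contrapositive_ok = all(implies(A, B) == implies(not B, not A) for A, B in rows)
# Converse: (B => A) disagrees with (A => B) in some row (A true, B false).
converse_ok = all(implies(A, B) == implies(B, A) for A, B in rows)
print(contrapositive_ok, converse_ok)  # True False
```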

Here is a mathematical example:

Example 1.69 (Proof by contraposition) Suppose that we want to prove, say,
ab = 0 =⇒ a = 0 or b = 0. By the above, this is logically equivalent to proving the
contrapositive statement: a ≠ 0 and b ≠ 0 =⇒ ab ≠ 0. We leave the proof of this
statement as an exercise.

Exercise 1.70 Finish the proof of ab = 0 =⇒ a = 0 or b = 0 in the above example.

We end the discussions on the proof strategies "proof by contradiction" and "proof
by contraposition" with the following historical remark.

Remark 1.71 The technique "proof by contradiction" is rather extreme. Indeed, here
is what G. H. Hardy (whom we already met in Section 1.1) had to say about it:

“Reductio ad absurdum, which Euclid loved so much, is one of a mathematician’s
finest weapons. It is a far finer gambit than any chess gambit: a chess
player may offer the sacrifice of a pawn or even a piece, but a mathematician
offers the game.”

This quote is also from his book "A Mathematician’s Apology".



Revisiting proof by cases: how to solve inequalities


We now consider how to solve some inequalities in order to make the connection to what
we have done above. We begin with a warm-up exercise:

Exercise 1.72 First, determine for which x we have

(a) 9x − 3 > 0 (b) 4 − 2x > 0

Based on the information from (a) and (b), do a proof by cases to determine when
$$\text{(c)}\quad \frac{9x - 3}{4 - 2x} > 0.$$
Hint: To solve (c), all you need to know is that “negative times negative is positive”,
that “positive times negative is negative” and that “positive times positive is positive”.

Exercise 1.73 In fact, why do we know that “negative times negative is positive” and
so forth? Formulate this as a proposition, and prove the statement.
So, what is the point of the above exercises? Well, what they tell us is that if we have
an inequality in some factorised form, and we know exactly when each factor is positive
and negative, then we should be in good shape.
Let us consider an example.

Example 1.74 Let us solve the inequality
$$x - 2 < \frac{5}{x + 2}.$$
As a first step, we move everything to the same side of the inequality sign:
$$x - 2 < \frac{5}{x + 2} \iff 0 < \frac{5}{x + 2} - (x - 2).$$
Next, we want to simplify the right-hand side. Notice that we do not want to multiply
by anything depending on x, since this may result in the inequality being reversed for
some x and not for other x. So, we use a chain of equalities to rewrite the right-hand
side. The point is to get it in a factorised form similar to what we had in part (c) of
Exercise 1.72:
$$\begin{aligned}
\frac{5}{x+2} - (x - 2) &= \frac{5}{x+2} - (x - 2)\cdot\frac{x+2}{x+2} \\
&= \frac{5 - (x - 2)(x + 2)}{x+2} \\
&= \frac{9 - x^2}{x+2} = \frac{(3 - x)(3 + x)}{x+2}.
\end{aligned}$$

Notice that in the last step, we used the formula a² − b² = (a − b)(a + b) to factorise the
numerator.
Combining what we did above, we get
$$x - 2 < \frac{5}{x+2} \iff 0 < \frac{(3 - x)(3 + x)}{x+2}.$$
This means that we have rewritten the inequality in factorised form, and that we can
solve the inequality using the same approach as in part (c) of Exercise 1.72. Note that
to keep track of all the cases, it is practical to use a table of signs:

Fig. 9. In the first three lines of this table of signs, we keep track of where the indi-
vidual factors are positive and negative, respectively. This allows us to understand
the different cases we have to consider for the full product appearing in the last line.
Note that we use a skull to remind ourselves that a 0 in the denominator means that
the expression is not defined.
From the table of signs, we see that the expression is positive when x < −3 or −2 < x < 3,
which is also our final answer.
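The table of signs can also be produced by a short program. The factors 3 − x, 3 + x and x + 2 vanish at 3, −3 and −2. Each factor is linear, so none of them changes sign between consecutive zeros, and one test point per interval is enough. The particular test points and all names below are hypothetical choices for this sketch. The endpoints are not covered by the loop: at x = ±3 the expression equals 0, and at x = −2 it is undefined (the skull in the table).

```python
from fractions import Fraction

def sign(v) -> int:
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

# The three linear factors of (3 - x)(3 + x)/(x + 2); the last one is the denominator.
FACTORS = [lambda x: 3 - x, lambda x: 3 + x, lambda x: x + 2]

# One freely chosen test point in each open interval between the zeros -3, -2, 3.
INTERVALS = [
    ("x < -3", Fraction(-4)),
    ("-3 < x < -2", Fraction(-5, 2)),
    ("-2 < x < 3", Fraction(0)),
    ("x > 3", Fraction(4)),
]

def sign_table():
    rows = []
    for label, x in INTERVALS:
        signs = [sign(f(x)) for f in FACTORS]
        # The sign of a product or quotient is the product of the signs.
        total = signs[0] * signs[1] * signs[2]
        rows.append((label, signs, total))
    return rows

for label, signs, total in sign_table():
    print(f"{label:<12} factor signs {signs} -> expression {'+' if total > 0 else '-'}")

positive = [label for label, _, total in sign_table() if total > 0]
print(positive)  # ['x < -3', '-2 < x < 3']
```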

Exercise 1.75 Use the strategy from Example 1.74 to solve the following inequalities.
$$\text{(a) } \frac{9}{x+3} \ge 1 \qquad \text{(b) } (x - 2)(x + 3) \le x - 2 \qquad \text{(c) } \frac{x+2}{x-1} \le x - 2$$

Remark 1.76 Some students object to the use of tables of signs. Indeed, we are discussing
axioms at a fairly serious level, and suddenly these childish drawings with skulls appear.
Well, the tables of signs are just symbols to keep track of the cases involved in solving
the inequality. (It would be like an alien civilisation deciding that the symbols of our
alphabet are too “childish” to express anything serious.)

1.5 Proof by induction


In this section, we formulate the principle of induction. It is the first of two axioms that
specifically deal with the infinite, and it gives us the technique of proof by induction.

Third part of the rulebook for the real numbers


Before we formulate the principle of induction, we remark that it is not common to state
it as an axiom. Indeed, if we had been a bit more careful about how we defined the
natural numbers (which you were basically asked to do way back in Exercise 1.2), we
would have been able to deduce the induction principle as a consequence. However, to
avoid this rather delicate discussion[3], we chose to take the induction principle on faith
(as most textbooks on Calculus do anyway).

The rulebook for R (part 3 of 4) The natural numbers satisfy the induction principle:

(IP) If V ⊂ N satisfies the properties

(i) 0 ∈ V,
(ii) for all k ∈ V it holds that k + 1 ∈ V ,

then V = N.

As an axiom, the induction principle is rather bad since it is far from obvious. Keep
in mind that axioms are the only mathematical "truths" that we allow ourselves to
accept without any proof, and that they form the starting point for all of mathematics.
Therefore, it is critical that they are as self-evident as possible. Since we only take the
induction principle as an axiom for convenience (to avoid a technical discussion), this is
not really that big of a deal philosophically speaking (as we will see in the next section,
the opposite will be true for the Completeness Axiom).

Exercise 1.77 (Challenging) Try to figure out how to define the natural numbers
N in a way that reveals the induction principle as a natural consequence.
Remark: Although the required definition of N is actually not that complicated, finding
this on your own is rather hard – so you may want to search through some literature.

Before we look more closely at how to use the induction principle, let us briefly
consider whether or not it makes intuitive sense. To this end, we point out that it can
be interpreted as follows: “If you could count the natural numbers, one by one, for all
eternity, then you would be able to count all of them.” So, if the induction principle were
false, then there would be natural numbers that you could never “reach” by counting in
this way. Does this sound reasonable?
[3] This discussion is taken up in the course on the foundations of algebra.

Some preparation: notations for sums and sequences


A typical situation where you want to prove something by induction is if you are con-
sidering some sum. In this case, the "Sigma"-notation for sums may be useful:

Note the following peculiar thing: it does not matter which letter or symbol we use for
the index since the index itself never appears in the actual sum (notice that n does not
appear on the right-hand side). In particular, it is true that
$$\sum_{n=0}^{3} \frac{1}{2^n} = \sum_{k=0}^{3} \frac{1}{2^k}.$$
For this reason, the index variable is often called a "dummy variable".
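A programming analogy may help with the idea of a dummy variable. In the sketch below, the parameter name of the function passed as `term` plays the role of the summation index. Renaming it from n to k changes nothing, just as in the identity above. The helper `sigma` is a hypothetical name, and exact fractions are used so that the sums compare equal without rounding.

```python
from fractions import Fraction

def sigma(lower: int, upper: int, term):
    """Exact value of term(lower) + term(lower + 1) + ... + term(upper)."""
    return sum((term(i) for i in range(lower, upper + 1)), Fraction(0))

# The name of the lambda's parameter is a dummy variable: n or k, same sum.
s_n = sigma(0, 3, lambda n: Fraction(1, 2) ** n)
s_k = sigma(0, 3, lambda k: Fraction(1, 2) ** k)
print(s_n, s_k, s_n == s_k)  # 15/8 15/8 True
```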

Exercise 1.78 Explain why
$$\sum_{n=0}^{3} \frac{1}{2^n} = \sum_{n=3}^{6} \frac{1}{2^{n-3}} = \frac{1}{4}\sum_{n=-2}^{1} \frac{1}{2^n}.$$

We also introduce the notation for sequences of numbers:

Exercise 1.79 Express the following sequences in the above notation:
(a) (1, 1/2, 1/4, 1/8, 1/16) (b) (1, −1/2, 1/4, −1/8, 1/16) (c) (1, 1, 1, 1, 1).

We remark that if it is clear from the context what the indices are, we usually just
denote the above sequence by its formula 1/2ⁿ. Moreover, we express an infinite sequence
of numbers, such as 1, 1/2, 1/4, . . ., by writing
$$\left(\frac{1}{2^n}\right)_{n=0}^{\infty} = \left(1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots\right).$$

First example of proof by induction: Summation formulas


Let us now illustrate how we can use the induction principle to prove mathematical
statements. Specifically, our goal is to prove the following classical summation formula:
$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}. \tag{1.1}$$
In a way, the induction principle allows us to do an infinite version of "proof by cases".
The basic idea is that the formula is rather easy to prove for each of the integers n = 1,
n = 2, n = 3 separately (just compare the left and right hand sides in each case), and
that the induction principle allows us to capture the infinitely many remaining cases in
one clever step. In the following example, we illustrate this proof technique which we
call "proof by induction".

Example 1.80 (Gauss’ summation formula)
Let us prove the summation formula (1.1) shown above using induction. Notice that
this means exactly that we want to show that the set
$$V = \left\{ n : \sum_{j=1}^{n} j = \frac{n(n+1)}{2} \right\}$$
is equal to N∗. But this means that the induction principle now gives us a strategy to
complete the proof. Indeed, we would be done if we could check that
(i) 1 ∈ V
(ii) if k ∈ V then also k + 1 ∈ V.

Fig. 10. Carl Friedrich Gauss (1777–1855) is said to have proved this formula when he
was 6 years old.
Proof that 1 ∈ V: This is the so-called base case. Here, it consists in verifying
that the summation formula is true when n = 1. This clearly holds, since both sides are
equal to 1.
Proof that k ∈ V =⇒ k + 1 ∈ V: This is the so-called induction step. It consists
of assuming that k ∈ V (the induction hypothesis), and then proving that
k + 1 ∈ V must also be true. To prove that k + 1 ∈ V, our plan is to establish a chain
of equalities showing that
$$1 + 2 + \cdots + k + (k+1) = \cdots = \frac{(k+1)(k+2)}{2}.$$
Indeed, this is formula (1.1) with n replaced by k + 1. Our most important tool is the
induction hypothesis. Namely, formula (1.1) holds with the upper summation bound k.
We use this to get started:
$$1 + 2 + \cdots + k + (k+1) = \underbrace{1 + 2 + \cdots + k}_{=\,k(k+1)/2} + (k+1) = \frac{k(k+1)}{2} + (k+1).$$

Putting this expression on a common denominator, we arrive at
$$\frac{k(k+1)}{2} + \frac{2(k+1)}{2} = \frac{(k+1)(k+2)}{2},$$
exactly like we needed.
Conclusion: By the induction principle, V = N∗ . That is, formula (1.1) holds for
all positive integers.
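A computer can only check finitely many cases, so it cannot replace the induction argument. It can, however, mirror the structure of that argument. In the sketch below, the running sum is built exactly as in the induction step (sum up to k = sum up to k − 1, plus k), and each stage is compared with the right-hand side of (1.1). The bound 100 and the name `gauss` are arbitrary choices for illustration.

```python
def gauss(n: int) -> int:
    # Right-hand side of (1.1); n(n + 1) is always even, so integer division is exact.
    return n * (n + 1) // 2

running_sum = 0
for k in range(1, 101):
    # Mirrors the induction step: (sum up to k) = (sum up to k - 1) + k.
    running_sum += k
    assert running_sum == gauss(k), f"formula (1.1) fails for n = {k}"
print(gauss(100))  # 5050
```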

A good mental image for understanding the induction principle is in terms of an infinite
row of dominoes: If you knock down the first domino, and you know that each domino
will knock down the following domino, then the induction principle says that every
domino will fall – even if there is an infinite number of them!

Fig. 11. Pictured: induction.

Exercise 1.81 Use induction to prove that
$$\sum_{j=0}^{n} 2^j = 2^{n+1} - 1, \quad \forall n \in \mathbb{N}.$$

Exercise 1.82 What happens when you use induction to prove the following, false,
formula?
$$\sum_{j=0}^{n} 2^j = 2^{n+1} + 13, \quad \forall n \in \mathbb{N}.$$

Exercise 1.83 Use induction to prove the summation formula:
$$\sum_{j=1}^{n} j^2 = \frac{n(n+1)(2n+1)}{6}, \quad \forall n \in \mathbb{N}^*.$$

Exercise 1.84 One of the axioms says that c(a₁ + a₂) = ca₁ + ca₂. Use induction to
write out a careful proof of the more general formula
$$c \cdot \sum_{j=1}^{n} a_j = \sum_{j=1}^{n} c\,a_j.$$

Remark: The meaning of a "careful proof" is subjective and depends on context. Here,
you are asked to prove an "obvious" result which is not in the rulebook. This indicates
that you should point out steps that you otherwise would not (for instance, the use of
the distributive axiom, which we usually do not mention explicitly in order to make
proofs readable).

Second example of proof by induction: Integer powers


We now consider the rule for powers mentioned in Example 1.1, restricted to integer
powers. To prepare for this, we first give a definition of powers that is essentially
designed to be compatible with induction.

Definition 1.85 (Definition of integer powers) Suppose a ∈ R. Then we define aᵏ,
for all k ∈ N, as follows:

(i) a⁰ = 1,
(ii) for all k ∈ N we let aᵏ⁺¹ = aᵏ · a.

Moreover, if a ≠ 0, we define a⁻ᵏ = 1/aᵏ for k ∈ N.

Notice how each power of a is defined in terms of the previous power. Such a definition
is called inductive, and is well suited for proofs by induction, as we illustrate in the
following example.
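An inductive definition translates naturally into a recursive function. The sketch below follows rules (i) and (ii) of Definition 1.85 literally, and then handles negative exponents through a⁻ᵏ = 1/aᵏ. It is deliberately naive and is not how Python's built-in `**` works. The name `power` is hypothetical, and the recursion depth grows with |k|, so it is only meant for small exponents.

```python
from fractions import Fraction

def power(a, k: int):
    """Integer power of a, built only from Definition 1.85 (a sketch, not optimised)."""
    if k < 0:
        # a^(-k) = 1 / a^k; the division raises ZeroDivisionError when a == 0.
        return 1 / power(Fraction(a), -k)
    if k == 0:
        return 1                      # rule (i):  a^0 = 1
    return power(a, k - 1) * a        # rule (ii): a^k = a^(k-1) * a

print(power(2, 10))  # 1024
print(power(2, -3))  # 1/8
```

Because each call reduces k by one until it reaches the base case k = 0, reasoning about this function naturally takes the form of an induction on k, just as in the next example.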

Example 1.86 Let us prove that aᵐ · aⁿ = aᵐ⁺ⁿ for all m, n ∈ N. To this end, suppose
that m is fixed. We now prove by induction that for all n ∈ N, the formula holds.
Base case: For n = 0, the left-hand side is aᵐ · a⁰ = aᵐ · 1 = aᵐ. The right-hand
side is aᵐ⁺⁰ = aᵐ. This is exactly what we needed to show.
Induction step: We begin by assuming that the formula holds for n = k. That is,
we assume that aᵐ · aᵏ = aᵐ⁺ᵏ holds. We want to use this to prove that this formula
also holds when k is replaced by k + 1. We do this by the following chain of equalities:

aᵐ · aᵏ⁺¹ = aᵐ · aᵏ · a = aᵐ⁺ᵏ · a = aᵐ⁺ᵏ⁺¹.

(Here, we used the inductive definition of integer powers in the first and last equalities.
In the middle equality, we used the induction hypothesis.)
Conclusion: For each fixed m ∈ N, we have used the induction principle to prove that
aᵐ · aⁿ = aᵐ⁺ⁿ holds for all n ∈ N. That is, the formula holds for all m, n ∈ N.

Exercise 1.87 Strictly speaking, to apply the induction principle in the above example,
we need to define some suitable set V. Do this.
Exercise 1.88 (a) Use induction to prove aᵐ/aⁿ = aᵐ⁻ⁿ, for a ≠ 0 and m, n ∈ N.
(b) Prove that the formulas from Example 1.86 and part (a) of this problem hold for all
m, n ∈ Z (when a ≠ 0).
Exercise 1.89 Prove the power rule from Example 1.1 for integer powers. That is,
prove that for n ∈ Z and a, b ∈ R (with a, b ≠ 0 if n < 0), we have
(ab)ⁿ = aⁿbⁿ.

Hint: Use induction to prove this statement for n ∈ N, and then show that the
statement for negative n follows as a consequence.
34 CHAPTER 1. A CRASH COURSE ON MATHEMATICAL PROOFS

Third example of proof by induction: Pascal’s triangle


To set up a last application of proof by induction, we now make a bit of a detour to briefly
visit a beautiful observation due to Blaise Pascal (1623–1662). Indeed, Pascal observed a
pattern in the following expressions:
$$\begin{aligned}
(a + b)^1 &= a + b\\
(a + b)^2 &= a^2 + 2ab + b^2\\
(a + b)^3 &= a^3 + 3a^2 b + 3ab^2 + b^3\\
(a + b)^4 &= a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4
\end{aligned}$$

Fig. 12. Fun fact: At the age of 31, Pascal gave up science to be a monk since God fixed his toothache.

To explain the pattern, Pascal introduced his famous triangle:

Fig. 13. This is Pascal’s triangle. If you let the 1 on top be the zeroth line, then
the n’th line tells you the coefficients you get when you multiply out $(a + b)^n$. To
the right, we indicate how each line can be deduced from the previous one.

Exercise 1.90 (a) Use the instructions in the above figure to compute the next line.
(b) Multiply out $(a + b)^5$ and compare (feel free to use the expression for $(a + b)^4$).

Even if the above observation is nice, how do we compute the coefficients of, say,
$(a + b)^{500}$? Pascal figured out how one can do this directly, without having to go through
the triangle line by line. To formulate his result, he introduced the symbol $\binom{n}{m}$ to denote
the m’th coefficient on the n’th line of the triangle (he called them binomial coefficients):

Fig. 14. Here is Pascal’s triangle expressed in terms of binomial coefficients.



Definition 1.91 (binomial coefficients) For all n, j ∈ N, we define

(i) $\binom{n}{0} = \binom{n}{n} = 1$,
(ii) $\binom{n+1}{j} = \binom{n}{j-1} + \binom{n}{j}$ for $1 \le j \le n$.
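Definition 1.91 is itself inductive, so it can be turned directly into a recursive function. In the following Python sketch (the names `binom` and `pascal_row` are hypothetical, chosen for illustration), the standard `functools.lru_cache` decorator stores coefficients that have already been computed, so that the same coefficient is not recomputed over and over.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def binom(n, j):
    """Binomial coefficient computed directly from Definition 1.91."""
    if not 0 <= j <= n:
        raise ValueError("need 0 <= j <= n")
    if j == 0 or j == n:
        return 1                                  # rule (i)
    return binom(n - 1, j - 1) + binom(n - 1, j)  # rule (ii), with n - 1 in place of n


def pascal_row(n):
    """Line n of Pascal's triangle (the 1 on top is line 0)."""
    return [binom(n, j) for j in range(n + 1)]


for n in range(5):
    print(pascal_row(n))
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
```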

Exercise 1.92 Verify that, with the above definition, Figures 13 and 14 agree.

Pascal found a nice expression for the binomial coefficients. This expression is in terms
of the "factorial function", which we define for n ∈ N* to be
$$n! = n \cdot (n-1) \cdots 2 \cdot 1.$$
(n! is read aloud as "n factorial".) Here are some examples:
$$1! = 1, \quad 2! = 2 \cdot 1, \quad 3! = 3 \cdot 2 \cdot 1, \quad 4! = 4 \cdot 3 \cdot 2 \cdot 1.$$
Moreover, it is customary to define 0! = 1, since this makes formulas nicer (for instance, Proposition 1.93 below then also covers m = 0 and m = n).

Proposition 1.93 (Pascal’s formula for binomial coefficients) For all n, m ∈ N
such that m ≤ n, we have
$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!}.$$
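Before attempting Exercise 1.94, it can be reassuring to test Proposition 1.93 on small cases. The Python sketch below (the helper names `pascal_formula` and `rows_by_recursion` are hypothetical) builds the first lines of the triangle using only the two rules of Definition 1.91, and compares every entry with n!/(m!(n − m)!), computed with the standard library function `math.factorial`. This is evidence for the proposition, not a proof of it.

```python
from math import factorial


def pascal_formula(n, m):
    """Proposition 1.93: n! / (m! (n - m)!), computed with integer division."""
    return factorial(n) // (factorial(m) * factorial(n - m))


def rows_by_recursion(max_n):
    """Lines 0, ..., max_n of Pascal's triangle, using only Definition 1.91."""
    rows = [[1]]
    for n in range(max_n):
        prev = rows[-1]                                            # line n
        inner = [prev[j - 1] + prev[j] for j in range(1, n + 1)]   # rule (ii)
        rows.append([1] + inner + [1])                             # rule (i) at both ends
    return rows


rows = rows_by_recursion(20)
print(all(rows[n][m] == pascal_formula(n, m)
          for n in range(21) for m in range(n + 1)))   # True
```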

Exercise 1.94 (Challenge) Use induction to prove Pascal’s formula.

Hint: Part of the problem here is to figure out what the induction hypothesis should
be.

With the above preparations out of the way, we formulate Pascal’s famous binomial
theorem.

Proposition 1.95 (Pascal’s binomial theorem) For all natural numbers n and real
numbers a, b, we have
$$(a + b)^n = \sum_{m=0}^{n} \binom{n}{m} a^{n-m} b^m.$$
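Proposition 1.95 can also be tested with exact rational arithmetic. The Python sketch below assumes Python 3.8 or later, where the standard library function `math.comb(n, m)` returns the binomial coefficient; the name `binomial_expansion` is our own. As before, checking finitely many cases illustrates the statement but does not prove it.

```python
from fractions import Fraction
from math import comb


def binomial_expansion(a, b, n):
    """Right-hand side of Proposition 1.95: the sum of C(n, m) a^(n-m) b^m over m."""
    return sum(comb(n, m) * a ** (n - m) * b ** m for m in range(n + 1))


a, b = Fraction(3, 7), Fraction(-5, 2)
print(all(binomial_expansion(a, b, n) == (a + b) ** n for n in range(15)))  # True
```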

Exercise 1.96 Compute the coefficient of $a^{498} b^2$ in the expansion of $(a + b)^{500}$.

Exercise 1.97 (Challenge) Use induction to prove Proposition 1.95.



1.6 The completeness axiom


In this section, we formulate the fourth and last part of our rulebook for the real numbers.
While this axiom has no proof strategy named after it, it really opens the doors to the
study of the infinitely small and infinitely large.

The fourth and final part of our rulebook

Every axiom we have formulated so far is satisfied not only by the real numbers R, but
also by the rational numbers Q. That is, none of the rules we have introduced so far
allow us to distinguish between the sets R and Q. In other words, if we did not introduce
any further axioms, we would have no way of ruling out that R = Q. However, we know
that this is not the case: it has been known since the ancient Greeks that $\sqrt{2} \notin \mathbb{Q}$.
We begin by formulating the following definition.

Definition 1.98 For a subset M ⊂ R we call C ∈ R an upper bound for M if we have
$$x \le C \quad \forall x \in M.$$
The smallest number C with the above property (if such a number exists) is denoted by sup M and called the least
upper bound, or supremum, of M. If M has no upper bound, we write sup M = +∞.

We will momentarily explain what the above definition means, but let us first formu-
late the completeness axiom, which, well, completes our rulebook:

The rulebook for R (part 4 of 4) The real numbers satisfy the completeness principle:

(CP) If the subset M ⊂ R is non-empty and has an upper bound, then it has a least
upper bound.

Our immediate goal is now to figure out what all of this means. The first thing is to
realise that the concept of an upper bound is not really that complicated (please note
that we do not need to use the completeness axiom in these examples!).

Example 1.99 (A set with upper bounds) Let M be the half-open interval [1, 3).
In this case, the numbers 3, 19 and 624.7 are all upper bounds of M. In fact, the upper bounds
for M are exactly the numbers larger than, or equal to, 3 (a number C < 3 is not an upper bound,
since some element of M lies strictly between C and 3). Out of these upper bounds,
the smallest one is 3. That is, sup M = 3.
1.6. THE COMPLETENESS AXIOM 37

Example 1.100 (A set with no upper bound) Let $M = \{n^2 : n \in \mathbb{N}\}$. Then M has no
upper bound, and therefore we write sup M = +∞.

Exercise 1.101 Suppose that M = [1, 3]. What are the upper bounds for M, and
what is the least upper bound of M?
Remark: The point of this exercise is to encourage you to compare what Definition
1.98 says about the intervals [1, 3] and [1, 3) (recall Example 1.99).
We will not be using the following definitions much in these lecture notes, but we
include them since they pop up from time to time.

Definition 1.102 If M ⊂ R is such that sup M ∈ M , then we say that the supremum
is the maximum element of M and denote it by max M instead.

Example 1.103 The sets M1 = [1,3) and M2 = [1,3] both have 3 as their supremum.
Of these, only M2 admits a maximum element, and so we can write max M2 = 3.

Exercise 1.104 (Optional) Prove that every non-empty finite subset of R admits a maximum element.
Hint: The case when the set contains one element should be clear...
Finally, we note that in the same way that a set M can have a least upper bound
sup M, it can also have a greatest lower bound inf M (also called the infimum of M). In
the following exercise, we ask you to explore what this ought to mean.

Exercise 1.105 (a) Define what we ought to mean by a lower bound for M.
(b) Define what we ought to mean by inf M.
(c) Find an example of a set M for which inf M does not exist. What would be a
reasonable notation for inf M in this case?
(d) Define what we ought to mean by min M, and give an example of a set that admits
a minimum element and one that does not.

The following result is our first consequence of the completeness axiom.

Proposition 1.106 If M ⊂ R is non-empty and has a lower bound, then it has a
greatest lower bound.

Exercise 1.107 Prove that Proposition 1.106 is a consequence of the completeness
axiom.
Hint: The proof is very short, and basically boils down to applying the completeness
axiom to the set {−x : x ∈ M}.

First example of the use of the completeness axiom


The completeness axiom is rather hard to understand, and almost makes the induction
principle seem friendly by comparison. For this reason, we could call it the black sheep
of our rulebook for R. But since it will play an important role in these lecture notes –
and in mathematics in general – we have no choice but to become friends with it.

Fig. 15. The completeness axiom is a black sheep of mathematics. There are actually
mathematicians who refuse to accept it!

As opposed to the induction principle, we do not connect the completeness axiom
to any specific "method of proof". However, it does allow us to prove statements that
would have been impossible to prove from the previous axioms alone. Here is one such
statement:

Example 1.108 (Archimedean property of N) We prove the following statement.

For all real numbers x > 0, there exists an integer n ∈ N so that n > x.

We do this by a proof by contradiction. To this end, we note that the negation of
the statement "there exists n ∈ N so that n > x" is the statement "for all n ∈ N
we have n ≤ x". Since this means that x is an upper bound for the non-empty set N, it follows from the
completeness axiom that N has a smallest upper bound, which we denote by sup N.
But n ∈ N implies n + 1 ∈ N. Therefore, n + 1 ≤ sup N also holds for all n ∈ N.
Subtracting 1 from both sides, we obtain that n ≤ sup N − 1 holds for all n ∈ N. That
is, sup N − 1 is also an upper bound for N. But this is a contradiction, since sup N − 1 < sup N and sup N was
supposed to be the smallest such upper bound, and we are done.
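For a concrete x > 0, a witness n in Example 1.108 is easy to write down: n = ⌊x⌋ + 1, where ⌊x⌋ denotes the largest integer not exceeding x (Exercise 1.122 below shows that such an integer exists for x ≥ 0). The Python sketch below (the name `archimedean_witness` is hypothetical) computes it for floats or exact fractions, using the standard library function `math.floor`. A computer can only ever handle such concrete inputs; it is the proof above that guarantees a witness for every real x > 0.

```python
import math
from fractions import Fraction


def archimedean_witness(x):
    """Return a natural number n with n > x, for a real number x > 0."""
    if x <= 0:
        raise ValueError("the statement concerns x > 0")
    return math.floor(x) + 1    # floor(x) <= x < floor(x) + 1


print(archimedean_witness(2.5))                  # 3
print(archimedean_witness(Fraction(10**6, 3)))   # 333334
```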

Notice that the Archimedean property can be understood as the statement that the
set of integers is infinitely "long". That is, the above example shows one sense in which
the completeness axiom allows us to make sense of a notion of the infinite.

Exercise 1.109 Prove the following formulation of the Archimedean property of R, which is
closer to the original one: For all β > 0 and x > 0 there exists an integer n so that βn > x.
Hint: Here you can either modify the proof of the Archimedean property, or simply
apply it in a suitable way.

Second example of the use of the completeness axiom


Let us now take a look at how to determine the suprema and infima of various sets.
Here, we use the completeness axiom in the form of the Archimedean property (Example 1.108).

Example 1.110 Let us determine the supremum of the set
$$M = \left\{0, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \dots\right\} = \left\{1 - \frac{1}{n} : n \in \mathbb{N}^*\right\}.$$
First, we observe that 1 is an upper bound for the set since
$$1 - \frac{1}{n} \le 1 \iff 0 \le \frac{1}{n},$$
which is true for all n ∈ N* (recall that 0 does not belong to N*).
To prove that 1 is the smallest upper bound, we do a proof by contradiction. That
is, we assume that some number D < 1 is an upper bound for M, and show that this
leads to a contradiction. But this means that for all n ∈ N* we have
$$1 - \frac{1}{n} \le D \iff 1 - D \le \frac{1}{n} \iff n \le \frac{1}{1 - D},$$
where the last step is valid since 1 − D > 0. But by the Archimedean property of N (which
is a consequence of the completeness axiom), such an inequality cannot hold for all n,
and we are done!
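The contradiction in Example 1.110 can also be made explicit: if D < 1, then any n ∈ N* with n > 1/(1 − D) gives an element 1 − 1/n of M that exceeds D, so D is not an upper bound. A minimal Python sketch (the name `element_above` is hypothetical), using exact fractions to avoid rounding errors:

```python
import math
from fractions import Fraction


def element_above(D):
    """For D < 1, return n in N* with 1 - 1/n > D, so D is not an upper bound for M."""
    if D >= 1:
        raise ValueError("need D < 1")
    return math.floor(1 / (1 - D)) + 1    # n > 1/(1 - D), i.e. 1/n < 1 - D


D = Fraction(999, 1000)
n = element_above(D)
print(n, 1 - Fraction(1, n) > D)          # 1001 True
```

The value of n grows without bound as D approaches 1, which is exactly why the Archimedean property, and not just some fixed bound on n, is needed.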

Exercise 1.111 In this exercise, we consider the set M from Example 1.110.
(a) Determine whether M admits a maximum element.
(b) Determine whether M admits a minimum element.
Exercise 1.112 Consider the set
$$M = \left\{\frac{1}{n} : n \in \mathbb{N}^*\right\}.$$
(a) Determine inf M and sup M.
(b) Determine if M admits a maximum or minimum element.
Exercise 1.113 Consider the set
$$M = \left\{\frac{n+2}{n+1} : n \in \mathbb{N}\right\}.$$
(a) Determine inf M and sup M.
(b) Determine if M admits a maximum or minimum element.
Exercise 1.114 Suppose that A, B are two non-empty subsets of R so that a ≤ b for all a ∈ A
and b ∈ B. Show that sup A ≤ inf B.

Third example of the use of the completeness axiom


An important consequence of the completeness axiom (and the induction principle) is
that we can make sense of what we mean by a decimal number.
We begin with the following example to illustrate the general idea.

Example 1.115 (All decimal expansions represent some real number) Suppose
someone throws the endless string of decimal digits for the number π in our face:
$$3.1415926535\ldots \tag{1.2}$$
How do we know that this represents a real number? One reason to be skeptical is that
decimal numbers are nowhere explicitly mentioned in the axioms for the real numbers.
A key insight is that (1.2) is supposed to be understood as the supremum of the set
$$\begin{aligned}
M &= \left\{3,\ 3 + \frac{1}{10},\ 3 + \frac{1}{10} + \frac{4}{10^2},\ 3 + \frac{1}{10} + \frac{4}{10^2} + \frac{1}{10^3},\ \dots\right\}\\
&= \{3,\ 3.1,\ 3.14,\ 3.141,\ \dots\}.
\end{aligned}$$
Notice that, here, we are expressing M in the form $\{a_n : n \in \mathbb{N}\}$, where each $a_n$ contains
exactly n decimal digits of π after the decimal point. In particular, the sequence $a_n$ is increasing,
but none of its entries are larger than the number 4.
By the above observations, the completeness axiom implies that a smallest upper
bound for M exists. That is, π exists. (Note that this argument is incomplete: what
we lack is a recipe for computing all decimal digits of π.)

From the above example, we extract the following definition:

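Definition 1.116 (Decimal number) For $a_0 \in \mathbb{N}$ and any sequence $(a_n)_{n=1}^{\infty}$ of integers
from the set {0, 1, 2, ..., 9} (called decimal digits), we define
$$a_0.a_1a_2a_3a_4\ldots$$
to be the real number given by
$$a = \sup\left\{\sum_{k=0}^{n} \frac{a_k}{10^k} : n \in \mathbb{N}\right\}.$$
(This supremum exists by the completeness axiom, since every element of the set is at most $a_0 + 1$.)
We call $a_0.a_1a_2a_3a_4\ldots$ the decimal expansion, or decimal representation, of the real number a.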
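To make Definition 1.116 concrete, the following Python sketch (the name `partial_sums` is hypothetical) computes the first few elements of the set whose supremum defines $a_0.a_1a_2\ldots$, as exact fractions. Of course, it can only ever see finitely many digits; the supremum itself is supplied by the completeness axiom.

```python
from fractions import Fraction


def partial_sums(a0, digits):
    """Return the sums a_0 + a_1/10 + ... + a_n/10^n for n = 0, 1, ..., len(digits)."""
    total = Fraction(a0)
    sums = [total]
    for k, d in enumerate(digits, start=1):
        if not 0 <= d <= 9:
            raise ValueError("decimal digits must lie in {0, 1, ..., 9}")
        total += Fraction(d, 10 ** k)
        sums.append(total)
    return sums


s = partial_sums(3, [1, 4, 1, 5, 9, 2, 6])
print([float(x) for x in s[:4]])                 # [3.0, 3.1, 3.14, 3.141]
print(all(x <= y for x, y in zip(s, s[1:])))     # True: the sums increase
print(all(x <= 3 + 1 for x in s))                # True: bounded by a_0 + 1
```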

Exercise 1.117 (Discussion) Can a real number have two different decimal expansions?
Exercise 1.118 The above definition only says what we mean by a non-negative decimal
number. Extend the definition to negative decimal numbers.

Exercise 1.119 (Challenging) Prove that 0.999 . . . = 1.


Hint: You are supposed to determine the supremum of a set...
Remark: We will find a simpler method of showing this in a later chapter.

Note that while the above definition says what we mean by a decimal number, it
does not guarantee that every real number is a decimal number. That is, could there
be super fancy real numbers that cannot be expressed as a decimal number? To put
this question into perspective, consider the following: could it be that there are fancy
numbers that cannot be written as a rational number? Well, yes! It was famously proved
by the ancient Greeks that $\sqrt{2}$ is such a number. These fancy numbers are called the
irrational numbers (see Appendix ??).
So, could it be that some real numbers do not have a decimal
expansion? Well, no! Let us settle this matter as proper mathematicians, that is, in
terms of a theorem, a lemma and a proof.

Theorem 1.120 All real numbers are decimal numbers.

To prove this result, we need the well-ordering principle, which is a consequence of the
induction principle. As the well-ordering principle will be discussed in other courses, we
omit its proof here (however, for interested students, we provide a proof in Appendix
??).

Lemma 1.121 (The well-ordering principle) Every non-empty subset of N has a
least element.

Exercise 1.122 (Challenging) Use the well-ordering principle to prove that for every
x ≥ 0 there exists an integer n so that x ∈ [n, n + 1).
Remark: The statement also holds for x < 0; we restrict to x ≥ 0 since this
simplifies the proof somewhat.

Proof of Theorem 1.120. Let a ∈ R be some fixed real number. We now show that there
exists some sequence of digits $a_n$ so that a satisfies Definition 1.116. For convenience,
we assume that a > 0.
First, we observe that by Exercise 1.122, there exists an integer $a_0 \in \mathbb{N}$ such that
$$a \in [a_0, a_0 + 1).$$
Notice that this can be rewritten as saying that
$$a - a_0 \in [0, 1).$$

Next, we split the interval [0, 1) into 10 subintervals of length 1/10 that are half-open in the
same sense:
$$\left[0, \frac{1}{10}\right),\ \left[\frac{1}{10}, \frac{2}{10}\right),\ \dots,\ \left[\frac{9}{10}, 1\right).$$
The number $a - a_0$ has to be in one of these subintervals. That is, for some
$a_1 \in \{0, 1, 2, \dots, 9\}$, we have
$$a - a_0 \in \left[\frac{a_1}{10}, \frac{a_1 + 1}{10}\right).$$
Again, we notice that this can be expressed in the form
$$a - a_0 - \frac{a_1}{10} \in \left[0, \frac{1}{10}\right).$$
Continuing this process, where at each step we split an interval of the form $[0, 1/10^{n-1})$
into ten subintervals of length $1/10^n$, we find an infinite sequence $(a_n)_{n=1}^{\infty}$ of integers
$a_n \in \{0, 1, 2, \dots, 9\}$ so that for every fixed n, we have
$$a - a_0 - \frac{a_1}{10} - \frac{a_2}{100} - \cdots - \frac{a_n}{10^n} \in \left[0, \frac{1}{10^n}\right). \tag{1.3}$$
Notice that it follows immediately from (1.3) that
$$\sum_{k=0}^{n} \frac{a_k}{10^k} \le a \quad \forall n \in \mathbb{N}.$$

By the completeness axiom, this means that the supremum
$$\sup \underbrace{\left\{\sum_{k=0}^{n} \frac{a_k}{10^k} : n \in \mathbb{N}\right\}}_{M}$$
exists (since a is an upper bound for the non-empty set M) and satisfies sup M ≤ a.


What remains is to prove that a = sup M. In light of what we just proved, all we
need to show is that a − sup M > 0 cannot happen. So, to arrive at a contradiction,
suppose that a − sup M = ∆ for some real number ∆ > 0. By definition, the supremum
of the set M must be larger than, or equal to, each of the elements of M. That is, for
all n ∈ N we have
$$\sum_{k=0}^{n} \frac{a_k}{10^k} \le \sup M.$$
But this immediately implies that for all n ∈ N, we have
$$\Delta = a - \sup M \le a - \sum_{k=0}^{n} \frac{a_k}{10^k}.$$
In combination with (1.3), this means that $\Delta \in [0, 1/10^n)$ for all n ∈ N. But this
contradicts the fact that ∆ is a fixed and strictly positive number (see Exercise 1.67)!
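The proof above is constructive: at step k, the digit $a_k$ records which of the ten subintervals the current remainder lies in. For a rational number a ≥ 0, which a computer can store exactly, this can be carried out literally. In the Python sketch below (the name `decimal_digits` is hypothetical), the subinterval is found with `math.floor`, that is, with the integer provided by Exercise 1.122 applied to $10^k$ times the remainder.

```python
import math
from fractions import Fraction


def decimal_digits(a, n):
    """Return a_0 and the digits a_1, ..., a_n of a >= 0, as in the proof of Theorem 1.120."""
    a = Fraction(a)
    if a < 0:
        raise ValueError("this sketch assumes a >= 0")
    a0 = math.floor(a)                     # a lies in [a0, a0 + 1)
    remainder = a - a0                     # remainder lies in [0, 1)
    digits = []
    for k in range(1, n + 1):
        # remainder lies in [0, 1/10^(k-1)); pick the subinterval of length 1/10^k
        d = math.floor(remainder * 10 ** k)
        digits.append(d)
        remainder -= Fraction(d, 10 ** k)  # now remainder lies in [0, 1/10^k), cf. (1.3)
    return a0, digits


print(decimal_digits(Fraction(22, 7), 6))   # (3, [1, 4, 2, 8, 5, 7])
```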

Fourth example of the use of the completeness axiom


While all real numbers can be represented as decimal numbers, not all real numbers
can be represented as a fraction of two integers. This is a fact that goes back to the
Pythagoreans, and their discovery that $\sqrt{2}$ is not a rational number: it is irrational!
(Fun fact: it is said that the disciple of Pythagoras who discovered this was drowned in
order to keep the knowledge of the existence of such numbers from spreading. Indeed,
rational numbers were seen as an expression of the divine, and any thought of them
being "insufficient" in any way was apparently just too much to bear.)
We now state this discovery as a theorem. We omit the proof since it will be covered
in your first course on number theory. However, for the interested reader, we provide a
proof, completely based on the contents of this chapter, in Appendix ??.


Theorem 1.123 The real number $\sqrt{2}$ is irrational.

The following consequence states that both the rational and the irrational numbers are
spread densely along the real line: every interval of positive length, however short, contains
numbers of both kinds. It will be useful in certain key examples later in the lecture
notes. We include a proof that shows how it follows from the above theorem.

Corollary 1.124 Every interval [a,b] with a < b contains (at least) one rational and one
irrational number.

Proof of Corollary. We prove that the interval must contain at least one rational number,
and leave the (very similar) proof that it contains at least one irrational number as an
exercise below.
First, we note that if either a or b is rational, then we are done (why?). Therefore,
we may assume that both a, b are irrational. Now, since b − a > 0, it follows by the
Archimedean property that we can find some number N ∈ N so that, say, N > 1/(b − a).
Our claim is that we can now find a number n ∈ Z so that the rational number n/N
belongs to [a, b]. Intuitively, this makes sense since the gap between any two consecutive
numbers in the sequence
$$\dots,\ -\frac{3}{N},\ -\frac{2}{N},\ -\frac{1}{N},\ 0,\ \frac{1}{N},\ \frac{2}{N},\ \frac{3}{N},\ \dots$$
is equal to 1/N, which, by our choice of N, is smaller than b − a, and so some number
in the above sequence must hit the interval (a, b) sooner or later. To turn this intuition
into a proof, we need only apply the well-ordering principle in the form of Exercise 1.122.
Indeed, applying the exercise to x = Nb (using its extension to negative x, mentioned in the
remark after the exercise, if Nb < 0), it follows that there exists an integer n so that
Nb ∈ [n, n + 1). Now, this can be rewritten as follows:

$$\begin{aligned}
Nb \in [n, n + 1) &\iff n \le Nb < n + 1\\
&\iff Nb - 1 < n \le Nb\\
&\iff b - \frac{1}{N} < \frac{n}{N} \le b.
\end{aligned}$$
What remains is to prove that b − 1/N > a. But this follows since the inequality
1/N < b − a can be rewritten in the form
$$b - \frac{1}{N} > b - (b - a) = a.$$
We conclude that the rational number n/N belongs to [a, b].
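The proof above is again constructive. The following Python sketch (the name `rational_in_interval` is hypothetical) follows it step by step for endpoints given as exact fractions. A computer cannot store irrational endpoints exactly, so this only illustrates how N and n are chosen.

```python
import math
from fractions import Fraction


def rational_in_interval(a, b):
    """Return integers (n, N) with a < n/N <= b, following the proof of Corollary 1.124."""
    if not a < b:
        raise ValueError("need a < b")
    N = math.floor(1 / (b - a)) + 1    # Archimedean step: N > 1/(b - a)
    n = math.floor(N * b)              # n <= N*b < n + 1, cf. Exercise 1.122
    return n, N


n, N = rational_in_interval(Fraction(1414, 1000), Fraction(1415, 1000))
print(n, N, Fraction(n, N))   # 1416 1001 1416/1001
```

Note that `math.floor` also handles negative arguments correctly, which matches the case Nb < 0 in the proof.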

Exercise 1.125 Complete the proof of the above corollary by proving that every interval [a, b] with a < b contains an irrational number.

Hint: Modify the above argument, using the fact that $\sqrt{2}$ is irrational.
Exercise 1.126 It is actually more common to express the above corollary in the
following form: "Every interval (a, b) with a < b contains (at least) one rational and one
irrational number." Prove that the truth of this statement follows (almost) immediately
from the above corollary.
Remark: To really convince yourself that this is an inessential detail, note that the
statement in the exercise trivially implies the statement of the corollary.

An overview of the numbers contained on the real line


In the course of this chapter, we have discussed all types of numbers that we will consider
on the real line in these lecture notes:

Remark 1.127 (An inventory of the real numbers)


• By axioms (A3) and (M3), the numbers 0 and 1 exist (with 0 ≠ 1).
• Using the numbers 0 and 1, we get N by following the ideas from Exercise 1.2.
• By axiom (A4), for each m ∈ N there exists a number a ∈ R so that m + a = 0.
We denote a by (−m). Together with N, this gives all integers Z.
• By axiom (M4), for every non-zero number n ∈ Z, there exists a number b ∈ R so
that n · b = 1. We denote this number by 1/n. Moreover, we denote m · (1/n) by
m/n. In this way, we obtain the numbers we denote by Q, and call rational.
• In Definition 1.116 (and Example 1.115) we explain, using the completeness axiom,
what we mean by a decimal number. In Theorem 1.120, we proved that all real
numbers can be expressed as decimal numbers.
• Finally, by Theorem 1.123, there exist decimal numbers that are not rational.
Such numbers are called irrational. In Appendix 9, we explain Cantor’s beautiful
argument showing that there are many more irrational than rational numbers.
1.7. ANSWERS TO SELECTED EXERCISES 45

1.7 Answers to selected exercises


1.2 (a) Since we know that 1 exists, we can define 2 = 1 + 1, 3 = 2 + 1 and so forth.
(b) By (a), we know what the positive integers are. To get the negative ones, we
use axiom (A4). That is, we define (−1) to be the additive inverse of 1, and so
forth. (c) c/c = c · (1/c) = 1 and c/1 = c · (1/1) = c · 1 = c. (d) $a^2 = a \cdot a$ and
$a^{-2} = 1/a^2$. (e) $a^0 = 1$ makes sense since we ought to have $a^0 = a^{2-2} = a^2/a^2 = 1$.
1.8 (b) Suppose that x has two additive inverses a and b (that may or may not be
equal to each other). That is both x + a = 0 and x + b = 0 hold. But this means
that a = a + x + (−x) = 0 + (−x) = b + x + (−x) = b.
1.11 (b) Add (1 + (−1)) to (−1) · (−1) and use that (−1) = 1 · (−1).
(c) Combine Example 1.10 and (b).
1.14 (a) ac + ad + bc + bd, (b) $a^2 + 2ab + b^2$.
1.21 (a) x/(1 + x) for x ≠ 0, (b) a + 1 for a ≠ 0, 1, (c) (u − 1)/(u + 1) for u ≠ −1, 0, 1,
(d) (a + 1)/(a − 1), (e) $(x + 1)^2/(x - 1)^2$, (f) $(x^3 - 3a^3)/(a^3 - ax^2)$.
1.23 You need to find one pair of numbers a,b so that the formula holds.
1.25 (a) true, (b) false, (c) false, (d) true, (e) true.
1.27 {a/b : a,b ∈ Z, b ≠ 0}.
1.28 (a) ∅, (b) {i, −i}.
1.39 (a) the key observation is that $x^2 = 1 \iff x^2 - 1 = 0 \iff (x - 1)(x + 1) = 0$.
1.41 (a) x = 3/2, (b) x = 10/9, (c) x = −2.
1.47 The solutions are x = −2 and x = −1.
1.50 (a) $p^2 = 4q$, (b) no, by, say, the pq-formula, the relation from (a) implies that the
repeated root is real (the expression inside of the square root is zero).
1.53 The first is wrong, the last two are correct.
1.56 Use (I1).
1.60 (a) x ≠ 1, (b) a ≠ 0 and b ≠ 1, (c) a ∉ {0,1,2}, (d) x ≤ 3.
1.61 (a) ∃x ∈ A such that x < 0, (b) ∀x ∈ A, we have x < 0.
1.66 Suppose that ab = 0, but that a = 0 or b = 0 is false. That is, ab = 0 and a ≠ 0
and b ≠ 0. But this means that we can multiply both sides of ab = 0 by 1/a. This
immediately leads to a contradiction.
1.72 (a) x > 1/3, (b) x < 2, (c) x ∈ (1/3,2).
1.75 (a) x ∈ (−3,6]
1.78 This should become clear if you write out all the sums and compare the expressions.
1.79 (a) $(1/2^n)_{n=0}^{4}$, (b) $((-1/2)^n)_{n=0}^{4}$, (c) $(1)_{n=0}^{4}$.

1.81 Base case: for n = 0, the left-hand side is 1 and the right-hand side is $2^1 - 1 = 1$.
Induction step: suppose the formula holds for n = k. To investigate n = k + 1,
we compute, using the induction hypothesis, that $1 + 2 + 2^2 + \cdots + 2^k + 2^{k+1} =
2^{k+1} - 1 + 2^{k+1} = 2 \cdot 2^{k+1} - 1 = 2^{k+2} - 1$. This is exactly what we wanted to get,
and we are done.
1.82 The induction step works, but the base case fails. Hence, the induction proof fails.
1.83 Base case: putting n = 1, we see that the left- and right-hand sides are both equal
to 1 and so the formula holds in this case. For the induction step, assume that the
formula holds for n = k. When considering the formula for n = k + 1, notice that
you can apply the induction hypothesis to simplify the expression $1 + 4 + 9 + \cdots +
k^2 + (k + 1)^2$. Using this, you can find a chain of equalities showing that the left-
and right-hand sides of the formula in the case n = k + 1 are equal, and you are
done.
1.87 $V = \{n \in \mathbb{N} : a^m \cdot a^n = a^{m+n}\}$.
1.90 (a) 1, 5, 10, 10, 5, 1, (b) $(a + b)(a + b)^4 = a^5 + 5a^4 b + 10a^3 b^2 + 10a^2 b^3 + 5ab^4 + b^5$.
1.96 500 · 499/2 = 124750.
1.97 The base case is n = 0. Then $(a + b)^0 = 1$ and $\sum_{m=0}^{0} \binom{0}{m} a^{0-m} b^m = 1 \cdot a^0 b^0 = 1$.

For the induction step, suppose that the formula for $(a + b)^n$ holds for n = k. We
are to use this to prove the formula for n = k + 1. This allows us to make the
following chain of equalities:
$$\begin{aligned}
(a + b)^{k+1} &= (a + b) \cdot (a + b)^k = (a + b) \cdot \sum_{m=0}^{k} \binom{k}{m} a^{k-m} b^m\\
&= \sum_{m=0}^{k} \binom{k}{m} a^{k-m+1} b^m + \sum_{m=0}^{k} \binom{k}{m} a^{k-m} b^{m+1}\\
&= \sum_{m=0}^{k} \binom{k}{m} a^{k-m+1} b^m + \sum_{m=1}^{k+1} \binom{k}{m-1} a^{k-m+1} b^m\\
&= a^{k+1} + \sum_{m=1}^{k} \left(\binom{k}{m} + \binom{k}{m-1}\right) a^{k-m+1} b^m + b^{k+1}\\
&= a^{k+1} + \sum_{m=1}^{k} \binom{k+1}{m} a^{k-m+1} b^m + b^{k+1}\\
&= \sum_{m=0}^{k+1} \binom{k+1}{m} a^{k-m+1} b^m.
\end{aligned}$$

1.118 For a negative number a, find the decimal expansion $a_0.a_1a_2a_3\ldots$ of the positive
number −a. Then put $a = -a_0.a_1a_2a_3\ldots$.
Chapter 8

The derivative

In this chapter we figure out the computational rules for the derivative and prove the
differentiation formulas for the functions most commonly appearing in these lecture
notes.

Remark 8.1 (Selected problems from previous exams based on this chapter)

1. The following elementary functions are closely related to the trigonometric functions, and are called hyperbolic functions:
$$\sinh x = \frac{e^x - e^{-x}}{2} \quad \text{and} \quad \cosh x = \frac{e^x + e^{-x}}{2}.$$
(a) Prove the differentiation formulas
$$(\sinh x)' = \cosh x \quad \text{and} \quad (\cosh x)' = \sinh x.$$
(b) Prove the "hyperbolic identity": $\cosh^2 x - \sinh^2 x = 1$.
(c) Show that sinh x is an invertible function on R.
(d) Define the inverse hyperbolic function arcsinh x by letting $y = \operatorname{arcsinh} x \iff \sinh y = x$. Determine
$$\frac{d}{dx} \operatorname{arcsinh} x.$$

50 CHAPTER 8. THE DERIVATIVE

8.1 The computational rules for the derivative


Definition of the derivative
We begin by briefly recalling the definition of the derivative from Chapter ??.

Definition 8.2 (The derivative) We define the derivative of f at the point x to be
$$f'(x) \overset{\text{def}}{=} \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
at all points where this limit exists. Moreover, if the limit exists, we say that f is
differentiable at x. If f is differentiable at all points in its domain, we simply say that f
is differentiable. Note that we sometimes write $\frac{d}{dx} f(x)$ or $\frac{df}{dx}(x)$ instead of $f'(x)$.
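To get a feeling for Definition 8.2, one can tabulate the difference quotient for shrinking values of h. The Python sketch below (the helper names `difference_quotient` and `square` are hypothetical) does this for f(x) = x² at x = 3 with exact fractions, which avoids the rounding errors that floating-point numbers introduce when subtracting two nearly equal values. Here the quotient simplifies to 6 + h, so the output suggests the limit 6 = f'(3); the table itself does not prove this.

```python
from fractions import Fraction


def difference_quotient(f, x, h):
    """The quotient (f(x + h) - f(x)) / h from Definition 8.2."""
    return (f(x + h) - f(x)) / h


def square(t):
    return t * t


for k in range(1, 5):
    h = Fraction(1, 10 ** k)
    print(h, difference_quotient(square, 3, h))
# 1/10 61/10
# 1/100 601/100
# 1/1000 6001/1000
# 1/10000 60001/10000
```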

Exercise 8.3 (a) In Chapter ?? we computed the derivative of $f(x) = x^2$ using
the definition. Check that you can also compute this derivative by using the limit
$$\lim_{u \to x} \frac{f(x) - f(u)}{x - u}.$$
(b) Is it always true that the limit in (a) is equal to the derivative f'(x) for all
differentiable functions f? If yes, prove this; if not, find a counter-example.

Remark 8.4 (Leibniz notation) The letter h in the definition of the derivative denotes
how far we move away from the point x along the x-axis. Similarly, the quantity
f(x + h) − f(x) denotes how far this pushes the function away from the value f(x) along the
y-axis. Denoting these changes to the values x and f(x) by ∆x and ∆f, respectively,
Leibniz came up with the notation
$$\frac{df}{dx}$$
for the derivative. (Here, we should point out that the Greek letter ∆ corresponds to
the Latin letter "d" and stands for "difference".) Explicitly, we have
$$\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{\Delta f}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}.$$
To remember this notation, the following is worth keeping in mind:

"In the limit, Greek letters turn into Latin letters."

(In fact, this also holds true in the case of the notation for the definite integral.)
8.1. COMPUTATIONAL RULES FOR THE DERIVATIVE 51

Derivatives of piecewise-defined functions


As an exercise in applying the definition of the derivative, we can try to compute the
derivative of a piecewise-defined function. Here is an example.

Example 8.5 Consider the function
$$f(x) = \begin{cases} x^2 + x & x \ge 0, \\ Cx & x < 0. \end{cases}$$
For what values of C is this function differentiable at x = 0? To determine this, we need
to check if the following limit exists:
$$\lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0} \frac{f(h)}{h}.$$
Here, we used that f(0) = 0, according to the definition of f. Now, to plug in a formula
for f(h), we need to know if h is positive or negative (since f has different formulas
depending on the sign of h). This forces us to consider the one-sided limits separately:
$$\lim_{h \to 0^+} \frac{f(h)}{h} = \lim_{h \to 0^+} \frac{h^2 + h}{h} = \lim_{h \to 0^+} (h + 1) = 1,$$
$$\lim_{h \to 0^-} \frac{f(h)}{h} = \lim_{h \to 0^-} \frac{Ch}{h} = \lim_{h \to 0^-} C = C.$$
Since the two-sided limit exists if and only if both one-sided limits exist and are equal
(Proposition ??.??), it follows that f is differentiable at x = 0 exactly if C = 1.
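The one-sided limits of Example 8.5 can also be observed numerically. In the Python sketch below (the helper names `make_f` and `one_sided_quotients` are hypothetical), the right-hand difference quotients equal 1 + h and so approach 1 for every C, while the left-hand ones equal C, in line with the conclusion that f is differentiable at 0 exactly when C = 1.

```python
from fractions import Fraction


def make_f(C):
    """Return the piecewise function of Example 8.5 for the constant C."""
    def f(x):
        return x * x + x if x >= 0 else C * x
    return f


def one_sided_quotients(f, h):
    """Difference quotients at 0 from the right (step h > 0) and the left (step -h)."""
    right = (f(h) - f(0)) / h
    left = (f(-h) - f(0)) / (-h)
    return right, left


for C in (1, 2):
    f = make_f(C)
    for k in (1, 2, 3):
        right, left = one_sided_quotients(f, Fraction(1, 10 ** k))
        print(f"C = {C}, h = 1/10^{k}: right {float(right)}, left {float(left)}")
```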

Exercise 8.6 Consider the function in the above example. Suppose that x > 0 is
some fixed number. Do we have to take both formulas of f into consideration when
computing f'(x)? Explain why.

Exercise 8.7 Determine values for C and D so that the following function is both
continuous and differentiable at x = 0:
$$f(x) = \begin{cases} \sqrt{x + 1} + D & x \ge 0, \\ C(x + 1) & x < 0. \end{cases}$$

Some common differentiation formulas


In Chapter ??, we proved the differentiation formulas for functions such as y = 1/x,
$y = x^2$ and $y = \sqrt{x}$. Later on in this chapter, we prove how to differentiate the other
elementary functions we encounter in these lecture notes. However, since most of these
formulas should be familiar to you from high school, we ask you already now to fill in
as much as you can in the table below, and to feel free to use these formulas throughout
most of this chapter (recall that F is called a primitive function for f if F' = f).

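Proposition 8.8 (Derivatives and your favourite primitives of some elementary functions)
$$\begin{array}{c|c|c}
\text{Primitive function } F(x) & f(x) & \frac{d}{dx} f(x) \\
\hline
 & x^2 & 2x \\
 & x & 1 \\
 & 1/x & \\
 & \sqrt{x} & \\
 & e^x & \\
 & \ln x & \\
 & \sin x & \\
 & \cos x & \\
 & \tan x & \\
 & \arcsin x & \\
 & \arccos x & \\
 & \arctan x &
\end{array}$$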

The rulebook for the derivative


A second goal of this chapter is to prove the following computational rules for the derivative,
and to learn how to use them to differentiate expressions made up of combinations of
functions from the table in Proposition 8.8, above.

Proposition 8.9 (Rulebook for the derivative)

Suppose that both f and g have a derivative at the point x. Then,
$$\text{(i)} \quad \big(f(x) \pm g(x)\big)' = f'(x) \pm g'(x) \qquad \text{(sum rule)}$$
$$\text{(ii)} \quad \big(f(x) \cdot g(x)\big)' = f'(x)g(x) + f(x)g'(x) \qquad \text{(product rule)}$$
Moreover, if g(x) ≠ 0, it also holds that
$$\text{(iii)} \quad \left(\frac{1}{g(x)}\right)' = -\frac{g'(x)}{g(x)^2} \qquad \text{(reciprocal rule)}$$
$$\text{(iv)} \quad \left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \qquad \text{(quotient rule)}$$
Finally, if f(u) is differentiable at the point u = g(x), we have
$$\text{(v)} \quad \frac{d}{dx} f\big(g(x)\big) = f'\big(g(x)\big) \cdot g'(x) \qquad \text{(chain rule)}$$
We prove these rules in Section 8.2, below. As we shall see, these rules are all consequences
of the computational rules for the limit. For this reason, it may be surprising
that while the sum rule looks just like the corresponding rule for limits, the other rules do not.
While the above computational rules should be more or less familiar from high school,
the following rule is probably not.

Proposition 8.10 Suppose that f is an invertible function that is differentiable at a
point y and satisfies f'(y) ≠ 0. Then $f^{-1}$ is differentiable at x = f(y), and
$$\frac{d}{dx} f^{-1}(x) = \frac{1}{f'\big(f^{-1}(x)\big)}.$$

We immediately note that while this result may be hard to read and apply, it is
actually not that hard to prove. Indeed, it follows almost immediately from the chain
rule. We shall return to this when we discuss implicit differentiation later in the chapter.
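To see Proposition 8.10 at work numerically, take the hypothetical example f(y) = y³ + y, which is strictly increasing and therefore invertible, with f'(y) = 3y² + 1 ≠ 0. There is no simple formula for f⁻¹, so the Python sketch below computes it by bisection (an assumption of this sketch, not a method from these notes) and compares a symmetric difference quotient of f⁻¹ at x = 10 with the value 1/f'(f⁻¹(10)) = 1/f'(2) = 1/13 predicted by the proposition.

```python
def f(y):
    return y ** 3 + y            # strictly increasing, hence invertible on R


def f_prime(y):
    return 3 * y ** 2 + 1        # never zero


def f_inverse(x, lo=-100.0, hi=100.0, steps=200):
    """Solve f(y) = x by bisection, assuming the solution lies in [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < x:           # keep the invariant f(lo) < x <= f(hi)
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x, h = 10.0, 1e-5
numerical = (f_inverse(x + h) - f_inverse(x - h)) / (2 * h)   # difference quotient
formula = 1 / f_prime(f_inverse(x))                            # Proposition 8.10
print(numerical, formula)        # both close to 1/13 = 0.0769...
```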

How to use the computational rules for the derivative


In the first example, we illustrate the use of rules (i) to (iv). In a sense, these are
the "easy" rules (keep in mind that you are allowed to use everything from the table of
derivatives listed in Proposition 8.8, above).

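Example 8.11 Here are some examples to illustrate rules (i) to (iv).

$$\text{(i)}\quad \frac{d}{dx}\big(x^3 + \sin x\big) = 3x^2 + \cos x$$

$$\text{(ii)}\quad \frac{d}{dx}\big(x^3 \sin x\big) = 3x^2 \sin x + x^3 \cos x = x^2\big(3\sin x + x\cos x\big)$$

$$\begin{aligned}
\text{(iii)}\quad \frac{d}{dx}\left(\frac{1}{x^3 \sin x}\right) &= \frac{-1}{\big(x^3 \sin x\big)^2} \cdot \frac{d}{dx}\big(x^3 \sin x\big) \\
&= -\frac{x^2\big(3\sin x + x\cos x\big)}{x^6 \sin^2 x} \\
&= -\frac{3\sin x + x\cos x}{x^4 \sin^2 x}
\end{aligned}$$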
Notice that in (iii) we do not try to do everything in one line. Instead, the first step is essentially just to recall the reciprocal rule. Having a bit of patience when computing derivatives often helps us avoid mistakes.

$$\text{(iv)}\quad \frac{d}{dx}\left(\frac{x^3}{\sin x}\right) = \frac{3x^2 \sin x - x^3 \cos x}{\sin^2 x} = x^2 \cdot \frac{3\sin x - x\cos x}{\sin^2 x}$$
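If you want to double-check a computation like (iii) or (iv), one option is to compare your formula with a difference quotient. The short Python sketch below is our own illustration (the helper names are hypothetical, not part of the course material): it evaluates the symmetric difference quotient $(F(x+h) - F(x-h))/(2h)$ with the arbitrarily chosen step $h = 10^{-6}$ at a few points where $\sin x \neq 0$, and prints it next to the formulas from (iii) and (iv). Agreement to several decimals is a useful sanity check, but of course it is not a proof.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Example 8.11 (iii): the function 1/(x^3 sin x) and the derivative computed by hand.
def f_iii(x):
    return 1 / (x**3 * math.sin(x))

def df_iii(x):
    return -(3 * math.sin(x) + x * math.cos(x)) / (x**4 * math.sin(x) ** 2)

# Example 8.11 (iv): the function x^3 / sin x and the derivative computed by hand.
def f_iv(x):
    return x**3 / math.sin(x)

def df_iv(x):
    return x**2 * (3 * math.sin(x) - x * math.cos(x)) / math.sin(x) ** 2

for x in (0.5, 1.0, 2.0):  # points where x != 0 and sin x != 0
    for name, f, df in (("(iii)", f_iii, df_iii), ("(iv)", f_iv, df_iv)):
        print(f"{name} x = {x}: formula {df(x):.6f}, difference quotient {central_difference(f, x):.6f}")
```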

We include one more example to illustrate the importance of patience when using the product rule:

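Example 8.12 Applying the product rule twice, we can differentiate the product of three functions:
$$\begin{aligned}
\frac{d}{dx}\big(\ln x \cdot \sin x \cdot \arctan x\big) &= \frac{d}{dx}\big(\underbrace{\ln x \cdot \sin x}_{f} \cdot \underbrace{\arctan x}_{g}\big) \\
&= \Big(\frac{d}{dx}\big(\ln x \cdot \sin x\big)\Big) \cdot \arctan x + \ln x \cdot \sin x \cdot \frac{d}{dx}\arctan x \\
&= \frac{\sin x \cdot \arctan x}{x} + \ln x \cdot \cos x \cdot \arctan x + \ln x \cdot \frac{\sin x}{1 + x^2} \\
&= \frac{\sin x \cdot \arctan x}{x} + \ln x \cdot \cos x \cdot \arctan x + \frac{\ln x \cdot \sin x}{1 + x^2}.
\end{aligned}$$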
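The same two-step computation works for any three differentiable functions. Writing them as $f_1, f_2, f_3$ (our notation, introduced only for this formula), grouping the first two factors and applying the product rule twice gives
$$\begin{aligned}
\big(f_1 f_2 f_3\big)' &= \big((f_1 f_2)\,f_3\big)' = (f_1 f_2)'\,f_3 + f_1 f_2\,f_3' \\
&= f_1' f_2 f_3 + f_1 f_2' f_3 + f_1 f_2 f_3'.
\end{aligned}$$
In words: each term differentiates exactly one factor. With $f_1 = \ln x$, $f_2 = \sin x$ and $f_3 = \arctan x$, the three terms are exactly the three terms in the last line of Example 8.12.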

Exercise 8.13 Compute the derivatives of
$$\text{(a)}\ \frac{x^2 + 1}{\arctan(x)} \qquad \text{(b)}\ \frac{\sin x}{1 + \cos x} \qquad \text{(c)}\ \frac{x}{\ln \sqrt{x}}$$
Remark: You can avoid the use of the chain rule in this exercise.
Exercise 8.14 Show that for constants a, b, c, d such that c and d are not both equal to 0, we have
$$\frac{d}{dx}\left(\frac{ax + b}{cx + d}\right) = \frac{ad - bc}{(cx + d)^2}$$
wherever $cx + d \neq 0$.

Exercise 8.15 (a) Use the product rule to prove by induction that
$$\frac{d}{dx} x^n = n x^{n-1}, \quad \forall n \in \{1, 2, 3, \dots\}.$$
(b) Combine the formula from (a) with the reciprocal rule to prove that
$$\frac{d}{dx} x^n = n x^{n-1}, \quad \forall n \in \{-1, -2, -3, \dots\}.$$
We now move on to rule (v), namely, the chain rule.

Example 8.16 Using the chain rule, we get
$$\frac{d}{dx}\sin(x^3) = \cos(x^3) \cdot \frac{d}{dx} x^3 = 3x^2 \cos(x^3).$$
Here, we use the chain rule as formulated in Proposition 8.9 with $f(x) = \sin x$ and $g(x) = x^3$. In particular, since $f'(x) = \cos x$, this means that $f'\big(g(x)\big) = \cos(x^3)$.

Exercise 8.17 Compute the derivatives of the following functions.
$$\text{(a)}\ e^{\sin x} \qquad \text{(b)}\ \ln(1 + x^2) \qquad \text{(c)}\ \arctan(1/x) \qquad \text{(d)}\ x\,e^{x^2}$$

Remark: At first, the chain rule can be confusing. If you are struggling with this
exercise, continue reading, and then try again after taking a look at Example 8.20.

The chain rule is usually the computational rule for the derivative that requires the most effort to master. The main reason is probably that the notation is somewhat unfortunate. Indeed, notice that in our formulation of the chain rule, we have in general
$$\frac{d}{dx} f\big(g(x)\big) \neq f'\big(g(x)\big). \tag{8.1}$$
So what is going on? In the expression on the left, we mean that one should first compose f and g, to get $f(g(x)) = \sin(x^3)$, and then take the derivative of this composition. In the expression on the right, on the other hand, we mean that one should first take the derivative of $f(x) = \sin x$, and afterwards compose the result with g(x).
This difference is not at all clear from how we write these expressions. So, to make the chain rule easier to understand, it is common to introduce different letters for the variables and write f(u) for the outer function and g(x) for the inner function. With the Leibniz notation for the derivative (recall Remark 5.24), we can now write the right-most expression in (8.1) as follows:
$$\frac{d}{du} f(u) \qquad \text{or} \qquad \frac{d}{du} f(u)\,\Big|_{u = g(x)}.$$
These two expressions mean the same thing. However, in the right-most variant, we make the extra effort of reminding the reader that only after taking the derivative do we put u = g(x). The chain rule can now be expressed as
$$\frac{d}{dx} f\big(g(x)\big) = \frac{d}{dx} f(u) = \underbrace{\frac{df}{du}}_{\text{outer der.}} \cdot \underbrace{\frac{du}{dx}}_{\text{inner der.}}.$$

Example 8.16 (continued) In the case of $\sin(x^3)$, $f(u) = \sin u$ is the outer function and $g(x) = x^3$ is the inner function. This means that $f'(u) = \cos u$ is the outer derivative and $g'(x) = 3x^2$ is the inner derivative. By the chain rule, we get
$$\begin{aligned}
\frac{d}{dx}\sin(x^3) &\overset{u = x^3}{=} \frac{d}{dx}\sin(u) = \cos u \cdot \frac{du}{dx} \\
&= \cos(x^3) \cdot \frac{d}{dx} x^3 \\
&= \cos(x^3) \cdot 3x^2.
\end{aligned}$$

Exercise 8.18 Compute the derivatives in Exercise 8.17 using this notation.

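Exercise 8.19 Check the definition of the indefinite integral in Appendix ??, and use it to pair each of the following integrals with the suitable expression. Note that to solve this exercise, you only need to be able to compute derivatives. (Why? Also, note that you do not even need to know what the derivative of arctan x is.)
$$\begin{array}{ll}
\text{(a)}\ \displaystyle\int \frac{dx}{4 + x^2} & \qquad \text{(i)}\ \dfrac{1}{2}\ln(x^2 + 4) + C \\[2ex]
\text{(b)}\ \displaystyle\int \frac{dx}{4 + x} & \qquad \text{(ii)}\ \dfrac{1}{4}\big(\ln(2 + x) - \ln(2 - x)\big) + C \\[2ex]
\text{(c)}\ \displaystyle\int \frac{dx}{(4 + x)^2} & \qquad \text{(iii)}\ \ln(4 + x) + C \\[2ex]
\text{(d)}\ \displaystyle\int \frac{x\,dx}{4 + x^2} & \qquad \text{(iv)}\ -\dfrac{1}{4 + x} + C \\[2ex]
\text{(e)}\ \displaystyle\int \frac{dx}{4 - x^2} & \qquad \text{(v)}\ \dfrac{1}{2}\arctan\dfrac{x}{2} + C
\end{array}$$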

Let us consider one more example where we illustrate the use of the chain rule.

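Example 8.20 We wish to compute
$$\frac{d}{dx}\sin\big(\sqrt{1 + x^3}\big).$$
As with the product of three factors, we need patience, and we need to apply the chain rule more than once. We typically work as follows:
$$\begin{aligned}
\frac{d}{dx}\sin\big(\sqrt{1 + x^3}\big) &\overset{u = \sqrt{1 + x^3}}{=} \frac{d}{dx}\sin u \\
&\overset{\text{chain rule}}{=} \cos u \cdot \frac{du}{dx} \\
&= \cos\big(\sqrt{1 + x^3}\big) \cdot \frac{d}{dx}\sqrt{1 + x^3} \\
&\overset{v = 1 + x^3}{=} \cos\big(\sqrt{1 + x^3}\big) \cdot \frac{d}{dx}\sqrt{v} \\
&\overset{\text{chain rule}}{=} \cos\big(\sqrt{1 + x^3}\big) \cdot \frac{1}{2\sqrt{v}} \cdot \frac{dv}{dx} \\
&= \cos\big(\sqrt{1 + x^3}\big) \cdot \frac{1}{2\sqrt{1 + x^3}} \cdot 3x^2 = \frac{3x^2 \cos\big(\sqrt{1 + x^3}\big)}{2\sqrt{1 + x^3}}.
\end{aligned}$$
In the last line, we used that $dv/dx = (1 + x^3)' = 3x^2$.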
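The step-by-step bookkeeping above (name the inner expression, differentiate the outer function, multiply by the inner derivative) is exactly what a computer can do mechanically. The Python sketch below illustrates this with so-called forward-mode automatic differentiation, a technique not covered in these notes: every quantity carries its value together with its derivative with respect to x, and each elementary operation updates the derivative using the sum, product or chain rule. The class `Dual` and the helper functions are hypothetical names chosen for this illustration, and only the few operations needed for $\sin(\sqrt{1 + x^3})$ are implemented.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value together with its derivative with respect to x."""
    val: float
    der: float

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)  # constants have derivative 0
        return Dual(self.val + other.val, self.der + other.der)          # sum rule

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)         # product rule

    __rmul__ = __mul__

def d_sin(u: Dual) -> Dual:
    # chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def d_sqrt(v: Dual) -> Dual:
    # chain rule: (sqrt v)' = v' / (2 sqrt v), valid for v > 0
    root = math.sqrt(v.val)
    return Dual(root, v.der / (2 * root))

def derivative_example_8_20(x: float) -> float:
    X = Dual(x, 1.0)        # dx/dx = 1
    v = 1 + X * X * X       # v = 1 + x^3, carrying v' = 3x^2
    u = d_sqrt(v)           # u = sqrt(v), carrying u' = v' / (2 sqrt v)
    return d_sin(u).der     # d/dx sin(u) = cos(u) * u'

def closed_form(x: float) -> float:
    s = math.sqrt(1 + x**3)
    return 3 * x**2 * math.cos(s) / (2 * s)

for x in (0.3, 1.0, 2.5):
    print(x, derivative_example_8_20(x), closed_form(x))
```

The intermediate variables `v` and `u` play the same role as v and u in the hand computation of Example 8.20.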


Exercise 8.21 Compute the derivative of $y = \ln\big(\ln(\ln x)\big)$.

Exercise 8.22 Compute the derivatives of the following functions. Note that they
all have something in common. In particular, after having done this exercise, think
about what this means for their graphs, and plot them to see if you are correct.
(a) $f(x) = \arctan\dfrac{1}{x} + \arctan x$

(b) $f(x) = \arcsin\dfrac{x}{\sqrt{1 + x^2}} - \arctan x$

(c) $f(x) = 2\arctan\big(x - \sqrt{x^2 - 1}\big) + \arctan\sqrt{x^2 - 1}$

Hint: These functions – and how to compute their derivatives – have all appeared on
recent exams. You can find these exams, with full solutions, on the course website.
8.2. PROOF OF THE COMPUTATIONAL RULES FOR THE DERIVATIVE 59

8.2 Proof of the computational rules for the derivative


In this section, we take the point of view that proofs for the computational rules for the
derivative are essentially just examples of applications of the computational rules for the
limit.

Example 8.23 (Proof of the sum rule) We use the computational rules for the limit to show that
$$\big(f(x) + g(x)\big)' = f'(x) + g'(x).$$
Starting from the definition of the derivative, we do the following computation:
$$\begin{aligned}
\frac{d}{dx}\big(f(x) + g(x)\big) &= \lim_{h \to 0} \frac{\big(f(x+h) + g(x+h)\big) - \big(f(x) + g(x)\big)}{h} \\
&= \lim_{h \to 0} \left(\frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h}\right) \\
&= \underbrace{\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}}_{= f'(x)} + \underbrace{\lim_{h \to 0} \frac{g(x+h) - g(x)}{h}}_{= g'(x)} = f'(x) + g'(x).
\end{aligned}$$
Notice that we could use the summation rule for the limit since we knew that both the limits f'(x) and g'(x) exist.

Next, we turn to proving the product rule. While it has the same flavour as the proof of the sum rule, it is slightly more complicated. In particular, we need to use the fact that if a function is differentiable at a point, then it is also continuous there. In the following two exercises, you are asked to verify that the diagram in Figure 1 is correct.

[Fig. 1. Differentiable functions are always continuous.]

Exercise 8.24 Show that if f has a derivative at x, then it is also continuous there. What part of the diagram in Figure 1 does this justify?
Hint: Recall the formula from Exercise 8.3, and find a chain of equalities showing that
$$\lim_{u \to x}\big(f(u) - f(x)\big) = \cdots = 0.$$
Keep in mind that you know that $f'(x)$ exists...



Exercise 8.25 We consider the function
$$f(x) = \begin{cases} x^2 & \text{if } x \geq 1, \\ x & \text{if } x < 1. \end{cases}$$
Use the definition of the derivative to determine whether $f'(1)$ exists or not. What does this example say about the diagram in Figure 1 on the previous page?
Remark: Pictured in Figure 2 is the rather extreme Weierstrass function $f(x) = \sum_{n=1}^{\infty} 2^{-n}\cos(3^n \pi x)$. It is continuous at every point but is nowhere differentiable.

[Fig. 2. The Weierstrass function.]
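To get a feeling for why the Weierstrass function f from the remark fails to be differentiable, we can look at its difference quotients at x = 0 with the steps $h = 3^{-m}$. For every $n \geq m$ we have $\cos(3^n \pi h) = \cos(3^{n-m}\pi) = -1$ (since $3^{n-m}$ is odd), while all other terms satisfy $\cos(3^n \pi h) - 1 \leq 0$. Hence
$$\frac{f(3^{-m}) - f(0)}{3^{-m}} \leq 3^m \sum_{n=m}^{\infty} 2^{-n}\,(-2) = -4\left(\tfrac{3}{2}\right)^m,$$
so these quotients are unbounded and f'(0) cannot exist. The Python sketch below (our own illustration; the function names and the truncation after 20 terms are arbitrary choices) checks this bound numerically for the truncated series.

```python
import math

def weierstrass_partial(x: float, n_terms: int = 20) -> float:
    """Partial sum  sum_{n=1}^{n_terms} 2^(-n) cos(3^n pi x)  of the Weierstrass function."""
    return sum(2.0 ** (-n) * math.cos(3.0 ** n * math.pi * x)
               for n in range(1, n_terms + 1))

def quotient_at_zero(m: int, n_terms: int = 20) -> float:
    """Difference quotient (W(h) - W(0)) / h at x = 0 with the step h = 3^(-m)."""
    h = 3.0 ** (-m)
    return (weierstrass_partial(h, n_terms) - weierstrass_partial(0.0, n_terms)) / h

for m in range(1, 9):
    q = quotient_at_zero(m)
    print(f"h = 3^-{m}: quotient = {q:10.3f}, bound -4(3/2)^m = {-4 * 1.5 ** m:10.3f}")
```

Of course, this only looks at the single point x = 0, and only along one particular sequence of steps h.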
Exercise 8.26 Prove the product rule for the derivative.
Mega long hint: What you need to do is to obtain a chain of equalities proving that
$$\frac{d}{dx}\big(f(x) \cdot g(x)\big) = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} = \cdots = f'(x)g(x) + f(x)g'(x).$$
As in the last steps of the proof of the sum rule, you need to make both expressions $(f(x+h) - f(x))/h$ and $(g(x+h) - g(x))/h$ appear. To do this, you can use basically the same trick as we used for the product rule for limits. That is, you can add 0 in a clever way. Or, you could use the illustration in Figure 3 as inspiration. Here, for the sake of simplicity, we suppose that f and g are both increasing functions. Then f(x)g(x) can be thought of as the blue area, while f(x+h)g(x+h) is the total area shown. The difference $f(x+h)g(x+h) - f(x)g(x)$ can then be written as the sum of the red and green areas. Express this difference algebraically, and you can use it to rewrite the numerator in the expression for the derivative of f(x)g(x).

[Fig. 3.]

Exercise 8.27 (a) Use the definition of the derivative to figure out a formula for
$$\frac{d}{dx}\frac{1}{g(x)}.$$
What assumption do we need to make on g(x)?
(b) Use the formula found in (a) and the product rule for derivatives to derive a formula for
$$\frac{d}{dx}\frac{f(x)}{g(x)}.$$
Finally, we turn to the chain rule. It is by far the most difficult of the computational rules to prove. To prepare for the proof, we give a simple, but unfortunately false, argument that helps us understand what is going on (curiously, this "proof" may be found in numerous high school textbooks).

[Fig. 4. Fake proof ahead.]
Example 8.28 (Fake "proof" of chain rule) We wish to compute the limit
$$\frac{d}{dx} f\big(g(x)\big) = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}.$$
The trick is now to multiply by 1 in a creative way. Namely,
$$\begin{aligned}
\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} &= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} \cdot \frac{g(x+h) - g(x)}{g(x+h) - g(x)} \\
&= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h} \\
&= \underbrace{\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}}_{(\ast)} \cdot \underbrace{\lim_{h \to 0} \frac{g(x+h) - g(x)}{h}}_{= g'(x)}.
\end{aligned}$$
To end the proof, we need to compute the limit labeled by (∗). To do this, we make a change of variables. That is, we set $k = g(x+h) - g(x)$. Note that $k \to 0$ as $h \to 0$ (this is true since differentiable functions are automatically continuous). This allows us to write
$$\lim_{h \to 0} \frac{f\big(g(x+h)\big) - f\big(g(x)\big)}{g(x+h) - g(x)} = \lim_{k \to 0} \frac{f\big(g(x) + k\big) - f\big(g(x)\big)}{k} = f'\big(g(x)\big).$$
Notice that the last expression here means the derivative of the function f evaluated at the point g(x).

Exercise 8.29 (a) What is the problem with the above proof? (A correct proof is
supplied on the following page.)
(b) This "fake proof" can be used to come up with a correct proof for the differenti-
ation formula for inverse functions from Proposition 8.10.
Hint: In (b), use the "fake proof" to differentiate $f^{-1}(f(x))$.

Correct proof of the chain rule


Again, our goal is to compute the limit
$$\frac{d}{dx} f\big(g(x)\big) = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}.$$

The problem with the fake proof of the chain rule occurs when we multiply by one. Indeed, the expression
$$\frac{g(x+h) - g(x)}{g(x+h) - g(x)}$$
may be of the form 0/0 for infinitely many values of h arbitrarily close to 0. That is, we need to find an alternative approach that avoids division by $g(x+h) - g(x)$.
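This really can happen, even for a differentiable inner function. A standard example (not taken from these notes) is $g(x) = x^2 \sin(1/x)$ for $x \neq 0$, with $g(0) = 0$, at the point x = 0. Since $|g(h)/h| = |h \sin(1/h)| \leq |h|$, we have $g'(0) = 0$; yet $g(h) - g(0) = 0$ for every $h = 1/(j\pi)$, $j = 1, 2, 3, \dots$, and these h get arbitrarily close to 0. The Python sketch below evaluates $g(h) - g(0)$ at some of these points; in floating-point arithmetic the results are tiny rounding errors rather than exact zeros.

```python
import math

def g(x: float) -> float:
    """g(x) = x^2 sin(1/x) for x != 0, extended by g(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

# The steps h_j = 1/(j*pi) tend to 0, and g(h_j) = h_j^2 * sin(j*pi) = 0 exactly.
for j in (1, 10, 100, 1000):
    h = 1 / (j * math.pi)
    k = g(h) - g(0)   # the quantity the fake proof divides by
    print(f"h = {h:.3e}:  g(h) - g(0) = {k:.1e}")
```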
For this reason, let us now consider the definition of f', which we write as follows:
$$f'(u) = \lim_{k \to 0} \frac{f(u+k) - f(u)}{k}. \tag{8.2}$$
As in the fake proof, we want to put $k = g(x+h) - g(x)$. However, the problem mentioned above is then precisely that k may be zero for various values of h, so we may not divide by it. But here is the crucial step: before we make the connection between k and $g(x+h) - g(x)$, we define
$$E(k) = \begin{cases} f'(u) - \dfrac{f(u+k) - f(u)}{k}, & k \neq 0, \\[1ex] 0, & k = 0, \end{cases} \tag{8.3}$$
where we keep in mind that we consider u as being fixed and k as the variable. Here, we also notice that since (8.2) holds, it follows that
$$\lim_{k \to 0} E(k) = 0.$$
That is, when defined in this way, the function E(k) is continuous at the origin. Why did we do all of this? Well, by multiplying (8.3) by k and rearranging, we can now write
$$f(u+k) - f(u) = \big(f'(u) - E(k)\big)\,k.$$
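As a concrete illustration, take (our choice) $f = \exp$ and u = 0, so that $f'(0) = 1$ and $E(k) = 1 - (e^k - 1)/k$ for $k \neq 0$. Since $e^k = 1 + k + k^2/2 + \cdots$, we expect $E(k) \approx -k/2$ for small k. The Python sketch below tabulates E(k) and checks the rearranged identity $f(u+k) - f(u) = (f'(u) - E(k))\,k$ at a few points, including k = 0, where the identity holds trivially.

```python
import math

def E(k: float, u: float = 0.0) -> float:
    """The function (8.3) for f = exp at the point u, where f'(u) = exp(u)."""
    if k == 0:
        return 0.0
    return math.exp(u) - (math.exp(u + k) - math.exp(u)) / k

for k in (0.1, 0.01, 0.001, 0.0):
    lhs = math.exp(k) - math.exp(0.0)   # f(u + k) - f(u) with u = 0
    rhs = (math.exp(0.0) - E(k)) * k    # (f'(u) - E(k)) * k
    print(f"k = {k:<6}  E(k) = {E(k): .6f}  lhs = {lhs:.6f}  rhs = {rhs:.6f}")
```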

Since this expression also holds for k = 0 (both sides are then 0), it is now safe to put u = g(x) and make the connection $k = g(x+h) - g(x)$. Since g is differentiable, and hence continuous, at x, this in particular means that $k \to 0$ as $h \to 0$, and that we can write
$$f\big(g(x) + k\big) - f\big(g(x)\big) = \big(f'(g(x)) - E(k)\big)\big(g(x+h) - g(x)\big).$$
Here, we did not write out the first k, since this makes the computation below slightly easier to read. Also, notice that we can rewrite $k = g(x+h) - g(x)$ as $g(x+h) = g(x) + k$.

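Finally, we have gathered all the pieces needed to make the following computation:
$$\begin{aligned}
\frac{d}{dx} f\big(g(x)\big) &= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} \\
&= \lim_{h \to 0} \frac{\big(f'(g(x)) - E(k)\big)\,k}{h} \\
&= \lim_{h \to 0} \big(f'(g(x)) - E(k)\big) \cdot \lim_{h \to 0} \frac{k}{h} \\
&= \lim_{k \to 0} \big(f'(g(x)) - E(k)\big) \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} \\
&= f'(g(x)) \cdot g'(x).
\end{aligned}$$
And we are done!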

Remark 8.30 While the above proof seems more complicated than the "fake" proof, the general idea is basically the same. The extra complication is the bookkeeping needed to make sure that nothing bad happens if k is zero for values of h arbitrarily close to 0 (but not equal to 0).

Exercise 8.31 (Discussion) Is the "fake" proof really that bad? Can you think of
any conditions under which it will actually work, and do the functions we normally
consider in these lecture notes satisfy such conditions?

Exercise 8.32 Use the chain rule to prove the differentiation formula for invertible functions from Proposition 8.10. Here, you may assume that both f and $f^{-1}$ are differentiable.
Remark: You should compare this exercise to Exercise 8.29.

Remark 8.33 In the YouTube video linked in the margin, another proof of the formula for the derivative of an inverse function is given.

8.3 Differentiation formulas for elementary functions


In this section, we take a closer look at the formulas for the derivatives of the trigonometric functions sin x, cos x, tan x, the logarithm and the exponential function, as well as the inverse trigonometric functions. But we start out by observing that the elementary functions are less well-behaved with respect to differentiation than with respect to continuity.

Are all elementary functions differentiable? What, no!?


In Chapter ??, we observed that all elementary functions are continuous. Unfortunately,
the corresponding statement is not true for differentiability as is shown by the following
example.

Example 8.34 Let f(x) = arcsin(x) and g(x) = sin(x). Surely, these functions are differentiable. However, this is not the case for the composition $f \circ g(x) = \arcsin(\sin x)$. As we see in Figure 5, there are plenty of pointy edges! The point is that arcsin(x) is not differentiable at the endpoints of its domain, and this causes trouble when x is such that $\sin x = \pm 1$.

[Fig. 5. Look! Pointy edges!]
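One way to see such a corner numerically is to compare the one-sided difference quotients. At $x = \pi/2$, where $\sin x = 1$, we have $\arcsin(\sin x) = x$ just to the left and $\arcsin(\sin x) = \pi - x$ just to the right, so the left and right quotients should be close to 1 and −1, respectively. The Python sketch below (an illustration only; the step $h = 10^{-4}$ and the function names are our own choices) compares this with the point x = 1, where there is no corner.

```python
import math

def f(x: float) -> float:
    """The composition arcsin(sin x) from Example 8.34."""
    return math.asin(math.sin(x))

def one_sided_quotients(x0: float, h: float = 1e-4):
    """Return the left and right difference quotients of f at x0."""
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right

print(one_sided_quotients(math.pi / 2))  # sin x = 1: approximately (1, -1), a corner
print(one_sided_quotients(1.0))          # sin x < 1: approximately (1, 1), no corner
```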

But all is not lost. The following proposition is analogous to Proposition ??.?? (notice that it contains a little bit of "fine print").

Proposition 8.35 Suppose that f and g are differentiable. Then the same is true for
the functions f ± g, f · g and f /g. Moreover, if g is differentiable at a point x and f is
differentiable at the point g(x), then f ◦ g is also differentiable at x.

Exercise 8.36 Prove Proposition 8.35.


Hint: This is just a matter of using the rulebook for differentiation. For instance, the differentiation rule $(fg)' = f'g + fg'$ in particular states that if f, g are differentiable at a point x, then so is the product fg.
Exercise 8.37 The inverse function of an invertible and differentiable function is not always differentiable. This statement is true even when adding the assumption that the function is defined on an interval. For instance, the function $y = x^3$ is continuous and differentiable for all $x \in \mathbb{R}$, but the same is not true for its inverse $y = x^{1/3}$. Explain.
8.3. DIFFERENTIATION FORMULAS FOR ELEMENTARY FUNCTIONS 65

A first look at differentiation formulas for elementary functions


The point is now to establish the following differentiation formulas.

Proposition 8.38

(i) $\dfrac{d}{dx}\log x = \dfrac{1}{x}, \quad x > 0$

(ii) $\dfrac{d}{dx}\sin x = \cos x$

(iii) $\dfrac{d}{dx}\cos x = -\sin x$

As it happens, we have already done the hard work in proving this proposition in
the previous chapter (see page ??). In the following exercises, the point is to help you
realise this.
Exercise 8.39 We now ask you to prove part (i) of the above proposition.
(a) Write out the definition of the derivative of the logarithm at x = 1 and verify
that we have already proved that formula (i) holds at this point.
(b) Use the logarithmic laws, and the change of variables rule for the limit, to extend
the differentiation formula to all other x.
Hint: The solution to (a) should reveal where to look for inspiration for (b).

Exercise 8.40 In this exercise, we ask you to prove parts (ii) and (iii) of the above
proposition.
(a) Write out the definition of the derivative of the sine and cosine at x = 0 and
verify that we have already proved that formulas (ii) and (iii) hold there.
(b) Use suitable trigonometric formulas, and the change of variables rule for the limit,
to extend the differentiation formulas to all other x.
Hint: The solution to (a) should reveal where to look for inspiration for (b).

Now, an interesting observation is that while the logarithm has domain x > 0, its derivative y = 1/x has the much larger domain $x \neq 0$. This leads us to ask whether we can somehow extend the logarithm to negative x in such a way that the derivative of this extension is equal to 1/x there. This is the point of the following exercise:
Exercise 8.41 (a) Use the chain rule for the derivative to prove that for x < 0 we
have
$$\frac{d}{dx}\log(-x) = \frac{1}{x}.$$
(b) Use what you learned in (a) to write a formula for a function f(x) with domain $\mathbb{R} \setminus \{0\}$ such that $f'(x) = 1/x$ there.

Implicit differentiation and more on derivatives of elementary functions


One of our main goals is to now use what we know about the derivatives of the logarithm
and the trigonometric functions to find the derivatives of their inverses. Now, technically
speaking, once we have proved Proposition 8.10 on the derivatives for inverse functions,
this should be smooth sailing. However, as students tend to find that formula hard to
use, we introduce a special technique called implicit differentiation in order to make it
as transparent as possible that taking derivatives of inverse functions is nothing but a
clever use of the chain rule.
So, what is implicit differentiation? Essentially, it is just an application of the chain rule. We illustrate the main idea of differentiating "implicitly" by considering the problem of finding tangent lines to the unit circle:

Example 8.42 (Implicit versus explicit differentiation) Our goal is to compute the slope of the tangent line to the unit circle at the point $(1/2, \sqrt{3}/2)$. We first do this in the “obvious” way. The unit circle is described by the equation $x^2 + y^2 = 1$. Solving for y leads to $y = \pm\sqrt{1 - x^2}$. Choosing + gives us the equation for the upper half circle:
$$y = +\sqrt{1 - x^2}.$$
Taking the derivative of this expression with respect to x, we obtain the desired slope as follows:
$$y' = \frac{d}{dx}\sqrt{1 - x^2} = \frac{-x}{\sqrt{1 - x^2}} \quad \Longrightarrow \quad y'(1/2) = \frac{-1/2}{\sqrt{1 - 1/4}} = -\frac{1}{\sqrt{3}}.$$

[Fig. 6.]

[Fig. 7. The function $y = +\sqrt{1 - x^2}$ describes the upper semi-circle and $y = -\sqrt{1 - x^2}$ describes the lower semi-circle.]
Determining the slope using implicit differentiation: We now show how to find this slope without first solving for y explicitly as a function of x. The idea is to assume that our curve is given by some function y = f(x) close to the point we care about. The point is that we can solve our problem without ever needing to know the formula for f(x).
The first step is to put y = f(x) into the equation for the circle:
$$x^2 + f(x)^2 = 1.$$
Since the left-hand side is identical to the right-hand side for all x near 1/2, the derivative of the left-hand side has to be equal to the derivative of the right-hand side. We obtain:
$$\frac{d}{dx}\big(x^2 + f(x)^2\big) = \frac{d}{dx} 1 \quad \Longrightarrow \quad 2x + 2f(x)f'(x) = 0.$$
Here, f'(x) appears since it is the inner derivative when we use the chain rule to get $\big(f(x)^2\big)' = 2f(x)f'(x)$. Rewriting the above expression, we get
$$f'(x) = -\frac{x}{f(x)} \quad \Longleftrightarrow \quad y' = -\frac{x}{y}.$$
In the last step, we just used that y = f(x) and y' = f'(x). But this means that when (x, y) is equal to $(1/2, \sqrt{3}/2)$, we get $y' = -1/\sqrt{3}$.

To summarise, we say that we differentiate implicitly when we take the derivative of an expression where y is not explicitly given in terms of x. If we leave out the step where we give y the name f(x) (which is not really necessary), the above example can be (partially) summed up as follows:
$$\begin{aligned}
&\text{explicit solution:} && x^2 + y^2 = 1 \implies y = \pm\sqrt{1 - x^2} \overset{\text{explicit diff.}}{\implies} y' = \pm\frac{-x}{\sqrt{1 - x^2}} \\
&\text{implicit solution:} && x^2 + y^2 = 1 \overset{\text{implicit diff.}}{\implies} 2x + 2yy' = 0 \implies y' = -\frac{x}{y}.
\end{aligned}$$
Observe that since $y = \pm\sqrt{1 - x^2}$, both methods really give the same result.
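As a quick cross-check of Example 8.42, the Python sketch below (our own illustration) evaluates the explicit formula, the implicit formula $y' = -x/y$ and a symmetric difference quotient of the upper semicircle at the point $(1/2, \sqrt{3}/2)$; all three should agree with $-1/\sqrt{3}$.

```python
import math

x0, y0 = 0.5, math.sqrt(3) / 2   # a point on the upper half of the unit circle

def upper_semicircle(x: float) -> float:
    return math.sqrt(1 - x * x)

# Explicit differentiation of y = sqrt(1 - x^2).
slope_explicit = -x0 / math.sqrt(1 - x0 * x0)

# Implicit differentiation: 2x + 2y y' = 0, i.e. y' = -x / y (valid where y != 0).
slope_implicit = -x0 / y0

# Symmetric difference quotient, as an independent numerical check.
h = 1e-6
slope_numeric = (upper_semicircle(x0 + h) - upper_semicircle(x0 - h)) / (2 * h)

print(slope_explicit, slope_implicit, slope_numeric, -1 / math.sqrt(3))
```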

Exercise 8.43 Consider the curve given by the equation
$$x^4 - y^4 - x^2 + y^2 = 0.$$
(a) Determine y' as a formula of x and y by using implicit differentiation.
(b) Show that the points $(\sqrt{3}/2, 1/2)$ and $(3, 3)$ are on the curve. Determine the tangent lines through these points.
(c) Draw the curve in WolframAlpha or some other program. How can the curve most easily be described in terms of other, more familiar, curves? First make a guess, and then try to prove that your guess is true.
(d) What value do you get for y' at the point $(1/\sqrt{2}, 1/\sqrt{2})$? Why do you think this is?

Remark 8.44 (Implicit Function Theorem) To use implicit differentiation, we need to know that we can think of the variable y as a function of x even if we cannot explicitly solve for y in terms of x. The theoretical justification of this is given by a result called the Implicit Function Theorem. Since it is a result from multivariable calculus, we will simply assume in these lecture notes that this justification works. We remark that the implicit function theorem also gives conditions under which y is differentiable.

We now apply implicit differentiation to the problem of finding the derivatives of inverse functions. The nice thing is that the technique allows us to use the differentiation formula for inverse functions without actually using it. To explain what we mean by this apparent nonsense, let us consider an example:

Example 8.45 We now use implicit differentiation to compute the derivative of the function y = arcsin(x) by using the fact that it is the inverse function of the sine. That is, for $x \in D_{\arcsin}$ we have
$$y = \arcsin x \iff \sin y = x,$$
where $D_{\arcsin} = R_{\sin} = [-1, 1]$ and $R_{\arcsin} = D_{\sin} = [-\pi/2, \pi/2]$.

The point is that we consider the relation sin y = x as an implicit equation for y. That is, in considering y as a function of x, we can use implicit differentiation to compute
$$\frac{d}{dx}\sin y = \frac{d}{dx} x \iff \cos y \cdot y' = 1 \iff y' = \frac{1}{\cos y}.$$
Notice that the only thing we did here was to use the chain rule to get $(\sin y)' = \cos y \cdot y'$. Next, plugging in y = arcsin x, we can write the last expression above as
$$(\arcsin x)' = \frac{1}{\cos(\arcsin x)}.$$
By the result of Example ??.??, we know that $\cos(\arcsin x) = \sqrt{1 - x^2}$, and so, for $x \in (-1, 1)$,
$$(\arcsin x)' = \frac{1}{\sqrt{1 - x^2}}.$$
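Since $\cos y > 0$ for $y \in (-\pi/2, \pi/2)$, the formula $y' = 1/\cos y$ gives a positive derivative, as it should for the increasing function arcsin. The Python sketch below (an illustration with our own function names) compares $1/\cos(\arcsin x)$, the closed form $1/\sqrt{1 - x^2}$ and a symmetric difference quotient of `math.asin` at a few points of (−1, 1).

```python
import math

def d_arcsin_implicit(x: float) -> float:
    """y' = 1/cos(y) with y = arcsin x, as obtained by implicit differentiation."""
    return 1 / math.cos(math.asin(x))

def d_arcsin_closed_form(x: float) -> float:
    """The closed form 1/sqrt(1 - x^2), valid for -1 < x < 1."""
    return 1 / math.sqrt(1 - x * x)

def central_difference(func, x: float, h: float = 1e-6) -> float:
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-0.9, 0.0, 0.5, 0.99):
    print(x, d_arcsin_implicit(x), d_arcsin_closed_form(x), central_difference(math.asin, x))
```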

Remark 8.46 Observe that when using implicit differentiation to study the derivatives of inverse functions as we do above, we have no need for the implicit function theorem. Indeed, since the function is differentiable, we know that we can consider both y as a function of x and x as a function of y (if needed).

Exercise 8.47 Use implicit differentiation to show that
$$\text{(a)}\ \frac{d}{dx}\arccos(x) = -\frac{1}{\sqrt{1 - x^2}} \qquad \text{(b)}\ \frac{d}{dx}\arctan(x) = \frac{1}{1 + x^2}.$$
Hint: In (b), keep in mind that there is more than one way to express $(\tan x)'$.
Exercise 8.48 Use the fact that exp(x) is the inverse function of log x, and that $(\log x)' = 1/x$, to prove that
$$\frac{d}{dx}\exp(x) = \exp(x).$$
Exercise 8.49 Use the definition of the complex exponential to prove that
$$\frac{d}{dx}\exp(ix) = i\exp(ix).$$
Exercise 8.50 Use what you know about the logarithm and exponential functions to prove that for all $\alpha \in \mathbb{R}$, we have
$$\frac{d}{dx} x^\alpha = \alpha x^{\alpha - 1}, \quad x > 0.$$
Exercise 8.51 Use implicit differentiation to prove the differentiation formula for
inverse functions from Proposition 8.10.

Here is a summary of some of the differentiation formulas proven in this chapter.

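Proposition 8.52

(i) $\dfrac{d}{dx}\exp(x) = \exp(x)$

(ii) $\dfrac{d}{dx}\exp(ix) = i\exp(ix)$

(iii) $\dfrac{d}{dx} x^\alpha = \alpha x^{\alpha - 1}, \quad \alpha \in \mathbb{R},\ x > 0$

(iv) $\dfrac{d}{dx}\arcsin x = \dfrac{1}{\sqrt{1 - x^2}}, \quad x \in (-1, 1)$

(v) $\dfrac{d}{dx}\arccos x = -\dfrac{1}{\sqrt{1 - x^2}}, \quad x \in (-1, 1)$

(vi) $\dfrac{d}{dx}\arctan x = \dfrac{1}{1 + x^2}, \quad x \in \mathbb{R}$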

8.4 Exam exercises


There are no questions on the previous exams that simply ask you to compute the derivative of a function. Therefore, all exercises below are part of some larger problem, and if you consult the suggested solutions of the exam problems, you should find the details of most of these computations.

Exercise 8.53 (Exam 2015-05-27, part of 5) Make a table of signs for the derivative of the function
$$f(x) = \ln x - \frac{x - 1}{x + 1}, \quad x \geq 1.$$

Exercise 8.54 (Exam 2014-08-18, part of 2) Make a table of signs for the derivative of the function
$$f(x) = 2\arctan x + \frac{1}{x}, \quad x \neq 0.$$

Exercise 8.55 (Exam 2014-05-26, part of 3) Make a table of signs for the derivative, and the second derivative, of the function
$$f(x) = |x|\,e^{-1/x}, \quad x \neq 0.$$

Exercise 8.56 (Exam 2012-12-19, part of 4) Make a table of signs for the derivative of the function
$$f(x) = |x^3 - 6x^2 + 9x - 4|, \quad x \in [0, 5].$$

Exercise 8.57 (Exam 2012-05-28, part of 1) Make a table of signs for the derivative of the function
$$f(x) = e^{-x^2/2}\sqrt{x^2 + 1}, \quad x \in \mathbb{R}.$$

Exercise 8.58 (Exam 2012-05-28, part of 3) Make a table of signs for the derivative of the function
$$f(x) = \ln(1 + e^{-x}) - \frac{1}{e^x + 1}, \quad x \in \mathbb{R}.$$
8.5. ANSWERS TO SELECTED EXERCISES 71

8.5 Answers to selected exercises


8.1 In (i) and (ii), the point is to use the expressions defining sinh x and cosh x. In
(iii), use the fact that the derivative of sinh x is strictly positive for all x ∈ R. In
(iv), use implicit differentiation.

8.6 No.

8.7 C = 1/2, D = −1/2.

8.13 (a) $(2x\arctan(x) - 1)/\arctan(x)^2$, (b) $1/(1 + \cos(x))$, (c) $2(\ln(x) - 1)/(\ln x)^2$.
8.17 (a) $\cos x \, e^{\sin x}$, (b) $2x/(1 + x^2)$, (c) $-1/(1 + x^2)$, (d) $(1 + 2x^2)e^{x^2}$

8.19 (a) - (v), (b) - (iii), (c) - (iv), (d) - (i), (e) - (ii).

8.22 (a) 0, (b) 0, (c) 0.

8.24 Here is an additional hint: multiply the expression in the original hint by one in such a way that you can take advantage of the fact that the limit
$$\lim_{u \to x} \frac{f(u) - f(x)}{u - x}$$
exists and is equal to some finite number.

8.25 It says that there are functions that are continuous but not differentiable. That is,
that the two areas in the Venn diagram do not coincide.

8.27 (a)
$$\lim_{h \to 0} \frac{\frac{1}{g(x+h)} - \frac{1}{g(x)}}{h} = \lim_{h \to 0} \frac{g(x) - g(x+h)}{h\,g(x)\,g(x+h)} = -\frac{1}{g(x)} \lim_{h \to 0} \frac{1}{g(x+h)} \cdot \frac{g(x+h) - g(x)}{h} = -\frac{g'(x)}{g(x)^2}.$$
(b) Apply the product rule to $f(x) \cdot 1/g(x)$.


8.43 (a) $y' = \dfrac{x(1 - 2x^2)}{y(1 - 2y^2)}$, (b) the tangent lines are $y = -\sqrt{3}\,x + 2$ and $y = x$, respectively, (c) it is the union of the circle $x^2 + y^2 = 1$ and the lines $y = x$ and $y = -x$. This can be seen algebraically from the identity $x^4 - y^4 - x^2 + y^2 = (1 - x^2 - y^2)(y - x)(y + x)$.

8.48 Let $y = e^x$; then $\log y = x$. Now differentiate implicitly on both sides, and then substitute back.

8.51 Let $y = f^{-1}(x)$; then $f(y) = x$. Now differentiate implicitly on both sides, and then substitute back.
Chapter 9

How to compare infinities

Our goal is to explore a rather surprising consequence of the fact that the real numbers correspond to the decimal numbers: not all infinities are the same! This was first realised by Georg Cantor, who became a laughing stock among several contemporary mathematicians who thought his theory on the cardinality of sets was absurd. In fact, some say that this conflict drove him mad.

[Fig. 1. Georg Cantor ca 1870. The underdog.]

Cantor’s ideas on infinities led to much heated debate in mathematical circles. In fact, according to Wikipedia (a most trusted source), the famous mathematician, and rival to Cantor, Leopold Kronecker is quoted as saying:

“I don’t know what predominates in Cantor’s theory – philosophy or theology, but I am sure that there is no mathematics there!”

To see what the fuss was all about, we have to explore what Cantor’s definition has to say about infinite sets.

[Fig. 2. Leopold Kronecker ca. 1865. Famous mathematician and a true humanitarian.]

First let us consider the following question: which set is the larger, the natural numbers N or the integers Z? The answer may seem obvious, since Z clearly contains more numbers. Cantor, however, disagreed. He realised that when considering infinitely large sets, we need to be more careful.
C-2 CHAPTER 9. HOW TO COMPARE INFINITIES

Indeed, consider the following: how can we tell that the sets B = {Alfred, Elias, Nils} and G = {Anna, Emma, Nora} are of the same size? Well, a reasonable answer is that if we can partner up every element of B with an element of G in such a way that no element of G is left out or used twice, then B and G are of the same size. Here is one such pairing:

Alfred ↔ Anna
Elias ↔ Emma
Nils ↔ Nora

To use a more mathematical term, such a pairing is called a bijective correspondence, or


bijective map, between the elements of B and G.
Inspired by this observation, Cantor made the following definition.

Definition I.1 We say that two sets A and B have the same size (he used the term
cardinality) if there exists a bijective correspondence between the elements of A and B.

Please take a moment to notice that the only complication at this point is our use of
the words “bijective” and “cardinality”. In fact, this is just about finding dance partners!
To take another example, let A = {1,2,3,4} and B = {101,102,103,104}. These have the
same cardinality since we can partner up elements from A and B, just as we did with B
and G, above (that is, we can find a bijective correspondence).
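For finite sets, "finding dance partners" can be checked mechanically. The Python sketch below is our own illustration (the function name `is_bijection` is hypothetical): a pairing is stored as a dictionary sending each element of the first set to its partner, and the function checks that every element of the first set has a partner and that every element of the second set is used exactly once.

```python
def is_bijection(pairing: dict, A: set, B: set) -> bool:
    """Check that the dict `pairing` (element of A -> partner in B) is a bijective
    correspondence: every element of A has a partner, and every element of B is
    used exactly once (none left out, none used twice)."""
    if set(pairing) != A:                  # every element of A is paired (dict keys are distinct)
        return False
    partners = list(pairing.values())
    return len(set(partners)) == len(partners) and set(partners) == B

B = {"Alfred", "Elias", "Nils"}
G = {"Anna", "Emma", "Nora"}
print(is_bijection({"Alfred": "Anna", "Elias": "Emma", "Nils": "Nora"}, B, G))  # True
print(is_bijection({"Alfred": "Anna", "Elias": "Anna", "Nils": "Nora"}, B, G))  # False: Anna is used twice
print(is_bijection({a: a + 100 for a in {1, 2, 3, 4}}, {1, 2, 3, 4}, {101, 102, 103, 104}))  # True
```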

Example I.2 The sets N and Z are, according to Cantor, of the same size. To prove this, we need to find a way to partner up the elements of N and Z with a bijective correspondence. This may seem hopeless since Z contains all of N, and then some, but there is a trick that works! We illustrate it in the following figure:

[Fig. 3. Here is how to give all numbers in Z a partner in N. First, 0 gets 0. Then 1 gets 1, then −1 gets 2, then 2 gets 3, then −2 gets 4, and so forth.]

That is, we partner the elements as follows:

 Z        N
 0   ←→   0
 1   ←→   1
−1   ←→   2
 2   ←→   3
−2   ←→   4
 ⋮        ⋮

In this way, every element of Z is paired bijectively with an element of N. Therefore, according to Cantor, the sets Z and N are of the same size.
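The partnering in Example I.2 can be written down as an explicit formula: positive integers z go to the odd numbers 2z − 1, and zero and the negative integers go to the even numbers −2z. The Python sketch below (function names are our own) implements this map together with its inverse and checks, on a finite range only, that the two undo each other, which is what makes the pairing bijective.

```python
def z_to_n(z: int) -> int:
    """Partner in N of the integer z: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ..."""
    return 2 * z - 1 if z > 0 else -2 * z

def n_to_z(n: int) -> int:
    """Inverse map: odd n come from positive integers, even n from 0 and the negatives."""
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)

print([n_to_z(n) for n in range(9)])   # [0, 1, -1, 2, -2, 3, -3, 4, -4]

# Each map undoes the other (checked here on a finite range only).
assert all(n_to_z(z_to_n(z)) == z for z in range(-1000, 1001))
assert all(z_to_n(n_to_z(n)) == n for n in range(2001))
```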

While this example may be surprising, it was not really what got grumpy mathematicians like Kronecker upset. Let us push this one step further.

Example I.3 The sets N and Q are of the same size. That is, there are exactly as
many natural numbers as there are rational numbers (that is, fractions). This time, we
indicate the partnering through a sequence of illustrations. First, let us represent the
rational numbers in a coordinate system:

Fig. 4. In this way, all rational numbers can be represented by a coordinate in the
upper half-plane.
Notice that some rational numbers appear more than once here. For instance, 1/1 = 2/2,
and 3/2 = 6/4 and so forth. So, we delete the superfluous ones:
C-4 CHAPTER 9. HOW TO COMPARE INFINITIES

Fig. 5. Note that we place 0 at the origin since it looks prettier.


So, how to partner up the rational numbers with the natural numbers? Well, we draw
a path as follows:

Fig. 6. Here is how to give all numbers in Q a partner in N.


That is, we partner the elements as follows:

  Q        N
  0   ←→  0
  1   ←→  1
 1/2  ←→  2
−1/2  ←→  3
 −1   ←→  4
  ⋮        ⋮

In this way, every element of Q is paired bijectively with an element of N. Therefore,
according to Cantor, the sets Q and N are of the same size.
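Any systematic listing of Q that skips duplicates works just as well as the path in Fig. 6. As a hedged sketch (our own, and in a different order from the figure), the function enumerate_rationals below lists the fractions p/q in lowest terms with q ≥ 1 by increasing value of |p| + q. Only finitely many fractions share each value of |p| + q, so every rational number is reached after finitely many steps. It uses the standard-library tools fractions.Fraction and math.gcd.

```python
from fractions import Fraction
from math import gcd

def enumerate_rationals(count):
    """Sketch: the first `count` rational numbers in one fixed order, without repeats.

    Fractions p/q with q >= 1 in lowest terms are listed by increasing |p| + q.
    """
    result = []
    height = 1                                   # the current value of |p| + q
    while len(result) < count:
        for q in range(1, height + 1):
            p = height - q                       # the size |p| that goes with this q
            if gcd(p, q) != 1:
                continue                         # skip 0/2, 2/2, 2/4, ...: already listed
            for signed_p in ([p] if p == 0 else [p, -p]):
                result.append(Fraction(signed_p, q))
                if len(result) == count:
                    return result
        height += 1
    return result

print([str(r) for r in enumerate_rationals(7)])  # ['0', '1', '-1', '2', '-2', '1/2', '-1/2']
```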

Remark I.4 Note that the previous two examples can be adjusted to show that if A and
B both have the same size as N, then both the union A ∪ B and the Cartesian product
A × B = {(a,b) : a ∈ A, b ∈ B} also have the same size as N.
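For the special case A = B = N, the claim about A × B can be checked with Cantor's pairing function, which sends (a, b) to (a+b)(a+b+1)/2 + b and is a bijection from N × N onto N. This is a standard construction, not necessarily the adjustment the remark has in mind; the function names below are our own.

```python
def cantor_pair(a, b):
    """Cantor's pairing function: sends the pair (a, b) in N x N to a number in N."""
    s = a + b
    return s * (s + 1) // 2 + b

def cantor_unpair(n):
    """Inverse of cantor_pair: recovers the pair (a, b) from n."""
    s = 0
    while (s + 1) * (s + 2) // 2 <= n:   # find the diagonal a + b = s that n lies on
        s += 1
    b = n - s * (s + 1) // 2
    return s - b, b

print([cantor_unpair(n) for n in range(6)])  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```

Like the path in Fig. 6, the pairing walks through the pairs one diagonal a + b = s at a time.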

Although this is perhaps even more surprising than Example I.2, this was still not the
reason why grumpy old Kronecker got upset. If you did not like Cantor's arguments,
you could simply choose to ignore his definition and move on with your life.
However, the following example made sparks fly.

Example I.5 The set R is not of the same size as N. That is, the infinity describing
the size of the real numbers is strictly larger than the infinity describing the size of the
natural numbers. Hence, there are different infinities!
Let us now see if we can understand Cantor’s famous diagonal argument which proves
that there are different types of infinities. The argument is quite short, but as you can
understand from the controversy, many found it hard to swallow.
What we need to prove is that there is no way to write a list

R                    N
0,23482968762...  ←→ 0
5,34243438421...  ←→ 1
0,73923212253...  ←→ 2
7,23529523532...  ←→ 3
3,04360158943...  ←→ 4
       ⋮              ⋮

that partners all elements of R with elements of N bijectively. The intuitive idea is that
there simply are not enough numbers in N, so that most elements of R have to be missing
from such a list (even if it is infinitely long – because this infinity is the infinity of N,
and it is simply too small to count all of R!).
The proof is by contradiction. So, suppose that there actually does exist such a list
pairing all elements of R bijectively with elements of N. For the sake of concreteness,
let us suppose that it starts exactly like the list above (it does not really matter). Another
detail is that we need to suppose that numbers ending with an infinite sequence of 9's
are instead written so that they end with an infinite sequence of 0's (for instance,
0,4999... is written as 0,5000...). The point is that we can now prove that there is at
least one x ∈ R which cannot belong to this list.
To do this, let us choose decimal digits from the list in a diagonal fashion (the chosen
digits are marked with brackets):

R                      N
0,[2]3482968762...  ←→ 0
5,3[4]243438421...  ←→ 1
0,73[9]23212253...  ←→ 2
7,235[2]9523532...  ←→ 3
3,0436[0]158943...  ←→ 4
        ⋮               ⋮

We now claim that we can write down a number x which is not on this list, based on
the bracketed digits. How do we do this? Well, it is quite straightforward: we write down
a number x according to the following rules:

• The first decimal of x is different from the first decimal of the first number.

• The second decimal of x is different from the second decimal of the second number.

• The third decimal of x is different from the third decimal of the third number.
  ⋮

This is achieved by, for instance, replacing every 0 by 1, every 1 by 2, and so on, up to
the digit 9, which we replace by 0 (or by 8, if we want to be extra careful). That is,

x = 0,35031...

When constructed in this way, the number x must be different from every number on the
original list, since it differs from each and every one of them in at least one decimal digit.
Since we began by supposing that every x ∈ R was on the list, we have reached a
contradiction! Therefore, no such list exists, and so no bijective pairing of N and R exists.
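The digit-shifting rule can be tried out on the five rows shown above. The sketch below is our own illustration: it stores the numbers as strings with a decimal comma, as in the text, and reproduces x = 0,35031.... A finite list only shows the mechanics; the argument itself is about the whole infinite list.

```python
def diagonal_number(listed):
    """Sketch: digits chosen so the k-th decimal differs from the k-th decimal of the k-th number."""
    digits = []
    for k, number in enumerate(listed):
        decimals = number.split(",")[1]          # the digits after the decimal comma
        d = int(decimals[k])                      # k-th decimal of the k-th number (counting from 0)
        digits.append(str((d + 1) % 10))          # 0 -> 1, 1 -> 2, ..., 8 -> 9, 9 -> 0
    return "0," + "".join(digits)

listed = ["0,23482968762", "5,34243438421", "0,73923212253",
          "7,23529523532", "3,04360158943"]
x = diagonal_number(listed)
print(x)   # 0,35031
```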

And there you have it! The infinity of R has to be larger than the infinity of N. While
some mathematicians thought this was ridiculous, others realised that Cantor had, in fact,
opened the eyes of the mathematical community to new ideas and new directions of
thought. Perhaps the most influential of all mathematicians in the early 1900s, David
Hilbert, said the following about Cantor's work (again according to Wikipedia):

"No one will drive us from the paradise which Cantor created for us."

Fig. 7. David Hilbert ca. 1912. A true gentleman.
These days, Cantor’s ideas are a firm part of mainstream mathematics. Let us end by
noting the following definition.

Definition I.6 Sets of the same cardinality as N are called countably infinite, and sets
which have a strictly larger cardinality than N are called uncountably infinite.

The sets N, Z and Q are all countably infinite, while the sets R and R\Q (the
irrational numbers √2, π, e and so forth) are uncountably infinite.
Chapter 10

A crash course in Python

10.1 Introduction
How to install Python 3 and write your first program
Go to the webpage https://www.spyder-ide.org and download and install the latest
version of Spyder. Once installed, launch this program. On a Mac (a few versions ago,
at least) this would open the following window.

Fig. 1. The graphical user interface of the Spyder editor for Python.
Exercise J.1 Type print("Hello world") in the editor window and press the green
play button. What output do you get? Also try print("2+2 = ", 2+2).
Remark: Notice how Python treats "2+2" as text to be printed and 2+2 as something
to be computed, and that a comma is used to separate the different pieces of input.

D-2 CHAPTER 10. A CRASH COURSE IN PYTHON

A first look at basic syntax for arithmetic in Python


We now consider how to do basic arithmetic in Python. Here is a first example.

Example J.2 Enter the following code into the Spyder editor.

1 a = 2
2 b = 4
3 c = a + b
4 print("a + b = ", c)

When you press the green play button you get the output a + b = 6.

This seems reasonable. Here, a, b and c are what we call variables. Technically
speaking, a variable is a space in the memory of the computer which can store one piece
of information, such as a number. In the above example, they work pretty much like
what we would expect of a mathematical variable. But the next example shows that this
is not always the case:

Example J.3 Enter the following code into the Spyder editor.

1 a = 2
2 b = 4
3 c = a + b
4 a = 10
5 print("a + b = ", c)

When you press the green play button you still get the output a + b = 6.

Wait, what? Does Python really mean that 10 + 4 = 6? Well, no. The program is
doing exactly what it is told. The thing is that we ourselves do not really understand
what we just asked Python to do. So let us try to figure it out. As an example, here is
a line-by-line explanation of what happens in Example J.2:

Line 1: Python creates a variable a and assigns it the value 2.
Line 2: Python creates a variable b and assigns it the value 4.
Line 3: Python creates a variable c, computes the value a + b, and assigns this value to c.
Line 4: Python prints the value of c on the screen (together with the string of text "a + b = ").

Fig. 2. To explain the code, we replace the equal signs by arrows.
10.1. INTRODUCTION D-3

The above example illustrates several peculiarities of arithmetic in Python and of how
code is run:
1. The code is executed line by line.
2. Expressions such as a = b should be read from right to left. That is, a
is assigned whatever value b has, but not vice versa (if b does not have a value,
the program will crash). For this reason, an arrow makes more sense
than an equal sign.
3. If a variable gets assigned some value, it has no memory of how that happened.
That is, in the second example, c gets assigned the value a+b. But when a is
changed in the next line, this does not affect the value stored in c.
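Points 2 and 3 can be seen directly by extending Example J.3. In this sketch, the lines after a = 10 are our own additions:

```python
a = 2
b = 4
c = a + b          # the right-hand side 2 + 4 is computed first, then stored in c
a = 10             # a now holds 10, but c still holds the number 6
print("a + b = ", c)   # still shows 6

c = a + b          # to get the new sum, we have to compute it again
print("a + b = ", c)   # now shows 14

a = a + 1          # read right to left: take the old a (10), add 1, store the result in a
print(a)           # 11
```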

Exercise J.4 Explore the various ways to do arithmetic operations in Python by
running the following commands for suitable values of a and b:
(a) a+b (b) a-b (c) a*b (d) a**b (e) a/b (f) a//b (g) a%b
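To compare with your own results in Exercise J.4, here is one possible run with a = 7 and b = 2 (values chosen for illustration only). The last line shows a detail that is easy to miss: // rounds down, towards minus infinity, also for negative numbers, and % gives the matching remainder.

```python
a = 7
b = 2
print(a + b)      # 9    addition
print(a - b)      # 5    subtraction
print(a * b)      # 14   multiplication
print(a ** b)     # 49   power: 7 squared
print(a / b)      # 3.5  ordinary division (the result is always a float)
print(a // b)     # 3    integer division: rounds down
print(a % b)      # 1    remainder after integer division
print(-a // b, -a % b)   # -4 1   since -7 = (-4)*2 + 1
```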

We now turn to a discussion of the basic syntax in Python. That is, what are the
basic rules for how we are allowed to write code? First, let us discuss the types of
names you are allowed to give a variable:

Example J.5 You can give variables much more interesting names than just, say, a, b,
x or y. Here is an example of a perfectly well-functioning program:
1 ponies = 2
2 cookies = 4
3 pony_Cookie32 = ponies + cookies
4 print(pony_Cookie32)

We make the following remarks:

4. Python is case sensitive. This means that n and N are as different as n and m.
5. By "tradition", the name of a variable should never start with an upper case letter.
6. A variable cannot be given a numerical name such as 2 or 34 (in particular, this
means that the code 2 = a will crash, since Python is trying to create a variable
called 2 and assign it whatever value is stored in the variable a). However, numbers
can be part of the name of a variable, as long as the name starts with a letter or
an underscore '_'. In particular, pony_Cookie32 is perfectly fine. Note that certain
special symbols, such as $, # and %, can never be used in the name of a variable.
7. The name of a variable cannot contain a "space". That is, you cannot give a
variable the name pony Cookie32. Instead, you will have to use something like
pony_Cookie32 or just ponyCookie32.

8. You should not give a variable a name that already means something different. For
instance, do not give a variable the name print (however, Print is fine).
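Remarks 4 to 8 can be checked mechanically. The sketch below uses the built-in string method str.isidentifier and the standard modules keyword and builtins; the helper check_name is our own, hypothetical name. Reserved words such as for, while and if are among the names that "already mean something" and can never be used.

```python
import builtins
import keyword

def check_name(name):
    """Sketch: classify a candidate variable name following remarks 4 to 8."""
    if not name.isidentifier() or keyword.iskeyword(name):
        return "not allowed"                          # e.g. "2", "pony Cookie32", "price$", "for"
    if hasattr(builtins, name):
        return "allowed, but hides a built-in name"   # e.g. "print"
    return "fine"

for name in ["pony_Cookie32", "_x2", "2", "pony Cookie32", "price$", "for", "print", "Print"]:
    print(repr(name), "->", check_name(name))
```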

Here is another example, followed by more comments:

Example J.6 In Python, we need to be careful with how we place indentations. For
instance both of the following programs will crash.
1 a = 2 1 a = 2
2 b = 4 2 b = 4
3 c = a + b 3 c = a + b
4 print (c) 4 print (c)

9. In Python we need to be careful with how we place indentations. If Python does
not understand why we make some indentation, it will panic and crash.
10. Later in this chapter, we will see situations where indentations become a natural
part of the code. This is when we use, for instance, for-loops, while-loops, the
if-else structure, and when we define functions.
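Python reports such a crash as an IndentationError. To see the message without actually crashing, the sketch below (our own helper indentation_problem) asks Python's built-in compile to check a piece of code without running it:

```python
def indentation_problem(source):
    """Return (message, line number) if `source` has an indentation error, else None."""
    try:
        compile(source, "<example>", "exec")     # check the code without running it
    except IndentationError as err:
        return err.msg, err.lineno
    return None

bad = "a = 2\n    b = 4\nc = a + b\nprint(c)\n"   # line 2 is indented for no reason
good = "a = 2\nb = 4\nc = a + b\nprint(c)\n"

print(indentation_problem(bad))    # ('unexpected indent', 2)
print(indentation_problem(good))   # None
```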

Here is one last example, followed by even more comments:

Example J.7 In contrast to the situation with indentations, Python is much less
sensitive with respect to whether or not we skip a line. In particular, the following code
will run just fine:
1 a = 2; b = 4            # Two commands on the same line.
2 c = (a + b) * 3         # Here we use soft parentheses.
3
4 print(c)

11. Python does not care if you skip a line, or twenty.
12. You can place multiple commands on the same line, as long as they are separated
by a semicolon ';'.
13. Everything you write on a line following a hashtag # is ignored by Python. This
allows programmers to write comments throughout a code to explain it (usually to
themselves).
14. You can use "soft" parentheses '(' and ')' just as you would in a normal computation.
However, these cannot be replaced by hard parentheses '[', ']' or curly
parentheses '{', '}', as these all mean something different to Python.

15. It is a good thing for a program to crash. It is Python's way of letting you
know that whatever result the program would have given you would probably be
wrong (since it was written in a bad way). What is much more dangerous is if a
program is wrong because of, say, some mathematical mistake in a formula
that still makes sense to Python (say, if a plus is mistakenly replaced by a minus).
Then the code will run, and you will not be warned that something is wrong :-(

Exercise J.8 Will the following codes run? If not, why? If yes, what is the output?
(a)                            (b)
1 a = 2                        1 a = 2
2 b = 4                        2 b = 4
3 6 = a + b                    3 a = a + b
4 print(a+b)                   4 print(a)

(c)                            (d)
1 a = 2                        1 a = 2
2 b = 4                        2 b = 4
3 a = b                        3 c = a + b
4 print(b)                     4 print(c)

(e)                            (f)
1 a = 2; b=4; a = a + b        1 Ponies = 2
2                              2 Cookies = 4
3                              3 Rainbows = ponies + cookies
4 print(b)                     4 print(rainbows)

Exercise J.9 Consider the following codes. Can you express mathematically what
they compute?
(a)                            (b)
1 a = 1                        1 a = 1
2 a = a + 1/2                  2 a = 1/(1+a)
3 a = a + 1/4                  3 a = 1/(1+a)
4 a = a + 1/8                  4 a = 1/(1+a)
5 a = a + 1/16                 5 a = 1/(1+a)
6                              6
7 print(a)                     7 print(a)

Hint: You have already seen these objects on page ??.



A first look at how to efficiently repeat commands in Python


In Exercise J.9 above, you were asked to consider a code where an operation was repeated
multiple times. When programming, we sometimes want to ask the computer to repeat
some operation a million times or more. Do we really need a million lines of code to do
this? Of course not. We now explain one way this can be done, using a so-called for-loop.

Example J.10 (For-loop) The following codes do exactly the same thing when run:

1 a=0                          1 for n in range(0,4):
2 print(a)                     2     a = n
3 a=1                          3     print(a)
4 print(a)                     4 print("the end!")
5 a=2
6 print(a)
7 a=3
8 print(a)
9 print("the end!")

We explain the right-most code:


Line 1: Here, the command for n in range(0,4) means that something is to be
repeated once for each n between 0 and 3 (but not 4). What is to be repeated?
Well, the code on every line following this one that has an indentation¹.
Lines 2 and 3: These lines are indented. This tells Python that they are to be
run by the for-loop (it does not matter how much they are indented, as long as
they have the same indentation). Specifically, lines 2 and 3 will first be run with
n = 0, next with n = 1, and so on until n = 3. After this, the for-loop ends and
Python moves on to the first unindented line following it.
Line 4: Python prints "the end!" as output, and the program ends.
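To see exactly which values n runs through, one can turn a range into a list. The following sketch also shows the one- and three-argument forms of range, which are standard Python but are not used in these notes so far:

```python
print(list(range(0, 4)))      # [0, 1, 2, 3]  the stop value 4 is not included
print(list(range(4)))         # [0, 1, 2, 3]  a single argument means "start at 0"
print(list(range(1, 10, 3)))  # [1, 4, 7]     an optional third argument is the step

values = []
for n in range(0, 4):
    values.append(n)          # this indented line runs once for each n
print(values)                 # [0, 1, 2, 3]
```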

Exercise J.11 Will these codes run? If so, what are their outputs?
(a) (b)
1 for n in range(0,4): 1 for n in range(0,4):
2 print (n) 2 print (n)
3 print (" mississippi ") 3 print (" mississippi ")
4 print (" hello ") 4 print (" hello ")

Exercise J.12 Rewrite the code in exercise J.9 using for-loops.


¹ Note that it does not matter how large this indentation is; however, the indentation must be the
same for all lines in the loop.

Some words on datatypes in Python


Python stores variables in different ways depending on whether they contain text, num-
bers or, say, lists. We say that Python uses different datatypes.
For instance, Python will treat a number differently depending on whether or not
it is an integer. Indeed, writing a = 2 stores the number 2 as the datatype integer,
while writing a = 2/3 stores (an approximation of) the fraction 2/3 as the datatype float
(short for floating point number). The advantage of doing this is that it is much easier
for a computer to deal with numbers that do not have long decimal expansions, which
allows programs to run faster and use less memory. Moreover, as we will see in
Section 10.5, a computer cannot actually store an infinite number of decimals, which
inevitably leads to round-off errors and lack of precision when doing computations with
floating point numbers.
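The difference between the two datatypes can be seen in a couple of lines. In this sketch, integers are stored exactly (even very large ones), while a float only keeps about 16 significant digits:

```python
a = 2                    # an integer: stored exactly
b = 2/3                  # a float: only an approximation of 2/3 is stored
print(a, b)              # 2 0.6666666666666666

print(10**20 + 1)        # integers can be as large as we like: 100000000000000000001
print(1e20 + 1 == 1e20)  # True: a float near 10**20 has no room left for the "+ 1"
```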
To check the datatype of a number we can use the built-in function type.

Example J.13
1 a = 2              # Here, we store the value 2 as an integer
2 print(type(a))     # Here, we check the type and tell Python to print
3                    # the result on screen (it will print <class 'int'>).

Exercise J.14 (a) Check the type of the variable defined by b = 3/2.
(b) Check the type of the variable defined by c = 4/2.
In addition to integers and floating point numbers, we will encounter the following
datatypes in this chapter:

• String: A variable that contains a string of text is called a string. For instance,
a = "Hej" will create a variable containing the string of text "Hej".
• List: A variable that contains a list of other variables is called... well, a list.
For instance, a = [2, 3/2, "Hej"] creates a variable that contains an integer, a
floating point number and a string. A list can even be made up of other lists (more
on this on the following pages).
• Numpy array: A numpy array is a special type of list in which all entries have the
same datatype; in this chapter, they will contain either integers or floating point
numbers (but not both). Since it is more specialised than the datatype list, it can
have more "advanced" features (more on this on page D-25).
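The string and list datatypes can be inspected with type as well. A short sketch, with values taken from the bullet points above plus a list of lists of our own:

```python
a = "Hej"                  # a string
b = [2, 3/2, "Hej"]        # a list holding an integer, a float and a string
c = [[1, 2], [3, 4]]       # a list whose entries are themselves lists

print(type(a), type(b))    # <class 'str'> <class 'list'>
print(len(b), b[2])        # 3 Hej
print(c[1][0])             # 3: entry 0 of the list stored at position 1
```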

Remark J.15 As opposed to many other programming languages, Python is rather
forgiving when it comes to datatypes. For instance, when creating a new variable
containing some number, Python itself decides whether this is to be an integer or a float.

How to store and manipulate sequences using Python lists


One way of storing sequences in Python is by using lists. Here are some examples of how
to create and manipulate lists:

Example J.16
1 a = [0,1,2,3,4,5]     # list of integers
2 b = ["cheese", 88]    # list of a string and an icecream
3
4 print(a)              # prints the entire list
5 print(a[0])           # prints 1st entry
6 print(a[3])           # prints 4th entry
7 print(a[-1])          # prints last entry
8 print(a[-2])          # prints second to last entry
9
10 len(a)               # computes the length of the list
11 sum(a)               # sums the terms of the list
12 max(a)               # returns the largest entry of the list
13 min(a)               # returns the smallest entry of the list
14
15 a.append("oi")       # adds entry "oi" to the end of the list
16 a.pop(3)             # deletes the 4th entry (here 3), so a = [0,1,2,4,5,"oi"]
17 c = a+b              # creates the list [0,1,2,4,5,"oi","cheese",88]
18 d = b*2              # creates the list ["cheese",88,"cheese",88]
19 a = [0,1,2,3,4,5]    # resets a to the original list before slicing
20 e = a[2:5]           # creates the list [2,3,4] (called a "slice")
21 f = a[2:]            # creates the list [2,3,4,5]
22 g = a[:5]            # creates the list [0,1,2,3,4]
23 h = a[1:4:2]         # creates the list [1,3] (every second term)

The code is more or less explained by the comments, but we note the following:

Lines 1, 2: It is important that we use the square brackets '[' and ']' when
creating a list and that we separate the entries of the list by commas ','.
Lines 5, 6, 7, 8: Here we show how to display various entries of a list. Note that the
entries are counted from 0, so a[0] is the 1st entry. We also note that by writing,
say, a[3] = 10, we change an entry in the list.
Lines 11, 12, 13: These only work if the list only contains numbers.
Lines 15, 16, 17, 18: There are other ways of representing sequences in Python,
and these may have many of the above features. However, what makes lists stand
out is how easy they are to modify.
Lines 20, 21, 22, 23: This is called the slice notation for lists. The slice a[i:j] contains
the entries from index i up to, but not including, index j.
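The remark about changing entries, and the slice rule "start included, stop excluded", can be tried out as follows. This sketch also uses two standard features not mentioned above: pop returns the entry it removes, and a slice may have a negative step.

```python
a = [0, 1, 2, 3, 4, 5]
a[3] = 10                 # change the 4th entry
print(a)                  # [0, 1, 2, 10, 4, 5]

removed = a.pop(3)        # pop deletes the 4th entry and also returns it
print(removed, a)         # 10 [0, 1, 2, 4, 5]

print(a[1:4])             # [1, 2, 4]: index 1 is included, index 4 is not
print(a[::-1])            # [5, 4, 2, 1, 0]: a negative step walks backwards
```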
10.2. HOW TO COMPUTE AND VISUALIZE SEQUENCES AND SUMS D-9

10.2 How to compute and visualize sequences and sums


Some ways to compute sequences in Python
Suppose we want to compute a lot of entries of the sequence
$$\left(\frac{1}{2^k}\right)_{k=0}^{\infty} = \left(1, \frac{1}{2}, \frac{1}{4}, \dots\right).$$
First, we point out that it is useful to use for-loops (this example should be compared
to Example J.10).

Example J.17 (Computing sequences using a for-loop) The following two codes
do exactly the same thing when run:

1 print(1/2 ** 0)              1 for k in range(0,3):
2 print(1/2 ** 1)              2     print(1/2 ** k)
3 print(1/2 ** 2)

Another way of computing sequences is to use a list command called a list comprehension.
It has the benefit of both computing and storing a sequence of numbers in just a single
line of code.

Example J.18 (Computing sequences using a list) The following two codes do
exactly the same thing when run:

1 a = [1/2 ** 0, 1/2 ** 1, 1/2 ** 2]      1 a = [1/2 ** k for k in range(0,3)]
2 print(a)                                2 print(a)

Exercise J.19 Run and compare the output of the codes in the above examples.

Exercise J.20 A difference between the right-most codes in Examples J.17 and J.18
is that in the first, we do not store the values of the sequence anywhere. Here are two
suggestions for how to fix this:

1 a = 0                        1 a = []
2 for k in range(0,3):         2 for k in range(0,3):
3     a = 1/2 ** k             3     a.append(1/2 ** k)

Explain what is going on in each code. What information is stored at the end of each
program? Also, why do these not give any output? Can you fix this?

Some ways to compute partial sums of infinite series


On the previous page, we discussed how to compute and store values of a sequence. Let
us now consider how to compute the sum of the first, say, million terms of the infinite
series
$$\sum_{k=0}^{\infty} \frac{1}{2^k} = 1 + \frac{1}{2} + \frac{1}{4} + \cdots.$$
As above, we are going to discuss several ways that this can be done.
This first example should be compared to part (a) of exercise J.9.

Example J.21 (Computing sums using a for-loop) The following two codes both
compute the sum
$$1 + \frac{1}{2} + \frac{1}{4}.$$

1 a = 0                        1 a = 0
2 a = a + 1/2 ** 0             2 for k in range(0,3):
3 a = a + 1/2 ** 1             3     a = a + 1/2 ** k
4 a = a + 1/2 ** 2

To understand the code in the above example, recall Figure 2 on page D-2, and read
the explanation following Example J.10.
Here is how to compute the same sum using lists and list comprehensions.

Example J.22 (Computing a sum using lists) The following two codes do exactly
the same thing when run:

1 a = [1/2 ** 0, 1/2 ** 1, 1/2 ** 2]      1 a = [1/2 ** k for k in range(0,3)]
2 b = sum(a)                              2 b = sum(a)

Note that using the technique of this example, we could actually compute the sum
in just one line (!):

1 b = sum([1/2 ** k for k in range(0,3)])
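The one-line version is convenient for checking results against a formula. Since 1 + 1/2 + ... + 1/2^(n−1) = 2 − 1/2^(n−1) (the finite geometric sum), the following sketch, with n = 20 chosen for illustration, compares the two:

```python
n = 20
S = sum([1/2 ** k for k in range(0, n)])   # 1 + 1/2 + ... + 1/2**(n-1)
closed_form = 2 - 1/2 ** (n - 1)           # the finite geometric sum formula
print(S, closed_form, S == closed_form)
```

Here the comparison with == happens to be safe, because all these numbers are sums of powers of 2, which floats store exactly. In general, floats should be compared up to a tolerance (see Section 10.5).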

Exercise J.23 (a) Run the codes from the examples on this page. Why do they not
give any output? Fix this.
(b) Modify (some of) the code on this page so that you can compute the sum of the
first million terms or so. What result do you get?

How to visualise sequences and partial sums using for-loops


We now consider how to visualise the data we computed on the previous pages. This is
the first point where we will see that for-loops are more flexible, and give code that is
easier to read, than list comprehensions².

Example J.24 The following code is perhaps the simplest way to visualise a sequence
in Python. Here, we visualise the sequence
$$\left(\frac{1}{2^k}\right)_{k=0}^{9}.$$

1 import matplotlib.pyplot as plt
2
3 for k in range(0,10):
4     plt.plot(k, 1/2 ** k, "bo")
5
6 plt.show()

To the right, we see the output.

Here is how this code works:

Line 1: We import the "package" matplotlib.pyplot. At this point, we do not
really have to know what this means. But, in short, this package provides Python
with additional commands that help us visualise stuff. Adding as plt allows us to
refer to this package by the shorter name plt.
Lines 3, 4: Here we run a for-loop. At each iteration, it asks Python to plot a
blue filled circle (this is what the "bo" means) at the coordinate (k, 1/2^k).
Line 6: The command plt.show() tells Python that we are done with our figure,
and that it is time to display it. Any subsequent use of plt.plot will create a new
figure to be displayed separately.

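Let us now immediately modify the above code, so that we can plot partial sums of
the infinite series
$$\sum_{k=0}^{\infty} \frac{1}{2^k}.$$
That is, we want to plot the first few values we get when we compute
$$\sum_{k=0}^{0} \frac{1}{2^k} = 1, \qquad \sum_{k=0}^{1} \frac{1}{2^k} = 1 + \frac{1}{2}, \qquad \sum_{k=0}^{2} \frac{1}{2^k} = 1 + \frac{1}{2} + \frac{1}{4},$$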
² We show how to visualise data stored as lists on page D-13.

and so on. The nice thing is that we can achieve this by a relatively minor modification
of the above code. Notice how this code combines what we did in examples J.21 and
J.24.

Example J.25 The following code computes and visualises exactly the partial sums
expressed above.
1 import matplotlib.pyplot as plt
2
3 a = 0
4
5 for k in range(0,3):
6     a = a + 1/2 ** k
7     plt.plot(k, a, "bo")
8
9 plt.show()

Exercise J.26 Run this code, and verify by computing the first three partial sums
by hand that the plot is correct. What happens when you increase range(0,3) to, say,
range(0,100)?

Exercise J.27 Several commands can be used to change how the above figure looks.
Try inserting the following into the code. Most of them can be placed anywhere after the
import and before plt.show(); however, (j) should come before the first plt.plot(), and
(k) should come after the plotting commands but before plt.show(). What happens?
(a) Replace "bo" by "rx".
(b) plt.xlim(-3,15)
(c) plt.ylim(-5,2)
(d) plt.xlabel("There was")
(e) plt.ylabel("a graph")
(f) plt.title("that had a title")
(g) plt.xticks([-2,0,3,4,6.5,10])
(h) plt.yticks([0,1,1.5])
(i) plt.grid(True)
(j) plt.figure(figsize=(4,3))
(k) plt.savefig("myfigure.jpg")

Remark: By choosing the extension ".jpg" in (k), you tell Python to save your figure
as a jpg-file. Only a limited number of file formats are supported. For more on how
to plot in Python, check out the official tutorial
https://matplotlib.org/users/pyplot_tutorial.html which has a ton of information
and examples.

Visualising sequences and partial sums stored as lists


We now give an example where we visualize a sequence stored as a list.

Example J.28 We now use lists to create and plot the sequence
$$\left(\frac{1}{2^n}\right)_{n=0}^{9}.$$

1 import matplotlib.pyplot as plt
2
3 nValues = [n for n in range(0,10)]
4 a = [2**(-n) for n in nValues]
5
6 plt.plot(nValues, a, "bo")
7 plt.show()

To the right, we see the output. Note that it matches exactly that of Example J.24.

While this example is quite similar to Example J.24, let us explain what happens in
lines 3, 4 and 6 a bit more carefully:

Lines 3, 4: Here we create two lists. The first is nValues = [0, 1, 2, ..., 9]
which will give us the n-values to be used in the plot (these will be placed on the
x-axis). The second list, a, contains the sequence we are trying to plot. These
will play the part of the y-values in our plot. Note that in the for command, we
can replace the range command by any other list. This is a particular, and rather
elegant, feature of Python.
Line 6: Here, we create the plot itself. In the command plt.plot(nValues,a,"bo"),
we use the two lists created in lines 3 and 4. The first will be interpreted as
the x-coordinates to be used, and the second as their corresponding y-coordinates.
It is therefore crucial that these two lists have the same length (if not,
Python will crash). This explains why it is a good idea to let the for in line 4
be defined in terms of n running through nValues, as this guarantees that the
lists nValues and a have the same length.

We now give an example of how to visualise partial sums using lists.

Example J.29 As in Example J.25, we consider the infinite series
$$\sum_{k=0}^{\infty} \frac{1}{2^k}.$$

The following code should be compared to that of Example J.28, above. It will compute
the first 20 (!) partial sums of the series. Here, we use the name "indices" instead of
"nValues", and vary the letter used for the index from line to line to emphasise that the
choice of letter really does not matter.

1 import matplotlib.pyplot as plt
2
3 indices = [i for i in range(0,20)]
4 a = [1/2 ** k for k in indices]
5
6 S = [sum(a[0:n+1]) for n in indices]
7
8 plt.plot(indices, S, "bo")
9 plt.show()

To understand this code, you should first read the explanation for Example J.28.
The difference is what happens in line 6:

Line 6: Here, we use the sum command to sum slices (see Example J.16) of the
list. Here, you should keep in mind that the slice, say, a[0:19] gives you the entries
a[0], a[1], ..., a[18]. In particular, there is no point in computing a[0:0] as
this slice contains no terms. More explicitly:
$$\begin{aligned}
n = 0 &\implies \texttt{sum(a[0:0+1])} = \texttt{sum(a[0:1])} = \sum_{k=0}^{0} \frac{1}{2^k},\\
n = 1 &\implies \texttt{sum(a[0:1+1])} = \texttt{sum(a[0:2])} = \sum_{k=0}^{1} \frac{1}{2^k},\\
&\;\;\vdots\\
n = 19 &\implies \texttt{sum(a[0:19+1])} = \texttt{sum(a[0:20])} = \sum_{k=0}^{19} \frac{1}{2^k}.
\end{aligned}$$
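Line 6 recomputes every partial sum from scratch, so for N terms it performs roughly N²/2 additions. A possible alternative, not used in these notes, is itertools.accumulate from the standard library, which builds each partial sum from the previous one. A sketch comparing the two:

```python
from itertools import accumulate

indices = [i for i in range(0, 20)]
a = [1/2 ** k for k in indices]

S_slices = [sum(a[0:n+1]) for n in indices]   # as in line 6: every slice is summed from scratch
S_running = list(accumulate(a))               # each partial sum = previous partial sum + next term

print(S_slices == S_running)
print(S_running[:3])                          # [1.0, 1.5, 1.75]
```

For 20 terms the difference is irrelevant, but for a million terms the running-sum version needs about a million additions instead of about half a million million.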

Exercise J.30 Implement the code in Example J.29. Verify that it gives the same
output as Example J.25 when the range in the latter example is suitably adjusted.

A slightly more advanced example: the Fibonacci sequence


Above we have considered examples of sequences and partial sums that can be computed
both using "pure" for-loops or for-loops inside of lists (i.e., list comprehensions). Let
us now consider a situation where we need to adapt these methods slightly, and use the
best of both worlds.
Namely, we consider the sequence

(1,1,2,3,5,8,13, . . .).

These are the Fibonacci numbers. In general, the n'th Fibonacci number is given by
continuing this list using the rule
$$a_n = a_{n-1} + a_{n-2}.$$
Note that if we intend for the sequence to start at $a_0$, then this rule cannot be used to
compute $a_0$ or $a_1$, since they would then depend on $a_{-2}$ and $a_{-1}$. Therefore, it is more
correct to say that the Fibonacci numbers are created from the set of rules:
$$\begin{cases} a_0 = 1, \\ a_1 = 1, \\ a_n = a_{n-1} + a_{n-2} & \text{for } n \geq 2. \end{cases}$$

So, how do we create a list in Python containing, say, the first 20 Fibonacci numbers?
Well, this cannot be done by using a command of the form

[??? for n in range(0,20)],

since there is no way to let the n'th number depend on the previous numbers in the
sequence³. What to do? Here we use a "pure" for-loop, where we store the
Fibonacci numbers, as they are computed, in a list, so that we can use previous Fibonacci
numbers to compute the next ones.

Example J.31 The following code can be used to compute Fibonacci numbers.

1 a = [1,1]
2
3 for n in range(2,20):
4     newterm = a[n-1] + a[n-2]
5     a.append(newterm)
6
7 for n in range(0,20):
8     print("The", n+1, "'th Fibonacci number is", a[n])
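Since each new Fibonacci number only needs the two previous ones, one can also avoid indexing into the list. The following sketch is our own variant, not the method of Example J.31. It uses Python's simultaneous assignment previous, current = current, previous + current, in which both right-hand sides are computed from the old values before anything is assigned:

```python
def fibonacci_numbers(count):
    """Sketch: the first `count` Fibonacci numbers (1, 1, 2, ...), keeping only the last two."""
    numbers = []
    previous, current = 0, 1     # 0 is only a helper value so that the first two terms come out as 1, 1
    for _ in range(count):
        numbers.append(current)
        previous, current = current, previous + current   # both right-hand sides use the old values
    return numbers

print(fibonacci_numbers(10))     # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```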

³ Well, strictly speaking, there is, but there is no "natural" way to do this in Python.

Exercise J.32 Implement the code from the above example. Use, say, the wiki-page
for the Fibonacci numbers to verify that it gives the correct output.
Exercise J.33 Use the method of the above examples to create a list containing a
few partial sums of the infinite series $\sum_{k=0}^{\infty} 1/2^k$.

Remark J.34 ("pure" for-loops versus list comprehensions) Above, we saw
an example of a situation where the flexibility of "pure" for-loops made them easier
to use than the more rigid list comprehensions. Does this mean list comprehensions
are less useful? Well, sort of. In addition, "pure" for-loops are available in most
programming languages, while list comprehensions are found in far fewer of them.
This means that if you want to be able to adapt to other programming languages,
you should use list comprehensions sparingly.

Another feature of Python is that you can measure how long it takes for a program
to run. In this way, you can time how long it takes Python to run a limited number of
iterations of some for-loop, and then you can make an informed guesstimate of how long
it will take to run, say, a million iterations. Here is how this is done:

Remark J.35 (how to time a computation)

1 import time
2 time_start = time.process_time()
3 # your code
4 time_elapsed = time.process_time() - time_start
5 print(time_elapsed)

The printed value is the processor time used by your code, in seconds.

Exercise J.36 (a) What does the code below compute? (b) Do the same as in (a),
except do it by first getting Python to create a suitable sequence a, and then compute
its sum using the command sum(a). (c) Use the code from the above remark to
compute which method is the fastest.

1 S = 0
2 for n in range(1,100000):
3     S = S + 1/n

Remark J.37 Python is a useful, but fairly slow, programming language. What you
are supposed to observe in the previous exercise is that the built-in commands in Python
are actually written in a much faster language (in the standard implementation of
Python, this is C). The moral is: if speed is important, use the built-in functions as
much as possible.
10.3. SOME ADDITIONAL CONTROL STATEMENTS IN PYTHON D-17

10.3 Some additional control statements in Python


In the previous section, we considered for-loops. Here, we consider some additional
control statements in Python.
While-loops
The while-loop is a close cousin of the for-loop. Indeed, in most (if not all) cases, what
you can do with one of them, you can also do with the other. The point is that in most
situations, one of them will usually be much easier to use than the other.
Example J.38 (partial sums with a while-loop) The following code computes the
sum
$$\sum_{n=1}^{10} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{10}.$$

1 S = 0; n = 1
2 while n <= 10:
3     S = S + 1/n
4     n = n+1
5 print(S)

We explain the code:


Line 1: We initialise the variables S and n. (Recall that statements separated by a semicolon are treated as if they were written on separate lines.)
Line 2: We start the while-loop. By writing while n <= 10: we tell Python that the code inside of the while-loop (that is, the indented lines) should be repeated until the variable n is no longer less than or equal to 10.
Line 3: The first time the while-loop is executed, we have S=0 and n=1. This means that the line S = S + 1/n assigns S the value 1.
Line 4: We increase the value of n by 1. (This is something we did not have to do in the for-loop.) Python now takes this new value of n, jumps back up to line 2, and checks if it is less than or equal to 10. If it is, the while-loop runs again; if not, Python jumps down to the first non-indented line (line 5).
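To see the "close cousin" relationship concretely, here is a small sketch that computes the same sum with both kinds of loop; the variable names `S_for` and `S_while` are just illustrative.

```python
# Sum 1 + 1/2 + ... + 1/10 with a for-loop
S_for = 0
for n in range(1, 11):
    S_for = S_for + 1/n

# The same sum with a while-loop: here we must update n ourselves
S_while = 0
n = 1
while n <= 10:
    S_while = S_while + 1/n
    n = n + 1

print(S_for, S_while)
print(S_for == S_while)   # True: both loops add the same terms in the same order
```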

Remark J.39 (comparison operators) In the above example, we see the symbol <=. This is a comparison operator that checks if something is less than or equal to something else (note that the order of the symbols matters, since =< is not valid Python). When doing while-loops, the symbols >=, == and != are also useful, where the latter two check if two variables are equal or not equal, respectively (note that round-off errors usually make it unreliable to check whether two numbers that are not integers are equal; more on this in Section 10.5).
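A short sketch of the comparison operators, together with one common way to compare floats despite round-off: the function `math.isclose` from Python's standard `math` module, which is not otherwise used in these notes. It treats two numbers as equal if they agree up to a small relative tolerance.

```python
import math

a = 3
b = 5
print(a <= b, a >= b, a == b, a != b)   # True False False True

# Round-off error: 0.1 + 0.2 is not stored as exactly the same float as 0.3
x = 0.1 + 0.2
print(x == 0.3)                          # False
print(math.isclose(x, 0.3))              # True: equal up to a small relative tolerance
```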

Exercise J.40 Use a while-loop to check how large n has to be for the partial sum $S_n = \sum_{k=0}^{n} 1/2^k$ to be closer to 2 than 1/10000.

If-else statements
Alongside for- and while-loops, the if-else statement is one of the most important tools in programming.
Here is a basic example:

Example J.41 (if-statements in Python)


1 a = 10
2
3 if a >= 3:
4     print("a is a BIG number")
5 else:
6     print("a is a tiny number")

Here is an explanation of the code:

Lines 3 to 6: Like the while-loop, an if-statement starts out by checking if some condition is true or not. Here, if the condition a >= 3 on line 3 is true, line 4 is run. If the condition is false, then line 6 is run instead.

If you want to add more conditions, you can do this by adding as many elif commands as you want (notice how we are allowed to use the keyword and to combine two inequalities in our condition; it is also possible to use the keyword or):

Example J.42
a = 10

if a >= 3:
    print("a is a BIG number")
elif a > -3 and a < 3:
    print("a is a tiny number")
elif a <= -3:
    print("a is a BIG but negative number")
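As a sketch, the same chain of conditions can be tried on several values at once by placing it inside a for-loop. Here the list `labels` is a hypothetical addition for collecting the results, and the condition `-3 < a < 3` uses Python's chained comparisons, which mean the same as `a > -3 and a < 3`. Since the three cases cover every number, the last elif can be replaced by a plain else.

```python
labels = []
for a in [10, 0.5, -3, -7]:
    if a >= 3:
        labels.append("BIG")
    elif -3 < a < 3:              # same as: a > -3 and a < 3
        labels.append("tiny")
    else:                         # the only case left is a <= -3
        labels.append("BIG but negative")

print(labels)   # ['BIG', 'tiny', 'BIG but negative', 'BIG but negative']
```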

Exercise J.43 Use an if-type statement to modify the code from Example J.31 so
that 1’th, 2’th and 3’th are replaced by 1’st, 2’nd and 3’rd, respectively.

The break command


Another keyword we can combine with if-statements is break. When Python reaches break, it immediately stops the inner-most for- or while-loop containing it and continues the program on the first line below that loop. Since we will rarely need it, we refer the interested reader to Google for further information.
Here is an example where we combine the if, break and for keywords to mimic a while-loop.

Example J.44 The following code does exactly the same as the one in Example J.38
1 S = 0
2 for n in range(1,1000):
3     if n > 10:
4         break
5     S = S + 1/n
6 print(S)

Let us briefly explain this code:

Line 1: We initialise an integer S (that we will use to keep track of partial sums).
Line 2: Here we start out the for-loop. We choose range(1,1000) large to ensure that the break command, further down, will have time to kick in.
Line 3: We are now inside of the for-loop. Here, we ask the if-command to check if n > 10. If this is not the case, the code will skip the indented line below it and continue on line 5. If n > 10 is true, then the indented code on line 4 will be run.
Line 4: Here, the break command is activated. It means that the for-loop is stopped. The code will continue on line 6.
Line 5: Here, the value of S is updated.

Remark J.45 Sometimes we need to put for-loops inside of for-loops (when computing
matrices, this may be the case). When this happens, the break command will only stop
the "inner-most" loop.
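A small sketch of this behaviour, with illustrative names: the break inside the inner loop ends only that loop, and the outer loop carries on with its next value of i.

```python
pairs = []
for i in range(3):
    for j in range(3):
        if j > i:
            break              # stops only the inner loop over j
        pairs.append((i, j))
    # after the break, the outer loop simply moves on to the next i

print(pairs)   # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```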

Exercise J.46 Consider the list myList = [n/(1+n**2) for n in range(1,10**6)].

(a) Write code that combines a for-loop with the break command to find the position (index) of the first entry in the list that is smaller than $10^{-4}$.
(b) Do the same as in (a), but with a while-loop and no break command.

10.4 Functions in Python


In this section, we will mostly limit ourselves to discussing mathematical functions in
Python.

How to define a function in Python


Here is a basic example showing how to define a function in Python.

Example J.47 (Defining functions in Python) In the following code we define $f(x) = x^2 - 5x + 4$ in Python and compute the value $f(3)$.
1 def f(x):
2     y = x**2 - 5*x + 4
3     return y
4
5 z = f(3)
6 print(z)

Here is what happens:


Line 1: We write def f(x): to let Python know that we are now about to define a function that has the name f and that takes one variable, given the name x, as input. It is absolutely critical to understand that, at this point, Python will not execute this code. Instead, we are just telling Python that if the function f is used at any subsequent point in the code, then this is the code that should be executed.
Line 2: The variable y is assigned the value x**2 - 5*x + 4.
Line 3: The value of y is returned as the output of f.
Lines 5 and 6: In line 5, we ask Python to apply the function f to the value 3. This means that Python will run the body of f (lines 2 and 3) with x = 3. When line 3 is executed, the returned value is stored in the variable z. In line 6, this value is printed on screen.
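To see for yourself that the body of a function is not run when it is defined, one option is to add a print statement inside it, as in this sketch (the extra print is only there for illustration):

```python
def f(x):
    print("Now running the body of f with x =", x)
    y = x**2 - 5*x + 4
    return y

print("f has been defined, but its body has not been run yet")
z = f(3)   # only now is the body of f executed
print(z)   # prints -2
```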
Exercise J.48 (a) Use the if-control statement to implement the absolute value
function. Use it on a few values to check that it works.
(b) Implement the absolute value function without using the if-control statement.
Remark: It is not really necessary to program the absolute value function by hand since
it already exists in Python as the function abs(x).
Exercise J.49 Write the code for a function that will take a list of numbers as input,
and return their product as output.
Remark: This will be analogous to the function sum(), which takes a list of numbers as input and returns their sum as output.

Warning: The importance of local namespaces


We face some dangers when defining functions in Python. First, we need to be aware that the variables created inside a function definition are local and cannot be accessed outside of the definition. Since this is a subtle point that leads to a lot of bad code, we are going to spend a little bit of time discussing it.
Here is a first example:

Example J.50 The following code makes no sense and will crash.
1 x = 3
2 def f(x):
3     y = x**2 - 5*x + 4
4     return y
5 print(y)

Here is what happens:

Line 1: We define a variable x and set its value to 3.
Lines 2, 3, 4: We define the function f. However, this code is never run.
Line 5: Here, Python crashes (with a NameError) since no variable called y was ever created.

While the above explanation may seem to make perfect sense, it is actually sort of
misleading. To understand why, let us take a look at a second example.

Example J.51 The following code makes no sense to Python and will crash.
1 x = 3
2 def f(x):
3     y = x**2 - 5*x + 4
4     return y
5 f(x)
6 print(y)

Take a moment and think about what happens when this code runs. Hopefully, it
will confuse you since the error is rather subtle. Here is the explanation:

Lines 1 – 4: This is the same as the code in the previous example. For clarity,
we reiterate that the code in lines 2, 3, 4 is not run at this stage.
Lines 5, 6: In line 5, finally, the code in lines 2, 3, 4 is run. Since we put x = 3 in
line 1, it is run with the value x = 3. In line 3, y gets the value −2, and in line 4,
this value is returned. The program moves on to line 6, where it tries to print the
value of the variable y. But this variable has never been created, and so Python
crashes.

Wait, what? Surely, this makes no sense since the variable y was created in line 3 and given the value −2. Well, no: the point is that all variables created inside the function (lines 2, 3, 4) are local and only exist while these lines are run. Once line 4 has been run, all of these local variables are deleted.
Maybe things become clearer if we rewrite the explanation of lines 5, 6 as follows:

Lines 5, 6: In line 5, finally, the code in lines 2, 3, 4 is run. Since we put x = 3 in line 1, the command f(x) is read by Python as f(3). Now, the variable x in line 2 is not the same as the variable x from line 1; instead, we should think of it as having some other name, say, local_x. Since the command f(3) activated line 2, the first thing that happens is that Python sets local_x = 3. Next, in line 3, the variable is not really y, it is local_y, which gets the value −2. Finally, in line 4, this value is returned. This means that f(x) in line 5 now represents the value −2. However, since this value is all alone on this line, nothing is done to it, and it is simply forgotten (we could have written, say, z = f(x) to store it). The program merrily continues on line 6, where it tries to print the value of the variable y. But this variable has never been created, and so Python crashes. (Note that as soon as f has returned its value in line 4, the variables local_x and local_y are deleted from memory.)
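The following sketch shows the fix suggested above (storing the returned value in z) and, for comparison, what happens when we still try to print y. The `try`/`except` construction, which is not otherwise used in these notes, catches the NameError so that the program can carry on; the flag `caught` is an illustrative addition.

```python
x = 3

def f(x):
    y = x**2 - 5*x + 4   # y is a local variable of f
    return y

z = f(x)                 # store the returned value
print(z)                 # -2

caught = False
try:
    print(y)             # y only existed while f was running
except NameError as error:
    caught = True
    print("NameError:", error)
```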

In technical terms, it is said that the code inside of the definition of a function
has its own local namespace which cannot be called upon in the code outside of the
function. This is done to "protect" the program from the code inside of the function.
To understand why this is necessary, let us look at the following example.

Example J.52 Thanks to functions having their own namespace, we can be sure that the following code works:
a = [1, 2, 3]; b = [4, 5, 6]
c = sum(a)
print(b)

The point is that we have no idea how the function sum(a) is coded. If the local namespace were not kept separate from the (global) namespace, we could be so unlucky that a variable called b is used inside of the code for the function. This would then overwrite the contents of the variable b that we created before running sum(a). The point of having a local namespace is to avoid this and to keep our variables safe from harm. Yay!
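Here is a sketch of this protection in action. The function `my_sum` is a hypothetical stand-in for a built-in function whose code we cannot see; it deliberately uses a local variable called b.

```python
def my_sum(numbers):
    """A home-made version of sum() that happens to use a variable called b."""
    b = 0
    for number in numbers:
        b = b + number
    return b

a = [1, 2, 3]; b = [4, 5, 6]
c = my_sum(a)
print(c)   # 6
print(b)   # [4, 5, 6]: the global b is untouched by the local b inside my_sum
```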

How to plot functions in Python using lists


Here, we demonstrate the first of two methods for plotting functions in Python. This method is very similar to what we did when we plotted sequences stored as lists (recall Example J.28).

Example J.53 (plotting with datatype list) The following code plots the graph of $f(x) = x^2 + 2x + 3$ over the interval [0,1].
1 import matplotlib.pyplot as plt
2
3 def f(x):
4     y = x**2 + 2*x + 3
5     return y
6
7 X = [n/10 for n in range(0,11)]
8 Y = [f(x) for x in X]
9
10 plt.plot(X,Y)

Fig. 3. To the left, we see the output of the above code. To the right, we see the
output if we replace the line plt.plot(X,Y) with plt.plot(X,Y,"bo").

Here, we explain the code.

Lines 1 – 5: Here, we import the package matplotlib.pyplot and define the


function f.
Line 7: Here, we create the list [0, 1/10, 2/10, ..., 9/10, 1]. This list will
play the part of the x-axis.
Line 8: Here, we create the list [f(0), f(1/10), f(2/10), ..., f(9/10), f(1)].
This list provides the y-values given by y = f (x) for each x in the list X. Notice
how we can replace range(0,11) in the for-statement with other lists.

Line 10: The command plt.plot(X,Y) is very similar to the one appearing when we plotted the sequence in Example J.28. It takes two lists, in this case $X = [x_0, x_1, \ldots]$ and $Y = [y_0, y_1, \ldots]$, as input. It then draws a straight (blue) line from the point $(x_0, y_0)$ to $(x_1, y_1)$, then from $(x_1, y_1)$ to $(x_2, y_2)$, and so on until it runs out of points. If the lists X and Y are not of equal length, Python stops with an error. If we add the option "bo" to the command plt.plot(X,Y), we tell Python not to draw lines, and instead just put blue dots, as we have done before (and as is shown above)!

Fig. 4. We are never really plotting the graph of a function f based on all its values
on an interval [a,b]. In reality, we only check the y-values for certain x-values and
then ask Python to connect the dots.
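To get a feeling for how much is lost by connecting the dots, the following sketch compares, for the function and points of Example J.53, the straight line between two neighbouring points with the true value of f halfway between them. For this quadratic f, a short calculation shows that the gap at each midpoint is exactly $h^2/4$, where $h = 1/10$ is the spacing between the x-values, so the code should report about 0.0025 (up to round-off).

```python
def f(x):
    return x**2 + 2*x + 3

X = [n/10 for n in range(0, 11)]

# For each pair of neighbouring points, compare the straight line Python draws
# (its value at the midpoint is the average of the endpoint values) with the true f.
gaps = []
for x_left, x_right in zip(X[:-1], X[1:]):
    x_mid = (x_left + x_right) / 2
    line_value = (f(x_left) + f(x_right)) / 2
    gaps.append(line_value - f(x_mid))

print(max(gaps))   # about 0.0025
```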

Exercise J.54 Plot the functions you implemented in exercise J.48 on [−2,2].
Exercise J.55 Consider the following code.

import matplotlib.pyplot as plt

N = 10

def a(k):
    return 1/2**k

def S(a,n):
    return sum([a(k) for k in range(0,n)])

nValues = [n for n in range(0,N)]
Y = [S(a,n) for n in nValues]

plt.plot(nValues, Y, "bo")

(a) Explain in mathematical terms what this code does.
(b) The symbol a takes more than one role in the above code. Explain what roles these are.
(c) Make the code easier to read by giving a different names where it plays different roles.

Plotting functions using numpy arrays

We now give an alternative, and rather elegant, way to plot functions. The price to
pay is that we need to introduce a new datatype commonly referred to as numpy arrays
(strictly speaking, Python calls them numpy.ndarray’s, but we will ignore this).
In a sense, numpy arrays are just like lists, but with the restriction that all their entries must have the same type; for us, they will always consist of numbers (whereas a single list may mix, say, numbers and strings of text). But this means that it makes sense for numpy arrays to have additional features related to numbers.

Example J.56 (features of the datatype numpy array) The following code is
meant to illustrate some of the things we can do with numpy arrays.
1 import numpy as np
2
3 def f(x):
4     return x**2
5
6 a = [1,2,3,4]; b = [1,2,5,3]
7
8 A = np.array(a)          # Here we convert the lists a and b into numpy
9 B = np.array(b)          # arrays. Usually, arrays are expressed like lists
10                         # but without commas. E.g., now A = [1 2 3 4].
11
12 C = A*B                  # This results in C = [1 4 15 12].
13 D = A+B                  # This results in D = [2 4 8 7].
14 E = A/B                  # This results in ... well, it divides the two arrays,
15                          # entry by entry :-)
16
17 F = np.zeros((3))        # Creates the array [0 0 0]
18 G = np.ones((4))         # Creates the array [1 1 1 1]
19 H = np.arange(4)         # Creates the array [0 1 2 3]
20 I = np.linspace(0,1,5)   # Creates the array [0 0.25 0.5 0.75 1]
21
22 J = I.tolist()           # Converts the numpy array I to a list J.
23 K = f(A)                 # This results in K = [1 4 9 16]

Let us make the following comments with respect to the above example:

Line 1: Before we can use numpy arrays we have to import the package numpy.
Lines 8, 9, 22: We can always convert a list (with only numerical entries) to a
numpy array, and vice versa. When creating a numpy array, normally we would
just write, say, A = np.array([1,2,3,4]).

Line 20: The command np.linspace(a,b,N) creates an array with N equally spaced points from the interval [a,b], starting and ending at the left and right endpoints, respectively. This command is very useful when we want to plot functions!
Line 23: When applying a mathematical function f to the numpy array A = [1 2 3 4], what happens is that we obtain the array K = f(A) = [f(1) f(2) f(3) f(4)]. This is also very useful when we want to plot functions!
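Since lists and numpy arrays look so similar, it is worth seeing, in a small sketch, that the same operators mean different things for the two datatypes:

```python
import numpy as np

a = [1, 2, 3]
A = np.array(a)

print(a * 2)    # list: [1, 2, 3, 1, 2, 3] (the list is repeated)
print(A * 2)    # array: [2 4 6] (each entry is doubled)
print(a + a)    # list: [1, 2, 3, 1, 2, 3] (the lists are joined)
print(A + A)    # array: [2 4 6] (entry-by-entry addition)
```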

Here is how to plot functions using numpy arrays.

Example J.57 The following code gives exactly the same output as the code in Example
J.53.
import matplotlib.pyplot as plt
import numpy as np

def f(x):
    y = x**2 + 2*x + 3
    return y

X = np.linspace(0,1,11)
Y = f(X)

plt.plot(X,Y)
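One way to convince yourself of the claim above is to compute the y-values both ways and compare them, as in this sketch. It uses `np.allclose`, which checks whether two sequences agree entry by entry up to a small tolerance; a tolerance is used because `np.linspace` and `n/10` may produce x-values that differ in their last binary digits.

```python
import numpy as np

def f(x):
    return x**2 + 2*x + 3

# Example J.53: lists and a list comprehension
X_list = [n/10 for n in range(0, 11)]
Y_list = [f(x) for x in X_list]

# Example J.57: numpy arrays, with f applied to the whole array at once
X_array = np.linspace(0, 1, 11)
Y_array = f(X_array)

print(np.allclose(X_list, X_array), np.allclose(Y_list, Y_array))   # True True
```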

Exercise J.58 Try to use numpy arrays to plot the functions from exercise J.48. For
one of them, the code will not work. Can you imagine why?

Predefined functions in Python


We now describe some of the predefined functions that come with Python. Usually, it is a good idea to use these whenever you can, since they are optimised to run fast using techniques way outside the scope of these lecture notes.

Remark J.59 (Built in functions in Python) Here is a list of some of the functions
that are built into Python.
• abs(x) – computes the absolute value of x.
• complex(a,b) – returns the complex number a + ib.
• float(x) – converts an integer to a float.
• int(x) – converts a float to an integer by cutting off the decimals (for positive x this rounds down, but, e.g., int(-2.7) is -2).
• round(a,n) – rounds the floating point number a to n digits after the decimal point.
• type(x) – returns the datatype of the variable x.

For the full list, see, e.g., https://docs.python.org/3/library/functions.html.
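A quick sketch trying out some of these functions; the comment on each line states what it should print.

```python
print(abs(-4))              # 4
print(complex(1, 2))        # (1+2j)
print(float(3))             # 3.0
print(int(2.7), int(-2.7))  # 2 -2: int() cuts off the decimals
print(round(3.14159, 2))    # 3.14
print(type(3), type(3.0))   # <class 'int'> <class 'float'>
```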

Additional functions can be imported from packages. For instance, here is a list of
some functions from the numpy (numerical Python) package:

Remark J.60 (Functions in the numpy package) To use these functions, you need
to start your code with import numpy as np. You now have access to the following
functions:
• np.exp(x) – the exponential function
• np.log(x) – the natural logarithm
• np.log2(x) – the logarithm with base 2
• np.log10(x) – the logarithm with base 10
• np.sin(x) – the sine function (radians)
• np.cos(x) – the cosine function (radians)
• np.tan(x) – the tangent function (radians)
• np.arcsin(x) – the arcsine
• np.arccos(x) – the arccosine
• np.arctan(x) – the arctangent

Here are some other useful functions:

• np.absolute(x) – gives the absolute value of x


• np.deg2rad(x) – converts degrees into radians
• np.rad2deg(x) – converts radians into degrees

• np.sum(x) – returns the sum of the elements of a list/array


• np.prod(x)– returns the product of the elements of a list/array
• np.imag(z) – returns the imaginary part of the complex number z.
• np.real(z) – returns the real part of the complex number z.
• np.angle(z) – returns the angle of the complex number z (radians).
• np.conj(z) – returns the complex conjugate of the complex number z.

Here are some constants:

• np.pi – gives π
• np.e – gives e
For more, see https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html

Here is a brief example of how to use the numpy package.

Example J.61
import numpy as np
y = np.sin(np.pi)   # Computes sin(pi).
print(y)            # Prints the result on screen.
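It may come as a surprise that this example does not print 0. The sketch below shows why: `np.pi` is only the floating point number closest to π, so `np.sin(np.pi)` is a tiny number instead of exactly 0. The function `np.isclose`, also from numpy, is one way to test for equality up to round-off. (This is a first taste of the round-off errors discussed in Section 10.5.)

```python
import numpy as np

y = np.sin(np.pi)
print(y)                  # a tiny number (about 1.2e-16), not 0
print(y == 0)             # False: np.pi is only an approximation of pi
print(np.isclose(y, 0))   # True: y is zero up to round-off
```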

There are also other packages with even more functions. We can mention the math and scipy (scientific Python) packages. However, since we are quite happy with what we have mentioned above, we will skip these. (In particular, for our purposes, the numpy package essentially makes the math package obsolete, especially when working with numpy arrays.)

10.5 How numbers are represented in Python


We end this appendix by examining how numbers are represented in Python and some
peculiar consequences of this.

Example J.62 At the start of Chapter ??, we considered the infinite series
$$\sum_{k=0}^{\infty} \frac{1}{2^k} = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots.$$
Letting $S_n$ denote the n’th partial sum of this infinite series, the point of exercise J.?? was for you to notice that
$$2 - S_n = \frac{1}{2^n}.$$
In particular, for all $n \in \mathbb{N}$, we have $S_n \neq 2$.
But here is the thing: if we ask Python to compute, say, $S_{100}$, then Python will give the output $S_{100} = 2$. But this is wrong! While $S_{100}$ is close to 2, it is not equal to 2. Yikes!

Exercise J.63 Does Python really mean that $S_{100} = 2$, or is there something else going on? One way to double check this is to ask Python to compute $1/(2 - S_{100})$. What happens?

So, what is going on here? Well, the point is that since there is a limit to how much memory Python is willing to use to represent a number, there is also a limit to how precisely Python will represent that number. This inevitably leads to so-called round-off errors when working with computers, and it is vital that we have some understanding of why they occur.
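As a sketch of how this affects the series from Example J.62, the loop below adds up the terms $1/2^n$ until the computed partial sum is exactly equal to 2. With exact real numbers this loop would never stop, since $S_n = 2 - 1/2^n$; with 64 bit floats (assuming the usual IEEE 754 rounding used on essentially all current machines) it stops after finitely many steps.

```python
S = 0
n = 0
while True:
    S = S + 1/2**n        # S now equals the computed partial sum S_n
    if S == 2:
        break
    n = n + 1

print(n)   # 53: the first n for which the computed S_n is exactly 2
```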

How integers are represented in Python


The first question we ask is the following: what exactly happens when we run, say:
myNumber = 82

If the code is run on an 8 bit computer, the following happens:

Fig. 5. On the most fundamental level, the memory of the computer is described in
terms of bits and bytes. A bit can be either 0 or 1, while a byte is a string of 8 bits.
Modern computers normally set aside 64 bits for each integer.

That is, Python sets aside a certain amount of memory (depending on the type of computer you run it on), and uses it to store your integer. By giving it a name such as myNumber, we know how to access this part of the memory, and by giving it a datatype, Python knows how to interpret the string of 0’s and 1’s located there. Integers are usually stored using variables of the datatype int (short for integer).
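One caveat, stated here as a side remark: Python's own int datatype is more flexible than the fixed-size integers described in this section. It automatically uses more memory when a number gets large, so Python integers never overflow, while fixed-size (for example 64 bit) integers are what the hardware, and packages such as numpy, work with. A small sketch:

```python
big = 2**100
print(big)                  # 1267650600228229401496703205376, computed exactly
print(big.bit_length())     # 101 bits: more than a 64 bit integer can hold

largest_64_bit = 2**63 - 1  # largest value of a signed 64 bit integer
print(largest_64_bit + 1)   # Python's int simply grows: 9223372036854775808
```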

Remark J.64 Inside the circuits of a computer, the 1’s are represented by a short pulse of electricity, while the 0’s are represented by the lack of such a pulse. One could imagine designing the circuitry of a computer so that every digit between 0 and 9 was represented directly, for instance by using varying intensities of the pulse. However, a reason for working with just 0’s and 1’s is that this reduces the chance that some disturbance will make the computer mistake one value for another.

This leads to the following mathematical question: how to represent all integers
using only 0’s and 1’s? The thing to realise is that this is not so different from the
following question: how to represent all integers using only strings of digits from the list
{0,1,2,3, . . . , 9}?

Example J.65 (Decimal and binary notation) So what do we really mean by the
integer 4132? Well, this:

$$\begin{aligned} 4132 &= 4 \cdot 1000 + 1 \cdot 100 + 3 \cdot 10 + 2 \cdot 1 \\ &= 4 \cdot 10^3 + 1 \cdot 10^2 + 3 \cdot 10^1 + 2 \cdot 10^0. \end{aligned}$$

(Here, we included the second line to emphasise how the powers of 10 occur in this
expression.) In fact, we are so used to thinking about integers as decimal numbers that
it is completely obvious for us that we can represent all numbers in this way.
But how to represent integers just using a string of 0’s and 1’s? For instance, what
number should the string, say, 1101 represent? Well, here is the basic idea:

$$\begin{aligned} 1101 &= 1 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 \\ &= 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0. \end{aligned}$$

When 1101 is interpreted in this way, we call it a binary number. (Again, we include
the second line to emphasise how the powers of 2 occur in this expression.)
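The idea above can be turned into an algorithm: repeatedly divide by 2, and the remainders are the binary digits read from right to left. The helper `to_binary` below is a hypothetical sketch of this; Python's built-in `bin()` and `int(..., 2)` are used only to check it.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer n as a string."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits   # the remainder is the next digit (from the right)
        n = n // 2
    return digits

print(to_binary(13))       # 1101
print(int("1101", 2))      # 13: int() can read a string of binary digits
print(bin(13))             # 0b1101: Python's own conversion, with the prefix 0b
```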

Let us now consider two questions: 1) why is the basic idea shown in the above
example actually quite reasonable, and 2) how to know if the number 1101 is supposed
to be interpreted in the above sense (i.e., as a binary number) and not as the decimal
number one-thousand-one-hundred-and-one?

To answer the second question first: when it is not clear if we are talking about binary or decimal representations of numbers, we can use the notations $(1101)_2$ and $(1101)_{10}$ to indicate that we mean binary or decimal notation, respectively.
But what about the first question? Well, using the idea shown above, here is how counting in binary works:
$$\begin{array}{lll} (0)_2 = 0 & (11)_2 = 3 & (110)_2 = 6 \\ (1)_2 = 1 & (100)_2 = 4 & (111)_2 = 7 \\ (10)_2 = 2 & (101)_2 = 5 & (1000)_2 = 8 \end{array}$$

Notice that this is exactly how counting with two digits should work! Every time we run out of digits in a position, we reset that position to zero and carry a one over to the next position on the left (adding a new position if necessary). Indeed, this is what happens when you count using ten digits and want to count past 9 or, for that matter, past 19. The thing with counting in binary is that this happens a lot!

Exercise J.66 Check that the above list is correct, and continue the list to 20.

Remark J.67 (Binary numbers in 8-bit computers) Here is a slightly simplified


explanation of how an 8-bit computer would interpret the string of 0’s and 1’s in the
byte shown in Figure 5 (see Remark J.69 below for a hint of the full story):

Fig. 6. The first 7 bits (from the right) combine to form the binary representation of
the integer. The left-most bit tells us if the binary number is positive or negative.

Exercise J.68 (a) What is the largest integer you can represent as an 8 bit integer?
(b) Modern computers use 64 bit integers, where 1 bit is used for the sign and 63 for
the number itself. What is the largest integer you can represent using a 64 bit integer?

Remark J.69 (Two’s complement) Strictly speaking, our explanation for how inte-
gers are represented is only correct for positive integers. For negative integers, it would
be kind of stupid to do exactly as we describe since then we would have two different
ways of representing the integer 0 (indeed, both the bytes 0000 0000 and 1000 0000 would
represent 0). Instead, an alternative strategy called two’s complement is used. We will
not explain it here (Wikipedia has a nice page on this), but it allows the computer to
represent one extra negative number, meaning that on an 8 bit computer we can repre-
sent every integer between −128 and 127. (And, perhaps more importantly, using two’s
complement allows the hardware to speed up integer arithmetic.)
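Without going into the details, one standard way to describe two's complement is that, with 8 bits, a negative integer n is stored as the bit pattern of $n + 2^8$. The hypothetical helper below is a sketch of this rule (it assumes the input lies in the allowed range); note that 0 now has only one bit pattern.

```python
def twos_complement_bits(n, bits=8):
    """Bit pattern of n in two's complement (assumes -2**(bits-1) <= n < 2**(bits-1))."""
    # n % 2**bits equals n for n >= 0, and n + 2**bits for n < 0
    return format(n % 2**bits, "0" + str(bits) + "b")

for n in [0, 1, 127, -1, -128]:
    print(n, twos_complement_bits(n))
```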

How non-integers are represented in Python


We now turn to how real numbers that are not integers, such as $1/10 = 0.1$ and $\sqrt{2} = 1.41421\ldots$, are represented in a computer. First, we note that since $\sqrt{2}$ has an infinite number of digits, it should be clear that this number cannot be represented exactly. What may come as a surprise is that not even 0.1 can be stored exactly!
So, what is going on? First, we need to know that when storing real numbers that are not integers, your computer uses the datatype float.

Remark J.70 (sloppy description of 64 bit floating point numbers) A 64 bit computer sets aside 64 bits to represent the real number in the form
$$\alpha \cdot 10^{\beta},$$
where the fraction α is (roughly) a 16 digit integer (positive or negative) and the exponent β is (roughly) an integer between −340 and +292.

Here, we use the word roughly since we lose a little bit in the translation from binary to decimal numbers. However, before formulating a more correct description of floating point numbers in binary language, let us try to get some intuition from the sloppy description. To this end, we consider the following example.

Example J.71 According to the sloppy description, how is the number $\sqrt{2} = 1.414213562373095048801688724209698078569\ldots$ stored? Well, roughly as
$$1414213562373095 \cdot 10^{-15}.$$
That is, the computer stores up to 16 digits in α (starting with the first non-zero digit from the left), and the position of the decimal point in β. In particular, since a lot of information is thrown away, this means that we get a round-off error!
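To compare the sloppy description with what Python itself reports, the standard `sys` module exposes the properties of floats in `sys.float_info`. This sketch assumes the usual 64 bit IEEE 754 floats used on essentially all current machines.

```python
import sys

print(sys.float_info.dig)       # 15: decimal digits that are always stored faithfully
print(sys.float_info.max)       # about 1.8e308: the largest float
print(sys.float_info.min)       # about 2.2e-308: the smallest "normal" positive float
print(sys.float_info.epsilon)   # about 2.2e-16: gap between 1.0 and the next float
```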

Exercise J.72 According to the sloppy description, (a) how far is it between the floating point representation of $\sqrt{2}$ and its closest floating point "neighbour"? (b) How far is it between x = 0 and the next floating point number?

Fig. 7. As indicated by the above exercise, the floating point line is not a continuous line; instead, it consists of many points with some short distance between them.

Exercise J.73 Put a = 325660000, b = 0.000032566 and use Python to compute 100*(a+b), 1000*(a+b) and 10000*(a+b). How are the results of these computations stored? Are there any round-off errors?
Remark: If you want to force Python to show you, say, 20 decimal places of a variable a, you can use the crazy looking command print("{:.20f}".format(a)).

Exercise J.74 Use the sloppy description of float numbers to do the following:

(a) Explain how large N has to be for the computer to think that $1 + 2^{-N} = 1$.
(b) How large does N have to be for the computer to think that $2^{-N} = 0$?

Hint: Recycle your answers from J.72.

Now, notice the following. According to our sloppy description, above, it makes no
sense that the number 1/10 = 0.1 cannot be represented accurately as a floating point
number. Indeed, the number 1/10 ought to have the simple representation

$$1 \cdot 10^{-1}.$$

So what is going on? Well, to explain this, we need a more accurate description of
floating point numbers. As a first step, we need to understand how binary notation
works for non-integers.

Example J.75 (binary notation for non-integers) The way to represent non-
integers in binary notation is essentially completely analogous to how we do this for
decimal numbers. Indeed, compare
$$(643.57)_{10} = 6 \cdot 10^2 + 4 \cdot 10^1 + 3 \cdot 10^0 + 5 \cdot \frac{1}{10^1} + 7 \cdot \frac{1}{10^2}$$
and
$$(101.01)_2 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 + 0 \cdot \frac{1}{2^1} + 1 \cdot \frac{1}{2^2}.$$
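The comparison above can be turned into a small converter. The function `binary_to_decimal` below is a hypothetical sketch that handles strings of 0's and 1's with at most one point; it is meant for checking small examples by hand, not for real use.

```python
def binary_to_decimal(s):
    """Convert a binary string such as "101.01" to a number."""
    if "." in s:
        integer_part, fraction_part = s.split(".")
    else:
        integer_part, fraction_part = s, ""
    value = 0
    # digits to the left of the point are worth 1, 2, 4, 8, ... (reading from the right)
    for position, digit in enumerate(reversed(integer_part)):
        value = value + int(digit) * 2**position
    # digits to the right of the point are worth 1/2, 1/4, 1/8, ...
    for position, digit in enumerate(fraction_part, start=1):
        value = value + int(digit) / 2**position
    return value

print(binary_to_decimal("101.01"))   # 5.25
print(binary_to_decimal("1101"))     # 13
```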

Exercise J.76 As a small taste of binary arithmetic, figure out both the decimal and
binary representations of the numbers

(a) $(101.01)_2 \cdot 2^2$ (b) $(101.01)_2 \cdot 2^1$ (c) $(101.01)_2 \cdot 2^{-1}$

We now formulate the more accurate description of floating point numbers.



Remark J.77 (a more accurate description of 64 bit floating point numbers)


A 64 bit floating point number is stored in the form
$$(1.\alpha)_2 \cdot 2^{\beta},$$
where the fraction α is a string of 52 bits, and the exponent β is an 11 bit integer. The remaining bit is used to store the sign of the floating point number.
When the exponent β is the smallest possible, then $(1.\alpha)_2$ is replaced by $(0.\alpha)_2$. (This is done to offer additional accuracy close to the origin.)
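To peek at these bits for an actual Python float, the standard `struct` module can reinterpret the 8 bytes of a float as one 64 bit integer. The function `float_bits` is a hypothetical sketch. One detail that the remark glosses over: in the IEEE 754 standard that Python floats follow on essentially all machines, the 11 exponent bits store β + 1023 (a fixed offset, or bias) rather than β itself; this is how negative exponents are stored.

```python
import struct

def float_bits(x):
    """Split a 64 bit float into its sign bit, 11 exponent bits and 52 fraction bits."""
    # Reinterpret the 8 bytes of the float as one 64 bit unsigned integer (big-endian).
    (as_integer,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(as_integer, "064b")
    return bits[0], bits[1:12], bits[12:]

sign, exponent, fraction = float_bits(0.1)
print(sign)                                # 0: the number is positive
print(exponent, int(exponent, 2) - 1023)   # 01111111011 -4
print(fraction)                            # 1001 repeated, ending in 1010
```

For 0.1, the last four fraction bits are 1010 rather than the repeating pattern 1001: the infinite binary expansion has been rounded up.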

Here is a visual representation of the memory used for a 64 bit floating point number:

Fig. 8. Keep in mind that out of the 11 bits used for the exponent, one of them is
used to denote the sign. In addition, it is not completely accurate to think of the
53 bits used for the fraction as a binary integer (see example below).
Since the notation used in the above description may be a bit confusing, let us consider
an example.

Example J.78 How is the number 1/10 stored as a floating point number? In order not to have to write strings of 53 bits, let us pretend that we are working with 16 bit floating point numbers (so-called half-precision floats).

Fig. 9. 16 bit floats are exactly like 64 bit floats, except that fewer bits are available for the fraction and exponent.
First, expressing 1/10 in binary form (see exercises below for how to do this), we see that
$$\frac{1}{10} = (0.00011001100110011\ldots)_2,$$
where the pattern keeps repeating. That is, the binary expansion of 1/10 is not finite! This means that to store it as a 16 bit (or 64 bit) floating point number, we are forced into making a round-off error! Indeed, here is the representation of 1/10 as a floating point number:
$$(1.\underbrace{100110011}_{\alpha})_2 \cdot 2^{-(100)_2},$$

and here is exactly how this would look in the memory of the 16 bit computer:

Fig. 10. Note that since the fraction appears on the right-hand side of a "binary
comma", its right-most zeroes can be ignored.
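For the 64 bit floats that Python actually uses, the round-off in 1/10 can be made visible with the standard `fractions` module: `Fraction(0.1)` gives the exact rational number stored in the float 0.1. This sketch also uses the formatting trick from the remark in Exercise J.73, written in the shorter f-string form.

```python
from fractions import Fraction

exact = Fraction(0.1)            # the exact rational number stored in the float 0.1
print(exact)                     # 3602879701896397/36028797018963968
print(exact == Fraction(1, 10))  # False: the stored value is not exactly 1/10
print(f"{0.1:.20f}")             # 0.10000000000000000555
```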

Exercise J.79 Translate the "accurate description" of 64 bit floating point numbers
to decimal notation to obtain the "sloppy description" at the start of this section. In
particular, you need to take into account the added accuracy close to the origin.

Exercise J.80 (Challenging) Which fractions can be represented without round-off error as 64 bit floating point numbers? Explain.

Remark J.81 (The effect of round-off errors)


When using Python (or any other computer program) to compute and visualise data, we constantly need to ask ourselves if the results of our computations make sense, or if they are the artificial results of round-off error. For instance, to the right, we see a visualisation of $f(x) = 2^x \cdot \ln(1 + 2^{-x})$ which is utter nonsense (this function appears in, e.g., Chapter ??).
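In exact arithmetic, f(x) tends to 1 as x grows, since $\ln(1 + t) \approx t$ for small t. The sketch below shows where the computation goes wrong: once $2^{-x}$ is small enough, $1 + 2^{-x}$ is rounded to exactly 1 and the logarithm becomes 0. A standard remedy, not discussed in these notes, is the function `math.log1p(t)`, which computes $\ln(1 + t)$ accurately for small t; the names `f_naive` and `f_stable` are hypothetical.

```python
import math

def f_naive(x):
    return 2**x * math.log(1 + 2**(-x))

def f_stable(x):
    # math.log1p(t) computes log(1 + t) without first rounding 1 + t
    return 2**x * math.log1p(2**(-x))

for x in [10, 30, 60]:
    print(x, f_naive(x), f_stable(x))   # at x = 60 the naive version gives 0.0
```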

10.6 Answers to selected exercises


J.4 (a) addition, (b) subtraction, (c) multiplication, (d) power, (e) division, (f) floor division (returns the result of the division rounded down), (g) modulo (returns the numerator of the "remainder term" of the division).

J.8 (a), (d) and (f) will not run.

J.9 (a) 1 + 1/2 + 1/4 + 1/8 + 1/16, (b) $\cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + 1}}}}$
J.11 The code in (a) will run.

J.36 The code computes the partial sum $S_{99999}$ of the harmonic series. The speed of the computations will depend on your processor.

J.40 There are many ways to write this code. Here is one:
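partial_sum = 0   # avoid the name sum, which is taken by a built-in function
k = 0
while abs(partial_sum - 2) >= 1/10000:
    partial_sum = partial_sum + 1/2**k
    k = k + 1     # Keep in mind that in a while-loop,
                  # we must update k manually.
print(k - 1)      # index of the last term that was added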
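Assuming the exercise asks for the first N with |S_N − 2| < 1/10000, where S_N = 1 + 1/2 + ... + 1/2^N as in the loop above, the answer can also be found by hand, since the partial sums of this geometric series have a closed form:

```latex
S_N = \sum_{k=0}^{N} \frac{1}{2^k} = 2 - \frac{1}{2^N},
\qquad\text{so}\qquad
|S_N - 2| < \frac{1}{10000}
\iff 2^N > 10^4
\iff N > 4 \log_2 10 \approx 13.29 .
```

Hence the smallest such N is 14, which is what the loop prints.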

J.46 (a)
k = 0
for x in myList:
    if x < 10**(-4):
        break
    k = k + 1
print(k)

(b)
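k = 0
while myList[k] >= 10**(-4):   # >= so that (b) stops at the same entry as (a)
    k = k + 1
print(k)   # unlike (a), this raises an IndexError if no entry is below 10**(-4)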
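An alternative to the for-loop in (a) is sketched below, using `enumerate` and `next` with a default value. The function name `first_small_index`, the tolerance argument and the sample list are all made up for illustration; like (a), it returns `len(my_list)` when no entry is below the tolerance.

```python
def first_small_index(my_list, tol=10 ** (-4)):
    # Index of the first entry below tol, or len(my_list) if there is none
    return next((k for k, x in enumerate(my_list) if x < tol), len(my_list))


my_list = [0.5, 0.02, 0.003, 0.00004, 0.000001]
print(first_small_index(my_list))   # 3
print(first_small_index([1, 2]))    # 2
```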
10.6. ANSWERS TO SELECTED EXERCISES D-37

J.48 Below, notice that we do not name our absolute value function abs, since this name is already taken by Python's own built-in absolute value function (reusing it would overwrite the built-in). (a)
def absolute1(x):
    if x >= 0:
        return x
    else:
        return -x

(b)
def absolute2(x):
    return (x ** 2) ** (1/2)
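A quick sanity check of both versions against Python's built-in `abs` might look as follows (the test values are arbitrary). It also illustrates two differences: `absolute2` always returns a float, and for very large inputs the intermediate square `x ** 2` overflows before the square root can be taken.

```python
def absolute1(x):
    if x >= 0:
        return x
    else:
        return -x


def absolute2(x):
    return (x ** 2) ** (1 / 2)


for x in [-3, 0, 2.5]:
    print(x, absolute1(x), absolute2(x), abs(x))

# Squaring 1e200 exceeds the largest float, so absolute2 fails here
try:
    absolute2(1e200)
except OverflowError as err:
    print("absolute2(1e200) failed:", err)
```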

J.49
def product(x):
    temp_prod = 1
    for a in x:
        temp_prod = temp_prod * a   # multiply by the entry a, not by the whole list x
    return temp_prod
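A usage sketch for the corrected `product` function, compared against `math.prod` (available in the standard library since Python 3.8). Note that the product of an empty list is 1, the neutral element for multiplication:

```python
import math


def product(x):
    temp_prod = 1
    for a in x:
        temp_prod = temp_prod * a
    return temp_prod


print(product([1, 2, 3, 4]), math.prod([1, 2, 3, 4]))   # 24 24
print(product([]))                                       # 1
```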

J.54 Here is how to plot the first:


import matplotlib.pyplot as plt
# insert code for the definition of absolute1
X = [k/10 for k in range(-20, 21)]
Y = [absolute1(x) for x in X]
plt.plot(X, Y)
plt.show()   # needed when running a script rather than a notebook

J.58 This will not work for absolute1, since the if-statement in the definition of the function does not make sense if x is a list or a numpy array (for a list, x >= 0 raises a TypeError; for a numpy array, it produces an array of booleans, whose truth value Python refuses to guess). For absolute2, the following code will work:
import matplotlib.pyplot as plt
import numpy as np
# insert code for the definition of absolute2
X = np.linspace(-2, 2, 40)
Y = absolute2(X)
plt.plot(X, Y)
plt.show()
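If one does want a version of absolute1 that accepts numpy arrays, one option (a sketch, not the approach taken in the notes) is to replace the if-statement with `np.where`, which chooses between two arrays elementwise. The name `absolute1_vectorised` is made up for this illustration:

```python
import numpy as np


def absolute1_vectorised(x):
    # np.where picks x where x >= 0 and -x elsewhere, entry by entry,
    # so no if-statement has to decide on a whole array at once
    x = np.asarray(x)
    return np.where(x >= 0, x, -x)


X = np.linspace(-2, 2, 40)
Y = absolute1_vectorised(X)
print(np.array_equal(Y, np.abs(X)))   # True
```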

J.63 Python stops with the error message "ZeroDivisionError: division by zero". In other words, Python really believes that $S_{100} = 2$.
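A sketch reproducing this behaviour, assuming S_100 = 1 + 1/2 + ... + 1/2^100 and that the exercise divides by 2 − S_100 (the exercise text is not reproduced here). The exact partial sum after the term 1/2^53 is 2 − 2^{-53}, which lies exactly halfway between the neighbouring floats 2 − 2^{-52} and 2; it is rounded to 2, and adding even smaller terms cannot change that:

```python
S = 0
for k in range(101):        # k = 0, 1, ..., 100
    S = S + 1 / 2 ** k
print(S == 2)               # True

try:
    print(1 / (2 - S))
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```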
D-38 CHAPTER 10. A CRASH COURSE IN PYTHON

J.66 Here are the first twenty numbers in both binary and decimal notation:
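The table itself did not survive in this version of the notes. It can be regenerated with Python's `format(n, "b")`, which returns the binary expansion of an integer as a string. The sketch assumes the first twenty numbers are 0, 1, ..., 19; use `range(1, 21)` instead if the list is meant to start at 1.

```python
# Pair each number n with its binary expansion
rows = [(n, format(n, "b")) for n in range(20)]

for n, binary in rows:
    print(f"{n:>2}  {binary:>5}")
```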

J.68 (a) $\sum_{n=0}^{6} 2^n = 2^7 - 1 = 127$, (b) $\sum_{n=0}^{62} 2^n$.
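Both sums are instances of the identity 2^0 + 2^1 + ... + 2^N = 2^{N+1} − 1 (in binary, a string of N + 1 ones). A quick check with Python's exact integers:

```python
# (a) 2**0 + 2**1 + ... + 2**6
sum_a = sum(2 ** n for n in range(7))
# (b) 2**0 + 2**1 + ... + 2**62; 2**63 - 1 is also the largest signed 64-bit integer
sum_b = sum(2 ** n for n in range(63))

print(sum_a, 2 ** 7 - 1)    # 127 127
print(sum_b, 2 ** 63 - 1)   # 9223372036854775807 9223372036854775807
```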

J.72 (a) Roughly $10^{-16}$, (b) $10^{-340}$.


J.73 According to the sloppy description, a+b is stored as $3256600000000325 \cdot 10^{-7}$. When printing 100 * (a + b), 1000 * (a + b) and 10000 * (a + b) in Python, you will see that the 17th digit keeps changing due to the round-off error (Python essentially keeps guessing this digit wrong).
J.74 (a) N ≥ 53, (b) N ≥ 1075 (these answers can be checked by doing the computations in Python).
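These thresholds can be found by brute force. The sketch assumes that (a) asks for the N where 1 + 2^{-N} is rounded to 1 and (b) for the N where 2^{-N} itself is rounded to 0, which matches the answers above; `smallest_n` is a made-up helper that returns the first N satisfying a condition.

```python
def smallest_n(condition, start=0):
    # Try N = start, start + 1, ... until the condition holds
    n = start
    while not condition(n):
        n += 1
    return n


n_a = smallest_n(lambda n: 1 + 2.0 ** -n == 1)   # 1 + 2**(-N) rounds to 1
n_b = smallest_n(lambda n: 2.0 ** -n == 0)       # 2**(-N) rounds to 0
print(n_a, n_b)   # 53 1075
```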
J.76 (a) $(10101)_2 = 21$, (b) $(1010.1)_2 = 10.5$, (c) $(10.101)_2 = 2.625$.
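A sketch of a converter from a binary string with a binary point to an exact fraction (the function name `binary_to_number` is made up for this illustration): the digits before the point are read as an ordinary binary integer, and the i-th digit after the point contributes digit/2^i.

```python
from fractions import Fraction


def binary_to_number(s):
    # Split e.g. "1010.1" into the integer part "1010" and the fractional part "1"
    int_part, _, frac_part = s.partition(".")
    value = Fraction(int(int_part, 2)) if int_part else Fraction(0)
    for i, digit in enumerate(frac_part, start=1):
        value += Fraction(int(digit), 2 ** i)   # i-th digit after the point
    return value


for s in ["10101", "1010.1", "10.101"]:
    print(s, float(binary_to_number(s)))   # 21.0, 10.5, 2.625
```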
J.79 $\beta$ is between $-1024$ and $1023$ (counting two's complement). This means that $2^\beta$ is roughly between $10^{-308}$ and $10^{308}$. Taking into account that the $\alpha$ in our rough notation is a number between 1 and $10^{16}$, and that in the accurate description $(1.\alpha)_2$ is replaced by $(0.\alpha)_2$ when $\beta = -1024$, we recover the sloppy description, at least approximately.
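Python exposes the actual limits of its 64 bit floats in `sys.float_info`, which gives a way to compare with the sloppy description. The values in the comments are those of IEEE 754 double precision, which is what CPython uses on all common platforms (an assumption worth checking on unusual hardware):

```python
import sys

info = sys.float_info
print(info.max)       # largest finite float: about 1.8 * 10**308
print(info.min)       # smallest positive normalised float: about 2.2 * 10**(-308)
print(info.epsilon)   # gap between 1.0 and the next float: 2**(-52), about 2.2 * 10**(-16)
print(info.dig)       # decimal digits guaranteed to survive a round trip: 15

# Subnormal numbers give the extra reach close to the origin
smallest = 5e-324     # the smallest positive subnormal float, 2**(-1074)
print(smallest > 0, smallest / 2)   # True 0.0
```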
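J.80 Only fractions of the form $a/2^n$, where a is an integer and n is a natural number; these are exactly the numbers with finite binary expansions (prove this!). Conversely, such a fraction is stored without round-off error provided a and n are not too large.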
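Python's `fractions.Fraction` converts a float into the exact rational number it stores, which makes this answer easy to test: every denominator is a power of 2, and 1/10 is not stored exactly. The helper `is_power_of_two` is written for this illustration.

```python
from fractions import Fraction


def is_power_of_two(d):
    # A positive integer is a power of 2 exactly when it has a single 1-bit
    return d > 0 and d & (d - 1) == 0


for x in [0.1, 0.375, 1 / 3, 2.5]:
    q = Fraction(x)   # the exact value stored in the float x
    print(x, q, is_power_of_two(q.denominator))
```

For instance, 0.375 = 3/8 is stored exactly, while 0.1 is stored as 3602879701896397/2^55.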

You might also like