
Mathematics for AI

Oliver Roche-Newton

September 30, 2024

Abstract

Lecture notes for the Mathematics for AI 1 course at JKU (Winter Semester 2024/25).
The notes are based on Mario Ullrich’s combined notes for Mathematics for AI 1-3. I am
grateful to Jan-Michael Holzinger, Severin Bergsmann, Sara Plavsic, Tereza Votýpková
and Monica Vlad for their help in constructing the notes.

Contents

1 Sets, Numbers and Functions 3


1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Propositional logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Relations and functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Real numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.5 An introduction to proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6 Bounded sets, infimum and supremum . . . . . . . . . . . . . . . . . . . . . . 35
1.7 Some basic combinatorial objects, identities and inequalities . . . . . . . . . . 40
1.8 Some important functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.9 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
1.10 Vectors and norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

2 Matrices and systems of linear equations 72


2.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.2 Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.3 Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.4 Matrices as linear transformations . . . . . . . . . . . . . . . . . . . . . . . . 98
2.5 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.6 Inverse matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

3 Sequences and Series 112
3.1 Convergence of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.2 Calculation rules for limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.3 Monotone sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
3.4 Subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.5 Cauchy criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.6 Introduction to series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
3.7 Calculation rules and basic properties of series . . . . . . . . . . . . . . . . . 139
3.8 Convergence tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

1 Sets, Numbers and Functions

In this section, we will introduce some of the most fundamental objects of mathematics. In
particular, we will focus on numbers, sets (usually sets of numbers), and relations between
them. We will see some simple proofs and learn about some important proof techniques, par-
ticularly proof by induction and proof by contradiction. Finally, we treat complex numbers,
which are necessary to give solutions to arbitrary polynomial equations.
The content of this section will form the basis for the mathematics we learn throughout this
course and also the upcoming courses Mathematics for AI 2 and 3, and so it is essential to
have a solid understanding of the concepts we introduce here.

1.1 Sets

A set M is a collection of different ‘objects’ which we call elements of M . We use the following
notation:
If x belongs to M, we write x ∈ M; if x does not belong to M, we write x ∉ M.
Some particularly important sets are assigned names and symbols.

• N := {1, 2, 3, . . . } is the set of natural numbers.

• N_0 := {0, 1, 2, 3, . . . } is the set of non-negative integers.

• Z = {. . . , −2, −1, 0, 1, 2, . . . } is the set of integers.

• Q is the set of rational numbers.

• R is the set of real numbers.

• C is the set of complex numbers.

• P is the set of prime numbers.

All these sets will be precisely defined and discussed later in this chapter. First, let us see
that there are multiple ways to define sets. The easiest way would be to list all its elements,
for instance:

A = {0, 1, 2}, B = {Austria, Germany, Italy, Liechtenstein}.

However, if a set contains infinitely many elements, we cannot list all of
them. In this case we use dots if it is clear what is contained in the set. We did this already
for the sets N and Z. Some more examples are the even and odd natural numbers:

E = {2, 4, . . . }, O = {1, 3, . . . }.

However, this may lead to difficulties of interpretation as this is not guaranteed to give a
unique description. For example, we may define the set of even numbers, as above, via

E = {2, 4, . . . },

and the set of all powers of 2 as
G = {2, 4, . . . }.
Since these sets do not differ until the third element, they appear identical in this notation.
For a good definition of an infinite set, it is therefore formally necessary to precisely specify
the properties of its elements. For instance, the set of even numbers can be written as

E = {2k : k ∈ N},

and the set of all powers of 2 as


G = {2^j : j ∈ N}.
A special but important set is the empty set, denoted ∅, which does not contain any element.
Two sets M and N can also be related to each other. If for all m ∈ M we also have m ∈ N
then we say that M is a subset of N, and we write M ⊂ N. In this case, we also call N a superset
of M . If we have a look at the sets defined above, we have e.g. E ⊂ N and G ⊂ E. Note
that, for any set M we have the relations M ⊂ M and ∅ ⊂ M .
Sets M and N are called equal if they contain the same elements, i.e. M ⊂ N and N ⊂ M .
For example we have
{0, 1, 2} = {2, 0, 1}
and
{2, 3, 5, 7} = {p ∈ P such that p ≤ 9}.
To verify that two sets X and Y are equal, we need to check that both X ⊂ Y and Y ⊂ X.
Exercise - Show that
{2, 3, 5, 7} = {p ∈ P such that p ≤ 9}.

If we have M ⊂ N and M ̸= N , then we say that M is a proper or strict subset of N and


write M ⊊ N . For example,
N ⊊ N_0 ⊊ Z. (1)

Some authors prefer to use “⊆” for a generic subset instead of “⊂”, in order to make it
more explicit that equality is not excluded. The same authors typically use the notation “⊂”
instead of “⊊” for proper subsets. So, one should be mindful of this variation in notation
when using different literature.
The elements of sets can also be sets! An important example is the power set P(M ) for a
given set M , which consists of all subsets of M . That is,

P(M ) := {A : A ⊂ M }.

For example, consider once more the set A = {0, 1, 2}. The power set of A is

P(A) = {∅, {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, A}.

Note that we always have M ∈ P(M ) and ∅ ∈ P(M ).
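For a small finite set, the power set can be enumerated by a computer. The following Python sketch (the function name power_set is ours; Python plays no official role in these notes) builds P(A) for A = {0, 1, 2}:

```python
from itertools import combinations

def power_set(m):
    """Return P(m) as a set of frozensets, one for each subset of m."""
    # frozenset is used because Python's ordinary sets cannot contain sets.
    elems = list(m)
    return {frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

A = {0, 1, 2}
PA = power_set(A)
# |P(A)| = 2^|A| = 8, and both the empty set and A itself are members.
assert len(PA) == 8
assert frozenset() in PA and frozenset(A) in PA
```

Note that the empty set has power set P(∅) = {∅}, which the sketch also reproduces.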


We can also create new sets from given sets, say M and N , by using set operations.
The union M ∪ N contains all elements which belong to the set M and all elements which
belong to the set N .

The intersection M ∩ N consists of all elements which are in both M and N .
The set difference of M and N , written as M \ N , is the set of all elements of M which are
not contained in N .
If we only work with subsets M ⊂ Ω for a fixed set Ω, then we call Ω the underlying
set or the ground set. In this case, the complement of M (in Ω), denoted M^c, is the set
M^c = Ω \ M.

Figure 1: Venn diagrams

The illustrations above are called Venn diagrams. Such pictures can be a helpful tool for
thinking about sets and how they interact with each other.
Example. Let Ω = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and let A, B ⊂ Ω be the sets
A = {2, 3, 4}, B = {3, 5, 7, 9}.
Then
A ∪ B = {2, 3, 4, 5, 7, 9}
A ∩ B = {3}
B \ A = {5, 7, 9}
A^c = {1, 5, 6, 7, 8, 9, 10}
B^c = {1, 2, 4, 6, 8, 10}.
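Python's built-in set type implements exactly these operations, so the example above can be checked mechanically. A brief sketch (the variable names are our own):

```python
omega = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
A = {2, 3, 4}
B = {3, 5, 7, 9}

assert A | B == {2, 3, 4, 5, 7, 9}             # union
assert A & B == {3}                            # intersection
assert B - A == {5, 7, 9}                      # set difference
assert omega - A == {1, 5, 6, 7, 8, 9, 10}     # complement of A in omega
assert omega - B == {1, 2, 4, 6, 8, 10}        # complement of B in omega
```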

All of the sets we have seen so far have elements that can be listed, even if in some cases the
list is infinite. However, this is not the case with all sets. Another important family of sets
to consider are intervals.
Definition 1.1. Let a, b ∈ R such that a ≤ b. Then we define the closed interval between
a and b to be the set
[a, b] := {x ∈ R such that a ≤ x ≤ b}.
The half open intervals between a and b are the sets

[a, b) := {x ∈ R such that a ≤ x < b}

and
(a, b] := {x ∈ R such that a < x ≤ b}.
The open interval between a and b is the set

(a, b) := {x ∈ R such that a < x < b}.

Moreover, we write

[a, ∞) := {x ∈ R such that x ≥ a}, (a, ∞) := {x ∈ R such that x > a}

and
(−∞, a] := {x ∈ R such that x ≤ a}, (−∞, a) := {x ∈ R such that x < a}.

Elements of sets are not ordered, and so the sets {a, b} and {b, a} are the same. Nevertheless,
it is often important to order the objects under consideration, and for this purpose we have
the Cartesian product.
Definition 1.2. Let A and B be sets, and let a ∈ A and b ∈ B be arbitrary elements.

• The expression (a, b), which is sensitive to order, is called an ordered pair.

• Two tuples (a, b) and (a′ , b′ ) are equal if and only if a = a′ and b = b′ .

• The set of all ordered pairs

A × B := {(a, b) : a ∈ A, b ∈ B}

is called the Cartesian product of the sets A and B.

Some remarks about Cartesian products are given below.

• In general (a, b) ̸= (b, a). In fact, the second part of the definition implies that (a, b) =
(b, a) if and only if a = b.

• An ordered pair (a, b) and a set {a, b} are completely different objects.

• If we consider more than two sets, say A1 , A2 , . . . , Ad for some d ∈ N, then we can also
define the (d-fold) Cartesian product

A1 × A2 × · · · × Ad := {(a1 , . . . , ad ) : ai ∈ Ai for all i = 1, . . . , d},

whose elements (a1 , . . . , ad ) are called d-tuples.

• For the d-fold Cartesian product A × A × · · · × A (d times), we use the shorthand A^d.

Example - Let A = {π, √2} and B = {1, 2, 3}. Then

A × B = {(π, 1), (π, 2), (π, 3), (√2, 1), (√2, 2), (√2, 3)}

and

B × A = {(1, π), (1, √2), (2, π), (2, √2), (3, π), (3, √2)}.
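Cartesian products of finite sets can be formed with itertools.product, which returns ordered tuples. A short Python sketch of the example above (representing the sets as tuples of numbers is our choice for illustration):

```python
from itertools import product
from math import pi, sqrt

A = (pi, sqrt(2))
B = (1, 2, 3)

AxB = list(product(A, B))   # the ordered pairs (a, b) with a in A, b in B
BxA = list(product(B, A))

assert len(AxB) == len(A) * len(B) == 6
assert (pi, 1) in AxB and (1, pi) in BxA
assert set(AxB) != set(BxA)   # order matters: A x B is not B x A
```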

1.2 Propositional logic

Next, we will introduce some notation from logic that can be helpful for making precise math-
ematical statements. We begin with the universal and existential quantifiers. These are
essentially just abbreviations.

• The notation ∀ means “for all”.

• The notation ∃ means “there exists”.

We also sometimes use the colon symbol “:” to denote “such that”. With these notational
conventions, we can also give a more concise description of some of the sets and concepts we
discussed in the previous subsection. Here are some examples:

• The statement
∃x ∈ N : x is even
simply says that there is at least one even natural number. This is a true statement,
obviously!

• We can also use this notation to make false statements. For instance,

∀x ∈ N, x is even.

• We have that M ⊂ N if and only if

∀x ∈ M, x ∈ N.

• We have that M ⊊ N if and only if M ⊂ N and

∃x ∈ N : x ∉ M.

As you might have already noticed, we will often need the terms “if” or “if and only if”, and
therefore we define a mathematical symbol for them. In order to do this properly, we need
to learn some basic formal logic.
Definition 1.3. A proposition is a statement which is either true or false.

We can use connectives to build more complicated propositions depending on two proposi-
tions A and B.
Definition 1.4. Let A and B be propositions, then

• ¬A ( not A) is the negation of A,

• A ∧ B (A and B) is the conjunction of A and B,

• A ∨ B (A or B) is the disjunction of A and B,

Examples - Let A be the proposition “3 is a prime number” and let B be the proposition
“2^2 = 5”. So, A is true and B is false.

• ¬A is the statement “3 is not a prime”. This is false.

• ¬B is the statement “2^2 ≠ 5”. This is true.

• A ∨ B is the statement “3 is a prime or 2^2 = 5”. This is true.

• A ∧ B is the statement “3 is a prime and 2^2 = 5”. This is false.

• A ∧ ¬B is the statement “3 is a prime and 2^2 ≠ 5”. This is true.

• A ∨ ¬B is the statement “3 is a prime or 2^2 ≠ 5”. This is true.

The last of these propositions highlights a difference between the meaning of “or” in mathematics
and in common language. The proposition A ∨ B is true even when both A and B are true.
In common language, when we say something like “please buy me some apples or oranges”,
we are expecting only one of the kinds of fruit to arrive (this is the exclusive or). The common
language equivalent of the symbol ∨ is “and/or”.
We sometimes use truth tables to see relations between truthfulness of propositions and
their component parts. Here is an example of a truth table for the negation, conjunction,
and disjunction.

A B ¬A A∧B A∨B
T T F T T
T F F F T
F T T F T
F F T F F
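The truth table above can be reproduced mechanically, since Python's not, and and or operators act on the truth values True and False. A small sketch (the list rows is our own construction):

```python
from itertools import product

# One row per assignment of truth values to (A, B), in the same order
# as the table above: (T,T), (T,F), (F,T), (F,F).
rows = [(a, b, not a, a and b, a or b)
        for a, b in product([True, False], repeat=2)]

for a, b, neg, conj, disj in rows:
    print(a, b, neg, conj, disj)
```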

The truth value of a proposition is either T or F. We sometimes write |A| = T or |A| = F
to indicate the truth value of A.

Definition 1.5. Let A and B be two propositions. Then,

• “A =⇒ B” means “A implies B”. In other words, if A is true then so is B.

• “A ⇐⇒ B” means that both A =⇒ B and B =⇒ A. In other words, A is true if
and only if B is true.

Using the notation for the conjunction, we see that the proposition A ⇐⇒ B is the same as
(A =⇒ B) ∧ (B =⇒ A). We also sometimes use “iff” as an abbreviation for “if and only
if”.
Examples - Here are some examples of true mathematical statements which involve impli-
cations.

• A ⊂ N =⇒ A ⊂ Z.

• (x ∈ P and x > 2) =⇒ x is odd.

• For a ≠ 0, b^2 − 4ac > 0 ⇐⇒ (ax^2 + bx + c = 0 has two distinct real solutions).

For the three statements above, it seems to be intuitively clear that they are true. We said
above that the implication A ⇒ B means that “if A is true then B is true”. However, what if
A is not true? This situation can be rather confusing. Consider the following two statements:

(1 = 2) =⇒ (Every prime number is odd),

(π ∈ ∅) =⇒ (2 is even).

Although it is not so intuitively obvious, both of these statements are in fact also true. In
fact, if A is false, then the statement A ⇒ B is true, regardless of what B says! This can be
confusing, since it can be used to build mathematical statements which are logically true but
appear to be nonsensical, like these two above.
Truth tables may be helpful for clearing up potential confusion surrounding this issue. The
truth table showing both directions of implication is given below.

A B A⇒B B⇒A A⇔B
T T T T T
T F F T F
F T T F F
F F T T T

Here are some remarks about what this truth table shows us.

• We can see from the truth table above that the only way that the statement A =⇒ B
is false is if A is true and B is false.

• The final column shows that the statement A ⇐⇒ B is true when A and B have the
same truth value. If two propositions A and B have the same truth value then we say
that they are logically equivalent, and write A ≡ B.

We can build longer truth tables and use them to compare other logical statements. Let us
compare the statement A =⇒ B with ¬B =⇒ ¬A.

A B ¬A ¬B A⇒B ¬B ⇒ ¬A
T T F F T T
T F F T F F
F T T F T T
F F T T T T

We see that the truth values of the statements A =⇒ B and ¬B =⇒ ¬A are in fact
identical. This means that the statements are logically equivalent; if one is true then the
other is true, and if one is false then the other is false. Another way of writing this is

(A =⇒ B) ⇐⇒ (¬B =⇒ ¬A).

We say that the statement ¬B =⇒ ¬A is the contrapositive of the statement A =⇒ B.


There are times when it is very helpful to know that these two statements are equivalent to

one another, particularly as it can help us to transfer a statement that is difficult to work
with into an equivalent statement that may be easier to understand.
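Since propositions admit only finitely many truth assignments, equivalences such as the contrapositive can be verified exhaustively by a computer. A Python sketch (the helper implies is ours; it encodes the rule that A ⇒ B is false only when A is true and B is false):

```python
from itertools import product

def implies(p, q):
    # p => q is false only in the case p true, q false
    return (not p) or q

# Check (A => B) <=> (not B => not A) over every truth assignment.
for a, b in product([True, False], repeat=2):
    assert implies(a, b) == implies(not b, not a)
```

The same loop, with three variables, would also verify the transitivity statement (2).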
We use implications in mathematics to build a chain of logic, using existing statements to
prove new ones. When we do this, we are using the transitivity of implication. This is the
statement that, for propositions A, B and C,

[(A =⇒ B) ∧ (B =⇒ C)] =⇒ (A =⇒ C). (2)

Exercise - Use a truth table to verify (2).


We often consider propositions which depend on one or more variables, such as the statements
“x is even” or “a ≤ b”. The truth value of these statements depends on the choice of the
variables x, a and b. A proposition which depends on a variable is called a predicate. We
may use the notation P (x) or Q(a, b) for such predicates.
Example - Let P (A) be the statement “A ⊂ {1, 2, 3, 4}”. Then, for example, P ({1, 3})
and P (∅) are true, while P (Z) is false. Let Q(A, B) be the statement “B ⊂ A”. Then
Q({1}, {1, 2}) is false and Q(Z, N) is true. Observe that

(P (A) ∧ (Q(A, B))) =⇒ (P (B)).

Finally, with all of this notation, we may write certain definitions or statements without using
any words, but rather by exclusively using mathematical symbols. For example,

M ⊂ N ⇐⇒ (∀x ∈ M, x ∈ N ) ⇐⇒ (x ∈ M ⇒ x ∈ N ).

The “sentence” above shows three different symbolic descriptions for the statement that M
is a subset of N .
However, just because we can use these symbolic shorthand descriptions, it does not mean
that we always should! Sometimes people forget that words themselves are a very valuable
tool for describing mathematical ideas.

1.3 Relations and functions

Roughly speaking, a relation describes a connection between two objects. Here, we give a
formal description and some important properties. We then introduce functions, a special kind
of relation which describes a connection from one set to another. We also discuss special
relations that are used to compare, group or order elements of a given set.
Definition 1.6. A relation R between two sets M and N is a subset of the Cartesian product
of M and N , i.e. R ⊂ M × N .

To make things clearer, see the upcoming illustration, which depicts every element of R as a
“connection” between an element of M and an element of N . As you can see it is possible
that x ∈ M is connected to some y ∈ N , which we denote by (x, y) ∈ R. However, this does
not have to be the case for every x ∈ M , and different elements of M may be connected to
the same y ∈ N . Moreover, x ∈ M can be connected to more than one element in N , or can
even be connected to none of the elements of N .

Figure 2: An illustration of a relation

Example - Let M = {2, 3, 4, 5} and N = {4, 5, 6, 7} and define a relation R ⊂ M × N
whereby
(x, y) ∈ R iff x or y is prime.
Then

R = {(2, 4), (2, 5), (2, 6), (2, 7), (3, 4), (3, 5), (3, 6), (3, 7), (5, 4), (5, 5), (5, 6), (5, 7), (4, 5), (4, 7)}.
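A relation between finite sets is just a set of pairs, so R can be generated directly from its defining property. A Python sketch (the helper is_prime is ours and is intended only for small inputs):

```python
def is_prime(n):
    # trial division; adequate for the small numbers in this example
    return n >= 2 and all(n % d for d in range(2, n))

M = {2, 3, 4, 5}
N = {4, 5, 6, 7}
R = {(x, y) for x in M for y in N if is_prime(x) or is_prime(y)}

assert (2, 4) in R and (4, 5) in R
assert (4, 6) not in R        # neither 4 nor 6 is prime
assert len(R) == 14           # matches the listing above
```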

Now we head to a very important type of relation, namely functions, which assign to each
element of M exactly one element of N .
Definition 1.7. Let M and N be non-empty sets. We call f : M → N a function from M
to N , if each x ∈ M is assigned exactly one element f (x) ∈ N .
M is called the domain of f and N is the codomain of f .

Examples - The function f : R → R defined by

f (x) = x^2

is a function. It satisfies the key property that every element x in the domain is assigned a
value f (x).
Similarly, the function g : R → R defined by

g(x) = x + 1

is a function.

Definition 1.8. Let M, N ̸= ∅ and let f : M → N be a function from M to N .


For S ⊂ M , we define the image of S under f as

f (S) := {f (x) : x ∈ S} ⊂ N,

and the range of f as


f (M ) := {f (x) : x ∈ M } ⊂ N.
In other words, the range of f is the image of the whole domain under f .
For T ⊂ N , we define the preimage of T under f by

f^{-1}(T ) := {x : f (x) ∈ T } ⊂ M.

Examples - Consider again the functions f (x) = x^2 and g(x) = x + 1 introduced above. Let
S ⊂ R be the closed interval S = [1, 3]. Then

f (S) = [1, 9], and g(S) = [2, 4].

The range of f is [0, ∞), while the range of g is R.


Since S is also a subset of the codomain of both f and g, we can also consider its preimage
in each case. We obtain
f^{-1}(S) = [−√3, −1] ∪ [1, √3]
and
g^{-1}(S) = [0, 2].
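For functions on a finite domain, images and preimages can be computed directly from Definition 1.8. A Python sketch (the finite grid standing in for R, and the helper names, are our own choices for illustration):

```python
def image(f, S):
    """Image of S under f: {f(x) : x in S}."""
    return {f(x) for x in S}

def preimage(f, domain, T):
    """Preimage of T under f: {x in the domain : f(x) in T}."""
    return {x for x in domain if f(x) in T}

domain = range(-3, 4)        # a small finite stand-in for R
f = lambda x: x ** 2

assert image(f, {1, 2, 3}) == {1, 4, 9}
assert preimage(f, domain, {1, 4}) == {-2, -1, 1, 2}
assert preimage(f, domain, {9}) == {-3, 3}
```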

The next definition shows the connection between relations and functions.

Definition 1.9. Let f : M → N be a function. We define the graph of f as

Gf := {(x, f (x)) : x ∈ M } ⊂ M × N.

Note that the graph of a function is a relation. In this sense, all functions induce a relation,
but not vice versa.
We can visualize a function with domain and codomain in R by plotting its graph in R2 . For
f (x) = x2 and g(x) = x + 1 this is demonstrated in the next illustration (Figure 3).
In what follows, we will define several important properties of relations. We will always
demonstrate afterwards what this means for functions.

Definition 1.10. Let R ⊂ M × N be a relation.

Figure 3: The graphs of x^2 and x + 1

• R is injective if and only if

∀ (x1 , y1 ), (x2 , y2 ) ∈ R, x1 ̸= x2 ⇒ y1 ̸= y2 .

This is equivalent to

∀(x1 , y1 ), (x2 , y2 ) ∈ R, y1 = y2 ⇒ x1 = x2 .

• R is surjective if and only if

∀y ∈ N, ∃x ∈ M : (x, y) ∈ R.

• R is bijective if and only if it is injective and surjective.

• R is functional if and only if

∀x ∈ M, ∃ y ∈ N : (x, y) ∈ R

and
∀x ∈ M, y1 , y2 ∈ N, ((x, y1 ), (x, y2 ) ∈ R) ⇒ y1 = y2 .
The two requirements for a relation to be functional can be written more succinctly as
follows:
∀x ∈ M, ∃! y ∈ N : (x, y) ∈ R.

Note that the graph of a function is a functional relation, and vice versa. We can therefore
rephrase the above definitions for functions.

Definition 1.11. Let f : M → N be a function.

• We say f is injective if and only if

∀x1 , x2 ∈ M, x1 ̸= x2 ⇒ f (x1 ) ̸= f (x2 ).

• We say f is surjective if and only if

∀y ∈ N, ∃x ∈ M : f (x) = y.

• We say f is bijective if and only if

∀y ∈ N, ∃!x ∈ M : f (x) = y.
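For functions between finite sets, the three properties above can be tested by brute force. A Python sketch (the helper names are ours):

```python
def is_injective(f, domain):
    # distinct inputs must give distinct outputs
    values = [f(x) for x in domain]
    return len(values) == len(set(values))

def is_surjective(f, domain, codomain):
    # every element of the codomain must be hit
    return {f(x) for x in domain} == set(codomain)

def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)

D = {-2, -1, 0, 1, 2}
square = lambda x: x * x
assert not is_injective(square, D)           # (-1)^2 = 1^2
assert not is_surjective(square, D, D)       # nothing maps to -1
assert is_bijective(lambda x: -x, D, D)      # negation is a bijection on D
```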

Let us see some illustrations for better understanding.

Figure 4: injective relation

Figure 5: surjective relation

Figure 6: a function (not injective and not surjective)

Figure 7: bijective function

Figure 8: bijective relation but not a function

We will now define two simple functions that can be defined on an arbitrary set M . First,
we define the identity function Id_M : M → M which maps each element to itself, i.e.,
Id_M (x) = x.
A constant function is a function which takes the same value for every element. Let M and
N be arbitrary non-empty sets and let c ∈ N be fixed. The function f : M → N , which is
defined by
f (x) = c,
for all x ∈ M , is called a constant function.
We now ask ourselves the following question: given a function f : M → N , can we find
another function which reverses the effect of f ? This leads us to the concept of an inverse
function.
Definition 1.12. Let f : M → N and g : N → M be functions with the properties
∀ x ∈ M, g(f (x)) = x
and
∀ y ∈ N, f (g(y)) = y.
Then f is the inverse of g and g is the inverse of f . In this case we write f^{-1} := g and
g^{-1} := f and call f and g invertible.

Note that we already used the notation f^{-1} in the context of the preimage of a set under a
function f . This similarity in notation is intentional, as the following exercise indicates.
Exercise - Let f : M → N be an invertible function with inverse f^{-1} : N → M and S ⊂ N .
Prove that
{f^{-1}(y) : y ∈ S} = {x ∈ M : f (x) ∈ S}. (3)

Let us revisit Definition 1.8. The left hand side of (3) is the image of S under f^{-1}, which
is denoted f^{-1}(S). The right hand side is the preimage of S under f , which is also denoted
f^{-1}(S). So, the exercise shows that these two notational conventions do not clash with one
another.
Example - Let R+ denote the set of all non-negative real numbers (i.e. R+ = [0, ∞)) and let
f : R+ → R+ and g : R+ → R+ be defined by

f (x) = x^2 , and g(x) = √x.

For all x ≥ 0, we have g(f (x)) = √(x^2) = x. On the other hand, for all y ≥ 0 we have
f (g(y)) = (√y)^2 = y. Therefore, f and g are inverses of each other.
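The two defining identities of an inverse can be spot-checked numerically (this is of course no substitute for a proof). A Python sketch for f(x) = x^2 and g(y) = √y on the non-negative reals; note that the restriction to [0, ∞) is essential, since x^2 is not injective on all of R:

```python
from math import isclose, sqrt

f = lambda x: x ** 2      # f : [0, inf) -> [0, inf)
g = lambda y: sqrt(y)     # g : [0, inf) -> [0, inf)

# g(f(x)) = x and f(g(y)) = y at a few sample points
for t in [0.0, 0.5, 1.0, 2.0, 9.0]:
    assert isclose(g(f(t)), t)
    assert isclose(f(g(t)), t)
```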
The following theorem provides us with a tool to check if a function has an inverse or not.

Theorem 1.13. Let f : M → N be a function. Then,

f is invertible ⇐⇒ f is bijective.

This is the first theorem we have seen in this course, and it will be the first of many. We are
trying to build a rigorous mathematical foundation in this course, and so we will (with a few
exceptions) give a proof of every result we use during the course. The skill of understanding
and writing mathematical proofs is rather specialised, and is likely to be unfamiliar to many
students of this course. We will devote special attention to proof techniques in a section that
will be shortly forthcoming, and will include the proof of Theorem 1.13.
Invertible (or bijective) functions may be used to formally define the cardinality of a set.

Definition 1.14. A finite set M containing n elements has cardinality n. We write |M | = n.

In other words, the cardinality of a finite set is simply its size.


Note that the existence of a bijective function f : M → N means that there is a one-to-
one correspondence between M and N . In particular, both sets must have the same
cardinality. Moreover, the next exercise shows how we can compare the cardinality of sets by
looking at the properties of functions that exist between them.
Exercise - Show that, for two finite sets A and B, the following statements are true.

• |A| = |B| if and only if there is a bijection f : A → B.

• |A| ≤ |B| if and only if there is an injection f : A → B.

• |A| ≥ |B| if and only if there is a surjection f : A → B.

As well as finite sets, bijections also allow us (to some extent) to characterise the cardinality
of an infinite set.

Definition 1.15. Let A be a set.

• If there exists n ∈ N such that |A| = n, then we call A finite, or a finite set.

• If A is not finite, then we call A infinite, or an infinite set.

• If there exists a bijection f : N → A, then we call A countably infinite.

• If A is either finite or countably infinite then we call A countable.

• If A is not countable, then we call A uncountable.

Note that countability is the precise formulation of the “simple” property that the elements of
A can be enumerated by the natural numbers {1, 2, 3, . . . }; in other words, the elements of A
can be counted.
Examples - Define a function f : N → Z such that, for all n ∈ N,

f (2n) := n, and f (2n − 1) := −n + 1.

This is a bijection. Indeed, f (1) = 0, f (2) = 1, f (3) = −1, f (4) = 2, f (5) = −2, and so on.
We can see from this pattern that every element of Z is mapped to by exactly one element
of N. By Definition 1.15, it follows that Z is a countable set.
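The bijection from this example is easy to program, and its first few values can be checked against the pattern above. A Python sketch (the implementation with integer division is ours):

```python
def f(n):
    """The bijection N -> Z from the example: f(2n) = n, f(2n - 1) = -n + 1."""
    return n // 2 if n % 2 == 0 else -(n - 1) // 2

first = [f(n) for n in range(1, 8)]
assert first == [0, 1, -1, 2, -2, 3, -3]

# f(1), ..., f(11) hits every integer between -5 and 5 exactly once
hits = [f(n) for n in range(1, 12)]
assert sorted(hits) == list(range(-5, 6))
```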
Note the following strange feature of this example. Intuitively, it seems that the set Z is
“bigger” than N. Indeed, N is a proper subset of Z, and appears to contain half as many
elements. Still, with the notion of size (i.e. cardinality) given by Definition 1.15, the two sets
have the same size; they are both countably infinite.
Exercise - Show that the set E = {2, 4, 6, . . . } is countably infinite.
Exercise - Let M and N be finite sets. Show that |M × N | = |M ||N |.
We can also consider the composition of functions. Let X, Y, Z be non-empty sets and let
f : X → Y and g : Y → Z be functions. We then define a function (g ◦ f ) : X → Z by first
applying f and then applying g. That is, we define (g ◦ f )(x) = g(f (x)).
Exercise - Check that g ◦ f is indeed a function.
Note that it is important that the codomain of f matches the domain of g in order for the
definition of g ◦ f to make sense.
Example - If f : M → N is an invertible function and f^{-1} : N → M is its inverse, then, for
all x ∈ M and y ∈ N , we have

f^{-1}(f (x)) = x, and f (f^{-1}(y)) = y.

In particular, f ◦ f^{-1} = Id_N and f^{-1} ◦ f = Id_M .


Example - Consider the functions f, g, h : R → R defined by

f (x) = x^2 , g(x) = sin(x) and h(x) = sin(x^2).

We get
(g ◦ f )(x) = g(f (x)) = sin(x^2) = h(x).
However, if we reverse the order of composition and consider the function f ◦ g, we see that

f (g(x)) = (sin x)^2.

In particular the functions f ◦ g and g ◦ f are not the same. Indeed, in a typical case we have
f ◦ g ̸= g ◦ f , and it can even be the case that only one of the compositions is defined.
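Composition can be written generically as a function that takes two functions and returns a new one. A Python sketch of the example above (the helper compose is ours):

```python
from math import sin, isclose

f = lambda x: x ** 2
g = lambda x: sin(x)

def compose(g, f):
    """(g o f)(x) = g(f(x)); f's codomain must match g's domain."""
    return lambda x: g(f(x))

h = compose(g, f)                 # x -> sin(x^2)
assert isclose(h(2.0), sin(4.0))

# composition is not commutative: sin(x^2) != (sin x)^2 in general
assert not isclose(compose(g, f)(2.0), compose(f, g)(2.0))
```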
The next theorem shows how to calculate the inverse of a composition of two functions.

Theorem 1.16. Let f : X → Y and g : Y → Z be invertible functions. Then g ◦ f is
invertible and
(g ◦ f )^{-1} = f^{-1} ◦ g^{-1} .

We will prove Theorem 1.16 in the forthcoming introductory section on proof.


Finally, let us discuss some relations that are used to compare, group or order elements
of a set M . For this we consider relations R ⊂ M^2 . Such relations usually have nothing to
do with functions, but are still essential. Again, relations with some particularly important
characteristics are given special names.

Definition 1.17. Let R ⊂ M × M be a relation for an arbitrary M ̸= ∅. We call R

• reflexive if and only if


∀ x ∈ M, (x, x) ∈ R,

• symmetric if and only if


(x, y) ∈ R ⇒ (y, x) ∈ R,

• antisymmetric if and only if

(x, y), (y, x) ∈ R ⇒ x = y,

• transitive if and only if

(x, y), (y, z) ∈ R ⇒ (x, z) ∈ R,

• total if and only if


∀x, y ∈ M, (x, y) ∈ R or (y, x) ∈ R.

Some relations which have certain combinations of these properties are especially important.

Definition 1.18.

• An equivalence relation is a relation that is reflexive, symmetric and transitive.

• A partial order is a relation that is reflexive, antisymmetric and transitive.

• A linear order is a relation that is a partial order and total.

Example - Consider the relation L ⊂ R^2 , where we define

(x, y) ∈ L ⇔ x ≤ y.

This is the well-known smaller or equal relation in R and we just write x ≤ y for (x, y) ∈ L
in the following.

• This relation is reflexive, since x ≤ x holds for all x ∈ R.

• This relation is antisymmetric, since x ≤ y and y ≤ x both holding implies that x = y.

• This relation is transitive, since x ≤ y and y ≤ z implies that x ≤ z.

• This relation is total: for all x, y ∈ R, at least one of x ≤ y or y ≤ x holds.

Therefore, L is a linear order.
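On a finite set, all five properties of Definition 1.17 can be checked exhaustively. A Python sketch testing the restriction of ≤ to {1, 2, 3} (the helper properties is ours):

```python
def properties(R, M):
    """Check the properties of Definition 1.17 for a relation R on a finite set M."""
    return {
        "reflexive":     all((x, x) in R for x in M),
        "symmetric":     all((y, x) in R for (x, y) in R),
        "antisymmetric": all(x == y for (x, y) in R if (y, x) in R),
        "transitive":    all((x, z) in R
                             for (x, y) in R for (w, z) in R if y == w),
        "total":         all((x, y) in R or (y, x) in R
                             for x in M for y in M),
    }

M = {1, 2, 3}
leq = {(x, y) for x in M for y in M if x <= y}
p = properties(leq, M)
# a linear order: reflexive, antisymmetric, transitive and total
assert p["reflexive"] and p["antisymmetric"] and p["transitive"] and p["total"]
assert not p["symmetric"]      # (1, 2) is in leq, but (2, 1) is not
```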


Example - Consider the usual equality relation in R. That is, define

L = {(x, x) : x ∈ R} ⊂ R^2 .

Then L is reflexive, symmetric, antisymmetric and transitive, but not total. It is therefore
an equivalence relation and a partial order.
Exercise - Show that any relation on R which is both symmetric and antisymmetric must be
a subset of L, and deduce that L is the only reflexive relation on R with both properties.
Example - Define the “strictly less” relation “<” by

L = {(a, b) ∈ R^2 : a < b}.

We seek to determine which of the characteristics defined in Definition 1.17 are satisfied by L.

• L is not reflexive, since, for example, 1 < 1 does not hold. (Of course, we know that
x < x is never true, but in order to show that reflexivity does not hold, we only need
to find a single instance of a counterexample.)

• L is not symmetric. For instance 1 < 2 holds, but 2 < 1 is false.

• L is antisymmetric. We will discuss this further below.

• L is transitive, since x < y and y < z implies that x < z.

• L is not total. Once again, this is because, for example, 1 < 1 does not hold.

Let us now verify that L is antisymmetric, as claimed above. This is an instance of a


potentially confusing feature of implication that we discussed earlier. We want to verify that
the implication
(x < y and y < x) ⇒ x = y
is a valid statement. However, the left hand side of this implication is a false statement, and
false statements imply everything!

This is an instance of a more general situation that occurs in several mathematical proofs.
Suppose that P (x) is a statement that depends on some real number x, and we want to verify
that
x ∈ A ⇒ P (x) is true. (4)
In the particular case when A = ∅, the implication (4) is certainly true, because the left hand
side is certainly false.
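For relations on a finite set, the properties from Definition 1.17 can be checked mechanically by testing all pairs. The following Python sketch (the helper function names are ours, not part of the notes) does this for the strictly-less relation on a three-element set:

```python
# Check the defining properties of a relation R on a finite set X,
# where R is given as a set of ordered pairs.
def is_reflexive(X, R):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    # Vacuously true when no pair (x, y) with x != y has (y, x) in R as well.
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def is_total(X, R):
    return all((x, y) in R or (y, x) in R for x in X for y in X)

X = {1, 2, 3}
less = {(x, y) for x in X for y in X if x < y}

assert not is_reflexive(X, less)   # (1, 1) is missing
assert not is_symmetric(less)      # (1, 2) in R but (2, 1) is not
assert is_antisymmetric(less)      # vacuously true: no symmetric pairs
assert is_transitive(less)         # (1, 2) and (2, 3) give (1, 3)
assert not is_total(X, less)       # (1, 1) fails, as in the text
```

Note that antisymmetry holds vacuously here, exactly as argued in the text: the hypothesis of the implication is never satisfied.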
Exercise - Let m and n be integers. We say that m divides n, and write m|n if there exists
k ∈ Z such that
mk = n.
Show that the divisibility relation

R := {(a, b) ∈ N × N : a|b}

is a partial order.
Example - One can define a partial order on a set of sets by using the subset relation ⊂.
For example, for
M := {∅, {1}, {2}, {1, 2}},
we define the relation
L = {(A, B) ∈ M × M : A ⊂ B}.
It follows from the basic properties of inclusion that the relation L is reflexive, antisymmetric
and transitive. Hence it is a partial order on M . However, since {1} ̸⊂ {2} and {2} ̸⊂ {1},
it is not total, and so not a linear order on M .
For the remainder of this subsection, we will pay special attention to equivalence relations.
Equivalence relations have the particularly useful and interesting property that they can be
used to partition the elements of a set into related chunks. These partitioning sets are called
equivalence classes.
Definition 1.19. Let R ⊂ X × X be an equivalence relation. For x ∈ X, we define the
equivalence class of x by

[x]R := {y ∈ X : (x, y) ∈ R}.

When it is clear which relation we are referring to (which it usually is in practice) we abbre-
viate [x]R to [x].
Each element y ∈ [x] is called a representative of the equivalence class [x]. Note that, by
reflexivity (x, x) ∈ R, and hence x ∈ [x]. So an equivalence class is never the empty set and
x is a representative of [x].
Example - Consider again the equivalence relation

L = {(x, x) : x ∈ R} ⊂ R2 .

Then, for each x ∈ R, we have [x] = {x}, and so the equivalence classes for this relation are
all singleton sets (i.e. sets with exactly one element).
Example - Fix an arbitrary m ∈ N. We say that two integers a, b ∈ Z are congruent
modulo m, if
m|(a − b)

and we write
a ≡ b mod m.
Define
R = {(a, b) ∈ Z × Z : a and b are congruent modulo m}.
Then R is an equivalence relation. Indeed, let us check that the necessary three properties
hold.

• Reflexivity: Let a ∈ Z. Then m divides a − a = 0, and so (a, a) ∈ R.

• Symmetry: Let a, b ∈ Z such that (a, b) ∈ R. That is, m|a − b. In other words, there
is some k ∈ Z such that
a − b = mk.
But then
b − a = m(−k),
which means that m divides b − a and thus (b, a) ∈ R.

• Transitivity: Let a, b, c ∈ Z such that (a, b), (b, c) ∈ R. By the definition of divisibility,
there exist integers k1 , k2 such that

a − b = mk1 , and b − c = mk2 .

Therefore,
a − c = (a − b) + (b − c) = mk1 + mk2 = m(k1 + k2 ).
This means that m divides a − c, and thus (a, c) ∈ R.

Note that, in both of the examples above, the equivalence classes do not overlap with one
another, and every element of the underlying set belongs to exactly one equivalence class. This
is not a coincidence; the following theorem shows that equivalence classes always partition
the underlying set in such a way.
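For congruence modulo m, this partition into equivalence classes can be computed directly. A small Python sketch (variable names are ours); note that Python's % operator returns a representative in {0, . . . , m − 1} even for negative integers:

```python
# Partition the integers -10, ..., 10 into equivalence classes modulo m = 3.
m = 3
classes = {}
for a in range(-10, 11):
    classes.setdefault(a % m, []).append(a)  # a % m labels the class of a

# There are exactly m classes; they are pairwise disjoint and together
# cover every element exactly once.
assert len(classes) == m
assert sum(len(c) for c in classes.values()) == 21
assert classes[2][:3] == [-10, -7, -4]       # -10 = 3 * (-4) + 2, etc.
```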

Theorem 1.20. Let R ⊂ X × X be an equivalence relation and x, y ∈ X. Then,

(x, y) ∈ R ⇐⇒ [y] = [x] ⇐⇒ [x] ∩ [y] ̸= ∅.

We will prove Theorem 1.20 in the forthcoming subsection “Introduction to Proof”.


In particular, the last equivalence

[y] = [x] ⇐⇒ [x] ∩ [y] ̸= ∅

can be rewritten as
[y] ̸= [x] ⇐⇒ [x] ∩ [y] = ∅.
Therefore, we indeed have that two distinct equivalence classes must be disjoint.

1.4 Real numbers

The natural starting point when thinking about numbers is the set N = {1, 2, 3, . . . }. These
are the first numbers we learn about as small children, and it feels right that they are called
the natural numbers. However, this alone is not enough to solve all of the equations that we
encounter using simple mathematical operations like addition, subtraction, multiplication,
and division. And so, we gradually extend our universe of numbers in order to be able to
deal with more of these simple mathematical questions. Continuing in this way, we soon
reach the set of real numbers.
Let’s give a little more detail about this vague idea described in the paragraph above. The set
N is closed under addition and multiplication. That is, for all n, m ∈ N we have m + n ∈ N
and m · n ∈ N. However, we already get in trouble when we try to work with subtraction,
since, for example, 21 − 42 = −21 ∉ N.
The set Z of integers is closed under addition and multiplication, but is additionally closed
under subtraction. That is, for any a, b ∈ Z we have a + b, ab, a − b ∈ Z. However, division
is still a problem if we use integer numbers only. For instance, 1 and −3 are both in Z, but
the ratio 1/(−3) is not.
We therefore define the set of all rational numbers

Q := {a/b : a, b ∈ Z, b ̸= 0},

where we call a the numerator and b the denominator. It is possible that the same rational
number can have two different representations. For instance, 1/2 and 2/4 are the same number.
We call two rational numbers k/n and l/m equal if and only if km = ln (with m, n ̸= 0).
It is sometimes convenient to exclude 0 from Q in order to avoid the problem of potentially
dividing by zero, and we use the notation Q∗ := Q \ {0}. We now have a set which is closed
under division. Indeed, if we take two elements a/b and c/d in Q∗ (with a, b, c, d ∈ Z \ {0}), then
the ratio

(a/b) / (c/d) = ad/bc

is an element of Q∗ .
Note also that any integer k ∈ Z can be represented as an element of Q by writing k/1 = k.
Consequently, we can expand the earlier chain of inclusions (1) to

N ⊊ N0 ⊊ Z ⊊ Q. (5)
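The arithmetic of Q sketched above can be spot-checked with Python's exact Fraction type; a brief illustration (not part of the notes):

```python
from fractions import Fraction

# Two representations of the same rational number: 1/2 = 2/4,
# since cross-multiplying gives 1 * 4 = 2 * 2.
assert Fraction(1, 2) == Fraction(2, 4)

# Q \ {0} is closed under division: (a/b) / (c/d) = ad/bc.
assert Fraction(1, -3) / Fraction(5, 7) == Fraction(7, -15)

# Any integer k embeds into Q as k/1.
assert Fraction(21, 1) == 21
```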

The real reason that we have introduced “new” sets of numbers above is that we wanted to
solve certain equations and, in particular, we want to know if a solution exists in a given set.
For example, if we want to solve the equation a · x + b = 0, where a, b ∈ Q are fixed constants
and a ̸= 0, we see that
−b
x= ∈Q
a
and so we can find a solution to the equation in Q.
However, what about the simple equation x2 − 2 = 0? Is there some x ∈ Q such that x2 = 2?
The following theorem will show that this is not possible.

Theorem 1.21. There is no x ∈ Q such that x2 = 2.

We will come back and prove this theorem in the next subsection, using the method of proof
by contradiction.
Theorem 1.21 shows that the equation x2 − 2 = 0 is not solvable in Q. But we would like
there to be a solution somewhere! Furthermore, we all know from school that there is a
number √2 such that (√2)2 = 2.
Making the number line complete, we finally get to the set of real numbers. Using the
familiar decimal expansion, we can define the set of real numbers to be
R := { b + a1/10 + a2/100 + · · · : b ∈ Z , a1 , a2 , · · · ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} }. (6)

One may think of R as the set of all points on the number line, i.e., the number line without
any holes. Note that there are infinitely many terms in the sum in (6) defining an element of
R. It is possible that, after a certain point, all of the ai are equal to zero. For instance, an
integer n can be expressed in this form, with all of the ai equal to zero. More generally, the
rational numbers, written using the decimal expansion as in (6), either have a finite number
of digits or the sequence of digits is periodic. This means that, if we consider only the rational
numbers, some points on the line are missing. These correspond to numbers which have a
non-periodic infinite number of decimals, and cannot be written as fractions. As well as √2,
some other prominent examples of such numbers are π and e. These elements of R \ Q are
called irrational numbers.
We can extend the hierarchy described in (5) to include R as follows:

N ⊊ N0 ⊊ Z ⊊ Q ⊊ R.

It turns out that the set of rational numbers is countable, whereas the set of real numbers is
uncountable. So, there are “more” irrational numbers than there are rationals. The task of
proving this statement will be considered in a future exercise sheet.
The set of real numbers satisfies several convenient arithmetic and algebraic properties. These
properties turn out to be useful and interesting in a much more general setting, which is why
we give the following definition.

Definition 1.22. Let F be a set with operations of addition (formally, addition is a function
+ : F × F → F) and multiplication (formally, multiplication is a function · : F × F → F)
defined over F which satisfy the following properties.

• Commutativity: for all x, y ∈ F, x + y = y + x and x · y = y · x.

• Associativity: for all x, y, z ∈ F, (x + y) + z = x + (y + z) and (x · y) · z = x · (y · z).

• Distributivity: for all x, y, z ∈ F, x · (y + z) = (x · y) + (x · z).

• Identity elements: there exist two distinct elements 0, 1 ∈ F such that, for all x ∈ F

1. x + 0 = 0 + x = x and
2. x · 1 = 1 · x = x.

• Inverse elements:

1. for all x ∈ F, there exists y ∈ F such that x + y = y + x = 0, and


2. for all x ∈ F \ {0}, there exists y ∈ F such that x · y = y · x = 1.

Then we call F a field.

All of these properties hold for the set of reals, as we can see by using elementary rules for
manipulating algebraic equations, and so R is a field.
Exercise - With the usual operations of addition and multiplication, is Q a field? How about
Z?
Exercise - Suppose that F is a field. Show that, for all x, y, z ∈ F,

(x + y) · z = (x · z) + (y · z).

1.5 An introduction to proof

The concept of rigorous proof is absolutely vital in mathematics. Unlike other branches of
the sciences, we cannot be satisfied with evidence that something is very likely to be true,
or true beyond reasonable doubt, as one might accept in a court of law. Rather, we need to
know that something is true (or false) with absolutely certainty.
For instance, the Goldbach Conjecture states that every even integer greater than 2 can be
written as a sum of two prime numbers. We can get some intuition for this conjecture by
writing down some examples:
4 = 2 + 2,
6 = 3 + 3,
8=3+5
10 = 5 + 5
..
.
100 = 3 + 97.
This list provides some initial evidence that the conjecture is correct. In fact, a whole load
more evidence exists; the conjecture has been verified, with the help of computers, for all
even integers n up to 4 · 1018 . That is a lot of cases checked!
Still, mathematicians continue to seek a proof of the Goldbach conjecture that guarantees
that the statement is valid for all even n, with absolute certainty. A great deal of effort has
been invested in this pursuit, including efforts from many great mathematicians spanning
several generations, and a million dollar prize was even offered at one point.
In this subsection, we will discuss some basic techniques that allow us to prove some of the
statements that were given earlier in Section 1, and in doing so develop some tools that will
allow us to prove many interesting statements during the remainder of the course. We will
focus on three particular techniques: proof by definition, proof by contradiction, and proof
by induction.
We begin with proof by definition. These are often the easiest kinds of proofs, and the
technique essentially consists of stringing together some basic properties that follow from
definitions. This is another advertisement for the importance of knowing all of the definitions
in this course!
We begin with a fairly simple proof that uses only the definitions involved. The following
lemma (a lemma is like a small theorem, and is usually used to prove a more important
result) will be helpful in proving that √2 is irrational. We make use of the obvious definition
of an even integer: n ∈ Z is even iff there exists k ∈ Z such that n = 2k.
Lemma 1.23. Let n ∈ N. Then
n is even ⇐⇒ n2 is even.

Proof. This is a two-way implication, and so we need to prove both directions. We begin by
proving the “⇒” direction. Suppose that n is even. Then, by the definition of evenness, we
can write n = 2k for some k ∈ N. Therefore,
n2 = (2k)2 = 4k2 = 2(2k2 ),

and so n2 is even.

Now we prove the “⇐” direction. In fact we will prove the equivalent contrapositive statement

n is odd =⇒ n2 is odd.

Suppose that n is odd, and so n = 2k + 1 for some k ∈ N0 . But then

n2 = (2k + 1)2 = 4k2 + 4k + 1 = 2(2k2 + 2k) + 1.

Thus n2 is odd, by definition, and the proof is complete.
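The lemma can also be checked numerically over a finite range. This is evidence rather than proof, but it is a useful sanity check; a Python sketch:

```python
# Numerically check the equivalence "n is even <=> n^2 is even"
# for the first thousand natural numbers.
for n in range(1, 1000):
    assert (n % 2 == 0) == (n * n % 2 == 0)
```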

For the next instance of these methods, let us restate and then prove Theorem 1.13.
Theorem 1.24. Let f : M → N be a function. Then,

f is invertible ⇐⇒ f is bijective.

Proof. We are trying to prove that an “ ⇐⇒ ” holds, and so we need to prove two implications.
We begin with the “⇒” direction. We assume that f is invertible, and we need to show that
it is a bijection.
We begin by using the definition of invertibility. Since f is invertible, there exists f −1 : N →
M such that
f −1 (f (x)) = x, ∀ x ∈ M, (7)
and
f (f −1 (y)) = y, ∀ y ∈ N. (8)
To show that f is bijective, we need to show that it is surjective and injective. We can take
care to write down the definitions that tell us exactly what we need to show. We begin with
surjectivity. We need to show that,

∀ y ∈ N, ∃ x ∈ M : f (x) = y.

By (8), we can take x = f −1 (y) ∈ M and see that f (x) = y.


Secondly we verify that f is injective. Let us again recall the definition of the property we
are trying to prove. f is injective if we have

∀ x1 , x2 ∈ M, x1 ̸= x2 =⇒ f (x1 ) ̸= f (x2 ).

It is easier to work with the logically equivalent contrapositive of this statement. That is, we
will show that
∀ x1 , x2 ∈ M, f (x1 ) = f (x2 ) =⇒ x1 = x2 .
Let f (x1 ) = f (x2 ) for some x1 , x2 ∈ M . Since f (x1 ) ∈ N and f (x2 ) ∈ N we may apply the
function f −1 to f (x1 ) = f (x2 ) and get

x1 = f −1 (f (x1 )) = f −1 (f (x2 )) = x2 ,

as required.
We now move to the “⇐” direction. We assume that f is bijective and need to prove that f
is invertible. Since f is bijective we have

∀ y ∈ N, ∃! x ∈ M : f (x) = y. (9)

In words, for each y ∈ N we can find a unique x ∈ M such that f (x) = y. Now we define
g : N → M such that it maps each y ∈ N to this unique x ∈ M , i.e. g(y) = x, where x is
given by (9). This shows that, for each y ∈ N , we have f (g(y)) = y. Indeed,

f (g(y)) = f (x) = y.

We also need to check that, for all x ∈ M

g(f (x)) = x.

But this is an immediate consequence of the way that the function g has been defined.
Therefore, f is invertible with f −1 = g.
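For finite sets, the correspondence between bijectivity and invertibility is easy to see concretely: a bijection stored as a dictionary of pairs is inverted by swapping the pairs. A Python sketch (the names and the particular function are ours):

```python
# A bijection f between finite sets M and N, stored as a dictionary,
# is inverted simply by swapping each pair (x, f(x)).
M = [0, 1, 2, 3]
f = {x: 2 * x + 1 for x in M}            # M -> N = {1, 3, 5, 7}, a bijection
f_inv = {y: x for x, y in f.items()}     # the swapped pairs give f^{-1}

assert all(f_inv[f[x]] == x for x in M)            # f^{-1}(f(x)) = x
assert all(f[f_inv[y]] == y for y in f.values())   # f(f^{-1}(y)) = y
```

Swapping the pairs only yields a function when f is injective (no value appears twice), which mirrors the role of injectivity in the proof.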

Exercises

• Let f : M → N be a function. Prove that

∀x ∈ M, (f ◦ IdM )(x) = f (x) = (IdN ◦ f )(x). (10)

• Prove that composition of functions is associative. That is, for any functions

f : M → N, g : N → O, and h : O → P,

prove that, ∀ x ∈ M ,
(h ◦ (g ◦ f ))(x) = ((h ◦ g) ◦ f )(x).

• Show that the inverse function is unique. That is, let f : M → N be an invertible
function and suppose that g1 , g2 : N → M are inverses of f . Prove that g1 = g2 .

Next, we prove Theorem 1.16, which relates inverses and compositions. We restate and prove
the result now.
Theorem 1.25. Let f : X → Y and g : Y → Z be invertible functions. Then g ◦ f is
invertible and
(g ◦ f )−1 = f −1 ◦ g −1 .

Proof. We will directly check that f −1 ◦ g −1 is the inverse of g ◦ f . Recalling the definition
of the inverse of a function, there are two things we need to check:

(f −1 ◦ g −1 )((g ◦ f )(x)) = x ∀ x ∈ X (11)

and
(g ◦ f )((f −1 ◦ g −1 )(z)) = z ∀ z ∈ Z. (12)
Let us first focus on the goal of checking that (11) holds. This can be rewritten as

((f −1 ◦ g −1 ) ◦ (g ◦ f ))(x) = x. (13)

By the associativity of function composition (see the previous exercise) applied twice,

((f −1 ◦ g −1 ) ◦ (g ◦ f ))(x) = (f −1 ◦ (g −1 ◦ (g ◦ f )))(x)


= (f −1 ◦ ((g −1 ◦ g) ◦ f ))(x)
= (f −1 ◦ (IdY ◦ f ))(x).

By (10), it follows that

((f −1 ◦ g −1 ) ◦ (g ◦ f ))(x) = (f −1 ◦ f )(x) = x,

which proves (13) (and thus also (11)).


The proof of (12) is similar and is left as an exercise.
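Theorem 1.25 can be illustrated with concrete invertible functions on R; a Python sketch (the particular f and g are our own choices):

```python
# Concrete invertible functions on R (our own choices):
# f(x) = 2x + 1 and g(y) = y + 5, with the obvious inverses.
f = lambda x: 2 * x + 1
g = lambda y: y + 5
f_inv = lambda y: (y - 1) / 2
g_inv = lambda z: z - 5

# (g o f)^{-1} should equal f^{-1} o g^{-1}: applying it after g o f
# recovers the original input.  Note the order reversal.
for x in [-3.0, 0.0, 0.5, 2.0]:
    z = g(f(x))
    assert f_inv(g_inv(z)) == x
```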

We continue to collect some proofs of statements that were given earlier in the course. The
proof of the next statement (stated earlier as Theorem 1.20) also uses only the relevant
definitions.
Theorem 1.26. Let R ⊂ X × X be an equivalence relation and x, y ∈ X. Then,

(x, y) ∈ R ⇐⇒ [x] = [y] ⇐⇒ [x] ∩ [y] ̸= ∅.

Proof. We will prove the following chain of implications:

(x, y) ∈ R =⇒ [x] = [y] =⇒ [x] ∩ [y] ̸= ∅ =⇒ (x, y) ∈ R.

Once we have completed this cycle of implication, we can see that all three of the statements
imply one another by moving around the cycle.

Figure 9: Chain of implications

Proof that (x, y) ∈ R =⇒ [x] = [y]. Suppose that (x, y) ∈ R. We will first show that
[x] ⊂ [y]. Let z ∈ [x]. By definition (x, z) ∈ R. By the symmetry of R, we also know that
(y, x) ∈ R. Then, by the transitivity of R, we have

(y, x) ∈ R and (x, z) ∈ R =⇒ (y, z) ∈ R.

That is z ∈ [y], and we have shown that [x] ⊂ [y].


To prove the reverse inclusion [y] ⊂ [x], let z ∈ [y] be arbitrary. So (y, z) ∈ R, and then by
transitivity
(x, y), (y, z) ∈ R =⇒ (x, z) ∈ R.
This shows that z ∈ [x], and hence [y] ⊂ [x]. Thus

[y] = [x].

Proof that [x] = [y] =⇒ [x] ∩ [y] ̸= ∅. This is clear: we know that [x] is not the empty set
(since x ∈ [x]), and so [x] ∩ [y] = [x] ∩ [x] = [x] is not the empty set (for instance, x ∈ [x] ∩ [y]).
Proof that [x] ∩ [y] ̸= ∅ =⇒ (x, y) ∈ R. If [x] ∩ [y] ̸= ∅ then there is some z ∈ [x] ∩ [y]. By
definition (x, z) ∈ R and (y, z) ∈ R. By symmetry, (z, y) ∈ R. Then, by transitivity,

(x, z), (z, y) ∈ R =⇒ (x, y) ∈ R.

We now move on to proof by contradiction. The basic idea is as follows. Suppose that we
want to prove that a given statement A is true. We assume the opposite, which is that A is
not true, and look to see what consequences of this assumption can be logically derived. If
we can derive a statement B which we know to be false, then there must be something wrong
with our initial assumption that A is false. We therefore conclude that A is true. Actually,
what we are doing with this approach is proving the equivalent contrapositive statement
(recall the definition from Section 1.2), and some students may find this to be an easier way
to think about this approach.
We first give one of the most famous proofs by contradiction in mathematics: the proof that
√2 is irrational.

Theorem 1.27. There is no x ∈ Q such that x2 = 2.

Proof. We assume for a contradiction that there is some x ∈ Q such that x2 = 2. We write
x = m/n (14)

for some integers m and n. Moreover, we may assume that at least one of m and n is odd.
Indeed, if we have m = 2k and n = 2l then we can rewrite x as x = k/l with k < m. We
repeat this process until one of the parts of the fraction is odd. The process must terminate
eventually, as the value of the integer k gets strictly smaller each time, and this cannot happen
infinitely often.
Since x2 = 2, (14) gives
2 = x2 = m2/n2 .
This rearranges to give
m2 = 2n2 . (15)
It therefore follows from Lemma 1.23 that m is even, and so we can write m = 2k for some
k ∈ Z. We can plug this back into (15) to get 2n2 = m2 = (2k)2 = 4k 2 , and thus

n2 = 2k 2 .

Therefore, n2 is even, and it again follows from Lemma 1.23 that n is even.
We have reached a contradiction, since we stated earlier that at least one of m and n must be
odd, but we have also shown that they must both be even. Therefore, our original assumption
that x2 = 2 for some x ∈ Q must be false. The proof is complete.

We will see another famous example of a proof by contradiction a little later in this section,
but now we move on to proof by induction. Proof by induction is a very structured
proof technique that is particularly useful for proving that some proposition P (n) is true for
all integers n. The basic principle of mathematical induction is explained in the following
statement.
Theorem 1.28. A predicate P (n) is true for all n ∈ N if the following two steps hold:

• Induction basis: P (1) is true.


• Induction step: for all n ∈ N
P (n) is true =⇒ P (n + 1) is true. (16)

Proof. We know that P (1) is true, by the first assumption. We can then iteratively apply
(16) to obtain the chain of implications
P (1) is true =⇒ P (2) is true =⇒ P (3) is true =⇒ . . . ,
which completes the proof.

We call the statement that P (1) is true the induction basis or base case. The step
P (n) =⇒ P (n + 1) is called the induction step, and the assumption that P (n) is true is
called the induction hypothesis.
Let us discuss two examples to demonstrate this type of proof. The first one is (at least
without proof) known to many from school. The young Carl Friedrich Gauss (1777-1855)
knew it already by the age of nine.
Theorem 1.29. For all n ∈ N,

∑_{k=1}^{n} k = n(n + 1)/2.

Proof. We first need to check that the statement is valid in the base case when n = 1. This
is true, since

∑_{k=1}^{1} k = 1 = 1(1 + 1)/2.

Now, we need to show that, if the statement is true for n, it is also true for n + 1. That is,
we assume the induction hypothesis

∑_{k=1}^{n} k = n(n + 1)/2,

and we need to use this to show that

∑_{k=1}^{n+1} k = (n + 1)(n + 2)/2.

Indeed,

∑_{k=1}^{n+1} k = (∑_{k=1}^{n} k) + (n + 1) = n(n + 1)/2 + (n + 1) = n(n + 1)/2 + (2n + 2)/2 = (n + 1)(n + 2)/2.

We used the induction hypothesis in the second equality.
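The closed form is easy to check numerically for small n (evidence, not a proof); a Python sketch:

```python
# Check the Gauss sum formula sum_{k=1}^{n} k = n(n+1)/2 for many n.
for n in range(1, 500):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```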

Another important instance of a proof by induction is Bernoulli’s inequality, which will
be essential later.
Theorem 1.30. Fix a real number x ≥ −1 and let n ∈ N. Then

(1 + x)n ≥ 1 + nx. (17)

Proof. We regard x ≥ −1 as fixed, and seek to prove that (17) holds for all n ∈ N by
induction. We first check the base case with n = 1. Then (17) becomes

(1 + x)1 ≥ 1 + 1x.

The two sides of this inequality are the same, and so this is true. Thus the base case is valid.
We now assume the induction hypothesis

(1 + x)n ≥ 1 + nx (18)

and seek to use this to prove that

(1 + x)n+1 ≥ 1 + (n + 1)x.

Indeed,

(1 + x)n+1 = (1 + x)n (1 + x) ≥ (1 + nx)(1 + x) = 1 + (n + 1)x + nx2 ≥ 1 + (n + 1)x, (19)

where the first inequality uses the induction hypothesis (18).

The last inequality simply uses the fact that nx2 ≥ 0.

Where have we used the assumption that x ≥ −1 in this proof? It is not so easy to see that
we have used it if we do not pay proper attention. We have in fact used this assumption in
the first inequality of (19), since we need 1 + x ≥ 0 for the logic of this step to make sense.
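A numerical spot-check of Bernoulli's inequality for a few sample values of x ≥ −1 (a small tolerance is included, since floating-point powers are inexact):

```python
# Spot-check Bernoulli's inequality (1 + x)^n >= 1 + n*x for x >= -1.
for x in [-1.0, -0.5, 0.0, 0.25, 1.0, 3.5]:
    for n in range(1, 50):
        # Tiny tolerance for floating-point rounding error.
        assert (1 + x) ** n >= 1 + n * x - 1e-9
```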
It is sometimes helpful to use the following interpretation of proof by induction, where we
strengthen our induction hypothesis to assume that the statement we want to prove is valid
for all smaller values. This is called the method of proof by strong induction.
Theorem 1.31. Let k ≥ 1 be an integer. A predicate P (n) is true for all n ∈ N such that
n ≥ k if the following two steps hold

• Induction basis: P (k) is true.

• Induction step: If P (i) is true for all k ≤ i ≤ n then P (n + 1) is true.

Proof. We know that P (k) is true, by the first assumption. Similarly to the proof of Theorem
1.28, we can then repeatedly apply the second assumption to prove that P (n) holds for all
n ≥ k. Indeed,

P (k) is true =⇒ P (k + 1) is true ,


P (k) is true and P (k + 1) is true =⇒ P (k + 2) is true ,
P (k) is true and P (k + 1) is true and P (k + 2) is true =⇒ P (k + 3) is true ,
..
.

which completes the proof.

Theorem 1.31 looks slightly more complicated than Theorem 1.28, and one of the reasons
for this is that we have made a generalisation that allows us to consider a base case which is
not k = 1. However, the two methods are essentially the same, and we can also take a larger
base case for the usual proof by induction.
We will use strong induction to prove that every integer can be written as a product of primes.
This is one part of the Fundamental Theorem of Arithmetic. As the name suggests, this is
an extremely important fact in number theory, which tells us that prime numbers form the
building blocks for our number system.
Theorem 1.32. Let n ≥ 2 be an integer. Then there is some i ∈ N and prime numbers
p1 , . . . , pi such that
n = p1 · p2 · · · pi . (20)

Note that this statement does not require that the primes p1 , . . . , pi are distinct. For instance,
to verify that the theorem holds for n = 4, we can observe that 4 = 2·2, and this decomposition
satisfies the required form (20).

Proof. We will use proof by strong induction. For the base case n = 2, we know that n is a
prime, and so it is already in the required form (20) (with i = 1).
Now we assume the induction hypothesis that every 2 ≤ i ≤ n can be written as a product
of primes, and we seek to show that n + 1 can be written as a product of primes. If n + 1 is
a prime number then it automatically has the required form of (20). If n + 1 is not prime
then there exist two integers 2 ≤ a, b < n + 1 such that

n + 1 = a · b. (21)

By the induction hypothesis, we can write

a = p1 · · · pi , and b = q1 · · · qj ,

for some primes p1 , . . . , pi and q1 , . . . , qj . It then follows from (21) that

n + 1 = a · b = p1 · · · pi · q1 · · · qj .

After relabelling the primes qi suitably, this gives the required form (20), and the proof is
complete.
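The strong-induction argument translates directly into a factorisation procedure: split off the smallest divisor (which is necessarily prime) and continue with the cofactor. A Python sketch by trial division (the function name is ours):

```python
def prime_factors(n: int) -> list[int]:
    """Write n >= 2 as a product of primes via trial division,
    mirroring the strong-induction argument: split off the smallest
    divisor (necessarily prime) and continue with the cofactor."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # the remaining cofactor is prime
    return factors

assert prime_factors(4) == [2, 2]        # the n = 4 example from the text
assert prime_factors(97) == [97]         # a prime is its own factorisation
assert prime_factors(60) == [2, 2, 3, 5]
```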

We conclude this section by using Theorem 1.32 as part of another ancient and famous proof
by contradiction.
Theorem 1.33. The set P of all prime numbers is infinite.

Proof. Suppose for a contradiction that P is finite. Then we can write

P = {p1 , p2 , . . . , pn }

for some integer n. In this notation, pi denotes the ith prime number. So, p1 = 2, p2 = 3,
p5 = 11, and pn is the largest prime. Then consider the number

m := p1 · p2 · · · pn + 1.

Since m is larger than all of the elements of P, it is not a prime. But we know from Theorem
1.32 that m can be written as a product of primes. In particular, it follows that, for some
1 ≤ i ≤ n, pi divides m. But pi ≥ 2, and so
m/pi = p1 · · · pi−1 · pi+1 · · · pn + 1/pi ,

which is not an integer. This is a contradiction (since pi both divides and does not divide m)
and so the proof is complete.
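Euclid's construction can be illustrated concretely: for any finite list of primes, the product plus one is divisible by none of them. (The resulting number need not itself be prime; here 30031 = 59 · 509.)

```python
# Euclid's construction: multiply the finitely many "known" primes
# and add 1; the result is divisible by none of them.
primes = [2, 3, 5, 7, 11, 13]
m = 1
for p in primes:
    m *= p
m += 1

assert m == 30031
assert all(m % p == 1 for p in primes)  # each division leaves remainder 1
# So any prime factor of m (here 59 and 509) lies outside the list.
```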

1.6 Bounded sets, infimum and supremum

In this subsection, we want to understand the “smallest” and the “largest” element of a set,
where possible. To define this formally, we first need the definition of a bounded set (in R).
Definition 1.34. Let A ⊂ R.

• We say A is bounded from above if and only if

∃ c ∈ R : ∀ a ∈ A, a ≤ c.

We call c an upper bound for A, and write c ≥ A or A ≤ c.

• We say A is bounded from below if and only if

∃ c ∈ R : ∀ a ∈ A, c ≤ a.

We call c a lower bound for A, and write c ≤ A or A ≥ c.

• We say A is bounded if and only if A is bounded from above and bounded from below.

Example - Let a, b ∈ R such that a < b. Then for the closed interval I = [a, b] we have that
a, a − 1 and a − 42 are lower bounds for I and b and b + 42 are upper bounds, and the same
is true for the corresponding open interval (a, b). So, upper and lower bounds are not unique
in these cases (and in general).
To fix a specific upper/lower bound, let us first define the minimum and maximum of a set.
Definition 1.35. Let A ⊂ R be a non-empty set and t ∈ R. Then, t is called a minimal
element or minimum of A, denoted by min A := t , if and only if

• t ≤ A (i.e., t is a lower bound for A), and

• t ∈ A.

t is called a maximal element or maximum of A, denoted by max A := t , if and only if

• A ≤ t (i.e., t is an upper bound for A), and

• t ∈ A.

If the maximum or minimum exists, it is unique. (Why?)


Example - Let a, b ∈ R with a < b. Then, min[a, b] = a and max[a, b] = b.
Example - The set of the natural numbers N ⊂ R is bounded from below, with min N = 1.
However, maxima and minima do not have to exist. For instance, while b is the least upper
bound of both [a, b] and (a, b), we have that b ∈ [a, b], but b ∉ (a, b). Hence, b is not the
maximum of (a, b), and in fact the set (a, b) does not have a maximum (or minimum). But
still, we would like to work with the “best possible” upper and lower bounds for such a set,
which are clearly a and b in this example. For this we define the infimum and the supremum
as the greatest lower bound and the least upper bound, respectively. These objects
will be very important in the upcoming analysis.

Definition 1.36. Let A ⊂ R be non-empty and t ∈ R. Then, t is called the greatest lower
bound or infimum of A, denoted by inf A := t , if and only if

• t ≤ A (i.e. t is a lower bound), and

• x ≤ A =⇒ x ≤ t (i.e. there is no greater lower bound).

t is called the least upper bound or supremum of A, denoted by sup A := t , if and only if

• t ≥ A (i.e. t is an upper bound), and

• x ≥ A =⇒ x ≥ t (i.e. there is no smaller upper bound).

If A is not bounded from above we set sup A := ∞. If A is not bounded from below then we
put inf A = −∞.
For the empty set, we define

sup ∅ = −∞, and inf ∅ = ∞.

If inf A ∈ A, then inf A = min A, and if sup A ∈ A, then sup A = max A. In words, if the
infimum (or supremum) of a set A is contained in A, then A has a minimum (or maximum)
which has the same value.
Moreover, the infimum and supremum are uniquely determined. To see this for the supremum,
assume that there are two suprema t1 and t2 for A. Since sup A = t1 , we have A ≤ t1 . In
addition, since sup A = t2 , we obtain by the second defining property above that x ≥ A =⇒
x ≥ t2 . Setting x = t1 , we have t1 ≥ t2 . But we can also make this argument in reverse to
show that t1 ≤ t2 . Hence t1 = t2 .
Example - Let a, b ∈ R with a < b. Then,

min[a, b] = min[a, b) = inf[a, b] = inf[a, b) = inf(a, b] = inf(a, b) = a

and
max[a, b] = max(a, b] = sup[a, b] = sup[a, b) = sup(a, b] = sup(a, b) = b.
However, min(a, b), min(a, b], max(a, b) and max[a, b) do not exist.
Example - Let A = {x2 : x ∈ (−1, 1)}. Then A = [0, 1). Hence

inf A = min A = 0, and sup A = 1.

Let us state an equivalent definition of the supremum and infimum for bounded sets in R.
Although it looks more complicated at first sight, this formulation is sometimes very helpful.
Moreover, thinking in this way will be a helpful preparation for some of the real analysis
we consider later, and particularly for understanding important definitions of convergence,
continuity, and much more.
Definition 1.37. Let A ⊂ R be bounded from below. Then, t = inf A if and only if

• t ≤ A (i.e., t is a lower bound for A), and

• ∀ ϵ > 0, ∃ a ∈ A : a < t + ϵ (i.e. t comes arbitrarily close to A).

Analogously, let A ⊂ R be bounded from above. Then, t = sup A if and only if

• A ≤ t (i.e., t is an upper bound for A), and

• ∀ ϵ > 0, ∃ a ∈ A : a > t − ϵ (i.e. t comes arbitrarily close to A).

Exercise - Let A ⊂ R. Define


−A := {−a : a ∈ A}
and suppose that sup A = k for some k ∈ R. Prove that inf(−A) exists and inf(−A) = −k.
The next result reflects the intuitive notion that the real number line is complete, i.e. there
are no gaps in R. This is an important assumption (i.e. an axiom) that underlies much of the
work we will do in the Mathematics for AI curriculum, although it is mostly working behind
the scenes of the mathematics we study.
Axiom 1.38 (Completeness Axiom). Every non-empty set A ⊂ R that is bounded from above
has a least upper bound. That is, there always exists t ∈ R such that t = sup A.

Exercise - Prove that every non-empty set A ⊂ R that is bounded from below has a greatest
lower bound. (Hint: Use the completeness axiom and a previous exercise.)
Note that the completeness axiom is false if we consider Q instead of R. Consider for instance
the set

A = {x ∈ Q : x ≤ √2}.
This set is bounded, and we can find a number c ∈ Q such that c ≥ A (for instance, c = 3/2
works). However, we cannot find a least upper bound among all such rational c. Suppose
that we try to do this. We are looking for a rational number c which is larger than √2 but
also as close to √2 as possible. But, no matter which c we choose, we can always find a
smaller one which is still larger than √2.
This idea is considered more formally in the next theorem, where we state the Archimedean
property. This is based on the fact that the set of natural numbers is unbounded. Even
though this property seems rather obvious and unimpressive, it was of significant importance
for real analysis, and will be implicitly used repeatedly throughout the Mathematics for AI
study.
Theorem 1.39. The following assertions hold:

1. The Archimedean property:

   ∀ x ∈ R, ∃ n ∈ N : n > x.

   In other words, N has no upper bound in R.

2. ∀ ϵ > 0, ∃ n ∈ N : 1/n < ϵ.

3. For any x, y ∈ R with x < y, there exists a rational number m/n ∈ Q such that

   x < m/n < y.
The last of these three points is interesting; it tells us that, no matter how close the two real
numbers x and y we consider are, we can always find a rational number in between them.
Another way to think of this is as follows: no matter where you are on the real number line,
you are always arbitrarily close to a rational number.

Proof. We first prove the first point using proof by contradiction. For this, we assume that
N is bounded. Thus, the supremum x = sup N exists by the completeness axiom. As x is the
least upper bound of N, x − 1 is not an upper bound for N, so there exists some n ∈ N with

x − 1 < n.

Adding 1 to both sides of this inequality gives

x < n + 1.

But n + 1 ∈ N (since n ∈ N) and therefore x is not an upper bound for N. This contradicts
the fact that x = sup N. So, N must be unbounded.
Now we prove the second point of the theorem. Let ϵ > 0 be arbitrary. We can apply the
Archimedean property with x = 1/ϵ, which implies that there exists n ∈ N with

n > 1/ϵ.

Rearranging this inequality, we obtain

1/n < ϵ,

as required.
We now prove the third bullet-point. We will only check the case x, y > 0, since the other
possible cases can be checked similarly. Then we are looking for m, n ∈ N such that
nx < m < ny. By the Archimedean property, there exists n ∈ N with n > 1/(y − x), which
rearranges to give
ny > 1 + nx. (22)
Now take the smallest natural number m such that m > nx (such an m certainly exists).
Then m > nx and m − 1 ≤ nx. The latter inequality combined with (22) implies that

m ≤ nx + 1 < ny.

We have therefore proved that


nx < m < ny.
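The proof of the third point is constructive, so we can turn it directly into a computation. The following Python sketch (the helper name `rational_between` is my own, not from the notes) follows the proof exactly: pick n > 1/(y − x), then take the smallest integer m exceeding nx.

```python
import math

def rational_between(x, y):
    """Return integers (m, n) with x < m/n < y, following the proof of Theorem 1.39."""
    assert x < y
    # Archimedean property: choose a natural number n with n > 1/(y - x).
    n = math.floor(1 / (y - x)) + 1
    # Smallest integer m with m > n*x; then m <= n*x + 1 < n*y, as in the proof.
    m = math.floor(n * x) + 1
    return m, n

m, n = rational_between(math.sqrt(2), 1.5)
print(f"{m}/{n} lies strictly between sqrt(2) and 3/2")
```

With floating-point inputs this is only an illustration; the argument in the proof is exact.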

Based on this, we can easily calculate certain infima and suprema of (discrete) sets.
Example Let A = { n1 : n ∈ N}. Then

inf A = 0, and max A = 1 (23)

To prove (23), first note that 0 < 1/n ≤ 1 for all n ∈ N. So, 0 is a lower bound for A, and 1 is
an upper bound. Since 1 ∈ A (for n = 1), we therefore obtain that max A = 1.

For the infimum we need to show that there is no larger lower bound. For this, let ϵ > 0 be
arbitrary, and suppose for a contradiction that ϵ is a lower bound for A. By the Archimedean
property there exists an n ∈ N with 1/n < ϵ. Therefore, we have found some a = 1/n ∈ A such
that a < ϵ. This contradicts the assumption that ϵ is a lower bound, and completes the proof.
Exercise - Let
A = { 1/(n² − n − 3) : n ∈ N }.
Calculate inf A, sup A, min A and max A, if these quantities exist.

1.7 Some basic combinatorial objects, identities and inequalities

We begin this section by defining some important combinatorial quantities. These quantities
are heavily used when it comes to discrete mathematics or elementary probability theory, as
they represent the number of permutations or subsets of certain size.

Definition 1.40. The factorial n! of a natural number n ∈ N is the product

n! = 1 · 2 · · · n = \prod_{i=1}^{n} i.

In addition, we define 0! = 1.
For any n, k ∈ N₀ with n ≥ k, the binomial coefficient \binom{n}{k} (we say “n choose k”) is
defined by the formula

\binom{n}{k} := \frac{n!}{k!(n − k)!} = \frac{n · (n − 1) · · · (n − k + 1)}{k · (k − 1) · · · 1}.

Observe that

\binom{n}{0} = 1,   \binom{n}{1} = n,   and   \binom{n}{k} = \binom{n}{n − k}.

The factorial n! is the number of permutations (orderings) of a set of n objects. If you pick
your first arbitrary element, you have n possibilities for your choice. When it comes to your
second decision you have just n − 1 options left, and so on.
The binomial coefficient \binom{n}{k} represents the number of ways to choose k (unordered)
outcomes from a set of n possibilities. For example, \binom{n}{3} is the number of three-element
subsets of {1, . . . , n} and \binom{4}{2} is the number of football matches that are required in a
world cup group (4 teams, with each pair of teams playing each other once).
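These counting interpretations are easy to sanity-check in Python, whose standard library provides `math.factorial` and `math.comb` (a quick sketch, not part of the notes):

```python
import math
from itertools import combinations

# A world cup group of 4 teams needs "4 choose 2" matches.
print(math.comb(4, 2))  # 6

# math.comb agrees with the factorial formula ...
n, k = 10, 3
assert math.comb(n, k) == math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# ... and with a direct enumeration of all k-element subsets of {0, ..., n-1}.
assert math.comb(n, k) == len(list(combinations(range(n), k)))

# The symmetry C(n, k) = C(n, n - k) noted above.
assert math.comb(n, k) == math.comb(n, n - k)
```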


The following is a useful formula for binomial coefficients.

Lemma 1.41. Let n, k ∈ N with k ≤ n − 1. Then we have

\binom{n + 1}{k + 1} = \binom{n}{k} + \binom{n}{k + 1}.

The “simple” explanation of this equality is that the subsets of {1, . . . , n + 1} with
k + 1 elements can be split into those that contain the number n + 1 and those that don’t
contain it. To find the number of sets that contain n + 1, we need to count all k-element
subsets of {1, . . . , n}, and there are \binom{n}{k} of them. The number of all sets that don’t
contain n + 1 is the same as the number of all (k + 1)-element subsets of {1, . . . , n}, which is
\binom{n}{k + 1}. So the total number is their sum.
Here is a more formal and formulaic proof of Lemma 1.41.

Proof.

\binom{n}{k} + \binom{n}{k + 1} = \frac{n!}{k!(n − k)!} + \frac{n!}{(k + 1)!(n − (k + 1))!}

= \frac{n!}{k!(n − k)!} + \frac{n!}{(k + 1)!(n − k − 1)!}

= \frac{n!(k + 1)}{(k + 1)!(n − k)!} + \frac{n!(n − k)}{(k + 1)!(n − k)!}

= \frac{n!((k + 1) + (n − k))}{(k + 1)!(n − k)!}

= \frac{n!(1 + n)}{(k + 1)!(n − k)!}

= \frac{(n + 1)!}{(k + 1)!(n − k)!}

= \frac{(n + 1)!}{(k + 1)!((n + 1) − (k + 1))!} = \binom{n + 1}{k + 1}.

Based on this, we can prove another famous and highly applicable theorem; the Binomial
Theorem.

Theorem 1.42. Let x, y ∈ R and n ∈ N. Then we have

(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n−k}.

Proof. Consider expanding the product (x + y)^n (i.e. multiplying out all of the brackets).
Every term we obtain must take the form x^k y^{n−k} for some 0 ≤ k ≤ n. This is simply because
the powers must add up to n (each product consists of exactly n factors).
It remains to consider how many times the term x^k y^{n−k} appears. This will indeed be
\binom{n}{k}, since a term x^k y^{n−k} appears whenever we choose the x term from exactly k
of the n brackets.

Alternatively, we can prove the Binomial Theorem by induction, making use of Lemma 1.41.
The proof by induction is a bit longer, but some may prefer this more formal approach.
Example - If we set x = y = 1 in Theorem 1.42 we get

2^n = \sum_{k=0}^{n} \binom{n}{k}.

We can also justify this identity by thinking in terms of sets. Both sides of this identity count
the number of subsets of {1, 2, . . . , n} (or, moreover, any set of n elements). To see that the
number of such sets is 2n , simply observe that for each of the n elements, we have 2 choices
in our construction of a set A ⊂ {1, . . . , n}; we either include the element in A, or we do not.

On the other hand, we can also count the number of subsets of {1, . . . , n} by counting the
number of subsets with a fixed size k, which gives \binom{n}{k}, and then summing over all
possible values of k.
Example - Setting x = −1 and y = 1 in Theorem 1.42 gives the identity

0 = \sum_{k=0}^{n} (−1)^k \binom{n}{k}.
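Both the Binomial Theorem and the two identities above can be verified numerically for small n (a sketch; the function name `binomial_sum` is mine):

```python
import math

def binomial_sum(x, y, n):
    """Right-hand side of the Binomial Theorem."""
    return sum(math.comb(n, k) * x**k * y**(n - k) for k in range(n + 1))

# The theorem itself, exact in integer arithmetic.
assert binomial_sum(3, 5, 7) == (3 + 5) ** 7

# x = y = 1: the number of subsets of an n-element set is 2^n.
assert binomial_sum(1, 1, 10) == 2 ** 10

# x = -1, y = 1: the alternating sum vanishes.
assert binomial_sum(-1, 1, 10) == 0
```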

The binomial theorem can also be used to (easily) improve upon Bernoulli’s inequality (see
Theorem 1.30).

Corollary 1.43. Let x ≥ 0. Then, for any m ∈ {1, . . . , n}, we have

(1 + x)^n ≥ 1 + \binom{n}{m} x^m.

Proof. Apply Theorem 1.42 with y = 1. We obtain

(x + 1)^n = \sum_{k=0}^{n} \binom{n}{k} x^k ≥ 1 + \binom{n}{m} x^m.

For the inequality above, we simply take only the k = 0 and k = m terms from the sum. The
other terms are all non-negative from the assumption that x ≥ 0.

If we apply this theorem with m = 1, we recover the lower bound from Bernoulli’s inequality.
Note however that this Corollary 1.43 is only valid for x ≥ 0, and it is only guaranteed to
give a proper improvement when x ≥ 1. On the other hand, Bernoulli’s inequality is valid
for all x ≥ −1.
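A small numerical comparison illustrates how Corollary 1.43 with m = 2 can beat the Bernoulli bound when x ≥ 1 (a sketch; the sample values are chosen arbitrarily):

```python
import math

x, n = 2.0, 10
bernoulli_bound = 1 + n * x                     # Corollary 1.43 with m = 1
improved_bound = 1 + math.comb(n, 2) * x ** 2   # Corollary 1.43 with m = 2
exact = (1 + x) ** n

# (1 + x)^n >= 1 + C(n, 2) x^2 >= 1 + n x for this x >= 1.
assert exact >= improved_bound >= bernoulli_bound
print(exact, improved_bound, bernoulli_bound)
```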

1.8 Some important functions

In this subsection we want to briefly discuss some particularly important functions. We start
with the absolute value of a number, since this function and its generalisations are heavily
used throughout the Mathematics for AI courses.
The absolute value of x ∈ R is defined by:

|x| = \begin{cases} x & if x ≥ 0 \\ −x & if x < 0 \end{cases}. (24)

The absolute value is a function from R to R+ (recall that R+ denotes the set of all non-
negative real numbers, i.e. R+ = [0, ∞)). We write | · | : R → R+. The graph of the function
is shown below.

Figure 10: The graph of the absolute value function f (x) = |x|

We list some important properties of the absolute value function in the following lemma.
Lemma 1.44. The absolute value function satisfies the following properties.

1. For all x ∈ R, |x| ≥ 0.


2. |x| = 0 ⇐⇒ x = 0.
3. For all x ∈ R, |x| ≥ x and |x| ≥ −x.
4. For all x, y ∈ R, |xy| = |x||y|.
5. For all x ∈ R, | − x| = |x|.
6. For all x ∈ R and z ∈ [0, ∞),
|x| ≤ z ⇐⇒ −z ≤ x ≤ z.
and
|x| < z ⇐⇒ −z < x < z.

Proof. The first 3 points follow immediately from the definition of the absolute value.
To prove point 4, we split into 4 cases.
Case 1 - Suppose that x, y ≥ 0. Then xy ≥ 0, and so |xy| = xy. Also, |x| = x and |y| = y.
It follows that |x||y| = xy = |xy|.
Case 2 - Suppose that x, y < 0. Then xy > 0, and so |xy| = xy. Also, |x| = −x and
|y| = −y. It follows that |x||y| = (−x)(−y) = xy = |xy|.
Case 3 - Suppose that x ≥ 0 and y < 0. Then xy ≤ 0, and so |xy| = −xy. Also, |x| = x and
|y| = −y. It follows that |x||y| = x(−y) = −xy = |xy|.
Case 4 - Suppose that x < 0 and y ≥ 0. Then xy ≤ 0, and so |xy| = −xy. Also, |x| = −x
and |y| = y. It follows that |x||y| = (−x)y = −xy = |xy|.
The proof of points 5 and 6 is left as an exercise.

The case distinction that we have made in the proof of point 4 in Lemma 1.44 gives a good
illustration of how to deal with absolute values in general. We will use a similar approach
again now, as we try to understand inequalities involving absolute values.
Example - Consider the inequality |x − 1| < 2. We would like to determine all values of x
which satisfy this inequality. Define the set

L := {x ∈ R : |x − 1| < 2}.

There are two situations that need to be considered separately, according to whether or not
x − 1 is positive. We split L into two parts. Write L = L1 ∪ L2 where

L1 := {x ∈ L : x < 1}, and L2 := {x ∈ L : x ≥ 1}.

The sets L1 and L2 are disjoint (i.e. the intersection L1 ∩ L2 is equal to the empty set). It
remains to find a simplified expression for the set L1 and L2 , and thus to determine precisely
what L looks like.
Case 1 - Consider x < 1. In this range |x − 1| = −(x − 1) = 1 − x, and so the inequality
becomes
1 − x < 2,
which rearranges to x > −1. So,

L1 = {x ∈ R : x < 1 and x > −1} = (−1, 1).

Case 2 - Consider the range x ≥ 1. Then |x−1| = x−1 and the inequality under consideration
simplifies to the form
x − 1 < 2,
which is equivalent to x < 3. Therefore,

L2 = {x ∈ R : x ≥ 1 and x < 3} = [1, 3).

Putting things together, we have

L = L1 ∪ L2 = (−1, 1) ∪ [1, 3) = (−1, 3).

Figure 11: A graphical representation of the inequality and its solution set
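We can double-check the conclusion L = (−1, 3) by brute force over an exact rational grid (a sketch, not a proof, since it only samples finitely many points):

```python
from fractions import Fraction

# Sample x = k/1000 for k in [-5000, 5000] and compare both descriptions of L.
for k in range(-5000, 5001):
    x = Fraction(k, 1000)
    in_solution_set = abs(x - 1) < 2   # the original inequality |x - 1| < 2
    in_interval = -1 < x < 3           # the claimed solution set (-1, 3)
    assert in_solution_set == in_interval
print("the two descriptions agree on the sampled grid")
```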

We remark here that we also could have reached the conclusion that L = (−1, 3) by applying
point 6 from Lemma 1.44. However, for more complicated inequalities involving absolute
values Lemma 1.44 is not sufficient, and we really do need to use such a case distinction. The
following is such an example.
Example - Consider the inequality

2|x + 3| − 4|x − 1| ≥ 8x − 2. (25)

We want to determine the nature of the set L := {x ∈ R : 2|x + 3| − 4|x − 1| ≥ 8x − 2}. In


this case, we have to distinguish three regions (x ≤ −3, −3 < x ≤ 1 and x > 1), since the
expressions in the absolute values change their signs at these points. Write L = L1 ∪ L2 ∪ L3
where

L1 := {x ∈ L : x ≤ −3}, L2 := {x ∈ L : −3 < x ≤ 1}, and L3 := {x ∈ L : x > 1}.

Case 1 - Suppose that x ≤ −3. Then the inequality (25) becomes

2(−x − 3) − 4(−x + 1) ≥ 8x − 2.

After some basic algebra, this is equivalent to x ≤ −4/3. It then follows that

L1 = { x ∈ R : x ≤ −3 and x ≤ −4/3 } = (−∞, −3].

Case 2 - Suppose that −3 < x ≤ 1. Then the inequality (25) becomes

2(x + 3) − 4(−x + 1) ≥ 8x − 2.

After some basic algebra, this is equivalent to x ≤ 2. It then follows that

L2 = {x ∈ R : −3 < x ≤ 1 and x ≤ 2} = (−3, 1].

Case 3 - Suppose that x > 1. Then the inequality (25) becomes

2(x + 3) − 4(x − 1) ≥ 8x − 2.

After some basic algebra, this is equivalent to x ≤ 6/5. It then follows that

L3 = { x ∈ R : x > 1 and x ≤ 6/5 } = (1, 6/5].

Putting everything together, we conclude that

L = L1 ∪ L2 ∪ L3 = (−∞, 6/5].

Figure 12: A graphical representation of the inequality (25) and its solution set
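As with the previous example, the case analysis can be double-checked by exact rational sampling (a sketch; it samples a grid rather than proving the claim):

```python
from fractions import Fraction

def holds(x):
    """The inequality (25): 2|x + 3| - 4|x - 1| >= 8x - 2."""
    return 2 * abs(x + 3) - 4 * abs(x - 1) >= 8 * x - 2

# Compare against the claimed solution set (-infinity, 6/5] on a grid.
for k in range(-8000, 8001):
    x = Fraction(k, 1000)
    assert holds(x) == (x <= Fraction(6, 5))
print("inequality (25) matches (-inf, 6/5] on the sampled grid")
```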

The following is probably the most well known inequality involving absolute values. We will
see this inequality reappearing in more general scenarios throughout the Mathematics for AI
material. This is called the triangle inequality for the absolute value function.

Theorem 1.45. Let x, y ∈ R. Then

|x + y| ≤ |x| + |y|.

Proof. We already know from Lemma 1.44 that |z| ≥ z and |z| ≥ −z for all z ∈ R. We
therefore have
|x| + |y| ≥ x + y
and
|x| + |y| ≥ −x − y = −(x + y).
Since |x + y| is equal to either x + y or −(x + y), we always have |x + y| ≤ |x| + |y|, as
required.

The triangle inequality (Theorem 1.45) also implies that

|x − z| = |x − y + y − z| ≤ |x − y| + |y − z|, (26)

for all x, y, z ∈ R. Hence, we can consider the absolute value as a “distance” between numbers.
The triangle inequality in the form (26) can be interpreted as saying that “the shortest route
from x to z is to go directly, rather than stopping at another number y”.
Moreover, we obtain the following (less obvious) inequality, which will also be useful later.
Corollary 1.46. Let x, y ∈ R. Then

||x| − |y|| ≤ |x − y|.

Proof. Theorem 1.45 yields


|a + b| − |b| ≤ |a| (27)
for all a, b ∈ R. Set a = x − y and b = y, so that (27) gives

|x| − |y| ≤ |x − y|.

On the other hand, if we set a = x − y and b = −x, (27) gives

| − y| − | − x| ≤ |x − y|,

which is equivalent to
−(|x| − |y|) ≤ |x − y|.
Since ||x| − |y|| is equal to either |x| − |y| or −(|x| − |y|), it follows that we always have the
intended inequality
||x| − |y|| ≤ |x − y|.
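Both the triangle inequality and Corollary 1.46 are easy to stress-test on random inputs (a quick sketch):

```python
import random

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-100, 100)
    y = random.uniform(-100, 100)
    # Theorem 1.45: |x + y| <= |x| + |y|
    assert abs(x + y) <= abs(x) + abs(y)
    # Corollary 1.46 (the reverse triangle inequality)
    assert abs(abs(x) - abs(y)) <= abs(x - y)
print("no counterexamples found")
```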

We now turn to some other elementary functions. An affine linear function is a function
f : R → R of the form
f (x) = ax + b
with a, b ∈ R. If a = 0, then f is called a constant function. If a ̸= 0 and b = 0 we call f
a linear function. Note that linear functions satisfy f (x + y) = f (x) + f (y). (Is the same
true for affine linear functions?) Moreover, they are special cases of polynomial functions,
which are defined to be functions p : R → R of the form
p(x) = \sum_{i=0}^{n} a_i x^i,

with a0 , . . . , an ∈ R. For a non-zero polynomial, i.e. a polynomial where at least one of the
coefficients ai is not equal to zero, we may assume that an ̸= 0. The value n is the degree
of the polynomial p. The degree of the zero polynomial (i.e. the function f : R → R such
that f (x) = 0 for all x ∈ R) is not defined.
Examples - Consider the functions

f (x) = 2x − 3, g(x) = 4x³ + 5x, h(x) = π, j(x) = 2x.

All 4 functions are polynomials. f , h and j are also affine linear functions. j is also a linear
function. h is also a constant function.
Closely related to polynomials are power functions. These are functions f : R+ → R of
the form f (x) = x^a, for some (fixed) a ∈ R. In the case when a is a natural number, such a
power function is a special kind of polynomial called a monomial.
The restriction of the domain to the non-negative reals is necessary in order for these
functions to be well defined in general. For instance, if we consider the power function
f (x) = x^{1/2}, this is the function which takes the positive square root of x. In order for
this to be a real number, we may only consider x ≥ 0.
The next image illustrates the behaviour of the power function for some different choices of a.
Notice that there are some crucial points where the behaviour changes. The function grows
faster as a increases, and we see a different shape of curve for a > 1, a = 1 and 0 < a < 1.
Moreover, the function looks very different when a is negative; instead of growing with x, the
function is now decreasing.

Figure 13: The graph of the function f (x) = x^a for different a.

The most natural power functions to consider are those for which a = m/n is a rational
number, in which case we can build a power function f (x) = x^a as a composition of familiar
functions g(x) = x^m and h(x) = x^{1/n} (the n-th root of x). One can also consider power
functions of the form f (x) = x^a where a is an irrational number. We avoid giving a full
formal definition, since it is rather technical. Interested students may wish to consult Mario
Ullrich’s notes from the previous Mathematics for AI 1 course (see page 37 of the notes).
Roughly speaking, we can approximate f (x) = x^a by finding a rational number b which is
very close to a and calculating x^b.
Another well-known class of functions are the exponential functions, which characterise
rapid growth or decay. Exponential functions are defined to be functions f : R → R+ of the
form
f (x) = a^x,

where the base a > 0 is a fixed real number. That is, in contrast to power functions, the
variable x ∈ R is in the exponent.

Figure 14: The graph of the function f (x) = a^x for different a.

Among all exponential functions, one is particularly important. This is when we choose the
base a = e ≈ 2.7182818284, where e is Euler’s number. This special (irrational) number e
is one of the most important constants in mathematics. It connects many different concepts
and appears in some beautiful formulae. It may be defined by different means. Indeed,

e = \sup \left\{ \left(1 + \frac{1}{n}\right)^n : n ∈ N \right\} = \lim_{n→∞} \left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{∞} \frac{1}{k!}.

Don’t worry too much about these complicated looking expressions for now. We will come
back to them much later.
The exponential functions often appear together with the logarithmic functions, which
are defined as their inverses. Let b > 0 be a real number (we usually restrict our attention
to the case b > 1). The function log_b : R+ → R is defined to be the inverse of the exponential
function g(x) = b^x. Recalling the properties of inverse functions, this means that

log_b (b^x) = x, ∀ x ∈ R

and

b^{log_b (x)} = x, ∀ x ∈ R+. (28)

We can use (28) to give a wordy definition of the function log_b: “The value of log_b (x) is
defined to be the number y which satisfies b^y = x.”

Figure 15: The graph of the function f (x) = log_b (x) for different b.

Once again, an important special case occurs when b = e (i.e. Euler’s number). We use the
shorthand ln(x) = log_e (x).
Using familiar calculation rules for exponentiation we obtain the following calculation rules
for logarithms. For all a, b, x, y > 0:

log_b (x^y) = y log_b x
log_b (xy) = log_b x + log_b y
log_b (x/y) = log_b x − log_b y
log_b x = \frac{log_a x}{log_a b}.

Trigonometric functions play a crucial role in analysis and geometry. The most impor-
tant are sin x, cos x and tan x. We can give a geometric definition of these function using the
unit circle, as the following image illustrates.

Figure 16: Illustration of trigonometric functions.

The variable x ∈ R corresponds to the (anticlockwise) angle that is enclosed between the
horizontal axis and the line to the point (cos x, sin x). We will usually use the unit of radians
to refer to an angle.
 
Example - Consider the point p = (1/√2, 1/√2) on the unit circle. We can calculate (using
high school trigonometry) that the angle between the horizontal axis and the line to the
point p is π/4. Therefore, we have

(1/√2, 1/√2) = (cos(π/4), sin(π/4)).

For the point (1/√2, −1/√2), the (anticlockwise) angle is 2π − π/4 = 7π/4, by symmetry,
and so

(1/√2, −1/√2) = (cos(7π/4), sin(7π/4)).

Figure 17: Illustration of the example above.

The functions sin x and cos x are defined for all x ∈ R, see the illustration below.

Figure 18: Graphical representation of sine and cosine.

The sine and cosine functions are periodic. In particular, we always have
sin(x + 2π) = sin x, and cos(x + 2π) = cos x.

This means that we have infinitely many choices for representing a point on the unit circle
using sine and cosine functions. For instance, recalling an earlier example, we have

(1/√2, 1/√2) = (cos(π/4), sin(π/4)) = (cos(9π/4), sin(9π/4)) = (cos(17π/4), sin(17π/4)) = . . .
However, we will typically use the simplest possible representation of a point on the unit
circle, with the angle in the interval [0, 2π). In particular, every point p on the unit
circle can be expressed uniquely in the form p = (cos x, sin x) for some x ∈ [0, 2π).
By the periodicity of sine and cosine, as well as observing that the graph of one function can
be obtained by shifting the other, we obtain the following simple calculation rules.
sin(−x) = − sin x
sin(x + π/2) = cos x
sin(x + π) = − sin x
sin(x + 2π) = sin x
cos(−x) = cos x
cos(x + π/2) = − sin x
cos(x + π) = − cos x
cos(x + 2π) = cos x.
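Each of these rules can be spot-checked with `math.sin` and `math.cos` at many sample angles (a numerical sketch; `math.pi` is only a floating-point approximation, hence the tolerance):

```python
import math

for k in range(-50, 51):
    x = 0.1 * k
    assert math.isclose(math.sin(-x), -math.sin(x), abs_tol=1e-9)
    assert math.isclose(math.sin(x + math.pi / 2), math.cos(x), abs_tol=1e-9)
    assert math.isclose(math.sin(x + math.pi), -math.sin(x), abs_tol=1e-9)
    assert math.isclose(math.cos(-x), math.cos(x), abs_tol=1e-9)
    assert math.isclose(math.cos(x + math.pi / 2), -math.sin(x), abs_tol=1e-9)
    assert math.isclose(math.cos(x + math.pi), -math.cos(x), abs_tol=1e-9)
print("all shift rules verified at the sampled angles")
```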

Some important values of the sine and cosine functions are listed in the following table.

ϕ       0    π/6    π/4    π/3    π/2    2π/3    3π/4    5π/6    π
sin ϕ   0    1/2    1/√2   √3/2   1      √3/2    1/√2    1/2     0
cos ϕ   1    √3/2   1/√2   1/2    0      −1/2    −1/√2   −√3/2   −1

We can use the results from the table above along with the fact that sin(x + π) = − sin(x)
and cos(x + π) = − cos(x) to deduce some of the corresponding values in the interval (π, 2π).
In fact, we can deduce most of these trigonometric values by using just a few assumptions,
as the following exercise shows.
Exercise - You are given the information that

sin(π/6) = 1/2,    sin(π/4) = 1/√2,    and    sin(π/3) = √3/2,

and the following facts about the sine and cosine functions:

sin(−x) = − sin x
sin(x + π/2) = cos x
cos(x + π/2) = − sin x.

Using only these facts, prove that

cos(π/6) = √3/2,
cos(π/4) = 1/√2,
cos(π/3) = 1/2,
sin(2π/3) = √3/2,
sin(3π/4) = 1/√2,
sin(5π/6) = 1/2.

We also recall some important trigonometric identities. Most importantly of all, for all
x ∈ R,
sin² x + cos² x = 1.
We will also need the trigonometric addition formulae.

sin(x + y) = sin x cos y + cos x sin y,


cos(x + y) = cos x cos y − sin x sin y.

Based on sine and cosine, we can define tan x by

tan x := \frac{sin x}{cos x},    ∀ x ≠ π/2 + kπ, k ∈ Z.

This restriction is necessary, since cos(π/2 + kπ) = 0 for all k ∈ Z. The function tan x is not
defined at these values. So, tan is a function with domain R \ {π/2 + kπ : k ∈ Z} and codomain
R.

Figure 19: Graphical representation of tan.

Additionally, we can define the inverse trigonometric functions. Let’s first consider the inverse
of the cosine function. We know from our earlier work (see Theorem 1.24) that a function
is invertible if and only if it is a bijection. We need to take a little care, since the cosine
function is not a bijection (it is not injective, for instance cos 0 = cos(2π) = cos(4π) = . . . ).
However, if we restrict the domain of the cosine function suitably, we can make it into a
bijection. In particular, one may see from the graph of the cosine function that the function
restricted to the domain [0, π] and codomain [−1, 1] is a bijection. We can therefore define
the arccosine function
arccos : [−1, 1] → [0, π].
The function is defined by
y = arccos x ⇐⇒ x = cos y and y ∈ [0, π].
The arccosine function is just a special name for the inverse of the cosine function. We also
use the intuitive notation cos−1 for the same function.
By making a similar restriction of the relevant domains, we define the arcsine function

arcsin : [−1, 1] → [−π/2, π/2],

and the arctangent function

arctan : R → (−π/2, π/2),

via

y = arcsin x ⇐⇒ x = sin y and y ∈ [−π/2, π/2],
y = arctan x ⇐⇒ x = tan y and y ∈ (−π/2, π/2).

1.9 Complex numbers

In section 1.4, we discussed how we extend our number system gradually from the set of
natural numbers to the set of real numbers in order to have the tools to solve certain equations.
However, at a certain point in mathematics, we cannot answer all of the questions that arise
using only real numbers. This point is reached when we want to find a solution of the equation

x² = −1, (29)

which has no solution x ∈ R.


We are not satisfied with this situation, and so we introduce a new element to solve the
equation. Define
i := √−1.
Then, by definition, the equation (29) has two solutions, namely ±i. This extension of the
real numbers leads us to the (field of) complex numbers.

Definition 1.47. The set of all complex numbers is defined by

C := {x + iy : x, y ∈ R}.

We often use z to denote a generic element of C (so z = x + iy for some x, y ∈ R).


For a complex number z = x + iy we call x the real part of z and y the imaginary part.
We write Re(z) = x and Im(z) = y.
Let z = x + iy ∈ C. We define the complex conjugate of z by

z̄ := x − iy.

Note that the imaginary part of a complex number is in fact a real number!
The representation of the term z = x + iy is called the canonical representation. There
are alternative ways to express a generic complex numbers, as we shall see later.
Let’s first discuss how to do some simple calculations with complex numbers. We will always
follow the same rules that we are familiar with for making calculations with real numbers.
Let z = x + iy and w = u + iv be complex numbers (where x, y, u and v are real numbers).
Addition is straightforward, as we collect the real and imaginary parts together:

z + w = (x + iy) + (u + iv) = (x + u) + i(y + v).

For multiplication, we follow familiar rules for expanding multiplication with brackets, sim-
plifying whenever possible by making use of the fact that i2 = −1. For example,

zw = (x + iy)(u + iv) = xu + i(yu + xv) + i²yv = xu + i(yu + xv) − yv = (xu − yv) + i(yu + xv).
(30)
A useful consequence of (30) is that z z̄ is always a non-negative real number. Indeed, let
z = x + iy be a complex number. Then

z z̄ = (x + iy)(x − iy) = x² + y².

We stated above that C is a field. Indeed the operations of addition and multiplication are
associative, commutative, distributive. The additive and multiplicative identity elements are
the same as they are for the real numbers. The additive inverse of z = x + iy is naturally
−z = −x − iy. It remains to verify the existence of a multiplicative inverse for z ̸= 0. We
simply define
z^{−1} = \frac{1}{x + iy}.
We should check that this really is a complex number (i.e. we should check that it can
be expressed in the canonical form). For this, we can make use of the complex conjugate to
transform the denominator into a real number. Indeed,

z^{−1} = \frac{1}{x + iy} = \frac{1}{x + iy} · \frac{x − iy}{x − iy} = \frac{x − iy}{x² + y²} = \frac{x}{x² + y²} − i \frac{y}{x² + y²}.
Hence, C with these operations is also a field.
Example - Consider z = 4 + 2i. Then z is a complex number with Re(z) = 4, Im(z) = 2,
z̄ = 4 − 2i and

z^{−1} = \frac{4}{4² + 2²} − i \frac{2}{4² + 2²} = \frac{1}{5} − i \frac{1}{10}.
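Python has complex numbers built in (the imaginary unit is written `1j`), so the computations above can be checked directly (a sketch):

```python
z = 4 + 2j

# Real part, imaginary part and complex conjugate.
assert z.real == 4 and z.imag == 2
assert z.conjugate() == 4 - 2j

# z * conj(z) = x^2 + y^2 is a non-negative real number.
assert z * z.conjugate() == 20 + 0j

# The multiplicative inverse computed above: 1/5 - i/10.
w = 1 / z
assert abs(w - (0.2 - 0.1j)) < 1e-12
print(w)
```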

Exercise - Show that, for any z₁, z₂ ∈ C,

\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2 and \overline{z_1 z_2} = \bar{z}_1 \bar{z}_2.

Show that, for any z ∈ C,

\overline{(\bar{z})} = z.

Example - Recall that i is just a symbol for √−1. We can use this symbol to express square
roots of other negative real numbers. For example, we have

√−4 = √((−1) · 4) = √−1 · √4 = 2i.

Moreover, for an arbitrary positive real number a, we have

√−a = i √a.

We can even calculate the square root of a complex number, and this will always be another
complex number!

Example - We would like to calculate √i. Given the sentence before this example, let us
assume that this is another complex number z = x + iy (with x, y ∈ R). We have z² = i,
which means that
i = (x + iy)² = x² − y² + i(2xy). (31)
If we compare the real and imaginary parts of both sides of (31), we arrive at a system of
two equations:

x2 − y 2 = 0,
2xy = 1.

We seek to solve this for x and y. The first equation tells us that either x = y or x = −y.
However, if x = −y then we have a contradiction with the second equation (since the left
hand side is not positive), and so it must be the case that x = y.
The second equation then becomes 2x² = 1, and so either x = 1/√2 or x = −1/√2. It follows
that

(1/√2 + i/√2)² = i

and

(−1/√2 − i/√2)² = i.

That is,

√i = ± (1/√2 + i/√2). (32)
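We can confirm both roots numerically; `cmath.sqrt` returns the principal square root, which here is the '+' choice in (32) (a sketch):

```python
import cmath

# The candidate square roots of i found above: +-(1 + i)/sqrt(2).
r = (1 + 1j) / 2 ** 0.5
assert abs(r * r - 1j) < 1e-12
assert abs((-r) * (-r) - 1j) < 1e-12

# cmath.sqrt gives the principal root, the '+' choice in (32).
assert abs(cmath.sqrt(1j) - r) < 1e-12
print(r)
```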
Let’s emphasise again, that even though we have introduced a new number i to our universe,
we are still using all of the same rules for calculations that we know from our mathematical
education, including all of the basic rules that we have been repeating since early school days.
The complex numbers are just an extension of what we already know, we should not think of
them as being something completely new. For instance, if we set all of the imaginary parts to
be equal to zero, all we have done so far in this section is to make some trivial observations
about real numbers.
One can identify an element of the set C with a tuple (x, y) of real numbers. Each complex
number z = x + iy can therefore be illustrated as a point in the plane, which is called the
complex plane. The coordinate axes are the called real and imaginary axis.

Figure 20: Illustration of the complex plane.

Exercise - Think about where to draw the complex conjugate z̄. What about −z?
We define the absolute value of a complex number z = x + iy to be the length of the straight
line from (0, 0) to (x, y). Here is a formulaic version of this definition.

Definition 1.48. Let z = x + iy ∈ C. We define the absolute value of z to be
|z| := √(z z̄) = √(x² + y²).

In particular, note that |z| is always a real number, since z z̄ is always a non-negative real number.
We have used the same notation here as we used for the absolute value of a real number
in Section 1.8. This is very deliberate! The definition of the absolute value for a complex
number is an extension of the previous definition over R. Indeed, if z is a real number (i.e.
the imaginary part is equal to 0), then according to Definition 1.48,
|z| = √(x² + y²) = √(z²).

This is equal to z if z ≥ 0 and is equal to −z otherwise, which matches the definition given in
(24).
Several calculation rules for the absolute value are just the same as for the absolute value
over R. The next lemma is an analogue (and extension) of Lemma 1.44.
Lemma 1.49. The absolute value function satisfies the following properties.

1. For all z ∈ C, |z| ≥ 0.

2. |z| = 0 ⇐⇒ z = 0.

3. For all z ∈ C, |z| ≥ Re(z) and |z| ≥ −Re(z). Also,

Re(z) = |z| ⇐⇒ z ∈ R and z ≥ 0.

4. For all z ∈ C, |z| ≥ Im(z) and |z| ≥ −Im(z). Also,

Im(z) = |z| ⇐⇒ z = iy, y ∈ R, and y ≥ 0.

5. For all z, w ∈ C, |zw| = |z||w|.

6. For all z ∈ C, |z| = |z̄|.

7. For all z ∈ C, |z| = | − z|.

Proof. 1. This follows immediately from the definition of the absolute value, since we take
the positive square root.

2. This is left as an exercise.

3. Write z = x + iy. To prove that |z| ≥ Re(z) we need to show that


|z| = √(x² + y²) ≥ x. (33)

This is true, because y² is non-negative.


For the if and only if statement, let us first verify the “⇐” direction. If z is real and
non-negative then y = 0 and so x ≥ 0. Then
|z| = √(x² + y²) = √(x²) = |x| = x = Re(z),

as required. For the “⇒” direction, suppose that Re(z) = |z|. That is,

√(x² + y²) = x.

Then it must be the case that y² = 0 (otherwise we would have √(x² + y²) > x). Also,
since √(x² + y²) ≥ 0, it must be the case that x ≥ 0, as required.

4. This is left as an exercise.

5. Write z = x + iy and w = u + iv. It follows from the formula for multiplication of


complex numbers given in (30), along with the definition of the absolute value, that
|zw| = √((xu − yv)² + (yu + xv)²)
     = √(x²u² + y²v² − 2xuyv + y²u² + x²v² + 2yuxv)
     = √(x²u² + y²v² + y²u² + x²v²)
     = √((x² + y²)(u² + v²))
     = √(x² + y²) · √(u² + v²)
     = |z| |w|.

6. This is left as an exercise.

7. This is left as an exercise.

The triangle inequality is also valid for the absolute value function, and it is also a very
important and useful result.

Theorem 1.50. Let z, w ∈ C. Then we have

|z + w| ≤ |z| + |w|. (34)

We also have

|z + w| = |z| + |w| ⇐⇒ z w̄ is a real number and z w̄ ≥ 0. (35)

We also have
||z| − |w|| ≤ |z − w|. (36)

Proof. We will prove (34) and (35) simultaneously. First observe that

|z + w|² = (z + w)\overline{(z + w)} = (z + w)(z̄ + w̄).

Here we have used the fact that \overline{z + w} = z̄ + w̄, which was an exercise earlier in this
section. It then
follows that
|z + w|2 = z z̄ + (z w̄ + z̄w) + ww̄ = |z|2 + (z w̄ + z̄w) + |w|2 . (37)
We can calculate directly that
z w̄ + z̄w = 2Re(z w̄). (38)

Indeed, for any a ∈ C, we have
a + ā = 2Re(a). (39)
Now apply (39) with a = z w̄. Since \overline{z w̄} = z̄ w, this proves (38).
Combining (38) with (37) and then applying points 3, 5 and 6 from Lemma 1.49, we have

|z + w|² = |z|² + (z w̄ + z̄w) + |w|² = |z|² + 2Re(z w̄) + |w|²
         ≤ |z|² + 2|z w̄| + |w|²
         = |z|² + 2|z||w̄| + |w|²
         = |z|² + 2|z||w| + |w|²
         = (|z| + |w|)².

This proves (34). Note also that there is only one inequality in this sequence, which was
an application of Lemma 1.49, point 3. If we apply the additional information from Lemma
1.49, point 3, we see that this inequality is an equality if and only if z w̄ ∈ R and z w̄ ≥ 0.
This proves (35).
We now turn to the proof of (36). After relabelling the variables and rearranging the expres-
sion, it follows from (34) that
|a + b| − |b| ≤ |a| (40)
for all a, b ∈ C. Applying (40) with a = z − w and b = w, we get

|z| − |w| ≤ |z − w|. (41)

Applying (40) with a = z − w and b = −z, we get

|w| − |z| = | − w| − | − z| ≤ |z − w|. (42)

In the equality above, we have used point 7 of Lemma 1.49.


Finally, note that |z| − |w| is a real number, and therefore the absolute value ||z| − |w|| is
equal to either |z| − |w| or −(|z| − |w|) = |w| − |z|. In either of these cases, (41) or (42) give
the required bound ||z| − |w|| ≤ |z − w|.

Once again, the triangle inequality (inequality 34) also implies that

|a − b| = |a − c + c − b| ≤ |a − c| + |c − b|, (43)

for all a, b, c ∈ C.
The triangle inequality in the form |a − b| ≤ |a − c| + |c − b| can be interpreted geometrically
as saying that the shortest route from a to b is to go directly, rather than travelling via the
point c. This interpretation also gives some justification for why we call this the triangle
inequality.
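As a quick numerical sanity check (not part of the original notes), both the triangle inequality and its equality condition can be tested with Python's built-in complex numbers; the sample values below are arbitrary.

```python
# Check |z + w| <= |z| + |w| for some arbitrary complex numbers.
pairs = [(3 + 4j, 1 - 2j), (-2 + 1j, 5), (1 + 1j, 2 + 2j)]
for z, w in pairs:
    assert abs(z + w) <= abs(z) + abs(w) + 1e-12  # small tolerance for rounding

# Equality case: z = 1+i and w = 2+2i satisfy z * conj(w) = 4, which is real
# and non-negative, so |z + w| should equal |z| + |w| (up to rounding).
z, w = 1 + 1j, 2 + 2j
assert (z * w.conjugate()).imag == 0 and (z * w.conjugate()).real >= 0
assert abs(abs(z + w) - (abs(z) + abs(w))) < 1e-12
```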

Figure 21: The Triangle Inequality. A visualization of equation 43.

The following table summarises the representation and visualisation of complex numbers in
the complex plane.

In C                                     In the plane R²
z = x + iy                               the point with coordinates (x, y)
Re(z)                                    the x-coordinate of the corresponding point
Im(z)                                    the y-coordinate of the corresponding point
the set of real numbers                  the points on the x-axis
the set of imaginary numbers             the points on the y-axis
−z                                       z after rotation by 180 degrees centred at the origin
z̄                                        z reflected in the x-axis
|z|                                      the distance between z and the origin
|z1 − z2|                                the distance between z1 and z2
z1 + z2                                  vector addition of the corresponding points
|z1 + z2| ≤ |z1| + |z2|                  the triangle inequality

Instead of using the Cartesian coordinates (x, y) for a complex number z = x+iy, one can also
switch to polar coordinates (r, ϕ). We can identify a complex number by its “direction” (or
angle) from the origin, and its distance from the origin. Once we know these two properties
of a complex number, we know exactly what the number is.
For example, suppose that we are told that a complex number z has absolute value |z| = 2
and makes an angle of π/4 (i.e. 45 degrees) anticlockwise from the real axis (i.e. the x-axis).
We can use some simple trigonometry to calculate the canonical form z = x + iy. Indeed, we
have

sin(π/4) = y/2,    cos(π/4) = x/2,

and since sin(π/4) = cos(π/4) = √2/2, it follows that

z = √2 + √2 i = 2( √2/2 + i · √2/2 ).

Figure 22: Illustration of the point z determined by its polar coordinates.

The choice of the factorisation may seem a little strange here, but there is a reason for this,
as should become clear soon.
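This conversion can be reproduced with Python's cmath module (an illustration of ours, not part of the notes): cmath.rect(r, ϕ) returns r(cos ϕ + i sin ϕ), and cmath.polar inverts it.

```python
import cmath
import math

# z has absolute value r = 2 and angle phi = pi/4 from the positive real axis.
z = cmath.rect(2, math.pi / 4)
assert abs(z.real - math.sqrt(2)) < 1e-12  # x = sqrt(2)
assert abs(z.imag - math.sqrt(2)) < 1e-12  # y = sqrt(2)

# cmath.polar recovers (r, phi) = (|z|, argument of z).
r, phi = cmath.polar(z)
assert abs(r - 2) < 1e-12
assert abs(phi - math.pi / 4) < 1e-12
```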
Recall from Section 1.8 that every point on the unit circle {(x, y) : x2 + y 2 = 1} can be
written uniquely as (x, y) = (cos ϕ, sin ϕ) for some ϕ ∈ [0, 2π). Putting this fact into the
context of the complex plane C, it follows that every complex number z such that |z| = 1
can be expressed uniquely as
z = cos ϕ + i sin ϕ, (44)
for some ϕ ∈ [0, 2π). This ϕ tells us the direction of a “unit” complex number. For a generic
complex number z ̸= 0 (i.e. not necessarily with |z| = 1), we introduce a scaling factor r which
tells us the distance between z and the origin (i.e. the value of |z|).
To summarise, we can uniquely express an arbitrary nonzero complex number z in the form

z = r(cos ϕ + i sin ϕ),

for some r ∈ R+ and ϕ ∈ [0, 2π). The values r and ϕ have a geometric meaning; r is
the distance of z from the origin (the absolute value or magnitude of z) and ϕ is the
(anticlockwise) angle determined by the real axis and z. We call ϕ the argument of z.
An easy, useful and compact way of writing complex numbers with the form of (44) can be
obtained by using Euler’s formula, which states that, for all ϕ ∈ R,

e^(iϕ) := cos ϕ + i sin ϕ. (45)

Note that e^(iϕ) can at the moment only be understood as a symbol for the right hand side
above, and it is already useful as such. However, we will see later that this is really an
equation, i.e., we will also define e^z for any z ∈ C in a way that is consistent with the case
when z is real, and show that Euler's formula is true for this definition of a complex
exponential.
Examples - By using Euler's formula and some familiar trigonometric values, we can
immediately see that

e^(iπ/2)  = cos(π/2) + i sin(π/2) = i,
e^(iπ)    = cos(π) + i sin(π) = −1,
e^(i3π/2) = cos(3π/2) + i sin(3π/2) = −i,
e^(i2π)   = cos(2π) + i sin(2π) = 1.

Additionally, we obtain from these values, together with the periodicity of the trigonometric
functions (or standard calculation rules for exponentials), that for k ∈ Z,

e^(ikπ) = 1 if k is even,    e^(ikπ) = −1 if k is odd.
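These special values are easy to confirm with Python's cmath.exp (our own check, not from the notes).

```python
import cmath
import math

# Special values of e^(i*phi), up to floating-point rounding.
assert abs(cmath.exp(1j * math.pi / 2) - 1j) < 1e-12         # = i
assert abs(cmath.exp(1j * math.pi) - (-1)) < 1e-12           # = -1
assert abs(cmath.exp(1j * 3 * math.pi / 2) - (-1j)) < 1e-12  # = -i
assert abs(cmath.exp(1j * 2 * math.pi) - 1) < 1e-12          # = 1

# e^(i*k*pi) alternates with the parity of k.
for k in range(-4, 5):
    expected = 1 if k % 2 == 0 else -1
    assert abs(cmath.exp(1j * k * math.pi) - expected) < 1e-12
```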

Using Euler’s formula we can write every complex number (in its polar form) as

z = r e^(iϕ). (46)

The polar representation in this form is particularly useful when it comes to multiplication
and powers of complex numbers. Given two complex numbers z1 = r1 e^(iϕ1) and
z2 = r2 e^(iϕ2), we obtain
z1 · z2 = (r1 e^(iϕ1))(r2 e^(iϕ2)) = r1 r2 e^(i(ϕ1 + ϕ2)).
This formula is often easier to deal with and more intuitive than the formula for multiplication
of complex numbers in canonical form given in (30).
From this formula (with z1 = z2 = z) and induction we obtain de Moivre's formula for
powers z^n of z = r(cos ϕ + i sin ϕ) = r e^(iϕ) with n ∈ N:

z^n = r^n e^(inϕ) = r^n (cos(nϕ) + i sin(nϕ)).

Example - As an example we calculate (1 + i)^42. We set z := 1 + i. In order to use de
Moivre's formula, we must first express z in its trigonometric (polar) form. We obtain

z = √2 e^(iπ/4).

You should verify this in detail! By de Moivre's formula and the periodicity of the sine and
cosine functions, we get

z^42 = (√2)^42 ( cos(42 · π/4) + i sin(42 · π/4) ) = 2^21 ( cos(π/2) + i sin(π/2) )
     = 2^21 i.
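We can double-check this with Python (our own verification): raising 1 + i to the 42nd power directly, and via de Moivre's formula.

```python
import cmath
import math

direct = (1 + 1j) ** 42

# De Moivre: z = sqrt(2) * e^(i*pi/4), so z^42 = (sqrt(2))^42 * e^(i*42*pi/4).
r, phi = math.sqrt(2), math.pi / 4
moivre = (r ** 42) * cmath.exp(1j * 42 * phi)

# Both should equal 2^21 * i = 2097152i (the tolerance is loose because the
# magnitude is around two million).
assert abs(direct - (2 ** 21) * 1j) < 1e-3
assert abs(moivre - direct) < 1e-3
```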

Exercise - Suppose that z, w ∈ C \ {0}. Show that |z + w| = |z| + |w| if and only if z and w
have the same argument. (Hint: use Theorem 1.50.)
It is an important result in mathematics that complex numbers are really all we need to solve
polynomial equations. In fact, the Fundamental Theorem of Algebra (in one of its variants)
even gives the precise answer for the number of solutions of polynomial equations: the number
of solutions to a degree d polynomial equation with complex coefficients is exactly d (counted
with multiplicity). We do not discuss this in detail here, but illustrate with an example.
Example - We seek to find all solutions x ∈ C to the equation

x² + (1 − i)x − i = 0.

This is a quadratic equation, or in other words, a polynomial equation of degree 2. So, there
should in general be two solutions x ∈ C. We can find them by using the quadratic formula,
which implies that
x = ( i − 1 ± √((1 − i)² + 4i) ) / 2 = ( i − 1 ± √(2i) ) / 2 = ( i − 1 ± √2 · √i ) / 2.
Recalling from (32) that

√i = 1/√2 + i · 1/√2,

we conclude that the solutions to our equation are

x = ( i − 1 + √2 (1/√2 + i · 1/√2) ) / 2 = ( i − 1 + (1 + i) ) / 2 = 2i/2 = i

and

x = ( i − 1 − √2 (1/√2 + i · 1/√2) ) / 2 = ( i − 1 − (1 + i) ) / 2 = −2/2 = −1.

1.10 Vectors and norms

For the final section of this introductory chapter, we will discuss some important concepts
related to vectors. Let d ∈ N. A vector is a tuple

v = (v_i)_{i=1}^{d} = (v1, v2, . . . , vd) ∈ C^d.

The elements vi are complex numbers. The dimension of v is d. If all of the vi are equal to
zero then we call v the zero vector, which is denoted as 0.
We define the addition and scalar multiplication of vectors component-wise. That is,
for two vectors u = (u_i)_{i=1}^{d} and v = (v_i)_{i=1}^{d}, and a number λ ∈ C, we define

u + v = (u1 + v1, u2 + v2, . . . , ud + vd)    and    λu = (λu1, λu2, . . . , λud).

Note that it is important for vector addition that the vectors have the same dimension. If the
dimensions of the two vectors do not agree, then their sum is not defined. The term “scalar”
is used to distinguish this multiplication from the other notions of multiplication with vectors
to be discussed later.
Here we use the field of complex numbers C to build complex vectors v ∈ Cd . Note that
we can easily also consider only real vectors v ∈ Rd , as real vectors are a special case of
complex vectors. However, since all definitions here work directly in the complex case, and
this will be needed later, we define it in the more general context and comment on necessary
changes when needed. Moreover, note that we could define vectors, and the corresponding
operations, in a much more general context, as long as the operations for the components are
well-defined. This will be discussed much later when we come to vector spaces.
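The component-wise operations translate directly into code. The following sketch uses plain Python lists; the helper names vec_add and scalar_mul are our own (the notes themselves use no code).

```python
# Component-wise addition and scalar multiplication for vectors represented
# as Python lists of (possibly complex) numbers.
def vec_add(u, v):
    assert len(u) == len(v), "sum is only defined for equal dimensions"
    return [ui + vi for ui, vi in zip(u, v)]

def scalar_mul(lam, u):
    return [lam * ui for ui in u]

u = [1, 2 + 1j, 3]
v = [4, 5, 6 - 2j]
assert vec_add(u, v) == [5, 7 + 1j, 9 - 2j]
assert scalar_mul(2, u) == [2, 4 + 2j, 6]
```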
For real and complex numbers, we defined the absolute value to give a notion of how “large”
a number is. We will now do something similar for vectors.

Definition 1.51. Let v = (v_i)_{i=1}^{d} ∈ C^d. Then, we define the Euclidean norm or
length of v by the formula

∥v∥₂ := √( ∑_{i=1}^{d} |v_i|² ).

Moreover, for two vectors u = (u_i)_{i=1}^{d} and v = (v_i)_{i=1}^{d} ∈ C^d we define the
inner product or dot product of u and v by the formula

⟨u, v⟩ := ∑_{i=1}^{d} u_i v̄_i.

With this, we have the useful representation

∥v∥₂ = √⟨v, v⟩.

For real vectors v, u ∈ R^d, these definitions simplify to

∥v∥₂ := √( ∑_{i=1}^{d} v_i² )

and

⟨u, v⟩ := ∑_{i=1}^{d} u_i v_i.
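A direct translation of Definition 1.51 into Python (our own sketch); note the conjugate on the second argument of the inner product, which disappears for real vectors.

```python
import math

def norm2(v):
    # Euclidean norm: square root of the sum of squared absolute values.
    return math.sqrt(sum(abs(vi) ** 2 for vi in v))

def inner(u, v):
    # Inner product with the complex conjugate on the second argument.
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

assert norm2([3, 4]) == 5.0  # sqrt(9 + 16)

# <u, w> = i * conj(1) + 2 * conj(1 - i) = i + 2 + 2i = 2 + 3i
assert inner([1j, 2], [1, 1 - 1j]) == 2 + 3j

# The norm is recovered from the inner product: ||v||_2^2 = <v, v>.
v = [1 + 2j, -3]
assert abs(inner(v, v).real - norm2(v) ** 2) < 1e-12
```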

First of all, let us note that the notion of the Euclidean norm really does generalise the
concept of the absolute value of a complex number discussed earlier. To see this, consider the
case of a one-dimensional vector v ∈ C1 . This “vector” is in reality just a complex number
v1 = x + iy, with some x, y ∈ R. Then applying Definition 1.51, we see that
∥v∥₂ := √( ∑_{i=1}^{d} |v_i|² ) = √(|v1|²) = |v1|.

We use the subscript 2 here, because we will also study other norms later in the Mathematics
for AI syllabus. Note that some authors use the notation ∥ · ∥ for the Euclidean norm (i.e.
they remove the subscript), which emphasises its role as a generalisation of the absolute value
of a real or complex number.
The inner product is formally a mapping ⟨·, ·⟩ : Cd × Cd → C.
Exercise - Prove that the inner product satisfies the following properties.

1. Positive definiteness: for all u ∈ Cd , ⟨u, u⟩ ∈ R and ⟨u, u⟩ ≥ 0. Also,

⟨u, u⟩ = 0 ⇐⇒ u is the zero vector.

2. Linearity in the first argument: for all u, v, w ∈ Cd and λ ∈ C

⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ (47)

and
⟨λu, v⟩ = λ⟨u, v⟩. (48)

3. Conjugate symmetry: for all u, v ∈ C^d,

⟨u, v⟩ = \overline{⟨v, u⟩},

where the bar denotes the complex conjugate.

We can use the previous three properties of the inner product to prove the following.

Lemma 1.52. 1. For all u, v, w ∈ Cd and µ, λ ∈ C,

⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩.

2. For all u, v, w ∈ C^d and µ, λ ∈ C,

⟨u, λv + µw⟩ = λ̄⟨u, v⟩ + µ̄⟨u, w⟩.

Proof. 1. By linearity in the first argument,

⟨λu + µv, w⟩ = ⟨λu, w⟩ + ⟨µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩.

2. By conjugate symmetry,

⟨u, λv + µw⟩ = \overline{⟨λv + µw, u⟩}. (49)

We then apply part 1 of this lemma, and some basic properties of the complex conjugate
(see the exercise on page 51), to obtain

\overline{⟨λv + µw, u⟩} = \overline{λ⟨v, u⟩ + µ⟨w, u⟩} = λ̄ · \overline{⟨v, u⟩} + µ̄ · \overline{⟨w, u⟩}. (50)

Using conjugate symmetry again and the fact that \overline{z̄} = z, (50) implies that

\overline{⟨λv + µw, u⟩} = λ̄ · ⟨u, v⟩ + µ̄ · ⟨u, w⟩.

After combining this with (49), the proof is complete.

Note that, taking λ = µ = 1 in part 2 of Lemma 1.52, we obtain

⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.

This is very similar to the identity (47) from the previous exercise. An analogue of (48) is the
identity
⟨u, λv⟩ = λ̄⟨u, v⟩.
This is obtained from part 2 of Lemma 1.52 by fixing µ = 0.
Exercise - Prove that, for any λ ∈ C and v ∈ Cd , we have

∥λv∥2 = |λ|∥v∥2 .

We have already seen that the Euclidean norm generalises many features of the absolute
value function to the higher dimensional setting. The next result, the triangle inequality,
is another important step in this direction.

Theorem 1.53. For all u, v ∈ Cd ,

∥u + v∥2 ≤ ∥u∥2 + ∥v∥2 .

Note that in the case that d = 2 or d = 3 the Euclidean norm coincides with the usual
intuition of the distance between the origin and the point v, i.e. ∥v∥ is the length of the
‘direct path’ between the origin and v. This makes the triangle inequality appear to be an
obvious statement. However, in higher dimensions (particularly when d > 3) this is not so
clear. We present a proof below.
First we need another inequality, the Cauchy-Schwarz inequality, which is one of the
most important inequalities in mathematics. It continues to be applied extremely frequently
in active research across almost all areas of mathematics, and many great and celebrated
results essentially come down to skillful manipulations of the bound.

Theorem 1.54 (Cauchy-Schwarz inequality). For all u, v ∈ Cd ,

|⟨u, v⟩| ≤ ∥u∥2 ∥v∥2 .

Proof. Write u = (u1, u2, . . . , ud) and v = (v1, v2, . . . , vd).

Case 1: Suppose that either u = 0 or v = 0. Then both sides of the inequality are equal to
zero, and thus the result is valid.
Case 2: Suppose that u, v ̸= 0. We will make use of the fact that the inequality

(a² + b²)/2 ≥ ab (51)

holds for all a, b ∈ R. The inequality (51) can be proved by rearranging the inequality
(a − b)² ≥ 0.
We define real numbers

a_i = |u_i| / ∥u∥₂    and    b_i = |v_i| / ∥v∥₂.

Observe that

∑_{i=1}^{d} a_i² = ∑_{i=1}^{d} |u_i|² / ∥u∥₂² = (1/∥u∥₂²) ∑_{i=1}^{d} |u_i|² = 1.

Similarly, ∑_{i=1}^{d} b_i² = 1. It therefore follows that

∑_{i=1}^{d} (a_i² + b_i²)/2 = 1. (52)

We then use the triangle inequality for C (inequality 34 of Theorem 1.50), along with (51)

and (52), to conclude that

|⟨u, v⟩| = | ∑_{i=1}^{d} u_i v̄_i | ≤ ∑_{i=1}^{d} |u_i v̄_i| = ∑_{i=1}^{d} |u_i||v_i|
         = ∑_{i=1}^{d} a_i ∥u∥₂ b_i ∥v∥₂
         = ∥u∥₂ ∥v∥₂ ∑_{i=1}^{d} a_i b_i
         ≤ ∥u∥₂ ∥v∥₂ ∑_{i=1}^{d} (a_i² + b_i²)/2
         = ∥u∥₂ ∥v∥₂.

By analysing the argument a little more closely, and paying attention to when the inequalities
in the proof are really identities, we obtain the following stronger statement, which gives us
additional information about when the Cauchy-Schwarz inequality is tight.
Theorem 1.55 (Cauchy-Schwarz inequality). For all u, v ∈ Cd ,

|⟨u, v⟩| ≤ ∥u∥2 ∥v∥2 .

Moreover, we have |⟨u, v⟩| = ∥u∥₂∥v∥₂ if and only if u = λv for some λ ∈ C, or v = 0.

We leave the proof of Theorem 1.55 as an optional exercise.
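A numerical illustration of the Cauchy-Schwarz inequality and its equality case (our own sketch, reusing the norm and inner product formulas from Definition 1.51).

```python
import math

def norm2(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def inner(u, v):
    return sum(a * b.conjugate() for a, b in zip(u, v))

u = [1 + 1j, 2, -1j]
v = [3, -1 + 2j, 0.5]
assert abs(inner(u, v)) <= norm2(u) * norm2(v) + 1e-12

# Equality holds when u is a scalar multiple of v.
lam = 2 - 1j
u2 = [lam * x for x in v]
assert abs(abs(inner(u2, v)) - norm2(u2) * norm2(v)) < 1e-9
```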


The Cauchy-Schwarz inequality can now be used to prove the triangle inequality for the
Euclidean norm.

Proof of Theorem 1.53. By the properties of the inner product established in Lemma 1.52
and the exercise before it, it follows that for all u, v ∈ C^d,

∥u + v∥₂² = ⟨u + v, u + v⟩
          = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
          = ⟨u, u⟩ + ⟨u, v⟩ + \overline{⟨u, v⟩} + ⟨v, v⟩
          = ∥u∥₂² + ⟨u, v⟩ + \overline{⟨u, v⟩} + ∥v∥₂²
          = ∥u∥₂² + 2Re(⟨u, v⟩) + ∥v∥₂².

We now use Lemma 1.49 (part 3) and the Cauchy-Schwarz inequality to conclude that

∥u + v∥₂² = ∥u∥₂² + 2Re(⟨u, v⟩) + ∥v∥₂²
          ≤ ∥u∥₂² + 2|⟨u, v⟩| + ∥v∥₂²
          ≤ ∥u∥₂² + 2∥u∥₂∥v∥₂ + ∥v∥₂²
          = (∥u∥₂ + ∥v∥₂)².

This completes the proof.

We finally introduce some particularly important vectors which will appear very often in the
upcoming considerations. These are the standard basis vectors e1 , . . . , ed ∈ Cd , where ek
is the vector which is zero everywhere except for the kth entry, which is 1. That is, written
as column vectors,

e1 = (1, 0, 0, . . . , 0)ᵀ, e2 = (0, 1, 0, . . . , 0)ᵀ, . . . , ed = (0, 0, . . . , 0, 1)ᵀ.

Note that ∥ei ∥ = 1 for all i = 1, . . . , d.


One important property of these vectors is that they can be used to represent arbitrary
vectors. For this, note that λek with λ ∈ C is the vector with λ in the kth entry, and zero
elsewhere. It is therefore easy to see that

v = (v1, v2, . . . , vd) = ∑_{i=1}^{d} v_i e_i.

Moreover, the representation of the vector v in this way is unique. Such a method for
representing elements of Cd uniquely turns out to be very useful and important. A set with
such a property is called a basis for Cd , and the set {e1 , . . . , ed } is called the standard
basis. These concepts will be discussed in greater detail later, when we come to study vector
spaces.
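The representation v = ∑ v_i e_i can be verified in code; the helper std_basis below is our own naming.

```python
# Standard basis vector e_k of dimension d (k is 1-indexed, as in the notes).
def std_basis(d, k):
    return [1 if i == k - 1 else 0 for i in range(d)]

d = 4
v = [5, -2, 0.5, 3 + 1j]

# Reconstruct v as the sum of v_i * e_i, component by component.
recon = [0] * d
for i in range(d):
    e = std_basis(d, i + 1)
    recon = [r + v[i] * ej for r, ej in zip(recon, e)]

assert recon == v
```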

2 Matrices and systems of linear equations

In this chapter we want to solve systems of equations. That is, we want to find possible
values for some variables that fulfill a certain collection of equations. This is one of the most
important disciplines in applied mathematics and numerical applications. In particular, we
will focus on systems of linear equations.
Systems of linear equations are the most frequently occurring type of multivariate problems
to solve, and they are also the easiest. Many (even “non-linear”) numerical problems can be
rewritten as, or approximated by, a system of linear equations. Such systems can be rather
large and are usually solved by a computer, but it is up to the user to transfer the problem
under consideration to a well-defined linear system. It is therefore indispensable to have a
solid understanding of these basic problems.
For one free variable, linear equations are easy to solve. If we want to solve the equation
ax = b for some (fixed) a, b ∈ R, then the unique solution is given by x = ab if a ̸= 0. However,
if a = 0 and b ̸= 0 then this equation cannot be solved. That is, there is no x satisfying the
equation 0 · x = b. Meanwhile, for a = b = 0 the equation ax = b is fulfilled for every x ∈ R.
Although this situation was straightforward to solve, we still see that there are some subtleties
and different cases to consider. In particular, a single linear equation can have a unique
solution, can have no solutions, or can have infinitely many solutions, depending on the
properties of the equation. It turns out that systems of linear equations also satisfy the same
trichotomy.
As we increase the number of variables and equations, things become more complicated. Let
us illustrate this with some examples involving two equations and two variables.
Example - Find all solutions (x1 , x2 ) ∈ R2 to the system of linear equations

2x1 + x2 = 1
6x1 + 2x2 = 2.

We can use the first equation to eliminate one of the variables and reduce this to one equation
with one variable. The first equation gives x2 = 1 − 2x1 . Plugging this into the second
equation, we obtain 6x1 + 2(1 − 2x1 ) = 2, which simplifies to 2x1 = 0. So, it must be the
case that x1 = 0. Plugging this back into the first equation, it follows that x2 = 1, and so
the unique solution is (x1 , x2 ) = (0, 1).
If we change the system of equations slightly, the set of solutions can change significantly.
Example - Find all solutions (x1 , x2 ) ∈ R2 to the system of linear equations

2x1 + x2 = 1
6x1 + 3x2 = 2.

Let us try to use the same approach as we used for the previous example. The first equation
again gives x2 = 1−2x1 . Plugging this into the second equation, we obtain 6x1 +3(1−2x1 ) =
2, which simplifies to 3 = 2. It seems that we have reached a nonsensical contradiction!
Indeed, there are no solutions to this system, as the two equations are incompatible. Another
way to see this is by multiplying both sides of the first equation by 3. We arrive at the

equivalent system

6x1 + 3x2 = 3
6x1 + 3x2 = 2.

A solution to this system would again imply the contradiction 3 = 2.


Making another small modification changes the story completely again.
Example - Find all solutions (x1 , x2 ) ∈ R2 to the system of linear equations

2x1 + x2 = 1
6x1 + 3x2 = 3.

Plugging x2 = 1 − 2x1 into the second equation, we obtain 6x1 + 3(1 − 2x1 ) = 3, which
simplifies to 3 = 3. This is satisfied for all x1 ∈ R.
What is happening in this example is that the two equations are equivalent. This can again
be seen by multiplying both sides of the first equation by 3. Therefore, the solutions (x1 , x2 )
to this system are the same as the solutions to the single equation 2x1 + x2 = 1. We can
choose any x1 ∈ R and then choose the corresponding x2 to satisfy the equation, and so there
are infinitely many solutions. For instance, (0, 1) and (1, −1) are solutions.
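The trichotomy in these three examples can be checked mechanically. Below is a small sketch (the helper solve_2x2 is ours): it classifies a system a1·x1 + b1·x2 = c1, a2·x1 + b2·x2 = c2 using the quantity a1·b2 − a2·b1, which is zero exactly when the left-hand sides are proportional.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x1 + b1*x2 = c1, a2*x1 + b2*x2 = c2 over the reals."""
    det = a1 * b2 - a2 * b1
    if det != 0:
        # Unique solution (this formula is equivalent to elimination).
        x1 = (c1 * b2 - c2 * b1) / det
        x2 = (a1 * c2 - a2 * c1) / det
        return ("unique", (x1, x2))
    # Left-hand sides proportional: consistent iff right-hand sides match too.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return ("infinite", None)
    return ("none", None)

# The three examples from the text:
assert solve_2x2(2, 1, 1, 6, 2, 2) == ("unique", (0.0, 1.0))
assert solve_2x2(2, 1, 1, 6, 3, 2) == ("none", None)
assert solve_2x2(2, 1, 1, 6, 3, 3) == ("infinite", None)
```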
These three examples show that such systems of equations might be quite sensitive to small
changes of the parameters (and this was just a two dimensional example). It is therefore
desirable to have criteria for a given system of equations to be (uniquely) solvable that can
be checked more easily before we start trying to calculate a solution. Moreover, this procedure
of eliminating variables becomes less practical for larger systems, and so we would like to
develop a more systematic approach that can deal with larger systems efficiently.
The key objects that will be used for developing such an approach are matrices.

2.1 Matrices

One may think of matrices as a multi-dimensional analogue of vectors, where we arrange
numbers into an array.

Definition 2.1. Let m, n ∈ N and aij ∈ R for 1 ≤ i ≤ m and 1 ≤ j ≤ n. A (real) m × n


matrix is an array given by
 
a11 a12 ... a1n
 a21 a22 ... a2n 
A= . ..  .
 
.. ..
 .. . . . 
am1 am2 ... amn

In this case we use the notation A ∈ Rm×n , and call m and n the dimensions of A. An
m × 1 matrix is called a column vector. A 1 × n matrix is called a row vector. If m = n,
then A is a square matrix.

In order to save space, we sometimes use the notation (a_ij)_{i,j=1}^{m,n} as a shorthand for
the matrix A described above. This notation tells us about the dimensions of A, and also that the entry
in the ith row and jth column is aij . Another notation that is sometimes convenient is that
(A)ij is used to denote the entry of A in the ith row and jth column. So, for the matrix A
defined above, we have (A)ij = aij .
The case of complex matrices, i.e., aij ∈ C, can be treated analogously, and we write A ∈
Cm×n . We can even consider matrices whose entries come from an arbitrary field F. However,
in order to give a solid grounding for understanding this new material, we will consider only
matrices of real numbers in this course.
We now turn to basic operations of matrices. The first two, namely scalar multiplication
and matrix addition, are simple (and familiar from the corresponding operations for vectors).
These operations are carried out component-wise, meaning that they are performed in each
entry of the matrices individually.
Let A, B ∈ Rm×n . Write  
a11 a12 ... a1n
 a21 a22 ... a2n 
A= .
 
.. .. .. 
 .. . . . 
am1 am2 . . . amn
and  
b11 b12 ... b1n
 b21 b22 ... b2n 
B= . ..  .
 
.. ..
 .. . . . 
bm1 bm2 . . . bmn
Then A + B is the m × n matrix
 
a11 + b11 a12 + b12 ... a1n + b1n
 a21 + b21 a22 + b22 ... a2n + b2n 
A+B = .
 
.. .. .. ..
 . . . . 
am1 + bm1 am2 + bm2 . . . amn + bmn

Note that it is important here that the matrices have the same dimension. If the dimensions
of the two matrices do not agree, then their sum is not defined.
The second operation is scalar multiplication, which is the operation which multiplies
every entry of the matrix by a fixed scalar. Let λ ∈ R and
 
a11 a12 . . . a1n
 a21 a22 . . . a2n 
A= . ..  .
 
.. ..
 .. . . . 
am1 am2 . . . amn

Then λA ∈ Rm×n is the matrix


 
λA =  λa11  λa12  . . .  λa1n
      λa21  λa22  . . .  λa2n
        .     .            .
      λam1  λam2  . . .  λamn

The term “scalar” is used to distinguish this from the matrix product. This is the third
essential operation on matrices that we consider. The definition of the product of matrices
is a little more complicated, as we do not define this product component-wise.
Let A ∈ Rm×p be an m × p matrix and let B ∈ Rp×n be a p × n matrix. Write

A = (a_ij)_{i,j=1}^{m,p},    B = (b_ij)_{i,j=1}^{p,n}.

The product of A and B is the matrix C = AB ∈ Rm×n such that C = (c_ij)_{i,j=1}^{m,n} and

c_ij = ∑_{k=1}^{p} a_ik b_kj.

In other words, the entry of AB in the ith row and jth column is ∑_{k=1}^{p} a_ik b_kj.

For this definition to make sense, it is crucial that the dimensions of A and B are correct. In
particular, the number of columns of A must be equal to the number of rows in B
(we may sometimes say that the inner dimensions of A and B match).
A helpful way to think about computing the matrix product may be the following: to calculate
the ij entry of the product AB, move along the ith row of A and down the jth column of B.
Example - Let A ∈ R3×2 and B ∈ R2×2 be the matrices

A =  1  6        B =  7  9
     2  5             8  0
     3  4

There are 2 columns in A and 2 rows in B. These numbers are the same, and so the matrix
AB is defined. In particular, AB is a 3 × 2 matrix. Write

AB =  c11  c12
      c21  c22
      c31  c32 .

We will fill in the blanks one entry at a time to write C out explicitly. To calculate c11 , we
consider the first row of A and the first column of B. Then

c11 = 1 · 7 + 6 · 8 = 55.
To calculate c12, we consider the first row of A and the second column of B. Therefore,

c12 = 1 · 9 + 6 · 0 = 9.

We can repeat this process for each entry. We eventually obtain

AB =  55   9
      54  18
      53  27 .

On the other hand, the reverse product BA is not defined, because the inner dimensions do
not match.
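The definition of the matrix product translates almost verbatim into code. A minimal sketch (helper names are our own), representing a matrix as a list of its rows:

```python
# (AB)_ij = sum over k of A_ik * B_kj; requires columns(A) == rows(B).
def matmul(A, B):
    m, p = len(A), len(A[0])
    assert p == len(B), "inner dimensions must match"
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]

A = [[1, 6], [2, 5], [3, 4]]
B = [[7, 9], [8, 0]]
assert matmul(A, B) == [[55, 9], [54, 18], [53, 27]]  # the example above
```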
One may observe from the example above that the process of computing an entry in the
product of two matrices is similar to that of computing an inner product of two vectors. We
formalise this observation below.
Let A ∈ Rm×p and B ∈ Rp×n . The matrix A has m rows, each of which can be viewed as
row vectors. We write the rows of A (from top to bottom) as a1, a2, . . . , am, where ai is the
row vector ai = (ai1, ai2, . . . , aip). Similarly, we write the columns of B (from left to right)
as b1, b2, . . . , bn, where bj is the column vector bj = (b1j, b2j, . . . , bpj)ᵀ.
The entry of AB in the ith row and jth column is then equal to
∑_{k=1}^{p} a_ik b_kj = ⟨a_i, b_j⟩.

We can also define the matrix-vector product of a matrix A ∈ Rm×n and a vector x ∈ Rn .
These kind of products will be particularly useful when it comes down to solving systems of
linear equations later! Write
   
a11 a12 . . . a1n a1
 a21 a22 . . . a2n   a2 
A= . ..  =  .. 
   
. .. . .
 . . . .   . 
am1 am2 . . . amn am

and  
x1
 x2 
x =  . .
 
 .. 
xn

Then Ax ∈ Rm is a vector, defined as

Ax := ( a11 x1 + a12 x2 + · · · + a1n xn ,  . . . ,  am1 x1 + am2 x2 + · · · + amn xn )ᵀ
    = ( ⟨a1, x⟩, ⟨a2, x⟩, . . . , ⟨am, x⟩ )ᵀ, (53)

where ai denotes the ith row of A.

In the previous chapter, we considered vectors x ∈ Rn for some n ∈ N. Such a vector may be
considered as a matrix in Rn×1 . If we consider a vector to be a matrix in this way, we can
consider the product of a matrix A ∈ Rm×n with the matrix x ∈ Rn×1 using the definition
of matrix product given above. Then the matrix product Ax is exactly the same as the
definition of Ax given above in (53). In other words, the matrix vector product is just a
special case of the matrix product we have already defined.
Example - Let A be the 3 × 2 matrix with rows (1, 6), (2, 5), (3, 4), and let x = (3, 4)ᵀ. Then

Ax = (1 · 3 + 6 · 4, 2 · 3 + 5 · 4, 3 · 3 + 4 · 4)ᵀ = (27, 26, 25)ᵀ.

As with matrix multiplication, we need to take care to ensure that the matrix-vector product
we are considering has the correct dimensions to be defined. For instance, if we instead take
the same A but x = (3, 4, 0)ᵀ ∈ R³, then Ax is not defined.
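The matrix-vector product from (53) can be sketched the same way (helper name ours): the ith entry of Ax is the inner product of the ith row of A with x.

```python
def matvec(A, x):
    assert len(A[0]) == len(x), "columns of A must equal the dimension of x"
    return [sum(aik * xk for aik, xk in zip(row, x)) for row in A]

A = [[1, 6], [2, 5], [3, 4]]
assert matvec(A, [3, 4]) == [27, 26, 25]  # the example above
```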


We have the following calculation rules, which are reminiscent of rules we established for
vectors in the previous section.
Exercise

• Prove that, for all A, B ∈ Rm×n and any λ ∈ R,

λ(A + B) = λA + λB.

• Prove that, for all A ∈ Rm×p , B ∈ Rp×n and any λ ∈ R,

A(λB) = λAB.

• Prove that, for all A ∈ Rm×p and B, C ∈ Rp×n ,

A(B + C) = AB + AC.

Since matrix-vector multiplication is just a special case of matrix multiplication, the following
two facts follow immediately from the previous exercise.

• For all A ∈ Rm×n and x, y ∈ Rn ,

A(x + y) = Ax + Ay

• For all A ∈ Rm×n , x ∈ Rn and λ ∈ R,

A(λx) = λAx.

However, matrix multiplication is not commutative. Even in the case when both
matrices AB and BA are defined and have the same dimensions, it is usually the case that
AB ̸= BA. You can check this yourself by choosing two “random” matrices A and B, with
the correct dimensions to ensure that both AB and BA are defined (what condition does this
impose on the dimensions?), and computing both AB and BA.
There exist identity elements for the operations of matrix addition and multiplication.
Let 0mn denote the matrix in Rm×n with every entry being equal to zero. Then, for any
A ∈ Rm×n we have
A + 0mn = 0mn + A = A.

The multiplicative identity is of more use, and also practical interest. For n ∈ N, we define
the identity matrix In ∈ Rn×n to be the matrix
 
1 0 0 ... 0
0 1 0 . . . 0
 
In := 0 0 1 . . . 0 .
 
 .. .. .. . . .
. . . . .. 
0 0 0 ... 1

In words, In is the matrix with 1 at every diagonal entry and 0 everywhere else.
Note that the identity is a square matrix, and we may write I := In if the dimension is clear.
Exercise - Let A ∈ Rm×n . Prove that

AIn = A

and
Im A = A.

Next, we discuss the transpose of a matrix. Since the dimensions of a matrix are important,
it makes a huge difference if the dimensions of a matrix are m × n or n × m, and it is quite
useful to have a compact notation to switch the rows and columns of a matrix. That is, for a
given m × n matrix A = (a_ij)_{i,j=1}^{m,n}, we define its transpose, denoted Aᵀ, as the n × m
matrix whose rows are the columns of A. To be more precise, if
 
a11 a12 . . . a1n
 a21 a22 . . . a2n 
A= .
 
.. .. .. 
 .. . . . 
am1 am2 . . . amn

then  
a11 a21 ... am1
 a12 a22 ... am2 
AT =  . ..  .
 
.. ..
 .. . . . 
a1n a2n . . . amn

Example - Consider again the matrix


 
1 6
A = 2 5  .
3 4

Then

Aᵀ =  1  2  3
      6  5  4 .

The transpose notation is also convenient for distinguishing column vectors and row vectors.
Recall that the standard basis unit vectors ek ∈ Rn = Rn×1 are the (column) vectors that
contain exactly one 1 (in the kth position) and all other entries are zero. The row unit vectors
are defined via transposition as eTk ∈ R1×n . That is,

eT1 = (1, 0, 0, . . . , 0), eT2 = (0, 1, 0, . . . , 0), . . . eTn = (0, 0, 0, . . . , 0, 1).

Using these unit vectors, we can write the identity matrix as

In = (e1, e2, . . . , en),

i.e. the matrix whose kth column is ek; equivalently, its kth row is ekᵀ.

With the above considerations and the fact that In A = AIn = A, we see that the unit vectors
can be used to “extract” the rows and columns from a matrix. For instance, given a matrix
A ∈ Rm×n of the form
   
a11 a12 . . . a1n a1
 a21 a22 . . . a2n   a2 
A= . ..  =  ..  ,
   
.. ..
 .. . . .   . 
am1 am2 . . . amn am

and the row vector eTk ∈ R1×m , we can compute that

eTk A = ak .

Similarly, the product Aek (with ek ∈ Rn×1 ) can be used to extract the kth column from A.
There is one calculation rule related to the transpose, that is sometimes also very useful for
computing the product of matrices. We state this in the following lemma.

Lemma 2.2. Let m, p, n ∈ N, A ∈ Rm×p and B ∈ Rp×n . Then

(AB)T = B T AT .

Proof. Firstly, we note that BᵀAᵀ is well-defined, and is an n × m matrix, which means that
the dimensions of (AB)ᵀ and BᵀAᵀ are the same. Indeed, Bᵀ ∈ Rn×p and Aᵀ ∈ Rp×m, and
so the inner dimensions of Bᵀ and Aᵀ agree. The outer dimensions confirm that BᵀAᵀ ∈ Rn×m.
To prove that (AB)ᵀ = BᵀAᵀ, we need to show that each corresponding pair of entries of
the two matrices are the same. Recall that, for an arbitrary matrix M, the notation (M)ij
is used for the entry of M in the ith row and jth column. We need to show that

((AB)ᵀ)ij = (BᵀAᵀ)ij (54)

holds for all 1 ≤ i ≤ n and 1 ≤ j ≤ m.


Since (Mᵀ)ij = (M)ji, it follows that

((AB)ᵀ)ij = (AB)ji = ∑_{k=1}^{p} (A)jk (B)ki.

On the other hand,

(BᵀAᵀ)ij = ∑_{k=1}^{p} (Bᵀ)ik (Aᵀ)kj = ∑_{k=1}^{p} (B)ki (A)jk.

Comparing the previous two equations, we see that we have proved (54).
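Lemma 2.2 is easy to test numerically (a sketch with our own helper functions):

```python
def transpose(M):
    # The rows of the transpose are the columns of M.
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 6], [2, 5], [3, 4]]  # 3 x 2
B = [[7, 9], [8, 0]]          # 2 x 2

# (AB)^T should coincide with B^T A^T.
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
```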

Some matrices do not change under transposition. A matrix A ∈ Rn×n such that AT = A is
called symmetric.
Note that symmetric matrices must be square, and we will see later that symmetric matrices
have several important properties.
An obvious but important example of a symmetric matrix is the identity matrix. More
generally, diagonal matrices are always symmetric.

Definition 2.3. A square matrix A = (a_ij)_{i,j=1}^{n,n} is a diagonal matrix if there exist
d1, . . . , dn ∈ R such that

a_ij := d_i if i = j,    and    a_ij := 0 otherwise.
The numbers di are called diagonal elements of A and we write A = diag(d1 , . . . , dn ).
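A minimal sketch of this construction in Python (the helper name `diag` mirrors the notation of the definition but is an ad-hoc choice):

```python
def diag(*entries):
    """Build diag(d_1, ..., d_n) as a list of rows."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

D = diag(2, -1, 5)
assert D == [[2, 0, 0], [0, -1, 0], [0, 0, 5]]

# Diagonal matrices are symmetric: D^T = D.
assert [list(col) for col in zip(*D)] == D
```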

The last concept related to matrix multiplication that we will need is the inverse of a matrix.

Definition 2.4. Let $A \in \mathbb{R}^{n\times n}$. The inverse of $A$, if it exists, is a matrix $A^{-1} \in \mathbb{R}^{n\times n}$ such that
$$
AA^{-1} = A^{-1}A = I_n.
$$
If an inverse exists, then we call the matrix invertible.

Note that we only considered square matrices in the above definition. Why?
Example - The matrix $I_n$ is invertible and $I_n^{-1} = I_n$.
Example - Let
$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
$$
We can verify by direct calculation that $A$ is invertible and
$$
A^{-1} = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}.
$$
We just need to check that
$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}
=
\begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
$$
This calculation is left to the student to check.
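The verification can also be done mechanically. A minimal sketch, using exact rational arithmetic from Python's standard `fractions` module so that 3/2 and −1/2 are represented exactly (the helper `matmul` is ad hoc):

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A    = [[F(1), F(2)], [F(3), F(4)]]
Ainv = [[F(-2), F(1)], [F(3, 2), F(-1, 2)]]   # the claimed inverse
I2   = [[F(1), F(0)], [F(0), F(1)]]

# Both products must give the identity matrix.
assert matmul(A, Ainv) == I2
assert matmul(Ainv, A) == I2
```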


In general, it is not easy to see whether a matrix is invertible or not. For example, we showed above that the matrix
$$
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
$$
is invertible, but the slightly modified matrix
$$
\begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix}
$$

is not invertible. We will discuss a way to verify if a matrix is invertible, and how to compute
an inverse later in this chapter. However, let us already add here that even if we know that
a matrix is invertible, it is usually difficult to compute its inverse. We will come back to
this issue later, and present some ways for computing the inverse, at least for small matrices.
This inverse will be the ultimate tool to solve certain systems of linear equations. But we
will first discuss some more direct, but less powerful ways to calculate solutions.
Recall the field axioms that we stated in Section 1.4. Many of these axioms are also satisfied for matrices. Let us restrict to the case of matrices in Rn×n for some n ∈ N. We have notions of addition and multiplication which satisfy the properties of associativity and distributivity. There exist additive and multiplicative identity elements (the zero matrix and In). Addition is commutative, and every matrix has an additive inverse.
On the other hand, multiplication of matrices is not commutative, and so Rn×n is not a field. Furthermore, not all matrices in Rn×n have a multiplicative inverse.

2.2 Systems of linear equations

Throughout this section, there will be the parameters m, n ∈ N, where n is the number
of unknown variables and m is the number of equations that must be fulfilled. The
system of equations we want to solve will be of the following form.

Definition 2.5. Let $m, n \in \mathbb{N}$ and for all $1 \le i \le m$ and $1 \le j \le n$, let $a_{ij} \in \mathbb{R}$ and $b_i \in \mathbb{R}$. A system of linear equations is given by
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m.
\end{aligned}
$$
The $x_i$ with $1 \le i \le n$ are called variables, or unknowns.


The $a_{ij}$ are called the coefficients of the system.
The matrix $A = (a_{ij})_{i,j=1}^{m,n}$ is called the matrix of coefficients of the system.

The vector
$$
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
$$
is called the right hand side (RHS) of the system.
If there exist numbers x1 , . . . , xn ∈ R that fulfill all the equations, then we call the tuple
(x1 , ..., xn ) a solution to the linear system.
If there is no solution, then we call the linear system inconsistent.

Let
$$
A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix}
\quad\text{and}\quad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
$$
Recall from the previous section that the matrix-vector product $Ax$ is defined as
$$
Ax = \begin{pmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n
\end{pmatrix}.
$$

Therefore, the linear system from the previous definition can also be written in shorter form
as
Ax = b.

Obviously, we are interested in solutions to a linear system. However, as already discussed


above, such systems may have no solutions, a unique solution, or even infinitely many solu-
tions. We will see that a more detailed analysis of the matrix of coefficients can help us to
determine which of these three cases we face. Before we come to this, let us introduce some
more notation and discuss some examples.
Definition 2.6. Given a linear system $Ax = b$ with coefficient matrix $A \in \mathbb{R}^{m\times n}$ and RHS $b \in \mathbb{R}^m$, we denote the set of solutions by
$$
L(A, b) = \{x \in \mathbb{R}^n : Ax = b\}.
$$

Examples - Let us revisit some examples we considered at the beginning of this chapter.
The system

2x1 + x2 = 1
6x1 + 2x2 = 2

has the unique solution $(x_1, x_2) = (0, 1)$. This is the same thing as saying that the system
$$
\begin{pmatrix} 2 & 1 \\ 6 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 2 \end{pmatrix}
$$
has the unique solution
$$
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
$$
Therefore, we can write
$$
L\left( \begin{pmatrix} 2 & 1 \\ 6 & 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) = \left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}.
$$
Note that $L(A, b)$ is a set and therefore we need to write $L(A, b) = \{x\}$ (rather than $L(A, b) = x$) if $x$ is the only solution.
Example - We also considered the system of equations

2x1 + x2 = 1
6x1 + 3x2 = 3.

Since the two equations are identical, the solutions to this system are simply the solutions to
the equation 2x1 + x2 = 1. This can be rewritten as x2 = 1 − 2x1 .
We can treat $x_1 \in \mathbb{R}$ as a free variable, and conclude that any point of the form $(\lambda, 1 - 2\lambda)$ with $\lambda \in \mathbb{R}$ is a solution to our system of equations. In summary, we have shown that
$$
L\left( \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \end{pmatrix} \right) = \left\{ \begin{pmatrix} \lambda \\ 1 - 2\lambda \end{pmatrix} : \lambda \in \mathbb{R} \right\}.
$$

Example - Consider the following system of linear equations.

$$
\begin{aligned}
2x_1 + x_2 + 3x_3 &= 1 \\
6x_1 + 3x_2 &= 3 \qquad (55) \\
4x_1 &= 8.
\end{aligned}
$$
The set of solutions to this system is the same as the set of all $(x_1, x_2, x_3) \in \mathbb{R}^3$ such that
$$
\begin{pmatrix} 2 & 1 & 3 \\ 6 & 3 & 0 \\ 4 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 3 \\ 8 \end{pmatrix}.
$$

The form that this system takes makes it quite simple to solve. The last equation immediately
gives x1 = 2. Plugging this into the second equation, we have 12 + 3x2 = 3, and so x2 = −3.
Plugging these values of x1 and x2 into the first equation gives 4 − 3 + 3x3 = 1, and so x3 = 0.
We conclude that
$$
L\left( \begin{pmatrix} 2 & 1 & 3 \\ 6 & 3 & 0 \\ 4 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 8 \end{pmatrix} \right) = \left\{ \begin{pmatrix} 2 \\ -3 \\ 0 \end{pmatrix} \right\}.
$$

It would be nice if all systems of linear equations could be solved as easily as (55). While this
is not quite the case, it is true that every system of linear equations can be reduced to make
it a little easier to consider. This is essentially what we will be doing in the next section,
when we learn about Gaussian elimination and row reduction.
We now discuss one special case of equations, for which the right-hand side of the system is
made up only of zeroes.
Definition 2.7. Let A ∈ Rm×n . A linear system of the form

Ax = 0

is called a homogeneous system.


For a linear system Ax = b, we say that the homogeneous system Ax = 0 is the correspond-
ing homogeneous system.

For a given matrix A ∈ Rm×n and b ∈ Rm, it turns out that we can learn a lot about the set of solutions L(A, b) by considering the set of solutions L(A, 0) to the corresponding homogeneous system. This is the content of the next two lemmas.
Lemma 2.8. Let A ∈ Rm×n and b ∈ Rm . Suppose that there exist y, z ∈ Rn with z ̸= 0 such
that Ay = b, and Az = 0. Then there exist infinitely many solutions x ∈ Rn to the system
Ax = b.

Proof. Let λ ∈ R be arbitrary. Then

A(y + λz) = Ay + λAz = b + λ0 = b.

Since z ̸= 0, it follows that all of the vectors y + λz are distinct. As there are infinitely many
choices for λ ∈ R, it follows that Ax = b has infinitely many solutions.
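The construction in this proof can be illustrated with the earlier example A = (2 1; 6 3), b = (1, 3), where y = (0, 1) is a particular solution and z = (1, −2) solves the homogeneous system. A minimal sketch (the helper `matvec` is an ad-hoc name):

```python
def matvec(A, x):
    # Matrix-vector product for A given as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 1], [6, 3]]
b = [1, 3]
y = [0, 1]     # a particular solution: Ay = b
z = [1, -2]    # a non-trivial homogeneous solution: Az = 0

assert matvec(A, y) == b
assert matvec(A, z) == [0, 0]

# As in the proof, y + lam*z solves Ax = b for every scalar lam.
for lam in range(-3, 4):
    x = [yi + lam * zi for yi, zi in zip(y, z)]
    assert matvec(A, x) == b
```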

Lemma 2.9. Let A ∈ Rm×n and b ∈ Rm . Suppose that the homogeneous system Ax = 0 has
only the trivial solution x = 0. Then there exists at most one solution to the system Ax = b.

Proof. This is left as an exercise.

2.3 Gaussian elimination

Now we make an important observation which allows us to derive an algorithm for solving
linear systems by manipulating matrices. We can perform the following operations to a linear
system without changing the set of solutions:

• Interchanging any two equations, i.e., changing the order of the equations.

• Multiplying an equation with a scalar 0 ̸= λ ∈ R.

• Adding a multiple of an equation to another equation.

Take a moment to consider why these three changes do not alter the solution set. The third
of these points is a little more difficult than the others, and will be considered in more detail
in a forthcoming exercise sheet.
Since every system of linear equations can be written with the help of a matrix, it is natural
to consider how the above operations change the corresponding matrix of coefficients of a
linear system. We will see that they indeed allow for successive modifications that lead to
much simpler matrices, i.e., matrices in echelon form and reduced echelon form. From such a
matrix, we will be able to basically see if a corresponding linear system is (uniquely) solvable
or not.
Let us start by discussing how the above operations to a linear system Ax = b affect the
corresponding matrix A. However, note already now that these operations also change the
RHS b of a linear system, and this is essential. We will come back to this shortly, but for
now we only consider the corresponding matrix of coefficients.
In view of the operations from above that can be used to change a linear system Ax = b
without changing the set of solutions, we see that the matrix A is changed in the following
way:

• Interchanging two rows.

• Multiplying a row with a scalar 0 ̸= λ ∈ R.

• Adding a multiple of a row to another row.

For obvious reasons, these operations are called row operations, or sometimes elementary row
operations. Two matrices A and B are said to be equivalent if A can be obtained from B
by performing row operations. Note that this definition is symmetric, since one can perform
“inverse” row operations to then get from A back to B.
The goal now is to use these operations to simplify the given matrix into echelon form. In
particular, we look to create as many zeroes in the matrix as possible, and for these zeroes
to appear in a certain structured manner. We are ready to give some proper definitions.

Definition 2.10. Let C ∈ Rm×n and let 1 ≤ i ≤ m. The leading coefficient of the ith row
of C is the first non-zero entry in the row.

Definition 2.11. Let C ∈ Rm×n be a matrix which has at least one non-zero entry and let c_{ij} denote the entry of C in the ith row and jth column. Let c_{i j_i} denote the leading coefficient of the ith row of C, if such a leading coefficient exists.
C is in row echelon form if both of the following conditions hold.

• There is some 1 ≤ k ≤ m such that the leading coefficient c_{i j_i} exists if and only if 1 ≤ i ≤ k. In other words, all zero rows appear at the bottom of the matrix.

• For all 1 ≤ i < i′ ≤ k, we have j_i < j_{i′}. In other words, the leading coefficients move strictly to the right as we move down the rows of C.

The matrix with every entry being 0 is also in row echelon form.

Note that this definition also implies that the entries below a leading coefficient in the same
column are all equal to 0.

Definition 2.12. Let C ∈ Rm×n and let c_{ij} denote the entry of C in the ith row and jth column. Let c_{i j_i} denote the leading coefficient of the ith row of C, if such a leading coefficient exists, and let k denote the number of non-zero rows.
C is in reduced row echelon form if it is in row echelon form and it also satisfies the following two conditions.

• For all 1 ≤ i ≤ k, c_{i j_i} = 1. In other words, all leading coefficients are equal to 1.

• For all 1 ≤ i ≤ k and 1 ≤ i′ < i, we have c_{i′ j_i} = 0. In other words, the entries above a leading coefficient in the same column are all equal to 0.

The matrix with every entry being 0 is also in reduced row echelon form.

Examples - The following two matrices are in row echelon form:
$$
A = \begin{pmatrix} 1 & 0 & 2 & 5 \\ 0 & 0 & -3 & 12 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} -2 & 0 & 2 & 3 & -1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 8 & \pi \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
$$

The following two matrices are in reduced row echelon form:
$$
C = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

Consider the matrix
$$
E = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

This matrix is not in row echelon form. It does not satisfy the required condition that all of
the zero rows are at the bottom of the matrix. However, if we reverse the order of the rows,
swapping the second row with the fourth one, we obtain the matrix C, which is in reduced
row echelon form.
Another matrix which is not in row echelon form is
$$
F = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
$$

This is because the leading coefficient of the third row is not strictly to the right of the leading coefficient of the row above.
Example - Let’s see an example of how reduced row echelon form matrices correspond to
linear systems that can be easily solved. We use the reduced row echelon form matrix
 
$$
C = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
$$

from an earlier example. We seek to find all solutions $x \in \mathbb{R}^3$ to the equation
$$
Cx = \begin{pmatrix} 2 \\ 1 \\ 0 \\ 1 \end{pmatrix}.
$$
Recalling the notation introduced in the previous section, we want to understand the set
$$
L\left( C, \begin{pmatrix} 2 \\ 1 \\ 0 \\ 1 \end{pmatrix} \right).
$$
Writing this as a system of linear equations, this becomes

x1 + 2x2 = 2
x3 = 1
0=0
0 = 1.

Since the last equation is never valid, there are no solutions to this system, and therefore
$$
L\left( C, \begin{pmatrix} 2 \\ 1 \\ 0 \\ 1 \end{pmatrix} \right) = \emptyset.
$$

Let’s see what happens when we slightly modify the RHS of this system and consider solutions $x \in \mathbb{R}^3$ to the equation
$$
Cx = \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix}.
$$
Writing this as a system of linear equations, this becomes

x1 + 2x2 = 2
x3 = 1
0=0
0 = 0.

The last two equations are always satisfied, and can thus be disregarded. So, we just need to solve the system

x1 + 2x2 = 2
x3 = 1.

We can treat $x_2$ as a free variable. This means that we allow $x_2$ to range over all possible values of $\mathbb{R}$, and write the other variables in terms of the free variable (if necessary). We conclude that
$$
L\left( C, \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix} \right) = \left\{ \begin{pmatrix} 2 - 2x_2 \\ x_2 \\ 1 \end{pmatrix} : x_2 \in \mathbb{R} \right\}. \qquad (56)
$$

Note that there is some choice in how to choose the free variables and therefore in how we
express the final form of the solution set. In this case, we could have instead chosen x1 as
the free variable and expressed x2 in terms of x1 . We could then conclude that
  
$$
L\left( C, \begin{pmatrix} 2 \\ 1 \\ 0 \\ 0 \end{pmatrix} \right) = \left\{ \begin{pmatrix} x_1 \\ 1 - \tfrac{1}{2}x_1 \\ 1 \end{pmatrix} : x_1 \in \mathbb{R} \right\}.
$$

These two expressions may appear different, but they are just two different descriptions of
the same set.
When we are looking to express the solution set for a system given in reduced row echelon
form, a convenient method is the following: we can set the free variables to correspond to
the columns which do not contain a leading coefficient. This is what we did when writing
down the solution set in the form of (56). As we have seen above, there are other ways to
choose the free variables, but using the columns without leading coefficients is guaranteed to
produce a fairly tidy looking expression for the solution set.
We do not prove the following statement here formally, but note that it is the basis of the
considerations below.
Theorem 2.13. Every matrix can be transformed to reduced row echelon form by performing
row operations. Moreover, the reduced row echelon form of a matrix is unique.

In contrast, a given matrix A can be transformed by row operations into different matrices
in (non-reduced) row echelon form. For example, multiplying any row of a row echelon form
matrix by 2 leads to another row echelon form matrix. But even then, all row echelon forms
of a matrix have the same number of non-zero rows.
Exercise - Prove that any two equivalent matrices in row echelon form have the same number
of non-zero rows.
This allows us to state the following definition.
Definition 2.14. Let A ∈ Rm×n be arbitrary. We define the rank of A, denoted by rank(A),
as the number of non-zero rows in a row echelon form matrix C that is equivalent to A.

Exercise - Let A ∈ Rm×n be an arbitrary matrix. Show that

rank(A) ≤ min{m, n}.

Examples - Let us revisit the six matrices A, B, C, D, E and F from a previous example.
Since A, B, C and D are in row echelon form, we can immediately see their ranks by counting
the number of non-zero rows. Note that

rank(A) = 3, rank(B) = 3, rank(C) = 2, rank(D) = 3.

Although E is not in row echelon form, we can easily transform it into row echelon form with
row operations, namely by switching the second and fourth row. We obtain the equivalent
matrix  
1 2 0
0 0 1 
E′ = 0 0 0  .

0 0 0
Therefore, rank(E) = 2. We can transform F into row echelon form by subtracting the
second row from the third. We obtain the equivalent row echelon form matrix
 
$$
F' = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}.
$$

Therefore, rank(F ) = 3.
Let us see some more involved examples of how we can use row operations to transform a
matrix into row echelon form and reduced row echelon form.
Example - Let us consider the matrix
$$
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.
$$

Our first task is to reduce the matrix to row echelon form. To do this, we need to create
zeroes underneath the leading entries. We can do this by subtracting four times the first row
from the second. We indicate this procedure as follows.

   
$$
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\xrightarrow{R_2 = R_2 - 4R_1}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 7 & 8 & 9 \end{pmatrix}
$$
Note that the “R2 = R2 − 4R1 ” appearing above is not an actual mathematical equation, but
rather a piece of notation for telling us how the row operation has been carried out. There
are many slight variants of this notation that appear throughout the literature, so please be
aware of this when reading other sources.
Similarly, we obtain a zero in the third entry of the first column by subtracting 7 times the
first row.

   
$$
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 7 & 8 & 9 \end{pmatrix}
\xrightarrow{R_3 = R_3 - 7R_1}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -12 \end{pmatrix}
$$

We are finished with the first leading entry, but we also need to have zeroes beneath the second leading entry. The row operation which achieves this is $R_3 = R_3 - 2R_2$. With this, we obtain the matrix
$$
B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{pmatrix}.
$$
This matrix is in row echelon form. To transform this into reduced row echelon form, we
dilate the second row to ensure that all leading coefficients are equal to 1 (we perform the
operation R2 = − 13 R2 ).
After that, we still need to create zeroes above all of the leading coefficients. This means that
the entry in the first row and second column needs to be zero. The operation R1 = R1 − 2R2
achieves this. We summarise these two steps via the following notation.

   
$$
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{R_2 = -\tfrac{1}{3}R_2}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{R_1 = R_1 - 2R_2}
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}
$$

We have finally obtained a matrix in reduced row echelon form. We can therefore say that the reduced row echelon form of A is the matrix
$$
C = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
$$

The process that we have used above to transform a given matrix into (reduced) row echelon
form is called Gaussian elimination.

There are many ways to reduce a matrix to row echelon form and reduced row echelon form
via row operations, and the choices mainly come down to choosing an order to perform
the operations. We can often use intuition to spot certain shortcuts that will simplify the
process. We can also state a formal algorithm for doing this, which is essentially implicit in
the examples outlined above.
Step 1 - Begin with the leftmost nonzero column. If necessary, use row interchange so that
this column’s first entry is nonzero.
Step 2 (optional) - Dilate so that the first entry in this column is equal to 1 (this is not
essential for reducing to echelon form, but it usually makes the calculations easier)
Step 3 - Use row replacement operations to create zeros in all positions except for the first
entry of the column.
Step 4 - Ignore the first row of the matrix, and apply steps 1,2 and 3 to the submatrix that
remains. Repeat the process until the matrix is in echelon form (this process will certainly
terminate, since any matrix with exactly one row is in echelon form).
At this point we have a matrix in echelon form, but we can extend the algorithm to produce
a matrix in reduced echelon form.
Step 5 - Beginning with the rightmost leading entry, use row replacement operations to
create zeros above the leading entry. Do this for all of the leading terms, progressing to the
left and up.
Step 6 - Use row dilation so that all of the leading entries are changed to 1 (this will not be
necessary if we performed step 2 every time).
The order of steps 5 and 6 can be reversed. It is often more practical to perform step 6 before
step 5.
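The six steps above can be sketched as a short routine. The function name `rref` is an ad-hoc choice, and the sketch combines steps 2/6 and 3/5 (dilating each pivot and clearing both below and above it as it goes), which is legitimate since the order of the steps is flexible. Exact arithmetic via the standard `fractions` module avoids rounding issues.

```python
from fractions import Fraction

def rref(M):
    """Transform M (a list of rows) to reduced row echelon form,
    following steps 1-6 above.  Exact arithmetic via Fraction."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    pivot = 0                                    # index of the next pivot row
    for col in range(n):
        # Step 1: find a row at or below `pivot` with a nonzero entry here.
        r = next((i for i in range(pivot, m) if A[i][col] != 0), None)
        if r is None:
            continue                             # no leading entry in this column
        A[pivot], A[r] = A[r], A[pivot]          # row interchange
        # Steps 2/6: dilate so the leading entry becomes 1.
        p = A[pivot][col]
        A[pivot] = [x / p for x in A[pivot]]
        # Steps 3/5: row replacement creates zeros below *and* above.
        for i in range(m):
            if i != pivot and A[i][col] != 0:
                f = A[i][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[pivot])]
        pivot += 1
    return A

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert rref(A) == [[1, 0, -1], [0, 1, 2], [0, 0, 0]]

# rank(A) = number of non-zero rows of the (reduced) row echelon form.
assert sum(any(v != 0 for v in row) for row in rref(A)) == 2
```

Running it on the matrix A from the earlier example reproduces the reduced row echelon form computed there, and counting the non-zero rows gives rank(A) = 2.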
Now we discuss how we can solve linear systems by calculating (reduced) row echelon forms of matrices. We consider the linear system $Ax = b$ with corresponding matrix $A = (a_{ij}) \in \mathbb{R}^{m\times n}$ and RHS
$$
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.
$$
Define the augmented matrix $(A|b)$ to be the matrix $A$ with the additional column $b$ appended. That is,
$$
(A|b) = \left(\begin{array}{cccc|c}
a_{11} & a_{12} & \dots & a_{1n} & b_1 \\
a_{21} & a_{22} & \dots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn} & b_m
\end{array}\right).
$$
As we outlined at the beginning of this section, applying row operations to a system of linear
equations does not change the solution set. Therefore, if the augmented matrix (C|b′ ) is
obtained from (A|b) using only row operations, then

L(A, b) = L(C, b′ ).

Let’s see how to perform Gaussian elimination with augmented matrices to solve linear sys-
tems in practice.
Example - Consider the linear system $Ax = b$ where $x \in \mathbb{R}^2$ is a (vector) variable,
$$
A = \begin{pmatrix} 3 & 5 \\ 1 & -1 \end{pmatrix}
\quad\text{and}\quad
b = \begin{pmatrix} 42 \\ 6 \end{pmatrix}.
$$

We write the augmented matrix


 
$$
(A|b) = \left(\begin{array}{cc|c} 3 & 5 & 42 \\ 1 & -1 & 6 \end{array}\right).
$$

Next, we reduce the augmented matrix to reduced row echelon form using row operations. This means that we perform row operations to transform the left side of the augmented matrix to reduced row echelon form, and in the process record the changes to $b$ on the right side of the augmented matrix. We obtain
$$
\left(\begin{array}{cc|c} 3 & 5 & 42 \\ 1 & -1 & 6 \end{array}\right)
\xrightarrow{R_1 \leftrightarrow R_2}
\left(\begin{array}{cc|c} 1 & -1 & 6 \\ 3 & 5 & 42 \end{array}\right)
\xrightarrow{R_2 = R_2 - 3R_1}
\left(\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 8 & 24 \end{array}\right)
\xrightarrow{R_2 = \tfrac{1}{8}R_2}
\left(\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 1 & 3 \end{array}\right)
\xrightarrow{R_1 = R_1 + R_2}
\left(\begin{array}{cc|c} 1 & 0 & 9 \\ 0 & 1 & 3 \end{array}\right).
$$

It therefore follows that the set of solutions to Ax = b is equal to the set of solutions to the
system

x1 + 0x2 = 9
0x1 + x2 = 3.

We have therefore proved that
$$
L\left( \begin{pmatrix} 3 & 5 \\ 1 & -1 \end{pmatrix}, \begin{pmatrix} 42 \\ 6 \end{pmatrix} \right) = \left\{ \begin{pmatrix} 9 \\ 3 \end{pmatrix} \right\}.
$$

Example - Consider the linear system $Ax = b$ where
$$
A = \begin{pmatrix} 2 & 1 & -2 \\ 0 & 3 & 6 \\ 2 & 0 & -4 \end{pmatrix}
\quad\text{and}\quad
b = \begin{pmatrix} 5 \\ 3 \\ 4 \end{pmatrix}.
$$

We write the augmented matrix


 
$$
(A|b) = \left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 3 & 6 & 3 \\ 2 & 0 & -4 & 4 \end{array}\right).
$$

Next, we reduce the augmented matrix to reduced row echelon form using row operations.
$$
\left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 3 & 6 & 3 \\ 2 & 0 & -4 & 4 \end{array}\right)
\xrightarrow{R_3 = R_3 - R_1}
\left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 3 & 6 & 3 \\ 0 & -1 & -2 & -1 \end{array}\right)
\xrightarrow{R_3 = -R_3}
\left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 3 & 6 & 3 \\ 0 & 1 & 2 & 1 \end{array}\right)
$$
$$
\xrightarrow{R_2 \leftrightarrow R_3}
\left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 3 & 6 & 3 \end{array}\right)
\xrightarrow{R_3 = R_3 - 3R_2}
\left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right)
$$
$$
\xrightarrow{R_1 = R_1 - R_2}
\left(\begin{array}{ccc|c} 2 & 0 & -4 & 4 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right)
\xrightarrow{R_1 = \tfrac{1}{2}R_1}
\left(\begin{array}{ccc|c} 1 & 0 & -2 & 2 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right).
$$

The left side of the augmented matrix is now in reduced row echelon form, and so we are ready to finalise the solution. The system is equivalent to

x1 − 2x3 = 2
x2 + 2x3 = 1.

This system has a free variable, which we set as x3 . We rewrite this system as

x1 = 2 + 2x3
x2 = 1 − 2x3 .

We therefore conclude that
$$
L(A, b) = \left\{ \begin{pmatrix} 2 + 2x_3 \\ 1 - 2x_3 \\ x_3 \end{pmatrix} : x_3 \in \mathbb{R} \right\}.
$$

2.3.1 The relationship between rank(A) and the number of solutions to Ax = b

The next result highlights the convenient simplicity of the reduced row echelon form of a
square matrix with full rank.

Lemma 2.15. Let A ∈ Rn×n be a square matrix and let C ∈ Rn×n be its reduced row echelon
form. Then
rank(A) = n ⇐⇒ C = In .

Proof. (⇐) We prove the contrapositive form. Suppose that rank(A) ̸= n. Then C contains
at least one zero row, and thus C ̸= In , as required.
(⇒) Suppose that rank(A) = n. Then C is a square matrix in reduced row echelon form with at least one non-zero entry in each row. It must then be the case that the leading coefficients are the diagonal entries of C. Since C is in reduced echelon form, these leading coefficients are all 1.
Also, since C is in reduced echelon form, all of the other entries in the same column as the
leading coefficients must be zero. It follows that C = In .

This leads to the following nice characterisation of square matrices with full rank.

Theorem 2.16. If A ∈ Rn×n is a square matrix with rank(A) = n, then the linear system
Ax = b has a unique solution for any b ∈ Rn .

Proof. Let C be the reduced row echelon form of the matrix A. By Lemma 2.15, C = In .
The set of solutions to Ax = b is the same as the set of solutions to Cx = b′ , where b′ is
some fixed vector in Rn . However, since C = In , the system Cx = b′ has the unique solution
x = b′ .

Let A ∈ Rm×n and let C be the reduced echelon form matrix equivalent to A. Note that the
linear system Ax = b is inconsistent, i.e., has no solutions, if and only if there is a zero row
in C (from the augmented matrix (C|b′ )) and the corresponding entry of b′ is not equal to
zero.
Since there are exactly m rows, and k = rank(A) of them are non-zero, we see that the linear system Ax = b is solvable (independently of the RHS b) if rank(A) = m.
We can see from the discussion above and the previous few examples, as well as Theorem
2.16, that the rank of a matrix A is very influential in determining whether a system Ax = b
has no solutions, a unique solution, or infinitely many solutions. We summarise some more
features of the relationship between rank and the number of solutions in the next theorem.

Theorem 2.17. Let A ∈ Rm×n .

1. If rank(A) < m, then there is some b ∈ Rm such that the linear system Ax = b has no solutions.

2. If rank(A) < n, then the homogeneous system Ax = 0 has infinitely many solutions.

3. If rank(A) < n, then the system Ax = b either has no solutions or has infinitely many
solutions.

Proof. 1. Let C be the reduced row echelon form matrix equivalent to A. Since rank(A) <
m, the last row of C is a zero row. Let b′ ∈ Rm be any vector with a non-zero entry b′m ̸= 0 in the final position. Observe that the system Cx = b′ has no solutions.
Since A and C are equivalent, we can perform row operations on the augmented matrix
(C|b′ ) to transform it into the equivalent matrix (A|b) (here b is a vector in Rm which
is obtained by performing the row operations which transform C into A). The solutions
to the system Ax = b are exactly the same as those for the system Cx = b′ . Therefore,
there are no solutions to Ax = b.

2. Let C be the reduced row echelon form matrix equivalent to A. If the rank of A is strictly less than the number of columns, then there must exist a column in C which does not contain a leading coefficient. The corresponding variable can be treated as a free variable. Since the homogeneous system always has the solution x = 0, letting this free variable range over R produces infinitely many distinct solutions, and L(A, 0) = L(C, 0).
3. Suppose that Ax = b has at least one solution. By part 2 of this theorem, Ax = 0 has
a non-trivial solution (i.e. a solution x ̸= 0). Lemma 2.8 then implies that Ax = b has
infinitely many solutions.

We can use this to strengthen Theorem 2.16, as follows.


Theorem 2.18. Let A ∈ Rn×n be a square matrix. Then
rank(A) = n ⇐⇒ the linear system Ax = b has a unique solution for any b ∈ Rn .

Proof. The “⇒” direction is Theorem 2.16. The other direction follows from Theorem 2.17.

2.3.2 Row operations are the same as elementary matrix multiplication

The final key idea to introduce in this section is that row operations are equivalent to multiplication by certain "elementary" matrices. This interpretation of the row operations will be useful as we seek to build a basic theory of matrices.
An elementary matrix is a matrix which can be obtained from the identity matrix by a single row operation. There are three kinds of elementary matrices.
1. A matrix corresponding to row interchange; for example, performing the operation $R_2 \leftrightarrow R_1$ gives
$$
E_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

2. A matrix corresponding to row dilation; for example, performing the operation $R_2 = 3R_2$ gives
$$
E_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$

3. A matrix corresponding to row replacement; for example, performing the operation $R_3 = R_3 + 4R_1$ gives
$$
E_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{pmatrix}.
$$

Furthermore, every row operation that we perform can be restated as matrix multiplication by an elementary matrix. For example, consider the matrix
$$
A = \begin{pmatrix} 1 & 0 & -2 \\ -3 & 1 & 4 \\ 2 & -3 & 4 \end{pmatrix}.
$$

We would typically start the process of reducing this to echelon form by performing the operation $R_2 = R_2 + 3R_1$. We obtain
$$
\begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & -2 \\ 2 & -3 & 4 \end{pmatrix}.
$$

However, this is the same thing as left-multiplying by the matrix
$$
E = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
$$
Indeed,
$$
\begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & -2 \\ -3 & 1 & 4 \\ 2 & -3 & 4 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & -2 \\ 2 & -3 & 4 \end{pmatrix}.
$$
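The claim that this row operation and left-multiplication by E agree can also be checked mechanically (the helper `matmul` is an ad-hoc sketch on lists of rows):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 0, -2], [-3, 1, 4], [2, -3, 4]]
E = [[1, 0, 0], [3, 1, 0], [0, 0, 1]]     # elementary matrix for R2 = R2 + 3*R1

# Perform the row operation directly ...
B = [row[:] for row in A]
B[1] = [x + 3 * y for x, y in zip(B[1], B[0])]

# ... and check it agrees with left-multiplication by E.
assert matmul(E, A) == B == [[1, 0, -2], [0, 1, -2], [2, -3, 4]]
```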

The same is true for all elementary row operations. This hints at the following result.

Lemma 2.19. Let A and B be m × n matrices and suppose that A and B are equivalent. Then there exists a sequence of elementary matrices $E_1, E_2, \dots, E_k$ such that
$$
B = E_k E_{k-1} \cdots E_1 A.
$$

Proof. The proof follows immediately from the preceding discussion.

Note that, since elementary matrices are derived from the identity matrix via a single row operation, it follows that they are always square. A useful fact about elementary matrices is that they are always invertible, and their inverses are also elementary matrices.

Lemma 2.20. If E ∈ Rn×n is an elementary matrix then E is invertible and E −1 is also an


elementary matrix.

Proof. This is left as an exercise.

2.4 Matrices as linear transformations

We can use our notion of matrix multiplication to view matrices as a kind of function, or
transformation.
Let $A \in \mathbb{R}^{m\times n}$. We say that $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is the matrix transformation of $A$. This function is defined by
$$
T_A(x) = Ax.
$$

Example - Let
$$
A = \begin{pmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{pmatrix}.
$$
Then, for an arbitrary vector
$$
x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2,
$$
we have
$$
T_A(x) = \begin{pmatrix} 1 & -3 \\ 3 & 5 \\ -1 & 7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} x_1 - 3x_2 \\ 3x_1 + 5x_2 \\ -x_1 + 7x_2 \end{pmatrix}.
$$
Note that $T_A(x)$ is only defined if $x \in \mathbb{R}^2$.
An important class of functions is the set of linear transformations.
Definition 2.21. A function T : Rn → Rm is a linear transformation if

1. For all x, y ∈ Rn , T (x + y) = T (x) + T (y),

2. For all x ∈ Rn and all λ ∈ R, T (λx) = λT (x).


Theorem 2.22. Let A be an m × n matrix. Then TA : Rn → Rm is a linear transformation.

Proof. We proved in an earlier exercise that A(x + y) = Ax + Ay and A(λx) = λAx. This
immediately implies the theorem.
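A minimal sketch of this correspondence, using an ad-hoc helper `matrix_transformation` and the 3 × 2 example matrix from above; the last two assertions spot-check the two linearity properties of Definition 2.21 for particular vectors (a spot check, of course, is not a proof):

```python
def matrix_transformation(A):
    """Return T_A, the function x -> Ax, for A given as a list of rows."""
    def T(x):
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return T

A = [[1, -3], [3, 5], [-1, 7]]
T = matrix_transformation(A)

assert T([2, 1]) == [-1, 11, 5]     # (2 - 3, 6 + 5, -2 + 7)

# Spot-check the two linearity properties on sample vectors.
x, y, lam = [1, 0], [0, 2], 4
assert T([a + b for a, b in zip(x, y)]) == [a + b for a, b in zip(T(x), T(y))]
assert T([lam * a for a in x]) == [lam * v for v in T(x)]
```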

The next result shows that there is a direct correspondence between matrices and linear
transformations.
Theorem 2.23. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then there exists a unique matrix $A$ such that $T = T_A$. In fact, $A$ is the $m \times n$ matrix whose columns are the images of the standard unit vectors:
$$
A = \begin{pmatrix} T(e_1) & T(e_2) & \dots & T(e_n) \end{pmatrix}.
$$

Proof. This is left as an exercise.

Theorem 2.23 can then be used to prove the following result, which will be useful later in
this chapter.
Theorem 2.24. Let A ∈ Rn×n be a square matrix. Then

TA is invertible ⇐⇒ A is invertible.

Proof. Throughout this proof, we use $\mathrm{Id}$ to denote the identity function with domain $\mathbb{R}^n$.
(⇐) Suppose that $A$ is invertible. Then $AA^{-1} = I_n = A^{-1}A$. Therefore,
$$
T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_{I_n} = \mathrm{Id},
$$
and similarly
$$
T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_{I_n} = \mathrm{Id}.
$$

(⇒) Suppose that TA is invertible, and so there is some function T : Rn → Rn such that
T ◦ TA = Id = TA ◦ T .
Claim. T is a linear transformation.

First we will show that the claim implies the theorem, and then we will prove the claim
Claim ⇒ Theorem 2.24 - Since T is a linear transformation, Theorem 2.23 implies that
$T = T_B$ for some matrix $B \in \mathbb{R}^{n\times n}$. Therefore
$$
\mathrm{Id} = T_A \circ T = T_A \circ T_B = T_{AB}
\quad\text{and}\quad
\mathrm{Id} = T \circ T_A = T_B \circ T_A = T_{BA}.
$$
Therefore, for all $x \in \mathbb{R}^n$, we have
$$
(AB)x = x \quad\text{and}\quad (BA)x = x.
$$
It follows that AB = In = BA, and so A is invertible with A−1 = B.
It remains to prove the claim.

Proof of Claim. Let x, y ∈ Rn and λ ∈ R be arbitrary. Since T is the inverse of TA , we have


$$
T_A(T(x)) = x = T(T_A(x)) \quad\text{and}\quad T_A(T(y)) = y = T(T_A(y)).
$$
Therefore,
$$
\begin{aligned}
T(x + y) &= T(T_A(T(x)) + T_A(T(y))) \\
&= T(A\,T(x) + A\,T(y)) \\
&= T(A(T(x) + T(y))) \\
&= T(T_A(T(x) + T(y))) \\
&= T(x) + T(y).
\end{aligned}
$$
Similarly,
$$
\begin{aligned}
T(\lambda x) &= T(\lambda T_A(T(x))) \\
&= T(\lambda A\,T(x)) \\
&= T(A(\lambda T(x))) \\
&= T(T_A(\lambda T(x))) \\
&= \lambda T(x).
\end{aligned}
$$
This completes the proof of the claim, and therefore also completes the proof of the theorem.

2.5 Determinants

We now introduce the determinant of a matrix. This is a particularly useful tool for determining whether a matrix is invertible or not.
Given a matrix $A$, let $A_{ij}$ denote the matrix with row $i$ and column $j$ of $A$ removed. For example, let
$$
A = \begin{pmatrix} 3 & 2 & 1 \\ 6 & 7 & 8 \\ 1 & -1 & -1 \end{pmatrix}.
$$
Then
$$
A_{13} = \begin{pmatrix} 6 & 7 \\ 1 & -1 \end{pmatrix}, \qquad
A_{22} = \begin{pmatrix} 3 & 1 \\ 1 & -1 \end{pmatrix}.
$$

We use this to define the determinant recursively.

Definition 2.25. For a 1 × 1 matrix A whose only entry is a, we say that the determinant
of A is a, and write det(A) = a.
Suppose that n ≥ 2 and A ∈ Rn×n . The determinant of A, denoted det(A), is

∑_{j=1}^{n} (−1)^{1+j} a1j det(A1j ).

We sometimes omit the brackets and simply write det A. Note that the determinant is only
defined for square matrices.
We remark here that there are many equivalent definitions of the determinant that may be
found elsewhere in the literature.
Example - For 2 × 2 matrices, the definition above gives a simple description of the deter-
minant. Let

A = ( a11   a12 ) .
    ( a21   a22 )

Then, according to the definition above, we have

det A = (−1)^{1+1} a11 det(A11 ) + (−1)^{1+2} a12 det(A12 )
      = (−1)^{1+1} a11 det(a22 ) + (−1)^{1+2} a12 det(a21 )
      = a11 a22 − a12 a21 .

Example - Compute the determinant of

A = ( 3   2   1 )
    ( 6   7   8 ) .
    ( 1  −1  −1 )

Using the definition of the determinant above, this can be written as the sum of three smaller
determinants. We obtain

det A = 3 · det (  7   8 ) − 2 · det ( 6   8 ) + 1 · det ( 6   7 )
                ( −1  −1 )           ( 1  −1 )           ( 1  −1 )
      = 3 · [7 · (−1) − 8 · (−1)] − 2 · [6 · (−1) − 8 · 1] + 1 · [6 · (−1) − 7 · 1]
      = 3 · 1 − 2 · (−14) + 1 · (−13) = 18.

Another common notation for the determinant of a matrix is to use a pair of vertical lines
instead of the usual round brackets for the matrix border (resembling the notation for the
absolute value function). So, we may also write
| 3   2   1 |
| 6   7   8 | = 18.
| 1  −1  −1 |
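The recursive definition translates directly into a short program. The following Python sketch (the helper names `minor` and `det` are ours, not from the notes) implements cofactor expansion along the first row; it runs in roughly O(n!) time, so it is only practical for small matrices.

```python
def minor(A, i, j):
    """Return a copy of A with row i and column j removed (0-indexed)."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row (Definition 2.25)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # (-1) ** j reproduces the alternating sign (-1)^{1+j} of the definition
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[3, 2, 1], [6, 7, 8], [1, -1, -1]]
print(det(A))  # 18, matching the worked example above
```

For 2 × 2 inputs this reduces to the formula a11 a22 − a12 a21 derived above.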

The definition of the determinant is given by expanding along the first row of the matrix.
In fact, there is much more flexibility, and we can also expand along any row or column to
calculate the determinant.
Theorem 2.26. Let A be an n × n matrix, n ≥ 2. Then, for any 1 ≤ i ≤ n
det A = ∑_{j=1}^{n} (−1)^{i+j} aij det Aij

and for any 1 ≤ j ≤ n

det A = ∑_{i=1}^{n} (−1)^{i+j} aij det Aij .

We omit the proof of this result for reasons of time. This result is very
useful, in both theory and practice. In particular, it can provide a convenient shortcut for
calculating determinants by choosing a row or column that contains many zeroes.
Example - Compute the determinant of

A = ( 1   2   100 )
    ( 3   5    2  ) .
    ( 0   0    1  )

To make things easier, we expand along the third row. The presence of many zeros in this
row makes our calculations quicker.

det A = 0 · det A31 − 0 · det A32 + 1 · det A33 = 1 · det A33 = 1 · (1 · 5 − 2 · 3) = −1.

Another example is the following.


Example - Compute the determinant of

A = ( 3  −7   8   9  −6 )
    ( 0   2  −5   7   3 )
    ( 0   0   1   5   0 ) .
    ( 0   0   2   4  −1 )
    ( 0   0   0  −2   0 )

If we used the original definition of the determinant, it would require many computations to
calculate det A. However, the many zeroes appearing in the first column can be used to make
things easier. We obtain

det A = 3 · det ( 2  −5   7   3 )
                ( 0   1   5   0 )
                ( 0   2   4  −1 )
                ( 0   0  −2   0 )

      = 3 · 2 · det ( 1   5   0 )
                    ( 2   4  −1 )
                    ( 0  −2   0 )

      = 3 · 2 · (−1) · (−1) · det ( 1   5 )
                                  ( 0  −2 )

      = 3 · 2 · (−2) = −12.

Definition 2.27. An m × n matrix is said to be upper triangular if all of the entries below
the main diagonal are zero. That is, A is upper triangular if aij = 0 for all i > j. A matrix
is lower triangular if aij = 0 for all i < j. A matrix is triangular if it is either upper
triangular or lower triangular.

Lemma 2.28. Let A = (aij ) be an n × n triangular matrix. Then det A = a11 · a22 · · · ann .

Proof. This is left as an exercise.

Exercise - Let A ∈ Rn×n . Show that det(AT ) = det A.


The next result concerns how row operations change the determinant of a matrix.

Lemma 2.29. Let A ∈ Rn×n with n ≥ 2. Then,

1. If the matrix B is obtained by multiplying one row of A by a scalar λ ∈ R, then


det B = λ det A. In particular, det(λA) = λn det(A).

2. If the matrix B is obtained by interchanging two rows of A, then det B = − det A.

3. If the matrix B is obtained from A by adding a multiple of one row of A to another


row, then det B = det A.

Proof. 1. Suppose that B is obtained from A by dilating the ith row by λ. So

bij = λaij , ∀ 1 ≤ j ≤ n

and
bkj = akj , ∀1 ≤ k, j ≤ n such that k ̸= i.

In particular, it follows that Aij = Bij for all 1 ≤ j ≤ n. The determinant det B can
be calculated by expanding along the ith row. We obtain
det B = ∑_{j=1}^{n} (−1)^{i+j} bij det Bij
      = ∑_{j=1}^{n} (−1)^{i+j} λ aij det Bij
      = ∑_{j=1}^{n} (−1)^{i+j} λ aij det Aij
      = λ ∑_{j=1}^{n} (−1)^{i+j} aij det Aij
      = λ det A.

2. Proof by induction on n. The base case n = 2 can be verified directly. Suppose that

A = ( a   b )
    ( c   d )

and

B = ( c   d ) .
    ( a   b )

Then
det A = ad − bc = −(cb − da) = − det B.
Now let n ≥ 3 and suppose that the result holds for dimension (n−1)×(n−1) matrices.
Choose a row of B that is the same as the corresponding row of A. Since there are at
least 3 rows, and only two rows change, such an unchanged row is guaranteed to exist.
The determinant of B can be calculated by expanding along this row (let us say that
the ith row is the same in B and A). We obtain
det B = ∑_{j=1}^{n} (−1)^{i+j} bij det Bij = ∑_{j=1}^{n} (−1)^{i+j} aij det Bij .

Now observe that the matrix Bij ∈ R(n−1)×(n−1) is obtained from the matrix Aij by
interchanging two rows. Therefore, by the induction hypothesis, det Bij = − det Aij .
Finally,
det B = ∑_{j=1}^{n} (−1)^{i+j} aij det Bij = − ∑_{j=1}^{n} (−1)^{i+j} aij det Aij = − det A.

3. A similar proof by induction argument to the one used for part 2 of this lemma can be
used here. This is left as an exercise.

Observe that the previous result immediately implies the following handy consequence.

Corollary 2.30. Let A ∈ Rn×n and suppose that B can be obtained from A by row operations.
Then
det A = 0 ⇐⇒ det B = 0.

Lemma 2.29 can be useful for calculating determinants. The idea is simply to reduce a given
matrix to row echelon form (which is a triangular matrix), and to then use Lemma 2.28 to
quickly calculate the determinant of the echelon form matrix. We need to keep track of the
row operations we carry out, and include this factor in our calculation.
Example - Consider the matrix

( 1  −4   2 )
(−2   8  −9 ) .
(−1   7   0 )

By Lemma 2.29,

det ( 1  −4   2 )       ( 1  −4   2 )         ( 1  −4   2 )
    (−2   8  −9 ) = det ( 0   0  −5 ) = − det ( 0   3   2 ) = (−1) · 1 · 3 · (−5) = 15.
    (−1   7   0 )       ( 0   3   2 )         ( 0   0  −5 )

In the first step we only performed the operation of adding a multiple of one row to another
row (R2 = R2 +2R1 and R3 = R3 +R1 ), so we did not need to introduce a multiplicative factor.
In the second step, we switched the sign because we applied a row interchange operation.
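Lemma 2.29 also yields an efficient way to compute determinants, as the example above illustrates by hand. The following Python sketch (the name `det_by_elimination` is ours) reduces the matrix to upper triangular form while tracking sign changes from row interchanges, then applies Lemma 2.28; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def det_by_elimination(A):
    """Compute det A by reducing A to upper triangular form, tracking how
    each row operation changes the determinant (Lemma 2.29), then taking
    the product of the diagonal entries (Lemma 2.28)."""
    A = [[Fraction(x) for x in row] for row in A]
    n = len(A)
    sign = 1
    for col in range(n):
        # find a row at or below the diagonal with a nonzero pivot
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)              # no pivot: the determinant is 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign                    # a row interchange flips the sign
        for r in range(col + 1, n):
            # adding a multiple of one row to another leaves det unchanged
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= A[i][i]
    return result

print(det_by_elimination([[1, -4, 2], [-2, 8, -9], [-1, 7, 0]]))  # 15
```

Unlike the recursive definition, this runs in O(n³) time, which is why it is the method of choice in practice.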
The determinant is of special interest, because it can be used to characterise if a matrix
is invertible and if a linear system is uniquely solvable. Recalling Theorem 2.16, the latter
property is equivalent to the corresponding matrix having full rank.

Lemma 2.31. Let A ∈ Rn×n . Then

det A ̸= 0 ⇐⇒ rank(A) = n.

Proof. By Theorem 2.13, A can be transformed into reduced row echelon form C by perform-
ing row operations.
(⇒) We will prove the contrapositive form. Suppose that rank(A) ̸= n. Then, by definition,
C has a zero row, and in particular, one of the diagonal entries of C is equal to zero. On
the other hand, C is an upper triangular matrix, and so Lemma 2.28 implies that det C = 0.
Since C is obtained from A via row operations, Corollary 2.30 then implies that det A = 0.
(⇐) Suppose that rank(A) = n. Then by Lemma 2.15, C = In . Thus det C = 1, and
Corollary 2.30 then implies that det A ̸= 0.

Lemma 2.32. Let A ∈ Rn×n . Then

A is invertible ⇐⇒ rank(A) = n.

Proof. By Theorem 2.16,

rank(A) = n ⇐⇒ ∀ b ∈ Rn , there exists a unique x ∈ Rn such that Ax = b. (57)

Matrix multiplication can also be considered as a function on vectors. Indeed, recall that we
defined the linear transformation TA : Rn → Rn by

TA (x) = Ax.

The right hand side of the equivalence (57) is equivalent to the function TA being bijective
(for all b ∈ Rn , there exists exactly one x ∈ Rn such that TA (x) = b). Therefore, by Theorem
1.13 and Theorem 2.24, we conclude that

rank(A) = n ⇐⇒ ∀ b ∈ Rn , there exists a unique x ∈ Rn such that Ax = b


⇐⇒ TA is a bijection
⇐⇒ TA is invertible
⇐⇒ A is invertible.

Combining the previous two lemmas, we have the following important theorem relating de-
terminants and invertibility.
Theorem 2.33. Let A ∈ Rn×n . Then

A is invertible ⇐⇒ det A ̸= 0.

Moreover, we can combine several of the statements we have recently proved into one big
statement which shows that several important properties of square matrices are equivalent.
Theorem 2.34. Let A ∈ Rn×n . Then the following statements are equivalent.

1. A is invertible.
2. det A ̸= 0.
3. rank(A) = n.
4. The linear system Ax = b has a unique solution for any b ∈ Rn .
5. A is equivalent to In .

Example - We computed earlier that

| 3  −7   8   9  −6 |
| 0   2  −5   7   3 |
| 0   0   1   5   0 | = −12.
| 0   0   2   4  −1 |
| 0   0   0  −2   0 |

It therefore follows from Theorem 2.33 that the matrix

( 3  −7   8   9  −6 )
( 0   2  −5   7   3 )
( 0   0   1   5   0 )
( 0   0   2   4  −1 )
( 0   0   0  −2   0 )

is invertible.
The next statement says that the determinant of the product of two matrices is equal to the
product of the determinants.

Theorem 2.35. Let A, B ∈ Rn×n . Then

det(AB) = det(A) · det(B).

We will first prove this theorem in the special case when one of the matrices is elementary.

Lemma 2.36. Let B ∈ Rn×n and let E ∈ Rn×n be an elementary matrix. Then

det(EB) = det(E) · det(B).

Proof. There are three cases to consider, corresponding to the three row operations that E
may represent.

1. Suppose that E is an elementary matrix corresponding to row interchange. Then, since


E is obtained from In by interchanging two rows and det(In ) = 1, it follows from
Lemma 2.29 that det(E) = −1. On the other hand, the matrix EB is obtained from B
by interchanging two rows, and so Lemma 2.29 again implies that det(EB) = − det B =
(det E)(det B).

2. Suppose that E is an elementary matrix corresponding to row dilation. The proof is


similar to the first case, and is left as an exercise.

3. Suppose that E is an elementary matrix corresponding to adding a multiple of one row


to another row. The proof is similar to the first case, and is left as an exercise.

Proof of Theorem 2.35. Case 1 - Suppose that either det(A) = 0 or det(B) = 0. We consider
the case when det(A) = 0 in detail only, as the case when det(B) = 0 can be handled similarly.
It follows from Lemma 2.31 that rank(A) < n. Part 1 of Theorem 2.17 then implies that
there is some vector b ∈ Rn such that Ax = b has no solutions.
Suppose for a contradiction that det(AB) ̸= 0. Then, by Theorem 2.34, there exists y ∈ Rn
such that (AB)y = b, where b is the same vector as in the previous paragraph. This is a
contradiction, since if we write By = x, we see that

b = (AB)y = A(By) = Ax,

contradicting the fact that Ax = b has no solutions.


Case 2 - Suppose that det(A) and det(B) are both non-zero. Theorem 2.34 then implies
that rank(A) = rank(B) = n. Then, by Lemma 2.15, it follows that both A and B can be
reduced to In by elementary row operations.

Since A can be reduced to In by a sequence of row operations, this process can be reversed
so that In is transformed to A by a sequence of row operations. It follows that there is a
sequence of elementary matrices E1 , . . . , Ej such that

A = Ej · · · E1 In = Ej · · · E1 .

Similarly,
B = Fk · · · F1 ,
where F1 , . . . , Fk are elementary matrices.
Repeated applications of Lemma 2.36 imply that

det(A) = det(Ej · · · E1 )
       = det(Ej ) det(Ej−1 · · · E1 )
       ...
       = det(Ej ) · · · det(E1 ). (58)

Similarly,
det(B) = det(Fk ) · · · det(F1 ). (59)
Finally,

det(AB) = det(Ej · · · E1 Fk · · · F1 )
        = det(Ej ) det(Ej−1 · · · E1 Fk · · · F1 )
        ...
        = det(Ej ) · · · det(E1 ) · det(Fk ) · · · det(F1 )
        = det(A) · det(B).

In the final step, we have used (58) and (59).

Example - Consider the matrix

A = ( 15   10   24 )
    ( 15   22   12 ) .
    ( 12    4   33 )

It looks like it might be tricky to compute det A, at least without a calculator. We are,
fortunately, given the information that

A = ( 1   2   4 ) ( 1   2   4 )
    ( 3   4   0 ) ( 3   4   0 ) .
    ( 2   0   5 ) ( 2   0   5 )

We can therefore deduce from Theorem 2.35 that


 2
1 2 4
det A =  3 4 0  .
2 0 5

It remains to compute the easier determinant, which we do here by expanding along the final
row:
| 1   2   4 |
| 3   4   0 | = 2 · | 2   4 | + 5 · | 1   2 | = 2 · (−16) + 5 · (−2) = −42.
| 2   0   5 |       | 4   0 |       | 3   4 |

Therefore,
det A = (−42)^2 = 1764.
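Theorem 2.35 and this example can be checked numerically. The sketch below (the helper names `det3` and `matmul` are ours) multiplies the given factor by itself and confirms that the determinant of the product equals the square of the factor's determinant.

```python
def det3(M):
    """Determinant of a 3x3 matrix via cofactor expansion along the first row."""
    a, b, c = M[0]
    return (a * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - b * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + c * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def matmul(X, Y):
    """Product of two 3x3 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

M = [[1, 2, 4], [3, 4, 0], [2, 0, 5]]
A = matmul(M, M)
print(A)        # [[15, 10, 24], [15, 22, 12], [12, 4, 33]]
print(det3(M))  # -42
print(det3(A))  # 1764, which equals (-42) ** 2, as Theorem 2.35 predicts
```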

We can also use Theorem 2.35, combined with Theorem 2.34, to quickly determine whether
a product of two matrices is invertible.

Theorem 2.37. Let A, B ∈ Rn×n . Then

AB is invertible ⇐⇒ A and B are both invertible.

Proof.

AB is invertible ⇐⇒ det(AB) ̸= 0
⇐⇒ (det A) · (det B) ̸= 0
⇐⇒ det A ̸= 0 and det B ̸= 0
⇐⇒ A and B are both invertible.

Now that we know that the determinant “behaves well” with respect to transposition and
multiplication, one might guess that a similar relation also holds for addition. However, this
is not the case: there is no similar formula for the determinant of a sum of matrices, as the
following simple example shows.
Example - Consider the matrices

A = ( 1   0 )    and    B = ( 0   0 ) ,
    ( 0   0 )               ( 0   1 )

and observe that A + B = I2 . Since A and B both contain a zero row, it follows that

det A = 0 = det B.

Since det(A + B) = det I2 = 1, it follows that

det(A + B) ̸= det A + det B.

Moreover, this example also shows that an analogue of Theorem 2.37, with sums instead of
products, is not true.

2.6 Inverse matrices

We now take a closer look at inverse matrices. One of the main motivations for working with
the inverse matrix is that it can be used to solve linear systems in a straightforward way.
Indeed, since the inverse matrix A−1 satisfies AA−1 = A−1 A = In , we have that

Ax = b ⇐⇒ x = A−1 b. (60)

In the previous section, we saw how the determinant can be used to quickly determine whether
or not a matrix is invertible. However, so far, we do not have a method for calculating what
the inverse is. Developing such a method will be the focus of this section. There are different
methods for calculating the inverse of a matrix, and we choose a method which is an extension
of the Gaussian elimination techniques we learned earlier in this chapter.
Suppose that we have a matrix A ∈ Rn×n that we know is invertible. Let ci denote the ith
column of A−1 , so
A−1 = (c1 . . . cn ).
Recall that we discussed earlier (see the discussion before Lemma 2.2) that unit vectors can
be used to extract columns from matrices. In particular,

ci = A−1 ei .

Then, following the logic of (60) (in other words, left-multiplying both sides of the above
equation by A), we have
Aci = ei .
This shows that calculating ci is equivalent to solving the linear system Ax = ei . In section
2.3, we learnt how to do this by considering the augmented matrix (A|ei ), and using Gaussian
elimination to reduce this to (In |x). The right hand side of the resulting augmented matrix
is then a solution to the system Ax = ei .
To use this method to calculate the inverse A−1 , we must repeat this procedure for each
column of A. However, we can apply the Gaussian elimination procedure to several vectors
at once! Hence, we can compute all columns of A−1 at once by computing the reduced row
echelon form of the augmented matrix

( a11  . . .  a1n | 1  . . .  0 )
(  ..         ..  | ..       .. ) .
( an1  . . .  ann | 0  . . .  1 )

If A is invertible then the reduced row echelon form of A is the identity matrix In (see
Theorem 2.34). Thus, by using Gaussian elimination, we are able to compute

(A|In ) → (In |A−1 ).

We consolidate the discussion above in the following theorem.


Theorem 2.38. Let A ∈ Rn×n be an invertible matrix. Then, the reduced row echelon form
of

( a11  . . .  a1n | 1  . . .  0 )
(  ..         ..  | ..       .. )
( an1  . . .  ann | 0  . . .  1 )

has the form

( 1  . . .  0 | a′11  . . .  a′1n )
( ..       .. |  ..          ..   ) .
( 0  . . .  1 | a′n1  . . .  a′nn )
The matrix on the right hand side of the second augmented matrix above is the inverse of A.
That is,
A−1 = (a′ij )ni,j=1 .

Example - We want to compute the inverse of

A = ( 1   2 ) .
    ( 3   4 )
By Gaussian elimination

( 1   2 | 1   0 )  −−R2 = R2 − 3R1−−→  ( 1   2 |  1   0 )
( 3   4 | 0   1 )                      ( 0  −2 | −3   1 )

                   −−R1 = R1 + R2−−→   ( 1   0 | −2   1 )
                                       ( 0  −2 | −3   1 )

                   −−R2 = (−1/2)R2−−→  ( 1   0 | −2     1   ) .
                                       ( 0   1 | 3/2  −1/2  )

It therefore follows from Theorem 2.38 that

A^{−1} = ( −2     1   ) .        (61)
         ( 3/2  −1/2  )
Note that this conclusion agrees with the claim made on page 79, when we stated without
proof that this matrix A is invertible with the inverse matrix as in (61).
There are many opportunities for miscalculations throughout this process, and it is a good
idea to check that your solution is correct by checking that AA−1 = In .
Example - We want to compute the inverse of

A = ( 1   0   2 )
    ( 4   1   8 ) .
    ( 0   1   1 )

By Gaussian elimination

( 1   0   2 | 1   0   0 )  −−R2 = R2 − 4R1−−→  ( 1   0   2 |  1   0   0 )
( 4   1   8 | 0   1   0 )                      ( 0   1   0 | −4   1   0 )
( 0   1   1 | 0   0   1 )                      ( 0   1   1 |  0   0   1 )

                           −−R3 = R3 − R2−−→   ( 1   0   2 |  1   0   0 )
                                               ( 0   1   0 | −4   1   0 )
                                               ( 0   0   1 |  4  −1   1 )

                           −−R1 = R1 − 2R3−−→  ( 1   0   0 | −7   2  −2 )
                                               ( 0   1   0 | −4   1   0 ) .
                                               ( 0   0   1 |  4  −1   1 )

It therefore follows from Theorem 2.38 that

A^{−1} = ( −7   2  −2 )
         ( −4   1   0 ) .
         (  4  −1   1 )
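The (A|In ) → (In |A^{−1}) procedure is mechanical enough to automate. Below is a Python sketch of Gauss-Jordan inversion (the function name `inverse` is ours); it uses exact rational arithmetic and reproduces the inverse computed above.

```python
from fractions import Fraction

def inverse(A):
    """Invert A by reducing the augmented matrix (A | I) to (I | A^{-1}),
    as in Theorem 2.38. Raises ValueError for singular input."""
    n = len(A)
    # build the augmented matrix (A | I) with exact rational entries
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]   # scale the pivot row to 1
        for r in range(n):                           # clear the rest of the column
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                    # right-hand block is A^{-1}

A = [[1, 0, 2], [4, 1, 8], [0, 1, 1]]
inv = inverse(A)
print([[int(x) for x in row] for row in inv])  # [[-7, 2, -2], [-4, 1, 0], [4, -1, 1]]
```

As the notes suggest, it is cheap to verify the result by checking that the product with A gives the identity matrix.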

3 Sequences and Series

This chapter is dedicated to formalising the idea of the limiting processes. It forms one of
the central ideas of mathematical analysis and defines the basis for essential concepts like
continuity, differentiability, integration etc.
For example, consider the following infinite sum of powers of 2:

1 + 1/2 + 1/4 + 1/8 + · · · = ∑_{k=0}^{∞} 2^{−k} .

As we add more and more terms (i.e. as k gets larger and larger), we get closer and closer
to 2, and we can get as close as we want (i.e. arbitrarily close) by adding enough terms. It
is intuitive to suggest that
∑_{k=0}^{∞} 2^{−k} = 2.

We will develop a framework to consider such infinite sums more formally in this chapter.
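The claim that the partial sums get arbitrarily close to 2 can already be observed numerically. A minimal Python sketch:

```python
# partial sums s_n = sum_{k=0}^{n} 2^{-k}; they approach 2 from below,
# and the gap to the limit halves at every step: 2 - s_n = 2^{-n}
s = 0.0
for k in range(20):
    s += 2.0 ** -k
print(s)      # 1.9999980926513672
print(2 - s)  # 1.9073486328125e-06, which is exactly 2 ** -19
```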
We begin with the definition of a sequence.

Definition 3.1. Let M ̸= ∅ be an arbitrary set, and let I ⊂ Z be an infinite set. A sequence
in M is a mapping a : I → M . With the notation an := a(n), we can write the sequence as
(an )n∈I .
The range of a sequence (an )n∈I is the set {an : n ∈ I}. The domain I of a sequence is
called the index set of the sequence.
In most cases, we consider I = N or I = {K, K + 1, . . . } for some K ∈ Z. In the latter case,
we write the sequence as (an )n∈I = (an )_{n=K}^{∞} . If the index set is clear, we may just write (an )
for (an )n∈I .

In the special cases M = R or M = C we say that (an )n∈I is a real-valued or complex-


valued sequence, respectively. In this course, we will almost always deal with real-valued or
complex-valued sequences.
To define a sequence, the most common way is to use an explicit formula, for instance

an = 2n (62)

or
1
bn = 1 + .
n
We can also define a sequence by recursion. This means that we give one (or more) starting
value(s) and a rule for how to calculate a new term using previous terms. For example, we
can set a1 = 2 and define ai = 2ai−1 for all i ≥ 2. This is another description of the sequence
(62).
Example - Consider the sequence (an )n∈N given by an = 1/n . We can also represent this
sequence by listing its elements, that is, this sequence is the same thing as the list

1, 1/2, 1/3, . . . .

We can immediately observe that the terms of this sequence get very close to 0 as n gets
large, but never reach zero. We say that the sequence converges to 0. We will give a formal
definition of what this means in the next section.
Example - One of the most famous sequences, which appears in several areas of natural
science, is the Fibonacci sequence. Here, the recursion depends on the previous two
values. The sequence (Fn )n∈N is defined by

F1 = 1, F2 = 1, and Fn = Fn−1 + Fn−2 for n ≥ 3.

The first values of this sequence are 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . . . It is an interesting
phenomenon that the quotients Fn+1 /Fn converge to the golden ratio (1 + √5)/2 . (See e.g.
Wikipedia for background on the importance of this constant).
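The convergence of the quotients can be seen numerically. A small Python sketch (the iteration count 40 is an arbitrary choice of ours):

```python
# ratios F_{n+1}/F_n of consecutive Fibonacci numbers approach the golden ratio
phi = (1 + 5 ** 0.5) / 2   # (1 + sqrt(5))/2 ≈ 1.618033988749895
a, b = 1, 1                # F_1, F_2
for _ in range(40):
    a, b = b, a + b        # the recursion F_n = F_{n-1} + F_{n-2}
print(b / a)               # agrees with phi to double precision
print(phi)
```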
Example - Given a sequence an , we can consider the new sequence sn = ∑_{k=1}^{n} ak . We
would like to know if sn approaches a certain number when n goes to infinity. These special
sequences are called series, and we come back to this later in the chapter.

3.1 Convergence of sequences

The concept of convergence is central to mathematical analysis. Intuitively, it states that the
terms of the sequence (an )n∈N approach a limit with growing index n.


To define what it means to converge to something, the notion of a neighbourhood may be


useful.

Definition 3.2. Let M = R or M = C, a ∈ M and let ϵ > 0 be a real number. We define


the ϵ-neighbourhood of a in M by

Uϵ (a) := {x ∈ M : |x − a| < ϵ}.

Note that, for M = R, the ϵ-neighbourhood Uϵ (a) is just the open interval (a − ϵ, a + ϵ). For
M = C, Uϵ (a) is the disc of radius ϵ centred at a in the complex plane C.

One can also consider neighbourhoods in a much more general situation. All we need is a
notion of distance. For instance, one may define the ϵ-neighbourhood of a point x ∈ Rd . We
may consider such generalisations later in the course.
We now come to the formal definition of convergence to a limit a.

Definition 3.3. Let (an )n∈N be a complex-valued sequence and a ∈ C. We say that the
sequence (an )n∈N converges to a if and only if

∀ ϵ > 0, ∃ n0 ∈ N : n ≥ n0 =⇒ |an − a| < ϵ.

An equivalent definition is

∀ ϵ > 0, ∃ n0 ∈ N : n ≥ n0 =⇒ an ∈ Uϵ (a).

We call a the limit of the sequence and write

a = lim_{n→∞} an

which is sometimes abbreviated to


an → a.
The sequence (an )n∈N is called convergent if there exists some a ∈ C such that an → a.
Otherwise, the sequence (an )n∈N is called divergent.

This definition may appear intimidating to some readers on first viewing. Let’s try to draw
a picture to illustrate the meaning of the definition.

[Figure: the terms an plotted against n; from the index n0 onwards, every term lies inside
the band between a − ϵ and a + ϵ.]

What this definition says is that, no matter how tiny we choose ϵ to be, the sequence will
always eventually stay within a distance ϵ of the limit a.
Note that the limit does not depend on the first terms of a sequence. In particular, we can
always disregard finitely many elements of a sequence when considering its limiting behaviour.
For instance, if (an )n∈N and (bn )n∈N are two sequences such that an = bn holds for all but
finitely many values of n, and limn→∞ an = a, then limn→∞ bn = a.
Let us consider some examples.

Example - Consider again the sequence (an )n∈N with an = 1/n . For all ϵ > 0 we can find
some n0 ∈ N such that 1/n0 < ϵ. This is the Archimedean property (in particular, see Theorem
1.39). Since 1/n ≤ 1/n0 for n ≥ n0 , we obtain

|an − 0| = 1/n ≤ 1/n0 < ϵ

for all n ≥ n0 . Therefore, an → 0.
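The quantifier structure of Definition 3.3 can be made concrete for an = 1/n: for each ϵ we exhibit a threshold n0. A small Python sketch, assuming the Archimedean choice n0 = ⌊1/ϵ⌋ + 1 (the helper name `n0_for` is our own):

```python
import math

def n0_for(eps):
    """Smallest n0 in N with 1/n0 < eps, witnessing the convergence of 1/n to 0."""
    return math.floor(1 / eps) + 1

for eps in [0.5, 0.1, 0.001]:
    n0 = n0_for(eps)
    # every term from index n0 onwards lies in the eps-neighbourhood of 0
    assert all(abs(1 / n - 0) < eps for n in range(n0, n0 + 1000))
    print(eps, n0)
```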
Example - Consider the sequence (an )n∈N with an = (−1)n , i.e. the sequence which alter-
nates between 1 and −1. This sequence is divergent. For a proof we assume the opposite,
i.e., that (an ) converges to some a ∈ R. Now, by the definition of convergence, we have that
there exists some n0 such that

an ∈ U1/2 (a) for all n ≥ n0

and thus we have by the triangle inequality,

|an+1 − an | = |an+1 − a + a − an | ≤ |an+1 − a| + |a − an | < 1/2 + 1/2 < 2        (63)

for all n ≥ n0 . (Note that the value 1/2 is quite arbitrary here, and any choice of ϵ < 1 would
work.) However, it is also true for all n ∈ N that

|an+1 − an | = 2.

But this contradicts (63), and completes the proof.

Definition 3.4. Let (an )n∈N be a real sequence such that limn→∞ an = 0. Then we call
(an )n∈N a null sequence.

Example - The sequences (1/n)n∈N and (2^{−n} )n∈N are null sequences.

Exercise - Show that, for any c > 0, the sequence (1/n^c )n∈N is a null sequence.


We now consider the notion of boundedness. This is very similar to the notion of boundedness
for sets which we considered in Section 1.6.

Definition 3.5. Let (an )n∈N be a complex valued sequence. We call the sequence bounded
(by C) if and only if
∃ C > 0 : ∀ n ∈ N, |an | ≤ C.
A real-valued sequence (an )n∈N is bounded from above if and only if

∃ C ∈ R : ∀ n ∈ N, an ≤ C,

and bounded from below if and only if

∃ C ∈ R : ∀ n ∈ N, an ≥ C.

Examples - The sequences

((−1)^n )n∈N    and    (42/n)n∈N

are bounded (by 1 and 42 respectively). We also have that ((−1)^n · 42/n)n∈N is bounded (by
42). Actually, we easily see from the definition that the (term-wise) product of two bounded
sequences is bounded. The triangle inequality shows that the sum of two bounded sequences
is also bounded.
Let θ ∈ [0, 2π) be arbitrary and consider the sequence

(e^{niθ} )n∈N .

This sequence is bounded by 1.
The sequence

(√n)n∈N

is bounded from below by 0 and is not bounded from above. What about the sequence

((−1)^{F (n)} n)n∈N ,

where F (n) is the nth term in the Fibonacci sequence?


Next, we make an observation concerning the relationship between the convergence and
boundedness of a sequence.

Theorem 3.6. Let (an )n∈N be a convergent sequence. Then (an )n∈N is bounded.

Proof. Let a = limn→∞ an . By the definition of convergence (with ϵ = 1), there exists n0
such that, for all n ≥ n0 ,
|a − an | < 1.
It follows from the triangle inequality that

|an | = |(an − a) + a| ≤ |an − a| + |a| < 1 + |a|

holds for all n ≥ n0 . It therefore follows that, for all n ∈ N,

|an | ≤ max{1 + |a|, |a1 |, |a2 |, . . . , |an0 −1 |}.

Note that the sequence (an )n∈N given by an = (−1)n is bounded by 1 but not convergent.
Therefore, the opposite implication for the theorem above does not hold in general.
Example - Consider the sequence (an )n∈N with an = log2 n. This sequence is unbounded,
and it therefore follows from Theorem 3.6 that the sequence is divergent.
In the previous example, we have seen a sequence that appears to “converge to infinity”. We
now introduce the terminology of definite divergence of a real-valued sequence in order to
give a formal description for this kind of situation.

Definition 3.7. Let (an )n∈N be a real-valued sequence. The sequence (an )n∈N tends to ∞ if
and only if
∀ C ∈ R, ∃ n0 ∈ N : ∀ n ≥ n0 , an ≥ C.
In this case, we write limn→∞ an = ∞ or an → ∞ and call ∞ the improper limit of
(an )n∈N .
The sequence (an )n∈N tends to −∞ if and only if

∀ C ∈ R, ∃ n0 ∈ N : ∀ n ≥ n0 , an ≤ C.

In this case, we write limn→∞ an = −∞ or an → −∞ and call −∞ the improper limit of


(an )n∈N .
If the sequence (an )n∈N tends to ∞ or −∞, it is called definitely divergent.

Note that definitely divergent sequences are necessarily unbounded. Moreover, we do not
have such a concept for complex-valued sequences, as we do not have an order on C.

3.2 Calculation rules for limits

We now study how to determine the limit of more complicated sequences. This always follows
the same procedure; either we already know the limit of the sequence under consideration,
or one has to split up the sequence into easier parts that can be handled, or split again.
The following result helps us to reduce the task of determining a limit to several smaller and
hopefully easier limits.

Theorem 3.8. Let (an )n∈N and (bn )n∈N be convergent complex-valued sequences and let
λ ∈ C. Let a = limn→∞ an and b = limn→∞ bn . Then, we have

(i) limn→∞ (an + bn ) = a + b,

(ii) limn→∞ (λ · an ) = λ · a,

(iii) limn→∞ (an · bn ) = a · b,

(iv) if b ̸= 0 and bn ̸= 0 for all n ∈ N, then


lim_{n→∞} an /bn = a/b .

Proof. (i) Let ϵ > 0 be arbitrary. By the definition of convergence, there exist m0 , n0 ∈ N
such that
n ≥ m0 =⇒ |an − a| < ϵ/2

and

n ≥ n0 =⇒ |bn − b| < ϵ/2 .

In particular, for all n ≥ max{m0 , n0 }, we have |an − a|, |bn − b| < ϵ/2. Then, by the
triangle inequality

|(an + bn ) − (a + b)| = |(an − a) + (bn − b)| ≤ |an − a| + |bn − b| < ϵ/2 + ϵ/2 = ϵ
holds for all n ≥ max{m0 , n0 }.

(ii) Let ϵ > 0 be arbitrary. If λ = 0 the statement is trivial, so assume λ ̸= 0. By the
definition of convergence, there exists n0 ∈ N such that

n ≥ n0 =⇒ |an − a| < ϵ/|λ| .

Therefore, for all n ≥ n0 ,

|λan − λa| = |λ(an − a)| = |λ| |an − a| < |λ| · ϵ/|λ| = ϵ.

(iii) Since (bn ) is a convergent sequence, it follows from Theorem 3.6 that there is some
C > 0 such that
|bn | ≤ C
holds for all n ∈ N.

Let ϵ > 0 be arbitrary, and assume a ̸= 0 (if a = 0, the second term in the bound
below vanishes and the argument simplifies). By the definition of convergence, there
exist m0 , n0 ∈ N such that

n ≥ m0 =⇒ |an − a| < ϵ/(2C)

and

n ≥ n0 =⇒ |bn − b| < ϵ/(2|a|) .

In particular, for all n ≥ max{m0 , n0 }, we have both |an − a| < ϵ/(2C) and
|bn − b| < ϵ/(2|a|). Then, by the triangle inequality,

|an bn − ab| = |an bn − abn + abn − ab| ≤ |an bn − abn | + |abn − ab|
             = |bn | |an − a| + |a| |bn − b|
             ≤ C |an − a| + |a| |bn − b|
             < C · ϵ/(2C) + |a| · ϵ/(2|a|) = ϵ.

(iv) We will show that


lim_{n→∞} 1/bn = 1/b .        (64)

Once this has been proven, part (iv) of the Theorem follows by an application of part
(iii). It remains to prove (64).
Let ϵ > 0 be arbitrary. There exists n0 ∈ N such that, for all n ≥ n0 ,

|bn − b| < (1/2) min{|b|, ϵ|b|^2 }.

It follows from Theorem 1.50 and |bn − b| < (1/2)|b| that

|bn | > |b|/2 ,    which is equivalent to    1/|bn | < 2/|b| ,

holds for all n ≥ n0 . Therefore, for all n ≥ n0

|1/bn − 1/b| = |b − bn |/|b bn | = (1/|b|) · (1/|bn |) · |b − bn | < (1/|b|) · (1/|bn |) · (1/2) ϵ|b|^2
             < (1/|b|) · (2/|b|) · (1/2) ϵ|b|^2 = ϵ.

Example - We can use the previous theorem to calculate the limit of the sequence (an )n∈N
given by an = 1 + π/n . Write an = bn + πcn with

bn = 1, ∀ n ∈ N    and    cn = 1/n .

Note that both of the sequences (bn ) and (cn ) are convergent, with limits 1 and 0 respectively.
By applying the first and second points of Theorem 3.8, it follows that

lim_{n→∞} an = lim_{n→∞} bn + π · lim_{n→∞} cn = 1 + π · 0 = 1.

Example - Consider the sequence (an )n∈N given by

an = (3n^2 + 4n + 100) / (7n^2 + 13n + √n) .

We can use Theorem 3.8 to calculate the value of limn→∞ an . Divide the numerator and
denominator by n^2 to express an as

an = (3 + 4/n + 100/n^2 ) / (7 + 13/n + 1/n^{3/2} ) .

Let

bn := 3 + 4/n + 100/n^2 ,    cn := 7 + 13/n + 1/n^{3/2} .

It follows from (parts (i) and (ii) of) Theorem 3.8 that the sequences (bn )n∈N and (cn )n∈N
are convergent with

lim_{n→∞} bn = 3,    lim_{n→∞} cn = 7.

It then follows from Theorem 3.8(iv) that

lim_{n→∞} an = lim_{n→∞} bn /cn = 3/7 .
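The limit 3/7 just derived can be checked numerically; a short Python sketch (the function name `a` simply mirrors the sequence, and the sample indices are our own choice):

```python
def a(n):
    """The n-th term of the sequence from the example above."""
    return (3 * n ** 2 + 4 * n + 100) / (7 * n ** 2 + 13 * n + n ** 0.5)

# the terms approach 3/7 = 0.42857142... as n grows
for n in [10, 1000, 10 ** 6]:
    print(n, a(n))
print(3 / 7)
```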

In order to make more use of Theorem 3.8, we need to build up a bigger collection of simple
sequences for which we know that they are convergent and what they converge to. Let us
start with a lemma that shows how to verify that a sequence is a null sequence by comparison
with another null sequence.

Lemma 3.9. Let (an )n∈N be a complex-valued null sequence and let c, C > 0 be arbitrary
positive real numbers. Suppose that the sequence (bn )n∈N satisfies

|bn | ≤ C|an |c

for all but finitely many n ∈ N. Then (bn )n∈N is a null sequence.

Proof. This is left as an exercise.

Example - Consider the sequence (bn )n∈N given by

bn = 10000 / n^{0.00001} .

It follows from Lemma 3.9 that (bn ) is a null sequence. Indeed,

bn = 10000 |an |^{0.00001} ,

where an = 1/n and (an )n∈N is a null sequence.
Let us now consider other important “building blocks”, i.e., limits that may be considered
known from now on, together with the corresponding proofs. The first example is concerned
with powers of small complex bases.

Lemma 3.10. Let z ∈ C with |z| < 1. Then

lim_{n→∞} z^n = 0.

Proof. If z = 0 the statement is trivial, so assume z ̸= 0 and let

x := 1/|z| − 1.

Observe that x is a positive real number. Bernoulli’s Inequality (or even the Binomial The-
orem, see Corollary 1.43) implies that (1 + x)^n ≥ 1 + nx holds for all n ∈ N. Therefore,

|z^n | = |z|^n = (1/(1 + x))^n = 1/(1 + x)^n ≤ 1/(1 + nx) < 1/(nx) = (1/x) · an ,

where an = 1/n . Note that (an )n∈N is a null sequence. An application of Lemma 3.9 (with
c = 1 and C = 1/x > 0) implies that (z^n )n∈N is also a null sequence.
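Lemma 3.10 is easy to observe numerically; the sketch below picks one complex z with |z| < 1 (the particular value is our own illustrative choice) and watches the moduli |z^n| shrink.

```python
# for |z| < 1 the moduli |z^n| = |z|^n decrease monotonically to 0
z = complex(0.6, 0.7)                      # |z| = sqrt(0.85) ≈ 0.922 < 1
moduli = [abs(z ** n) for n in range(1, 200, 50)]
print(moduli)                              # strictly decreasing towards 0
```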

Exercise - Let z ∈ C with |z| > 1. Show that the sequence (z n )n∈N is divergent. For what
z ∈ C with |z| = 1 is the sequence (z n )n∈N convergent?
The next limit will be a useful building block for establishing the convergence of other se-
quences later.

Lemma 3.11.

lim_{n→∞} n^{1/n} = 1.

Proof. Define

xn = n^{1/n} − 1.

We will show that (xn )n∈N is a null sequence. It then follows from Theorem 3.8(i) that

lim_{n→∞} n^{1/n} = lim_{n→∞} (xn + 1) = 0 + 1 = 1.

To show that (xn ) is a null sequence, we use Corollary 1.43 with m = 2 to deduce that

n = (1 + xn )^n ≥ 1 + (n(n − 1)/2) · xn^2 .

A rearrangement of this inequality gives

xn ≤ √2 · 1/√n .

It then follows from Lemma 3.9 that (xn ) is a null sequence.

We will now use Lemma 3.11 in combination with Theorem 3.8 to give a strengthened version
of the previous result in which we allow the argument of the root to grow even faster.

Lemma 3.12. Let k ∈ N. Then

lim_{n→∞} (n^k )^{1/n} = 1.

Proof. The proof is by induction on k. The base case k = 1 was established in Lemma 3.11. Now, suppose that the result holds for k. We need to prove that

    lim_{n→∞} an = 1,

where

    an = (n^{k+1})^{1/n}.

Let bn = n^{1/n} and cn = (n^k)^{1/n}, and observe that an = bn · cn. Also, by Lemma 3.11 and the induction hypothesis respectively, we have both

    lim_{n→∞} bn = 1   and   lim_{n→∞} cn = 1.

It therefore follows from Theorem 3.8(iii) that

    lim_{n→∞} an = (lim_{n→∞} bn) · (lim_{n→∞} cn) = 1 · 1 = 1.

The following result illustrates the fact that exponential growth is faster than polynomial
growth.

Lemma 3.13. Let z ∈ C with |z| > 1 and let k ∈ N be fixed. Then

    lim_{n→∞} n^k / z^n = 0.

Proof. Our plan for this proof is as follows: we will show that there is some constant C such that

    |n^k / z^n| ≤ C · (1/n)                                          (65)

holds for all but finitely many n ∈ N. It then follows from Lemma 3.9 that (n^k/z^n) is a null sequence. It remains to prove (65) for almost all n ∈ N.
Set x := |z| − 1 > 0 and suppose that n ≥ 2k, which is equivalent to n − k ≥ n/2. This assumption on n is harmless, since we are disregarding only finitely many values of n. Apply Corollary 1.43 with this x to obtain

    |z|^n = (1 + x)^n ≥ 1 + (n choose k+1) · x^{k+1}
          = 1 + (n(n − 1) · · · (n − k) / (k + 1)!) · x^{k+1}
          ≥ ((n/2)^{k+1} / (k + 1)!) · x^{k+1}
          = (n^{k+1} / ((k + 1)! · 2^{k+1})) · x^{k+1},

where in the third step we used that each of the k + 1 factors n, n − 1, . . . , n − k is at least n/2. It therefore follows that, for all n ≥ 2k,

    |n^k / z^n| = n^k / |z|^n ≤ n^k · ((k + 1)! · 2^{k+1}) / (x^{k+1} · n^{k+1}) = (1/n) · C,

where C = (k + 1)! · 2^{k+1} / x^{k+1} is an absolute constant (i.e. it is independent of n). This proves (65), and completes the proof of the lemma.
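Numerical check - Lemma 3.13 can be seen in action numerically. The following Python sketch (an illustration added to the notes; k = 3 and z = 1.1 are arbitrary choices) shows that n^k/z^n grows at first but eventually collapses towards 0.

```python
# Sketch: n^k / z^n for k = 3 and z = 1.1.  The ratio grows for small n
# (the polynomial wins early) but is eventually crushed by the exponential.
k, z = 3, 1.1
ns = [10, 100, 500, 1000]
values = [n ** k / z ** n for n in ns]

for n, v in zip(ns, values):
    print(f"n={n:5d}  n^k/z^n = {v:.3e}")
assert values[-1] < 1e-30    # essentially zero by n = 1000
```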

The next result gives a very helpful tool for the calculation of difficult limits. This one is
helpful when the sequence under consideration can be bounded from above and below by
sequences that converge to the same limit.
Theorem 3.14. (Sandwich Rule) Let (an )n∈N and (cn )n∈N be convergent real-valued se-
quences. Suppose that (bn )n∈N is a real-valued sequence and that there exists n0 ∈ N such
that
an ≤ bn ≤ cn ∀n ≥ n0 .
Suppose also that
lim an = L = lim cn .
n→∞ n→∞
Then (bn )n∈N is convergent and
lim bn = L.
n→∞

Proof. This is left as an exercise.

Our first application of the Sandwich Rule is to prove a variant of Lemma 3.11.

Lemma 3.15. Let a be a positive real number. Then

    lim_{n→∞} a^{1/n} = 1.

Proof. The lemma holds for trivial reasons if a = 1. For a > 1, observe that, for all n ≥ a,

    1 ≤ a^{1/n} ≤ n^{1/n}.

We have bounded the sequence (a^{1/n})n∈N from above and below by two sequences which both converge to 1 (by Lemma 3.11). It therefore follows from the Sandwich Rule that

    lim_{n→∞} a^{1/n} = 1.

For 0 < a < 1, let (bn)n∈N be the sequence given by

    bn := (1/a)^{1/n}.

Since 1/a > 1, it follows from what we have just proven that lim_{n→∞} bn = 1. In particular, for all ϵ > 0, there exists n0 such that, for all n ≥ n0,

    |bn − 1| < ϵ.

Therefore, since a^{1/n} · bn = 1 and a^{1/n} < 1,

    |a^{1/n} − 1| = a^{1/n} · |bn − 1| < |bn − 1| < ϵ.

Example - We give another application of the Sandwich Rule in this example. Let x, y > 0 be fixed real numbers and consider the sequence (bn)n∈N where

    bn = (x^n + y^n)^{1/n}.

We will show that

    lim_{n→∞} bn = max{x, y}.

We may assume (w.l.o.g.) that x ≥ y and seek to show that lim_{n→∞} bn = x. Note that

    x = (x^n)^{1/n} ≤ (x^n + y^n)^{1/n} ≤ (2x^n)^{1/n} = 2^{1/n} · x.

Apply the Sandwich Rule with an = x for all n and cn = 2^{1/n} · x. We know from Lemma 3.15 that 2^{1/n} → 1, and it then follows from Theorem 3.8(ii) that cn → x. The Sandwich Rule then implies that

    lim_{n→∞} bn = x.

Example - We can also use the Sandwich Rule to prove that

    lim_{n→∞} (1 + sin n)^n / (n · 2^n) = 0.

This is a good illustration of the use of the Sandwich Rule, since this sequence is rather complicated and it is not easy to observe a pattern by writing out the first few terms of the sequence.
Observe that

    0 ≤ (1 + sin n)^n / (n · 2^n) ≤ (1 + 1)^n / (n · 2^n) = 1/n.

Apply the Sandwich Rule with an = 0 and cn = 1/n. Both of these sequences converge to 0, and thus lim_{n→∞} (1 + sin n)^n / (n · 2^n) = 0 also.
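Numerical check - The sandwich bound 0 ≤ (1 + sin n)^n/(n · 2^n) ≤ 1/n used in this example can be confirmed term by term; the following Python sketch (added for illustration) checks the first fifty terms.

```python
import math

# Sketch: the terms b_n = (1 + sin n)^n / (n * 2^n) are squeezed between the
# two sequences a_n = 0 and c_n = 1/n used in the Sandwich Rule above.
for n in range(1, 51):
    b_n = (1 + math.sin(n)) ** n / (n * 2 ** n)
    assert 0 <= b_n <= 1 / n
print("all terms lie between 0 and 1/n")
```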
Exercise - Let k ∈ N be fixed. Use the Sandwich Rule to prove that

    lim_{n→∞} n! / ((n − k)! · n^k) = 1.

We conclude this section by giving an analogue of Theorem 3.8 for definitely divergent se-
quences.

Theorem 3.16. Let (an)n∈N and (bn)n∈N be real-valued sequences and let b, λ ∈ R. Then

(i) lim_{n→∞} an = ∞ and lim_{n→∞} bn = ∞ =⇒ lim_{n→∞} (an + bn) = ∞,

(ii) lim_{n→∞} an = ∞ and lim_{n→∞} bn = ∞ =⇒ lim_{n→∞} (an bn) = ∞,

(iii) lim_{n→∞} an = ∞ and lim_{n→∞} bn = b =⇒ lim_{n→∞} (an + bn) = ∞,

(iv) lim_{n→∞} an = ∞ =⇒ lim_{n→∞} λ/an = 0,

(v) lim_{n→∞} an = ∞ and λ > 0 =⇒ lim_{n→∞} λan = ∞,

(vi) lim_{n→∞} an = ∞ and λ < 0 =⇒ lim_{n→∞} λan = −∞.

Proof. This is left as an exercise.

In general, one needs to take a little more care with these kinds of rules for definitely divergent
sequences, and we do not always have a convenient rule to apply. For instance, there is no
easy rule for determining the limit of the sequence (an bn )n∈N for the case when an → ∞ and
bn → 0. If we set an = n and bn = 2−n then Lemma 3.13 informs us that an bn → 0. However,
if we set an = n2 and bn = n1 , then an bn = n → ∞.

3.3 Monotone sequences

Let’s get straight to the key definition of this section.

Definition 3.17. A real-valued sequence (an )n∈N is called

• increasing if and only if


∀n ∈ N, an+1 > an ,

• non-decreasing if and only if

∀n ∈ N, an+1 ≥ an ,

• decreasing if and only if


∀n ∈ N, an+1 < an ,

• non-increasing if and only if

∀n ∈ N, an+1 ≤ an .

Moreover, we say that a sequence is monotone if it is non-increasing or non-decreasing, and


strictly monotone if it is either increasing or decreasing.

Note that, since the definition of monotonicity requires a notion of order, we do not have an
analogue of the definition above for complex-valued sequences.
Examples - Many of the sequences that we have discussed so far are monotone. For example, (1/n)n∈N is decreasing (and thus also strictly monotone). Any sequence of the form (n^k)n∈N with k > 0 is increasing. If k < 0 then the sequence is decreasing, and if k = 0 then the sequence is constant (and so both non-increasing and non-decreasing). The sequence ((−1)^n)n∈N is not monotone.
For some sequences, a little more work is required to determine whether or not they are monotone. In some cases, a helpful trick is to consider the quotients of consecutive terms of a sequence and show that they are bounded (from above or below) by one. This works because a sequence with positive terms is increasing (for example) if and only if

    an+1/an > 1

holds for all n ∈ N.

Lemma 3.18. The sequence (an)n∈N given by

    an := (1 + 1/n)^n = ((n + 1)/n)^n

is non-decreasing.

Proof. We consider quotients of successive terms. We will show that

    an+1/an ≥ 1

holds for all n ∈ N. Observe that

    an+1/an = ((n + 2)/(n + 1))^{n+1} / ((n + 1)/n)^n
            = ((n + 2)/(n + 1))^{n+1} · (n/(n + 1))^{n+1} · ((n + 1)/n)
            = (n(n + 2)/(n + 1)^2)^{n+1} · ((n + 1)/n)
            = (((n + 1)^2 − 1)/(n + 1)^2)^{n+1} · ((n + 1)/n)
            = (1 − 1/(n + 1)^2)^{n+1} · ((n + 1)/n).

An application of Bernoulli's Inequality (Thm. 1.30) with x = −1/(n + 1)^2 ≥ −1 yields

    an+1/an = (1 − 1/(n + 1)^2)^{n+1} · ((n + 1)/n) ≥ (1 − (n + 1) · 1/(n + 1)^2) · ((n + 1)/n) = (n/(n + 1)) · ((n + 1)/n) = 1.

In fact, if we were a little more careful in this proof, we could use a strict version of Bernoulli's Inequality that holds for x > −1 to prove that the given sequence is increasing (and hence strictly monotone).
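Numerical check - The quotient argument above is easy to test empirically. This Python sketch (an added illustration, not part of the original notes) checks that the ratio an+1/an never drops below 1 for the first thousand terms.

```python
# Sketch: the ratio a_{n+1}/a_n of consecutive terms of a_n = (1 + 1/n)^n
# stays >= 1, matching the Bernoulli estimate in the proof of Lemma 3.18.
def a(n):
    return (1 + 1 / n) ** n

ratios = [a(n + 1) / a(n) for n in range(1, 1000)]
assert all(r >= 1 for r in ratios)
print(f"smallest observed ratio: {min(ratios):.12f}")
```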
The following result, whose statement and proof is similar to Lemma 3.18, will be used later.

Lemma 3.19. The sequence (bn)n∈N given by

    bn := (1 + 1/n)^{n+1} = ((n + 1)/n)^{n+1}

is non-increasing.

Proof. This is left as an exercise.

The following result shows that monotonicity is a very helpful property as we only need to
check if a monotone sequence is bounded in order to know whether it is convergent or not.
Note that boundedness of a sequence is usually much easier to show.
Theorem 3.20 (Monotonicity Principle). (i) If (an)n∈N is a non-decreasing sequence which is bounded above, then

    lim_{n→∞} an = sup{an : n ∈ N} ∈ R.

(ii) If (an)n∈N is a non-increasing sequence which is bounded below, then

    lim_{n→∞} an = inf{an : n ∈ N} ∈ R.

(iii) If (an)n∈N is a monotone sequence, then

    (an)n∈N is convergent ⇐⇒ (an)n∈N is bounded.                     (66)

We use the notation sup(an ) as a shorthand for sup{an : n ∈ N}, and similarly inf(an ) =
inf{an : n ∈ N}.

Proof. (i) Suppose that (an)n∈N is a non-decreasing sequence which is bounded above. This means that the set {an : n ∈ N} is bounded above. It follows from the Completeness Axiom (Axiom 1.38) that there exists t ∈ R such that

t = sup{an : n ∈ N}.

Now let ϵ > 0 be arbitrary. It follows from the definition of the supremum that t − ϵ is
not an upper bound for the set {an : n ∈ N}. In particular, there exists n0 ∈ N such
that an0 > t − ϵ.
However, since (an )n∈N is non-decreasing, it follows that for all n ≥ n0

t − ϵ < an0 ≤ an ≤ t < t + ϵ.

Hence, |an − t| < ϵ for all n ≥ n0 , and so by definition limn→∞ an = t.

(ii) The proof is very similar to part (i), and is left as an exercise.

(iii) We know from Theorem 3.6 that convergent sequences are bounded, which proves the
first direction of the implication (66). In order to prove (66), we need to prove the
reverse implication. This follows from the previous two parts of this theorem.
Indeed, suppose that (an )n∈N is a bounded sequence which is monotone. By definition
of monotonicity, the sequence is either non-decreasing or non-increasing. In the first of
these cases, we can use part (i) of this theorem to show that (an )n∈N is convergent and
limn→∞ an = t = sup(an ). Similarly, if (an )n∈N is non-increasing then the result follows
from part (ii).

Exercise - Let (an )n∈N be a non-decreasing real-valued sequence which is unbounded. Prove
that
lim an = ∞.
n→∞

Theorem 3.20 shows that we can sometimes verify that a sequence converges without knowing its limit in advance. In some cases, we may even define numbers as limits of specific sequences, because we have no other (explicit) description of them. One typical example is Euler's number.

Lemma 3.21. The sequence (an)n∈N given by

    an := (1 + 1/n)^n = ((n + 1)/n)^n

is convergent.

Proof. We have already proven in Lemma 3.18 that this sequence is monotone. To prove this
lemma, we will show that (an )n∈N is bounded. We are then done, by Theorem 3.20.

Since (an )n∈N is non-decreasing, it follows from Lemma 3.18 that, for all n ∈ N,

an ≥ a1 = 2.

It remains to establish an upper bound for an. For this, we recall the related sequence (bn)n∈N, given by

    bn := (1 + 1/n)^{n+1} = ((n + 1)/n)^{n+1}.
Observe that, for all n ∈ N, an ≤ bn . We also know, from Lemma 3.19, that (bn )n∈N is
non-increasing, which implies that, for all n ∈ N,

an ≤ bn ≤ b1 = 4.

We have found both an upper and lower bound for an , which means that the sequence is
indeed bounded and the proof is complete.

If we take a little more care with the application of Theorem 3.20, we see that this shows that the limit of the sequence given by

    an := (1 + 1/n)^n = ((n + 1)/n)^n

exists and equals sup{an : n ∈ N}. We define this limit to be Euler's number, denoted e. That is,

    e := lim_{n→∞} (1 + 1/n)^n = sup_{n∈N} (1 + 1/n)^n.
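Numerical check - As a small aside (added here, not in the original notes), one can watch the sequence approach e numerically; the convergence is quite slow.

```python
import math

# Sketch: the terms (1 + 1/n)^n creep up towards Euler's number e ≈ 2.71828.
for n in [1, 10, 100, 10_000, 1_000_000]:
    print(f"n={n:8d}  (1 + 1/n)^n = {(1 + 1 / n) ** n:.8f}")

# For n = 10^6 the term agrees with math.e to roughly five decimal places.
assert abs((1 + 1 / 10 ** 6) ** 10 ** 6 - math.e) < 1e-5
```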

3.4 Subsequences

The concepts of the last sections deal with sequences that converge or, in other words, con-
centrate around a single point. In some cases, however, divergent sequences may also have
some points of interest for very large n. An obvious example is ((−1)n )n∈N , which appears
to converge towards two different points. Now, we want to formalise the idea of sequences
having more than one limit.

Definition 3.22. Let (n1 , n2 , n3 , . . . ) be an (infinite) increasing sequence of natural numbers


and let (an )n∈N be a sequence. Then, we call

(ank )k∈N = (an1 , an2 , . . . )

a subsequence of (an )n∈N .

Example - Consider the sequence (an )n∈N given by

an = (−1)n .

Two notable subsequences are given by taking the odd and even terms of the sequences. That
is, we can consider (n1 , n2 , . . . ) = (1, 3, . . . ) and (n1 , n2 , . . . ) = (2, 4, . . . ). These subsequences
are convergent (in fact, they are constant) with limit −1 and 1 respectively.
Exercise - Suppose that (an)n∈N is a sequence such that lim_{n→∞} an = a. Show that any subsequence (ank)k∈N satisfies

    lim_{k→∞} ank = a.

Definition 3.23. Let (an)n∈N be a complex-valued sequence. We call a ∈ C an accumulation point of (an)n∈N if there exists a subsequence (ank)k∈N with

    lim_{k→∞} ank = a.

The accumulation points of the sequence ((−1)n )n∈N are −1 and 1. We can also consider
some more complicated examples.
Example - Consider the sequence defined by

    an = 1     if n is not prime,
    an = 1/n   if n is prime.

The accumulation points of this sequence are 0 and 1. The subsequence (an)n∈P, where P denotes the set of all primes, converges to 0, and the subsequence (an)n∉P is constant, always taking value 1.
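Numerical check - The two subsequences in this example can be extracted explicitly. The following Python sketch (added for illustration; the cut-off 200 is arbitrary) uses a naive primality test to separate the prime-indexed terms from the rest.

```python
# Sketch: split the sequence a_n (= 1/n for prime n, = 1 otherwise) into the
# prime-indexed subsequence (tending to 0) and the remaining constant terms.
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def a(n):
    return 1 / n if is_prime(n) else 1

prime_terms = [a(n) for n in range(1, 200) if is_prime(n)]
other_terms = [a(n) for n in range(1, 200) if not is_prime(n)]

print(prime_terms[:5])                    # 1/2, 1/3, 1/5, 1/7, 1/11
assert prime_terms[-1] < 0.01             # heading towards 0
assert all(t == 1 for t in other_terms)   # the constant subsequence
```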
Next we want to show that each bounded sequence has at least one convergent subsequence.
This result bears the names of Bolzano and Weierstrass and is an important technical tool
for proofs in many areas of analysis.

Theorem 3.24 (Bolzano-Weierstrass Theorem). Let (an )n∈N be a bounded real-valued se-
quence. Then (an )n∈N has at least one convergent subsequence.

(Figure: the terms an of a sequence plotted against n, with a peak at index m.)

Proof. We call m a peak of (an )n∈N if


an < am , ∀ n > m.

Case 1 - Suppose that there are finitely many peaks m1 < m2 < · · · < ml . Set n1 = ml + 1.
In particular, n1 is not a peak, and so there exists an integer n2 > n1 such that an2 ≥ an1 .
Also, n2 is not a peak, and so there exists n3 > n2 such that an3 ≥ an2 ≥ an1 . We can
continue this process to obtain an (infinite) non-decreasing sequence (an1 , an2 , . . . ) such that
an1 ≤ an2 ≤ an3 ≤ an4 . . . . This subsequence is also bounded, because of the assumption
that (an )n∈N is bounded. It therefore follows from the Monotonicity Principle (Theorem 3.20)
that it is convergent.
Case 2 - Suppose that there are no peaks. The proof is the same as that of Case 1, except that we set n1 = 1.
Case 3 - Suppose that there are infinitely many peaks m1 < m2 < . . . . Then the sequence
(am1 , am2 , . . . ) is decreasing. It is also bounded, because of the assumption that (an )n∈N is
bounded. It therefore follows again from the Monotonicity Principle (Theorem 3.20) that
this subsequence is convergent.

Example - We can use the Bolzano-Weierstrass Theorem to show that certain sequences
contain convergent subsequences even in cases when the convergent subsequences are rather
difficult to see. For instance, consider the sequence (an )n∈N given by
    an = (n · cos(3n^2 − 5)) / (n + 1).
It is not easy to see a pattern in this sequence, with the values of an jumping around fairly
randomly, somewhere in the range (−1, 1). However, it is not difficult to check that the
sequence (an )n∈N is bounded, and so the Bolzano-Weierstrass Theorem tells us that there
exists a convergent subsequence.
Note that the converse of the Bolzano-Weierstrass Theorem does not hold, i.e. not every
sequence with a convergent subsequence is bounded. One may consider, for example, the
sequence (an)n∈N given by

    an = n   if n is even,
    an = 0   if n is odd.

We finish this section with an extreme example, highlighting the potentially strange behaviour
of accumulation points.
Example - Consider the sequence (an)n∈N which is a list of all rational numbers in the interval (0, 1). One may define this sequence more formally by constructing a bijection f : N → Q ∩ (0, 1) (we did this back in Chapter 1) and then setting an = f(n).
For every real number x ∈ (0, 1) it is possible to define an infinite sequence of rational numbers
which converge to x. Such a sequence can be constructed using part 3 of Theorem 1.39 (and
I encourage you to formally define such a sequence).
In other words, the set of accumulation points for this sequence is the whole interval (0, 1). This is an uncountable set, and is therefore somewhat "larger" than the actual range of the sequence!

3.5 Cauchy criterion

In this section we introduce Cauchy sequences and the Cauchy criterion for establishing the
convergence of a sequence. The Cauchy criterion is, similarly to the Monotonicity Principle
(Theorem 3.20), an important tool to verify that a sequence is convergent without knowing
its limit and it will be used several times throughout the remainder of the course.

Definition 3.25. A complex-valued sequence (an )n∈N is called a Cauchy sequence if

∀ ϵ > 0, ∃ n0 ∈ N : m, n ≥ n0 =⇒ |an − am | < ϵ.

You should compare this definition with the definition of convergence in order to gain better
understanding.
We now take a look at two familiar sequences and examine whether or not they are Cauchy.
Example - The sequence (an)n∈N given by

    an = 1/n

is a Cauchy sequence. To see this, observe that, by the triangle inequality,

    |an − am| = |1/n − 1/m| ≤ 1/n + 1/m.

It therefore follows that, for all ϵ > 0 and for all m, n > 2/ϵ, we have |an − am| < ϵ.
Example - The sequence (an)n∈N given by

    an = (−1)^n

is not a Cauchy sequence. Suppose for a contradiction that it is Cauchy. Then, for all ϵ > 0,
there is some n0 ∈ N such that |an − am | < ϵ holds for all m, n ≥ n0 . But, if we set ϵ = 1, we
obtain a contradiction, since no matter which value we choose for n0 , we have

|an0 − an0 +1 | = 2 > 1.

These two examples suggest that the property of being Cauchy may be similar to that of
being convergent (since we already know that the first sequence (1/n) is convergent, and that
((−1)n ) is not). Indeed, this intuition is correct, as the following important result shows.

Theorem 3.26 (Cauchy criterion). Let (an )n∈N be a real-valued sequence. Then

(an )n∈N is convergent ⇐⇒ (an )n∈N is Cauchy.

Before proving the Cauchy criterion, we prove a lemma that will be used in the proof.

Lemma 3.27. Let (an )n∈N be a complex-valued sequence which is Cauchy. Then (an )n∈N is
bounded.

Proof. Apply the definition of a Cauchy sequence with ϵ = 1. It follows that there is some
n0 ∈ N such that, for all m, n ≥ n0 ,

|an − am | < 1.

Also, by the triangle inequality,

|an | = |(an − an0 ) + an0 | ≤ |an − an0 | + |an0 | < 1 + |an0 |.

This gives a bound for all n ≥ n0 . It then follows that, for all n ∈ N, |an | ≤ C, where

C = max{|a1 |, |a2 |, . . . , |an0 −1 |, 1 + |an0 |}.

Proof of the Cauchy Criterion. First, we show that

(an )n∈N is convergent =⇒ (an )n∈N is Cauchy.

Suppose that (an )n∈N is convergent with an → a. Let ϵ > 0 be arbitrary. By the definition
of convergence, there exists n0 ∈ N such that, for all m, n ≥ n0,

    |am − a| < ϵ/2   and   |an − a| < ϵ/2.

It follows from the triangle inequality that, for all m, n ≥ n0,

    |am − an| = |(am − a) + (a − an)| ≤ |am − a| + |a − an| < ϵ/2 + ϵ/2 = ϵ.

Next, we consider the opposite implication. Suppose that (an )n∈N is Cauchy. Lemma 3.27
implies that (an )n∈N is bounded. By the Bolzano-Weierstrass Theorem (Thm. 3.24), (an )n∈N
has at least one convergent subsequence.
Let (ank)k∈N be a subsequence of (an)n∈N such that

    lim_{k→∞} ank = a.

Let ϵ > 0 be arbitrary. Since (ank)k∈N tends to a, it follows from the definition of convergence that there is some k0 ∈ N such that, for all k ≥ k0,

    |ank − a| < ϵ/2.

Also, since (an)n∈N is Cauchy, it follows that there is some n0 such that, for all m, n ≥ n0,

    |an − am| < ϵ/2.

Let k ∈ N be any integer such that both k ≥ k0 and nk ≥ n0 hold. Then, by the triangle inequality, we have that for all n ≥ n0,

    |an − a| = |(an − ank) + (ank − a)| ≤ |an − ank| + |ank − a| < ϵ/2 + ϵ/2 = ϵ.

The Cauchy criterion is particularly useful as a shortcut for proving that certain sequences
are convergent, since it is generally easier to verify that a sequence is Cauchy than it is to
show that it is convergent. In particular, we do not need to know what the limit of the
sequence is when checking that it is Cauchy.
Example - Let (an)n∈N be a recursively defined sequence with a1 = 1 and

    an+1 = an + 1/2^n   if n is prime,
    an+1 = an − 1/2^n   if n is not prime.

If we write down the first few terms of this sequence, it is not immediately obvious that the sequence is convergent, and it is even more tricky to identify a possible limit. However, without knowing anything about the possible limit of the sequence, we can show that it is Cauchy, as follows.
First, observe that, for all n ≥ 2,

    |an+1 − an| = 1/2^n.

Now let ϵ > 0 and suppose that m and n are both larger than n0. We need to make sure that n0 is sufficiently large for the forthcoming proof, and we will specify the choice of n0 to make the argument work.
Without loss of generality, we may assume that m > n ≥ n0. By the triangle inequality,

    |am − an| = |(am − am−1) + (am−1 − am−2) + · · · + (an+1 − an)|
              ≤ |am − am−1| + |am−1 − am−2| + · · · + |an+1 − an|
              ≤ 1/2^{m−1} + 1/2^{m−2} + · · · + 1/2^n
              = (1/2^{n−1}) · Σ_{k=1}^{m−n} (1/2)^k
              < 1/2^{n−1} ≤ 1/2^{n0−1}.

In particular, if we set n0 so that 2^{n0−1} > 1/ϵ, it follows that, for all m, n ≥ n0,

    |am − an| < ϵ.

Formally, we may define n0 := ⌈log2(2/ϵ)⌉.
In summary, we have shown that (an)n∈N is a Cauchy sequence, and thus by the Cauchy criterion, it follows that the sequence is indeed convergent.
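Numerical check - We can also simulate this recursion directly. The sketch below (an illustration added to the notes) computes the first sixty terms and confirms that far-out terms cluster together, exactly as the Cauchy estimate predicts.

```python
# Sketch: simulate a_1 = 1, a_{n+1} = a_n + 1/2^n if n is prime and
# a_{n+1} = a_n - 1/2^n otherwise, and check that late terms barely move.
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

a = [1.0]                                  # a[0] corresponds to a_1
for n in range(1, 60):
    step = 1 / 2 ** n
    a.append(a[-1] + step if is_prime(n) else a[-1] - step)

print(f"a_60 ≈ {a[-1]:.10f}")
# By the estimate |a_m - a_n| < 1/2^(n-1), the last ten terms differ
# by far less than 2^-49.
assert abs(a[-1] - a[-10]) < 1 / 2 ** 49
```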

3.6 Introduction to series

In this section, we use sequences to build series. A series is really just a special kind of sequence (sn)n∈N given by

    sn = Σ_{k=1}^n ak,

where (an)n∈N is another sequence. The sum of all terms of the sequence (an)n∈N, i.e. the limit of the sequence (sn)n∈N, is one of the main motivations for considering limits at all, and some interesting phenomena appear when it comes to the question of whether such limits exist.
We begin with a more formal definition.
Definition 3.28. Let (an)n∈N be a complex-valued sequence and

    sn = Σ_{k=1}^n ak.

We call sn the nth partial sum of the series

    Σ_{k=1}^∞ ak.

If the sequence (sn)n∈N converges with lim_{n→∞} sn = s then we say that the series converges. We call s the sum of the series, and write

    Σ_{k=1}^∞ ak = s.

If a series is not convergent, then it is divergent. If lim_{n→∞} sn = ±∞ then we write

    Σ_{k=1}^∞ ak = ±∞

and say that the series is definitely divergent.

Note that "series" is just another word for an infinite sum of elements of a sequence. Moreover, the notation Σ_{k=1}^∞ ak should be understood as a formal symbol for the limit of the corresponding sequence (sn)n∈N: it might be a number or ±∞, but it might also not exist.
We will also sometimes consider series which do not start with index 1, i.e. series of the form Σ_{k=k0}^∞ ak for some k0 ∈ Z. The most common variant we consider is with k0 = 0.
It should also be noted that, if we consider an arbitrary series Σ_{k=1}^∞ ak, we should assume that the terms ak are complex numbers unless stated otherwise. There will be some instances later where we make the additional restriction that ak ∈ R.
The definition above states that a series converges if and only if the sequence (sn )n∈N of
partial sums converges. This implies that we can use the results from the previous sections
to analyse series. Moreover, we will see that there are even more tools for working with series.
But first let us consider some particularly important examples.

Lemma 3.29 (Geometric series). Let q ∈ C with |q| < 1. Then we have that

    Σ_{k=0}^∞ q^k = 1/(1 − q)   and   Σ_{k=1}^∞ q^k = q/(1 − q).     (67)

Moreover, we have

    Σ_{k=0}^n q^k = (1 − q^{n+1})/(1 − q).                           (68)

Proof. We will first prove (68), and then use it to prove (67). Let sn := Σ_{k=0}^n q^k and consider the equation

    (1 − q)sn = (1 − q) Σ_{k=0}^n q^k
              = (1 − q)(1 + q + q^2 + · · · + q^n)
              = (1 + q + q^2 + · · · + q^n) − (q + q^2 + q^3 + · · · + q^{n+1})
              = 1 − q^{n+1}.

In the last step, we have simply observed that all but the extreme terms in the two brackets cancel out with one another. This proves (68).
To prove the first part of (67), we make use of some facts about convergence of sequences that we established earlier in this chapter (namely Theorem 3.8 and Lemma 3.10) to see that

    Σ_{k=0}^∞ q^k = lim_{n→∞} sn = lim_{n→∞} (1 − q^{n+1})/(1 − q) = (1/(1 − q)) · lim_{n→∞} (1 − q^{n+1}) = 1/(1 − q).

The second sum from (67) follows immediately from the first.

In the proof above, we considered two long sums and observed that almost all of the terms
cancelled out. Such arguments are called telescoping tricks and sums of this form are
called telescoping sums. This kind of trick will be used more throughout this chapter.
Example - If we set q = 1/2 in Lemma 3.29, it follows that

    Σ_{k=0}^∞ 1/2^k = 2   and   Σ_{k=1}^∞ 1/2^k = 1.
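Numerical check - Both identities can be verified numerically. The Python sketch below (added for illustration) compares partial sums for q = 1/2 with the closed formulas of Lemma 3.29; since powers of 1/2 are exact in binary floating point, the finite formula (68) matches exactly.

```python
# Sketch: partial sums of the geometric series with q = 1/2 versus the
# closed-form expressions from Lemma 3.29.
q = 0.5
s = [sum(q ** k for k in range(n + 1)) for n in range(30)]

print(s[:4])                              # [1.0, 1.5, 1.75, 1.875]
assert abs(s[-1] - 1 / (1 - q)) < 1e-8    # s_n -> 1/(1-q) = 2
assert all(s[n] == (1 - q ** (n + 1)) / (1 - q) for n in range(30))
```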

The next example shows that not all convergent sequences give rise to convergent
series. This is a very important example of a divergent series.

Lemma 3.30 (Harmonic series). Consider the sequence (an)n∈N given by an = 1/n. Then, the corresponding series satisfies

    Σ_{k=1}^∞ 1/k = ∞.

Proof. We need to show that the sequence of partial sums sn = Σ_{k=1}^n 1/k tends to infinity. We group the terms of the partial sum s_{2^n} and manipulate the sums as follows:

    s_{2^n} = 1 + 1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + · · · + (1/(2^{n−1} + 1) + 1/(2^{n−1} + 2) + · · · + 1/2^n)
            ≥ 1 + 1/2 + 2 · (1/4) + 4 · (1/8) + · · · + 2^{n−1} · (1/2^n)
            = 1 + 1/2 + 1/2 + · · · + 1/2
            = 1 + n/2.

Since (sn) is an increasing sequence, it follows that, for all C ∈ R, there exists n0 = 2^{⌈2C⌉} such that, for all n ≥ n0, we have

    sn ≥ s_{n0} = s_{2^{⌈2C⌉}} ≥ 1 + ⌈2C⌉/2 > C.

Recalling the definition of the improper limit, we see that sn → ∞.
The series Σ_{k=1}^∞ 1/k = ∞ is called the harmonic series. By contrast with Lemma 3.30, the series Σ_{k=1}^∞ k^{−α} converges for any α > 1, and so the harmonic series is something of a critical example at which there is a change of behaviour.
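Numerical check - The grouping trick in the proof gives the lower bound s_{2^n} ≥ 1 + n/2, which the following Python sketch (added for illustration) confirms directly.

```python
# Sketch: partial sums of the harmonic series at indices 2^n grow at least
# like 1 + n/2, as in the proof of Lemma 3.30 -- slow but unbounded growth.
for n in [1, 5, 10, 15]:
    s = sum(1 / k for k in range(1, 2 ** n + 1))
    assert s >= 1 + n / 2
    print(f"n={n:2d}  s_(2^n)={s:7.4f}  lower bound={1 + n / 2:.1f}")
```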
Example - In this example, we discuss how the aforementioned telescoping trick can sometimes be a powerful tool for obtaining the precise value of apparently complicated series. We will prove that

    Σ_{k=1}^∞ 1/(k(k + 1)) = 1.

We do not even know that this series is convergent yet. However, we first make the helpful observation that

    1/(k(k + 1)) = 1/k − 1/(k + 1).                                  (69)

It therefore follows that

    sn = Σ_{k=1}^n 1/(k(k + 1)) = Σ_{k=1}^n (1/k − 1/(k + 1)) = Σ_{k=1}^n 1/k − Σ_{k=2}^{n+1} 1/k = 1 − 1/(n + 1),

and hence lim_{n→∞} sn = 1, as required.
In the example above, and in particular in the identity (69), we have used the method of partial fraction decomposition to write a fraction with a product in the denominator as a sum of two fractions with simpler denominators.
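Numerical check - The telescoping identity sn = 1 − 1/(n + 1) can be checked in exact rational arithmetic; the Python sketch below (an added illustration using the standard library's Fraction type) does so for a few values of n.

```python
from fractions import Fraction

# Sketch: the partial sums of 1/(k(k+1)) telescope to exactly 1 - 1/(n+1);
# exact rational arithmetic avoids any floating-point doubt.
for n in [1, 10, 1000]:
    s_n = sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))
    assert s_n == 1 - Fraction(1, n + 1)
print("telescoping identity confirmed; the partial sums tend to 1")
```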
However, note that it is rare that we can easily compute the sum of a series precisely. Already for the very similar example

    Σ_{k=1}^∞ 1/k^2

the problem of determining the value of the sum is considerably more difficult. We will use more sophisticated mathematics later in this program to prove that Σ_{k=1}^∞ 1/k^2 = π^2/6. For many other sums, there is just no closed expression.

3.7 Calculation rules and basic properties of series

In this section, we will use some of the theory that we have built for sequences to derive some
basic properties about series. We begin with an analogue of Theorem 3.8.

Theorem 3.31. Let Σ_{k=1}^∞ ak and Σ_{k=1}^∞ bk be convergent series and let c ∈ C. Then we have

    Σ_{k=1}^∞ (ak + bk) = Σ_{k=1}^∞ ak + Σ_{k=1}^∞ bk

and

    Σ_{k=1}^∞ c · ak = c · Σ_{k=1}^∞ ak.

Proof. This is left as an exercise.

Example - Recall from the previous section that

    Σ_{k=1}^∞ 2^{−k} = 1   and   Σ_{k=1}^∞ 1/(k(k + 1)) = 1.

It therefore follows from Theorem 3.31 that

    Σ_{k=1}^∞ (πk(k + 1) + 2^k)/(k(k + 1)2^k) = Σ_{k=1}^∞ (π/2^k + 1/(k(k + 1))) = π Σ_{k=1}^∞ 2^{−k} + Σ_{k=1}^∞ 1/(k(k + 1)) = π + 1.

The next result gives a useful application of the Monotonicity Principle (Theorem 3.20).

Theorem 3.32. Let (an)n∈N be a real-valued sequence such that an ≥ 0 for all n ∈ N. Let sn = Σ_{k=1}^n ak. Then

    (sn)n∈N is bounded ⇐⇒ Σ_{k=1}^∞ ak converges.

Proof. Since ak ≥ 0 for all k ∈ N, it follows that the sequence (sn)n∈N is non-decreasing. In particular, this sequence is monotone. It therefore follows from Theorem 3.20(iii) that

    (sn)n∈N is bounded ⇐⇒ (sn)n∈N is convergent.

This proves the theorem.


Example - We can use the previous result to show that the series Σ_{k=1}^∞ 1/k! converges. We just need to show that the sequence sn = Σ_{k=1}^n 1/k! is bounded. To see this, first observe that k! ≥ 2^{k−1} holds for all k ∈ N. From this and from Lemma 3.29 it then follows that

    Σ_{k=1}^n 1/k! ≤ Σ_{k=1}^n 1/2^{k−1} = Σ_{k=0}^{n−1} 1/2^k ≤ Σ_{k=0}^∞ 1/2^k = 2.

Since Σ_{k=1}^∞ 1/k! converges, it immediately follows that

    Σ_{k=0}^∞ 1/k! = 1 + Σ_{k=1}^∞ 1/k!

and so Σ_{k=0}^∞ 1/k! also converges. Moreover, one can indeed show that

    Σ_{k=0}^∞ 1/k! = e = lim_{n→∞} (1 + 1/n)^n.                      (70)

We omit the proof of (70) here, but it will be considered on a forthcoming exercise sheet.
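Numerical check - The two descriptions of e in (70) converge at very different speeds; the following Python sketch (added here as an illustration) compares them.

```python
import math

# Sketch: the factorial series for e converges very quickly, while the
# sequence (1 + 1/n)^n still has an error of order 1/n.
series = sum(1 / math.factorial(k) for k in range(20))   # 20 terms suffice
sequence = (1 + 1 / 20) ** 20                            # 20 steps do not

print(f"factorial series : {series:.12f}")
print(f"(1 + 1/20)^20    : {sequence:.12f}")
assert abs(series - math.e) < 1e-12
assert abs(sequence - math.e) > 0.01
```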
Next, we will apply the Cauchy criterion to the sequence of partial sums to obtain the
following result.

Theorem 3.33. Let Σ_{k=1}^∞ ak be a series. Then Σ_{k=1}^∞ ak is convergent if and only if

    ∀ϵ > 0, ∃ n0 ∈ N : ∀m > n > n0,  |Σ_{k=n+1}^m ak| < ϵ.           (71)

Proof. By definition, the series Σ_{k=1}^∞ ak is convergent if and only if the sequence of partial sums sn = Σ_{k=1}^n ak converges. By the Cauchy criterion, this sequence converges if and only if it is Cauchy.
But what does it mean for the sequence of partial sums to be Cauchy? By definition, this means that

    ∀ϵ > 0, ∃ n0 ∈ N : ∀m > n > n0,  |sm − sn| < ϵ.                  (72)

Observe that sm − sn = Σ_{k=1}^m ak − Σ_{k=1}^n ak = Σ_{k=n+1}^m ak. Therefore, the statement (72) is equivalent to the statement (71), which completes the proof.

This theorem immediately leads to the following simple criterion. In many cases, this is
already enough to show that a series is divergent.

Corollary 3.34. Let (an)n∈N be a sequence and suppose that the series Σ_{k=1}^∞ ak converges. Then

    lim_{n→∞} an = 0.

In other words, a series can only be convergent if the corresponding sequence is a null sequence.

Proof. Suppose that the series Σ_{k=1}^∞ ak converges. Then it follows from Theorem 3.33 (setting m = n + 1 in (71)) that

    ∀ϵ > 0, ∃ n0 ∈ N : ∀n > n0, |an+1| < ϵ.

This implies that lim_{n→∞} an = 0.

Example - We can immediately use the previous corollary to see that, for any divergent sequence, or any convergent sequence which is not a null sequence, the corresponding series is not convergent. In particular, the series

    Σ_{k=1}^∞ (−1)^k

is divergent. Also, for any m ∈ N,

    Σ_{k=1}^∞ (k^m)^{1/k}

is divergent, since its summands converge to 1 (by Lemma 3.12) rather than to 0.
An important remark is that the converse of Corollary 3.34 does not hold. For instance, as we have shown in Lemma 3.30, the harmonic series is definitely divergent with

    Σ_{k=1}^∞ 1/k = ∞,

although the sequence of its summands (1/n)n∈N is a null sequence.


It is sometimes helpful to consider the following stronger form of convergence.

Definition 3.35. Let (an)n∈N be a complex-valued sequence with the property that there exists C ∈ R such that

    Σ_{k=1}^n |ak| ≤ C,  ∀n ∈ N.

Then we say that the series Σ_{k=1}^∞ ak is absolutely convergent.

Note that, by Theorem 3.32, the series Σ_{k=1}^∞ ak is absolutely convergent if and only if the series Σ_{k=1}^∞ |ak| is convergent (which we could have used as an equivalent definition). It is therefore perfectly reasonable to write Σ_{k=1}^∞ |ak| < ∞ as a shorthand for the series Σ_{k=1}^∞ ak being absolutely convergent.
We have already shown in Lemma 3.29 that, for q ∈ C with |q| < 1, the geometric series Σ_{k=1}^∞ q^k is convergent. We will now show that it is also absolutely convergent.

Lemma 3.36. The geometric series Σ_{k=1}^∞ q^k, with |q| < 1, is absolutely convergent.

Proof. We have

    Σ_{k=1}^∞ |q^k| = Σ_{k=1}^∞ |q|^k = |q|/(1 − |q|),

where we have just applied Lemma 3.29 again, with |q| in the role of q.

Example - The series

    Σ_{k=1}^∞ (−1)^k/k

is called the alternating harmonic series. This series is not absolutely convergent, since

    Σ_{k=1}^∞ |(−1)^k/k| = Σ_{k=1}^∞ 1/k

is the harmonic series, which is divergent. However, it turns out that the alternating harmonic series is convergent.
The next result shows that absolute convergence is indeed a stronger condition than “mere”
convergence.

Theorem 3.37. If a series Σ_{k=1}^∞ ak is absolutely convergent then it is also convergent.

Proof. Let Σ_{k=1}^∞ ak be an absolutely convergent series and let ϵ > 0 be arbitrary. By Theorem 3.33 (applied to the series Σ_{k=1}^∞ |ak|) and the triangle inequality, there exists n0 ∈ N such that, for all m > n ≥ n0,

    |Σ_{k=n+1}^m ak| ≤ Σ_{k=n+1}^m |ak| = |Σ_{k=n+1}^m |ak|| < ϵ.

It then follows from a second application of Theorem 3.33 that Σ_{k=1}^∞ ak is convergent.

3.8 Convergence tests

We will now discuss several criteria, called convergence tests, that can be used to verify whether a series is convergent or not. However, note that these tests are sometimes inconclusive, i.e., we do not always get a definite answer by applying them, and one may need to apply other techniques.

3.8.1 Comparison Test

The first result shows that we can obtain information about the convergence of a series by
comparing it to a series that is known to be convergent (or not).

Theorem 3.38. Let $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=1}^{\infty} b_k$ be two series.

1. If $\sum_{k=1}^{\infty} b_k$ is absolutely convergent and $|a_k| \leq |b_k|$ holds for all but finitely many $k \in \mathbb{N}$, then $\sum_{k=1}^{\infty} a_k$ is also absolutely convergent.

2. If $(a_n)_{n \in \mathbb{N}}$ and $(b_n)_{n \in \mathbb{N}}$ are real-valued sequences such that $0 \leq b_k \leq a_k$ holds for all $k \in \mathbb{N}$, and $\sum_{k=1}^{\infty} b_k = \infty$, then $\sum_{k=1}^{\infty} a_k = \infty$.

Proof. 1. It follows from the hypothesis that there exists $k_0 \in \mathbb{N}$ such that
$$|a_k| \leq |b_k|, \qquad \forall\, k \geq k_0.$$
Since $\sum_{k=1}^{\infty} b_k$ converges absolutely, it follows that the sequence of partial sums $s_n = \sum_{k=1}^{n} |a_k|$ is bounded. Indeed, for any $n \geq k_0$, we have
$$\sum_{k=1}^{n} |a_k| = \sum_{k=1}^{k_0 - 1} |a_k| + \sum_{k=k_0}^{n} |a_k| \leq \sum_{k=1}^{k_0 - 1} |a_k| + \sum_{k=k_0}^{n} |b_k| \leq \sum_{k=1}^{k_0 - 1} |a_k| + \sum_{k=k_0}^{\infty} |b_k| \in \mathbb{R}.$$
It then follows from Theorem 3.32 that $\sum_{k=1}^{\infty} |a_k|$ converges.

2. Let
$$s_n = \sum_{k=1}^{n} a_k, \qquad t_n = \sum_{k=1}^{n} b_k$$
denote the sequences of partial sums of the two series. Since $b_k \geq 0$, it follows from Theorem 3.32 that the sequence $(t_n)$ is not bounded. But also, $s_n \geq t_n$ for all $n \in \mathbb{N}$, and so it must also be the case that the sequence $(s_n)$ is not bounded. It then follows from the exercise after the proof of Theorem 3.20 that $s_n \to \infty$.

Example - We will now use Theorem 3.38 to prove that the series $\sum_{k=1}^{\infty} \frac{1}{k^2}$ is (absolutely) convergent. For any $k \in \mathbb{N}$, we have $k + 1 \leq 2k$ and thus
$$\frac{1}{k^2} = \frac{k+1}{k} \cdot \frac{1}{k(k+1)} \leq 2 \cdot \frac{1}{k(k+1)}.$$
Since both sides of this inequality are non-negative, it follows that
$$\left| \frac{1}{k^2} \right| \leq \left| \frac{2}{k(k+1)} \right|.$$
Note that the series $\sum_{k=1}^{\infty} \frac{2}{k(k+1)}$ is absolutely convergent. Indeed, we showed earlier (see the example after the proof of Lemma 3.30) that the series $\sum_{k=1}^{\infty} \frac{1}{k(k+1)}$ converges, and it then follows from Theorem 3.31 that $\sum_{k=1}^{\infty} \frac{2}{k(k+1)}$ is also convergent, and moreover
$$\sum_{k=1}^{\infty} \frac{2}{k(k+1)} = 2 \sum_{k=1}^{\infty} \frac{1}{k(k+1)} = 2.$$
Since the corresponding sequence consists of non-negative real numbers, it follows that $\sum_{k=1}^{\infty} \frac{2}{k(k+1)}$ is also absolutely convergent.

Apply the first part of Theorem 3.38 with $a_k = \frac{1}{k^2}$ and $b_k = \frac{2}{k(k+1)}$. It follows that the series $\sum_{k=1}^{\infty} \frac{1}{k^2}$ is absolutely convergent.
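The comparison can be sanity-checked numerically. The following Python sketch (an illustration only, not a proof) verifies the term-by-term domination for the first 10000 indices and checks that the partial sums of $\sum 1/k^2$ stay below 2, the value of the dominating series.

```python
# Term-by-term comparison 1/k^2 <= 2/(k(k+1)), and boundedness of the
# partial sums of 1/k^2 by the value 2 of the dominating series.
N = 10_000
terms_dominated = all(1 / k**2 <= 2 / (k * (k + 1)) for k in range(1, N + 1))
partial = sum(1 / k**2 for k in range(1, N + 1))
print(terms_dominated, partial < 2)
```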

Exercise - Use Theorem 3.38 to prove that

• $\sum_{k=1}^{\infty} \frac{1}{k^c}$ is absolutely convergent for all $c \geq 2$, and

• $\sum_{k=1}^{\infty} \frac{1}{k^c} = \infty$ for all $c \leq 1$.

3.8.2 Root Test

Next, we present the Root Test.

Theorem 3.39. Let $\sum_{k=1}^{\infty} a_k$ be a series.

1. If there exists a real number $c < 1$ such that
$$\sqrt[k]{|a_k|} \leq c$$
holds for all but finitely many $k \in \mathbb{N}$, then $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. Conversely, if
$$\sqrt[k]{|a_k|} \geq 1$$
holds for infinitely many $k \in \mathbb{N}$, then $\sum_{k=1}^{\infty} a_k$ is divergent.

Proof. 1. The series $\sum_{k=1}^{\infty} c^k$ is absolutely convergent (see Lemma 3.36). Also, by the hypothesis of the theorem, $|a_k| \leq c^k = |c^k|$ holds for all but finitely many $k \in \mathbb{N}$. Therefore, by part 1 of Theorem 3.38, $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. It follows from the condition that $|a_k| \geq 1$ for infinitely many $k \in \mathbb{N}$. In particular, the sequence $(a_k)_{k \in \mathbb{N}}$ is not a null sequence. It then follows from Corollary 3.34 that the series $\sum_{k=1}^{\infty} a_k$ does not converge.

The root test is usually most helpful when the $k$th term of the corresponding sequence involves a $k$th power.
Example - Consider the series
$$\sum_{k=1}^{\infty} \sin(k) \frac{k^{100}}{2^{k/2}}.$$
This looks like a complicated series, and if we try to calculate the first few terms of the corresponding sequence, they are rather large! However, we can use the comparison test and the root test to prove that the series is convergent.

Note that, for all $k \in \mathbb{N}$,
$$\left| \sin(k) \frac{k^{100}}{2^{k/2}} \right| \leq \frac{k^{100}}{2^{k/2}}.$$
Therefore, by part 1 of Theorem 3.38, it will be sufficient to prove that the series
$$\sum_{k=1}^{\infty} \frac{k^{100}}{2^{k/2}}$$
is absolutely convergent. We use the root test. Observe that
$$\sqrt[k]{\frac{k^{100}}{2^{k/2}}} = \frac{1}{\sqrt{2}} \sqrt[k]{k^{100}}.$$
We proved in Lemma 3.12 that $\lim_{k \to \infty} \sqrt[k]{k^{100}} = 1$. In particular, there is some $k_0 \in \mathbb{N}$ such that, for all $k \geq k_0$, $\sqrt[k]{k^{100}} \leq 1.1$. It follows that, for all $k \geq k_0$,
$$\sqrt[k]{\frac{k^{100}}{2^{k/2}}} = \frac{1}{\sqrt{2}} \sqrt[k]{k^{100}} \leq 1.1 \cdot \frac{1}{\sqrt{2}} < 1.$$
The root test implies that
$$\sum_{k=1}^{\infty} \frac{k^{100}}{2^{k/2}}$$
is absolutely convergent, as required.
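The behaviour of the $k$-th roots can be observed numerically. Since $k^{100}$ overflows floating-point arithmetic long before the roots drop below 1, the Python sketch below (my own illustration, not a proof) evaluates $\sqrt[k]{|a_k|}$ via logarithms.

```python
import math

# k-th root of |a_k| for a_k = sin(k) * k**100 / 2**(k/2), computed via
# logarithms: exp( (log|sin k| + 100*log k - (k/2)*log 2) / k ).
def kth_root(k):
    log_ak = math.log(abs(math.sin(k))) + 100 * math.log(k) - (k / 2) * math.log(2)
    return math.exp(log_ak / k)

# The early roots are far above 1 (the first terms really are huge),
# but for large k the root falls safely below 1.
print(kth_root(10) > 1, kth_root(20000) < 1)
```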
Example - Consider the series
$$\sum_{k=1}^{\infty} \frac{k^{k/4}}{3^{2+3k}}.$$
For the root test, we study the terms
$$\sqrt[k]{\frac{k^{k/4}}{3^{2+3k}}} = \frac{k^{1/4}}{27 \cdot \sqrt[k]{9}}.$$
Recall from Lemma 3.15 that $\lim_{k \to \infty} \sqrt[k]{9} = 1$. Therefore, there is some $k_0$ such that, for all $k \geq k_0$, we have $\sqrt[k]{9} \leq 2$. So, for all $k \geq k_0$,
$$\sqrt[k]{\frac{k^{k/4}}{3^{2+3k}}} \geq \frac{k^{1/4}}{54}.$$
We see that, for all $k$ sufficiently large, the right hand side of the inequality above is at least 1. Indeed, for all $k \geq \max\{k_0, 54^4\}$, we have
$$\sqrt[k]{\frac{k^{k/4}}{3^{2+3k}}} \geq \frac{k^{1/4}}{54} \geq 1.$$
It follows from part 2 of Theorem 3.39 that the series
$$\sum_{k=1}^{\infty} \frac{k^{k/4}}{3^{2+3k}}$$
is divergent.
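The crossover of the $k$-th roots past 1 can also be checked numerically (a Python sketch of my own; logarithms again avoid overflowing $k^{k/4}$):

```python
import math

# k-th root of k**(k/4) / 3**(2+3k), i.e. k**(1/4) / (27 * 9**(1/k)),
# computed via logarithms.
def kth_root(k):
    log_ak = (k / 4) * math.log(k) - (2 + 3 * k) * math.log(3)
    return math.exp(log_ak / k)

# Below 1 for moderate k, but at least 1 once k is of the order 54**4.
print(kth_root(1000) < 1, kth_root(54**4) >= 1)
```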
The following corollary gives us a helpful repackaging of the root test.

Corollary 3.40. Let $(a_k)_{k \in \mathbb{N}}$ be a sequence.

1. If
$$\lim_{k \to \infty} \sqrt[k]{|a_k|} < 1,$$
then the series $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. If
$$\lim_{k \to \infty} \sqrt[k]{|a_k|} > 1,$$
then the series $\sum_{k=1}^{\infty} a_k$ is divergent.

Proof. 1. Since
$$\lim_{k \to \infty} \sqrt[k]{|a_k|} < 1,$$
it follows that there is some $c < 1$ such that $\sqrt[k]{|a_k|} < c$ holds for all $k$ sufficiently large. It then follows from part 1 of Theorem 3.39 that the series $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. If
$$\lim_{k \to \infty} \sqrt[k]{|a_k|} > 1,$$
then it follows that $\sqrt[k]{|a_k|} > 1$ holds for all $k$ sufficiently large. Part 2 of Theorem 3.39 then implies that the series $\sum_{k=1}^{\infty} a_k$ is divergent.
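Corollary 3.40 suggests a simple numerical heuristic: sample $\sqrt[k]{|a_k|}$ at a large index and compare it with 1. The Python sketch below (my own; sampling a single $k$ proves nothing, it only hints at the value of the limit) does this for three familiar term sequences.

```python
# Heuristic root test: sample |a_k|**(1/k) at a moderately large k.
# (k = 500 keeps the intermediate values inside floating-point range.)
def root_sample(a, k=500):
    return abs(a(k)) ** (1.0 / k)

geometric = lambda k: 0.5**k      # root tends to 0.5 -> convergent
blow_up   = lambda k: 2.0**k / k  # root tends to 2   -> divergent
harmonic  = lambda k: 1.0 / k     # root tends to 1   -> test inconclusive

print(round(root_sample(geometric), 3),
      round(root_sample(blow_up), 3),
      round(root_sample(harmonic), 3))
```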

3.8.3 Ratio Test

The next test is based on quotients of successive terms of the series.


Theorem 3.41. Let $\sum_{k=1}^{\infty} a_k$ be a series.

1. If there exists some real number $c < 1$ such that, for all but finitely many $k \in \mathbb{N}$,
$$a_k \neq 0 \quad \text{and} \quad \left| \frac{a_{k+1}}{a_k} \right| \leq c,$$
then $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. Conversely, if, for all but finitely many $k \in \mathbb{N}$,
$$a_k \neq 0 \quad \text{and} \quad \left| \frac{a_{k+1}}{a_k} \right| \geq 1,$$
then $\sum_{k=1}^{\infty} a_k$ is divergent.

Proof. 1. There exists $k_0 \in \mathbb{N}$ such that, for all $k \geq k_0$,
$$\left| \frac{a_{k+1}}{a_k} \right| \leq c.$$
It follows by induction that, for all $m \in \mathbb{N}$,
$$|a_{k_0 + m}| \leq c^m |a_{k_0}|.$$
Let
$$b_n := c^n \cdot \frac{|a_{k_0}|}{c^{k_0}}.$$
The series $\sum_{k=1}^{\infty} b_k$ is absolutely convergent; this follows from Theorem 3.31 and Lemma 3.36. Also, writing $k = k_0 + m$, for all $k \geq k_0$,
$$|a_k| = |a_{k_0 + m}| \leq c^m |a_{k_0}| = c^{k - k_0} |a_{k_0}| = |b_k|.$$
It follows from part 1 of Theorem 3.38 that $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. There exists $k_0 \in \mathbb{N}$ such that, for all $k \geq k_0$,
$$\left| \frac{a_{k+1}}{a_k} \right| \geq 1.$$
It follows by induction that, for all $k \geq k_0$,
$$|a_k| \geq |a_{k_0}|.$$
In particular, the sequence $(a_k)_{k \in \mathbb{N}}$ is not a null sequence. It therefore follows from Corollary 3.34 that the series $\sum_{k=1}^{\infty} a_k$ is divergent.

The following corollary gives us a helpful repackaging of the ratio test.

Corollary 3.42. Let $(a_k)_{k \in \mathbb{N}}$ be a sequence.

1. If
$$\lim_{k \to \infty} \left| \frac{a_{k+1}}{a_k} \right| < 1,$$
then the series $\sum_{k=1}^{\infty} a_k$ is absolutely convergent.

2. If
$$\lim_{k \to \infty} \left| \frac{a_{k+1}}{a_k} \right| > 1,$$
then the series $\sum_{k=1}^{\infty} a_k$ is divergent.

Proof. This is left as an exercise.
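As with the root test, Corollary 3.42 suggests a numerical heuristic: sample the ratio $|a_{k+1}/a_k|$ at a large index as a stand-in for the limit. A Python sketch of my own (a hint only, not a proof):

```python
# Heuristic ratio test: sample |a_{k+1} / a_k| at a moderately large k.
# For a_k = k**3 / 2**k the limit of the ratios is 1/2, so by
# Corollary 3.42 the series converges.
def ratio_sample(a, k=500):
    return abs(a(k + 1) / a(k))

a = lambda k: k**3 / 2.0**k
print(round(ratio_sample(a), 4))
```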


Example - We can use the ratio test to prove that the series $\sum_{k=1}^{\infty} a_k$ given by
$$a_k = \frac{k^3}{k!}$$
is convergent. Indeed, for all $k \in \mathbb{N}$,
$$\frac{a_{k+1}}{a_k} = \frac{(k+1)^3}{(k+1)!} \cdot \frac{k!}{k^3} = \frac{1}{k+1} \cdot \left( \frac{k+1}{k} \right)^3 \leq \frac{8}{k+1}.$$
The last inequality is just an application of the fact that $\frac{k+1}{k} \leq 2$ holds for all $k \in \mathbb{N}$. It therefore follows that, for all $k \geq 15$, we have
$$\frac{a_{k+1}}{a_k} \leq \frac{1}{2}.$$
The ratio test implies that the series $\sum_{k=1}^{\infty} \frac{k^3}{k!}$ is convergent.
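The ratio bound above is easy to verify numerically. The Python sketch below (an illustration only) checks both the bound $8/(k+1)$ and the claim that the ratios drop below $1/2$ from $k = 15$ onwards.

```python
from math import factorial

# Successive ratios a_{k+1}/a_k for a_k = k**3 / k!, checked against the
# bound 8/(k+1), and against 1/2 for k >= 15, as in the argument above.
def a(k):
    return k**3 / factorial(k)

bound_holds = all(a(k + 1) / a(k) <= 8 / (k + 1) for k in range(1, 100))
half_after_15 = all(a(k + 1) / a(k) <= 0.5 for k in range(15, 100))
print(bound_holds, half_after_15)
```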
Example - As was hinted at earlier, the ratio test does not always provide a definite answer regarding the convergence or divergence of a series. To see this, let's try to apply the ratio test to the series $\sum_{n=1}^{\infty} \frac{1}{n}$ (which we already know is divergent, see Lemma 3.30).

Observe that
$$\frac{a_{k+1}}{a_k} = \frac{k}{k+1}.$$
Since the sequence $\frac{k}{k+1}$ tends to 1, it follows that neither of the two conditions in Theorem 3.41 holds, and we do not get any information about the series $\sum_{n=1}^{\infty} \frac{1}{n}$ by this method.
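Numerically (a short Python sketch of my own), the ratios creep up towards 1 from below, so no single constant $c < 1$ can bound them all:

```python
# Ratios a_{k+1}/a_k = k/(k+1) for the harmonic series: always below 1,
# but approaching 1, so the ratio test yields no information.
ratios = {k: k / (k + 1) for k in (10, 100, 10_000)}
print(ratios)
```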
