Math Skript
Mario Ullrich
Institut für Analysis
Version of
October 19, 2022
Mathematics for
Artificial Intelligence 1–3
JOHANNES KEPLER
UNIVERSITÄT LINZ
Altenbergerstraße 69
4040 Linz, Österreich
www.jku.at
Preface
These lecture notes belong to the lecture of the same name, and have been produced starting in the
winter semester 2019. A big part of the work was done by Julian Hofstadler and Corinna
Perchtold, who were employed as undergraduate assistants in this period to prepare these
notes and the corresponding slides.
This is the first time that this lecture has been held at the JKU Linz and therefore, these notes
are far from being perfect. However, many parts are taken (or merged) from the repeatedly
used lecture notes of Prof. Aicke Hinrichs (“Analysis für Lehramt”, JKU, 2018), Prof. Andreas
Neubauer (“Mathematics for Chemistry”, JKU, 2017) and myself (“Klassische Harmonische
Analysis”, JKU, 2017). I thank both colleagues for the permission to do so.
Mario Ullrich
(mario.ullrich@jku.at)
October 2021
A set M is a collection of distinct 'objects', which we call the elements of M . This rather intuitive
description of a set was first given by Georg Cantor (1845–1918). We use the following notation:
x belongs to M, write x ∈ M,
or
x is not in M, write x ∉ M.
(Illustration: two diagrams, one with x inside the set M , i.e. x ∈ M , and one with x outside M , i.e. x ∉ M .)
Some very important (and partly well-known) sets of numbers together with their ’symbol’ are:
(Note that we write “:=” instead of “=” if the equation is meant as a definition.)
All these sets will be precisely introduced and discussed later in this chapter. First, let us see
that there are multiple ways to define sets. The easiest way would be to list all its elements, as
for:
• A := {0, 1, 2}
However, if a set contains infinitely many elements we cannot list all of them.
In this case we use dots if it is clear what is contained in the set. For example, we might write
• G := {2, 4, 6, . . . } and
• U := {1, 3, 5, . . . }
for the even and odd natural numbers. However, this may lead to difficulties of interpretation,
as such a description is not unique. For example, we may define the set of natural numbers by
N := {1, 2, 3, . . . } = {1, 2, 3, 4, 5, 6, 7, . . . }.
A more precise way is to describe a set by a property of its elements, as in
• P := {n ∈ N : n is prime} and
• G := {n ∈ N : n is an even number}.
A special but important set is the empty set, short ∅, which does not contain any element,
i.e. ∅ = {}.
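These ways of defining sets can be mirrored in Python, whose set type and comprehensions closely follow the notation above. A small illustrative sketch (the truncation bound 20 is our arbitrary choice, since Python sets must be finite):

```python
A = {0, 1, 2}                                  # listing all elements
G = {n for n in range(1, 20) if n % 2 == 0}    # even naturals up to 19
U = {n for n in range(1, 20) if n % 2 == 1}    # odd naturals up to 19

def is_prime(n):
    """Check primality by trial division."""
    return n > 1 and all(n % d != 0 for d in range(2, int(n**0.5) + 1))

P = {n for n in range(1, 20) if is_prime(n)}   # primes up to 19
empty = set()                                  # the empty set ∅

print(sorted(G))   # [2, 4, 6, ..., 18]
print(sorted(P))   # [2, 3, 5, 7, 11, 13, 17, 19]
```

Note that a set comprehension `{n for n in ... if ...}` is exactly the set-builder notation {n ∈ N : ...} above.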
Two sets M and N can also be related to each other. If for arbitrary m ∈ M we also have
m ∈ N then we say M is a subset of N , and we write M ⊂ N or M ⊆ N . In this case, we also
call N a superset of M . If we have a look at the sets defined above, we have e.g. G ⊂ N and
U ⊂ N. Note that for any set M we have the obvious relations M ⊂ M and ∅ ⊂ M .
Sets M and N are called equal if they contain the same elements, i.e. M ⊂ N and N ⊂ M .
For example we have
A = {0, 1, 2} = {0, 0, 0, 1, 1, 1, 2} =: Ã.
To verify this equation we have to show that A ⊂ à and à ⊂ A and start by showing A ⊂ Ã.
Obviously 0 ∈ Ã, 1 ∈ Ã and 2 ∈ Ã, hence by definition we have A ⊂ Ã. The other way around,
i.e. Ã ⊂ A, is left as an exercise. Note that multiplicities are irrelevant in sets.
If we have M ⊂ N and M ≠ N , then we say that M is a proper or strict subset of N and
write M ⊊ N . For example N ⊊ N0 ⊊ Z.
Remark 1.1. Some authors prefer to use “⊆” instead of “⊂” to indicate that equality is not
excluded. (And we also do so sometimes.) The same authors may use “⊂” instead of “⊊” for
proper subsets. So, one should be careful when using different literature.
Sets may contain other sets. An important example is the power set P(M ) for a set M , which
is the set of all possible subsets of M , i.e.,
P(M ) := {A : A ⊂ M }.
Consider once more the set A = {0, 1, 2}, then its power set is given by:
P(A) = {∅, {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}}.
Note that we always have M ∈ P(M ) and ∅ ∈ P(M ). (Important: The statement M ⊂ P(M )
is usually false! M contains elements, and P(M ) contains sets of elements.)
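For small finite sets the power set can be computed explicitly; one possible Python sketch (using `frozenset`, since Python sets cannot contain ordinary mutable sets):

```python
from itertools import chain, combinations

def power_set(M):
    """All subsets of M, returned as frozensets."""
    M = list(M)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(M, k) for k in range(len(M) + 1))}

A = {0, 1, 2}
PA = power_set(A)

print(len(PA))              # 8 subsets, i.e. 2 to the power |A|
print(frozenset() in PA)    # True: ∅ ∈ P(A)
print(frozenset(A) in PA)   # True: A ∈ P(A)
```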
We can also create new sets from given sets, say M and N , by using set operations:
The union M ∪ N contains all elements which belong to M or to N (or to both).
The intersection M ∩ N consists of all elements which are in both sets M and N .
The difference (or relative complement) of M and N , written as M \N , is the set of all elements
of M which are not contained in N .
If we only work with subsets M ⊂ Ω for a fixed set Ω, then we call Ω the underlying set or
the universal set. In this case, we work with the notation M c = Ω \ M for the complement
of M (in Ω).
(Venn diagrams illustrating M ∪ N , M ∩ N , M \ N and M c .)
The illustrations above are called Venn-diagrams (John Venn, 1834-1923) and are a good tool
when working with sets.
However, we often need precise definitions of these set operations in mathematical language.
M ∪ N := {x : x ∈ M or x ∈ N },
M ∩ N := {x : x ∈ M and x ∈ N },
M \ N := {x ∈ M : x ∉ N },
M c := {x ∈ Ω : x ∉ M }.
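These four operations map directly onto Python's set operators; a minimal illustration (Omega, M and N are arbitrary example sets of our choosing):

```python
Omega = set(range(10))   # the underlying set Ω = {0, ..., 9}
M = {1, 2, 3, 4}
N = {3, 4, 5, 6}

print(M | N)       # union:        {1, 2, 3, 4, 5, 6}
print(M & N)       # intersection: {3, 4}
print(M - N)       # difference:   {1, 2}
print(Omega - M)   # complement of M in Ω
```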
Elements of sets are not ordered since {a, b} = {b, a}. Nevertheless, it is often important to
order the objects under consideration.
Definition 1.3 (Tuples and Cartesian product). Let A and B be sets, and a ∈ A and
b ∈ B arbitrary elements.
• The expression (a, b), which is sensitive to order, is called a tuple or an ordered pair.
• Two tuples (a, b) and (a′ , b′ ) are equal if and only if a = a′ and b = b′ .
• The Cartesian product of A and B is the set of all such tuples, i.e., A × B := {(a, b) : a ∈ A, b ∈ B}.
Note that for A = {x, y} and B = {1, 2, 3} we have
A × B = {(x, 1), (x, 2), (x, 3), (y, 1), (y, 2), (y, 3)}
but
B × A = {(1, x), (1, y), (2, x), (2, y), (3, x), (3, y)}.
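The Cartesian product, and the fact that A × B and B × A differ, can be checked with `itertools.product` (an illustrative sketch):

```python
from itertools import product

A = {'x', 'y'}
B = {1, 2, 3}

AxB = set(product(A, B))   # A × B: tuples (a, b)
BxA = set(product(B, A))   # B × A: tuples (b, a)

print(len(AxB))      # 6 = |A| * |B|
print(AxB == BxA)    # False: order inside the tuples matters
```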
Let us finally fix some other mathematical language to make writing mathematical statements
more ’elegant’. To this end, we start with the universal and the existential quantifier, which
form the basis of many expressions in mathematical language. First, a proposition P is an
expression which can either be true or false, like 1 = 1 or 0 = 1. If we have that P (m) is a
proposition for all elements m of a set M , then we say that P (·) is a predicate for M .
Definition 1.5. Let P (·) be a predicate for M , i.e., P (m) is a proposition for every m ∈ M .
The universal quantifier builds a proposition
∀ m ∈ M : P (m),
which is true if and only if for all m ∈ M the proposition P (m) is true.
The existential quantifier builds a proposition
∃ m ∈ M : P (m),
which is true if and only if there exists at least one m ∈ M such that P (m) is true.
The uniqueness quantifier builds a proposition
∃! m ∈ M : P (m),
which is true if and only if there exists exactly one m ∈ M such that P (m) is true.
Example 1.6. Consider the set M = {0, 1, 2} and set P (m) = (m > 1). Inserting all the
elements of M into P (·) we get (0 > 1), (1 > 1) and (2 > 1). Clearly, only the last statement
is true. Hence, (∀m ∈ M : P (m)) = (∀m ∈ M : m > 1) is a wrong proposition, while ∃m ∈
M : m > 1 is true. We even have that ∃!m ∈ M : m > 1 is true.
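For finite sets the three quantifiers correspond to `all`, `any` and a count of exactly one in Python; a sketch using the data of Example 1.6:

```python
M = {0, 1, 2}
P = lambda m: m > 1          # the predicate P(m) = (m > 1)

forall = all(P(m) for m in M)              # ∀m ∈ M : P(m)
exists = any(P(m) for m in M)              # ∃m ∈ M : P(m)
exists_unique = sum(P(m) for m in M) == 1  # ∃!m ∈ M : P(m)

print(forall, exists, exists_unique)   # False True True
```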
Example 1.7. With these quantifiers we can also give a more mathematical (or ’elegant’)
definition of a subset. We have that M ⊂ N if and only if
∀x ∈ M : x ∈ N.
Moreover, we have M ⊊ N if and only if M ⊂ N and ∃x ∈ N : x ∉ M .
As you might have already noticed, we will often need the terms “if” or “if and only if”, and
therefore we define mathematical symbols for them. Let A and B be two propositions. Then,
“A =⇒ B” means “if A is true, then B is true”, and “A ⇐⇒ B” means “A is true if and
only if B is true”.
Roughly speaking, relations shall describe connections between two objects. Here, we give a
formal description and important properties. We then introduce functions, which are used to
“map” every element of a set to something different, and discuss special relations that are used
to compare, group or order elements of a given set.
Definition 1.8. A relation R between two sets M and N is a subset of the Cartesian
product of M and N , i.e. R ⊂ M × N .
To make things clearer we have a look at the upcoming illustration, which depicts every element
of R as a line. As you can see it is possible that x ∈ M is “connected” to some y ∈ N , which we
denote by (x, y) ∈ R. However, this does not have to be the case for every x ∈ M , and, different
elements of M may be mapped to the same y ∈ N . Moreover, x ∈ M can be mapped to more
than one element in N .
(Illustration: a relation R, with each pair (x, y) ∈ R depicted as a line from x ∈ M to y ∈ N .)
Example 1.9. Let M = {Anna, Philipp, Kevin, Julia} and N = {Corinna, Jakob, Anja}. Now
we define a relation R ⊆ M × N , where we have (x, y) ∈ R if and only if the first letter of x
equals the first letter of y. Clearly we have R = {(Anna, Anja), (Julia, Jakob)} ⊊ M × N .
Now we head to a very important type of relation, i.e., functions, which assign to each element
of M exactly one element of N . For a function f : M → N , the image of a set S ⊆ M and
the preimage of a set T ⊆ N are defined by
f (S) := {f (x) : x ∈ S} ⊆ N,
f −1 (T ) := {x ∈ M : f (x) ∈ T } ⊆ M.
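For finite sets, image and preimage can be computed directly from these definitions; a small sketch (the helper names `image` and `preimage` are ours):

```python
def image(f, S):
    """f(S) = {f(x) : x in S}"""
    return {f(x) for x in S}

def preimage(f, M, T):
    """f^{-1}(T) = {x in M : f(x) in T}; the domain M must be given."""
    return {x for x in M if f(x) in T}

M = {-2, -1, 0, 1, 2}
f = lambda x: x * x

print(image(f, M))           # {0, 1, 4}
print(preimage(f, M, {4}))   # {-2, 2}: a preimage need not be a single point
```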
To show the connection between functions and relations, we define the following.
Gf := {(x, f (x)) : x ∈ M } ⊂ M × N.
Note that the graph of a function is a relation. In this sense, all functions induce a relation, but
not vice versa.
We can visualize real-valued functions by plotting their graphs in a usual coordinate system (in
R2 ). For f (x) = x2 and f (x) = x + 1 this is demonstrated in the next illustration.
(Plot of the graphs of f (x) = x2 and f (x) = x + 1.)
In what follows, we will define several important properties of relations. We will always demon-
strate afterwards what this means for functions.
A relation R ⊂ M × N is called injective if and only if
∀(x1 , y1 ), (x2 , y2 ) ∈ R : x1 ≠ x2 ⇒ y1 ≠ y2 ,
which is equivalent to
∀(x1 , y1 ), (x2 , y2 ) ∈ R : y1 = y2 ⇒ x1 = x2 .
It is called surjective if and only if
∀y ∈ N ∃x ∈ M : (x, y) ∈ R,
and functional if and only if
∀x ∈ M, y1 , y2 ∈ N : (x, y1 ), (x, y2 ) ∈ R ⇒ y1 = y2 .
Note that the graph of a function is a functional relation, and vice versa. We can therefore
rephrase the above definitions for functions. If f : M → N is a function we say:
• f is injective if and only if f (x1 ) = f (x2 ) implies x1 = x2 ,
• f is surjective if and only if for every y ∈ N there is an x ∈ M with f (x) = y,
• f is bijective if and only if f is both injective and surjective.
Injective, surjective and bijective functions are also called injections, surjections and bijections,
respectively.
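For functions on finite sets these properties can be tested exhaustively; a sketch with helper names of our own:

```python
def is_injective(f, M):
    """Different inputs give different outputs."""
    values = [f(x) for x in M]
    return len(values) == len(set(values))

def is_surjective(f, M, N):
    """Every y in N is hit by some x in M."""
    return {f(x) for x in M} == set(N)

M = {-1, 0, 1}
sq = lambda x: x * x

print(is_injective(sq, M))                # False: sq(-1) == sq(1)
print(is_surjective(sq, M, {0, 1}))       # True
print(is_injective(lambda x: x + 1, M))   # True
```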
We will now define two functions that can in principle be defined on arbitrary sets M . First, we
define the identity function IdM : M → M which maps each element to itself, i.e., x ↦ x.
Second, if f : M → N and g : N → M are functions such that
∀x ∈ M : g(f (x)) = x
and
∀y ∈ N : f (g(y)) = y,
then f and g are inverses of each other.
In this case we write f −1 := g and g −1 := f and call f (or g) invertible.
Note that we used the notation f −1 already for the preimage, see Definition 1.10. There, the
input was a set, and the preimage was defined for any function f . For an invertible function
f : M → N we have, by definition, f −1 ({f (x)}) = {x} and f ({f −1 (y)}) = {y} for any x ∈ M and
y ∈ N . In particular, the preimage of any one-element subset of N has also exactly one element.
So, this notation makes sense, if we identify f −1 (y) with the unique element in f −1 ({y}).
Figure 8: The function f (x) = x2 and its inverse f −1 (x) = √x.
The following theorem provides us with a tool to check if a function has an inverse or not.
However, finding a closed formula is often much harder or even impossible. This is one of
the main reasons for using numerical software and approximations. We will see several
examples later during the course.
Theorem. Let f : M → N be a function. Then: f is invertible ⇐⇒ f is bijective.
Proof. We want to prove an equivalence, and therefore have to prove both directions.
First we are going to prove “⇒”:
Since f is invertible, there exists f −1 : N → M such that
f −1 (f (x)) = x ∀x ∈ M
and
f (f −1 (y)) = y ∀y ∈ N.
Recall that bijective means surjective and injective. We show that f is surjective, i.e.,
∀y ∈ N ∃x ∈ M : f (x) = y.
Indeed, for given y ∈ N set x := f −1 (y); then f (x) = f (f −1 (y)) = y. For injectivity, assume
that f (x1 ) = f (x2 ) for some x1 , x2 ∈ M . Then
x1 = f −1 (f (x1 )) = f −1 (f (x2 )) = x2 .
Invertible (or bijective) functions may be used to formally define the cardinality of a set.
For this, note that the existence of a bijective function f : M → N means that there is a one-
to-one correspondence between M and N . In particular, both sets must have the same
cardinality (aka. size). This means that a set M has cardinality n ∈ N, i.e., |M | = n, if and only
if there is a bijective function f : {1, . . . , n} → M . (Clearly, this is equivalent to the existence
of a bijective/invertible function g : M → {1, . . . , n}.)
Even more, for two finite sets A, B we have that
This notion also allows us (to some extent) to characterize the cardinality of an infinite set.
Definition 1.17. Let A be a set and n ∈ N. If the elements of A can be labeled by the
numbers {1, . . . , n}, then we say that A has cardinality n, and we write
|A| = n or #A = n.
Note that countability is the precise definition of the “simple” property that the elements of A
can be enumerated by the natural numbers {1, 2, 3, . . . }.
(Diagram: functions f : X → Y and g : Y → Z, together with their composition g ◦ f : X → Z.)
Example 1.20. Consider the function h(x) = sin(x2 ). If we set f (x) = x2 and g(y) = sin(y)
we get (g ◦ f )(x) = g(f (x)) = sin(x2 ) = h(x). However, since (f ◦ g)(x) = f (g(x)) = (sin(x))2 ,
it is important to mind that, in general, we have (f ◦ g) ≠ (g ◦ f ).
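Example 1.20 can be replayed in code; the following sketch (with a small `compose` helper of our own) confirms that f ◦ g and g ◦ f differ:

```python
import math

def compose(g, f):
    """(g ∘ f)(x) = g(f(x))"""
    return lambda x: g(f(x))

f = lambda x: x * x   # f(x) = x^2
g = math.sin          # g(y) = sin(y)

h1 = compose(g, f)    # sin(x^2)
h2 = compose(f, g)    # (sin x)^2

x = 2.0
print(h1(x) == math.sin(x * x))   # True
print(h1(x) == h2(x))             # False: composition order matters
```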
Assume now that f : X → Y and g : Y → Z are invertible. Then
∀y ∈ Y ∃!x ∈ X : f −1 (y) = x
and
∀z ∈ Z ∃!y ∈ Y : g −1 (z) = y.
Furthermore
∀x ∈ X ∃!y ∈ Y : f (x) = y
and
∀y ∈ Y ∃!z ∈ Z : g(y) = z.
This yields
(f −1 ◦ g −1 )(z) = f −1 (g −1 (z)) = f −1 (y) = x
and
(g ◦ f )(x) = g(f (x)) = g(y) = z.
Thus
(g ◦ f )[(f −1 ◦ g −1 )(z)] = (g ◦ f )(x) = z
and
(f −1 ◦ g −1 )((g ◦ f )(x)) = (f −1 ◦ g −1 )(z) = x,
which shows that f −1 ◦ g −1 is the inverse of g ◦ f , and vice versa.
Finally, let us discuss some relations that are used, instead of as mappings, to compare, group
or order elements of a set M . For this we consider relations R ⊂ M 2 = M × M . Such relations
usually have nothing to do with functions, but are still essential. Again, let us give names
to some of their important characteristics.
• antisymmetric if and only if
∀x, y ∈ M : (x, y) ∈ R and (y, x) ∈ R =⇒ x = y,
• total if and only if
∀x, y ∈ M : (x, y) ∈ R or (y, x) ∈ R.
Some relations which have some of these properties are especially important. For example, the
usual order relation “≤” on R is
• reflexive: x ≤ x,
• antisymmetric: x ≤ y and y ≤ x =⇒ x = y,
• transitive: x ≤ y and y ≤ z =⇒ x ≤ z,
• total: x ≤ y or y ≤ x.
Example 1.24. The usual equality relation “=” in R is reflexive, symmetric, antisymmetric
and transitive, but not total. It is therefore an equivalence relation and a partial order. As an
exercise, show that “=” is the only reflexive relation that is symmetric and antisymmetric.
Example 1.25. Define the “strictly less” relation “<” by a < b if and only if a ≤ b and
a ≠ b. Determine all characteristics of “<”, as well as “≥”, and “>”. (Relations with the same
characteristics as “<” are called strict partial orders.)
Example 1.26. Let us also mention the not-equal relation in R, which is defined by a ≠ b if
and only if a < b or b < a. Which characteristics does it have?
Example 1.27. Show that the divisibility relation R ⊂ N2 , with (a, b) ∈ R :⇔ a|b, i.e., a divides
b, is a partial order.
Example 1.28. One can define a partial order on a set of sets by using the subset relation ⊂.
For example, for M := {∅, {1}, {2}, {1, 2}}, we have that, e.g., ∅ ⊂ {1} ⊂ {1, 2}. It is easy to
see that ⊂ is reflexive, antisymmetric and transitive. Hence, ⊂ is a partial order on M . However,
since {1} ⊄ {2} and {2} ⊄ {1}, it is not a linear order on M .
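For a finite set M , the properties above can be checked by brute force. The sketch below (helper names are ours) does this for the divisibility relation of Example 1.27, restricted to {1, . . . , 10}:

```python
def reflexive(R, M):
    return all((x, x) in R for x in M)

def antisymmetric(R, M):
    return all(not ((x, y) in R and (y, x) in R and x != y)
               for x in M for y in M)

def transitive(R, M):
    return all((x, z) in R
               for x in M for y in M for z in M
               if (x, y) in R and (y, z) in R)

def total(R, M):
    return all((x, y) in R or (y, x) in R for x in M for y in M)

M = range(1, 11)
divides = {(a, b) for a in M for b in M if b % a == 0}   # (a, b) in R iff a | b

# Divisibility is a partial order, but not total:
print(reflexive(divides, M), antisymmetric(divides, M),
      transitive(divides, M), total(divides, M))   # True True True False
```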
The commonly known natural numbers or rational numbers (fractions) are not sufficient for a
rigorous foundation of mathematical analysis. The historical development shows that for issues
concerning analysis, the rational numbers have to be extended to the real numbers. Maybe, you
already know a lot about the real numbers. However, you probably do not know that all their
properties follow from a few basic ones. So, let us introduce them from scratch.
We begin with the set of natural numbers, i.e.,
N := {1, 2, 3, . . . }.
However, we have seen that such a definition is not precise enough, and we want to define this
set solely by its properties.
These properties are called the Peano axioms, presented first by Giuseppe Peano (1858–1932)
around 1889. The only thing we need to assume is that we know what the number “1” is,
and what it means to add “1” to a number. (This is very reasonable, when we think about
“counting” in real life.)
We then define the natural numbers by the (unique) set N such that
1. 1 ∈ N,
2. n ∈ N =⇒ n + 1 ∈ N,
3. ∀n, m ∈ N : n = m ⇐⇒ n + 1 = m + 1, and
4. ∀n ∈ N : n + 1 6= 1.
In words, 1 is a natural number, for every natural number n, its “successor” n + 1 is also a
natural number, two natural numbers are equal iff their successors are equal, and no natural
number has the successor 1. Although these axioms seem to be obvious or redundant, depending
on the point of view, they are really the only “axioms” we need to assume to build up most of
modern mathematics. (A detailed discussion goes far beyond scope here.)
Next, the set of natural numbers with zero (or non-negative integers) is defined by
N0 := {0} ∪ N.
The sets N and N0 are closed under addition and multiplication, i.e., for all n, m ∈ N we have
n + m ∈ N and m · n ∈ N. However, we already get in trouble when we try to work with
subtraction, since 21 − 42 = −21 ∉ N.
The set of integer numbers is also closed under subtraction and is defined as
Z := {0, −1, 1, −2, 2, . . . } = {· · · − 2, −1, 0, 1, 2, . . . } = N0 ∪ (−N),
where −N := {−n : n ∈ N}. Clearly, for a, b ∈ Z we have a + b ∈ Z, a − b ∈ Z and a · b ∈ Z.
However, division is still a problem if we use integer numbers only.
The set of all fractions of two integers is the set of rational numbers, which is denoted as
Q := {p/q : p, q ∈ Z, q ≠ 0},
where we call p the numerator and q the denominator.
We call two rational numbers k/n and l/m equal if and only if km = ln. Further, an integer k ∈ Z
can be identified with the fraction k/1 ∈ Q. Consequently, the inclusions N ⊊ Z ⊊ Q are true.
One main reason to introduce “new” sets of numbers is that we want to solve equations
and, in particular, we want to know if a solution exists in a given set.
For example, if we want to solve the equation a · x + b = 0, where a, b ∈ Q are fixed constants
and a ≠ 0, it is easy to see that
x = −b/a ∈ Q
solves the equation.
However, what about the simple equation x2 − 2 = 0? Is there some x ∈ Q such that x2 = 2?
The following discussion will show that this is not possible.
Proof. Proof by contradiction, i.e., we assume ∃x ∈ Q : x2 = 2 and show that this yields a
wrong result. Since (−x)2 = x2 we may assume that x > 0 and that x = m/n, where m, n ∈ N.
Furthermore, we may assume that at least one of the numbers n or m is odd; otherwise we could
cancel by 2. This yields the equation
x2 = m2 /n2 = 2,
which is equivalent to
m2 = 2 · n2 .
Hence m2 is an even number and consequently (previous lemma) m is an even number and we
can write m = 2 · k for some k ∈ N. So we have
m2 = 4 · k 2 = 2 · n2 ,
leading to
2 · k 2 = n2 .
Applying the previous lemma once again, we get that n also has to be an even number,
which contradicts the assumption that at least one of m and n is odd.
Thus the equation x2 − 2 = 0 is not solvable in Q, but as we all know from school there is a
number √2 such that (√2)2 = 2.
Making the so-called number line complete, we finally get to the set of real numbers. These
numbers can have infinitely many decimals, i.e.,
R := {z + r : z ∈ Z, r = a1 /10 + a2 /100 + . . . , where a1 , a2 , · · · ∈ {0, 1, . . . , 9}}.
One may think of R as the set of all points on the number line, i.e., without holes. Note that
the rational numbers, written as decimal numbers, either have a finite number of digits or the
sequence of digits is periodic. This means, that some points on the line are missing. These
correspond to numbers which have a non-periodic infinite number of decimals, in other terms
which cannot be written as fractions, such as various roots, like √2, and the numbers π and e.
These numbers, i.e. the elements of R \ Q, are called irrational numbers. We will later see that the set R is
(assumed to be) in some sense complete. Such a precise definition was only given in the 19th
century, probably by Karl Weierstraß (1815–1897).
Although, we will not prove that here, let us note that the set of rational numbers is
countable, whereas the set of real numbers are uncountable. (The proofs can easily be
found in the literature and are quite interesting, but they go beyond the scope here.)
In summary, the following set relations hold:
N ⊊ N0 ⊊ Z ⊊ Q ⊊ R.
Note that the following calculation rules are valid for any real numbers:
We call these properties axioms because, actually, we somehow assume them to be true. (Or
how would you prove these statements?) Many of the calculations that follow in the upcoming
chapters could also be done with other fields. We do not go into details here.
Example 1.31. Another important and well-known field is the finite field Z2 := {0, 1} with
the addition 0 + 0 := 0, 1 + 0 := 1, 0 + 1 := 1, 1 + 1 := 0, and the multiplication 0 · 0 := 0,
1 · 0 := 0, 0 · 1 := 0, 1 · 1 := 1. (Note that we formally need to define what we mean by addition
and multiplication.) These are the numbers and operations that computers work with. Verify
yourself that Z2 is a field.
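The addition and multiplication tables of Z2 are exactly arithmetic modulo 2, which makes the verification easy to sketch:

```python
# Z2 arithmetic: the tables from the text coincide with arithmetic mod 2
Z2 = {0, 1}
add = lambda a, b: (a + b) % 2
mul = lambda a, b: (a * b) % 2

print(add(1, 1))   # 0, as defined in the text
print(mul(1, 1))   # 1
# closure: every result stays inside Z2
print(all(add(a, b) in Z2 and mul(a, b) in Z2 for a in Z2 for b in Z2))   # True
```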
First, recall the following calculation rules for inequalities. Let a, b, c, d ∈ R, then
• a < b =⇒ a ± c < b ± c
• a < b =⇒ −a > −b
• 0 < a < b =⇒ 0 < 1/b < 1/a
We now want to specify the “least” and the “greatest” element of a set. To define this
formally, we first need the definition of a bounded set (in R). A set A ⊂ R is called bounded
from above if
∃C ∈ R ∀a ∈ A : a ≤ C,
and bounded from below if
∃c ∈ R ∀a ∈ A : c ≤ a.
Let us discuss this with the following basic subsets of R, i.e., intervals. For a, b ∈ R with a ≤ b we set
[a, b] := {x ∈ R : a ≤ x ≤ b},
(a, b) := {x ∈ R : a < x < b},
and define the half-open intervals [a, b) and (a, b] accordingly.
Let a, b ∈ R such that a < b. Then for the closed interval [a, b] we have that a, a − 1, a − 42 are
lower bounds and b, b + 42 are upper bounds, and the same is true for the corresponding (half)
open interval. So, an upper bound is not unique.
To fix a specific upper/lower bound, let us first define the minimum and maximum of a set.
Example 1.36. The set of the natural numbers N ⊂ R is bounded from below, with min N = 1.
However, maxima and minima do not have to exist: While b is the least upper bound of both
[a, b] and (a, b), we have that b ∈ [a, b], but b ∈
/ (a, b). Hence, the set (a, b) does not have a
maximum (or minimum), but still we would like to work with the “best possible” bounds for
such a set, which are clearly a and b in this example.
For this we define the infimum and the supremum as the greatest lower bound and the least
upper bound, respectively. These objects will be very important in the upcoming analysis.
Clearly, if inf A ∈ A, then inf A = min A, and if sup A ∈ A, then sup A = max A. In words, if
the infimum (or supremum) of a set A is contained in A, then A has a minimum (or maximum)
which has the same value.
Moreover, infimum and supremum are uniquely determined. To see this, assume that
there are two suprema T1 and T2 of A. Since sup A = T1 , T1 is an upper bound of A, i.e.,
a ≤ T1 for all a ∈ A. In addition, since sup A = T2 , we obtain by the second defining property
above that T2 ≤ x for every upper bound x of A. Setting x = T1 , we have T2 ≤ T1 . In the
same way, we get T1 ≤ T2 , and hence T1 = T2 .
min[a, b] = inf[a, b] = inf[a, b) = inf(a, b] = inf(a, b) = a
and
max[a, b] = sup[a, b] = sup[a, b) = sup(a, b] = sup(a, b) = b.
However, min(a, b), min(a, b], max(a, b) and max[a, b) do not exist.
Example 1.39. Let A = {x2 : x ∈ (−1, 1)}, then inf A = min A = 0 and sup A = 1. Verify
yourself!
However, it is not clear at all if every set has an infimum and supremum. For example, if
we would try the same with Q instead of R, i.e., we look for a least upper bound T ∈ Q for a
set A ⊂ Q, this would not be true. Consider e.g. the set A = {x ∈ Q : x2 ≤ 2}. If we consider
A as a subset of R, then its supremum is √2 ∈ R. But as a subset of Q it has no supremum,
since √2 ∉ Q. The reason is that the rational numbers have “gaps”.
The real numbers R were precisely defined to have no such “gaps”. However, note that this is
actually an assumption and we formalize this by the next axiom of this lecture.
Let us state an equivalent definition of supremum and infimum for bounded sets in R.
Although it looks more complicated at first sight, this formulation is sometimes very helpful.
Let A ⊂ R be bounded from below. Then, T = inf A if and only if
(i) ∀a ∈ A : T ≤ a, and
(ii) ∀ε > 0 ∃a ∈ A : a < T + ε,
i.e., T is a lower bound, and no larger number is one.
Remark 1.40.(*) Let us add that all the definitions above (bounded, inf/sup) also make sense,
if we replace the “usual” ≤-relation on R by another partial order on some set M .
For example, consider the subset relation ⊂ on M := {∅, {1}, {2}, {1, 2}}, see also Example 1.28.
With A := {{1}, {2}}, we have sup A = {1, 2} ∈ M . (Verify precisely that this is the least upper
bound of A with respect to ⊂.) Note that A does not have a maximum. All other suprema are
easy to calculate. In particular, every subset of M has a supremum in M , i.e., M with ⊂ is
complete. (What about the same relation on M 0 := {∅, {1}, {2}}?)
Remark 1.41. (*) We call a set Ω with a partial order complete, if every A ⊂ Ω that is
bounded from above has sup A ∈ Ω. Thus, very formally, the real numbers R are assumed to be
a complete field with a linear order. Note that this would not be true for N and Z, as they do
not fulfill the field axioms, and also not for Q because, again, {x ∈ Q : x2 ≤ 2} has no supremum in Q.
Let us finally discuss the Archimedean property. It is based on the fact that the set of natural
numbers is unbounded. Even though this property seems unimpressive, it was of significant
importance for real analysis.
Proof. a) Let us recall that the natural numbers N are defined solely by the properties 1 ∈ N,
and that n ∈ N implies that n + 1 ∈ N.
We first prove by contradiction that N is unbounded. For this, we assume that N is bounded.
Thus, the supremum x = sup N exists by the completeness axiom. As x is the smallest upper
bound of N, x − 1 is no upper bound of N, so there exists a n ∈ N with x − 1 < n. By addition
with 1 this also implies x < n + 1. But n + 1 ∈ N follows from n ∈ N, and therefore x is no upper
bound of N, which is a contradiction to the original assumption. So, N must be unbounded.
In order to show b) and c) one continues as follows:
N has no upper bound ⇐⇒ ∀x ∈ R ∃n ∈ N : n > x
⇐⇒ ∀x > 0 ∃n ∈ N : 1/n < 1/x (set ε := 1/x)
⇐⇒ ∀ε > 0 ∃n ∈ N : 1/n < ε.
c) It is sufficient to check the case x, y > 0 (Why?). Then we are looking for m, n ∈ N such
that nx ≤ m ≤ ny. With part a) above, we get an n ∈ N with n(y − x) ≥ 1. Now take the
smallest natural number m ≥ nx (which clearly exists). Then m ≥ nx and m − 1 < nx. The
last condition implies that m < nx + 1 ≤ ny, so we get nx ≤ m ≤ ny.
Based on this, we easily calculate certain infima and suprema of (discrete) sets.
Example 1.43. Let A = {1/n : n ∈ N}, then inf A = 0 and sup A = max A = 1.
To prove that, first note that 0 < 1/n ≤ 1 for all n ∈ N. So, 0 is a lower bound, and 1 is an upper
bound. Since 1 ∈ A (for n = 1), we therefore obtain that sup A = max A = 1. For the infimum
we need to show that there is no larger lower bound. For this, let ε > 0 be arbitrary. By the
Archimedean property there exists an n ∈ N with 1/n < ε. Hence, ε is not a lower bound. As this
works for all ε > 0, we conclude that 0 is the largest lower bound of A, and hence inf A = 0.
Example 1.44. Let A = {1/(n2 − n − 3) : n ∈ N}, then inf A = min A = −1 and
sup A = max A = 1/3.
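Reading the set in Example 1.44 as {1/(n2 − n − 3) : n ∈ N} (our reconstruction of the displayed formula), the claimed minimum and maximum can be checked numerically:

```python
# The minimum -1 is attained at n = 2, the maximum 1/3 at n = 3;
# for large n the terms tend to 0 from above.
values = [1 / (n * n - n - 3) for n in range(1, 1000)]

print(min(values))   # -1.0
print(max(values))   # 0.333... = 1/3
```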
In the following we are dealing with mathematical induction which helps us to prove state-
ments or define objects for all n ∈ N.
Theorem 1.45 (Mathematical induction). A predicate P (n) is true for all n ∈ N if and
only if the following two steps hold:
1. P (1) is true, and
2. for every n ∈ N: if P (n) is true, then P (n + 1) is true.
Note that, by the Peano axioms, we reach every natural number this way.
The concept of mathematical induction can be used for the definition of sequences of objects,
which is sometimes quite helpful. If, for instance, G(n) is a quantity that has to be defined for
all n ∈ N, then it is sufficient to define G(1) and, for all n ∈ N, G(n + 1) in terms of G(n).
One example are the formal definitions of the sum and product symbols.
Definition 1.46. Let ak ∈ R for k ∈ N. Then we define sum and product as follows:
∑_{k=1}^{1} ak := a1 ,   ∑_{k=1}^{n+1} ak := an+1 + ∑_{k=1}^{n} ak ,
∏_{k=1}^{1} ak := a1 ,   ∏_{k=1}^{n+1} ak := an+1 · ∏_{k=1}^{n} ak .
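These recursive definitions translate literally into recursive functions; a sketch (the 1-indexed sequence a_k is stored as a Python list, so a_k is `a[k-1]`):

```python
def rec_sum(a, n):
    """Sum a_1 + ... + a_n via the recursive definition."""
    if n == 1:
        return a[0]
    return a[n - 1] + rec_sum(a, n - 1)

def rec_prod(a, n):
    """Product a_1 * ... * a_n via the recursive definition."""
    if n == 1:
        return a[0]
    return a[n - 1] * rec_prod(a, n - 1)

a = [1, 2, 3, 4]
print(rec_sum(a, 4))    # 10
print(rec_prod(a, 4))   # 24
```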
Let us see how to prove by induction. As this is an often used and very structural proof, we
will use some short notation to highlight the corresponding parts. According to Theorem 1.45,
one needs to show that the statement is true for the first element, which we call the induction
basis, denoted by (IB). Then, by assuming the induction hypothesis (IH), that the assertion is
true for n, we prove that it is also true for n + 1, which we call the induction step (IS).
Let us discuss two examples to demonstrate this type of proof.
The first one is (at least without proof) known to many from school. The young Carl Friedrich
Gauß (1777–1855) knew it already by the age of nine.
Example 1.47. We prove the formula ∑_{k=1}^{n} k = n(n + 1)/2 (Gauss's summation
formula, "Gaußsche Summenformel"):
IB (n = 1): ∑_{k=1}^{1} k = 1 = 1(1 + 1)/2 is true.
IH: ∑_{k=1}^{n} k = n(n + 1)/2.
IS (n → n + 1):
∑_{k=1}^{n+1} k = ∑_{k=1}^{n} k + (n + 1) =(IH) n(n + 1)/2 + 2(n + 1)/2 = (n + 1)(n + 2)/2.
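The formula is easy to cross-check against the explicit sum for many values of n:

```python
def gauss(n):
    """Closed form n(n+1)/2 for the sum 1 + 2 + ... + n."""
    return n * (n + 1) // 2

# compare with the explicit sum for n = 1, ..., 199
print(all(sum(range(1, n + 1)) == gauss(n) for n in range(1, 200)))   # True
print(gauss(100))   # 5050, the sum the young Gauss is said to have computed
```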
Theorem 1.48 (Bernoulli's inequality). For all x ∈ R with x ≥ −1 and all n ∈ N we have
(1 + x)n ≥ 1 + nx.
Proof. Let x ∈ R such that x ≥ −1. We prove the statement for all n ∈ N by induction:
IB (n = 1): (1 + x) ≥ 1 + x is clearly true.
IH: (1 + x)n ≥ 1 + nx.
IS (n → n + 1): Since 1 + x ≥ 0, we get
(1 + x)n+1 = (1 + x)n (1 + x) ≥(IH) (1 + nx)(1 + x) = 1 + (n + 1)x + nx2 ≥ 1 + (n + 1)x.
This is what had to be shown.
Example 1.49. Try to prove Bernoulli's inequality with > instead of ≥. Can we use the
same assumption on x in this case?
Let us now turn to some combinatorial quantities. These quantities are heavily used when it
comes to discrete mathematics or elementary probability theory, as they represent the number
of permutations or subsets of certain size.
The factorial of n ∈ N is defined by n! := 1 · 2 · · · n. In addition, we set 0! := 1.
The binomial coefficient, written here as (n k) (we say “n choose k”), for n, k ∈ N0 with n ≥ k
is defined by
(n k) := (n · (n − 1) · (n − 2) · · · (n − k + 2) · (n − k + 1)) / k! = n! / (k! (n − k)!).
Clearly, we have (n 0) = 1, (n 1) = n and (n k) = (n n−k).
The factorial n! is the number of permutations (orderings) of a set of n elements: if you pick
your first element, you have n possibilities for your choice; for your second choice you have
only n − 1 options left, and so on.
The binomial coefficient (n k) is read as n choose k (or “n over k”; in German “n über k”). It
represents the number of ways to choose k unordered outcomes from a set of n possibilities, also
known as a combination. E.g., (n 2) is the number of two-element subsets of {1, . . . , n}. Clearly,
there are n(n − 1) ordered pairs (i, j) ∈ {1, . . . , n}2 with i ≠ j. As the ordering is irrelevant for
sets, we have n(n − 1)/2 two-element subsets, which coincides with (n 2).
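Both formulas for the binomial coefficient, and the subset-counting interpretation, can be checked with the standard library (`math.comb` and `itertools.combinations`):

```python
from itertools import combinations
from math import comb, factorial

n, k = 5, 2
print(comb(n, k))                                          # 10
print(factorial(n) // (factorial(k) * factorial(n - k)))   # 10, same value
# (n choose 2) counts the two-element subsets of {1, ..., n}:
print(len(list(combinations(range(1, n + 1), k))))         # 10
print(comb(n, k) == comb(n, n - k))                        # True, the symmetry
```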
Lemma 1.51 (Pascal's rule). For n, k ∈ N0 with n > k we have
(n k) + (n k+1) = (n+1 k+1).
Proof.
(n k) + (n k+1) = n(n − 1) · · · (n − k + 1)/k! + n(n − 1) · · · (n − k)/(k + 1)!
= n(n − 1) · · · (n − k + 1)/(k + 1)! · [(k + 1) + (n − k)]
= n(n − 1) · · · (n − k + 1)/(k + 1)! · (n + 1)
= (n + 1)n(n − 1) · · · (n − k + 1)/(k + 1)!
= (n+1 k+1).
The “simple” explanation of this equality is that subsets of {1, . . . , n + 1} with k + 1 elements
can be split into those that contain the number n + 1 and those that don't contain it. To find
the number of all sets that contain it, we need to count all k-element subsets of {1, . . . , n}, and
there are (n k) of them. The number of all sets that don't contain it is the same as the number
of all (k + 1)-element subsets of {1, . . . , n}, which is (n k+1). So the total number is their sum.
Based on this, we can prove one of the most famous theorems in mathematics.
Theorem 1.52 (Binomial theorem). For x, y ∈ R and n ∈ N we have
(x + y)n = ∑_{k=0}^{n} (n k) x^k y^{n−k} .
Proof. If we expand (x + y)n we get a sum of 2n expressions of the form z1 z2 · · · zn , where for all
i = 1, 2, . . . , n we either have zi = x or zi = y. Counting all summands where x occurs exactly k
times (and y therefore n − k times) gives us (n k). Hence there are exactly (n k) summands of the
form x^k y^{n−k} , which proves the theorem.
Example 1.53. As a good exercise try to prove the binomial theorem by induction using
Lemma 1.51.
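Both the binomial theorem and Pascal's rule (Lemma 1.51) can be spot-checked numerically:

```python
from math import comb

def binomial_expansion(x, y, n):
    """Sum of comb(n, k) * x^k * y^(n-k) for k = 0, ..., n."""
    return sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))

x, y, n = 3, 2, 7
print(binomial_expansion(x, y, n) == (x + y) ** n)   # True
# Pascal's rule: (n k) + (n k+1) = (n+1 k+1)
print(comb(10, 4) + comb(10, 5) == comb(11, 5))      # True
```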
Example 1.55. It is very important to know the following special cases of the binomial theorem.
For x, y ∈ R we have
(x + y)2 = x2 + 2xy + y 2
and
(x + y)3 = x3 + 3x2 y + 3xy 2 + y 3 .
(Verify these formulas!)
The binomial theorem can also be used to (easily) improve upon Bernoulli's inequality: for
x ≥ 0 and n, k ∈ N with n ≥ k we have
(1 + x)n ≥ 1 + (n k) x^k .
This is a direct consequence of the binomial theorem, see Theorem 1.52 with y = 1: one
just leaves out some of the non-negative terms. Note that this is Bernoulli's inequality for k = 1.
However, note that we need x ≥ 0 here. (It is false, e.g., for x = −1 and n ≥ k = 2.)
In this section we want to briefly discuss some essential types of functions, which need to be
introduced for the sake of completeness. We start with the absolute value of a number, since
this function and its generalizations are heavily used throughout this lecture and beyond.
The absolute value of x ∈ R is defined by
\[
|x| := \begin{cases} x, & \text{if } x \ge 0, \\ -x, & \text{if } x < 0. \end{cases}
\]
Figure: The graph of y = |x|.
Note that this function maps R onto R≥0 := [0, ∞), i.e., the domain and range of | · | are R and
R≥0 , respectively. As, e.g., | − 1| = |1|, | · | is not injective on R, and hence also not bijective.
(Try to prove | − x| = |x| formally!)
1. |x| ≥ 0
2. |x| = 0 ⇐⇒ x = 0
3. |x| ≥ x
4. |x · y| = |x| · |y|
5. |x| ≤ z ⇐⇒ −z ≤ x ≤ z
Proof. We prove the lemma by using a case distinction. In fact, this is a good illustration of how to work with absolute values in general.
We only prove the first statement; the others work similarly and are left as an exercise.
Case 1: x ≥ 0. Then we have |x| = x ≥ 0.
Case 2: x < 0. Then we have |x| = −x > 0.
Absolute values appear very regularly, and it is essential to know how to work with them.
Mostly, we are interested in the set of all numbers, say x, that satisfy a certain inequality.
So for a given inequality, say |x − 1| < 2, we want to find all x ∈ R that satisfy it. The set of all
these x is called the solution set of the inequality, and is often denoted by L.
Example 1.58. We look for the solution set L of the following inequality (in R):
2|x + 3| − 4|x − 1| ≥ 8x − 2.
We have to distinguish three regions: x ≤ −3, −3 < x ≤ 1 and 1 < x, since the expressions in the absolute values change their signs at these points.
Case 1 (x ≤ −3): 2|x+3| − 4|x−1| = 2(−x−3) − 4(−x+1) = −2x − 6 + 4x − 4 = 2x − 10 ≥ 8x − 2 ⇐⇒ x ≤ −4/3.
We therefore obtain: L₁ := {x ∈ R : x ≤ −3, x ≤ −4/3} = (−∞, −3].
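Completing Cases 2 and 3 in the same way yields L₂ = (−3, 1] and L₃ = (1, 6/5], so the full solution set is L = (−∞, 6/5]. A quick sanity check in Python (`satisfies` is our helper; sampling cannot replace the case distinction, but it catches algebra slips — exact rationals avoid rounding issues at the boundary):

```python
from fractions import Fraction

def satisfies(x):
    """Check whether x satisfies 2|x+3| - 4|x-1| >= 8x - 2."""
    return 2 * abs(x + 3) - 4 * abs(x - 1) >= 8 * x - 2

# Samples from L1 = (-inf, -3] must satisfy the inequality:
assert all(satisfies(x) for x in [-100, -10, Fraction(-7, 2), -3])
# The boundary 6/5 of the full solution set is included, points beyond are not:
assert satisfies(Fraction(6, 5)) and not satisfies(Fraction(13, 10))
```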
The following inequality, which is probably the most well known one, is at the same time the
most important (also in more general scenarios). You will not use any other inequality more
often than this one.
|x + y| ≤ |x| + |y|.
|a + b| − |b| ≤ |a|.
For a = x − y and b = −x we get
|y| − |x| ≤ |x − y|,
and by exchanging the roles of x and y also |x| − |y| ≤ |x − y|. The result follows from ||x| − |y|| = max{|x| − |y|, |y| − |x|}.
We now turn to some other elementary functions. Note that it is not necessary to understand
all of these definitions completely yet. We just state them here as a reference for later.
a) Very simple functions are affine linear functions, which are defined as
f: R→R
x 7→ ax + b
with a, b ∈ R. If a = 0, f is called a constant function. If a ≠ 0 and b = 0, we call f a linear function. Note that linear functions satisfy f (x + y) = f (x) + f (y).
(Is the same true for affine linear functions?)
Moreover, they are special cases of polynomial functions, which are defined by
\[
p\colon \mathbb{R} \to \mathbb{R}, \qquad x \mapsto \sum_{i=0}^{n} a_i x^i,
\]
with a₀, . . . , aₙ ∈ R.
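Such polynomial functions can be evaluated efficiently with Horner's scheme. A small Python sketch (`poly_eval` is a name we introduce here):

```python
def poly_eval(coeffs, x):
    """Evaluate p(x) = a_0 + a_1*x + ... + a_n*x**n by Horner's scheme;
    coeffs = [a_0, a_1, ..., a_n]."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# p(x) = 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12 = 17
assert poly_eval([1, 2, 3], 2) == 17
# An affine linear function f(x) = a*x + b is the special case coeffs = [b, a]:
a, b = 5, 7
assert poly_eval([b, a], 3) == a * 3 + b
```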
b) Power functions are defined as
f : R+ → R, x ↦ x^a,
for some fixed a ∈ R and R+ := [0, ∞), see Figure 11. (In short: f(x) = x^a for x > 0.)
Figure 11: The graph of f(x) = x^a for a = 8, 2, 1, 1/2, 1/8 (left) and a = −1/8, −1/2, −1, −2, −8 (right).
Let us say some words about how these expressions are precisely defined. Since we know how to multiply two or more numbers, we may clearly define the power functions with integer exponents, i.e., x^n with n ∈ Z, even for all x ∈ R \ {0}, as in the polynomials above. (We only need to exclude x = 0 for negative exponents, because we cannot divide by 0.) Moreover, since we know how a root of a positive number is defined, we can also define, e.g., √2 = 2^{1/2} or 3^{1/8}. (3^{1/8} is just the number z > 0 with z^8 = 3.) With this, we can define x^{n/m} = (x^n)^{1/m} for every x > 0, n ∈ Z and m ∈ N. Hence, we know how to define x^q for every x > 0 and q ∈ Q.
But what about 4^{√2} or π^π? (Think a bit how you would calculate/define these numbers!)
In this case, i.e., if we consider x^a for some a ∈ R \ Q, the most natural way is to define them by using supremum or infimum. Taking the monotonicity into account (see Figure 11), we define for x > 0 the powers
\[
x^a := \begin{cases} \sup\{x^q : q \in \mathbb{Q},\ q < a\}, & \text{if } x \ge 1, \\ \inf\{x^q : q \in \mathbb{Q},\ q > a\}, & \text{if } 0 < x < 1. \end{cases} \tag{1.1}
\]
Note that we know how to compute the expressions inside sup/inf, and that sup/inf always exist as the sets are bounded, due to the Completeness axiom.
With this, we get the well-known calculation rules for arbitrary x, y > 0 and a, b ∈ R:
\[
x^{-a} = \frac{1}{x^a}, \qquad x^a \cdot x^b = x^{a+b}, \qquad \frac{x^a}{x^b} = x^{a-b},
\]
\[
(x^a)^b = x^{a\cdot b}, \qquad (x \cdot y)^a = x^a \cdot y^a, \qquad \left(\frac{x}{y}\right)^a = \frac{x^a}{y^a}.
\]
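Definition (1.1) is constructive: for x ≥ 1 we can approximate x^a from below by rational exponents q < a. A Python sketch of this idea (the helper name `power_via_sup` is ours; floating-point numbers stand in for exact rationals), followed by a numerical check of two of the calculation rules:

```python
from fractions import Fraction

def power_via_sup(x, a, digits=20):
    """Approximate x**a for x >= 1 along definition (1.1):
    x**a = sup{x**q : q rational, q < a}, using decimal truncations q <= a."""
    assert x >= 1
    best = 0.0
    for k in range(1, digits + 1):
        q = Fraction(int(a * 10**k), 10**k)   # rational lower bound of a
        best = max(best, x ** float(q))
    return best

sqrt2 = 2 ** 0.5
assert abs(power_via_sup(3.0, sqrt2) - 3.0 ** sqrt2) < 1e-9
# Two of the calculation rules, checked numerically:
x, a, b = 2.5, 1.3, -0.7
assert abs(x**a * x**b - x**(a + b)) < 1e-12
assert abs((x**a)**b - x**(a * b)) < 1e-12
```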
However, note that there is –and there will be– no natural and satisfying way to define
arbitrary powers of a negative number, like (−2)π . We do not go into detail here.
c) Other well-known functions are the exponential functions, which characterise rapid
growth or decay. Exponential functions are defined by
f : R → R+ , x 7→ ax ,
38
where the base a > 0 is a fixed parameter. That is, in contrast to power functions, the variable x ∈ R is the exponent.
Figure 12: The graph of f(x) = a^x for a > 1 and a < 1.
Note that ax is defined for all x ∈ R. For a precise definition, we use again supremum and
infimum, see (1.1). That is, for a ≥ 1 we set ax := sup{aq : q ∈ Q, q < x}. Together with
ax = ( a1 )−x for 0 < a < 1, this defines ax for all a > 0 and x ∈ R.
Among all exponential functions, one is particularly important. This is when we choose the base a = e ≈ 2.7182818284, where e is Euler's number. This special (irrational) number e may be defined by different means:
\[
e = \sup\left\{\left(1 + \frac{1}{n}\right)^n : n \in \mathbb{N}\right\} = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{\infty} \frac{1}{k!}.
\]
(Do not panic yet! We discuss the meaning of these expressions later.)
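Both characterisations of e can already be explored numerically. A small Python sketch comparing the (slow) limit with the (fast) series against `math.e`:

```python
from math import e, factorial

# e as the limit of (1 + 1/n)^n: the sequence increases towards e, but slowly.
seq = [(1 + 1 / n) ** n for n in (1, 10, 100, 10_000, 1_000_000)]
assert all(s < t for s, t in zip(seq, seq[1:]))   # monotonically increasing
assert abs(seq[-1] - e) < 1e-5                    # only ~6 digits at n = 10^6

# e as the series over 1/k!: 15 terms already give ~12 correct digits.
series = sum(1 / factorial(k) for k in range(15))
assert abs(series - e) < 1e-11
```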
d) The exponential functions often appear together with the logarithmic functions, which are their inverses:
f : R+ → R, x ↦ log_b(x).
Read: logarithm of x to base b. Here, we usually assume that b > 1, but in principle one may also consider b ∈ (0, 1).
Figure 13: The graph of the function f(x) = log_b(x) for b = 2, e and 10.
Figure 14 shows the ’geometric definition’ of these functions in the unit circle.
The variable x ∈ R corresponds to the angle that is enclosed between the horizontal axis and the line to the point (cos(x), sin(x)). It can therefore be given in degrees (deg) in the range from 0° to 360°. However, it is more convenient in science to interpret x as the arc
length of the part of the unit circle that is contained between the lines. The appropriate
unit is called radians (rad). Since the length of the unit circle is given by 2π, radians can be obtained from the degrees by
x = (deg/180) · π.
The functions sin x and cos x are defined for all x ∈ R, see Figure 15.
Figure 15: The graphs of sin x and cos x.
Using some elementary geometry on the above illustration we obtain the calculation rules:
 ϕ      |  0     π/6    π/4    π/3    π/2    2π/3    3π/4    5π/6    π
 sin(ϕ) |  √0/2  √1/2   √2/2   √3/2   √4/2   √3/2    √2/2    √1/2    √0/2
 cos(ϕ) |  √4/2  √3/2   √2/2   √1/2   √0/2   −√1/2   −√2/2   −√3/2   −√4/2
The corresponding values in (π, 2π) can be obtained from sin(x + π) = − sin(x) and
cos(x + π) = − cos(x).
Moreover, we have the very important trigonometric identity
sin2 x + cos2 x = 1,
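Both the table values and this identity can be verified numerically. A Python sketch using the standard `math` module:

```python
from math import sin, cos, sqrt, pi, isclose

# The table values follow the pattern sin(phi) = sqrt(k)/2, cos(phi) = sqrt(4-k)/2
# for k = 0, 1, 2, 3, 4 at phi = 0, pi/6, pi/4, pi/3, pi/2:
for k, phi in enumerate([0, pi / 6, pi / 4, pi / 3, pi / 2]):
    assert isclose(sin(phi), sqrt(k) / 2, abs_tol=1e-12)
    assert isclose(cos(phi), sqrt(4 - k) / 2, abs_tol=1e-12)

# The trigonometric identity sin^2 x + cos^2 x = 1 at arbitrary points:
for x in (-3.7, 0.1, 2.0, 123.456):
    assert isclose(sin(x) ** 2 + cos(x) ** 2, 1.0)
```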
(It is not necessary to memorize these formulas, but it is important to know where to find them.)
Figure 16: The graphs of tan x and cot x.
Additionally, we can define the inverse trigonometric functions. For this, first note that sin and tan are increasing on [−π/2, π/2] and (−π/2, π/2), respectively, and cos is decreasing on [0, π], see Figures 15 and 16. In particular, all three functions are bijective on these intervals (i.e., every value of sin(x), cos(x) and tan(x) is achieved by exactly one x in the interval), and therefore have inverse functions, see Theorem 1.16.
Due to their importance, the inverse trigonometric functions have their own names. Namely, the arcsine arcsin := sin⁻¹ : [−1, 1] → [−π/2, π/2], the arccosine arccos := cos⁻¹ : [−1, 1] → [0, π], and the arctangent arctan := tan⁻¹ : R → (−π/2, π/2). These functions are defined by
y = arcsin x :⇐⇒ x = sin y and y ∈ [−π/2, π/2],
y = arccos x :⇐⇒ x = cos y and y ∈ [0, π],
y = arctan x :⇐⇒ x = tan y and y ∈ (−π/2, π/2).
Figure 17: The graphs of arcsin x, arccos x and arctan x.
At a certain point in mathematics we are not able to continue with real numbers anymore. This
point is reached, once we want to find a solution of the equation x2 + 1 = 0. If we introduce a
new element
i := √−1,
then the equation has the two solutions ±i. This extension leads us to the (field of) complex
numbers.
C := {z = x + iy : x, y ∈ R}.
For a complex number z = x + iy we call x the real part of z and y the imaginary part.
We write Re z = x and Im z = y.
For a complex number z = x + iy, its complex conjugate is defined by
z̄ := x − iy.
Moreover, for z = x + iy and w = u + iv we have:
• z + w = (x + u) + i(y + v)
Example 1.64. Recall that i is just a symbol for √−1. For example, we have
√−4 = √((−1) · 4) = √4 · √−1 = 2i.
Formally, one can identify the complex numbers C with a tuple (x, y) of real numbers. Each
complex number z = x + iy can, therefore, be illustrated as a point in the plane, which is called
the complex plane. The coordinate axes are called real and imaginary axis.
(Exercise: Think about where to draw the complex conjugate z in the complex plane.)
We define the absolute value of a complex number z = x + iy to be the length of the straight
line from (0, 0) to (x, y).
Several calculation rules for the absolute value are just the same as for the absolute value on R.
(That’s why we use the same name.)
1. |z| ≥ 0
2. |z| = 0 ⇔ z = 0
3. |z| ≥ |Re z|
4. |z| ≥ |Im z|
5. |zw| = |z||w|
|z + w| ≤ |z| + |w|,
with equality if and only if zw̄ ≥ 0. (Note that zw̄ ≥ 0 means, in particular, that zw̄ ∈ R.)
Moreover, we have
|z| − |w| ≤ |z − w|.
|x + y| − |y| ≤ |x|.
|z| − |w| ≤ |z + w|
Let us give a summary about the representation and visualization of complex numbers in the
complex plane:
Instead of using the cartesian coordinates (x, y) for a complex number z = x + iy, one can also
switch to the so called polar coordinates (r, ϕ). The following relations hold between them
(see Figures 15 and 18):
r := |z| := √(x² + y²) = √((Re z)² + (Im z)²) = √(z z̄),
x := |z| cos ϕ,
y := |z| sin ϕ.
Here, we use that every point on the circle {(x, y) : x² + y² = 1} can be written as (cos ϕ, sin ϕ) for some ϕ ∈ R. Actually, one can find such a ϕ ∈ [0, 2π), and this must satisfy tan ϕ = y/x (if x ≠ 0).
Therefore, we can write z in the trigonometric or polar form:
We call r = |z| the radius, and ϕ the argument of the complex number z.
The following question arises: under which condition is the representation (r, ϕ) unique?
Answer: it is unique if |z| ≠ 0 and we assume ϕ ∈ [0, 2π) (or any other interval of length 2π).
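In Python, the standard `cmath` module performs exactly this conversion between cartesian and polar coordinates (a small sketch):

```python
import cmath
from math import cos, sin, isclose

z = 3 + 4j
r, phi = cmath.polar(z)      # r = |z|, phi = the argument (here in (-pi, pi])
assert isclose(r, 5.0)       # sqrt(3**2 + 4**2)
assert isclose(z.real, r * cos(phi)) and isclose(z.imag, r * sin(phi))
# cmath.rect converts back from polar to cartesian coordinates:
assert isclose(abs(cmath.rect(r, phi) - z), 0.0, abs_tol=1e-12)
```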
An easy and useful way of writing complex numbers can be obtained by using Euler's formula
e^{iϕ} = cos(ϕ) + i sin(ϕ).
Note that e^{iϕ} can at the moment only be understood as a symbol for the right-hand side above, and it is already useful as such. However, we will see later that this is really an equation, i.e., we will also define e^z for z ∈ C and show the above.
To work with this formula, it is essential to mind the following important values:
 ϕ      |  0     π/6    π/4    π/3    π/2    2π/3    3π/4    5π/6    π
 sin(ϕ) |  √0/2  √1/2   √2/2   √3/2   √4/2   √3/2    √2/2    √1/2    √0/2
 cos(ϕ) |  √4/2  √3/2   √2/2   √1/2   √0/2   −√1/2   −√2/2   −√3/2   −√4/2
Using these values of the trigonometric functions at the points π/2, π, 3π/2, 2π we obtain
e^{iπ/2} = i,   e^{iπ} = −1,   e^{i3π/2} = −i,   e^{i2π} = e^{i0} = 1.
Additionally, we obtain from these values, together with the periodicity of the trigonometric
functions (or standard calculation rules for exponentials), that for k ∈ Z,
\[
e^{ik\pi} = \begin{cases} 1, & \text{if } k \text{ is even}, \\ -1, & \text{if } k \text{ is odd}. \end{cases}
\]
(Verify that!)
Using Euler's formula we can write every complex number (in its polar form) as
z = re^{iϕ}.
This representation is particularly useful when it comes to multiplication and powers of complex numbers. Given two complex numbers
z = re^{iϕ} and w = se^{iψ},
we obtain
zw = re^{iϕ} · se^{iψ} = rs · e^{i(ϕ+ψ)}.
From this formula (with z = w) we obtain by induction de Moivre's formula for powers z^n of z = r(cos ϕ + i sin ϕ) = re^{iϕ} with n ∈ N:
z^n = r^n (cos(nϕ) + i sin(nϕ)) = r^n e^{inϕ}.
With Euler's formula one may also write this in a more compact way:
(1 + i)^42 = (√2 e^{πi/4})^42 = 2^{21} e^{42πi/4} = 2^{21} e^{10πi} e^{πi/2} = 2^{21} i.
Caution: The above consideration only holds for integer exponents. (Why also for negative
integers?) More general exponents need more care and will not be discussed here.
Moreover, the polar form is also very useful for theoretical purposes.
Example 1.69. Show that |z + w| = |z| + |w| for z, w ∈ C if and only if z and w have the same
argument. (Hint: Consider Lemma 1.67 together with the polar form of z and w.)
It is an important result in mathematics, that complex numbers are really all we need to solve
polynomial equations. In fact, the Fundamental theorem of algebra (in one of its variants) even
gives the precise answer for the number of solutions of polynomial equations. A proof of this
result is beyond the scope of this course.
Note that it is a hard problem in general to find the zeros of a given polynomial of high degree.
For such tasks, one usually employs numerical software which, in most cases, can only output
approximations of the zeros.
with d being called the dimension. (C^d denotes the d-fold Cartesian product of C.)
Note that d-dimensional vectors v ∈ C^d consist of the d components v₁, . . . , v_d ∈ C.
We define the addition and scalar multiplication of vectors component-wise. That is, for two vectors u = (u_i)_{i=1}^d, v = (v_i)_{i=1}^d ∈ C^d and a number λ ∈ C, we define
u + v := (u₁ + v₁, u₂ + v₂, . . . , u_d + v_d) and λ · v := (λv₁, λv₂, . . . , λv_d).
Note that it is important for vector addition that the vectors have the same dimension. If the
dimensions of the two vectors do not agree, then their sum is not defined.
(The term “scalar” is used to distinguish this multiplication from the others discussed below.)
Remark 1.72. Here we use the field of complex numbers C to ’build’ complex vectors v ∈ Cd .
Note that we can easily also consider only real vectors v ∈ Rd , as real vectors are special
complex vectors. However, since all definitions here work directly in the complex case, and this
will be needed later, we define it in the more general context and comment on necessary changes
when needed. Moreover, note that we could define vectors, and the corresponding operations,
in a much more general context, as long as the operations for the components are well-defined.
This will be discussed much later.
In analogy to the real and complex numbers above, we want to define (and use) a quantity that
allows for measuring ’how large’ a vector is.
Let us begin with the most important choice for such a quantity:
Moreover, for u = (u_i)_{i=1}^d, v = (v_i)_{i=1}^d ∈ C^d we define the inner product (or dot product) of u and v by
\[
\langle u, v \rangle := \sum_{i=1}^{d} u_i \overline{v_i}.
\]
Remark 1.74. We use the subscript ’2’ here, because we will study also other norms later.
Note that some authors use the notation |·| also for the Euclidean norm, which shows its role as
generalization of the absolute value.
From now on, we use x, y, z also as symbols for vectors for notational convenience.
The inner product is (formally) a mapping ⟨·, ·⟩ : C^d × C^d → C, and we have the following properties for all x, y, z ∈ C^d and λ ∈ C:
and
⟨x, λy + µz⟩ = λ̄⟨x, y⟩ + µ̄⟨x, z⟩
for all x, y, z ∈ C^d and λ, µ ∈ C. (Verify this!) Note that we need to take the complex conjugate when we take a scalar out of the 'second input'. One says, the inner product is sesquilinear.
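A minimal Python sketch of these definitions (the helper names `inner` and `norm2` are ours), checking sesquilinearity on a small example with exact integer components:

```python
def inner(u, v):
    """<u, v> = sum of u_i * conj(v_i), as in the definition above."""
    assert len(u) == len(v)
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm2(v):
    """Euclidean norm ||v||_2 = sqrt(<v, v>)."""
    return inner(v, v).real ** 0.5

u = [1 + 2j, 3 - 1j]
v = [2 + 0j, 1 + 1j]
lam = 2 - 3j
# Linear in the first argument, conjugate-linear in the second (sesquilinear):
assert inner([lam * a for a in u], v) == lam * inner(u, v)
assert inner(u, [lam * b for b in v]) == lam.conjugate() * inner(u, v)
# <v, v> is real and non-negative, and its square root is the norm:
assert norm2([3, 4]) == 5.0
```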
The first means that the norm of v is zero if and only if all components of v are zero (which is
called definiteness), and the second property is called homogeneity. Both properties were also
clear for the absolute value of a real/complex number, which is also covered by the above in the
case d = 1. However, there is a third property that is essential for the upcoming considerations.
Namely, the triangle inequality, see also Theorem 1.59 and Lemma 1.66.
Moreover, the equality kx + yk2 = kxk2 + kyk2 holds if and only if y = α · x for some α ≥ 0.
Note that in the case that d = 2 or d = 3 the Euclidean norm coincides with the usual intuition
of the ’length’ between 0 and the point x, i.e., k · k2 is the length of the ’direct way’ between 0
and x. This makes the triangle inequality appear to be an obvious statement. However, when
it comes to d > 3 this might not be that clear, and therefore we present a proof below.
For this, we first need another inequality, which is also of independent interest. The Cauchy-
Schwarz (CS) inequality gives a relation between the inner product and the Euclidean norm.
Moreover, we have the equality |hx, yi| = kxk2 kyk2 if and only if y = c · x for some c ∈ C.
Remark 1.77. The choice of the Euclidean norm for the following analysis is quite arbitrary,
especially when it comes to non-geometrical applications, and there are sometimes other natural
choices. We will discuss some other possibilities soon. However, although the triangle inequality
is an essential property also for ’other norms’, the (sometimes very helpful) Cauchy-Schwarz
inequality is special to the Euclidean norm. Therefore, we usually work with this norm.
Proof. Let us prove the second statement. The first follows by squaring both sides.
First, if either x = 0 or y = 0, then the statement is clearly true. Otherwise, we use the inequality ab ≤ (a² + b²)/2, which holds for any real numbers a, b, and follows from (a − b)² ≥ 0. (Verify this!)
We define
a_i := |x_i| / ‖x‖₂ and b_i := |y_i| / ‖y‖₂.
Observe that, by definition,
Σ_{i=1}^d a_i² = Σ_{i=1}^d b_i² = 1,
(Note that the first inequality was just the triangle inequality for sums of numbers.)
To obtain equality, both inequalities used above need to be equalities. Equality in the first is equivalent to x_i ȳ_i having the same argument for all i, see Example 1.69. Equality in the second is equivalent to a_i = b_i for all i (Check that!), which means |x_i| = r|y_i| for all i and some r > 0. Since we fixed argument and absolute value, we obtain equality if and only if x = c · y for some c ∈ C.
The CS inequality can now be used to prove the triangle inequality for k·k2 .
To obtain equality, note that Lemma 1.76 shows that we need y = c · x for some c ∈ C to obtain
equality in the second inequality. Using this, we need Re (c) = |c| to achieve equality in the first
inequality above. This holds iff c ≥ 0.
We finally introduce some particular important vectors which will appear very often in the
upcoming considerations. These are the unit vectors e1 , . . . , ed ∈ Rd , where ek is the vector
which is zero except for the k-th entry, which is one. That is,
e₁ = (1, 0, . . . , 0)ᵀ, e₂ = (0, 1, . . . , 0)ᵀ, . . . , e_d = (0, 0, . . . , 1)ᵀ.
Note that they all have norm 1, independent of which norm above one chooses.
One important property of the unit vectors is that they can be used to represent arbitrary vectors. For this, note that λ · e_k with λ ∈ R is the vector with λ in the k-th entry, and zero elsewhere. It is therefore easy to see that
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} = \sum_{k=1}^{d} x_k \cdot e_k \in \mathbb{R}^d.
\]
One might guess that working with a representation as given on the right can be quite useful.
Remark 1.78. Although we presented such a representation only with respect to the unit
vectors e1 , . . . , ed , we will see later that this works also with other vectors. A set of vectors that
can be used to represent arbitrary vectors in a unique way as above is called a basis of Rd . In
this context, the set {e1 , . . . , ed } is called the standard basis of Rd , and the numbers xk are
called coordinates of x with respect to {e1 , . . . , ed }. It is important to note again that the
xk ∈ R are just numbers, and only the ek ∈ Rd are vectors (i.e., elements from Rd ). We will
discuss this (and the related concept of a vector space) later in more detail.
a11 · x1 + a12 · x2 = b1 ,
a21 · x1 + a22 · x2 = b2
are fulfilled for some given a₁₁, a₁₂, a₂₁, a₂₂, b₁, b₂ ∈ R. (Note that we use subscripts a_{kℓ} in our notation to keep track of 'where' a coefficient appears in the system of equations.)
As before, we see that the first equation is equivalent to
x₁ = (b₁ − a₁₂ · x₂) / a₁₁, if a₁₁ ≠ 0.
This can be put into the second equation, which then only depends on the unknown x2 , and
can therefore (potentially) be solved as in the one dimensional case. In this way, we may find
a unique solution for x2 which then implies a solution for x1 by the equation above. By this
procedure, we can find solutions to linear equations, also for larger systems.
However, it is not clear if there really is a (unique) solution for x1 and x2 for every choice of
a11 , a12 , a21 , a22 , b1 , b2 ∈ R. There might also be no or infinitely many solutions (as for a single
equation with a = 0).
For demonstration, let us discuss a specific example:
Consider the system of linear equations
2x1 + x2 = 1,
6x1 + 3x2 = 2.
The first equation is equivalent to x2 = 1 − 2x1 . If we put this into the second equation, we
obtain 6x1 + 3 − 6x1 = 2 which is never fulfilled. That is, there is no solution to this system of
equations. However, if we change the system to
2x1 + x2 = 1,
6x1 + 3x2 = 3,
then the first equation is still equivalent to x2 = 1 − 2x1 . But, after putting this into the second
equation, we have 6x1 + 3 − 6x1 = 3 which holds for every x1 ∈ R. Therefore, the above system
is solved by all (x1 , x2 ) ∈ R2 with 2x1 + x2 = 1, e.g., for (x1 , x2 ) = (0, 1) or (x1 , x2 ) = (1, −1).
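The substitution argument for both systems can be replayed in a few lines of Python (`second_eq_residual` is a name we introduce; it measures by how much the second equation is violated):

```python
def second_eq_residual(x1, rhs):
    """Residual of 6*x1 + 3*x2 = rhs after substituting x2 = 1 - 2*x1
    from the first equation."""
    x2 = 1 - 2 * x1
    return 6 * x1 + 3 * x2 - rhs

# rhs = 2: the residual is 1 for every x1, so there is no solution.
assert all(second_eq_residual(x1, 2) == 1 for x1 in (-5, 0, 3, 17))
# rhs = 3: the residual vanishes for every x1, so there are infinitely many.
assert all(second_eq_residual(x1, 3) == 0 for x1 in (-5, 0, 3, 17))
```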
This shows that such systems of equations might be quite sensitive to small changes of the
parameters (and this was just a two dimensional example). It is therefore desirable to have
criteria for a given system of equations to be (uniquely) solvable that can be checked more
easily and before we start trying to calculate a solution.
Moreover, this procedure is useful only for ’small’ systems of equations, say with at most 3
unknowns. It is rather impractical for larger systems, and there are faster methods to solve
such systems by hand (and with a computer). This is particularly important since modern
applications are usually ’high dimensional’.
The most convenient way to formally work with linear systems is to introduce matrices, which
is just a way of writing numbers in an array similarly to vectors. For example, we will write
the above system of equations by Ax = b, where the matrix A ∈ R^{2×2} (meaning that it is a 2 × 2 array of real numbers) and the vector b ∈ R² are given by
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \qquad \text{and} \qquad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.
\]
We discuss shortly how the operation Ax, i.e., matrix-vector multiplication, is precisely defined,
and that a system of equations may then be written by Ax = b. This now looks almost like the
one dimensional equation, and one might like to just “divide by A if it is not zero” to obtain
a solution. Unfortunately, that’s not (always) so easy, and we have to be careful with what it
means that "A ≠ 0" or to divide by a matrix.
However, we will see that matrices are the ’right’ tool to work with such problems, and we
will introduce operations and calculation rules for matrices, which will enable us to transform
systems of equations to others which might be easier to solve. By this, we introduce techniques
to solve also large systems of equations in a straight-forward way.
2.1 Matrices
In this case we use the notation A ∈ Rm×n , and call m and n the dimensions of A.
An m × 1 matrix is called a column vector, a 1 × n matrix is called a row vector, and if m = n, then the matrix is called quadratic, or a square matrix.
Remark 2.2. We mostly consider only matrices of real numbers here. The case of complex
matrices, i.e., aij ∈ C, can be treated analogously, and we write A ∈ Cm×n . We will comment
on differences of the complex case if needed.
Remark 2.3. The index notation (a_{ij})_{i,j=1}^{m,n} just means that we consider i = 1, . . . , m and j = 1, . . . , n. Some authors use (a_{ij})_{i=1,j=1}^{m,n} or even (a_{ij})_{i=1,...,m, j=1,...,n} for the same.
We now turn to basic operations of matrices. The first two, namely scalar multiplication and
matrix addition, are very easy (and familiar from the corresponding operations for vectors).
These operations are component-wise operations, meaning that they are performed in each
entry of the matrices individually.
The matrix addition of two m × n-matrices A = (a_{ij})_{i,j=1}^{m,n} and B = (b_{ij})_{i,j=1}^{m,n} is defined by component-wise addition, i.e.,
\[
A + B = (a_{ij} + b_{ij})_{i,j=1}^{m,n} = \begin{pmatrix} a_{11}+b_{11} & \cdots & a_{1n}+b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & \cdots & a_{mn}+b_{mn} \end{pmatrix}.
\]
Note that it is important here that the matrices have the same dimension. If the dimensions of
the two matrices do not agree, then their sum is not defined.
Similarly, the scalar multiplication of λ ∈ R and A is defined component-wise, i.e.,
\[
\lambda \cdot A = (\lambda a_{ij})_{i,j=1}^{m,n} = \begin{pmatrix} \lambda a_{11} & \cdots & \lambda a_{1n} \\ \vdots & \ddots & \vdots \\ \lambda a_{m1} & \cdots & \lambda a_{mn} \end{pmatrix}.
\]
(The term “scalar” is used to distinguish this from the matrix product discussed below.)
The third basic operation is the matrix product, i.e., the product of two matrices. In contrast
to the last operations, the product of two matrices is not defined component-wise:
Given an m × n-matrix A = (a_{ij})_{i,j=1}^{m,n} and an n × p-matrix B = (b_{ij})_{i,j=1}^{n,p}, we define the m × p-matrix C = (c_{ij})_{i,j=1}^{m,p} with
\[
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj},
\]
as the product of A and B, i.e., C = AB. This definition shows that the ij-th entry of the
product C, i.e., cij , depends only on the i-th row of A and the j-th column of B. One may mind
this rule by “cij is the product of the i-th row of A with the j-th column of B”.
To make this more precise, note that a matrix A ∈ R^{m×n} has m rows of length n, or n columns of length m. So, let us assume that a matrix A has the rows a₁, a₂, . . . , a_m ∈ R^{1×n} and columns c₁, c₂, . . . , c_n ∈ R^{m×1} = R^m. We use the notation
\[
A = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix} = \begin{pmatrix} c_1, c_2, \ldots, c_n \end{pmatrix}. \tag{2.1}
\]
Note that ak and ck are not numbers, but vectors. However, the notation is consistent in the
sense that a matrix can be seen as a row vector consisting of column vectors, and vice versa.
Remark 2.4. It does not matter if we put commas in A = c1 , c2 , . . . , cn or not. Both notations
are used, and there is actually no room for misunderstanding once we specified clearly what the
entries “ck ” are.
That is, the ij-th entry of AB is the inner product of the i-th row of A with the j-th column of
B, as stated above.
(Note that all inner dimensions for the involved matrix(-vector)-products agree.)
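The rule "c_ij is the product of the i-th row of A with the j-th column of B" translates directly into code. A plain-Python sketch (`matmul` is our name; matrices are given as lists of rows, and the values of A and B below are our own example, not the one from the text):

```python
def matmul(A, B):
    """C = AB with c_ij = sum_k a_ik * b_kj; matrices as lists of rows.
    A must be m x n and B must be n x p (inner dimensions agree)."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [6, 5, 4]]       # 2 x 3
B = [[1, 6],
     [2, 5],
     [3, 4]]          # 3 x 2
assert matmul(A, B) == [[14, 28], [28, 77]]                        # 2 x 2
# BA is defined as well, but it is 3 x 3 -- so AB != BA in general:
assert matmul(B, A) == [[37, 32, 27], [32, 29, 26], [27, 26, 25]]
```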
Note that in this case, also the matrix BA is defined. However, the product BA is a 2 × 2-matrix. Namely,
\[
BA = \begin{pmatrix} 28 & 91 \\ 8 & 48 \end{pmatrix},
\]
and there is no obvious relation between AB and BA.
Note that the matrix product is only defined if the inner dimensions agree. That is, if we want to multiply an m × p-matrix A ∈ R^{m×p} and a q × n-matrix B ∈ R^{q×n}, then we need that p = q. Otherwise, the product is not defined. Note that this implies that we might define the product AB of two matrices A and B, while the "reverse" product BA is not defined. Consider, e.g., the case A ∈ R^{m×n} and B ∈ R^{n×n} with m ≠ n.
The following rules for calculation, which may remind one of the respective rules for numbers, follow easily from the definition:
However, even if A, B ∈ R^{n×n}, i.e., A and B are quadratic so that both AB and BA are defined, we do not have in general that AB = BA. That is, matrix multiplication is not commutative.
Moreover, there are identity elements for these operations, i.e., there are matrices such that addition/multiplication with them does not change the other matrix. This corresponds to 0 and 1 for real and complex numbers.
For this, let us introduce the following special matrices.
For m, n ∈ N, we define the zero matrix 0_{mn} ∈ R^{m×n} as the matrix whose entries are all zero, and the identity matrix I_n := (δ_{ij})_{i,j=1}^n ∈ R^{n×n}, where δ_{ij} := 1 if i = j and δ_{ij} := 0 otherwise (the Kronecker delta).
Note that the identity is a square matrix, and we may write I := In if the dimension is clear.
Let us show formally that the identity matrix is an identity for matrix multiplication, see the
field axioms (Axiom 1). However, note that we need a different dimension of the identities if
the ’other matrix’ is not quadratic.
Example 2.6. Let In ∈ Rn×n and Im ∈ Rm×m be the identity matrices of the given dimensions,
and let A ∈ Rm×n be an arbitrary m × n-matrix. Let us check that
Im · A = A · In = A.
For this, we compute the ij-th entry of I_m · A, which we call (I_m A)_{ij}. By definition,
\[
(I_m A)_{ij} = \sum_{k=1}^{m} \delta_{ik} a_{kj},
\]
where the Kronecker delta δ_{ik} is zero if k ≠ i. Thus the sum reduces to only one term, which is δ_{ii} a_{ij} = a_{ij}. This yields that the ij-th entry of the matrix product I_m · A is a_{ij}, i.e., I_m · A = A.
A similar calculation yields that A · In = A.
Note that for a quadratic matrix A ∈ Rn×n , we have
In · A = A · In = A.
Recall that the unit vectors e_k = (δ_{ik})_{i=1}^n ∈ R^n, k = 1, . . . , n, are the (column) vectors that contain exactly one 1 and all other entries are zero. Let us also define the row unit vectors e_kᵀ ∈ R^{1×n}, i.e., e₁ᵀ := (1, 0, . . . , 0), e₂ᵀ := (0, 1, . . . , 0) and so on. With them, we can write the identity as
\[
I_n = \begin{pmatrix} e_1, e_2, \ldots, e_n \end{pmatrix} = \begin{pmatrix} e_1^T \\ e_2^T \\ \vdots \\ e_n^T \end{pmatrix},
\]
i.e., e_k is the k-th column, and e_kᵀ is the k-th row of I_n.
With the above considerations and In · A = A · In = A, we see that the unit vectors can be used
to “extract” the rows and columns from a matrix. That is, given a matrix A ∈ Rn×n of the
form (2.1), we obtain that
i.e., Ae_k gives the k-th column, and e_kᵀA gives the k-th row of A. (Verify this!) The same can be done for rectangular matrices A ∈ R^{m×n}, but one needs to consider unit vectors of different lengths.
The last concept related to matrix multiplication that we will need is the inverse of a matrix.
That is, for a given matrix A ∈ Rn×n , the inverse matrix, if it exists, is a matrix A−1 ∈ Rn×n
such that
AA−1 = A−1 A = In .
If an inverse exists, then we call a matrix invertible or regular, see Section 2.6.
Some matrices are clearly invertible, like the identity with I_n⁻¹ = I_n. Others are clearly not invertible, like the zero matrix, because 0 cannot be multiplied by any matrix to become "non-zero". But, in general, it is not easy to see whether a matrix is invertible or not. For example, the 2 × 2 matrix with rows (1, 2) and (3, 4) is invertible, but the one with rows (1, 2) and (3, 6) is not. We will discuss a way to verify if a matrix is invertible, and how to compute an inverse, in Section 2.6.
However, let us already add here, that even if we know that a matrix is invertible, it is usually
difficult (also computationally) to compute an inverse.
We will come back to this issue later, and present some ways for computing the inverse, at least
for ’small’ matrices. This will be the ultimate tool to solve (certain) systems of linear equations.
But we will first discuss some more direct, but less powerful ways to calculate solutions.
We finally discuss the transpose of a matrix. Since the dimensions of a matrix are important, it
makes a huge difference if a matrix is m × n or n × m, and it is quite useful to have a compact
notation to somehow ’switch’ the rows and columns of a matrix. That is, for a given m × n
matrix A = (aij ), we define its transpose AT as the n × m matrix whose rows are the columns
of A. To be more precise, the ij-th component of Aᵀ is a_{ji}, i.e.,
\[
A^T = (a_{ji})_{j,i=1}^{n,m} = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}.
\]
And
\[
\left( \begin{pmatrix} 1 & 6 \\ 2 & 5 \\ 3 & 4 \end{pmatrix}^T \right)^T = \begin{pmatrix} 1 & 2 & 3 \\ 6 & 5 & 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 6 \\ 2 & 5 \\ 3 & 4 \end{pmatrix}.
\]
In the above example we saw that (AT )T = A, and this obviously holds in general. (The ij-th
component of (AT )T is the ji-th component of AT , which is aij .)
There is one calculation rule related to the transpose, that is sometimes also very useful for
computing the product of matrices. We state this in the following lemma.
(AB)T = B T AT .
In particular, this lemma shows that (AAT )T = AAT for every A ∈ Rn×n .
Proof. First of all, note that B T ∈ Rn×p and AT ∈ Rp×m . Therefore, the inner dimensions of
B T and AT agree, and their product is defined.
Now, let us write (C)_{ij} for the ij-th entry of a matrix C. By definition, we have that
\[
(AB)_{ij} = \sum_{k=1}^{p} (A)_{ik} (B)_{kj},
\]
and
\[
(B^T A^T)_{ij} = \sum_{k=1}^{p} (B^T)_{ik} (A^T)_{kj} = \sum_{k=1}^{p} (B)_{ki} (A)_{jk} = \sum_{k=1}^{p} (A)_{jk} (B)_{ki} = (AB)_{ji},
\]
which is precisely the ij-th entry of (AB)ᵀ.
Note that symmetric matrices must be square, and we will see later that symmetric matrices have several important properties. The easiest examples are the identity and the (square) zero matrix. Another important class of symmetric matrices are diagonal matrices.
The transpose is often used when working with vectors. Therefore, it is essential to understand
especially this case.
Consider a column vector x ∈ R^n = R^{n×1}; then its transpose x^T ∈ R^{1×n} is a row vector. Now, since the inner dimensions in both cases agree, we can define x^T x and x x^T. However, we obtain that x^T x ∈ R = R^{1×1} is a number, while x x^T ∈ R^{n×n} is a square matrix.
The above examples can clearly be generalized to the case of two different vectors x, y ∈ R^n. That is, we can define the number x^T y = \sum_{i=1}^{n} x_i y_i, which is just the inner product of x and y. In particular, the Euclidean norm of a vector x can be written as \|x\|_2 = \sqrt{x^T x}.
Moreover, we can define the matrix xy T ∈ Rn×n . One may even define such matrices based on
vectors of different dimensions. As these matrices appear rather often in theory and applications,
they have been given names.
Definition 2.13. Let A ∈ R^{m×n}. If there exist two vectors x ∈ R^m and y ∈ R^n such that

A = x y^T,

then we call A a rank-one matrix. These matrices play an important role in the work with high dimensional data.
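The distinction between the number x^T y and the matrix x y^T is easy to see in code. A short NumPy sketch (an aside, using `matrix_rank` as a black box):

```python
import numpy as np

x = np.array([0, 1, 2])     # vector x in R^3
y = np.array([2, 3, 4, 5])  # vector y in R^4

inner = x @ x               # x^T x: a single number
norm = np.sqrt(inner)       # Euclidean norm ||x||_2
outer = np.outer(x, y)      # x y^T: a 3x4 rank-one matrix

assert inner == 5
assert np.isclose(norm, np.linalg.norm(x))
assert np.linalg.matrix_rank(outer) == 1
```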
Remark 2.14. If we consider complex-valued matrices A = (aij ) ∈ Cm×n , then all the
definitions above still make sense. However, instead of a transpose AT , we would speak of the
adjoint matrix A^* = (\overline{a_{ji}}), i.e., the conjugate transpose. (That is, we take additionally the
complex conjugate in each component.) Clearly, the adjoint equals the transpose if the matrix
is real-valued. If A = A∗ , we call A a self-adjoint (or hermitian) matrix.
Remark 2.15. The above calculation rules may be compared to the field axioms for real num-
bers, see Axiom 1. Note that many properties are also fulfilled for matrices. However, matrix
multiplication is not commutative, i.e., we do not have AB = BA in general, which is why the set of matrices (of the same dimensions) is not a field. (One may see that it is a group or even a ring, but we do not discuss these types of algebraic structures here.)
Let us now consider systems of linear equations, which are the most frequently occurring type
of multivariate (i.e., depending on more than one variable) problems to solve, although they
appear to be the easiest. One reason is that many (even “non-linear”) numerical problems can
be rewritten as, or approximated by, a (very large) system of linear equations. And although
such systems are usually solved by a computer, it is up to the user to transfer the problem
under consideration to a well-defined linear system. It is therefore indispensable to have a solid
understanding of these basic problems.
Throughout this section, m, n ∈ N are fixed parameters, where m denotes the number of equations and n the number of unknowns. The system of equations we want to solve here will be of the following form.
If we recall that the matrix-vector product of the matrix of coefficients A ∈ Rm×n and the vector
of variables x ∈ Rn is defined by
Ax = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{pmatrix} \in R^m,
we see that the system of linear equations, given in Definition 2.16, can be written in short by
Ax = b.
Whether (and how many) solutions exist can be verified by analyzing the matrix of coefficients in more detail. Before we come to this, let us introduce some more notation and discuss some examples.
Definition 2.17. Given a linear system Ax = b with coefficient matrix A and RHS b, we denote the set of solutions by

L(A, b) := \{ x \in R^n : Ax = b \}.
Example 2.18. Let us consider the examples from the beginning of Section 2. That is, we want
to solve the system
2x1 + x2 = 1,
6x1 + 3x2 = 3,
and we have already seen that this system is solved by any (x1 , x2 ) ∈ R2 with 2x1 + x2 = 1.
Putting this into our notation, we have that A = \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix} and b = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, and with this

L(A, b) = L\left( \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \end{pmatrix} \right) = \{ (x_1, x_2) : 2x_1 + x_2 = 1 \}.

Recall that changing the RHS to b = \begin{pmatrix} 1 \\ 2 \end{pmatrix} leads to a system without solution, i.e.,

L(A, b) = L\left( \begin{pmatrix} 2 & 1 \\ 6 & 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) = \emptyset.
This example is somehow special because the existence of a solution depends on the RHS.
Although this is not a rare case, we will see that there are conditions for a linear system to
be uniquely solvable for any RHS b. This is of particular interest in applications, where the
RHS b usually represents some kind of measurements or requirements, which we may not choose
ourselves, and we want to find a solution x to specify some parameters.
Before we discuss a systematic way to solve large linear systems, let us see some more examples.
By substituting x2 = x1 −6 in the first equation, we see that a solution must satisfy 8x1 −30 = 42,
and we obtain that the price of a pizza is x1 = 9. From x2 = x1 − 6 we then see that x2 = 3.
Therefore,

L\left( \begin{pmatrix} 3 & 5 \\ 1 & -1 \end{pmatrix}, \begin{pmatrix} 42 \\ 6 \end{pmatrix} \right) = \left\{ \begin{pmatrix} 9 \\ 3 \end{pmatrix} \right\}.
Note that L(A, b) is a set and therefore, we need to write L(A, b) = {x}, and not L(A, b) = x,
if x is the only solution.
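This small system can also be handed to a numerical solver. The following NumPy sketch (an aside; the systematic solution method comes later in this section) confirms the unique solution:

```python
import numpy as np

# the system 3*x1 + 5*x2 = 42, x1 - x2 = 6 from the example above
A = np.array([[3.0, 5.0],
              [1.0, -1.0]])
b = np.array([42.0, 6.0])

x = np.linalg.solve(A, b)           # solves Ax = b for a square, full-rank A
assert np.allclose(x, [9.0, 3.0])   # the unique solution {(9, 3)}
```

Note that `np.linalg.solve` raises an error for a singular matrix such as the one in Example 2.18, since in that case there is no unique solution.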
Remark 2.21. In the above example, we wrote L(A, b) = {(x1 , . . . , xn )}. Note that this is a
bit inaccurate, because (x1 , . . . , xn ) seems to be a row vector, but we have the convention that
elements from Rn , so in particular the solution x ∈ Rn , are column vectors. More precisely,
we should have written L(A, b) = {(x1 , . . . , xn )T }. However, as it is obvious that the vector is
supposed to be a column vector, we usually omit the transpose, for simplicity. (Note that the
matrix product Ax would not be defined if x is a row vector.) In the same way, we might define
x = (x1 , . . . , xn ) ∈ Rn and assume, unless stated otherwise, that x is a column vector.
The next examples are given with solutions, for your own practice.
Example 2.22. Consider the linear system
x1 + x4 = 0,
− 4x2 + 16x4 = 0,
2x3 − 6x4 = 0.
Then, the set of solutions is given by
L\left( \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & -4 & 0 & 16 \\ 0 & 0 & 2 & -6 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right) = \{ (-\lambda, 4\lambda, 3\lambda, \lambda) : \lambda \in R \}.
We now discuss one special case of equations, which will also lead to a first answer to the question whether a linear system can have infinitely many solutions. A linear system of the form

Ax = 0,

i.e., where the RHS is the zero vector 0 := 0_{m,1}, is called a homogeneous system (of equations). To a given linear system Ax = b, we call Ax = 0 the corresponding homogeneous system.
It is rather easy to see that for any matrix A we have A · 0 = 0 (again 0 is a vector here). Just
have a look at the definition of the matrix-vector product. This implies that every homogeneous
linear system has at least the solution x = 0, and this solution is called the trivial solution.
Written mathematically, we have that L(A, 0) ⊃ {0} for any matrix A.
In some cases, a homogeneous system has more than the trivial solution, see Example 2.22. However, if the trivial solution is also the only solution of a homogeneous system, then the following lemma shows that the solution to a linear system Ax = b, if it exists, is unique.

Lemma 2.25. Let A ∈ R^{m×n} and b ∈ R^m. Then the following holds:
1. If x is a solution of Ax = b and y is a solution of the corresponding homogeneous system Ay = 0, then x + y is also a solution of Ax = b.
2. If the homogeneous system Ay = 0 has a solution y ≠ 0, and there is at least one solution to Ax = b, then Ax = b has infinitely many solutions.
3. If Ay = 0 has only the trivial solution y = 0, and there is at least one solution x to Ax = b, then this solution x is unique.
Proof. The first statement follows directly from the linearity of matrix multiplication. We obtain
A(x + y) = Ax + Ay = b + 0 = b.
For the second statement, note that if there is a vector y ≠ 0 satisfying Ay = 0, then also the vector λ · y satisfies A(λy) = λAy = 0 for every λ ∈ R, and is therefore also a solution. Hence, there are
automatically infinitely many solutions to the homogeneous system. Together with the first
part, we see that Ax = b has infinitely many solutions.
For the third part, assume that the only solution to Ay = 0 is y = 0. If there were two solutions x and z to the linear system, i.e., Ax = Az = b, then their difference would satisfy
A(x − z) = Ax − Az = b − b = 0.
In other words, x − z is a solution to the homogeneous system. As we assumed that the only
solution to the homogeneous system is zero, we obtain that x − z = 0 or x = z. This shows
that, if there are two solutions to Ax = b, then they are equal, which shows the uniqueness of
the solution.
Remark 2.26. Analogously one can define linear systems also in the complex case, i.e., co-
efficients and RHS might be complex. Then, we are clearly interested in complex solutions.
However, note that every complex equation can be written as two "real" equations by considering the real and imaginary parts separately. Therefore, every complex linear system can be written as a larger real linear system. We therefore only consider the real case.
Now we make an important observation which allows us to derive an algorithm for solving
linear systems by manipulating matrices. We can do the following operations to a linear system
without changing the set of solutions:
1) Interchanging any two equations, i.e., changing the order of the equations.
2) Multiplying an equation by a non-zero number λ ∈ R \ {0}.
3) Adding a multiple of one equation to another equation.
(Think for a second why these operations do not change the set of solutions.)
Since every system of linear equations can be written with the help of a matrix, it is clearly
of interest how the above operations change the corresponding matrix of coefficients of a linear
system. We will see that they indeed allow for successive modifications that lead to “much
simpler” matrices, i.e., matrices in echelon form. From such a matrix, we will be able to basically
see if a corresponding linear system is (uniquely) solvable or not.
Let us start by discussing how the above operations to a linear system Ax = b affect the
corresponding matrix A. However, note already now that these operations also change the
RHS b of a linear system, and this is essential. We will come back to this shortly, but for now
we only consider the corresponding matrix of coefficients.
In view of the operations from above that can be used to change a linear system Ax = b without changing the set of solutions, we see that the matrix A is changed in the following way:
1) Interchanging any two rows of A.
2) Multiplying a row of A by a non-zero number.
3) Adding a multiple of one row of A to another row.
These are the so-called (elementary) row operations.
That is, there are as many 0's as possible at the left of every row, and the rows are ordered such that the number of leading zero entries increases from top to bottom. In particular, the leading coefficient
of a row, i.e., the first non-zero entry of a row, is strictly to the right of the leading coefficient
of the row above. Before we discuss some examples, let us state the general definition.
Definition 2.27. We say that a matrix C = (c_{ij}) ∈ R^{m×n} of the form

C = \begin{pmatrix}
0 & \cdots & 0 & c_{1j_1} & * & \cdots & & * \\
 & & & 0 & c_{2j_2} & * & \cdots & * \\
 & & & & & \ddots & & \vdots \\
 & & & & & 0 & c_{kj_k} & * \\
 & & & & & & & 0 \\
\end{pmatrix},

where ∗ stands for an arbitrary entry, is in row echelon form (ger. 'Treppenform'). That is, there exist numbers k ≤ m and 1 ≤ j_1 < · · · < j_k ≤ n such that for all 1 ≤ i ≤ k:
• c_{ij_i} ≠ 0,
• c_{ij} = 0 for all j < j_i, i.e. c_{ij_i} is the first non-zero element in the i-th row, and
• c_{\ell j_i} = 0 for all \ell > i, i.e. c_{ij_i} is the last non-zero element in the j_i-th column.
If, additionally,
• c_{ij_i} = 1 for all 1 ≤ i ≤ k, and
• c_{\ell j_i} = 0 for all \ell < i, i.e. c_{ij_i} = 1 is the only non-zero element in the j_i-th column,
then we say that C is in reduced row echelon form.
We do not prove the following statement here formally, but note that it is the basis of the
considerations below.
Theorem 2.28. Every matrix can be transformed to (reduced) row echelon form by per-
forming row operations. Moreover, the reduced row echelon form of a matrix is unique.
In contrast, a given matrix A can be transformed by row operations into different matrices in (non-reduced) row echelon form. (For example, multiplying any row by 2 leads to another row echelon form.) But even then, all row echelon forms of a matrix have the same "k", i.e., the same number of non-zero rows. This number is called the rank of A, written rank(A). That's why the rank is a characteristic of a matrix, which turns out to be essential. Note that the definition implies that rank(A) ≤ min{m, n} for any A ∈ R^{m×n}, and one may say that the rank is the number of independent rows in the matrix. (We will discuss later what this precisely means.)
Let us see some examples, before we show that computing the reduced row echelon form of a
matrix is actually solving the corresponding system of linear equations.
Example 2.30 (Diagonal matrices). Diagonal matrices with all diagonal entries non-zero are
already in echelon form. By dividing each row by the diagonal entry, we obtain the reduced
row echelon form, which equals the identity. In particular, all diagonal matrices with non-zero
diagonal entries have the same reduced row echelon form.
Example 2.31 (Rearranging rows). In general, we may easily bring every matrix into echelon form that only contains at most one non-zero entry per row. We just need to rearrange the rows. For example, a row echelon form of

\begin{pmatrix} 0 & 0 & 0 & -1 \\ 4 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \end{pmatrix} \quad \text{is} \quad \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

(In the definition above, we have (j_1, j_2, j_3) = (1, 2, 4), and c_{11} = 4, c_{22} = 3 and c_{34} = −1.) The rank is 3, and the reduced row echelon form is

\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Example 2.32 (Removing duplicate rows). Addition of (a multiple of) a row to another is a
row operation. In particular, we can subtract a row from another. For example, by subtracting
the first row from the second, and twice the first row from the third, we see that the reduced
row echelon form of the matrix
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 2 & 2 \end{pmatrix} \quad \text{is} \quad \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
The above example shows that a rank-one matrix indeed has rank one (provided x and y are non-zero). Given x ∈ R^n and y ∈ R^m, we consider the rank-one matrix x y^T ∈ R^{n×m}. By definition of the matrix product, we see that every column of x y^T is a multiple of x, and every row is a multiple of y^T. In particular, the k-th row of x y^T is x_k · y^T. Since all rows are multiples of each other, we can proceed as in Example 2.32 to obtain rank(x y^T) = 1.
Example 2.33. For example, let x = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} ∈ R^3 = R^{3×1} and y^T = (2, 3, 4, 5) ∈ R^{1×4}. We obtain

x y^T = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} (2, 3, 4, 5) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 2 & 3 & 4 & 5 \\ 4 & 6 & 8 & 10 \end{pmatrix} \in R^{3×4}.

Subtracting twice the second row from the third, and interchanging rows, leads to the row echelon form

\begin{pmatrix} 2 & 3 & 4 & 5 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

This shows rank(x y^T) = 1.
Example 2.34. Consider the matrix

\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}.

We bring this matrix into reduced row echelon form by using only row operations. Systematically, we want to create zeros in the lower left entries. So, we start by subtracting 4 times the first row from the second, which leads to a zero in the first entry of the second row. We indicate this operation by "II − 4I". Afterward, we subtract 7 times the first row from the third ("III − 7I"), which leads to a zero in the first entry of the last row, and so on.
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
\xrightarrow{II - 4I}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 7 & 8 & 9 \end{pmatrix}
\xrightarrow{III - 7I}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -12 \end{pmatrix}
\xrightarrow{III - 2II}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{-\frac{1}{3} II}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{I - 2II}
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}
This is the reduced row echelon form of the matrix. Moreover, we can see that the rank is 2.
Note that there are several ways of indicating which row operation is performed. For example,
one might be more precise and write “II → II − 4I” to state that the operation is performed
only in the second (II) row. We decided to use this notation with the convention that only the
row that appears first will be changed.
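The systematic procedure just described is easy to turn into a small program. The following sketch is a hypothetical helper (not from the lecture) that applies exactly these row operations in NumPy; it additionally picks the largest available pivot in each column, which is harmless here and improves numerical stability:

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduced row echelon form via the row operations described above:
    interchanging rows, scaling a row, and adding multiples of rows."""
    C = np.array(A, dtype=float)
    m, n = C.shape
    row = 0
    for col in range(n):
        # find a pivot in this column, at or below position `row`
        pivot = row + np.argmax(np.abs(C[row:, col]))
        if abs(C[pivot, col]) < tol:
            continue                       # no pivot in this column
        C[[row, pivot]] = C[[pivot, row]]  # interchange rows
        C[row] /= C[row, col]              # scale the pivot entry to 1
        for r in range(m):                 # clear the rest of the column
            if r != row:
                C[r] -= C[r, col] * C[row]
        row += 1
        if row == m:
            break
    return C

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
R = rref(A)
expected = np.array([[1, 0, -1],   # the reduced row echelon form
                     [0, 1, 2],    # computed by hand in Example 2.34
                     [0, 0, 0]])
assert np.allclose(R, expected)
assert np.linalg.matrix_rank(A) == 2  # two non-zero rows: rank 2
```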
Example 2.35. We use "I ↔ II" to indicate that we interchanged the first and the second row. (Note that afterwards, the first row is denoted by II and vice versa.)

\begin{pmatrix} 0 & 8 & 0 \\ 3 & 6 & 0 \\ 6 & 0 & 1 \\ 6 & 15 & 0 \end{pmatrix}
\xrightarrow{I \leftrightarrow II}
\begin{pmatrix} 3 & 6 & 0 \\ 0 & 8 & 0 \\ 6 & 0 & 1 \\ 6 & 15 & 0 \end{pmatrix}
\xrightarrow{III - 2I}
\begin{pmatrix} 3 & 6 & 0 \\ 0 & 8 & 0 \\ 0 & -12 & 1 \\ 6 & 15 & 0 \end{pmatrix}
\xrightarrow{IV - 2I}
\begin{pmatrix} 3 & 6 & 0 \\ 0 & 8 & 0 \\ 0 & -12 & 1 \\ 0 & 3 & 0 \end{pmatrix}
\xrightarrow{IV - \frac{3}{8} II}
\begin{pmatrix} 3 & 6 & 0 \\ 0 & 8 & 0 \\ 0 & -12 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{III + \frac{3}{2} II}
\begin{pmatrix} 3 & 6 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{\frac{1}{3} I}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{\frac{1}{8} II}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{I - 2II}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}

This matrix has therefore rank 3. (We have (j_1, j_2, j_3) = (1, 2, 3) and c_{11} = c_{22} = c_{33} = 1.)
As you see, it can be rather time-consuming to compute the reduced row echelon form even for rather small matrices. However, it is a very straightforward method, meaning that it is always obvious what to do next, and miscalculation is basically the only source of errors.
Recall that the reduced row echelon form of a matrix is unique, but the order of the calculations to find it is not. To make computations generally easier and faster, there are two rules of thumb: interchange rows whenever this brings a row closer to its final position, and if a row contains only one non-zero entry, put all other entries in the corresponding column to zero immediately. The second "shortcut" can clearly be performed by adding multiples of the row with the unique non-zero entry to all other rows. Let's consider an example.
Example 2.36. We consider the matrix
2 6 5 0 4
4 17 8 0 16
12 42 8 14 0 .
0 0 13 0 0
28 0 3 0 0
We see that in the fourth row there is only one non-zero entry, which is in the third column.
Hence we can reduce the third column immediately and get
2 6 0 0 4
4 17 0 0 16
12 42 0 14 0 .
0 0 1 0 0
28 0 0 0 0
Note that we also divided the 4th row by 13 to make things easier.
Next we see that the fifth row is almost in the form we wish for the first row. So, we interchange them and divide the new first row by 28. Again, we can put all other entries in the first column to zero:
1 0 0 0 0
0 17 0 0 16
0 42 0 14 0 .
0 0 1 0 0
0 6 0 0 4
If we now subtract 4-times the last row from the second, we see that the new second row reads
(0, −7, 0, 0, 0). Dividing by -7, and putting all other entries in the second column to zero, yields
1 0 0 0 0
0 1 0 0 0
0 0 0 14 0 .
0 0 1 0 0
0 0 0 0 4
Dividing the third row by 14 and the fifth row by 4, and interchanging III and IV, finally yields the reduced row echelon form
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0 .
0 0 0 1 0
0 0 0 0 1
We see that the rank is 5, and j` = ` for ` = 1, . . . , 5, i.e., (j1 , j2 , j3 , j4 , j5 ) = (1, 2, 3, 4, 5).
Moreover, c`` = 1 for ` = 1, . . . , 5.
Now we discuss how we can solve linear systems by calculating (reduced) row echelon forms
of matrices. We consider the linear system Ax = b with corresponding matrix A = (aij ) ∈ Rm×n
and RHS b = (b1 , . . . , bm ), and define the augmented matrix (A|b), i.e., we consider the array
(A|b) := \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right).
This means that we add b as a new column. We already discussed that by interchanging, multiplying and adding rows we do not change the set of solutions. So, if some (C|b′) is obtained from (A|b) only by row operations, then

L(A, b) = L(C, b′).

Now assume that the augmented matrix (A|b) is transformed into an augmented matrix (C|b′), where C is in row echelon form. (Here, we consider the vector b as the last column of the matrix, and therefore have to respect it while performing row operations. But we only want to bring A into row echelon form.) From this augmented matrix we can just "see" the solutions of the corresponding linear system. This way of computing solutions is called Gaussian elimination.
Example 2.38. Let us again consider Example 2.20, which is given by Ax = b with A = \begin{pmatrix} 1 & 2 & 6 \\ 2 & 5 & 0 \\ 0 & 1 & 0 \end{pmatrix} and b = \begin{pmatrix} 0 \\ -1 \\ 3 \end{pmatrix}. We bring the augmented matrix into reduced row echelon form:

\left(\begin{array}{ccc|c} 1 & 2 & 6 & 0 \\ 2 & 5 & 0 & -1 \\ 0 & 1 & 0 & 3 \end{array}\right)
\xrightarrow{II - 2I}
\left(\begin{array}{ccc|c} 1 & 2 & 6 & 0 \\ 0 & 1 & -12 & -1 \\ 0 & 1 & 0 & 3 \end{array}\right)
\xrightarrow{\dots}
\left(\begin{array}{ccc|c} 1 & 0 & 6 & -6 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & -12 & -4 \end{array}\right)
\xrightarrow{\dots}
\left(\begin{array}{ccc|c} 1 & 0 & 0 & -8 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 1/3 \end{array}\right).
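The last column of the reduced augmented matrix contains the solution x = (−8, 3, 1/3). As a quick check (a NumPy aside, not part of the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0, 6.0],
              [2.0, 5.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([0.0, -1.0, 3.0])

x = np.linalg.solve(A, b)
assert np.allclose(x, [-8.0, 3.0, 1.0 / 3.0])  # last column of the RREF above
assert np.allclose(A @ x, b)                   # and indeed solves Ax = b
```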
For the general procedure, assume that A has rank k, i.e., rank(A) = k. We obtain that the augmented matrix (C|b′) in row echelon form, which we obtain from (A|b), looks like

(C|b′) = \left(\begin{array}{ccccccc|c}
0 \cdots 0 & c_{1j_1} & * & * & \cdots & * & * & b'_1 \\
 & 0 & c_{2j_2} & * & \cdots & * & * & b'_2 \\
 & & 0 & c_{3j_3} & \cdots & * & * & b'_3 \\
 & & & & \ddots & & \vdots & \vdots \\
 & & & & 0 & c_{kj_k} & * & b'_k \\
 & & & & & & 0 & b'_{k+1} \\
 & & & & & & \vdots & \vdots \\
 & & & & & & 0 & b'_m
\end{array}\right),

i.e., the last m − k rows of C contain only zeros. For \ell > k, the \ell-th row therefore corresponds to the equation

0 \cdot x_1 + 0 \cdot x_2 + \cdots + 0 \cdot x_n = b'_\ell.

Since the left hand side is equal to zero for every x, we obtain a contradiction if one of these b'_\ell's is not equal to zero. We therefore obtain the rule: the linear system has a solution if and only if b'_{k+1} = \cdots = b'_m = 0.

In the other case, i.e., b'_{k+1} = \cdots = b'_m = 0, the system will always have a solution, which we compute "from the bottom to the top":
We first consider the last non-zero equation, i.e., the k-th equation, which reads

\sum_{\ell=j_k}^{n} c_{k\ell}\, x_\ell = b'_k.

Solving for x_{j_k}, we obtain

x_{j_k} = \frac{1}{c_{kj_k}} \left( b'_k - \sum_{\ell=j_k+1}^{n} c_{k\ell}\, x_\ell \right).

(In particular, if j_k = n, then x_n = x_{j_k} = b'_k / c_{kj_k}.) All possible solutions have to fulfill this identity, otherwise the k-th equation could not be true. Therefore, for any given choice of x_{j_k+1}, . . . , x_n, we have to choose this unique x_{j_k}. However, the x_\ell with \ell = j_k + 1, . . . , n can be chosen freely.
Now assume that we have already fixed the values for x_n, . . . , x_{j_k}. We turn to the next equation, which is the (k − 1)-th:

\sum_{\ell=j_{k-1}}^{n} c_{k-1,\ell}\, x_\ell = b'_{k-1}.

By the same principle we can give a formula for x_{j_{k-1}} depending only on the x_\ell with \ell \geq j_{k-1} + 1.
Since we have only fixed x_n, . . . , x_{j_k}, we can again choose x_\ell with \ell = j_{k-1} + 1, . . . , j_k − 1 freely. (Note that there is no free choice if j_k = j_{k-1} + 1.) So, after this step, we have computed the components x_n, . . . , x_{j_k}, . . . , x_{j_{k-1}} of a solution (x_1, . . . , x_n) of Ax = b, i.e., the last components.
If we continue this process, we finally see that there are precisely k = rank(A) such equalities that "fix" the value of the unknowns x_{j_1}, . . . , x_{j_k}. These equalities are

x_{j_i} = \frac{1}{c_{ij_i}} \left( b'_i - \sum_{\ell=j_i+1}^{n} c_{i\ell}\, x_\ell \right)

for i = 1, . . . , k. But the remaining variables, i.e., the x_\ell with \ell \in \{1, 2, . . . , n\} \setminus \{j_1, j_2, . . . , j_k\}, can all be chosen freely. Hence, when we write down the solution, there are n − k free parameters. This is usually phrased as "a linear system has n − k degrees of freedom".
Example 2.39. Consider the linear system Ax = b with

A = \begin{pmatrix} 2 & 1 & -2 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}

and b = (5, 1, 0)^T. To compute a solution x = (x_1, x_2, x_3), we consider the augmented matrix

(A|b) = \left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right).
Since this matrix is already in row echelon form, we do not need to perform row operations.
Specifically, we have the row echelon form with (j_1, j_2) = (1, 2). By Gaussian elimination (with C = A and b′ = b), we can choose the variables with index in {1, 2, 3} \ {j_1, j_2} = {3} freely, i.e., we already know that x_3 is a free parameter. Using the formulas established above, we obtain
("from the bottom to the top")

x_2 = \frac{1}{a_{22}} \left( b_2 - \sum_{\ell=3}^{3} c_{2\ell}\, x_\ell \right) = \frac{1}{1}\,(1 - 2x_3) = 1 - 2x_3

and

x_1 = \frac{1}{a_{11}} \left( b_1 - \sum_{\ell=2}^{3} c_{1\ell}\, x_\ell \right) = \frac{1}{2}\,\bigl(5 - (1 - 2x_3) + 2x_3\bigr) = 2 + 2x_3.
For notational convenience, we choose λ as a name for the free parameter, and write
L\left( \begin{pmatrix} 2 & 1 & -2 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}, \begin{pmatrix} 5 \\ 1 \\ 0 \end{pmatrix} \right) = \{ (2 + 2\lambda,\, 1 - 2\lambda,\, \lambda) \in R^3 : \lambda \in R \}.
Note that we have infinitely many solutions. For example, x = (2, 1, 0) (for λ = 0) or x =
(0, 3, −1) (for λ = −1).
However, if we consider, e.g., the RHS b′ = (0, 0, 1), then there is no solution, since the row echelon form

(A|b′) = \left(\begin{array}{ccc|c} 2 & 1 & -2 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right)

shows the contradiction 0 = 1 in the last equation.
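Both claims of this example can be checked numerically. The following NumPy sketch verifies the one-parameter solution family and, for the unsolvable RHS, compares the ranks of A and of the augmented matrix (using `matrix_rank` as a black box):

```python
import numpy as np

A = np.array([[2.0, 1.0, -2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 0.0]])
b = np.array([5.0, 1.0, 0.0])

# every member of the one-parameter family solves Ax = b
for lam in (-1.0, 0.0, 2.5):
    x = np.array([2 + 2 * lam, 1 - 2 * lam, lam])
    assert np.allclose(A @ x, b)

# for b' = (0, 0, 1) the ranks of A and (A|b') differ: no solution exists
b2 = np.array([0.0, 0.0, 1.0])
assert np.linalg.matrix_rank(A) == 2
assert np.linalg.matrix_rank(np.column_stack([A, b2])) == 3
```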
Example 2.40. Note that some formulas get easier when we transfer a linear system into
reduced row echelon form. If we consider the example from above, we see that it can be
transferred to
(A|b) = \left(\begin{array}{ccc|c} 2 & 1 & -2 & 5 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & -2 & 2 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right) =: (C|b′).
Again, the second row is seen to be equivalent to x2 = 1 − 2x3 . However, the first row shows
directly that x1 = 2 + 2x3 , and there is no need to plug the already obtained formula for x2 in,
as above. That's another advantage of the reduced row echelon form. We obtain again that L(A, b) = \{(2 + 2\lambda,\, 1 - 2\lambda,\, \lambda) \in R^3 : \lambda \in R\}.
Example 2.42. A larger example, which we only present with a short solution, is the linear system given by

(A|b) = \left(\begin{array}{cccc|c} 9 & -6 & -12 & 30 & -6 \\ 5 & -10 & -20 & 0 & 20 \\ -5 & 2 & 4 & 10 & -10 \\ -2 & -4 & -8 & 40 & -16 \end{array}\right).

This has the reduced row echelon form

\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 & -2 \\ 0 & 0 & 0 & 1 & -3/5 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right).
We see that x_3 is a free parameter. From the third equation we get that x_4 = −3/5. We deduce from the second equation that x_2 = −2 − 2x_3, and the first equation yields that x_1 = 0. So we obtain the set of solutions

L(A, b) = \left\{ (0,\, -2 - 2\lambda,\, \lambda,\, -3/5)^T : \lambda \in R \right\}.
We collect these findings in the following lemma; they are particularly meaningful if m = n, i.e., if A is a square matrix. This is the case if there are as many equations as unknowns.

Lemma 2.43. Let A ∈ R^{m×n}.
1. If rank(A) < m, then the linear system Ax = b has no solution for certain b ∈ R^m.
2. If rank(A) < n, then the homogeneous system Ax = 0 has infinitely many solutions.
Hence, if rank(A) < min{m, n}, then the linear system Ax = b has either no or infinitely many solutions, depending on b ∈ R^m.
Moreover, if A ∈ R^{n×n} is a square matrix, i.e., m = n, with rank(A) = n, then the linear system Ax = b has a unique solution for any b ∈ R^n.
Proof. The lemma follows from the considerations above, together with the fact that the homogeneous system Ax = 0 always has at least the solution x = 0, see also Lemma 2.25.
It is worth noting that a row echelon form of a square matrix is always an upper triangular
matrix. That is, the row echelon form is of the form
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{pmatrix},
i.e., all entries below the diagonal are zero. The matrix has full rank if and only if all diagonal entries are non-zero. Conversely, every upper triangular matrix A ∈ R^{n×n} with non-zero diagonal elements is already in row echelon form.
Matrices with full rank play an important role, in particular, since there is a unique solution to
a corresponding linear system independent of the RHS b. However, the technique above only
seems to work for a fixed RHS b.
In the following sections we will discuss the inverse of a matrix A ∈ Rn×n , if it exists, which will
lead to a formula for a solution of Ax = b for any b. However, let us add that it is not usual
to compute the inverse of a large matrix on a computer, since it is computationally quite hard.
Large linear systems are usually solved with (variants of) Gaussian elimination.
We now introduce the determinant of a matrix, which will be a frequently appearing quantity,
and is a good tool to decide if a matrix is invertible or not. Moreover, it is needed to introduce
Cramer’s rule for solving linear systems, and to give an explicit formula for the inverse matrix.
Definition 2.46. Let A ∈ R^{n×n} be a square matrix, and denote the rows of A by a_1, . . . , a_n ∈ R^{1×n}. Moreover, let λ, µ ∈ R, and w ∈ R^{1×n}. We define the determinant of A by the unique mapping det : R^{n×n} → R such that:
1. det is linear in each row, i.e., if a_i = λ a'_i + µ w for some row index i, then det(A) = λ · det(A') + µ · det(A''), where A' and A'' are obtained from A by replacing the i-th row a_i by a'_i and by w, respectively.
2. If there exist i ≠ j such that a_i = a_j, i.e., two equal rows, then det(A) = 0.
3. The determinant of the identity matrix equals one, i.e., det(I_n) = 1.
The definition directly shows the connection to linear systems. However, let us show explicitly how the determinant changes under row operations.

Lemma 2.47. Let A ∈ R^{n×n} and λ ∈ R. Then:
1. Multiplying a single row of A by λ multiplies the determinant by λ. In particular, det(λA) = λ^n det(A).
2. Interchanging two rows of A changes the sign of the determinant.
3. Adding a multiple of one row to another row does not change the determinant.
Proof. The first point follows immediately from Definition 2.46(1) with µ = 0. Moreover, note
that λA is the matrix with every row multiplied by λ. Since there are n rows, we have to apply
this rule for each row one by one and obtain det(λA) = λn det(A).
For the second point we assume that B is the matrix which is obtained by interchanging the i-th and the j-th row of the matrix A. (The case of columns uses again the transpose.) Recall from Definition 2.46(2) that, if a matrix contains a row more than once, then its determinant is zero.
It is apparent that the same formula also holds for triangular matrices.
Lemma 2.49. Let A ∈ R^{n×n} be an upper triangular matrix, which is a matrix of the form

A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ & \ddots & \vdots \\ 0 & & a_{nn} \end{pmatrix},

i.e., all entries below the diagonal are zero. Then,

\det A = \prod_{i=1}^{n} a_{ii}.

This formula also holds for lower triangular matrices, which are zero above the diagonal.
Proof. Let us first assume that rank(A) < n. Then at least one diagonal entry a_{ii} must be zero, so the product of the diagonal entries vanishes. Moreover, in this case we can produce a zero row in A by using only row operations, see the definition of the rank, and therefore the determinant is also zero.
If rank(A) = n, then we know that A is already in row echelon form and that, in particular,
aii 6= 0 for all i = 1, . . . , n. Therefore, by adding multiples of rows to other rows (“from bottom to
top”), we can transform the matrix to a diagonal matrix without changing its diagonal elements
and its determinant. We obtain det A = det(diag(a11 , a22 , . . . , ann )) which proves the result.
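This product formula is easy to confirm numerically; a small NumPy check (an aside, not part of the notes):

```python
import numpy as np

# an upper triangular matrix: the determinant is the product of the diagonal
A = np.array([[2.0, 7.0, 1.0],
              [0.0, -3.0, 5.0],
              [0.0, 0.0, 4.0]])

assert np.isclose(np.linalg.det(A), 2.0 * (-3.0) * 4.0)  # product = -24

# with a zero on the diagonal, the rank drops and the determinant vanishes
A[1, 1] = 0.0
assert np.isclose(np.linalg.det(A), 0.0)
```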
The determinant is of special interest because it can be used to characterize whether a linear system is uniquely solvable, see Lemma 2.43, i.e., whether the matrix has full rank. We will soon see that this is also equivalent to the corresponding matrix being regular (aka. invertible).
Theorem 2.50. For every A ∈ R^{n×n}, we have

\det A \neq 0 \iff \operatorname{rank} A = n.
Proof. Note that every matrix can be brought into row echelon form by just adding repeatedly
rows to other rows, and this does not change the determinant. Since the row echelon form
of a square matrix is always an upper triangular matrix, we see that the matrix has full rank,
i.e., rank(A) = n, if and only if all diagonal entries of the row echelon form are not zero. By
Lemma 2.49 this is equivalent to det(A) 6= 0.
Example 2.51. Consider the matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. We know that adding a multiple of one row to another does not change the determinant. Therefore,

\det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \det \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix} = \det \begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix} = -2.
Repeating these computations for arbitrary matrices, we obtain rather easy formulas to remember. (Verify yourself!) For a 2 × 2 matrix, one obtains

\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.

Using this formula again to calculate the determinant of the above example, we see

\det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = 1 \cdot 4 - 2 \cdot 3 = -2.
For a 3 × 3 matrix A = (a_{ij}), one obtains

\det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}.

This formula is usually called the Rule of Sarrus, and can be easily memorized by noting that one has to multiply only entries on certain diagonals, and add them according to the orientation of these diagonals. (Think for a second which entries are multiplied, and how they are summed up.) Again we want to give an example:

\det \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} = 45 + 84 + 96 - 105 - 72 - 48 = 0.
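The Rule of Sarrus translates line by line into code. A plain Python sketch (a hypothetical helper, checked against NumPy's determinant):

```python
import numpy as np

def det3_sarrus(A):
    """Rule of Sarrus for a 3x3 matrix: add the three 'downward' diagonal
    products and subtract the three 'upward' ones."""
    a = A
    return (a[0][0]*a[1][1]*a[2][2] + a[0][1]*a[1][2]*a[2][0] + a[0][2]*a[1][0]*a[2][1]
            - a[0][2]*a[1][1]*a[2][0] - a[0][0]*a[1][2]*a[2][1] - a[0][1]*a[1][0]*a[2][2])

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
assert det3_sarrus(A) == 0                       # 45 + 84 + 96 - 105 - 48 - 72
assert np.isclose(np.linalg.det(np.array(A)), 0.0)
```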
Let us present two more calculation rules, without proof. The first shows that transposing does not change the determinant: for every A ∈ R^{n×n},

\det(A^T) = \det(A).
Note that this shows that Lemma 2.47 also holds if we replace "row" by "column". That is, the calculation rules for the determinant also hold for column operations.
The next rule of calculation shows that the determinant of the product of two matrices is the product of the respective determinants. Recall that the determinant is only defined for square matrices. Therefore, both matrices need to have the same dimensions.

Lemma 2.56. For all A, B ∈ R^{n×n}, we have

\det(AB) = \det(A) \cdot \det(B).

The proofs of the last two lemmas are not hard, but quite long, and therefore we omit them here. Note that the last lemma is mostly of theoretical value as, in general, we do not know if a matrix is a product of easier matrices.
Example 2.57. Consider the matrix A = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix}. To compute its determinant, we are lucky to know that

\begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},

i.e., A = B^2 = B \cdot B with B := \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. Since \det(B) = -2, we obtain \det(A) = \det(B)^2 = 4.
Example 2.58. Another important case of Lemma 2.56 is that one of the determinants vanishes,
i.e., det(A) = 0 or det(B) = 0. In this case we already know that det(AB) = 0, without actually
computing the product AB. For example, since \det \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = 0, we already know that

\det\left( \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \right) = 0 \cdot \det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = 0.
Now that we know that the determinant "behaves well" with respect to transposition and multiplication, one might guess that a similar relation also holds for addition. However, and unfortunately, there is no similar formula for the determinant of the sum of matrices, as the following simple example shows. Consider the matrices A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, such that A + B = I_2. We obtain (e.g., by the formula as given in Example 2.53) that \det(A) = \det(B) = 0, but \det(A + B) = \det(I_2) = 1. This shows that, in general, we cannot extrapolate from the determinant of A and B to the determinant of their sum. (Clearly, there might be exceptions.)
Let us finally introduce the Laplace expansion for the determinant, which is also called co-
factor expansion or expansion along a row/column. This formula allows to compute the
determinant of large matrices recursively by computing the determinant of smaller matrices.
Let us first introduce a bit of new notation: for A ∈ R^{n×n}, let M_{ij} denote the determinant of the (n−1) × (n−1) matrix that is obtained from A by deleting the i-th row and the j-th column. These numbers are called the minors of A.
Theorem 2.61 (Laplace expansion). Let A = (a_{ij}) ∈ R^{n×n}. Then, we can compute the determinant of A by expansion along the i-th row, i.e., for any fixed i ∈ {1, . . . , n},

\det(A) = \sum_{j=1}^{n} (-1)^{i+j}\, a_{ij}\, M_{ij},

or by expansion along the j-th column, i.e., for any fixed j ∈ {1, . . . , n},

\det(A) = \sum_{i=1}^{n} (-1)^{i+j}\, a_{ij}\, M_{ij}.
As a proof of this result would require a more detailed analysis, we leave it out.
Although this result may look complicated at first sight, it is actually rather simple to apply,
and can lead to very fast computations if the matrix under consideration contains many zeros.
Example 2.62. Consider again the matrix A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. We want to compute the determinant of A using expansion along the first row. By Theorem 2.61 (with i = 1) we see that
\det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \sum_{j=1}^{2} (-1)^{1+j} a_{1j} M_{1j} = (-1)^{1+1} a_{11} M_{11} + (-1)^{1+2} a_{12} M_{12} = 1 \cdot M_{11} - 2 \cdot M_{12}.
Using M11 = det(4) = 4 and M12 = det(3) = 3, we obtain det(A) = 4 − 2 · 3 = −2, as required.
Example 2.63. Consider the matrix

A = \begin{pmatrix} 17 & * & * & 6 \\ 4 & 7 & 14 & 0 \\ 0 & 13 & 0 & 0 \\ 0 & 3 & 2 & 0 \end{pmatrix},

where ∗ stands for entries whose values will turn out to be irrelevant. To compute its determinant we look for a row or column with preferably only one non-zero entry. This makes the Laplace expansion particularly useful, because most of the terms in the sum vanish.
We choose the fourth column, i.e., we take the Laplace expansion with j = 4. (Clearly, there
are also other good choices.) We obtain
\det(A) = \sum_{i=1}^{4} (-1)^{i+4}\, a_{i4}\, M_{i4}.
To compute M_{14} we have to compute the determinant of the matrix that is obtained by deleting the first row and the last column of A. That is,

M_{14} = \det \begin{pmatrix} 4 & 7 & 14 \\ 0 & 13 & 0 \\ 0 & 3 & 2 \end{pmatrix} = 104.

The last computation could be done directly with the Rule of Sarrus, or by using again Laplace expansion along the second row to see that M_{14} = 13 \cdot \det \begin{pmatrix} 4 & 14 \\ 0 & 2 \end{pmatrix} = 13 \cdot 8 = 104. Since a_{14} = 6 is the only non-zero entry of the fourth column, we finally obtain \det(A) = (-1)^{1+4} \cdot 6 \cdot 104 = -624.
The examples above show that one may compute determinants very fast by using Laplace
expansion. Moreover, it is interesting that some of the entries (like the 17 in the upper left
corner) were not even needed in the computation.
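The recursive structure of the Laplace expansion is easy to turn into code. The following is a minimal Python sketch (the names `det` and `minor` are our own, not from the lecture); it expands along the first row and skips zero entries, exactly as in the examples above:

```python
def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant via Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    # sum over j of (-1)^{1+j} * a_{1j} * M_{1j}; zero entries contribute nothing
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)) if A[0][j] != 0)

print(det([[1, 2], [3, 4]]))                      # -2, as in Example 2.62
print(det([[4, 7, 14], [0, 13, 0], [0, 3, 2]]))   # 104, the minor M14 from above
```

Note that this recursion needs up to $n!$ multiplications, so it is only practical for small matrices or for matrices with many zeros.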
Now, all the terms appearing in the last system of equations can be written as determinants.
Recall that $\det(A) = a_{11}a_{22} - a_{12}a_{21}$. We obtain that the above system can be written as
$$\det(A)\,x_1 = \det\begin{pmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{pmatrix}, \qquad \det(A)\,x_2 = \det\begin{pmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{pmatrix}.$$
This shows that, whenever $\det(A)\neq 0$, we can just divide by it to obtain the precise values of $x_1$ and $x_2$, i.e.,
$$x_1 = \frac{1}{\det A}\det\begin{pmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{pmatrix} \qquad\text{and}\qquad x_2 = \frac{1}{\det A}\det\begin{pmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{pmatrix}.$$
Note that, to obtain the k-th entry of the (unique) solution x, we just need to replace the k-th
column of A by the RHS b and compute the corresponding determinant. After dividing by
det(A) we are done.
We will now see that the computations in the last example also work for more than two equations,
i.e., in the case n > 2. This is called Cramer’s rule.
$$x_k = \frac{\det(A_k)}{\det(A)},$$
where $A_k$ is given by
$$A_k := \begin{pmatrix}
a_{1,1} & \dots & a_{1,k-1} & b_1 & a_{1,k+1} & \dots & a_{1,n} \\
a_{2,1} & \dots & a_{2,k-1} & b_2 & a_{2,k+1} & \dots & a_{2,n} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n,1} & \dots & a_{n,k-1} & b_n & a_{n,k+1} & \dots & a_{n,n}
\end{pmatrix}.$$
Proof. From Theorem 2.50, we know that $\operatorname{rank} A = n \iff \det A \neq 0$ and so, that there exists a
unique solution to the linear system Ax = b, see Lemma 2.43. Recall that the vectors ek ∈ Rn
with 1 ≤ k ≤ n are the unit vectors, and that x = (x1 , x2 , . . . , xn )T is the column vector
representing the solution. We now define the matrices
$$X_k = \begin{pmatrix} e_1 & e_2 & \dots & e_{k-1} & x & e_{k+1} & \dots & e_n \end{pmatrix}.$$
By computing the row echelon form of Xk we see that det(Xk ) = xk for all k = 1, . . . , n. If we
denote the columns of A by ck , i.e., A = (c1 , . . . , cn ), and recall that Aek = ck , we obtain
$$A\cdot X_k = \begin{pmatrix} Ae_1 & Ae_2 & \dots & Ae_{k-1} & Ax & Ae_{k+1} & \dots & Ae_n \end{pmatrix} = \begin{pmatrix} c_1 & c_2 & \dots & c_{k-1} & Ax & c_{k+1} & \dots & c_n \end{pmatrix}.$$
Since Ax = b, we see that A · Xk = Ak with Ak given in the theorem. Using Lemma 2.56, we
see that
det(Ak ) = det(A · Xk ) = det(A) · det(Xk ) = xk · det A,
which proves the result.
We see that $\det A = 1$, hence Cramer's rule, see Theorem 2.64, implies that
$$x_1 = \det\begin{pmatrix} 7 & 3 \\ 16 & 7 \end{pmatrix} = 49 - 48 = 1 \qquad\text{and}\qquad x_2 = \det\begin{pmatrix} 1 & 7 \\ 2 & 16 \end{pmatrix} = 16 - 14 = 2.$$
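Cramer's rule is straightforward to implement. Below is a small Python sketch (the helper names are our own; the system matrix $A = \begin{pmatrix}1 & 3\\ 2 & 7\end{pmatrix}$ and RHS $b = (7, 16)^T$ are read off from the determinants displayed above):

```python
def det(A):
    # determinant via Laplace expansion along the first row
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b via Cramer's rule; assumes det(A) != 0."""
    d = det(A)
    # A_k is A with its k-th column replaced by the RHS b
    return [det([row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]) / d
            for k in range(len(A))]

print(cramer([[1, 3], [2, 7]], [7, 16]))   # [1.0, 2.0]
```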
for which we have $\det A = -280$. Then, due to Cramer's rule, we compute the following solution, denoted by $x = (x_1, x_2, x_3)^T$, for the RHS $b = (1,1,1)^T$:
$$x_1 = \frac{-1}{280}\det\begin{pmatrix} 1 & 1 & 3 \\ 1 & 0 & 11 \\ 1 & 5 & 0 \end{pmatrix} = \frac{29}{280}, \qquad x_2 = \frac{-1}{280}\det\begin{pmatrix} 8 & 1 & 3 \\ 7 & 1 & 11 \\ 5 & 1 & 0 \end{pmatrix} = \frac{27}{280}, \qquad x_3 = \frac{-1}{280}\det\begin{pmatrix} 8 & 1 & 1 \\ 7 & 0 & 1 \\ 5 & 5 & 1 \end{pmatrix} = \frac{7}{280}.$$
We now combine the findings of the last sections to give an explicit formula for the inverse
matrix, if it exists. This is particularly useful to compute solutions of a linear system Ax = b if
the RHS b is a priori not known. Moreover, the inverse is handy when we want to work with a
(unique) solution theoretically.
For completeness, let us repeat the definition.
Definition 2.67. Let $A\in\mathbb{R}^{n\times n}$ and assume that there exists some $A'\in\mathbb{R}^{n\times n}$ with the property that
$$A\cdot A' = A'\cdot A = I_n.$$
Then, we say that $A$ is invertible or regular, and we write $A^{-1} := A'$ to denote the inverse.
Note that a matrix must be a square matrix for both matrix products above to be defined. That's why we define the inverse only for $A\in\mathbb{R}^{n\times n}$.
1. We consider matrices as a kind of number and, for a matrix $A$, we look for another matrix that is the inverse element of $A$ with respect to matrix multiplication, see the field axioms (Axiom 1).
Ax = b ⇐⇒ x = In x = A−1 Ax = A−1 b.
$$\det A \neq 0 \iff \operatorname{rank} A = n$$
for a given matrix A ∈ Rn×n . We now combine that with Lemma 2.43 to show that A is bijective,
and hence invertible, in this case.
Let us state that as a lemma.
$$\det A \neq 0 \iff A \text{ is invertible,}$$
Proof. For the equivalence, note that rank(A) = n if and only if the linear system Ax = b has a
unique solution for all b ∈ Rn , see Lemma 2.43. In other words, the mapping A : Rn → Rn (i.e.,
A maps vectors to vectors) is injective (“For every b ∈ Rn there is at most one x with Ax = b.”)
and surjective (“For every b ∈ Rn there is some x with Ax = b.”). Hence, A is bijective, and
therefore invertible (aka. regular), see Theorem 1.16.
From Lemma 2.56 we know that $\det(AB) = \det(A)\cdot\det(B)$.
Example 2.69. Note that, if A is regular, then A−1 exists and is also regular. Hence, the
inverse of the inverse exists, and fulfills (A−1 )−1 = A. (Verify yourself!)
Note that the inverse of the product of matrices is the product of the inverses, but
we have to change the order (as for the transpose).
(A · B)−1 = B −1 · A−1 .
Proof. First note that $\det(AB) = \det(A)\det(B) \neq 0$ due to Lemma 2.56 and Lemma 2.68, which shows that $AB$ is regular. If we note that $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I_n$, we see that $B^{-1}A^{-1}$ is the inverse of $AB$.
For the computation of the inverse A−1 , denote the columns of A−1 by c1 , . . . , cn ∈ Rn , i.e.,
$$A^{-1} = \begin{pmatrix} c_1 & c_2 & \dots & c_n \end{pmatrix}.$$
We then have A−1 ek = ck , where ek is the k-th unit vector. (We used already earlier that
matrix-vector multiplication with a unit vector gives a column of the matrix.) Using the above
equivalence, with x = ck and b = ek , we see that
A−1 ek = ck ⇐⇒ Ack = ek .
That is, we can calculate ck , i.e., the k-th column of A−1 , by solving the linear system Ax = ek .
We can now use Cramer’s rule, together with the Laplace expansion, to compute the inverse
of $A$. Recall that the cofactor matrix of $A$ is defined by $C = (C_{ij})\in\mathbb{R}^{n\times n}$, where $C_{ij} = (-1)^{i+j}M_{ij}$ and $M_{ij}$ is the $(i,j)$-minor, i.e., the determinant of the matrix that is obtained by deleting the $i$-th row and the $j$-th column, see Definition 2.60.
The following theorem shows that one can compute the inverse of a matrix as the transpose of
its cofactor matrix divided by the determinant.
Theorem 2.71. Let $A\in\mathbb{R}^{n\times n}$ with $\det(A)\neq 0$, and let $C = (C_{ij})\in\mathbb{R}^{n\times n}$ be the cofactor
matrix of A. Then,
$$A^{-1} = \frac{1}{\det A}\,C^T, \qquad\text{i.e.,}\qquad (A^{-1})_{ij} = \frac{C_{ji}}{\det A},$$
where (A−1 )ij denotes the ij-th entry of A−1 .
Proof. Fix some j = 1, . . . , n. The discussion above shows that the j-th column of A−1 can be
computed by solving the linear system Ax = ej . Cramer’s rule, see Theorem 2.64, yields that
the i-th entry of the solution x = (x1 , . . . , xn ) to this linear system is given by
$$x_i = \frac{\det(A_i)}{\det(A)},$$
where
$$A_i = \begin{pmatrix}
a_{1,1} & \dots & a_{1,i-1} & 0 & a_{1,i+1} & \dots & a_{1,n} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{j-1,1} & \dots & a_{j-1,i-1} & 0 & a_{j-1,i+1} & \dots & a_{j-1,n} \\
a_{j,1} & \dots & a_{j,i-1} & 1 & a_{j,i+1} & \dots & a_{j,n} \\
a_{j+1,1} & \dots & a_{j+1,i-1} & 0 & a_{j+1,i+1} & \dots & a_{j+1,n} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n,1} & \dots & a_{n,i-1} & 0 & a_{n,i+1} & \dots & a_{n,n}
\end{pmatrix}.$$
We just replaced the i-th column of A by ej . Now we use Laplace expansion, see Theorem 2.61,
with expansion along the i-th column. (Note that in the statement of the Laplace expansion we
used the j-th column. Therefore, we need to be careful with the indices.) We see that the only
non-zero entry in the $i$-th column of $A_i$ is the 1 in the $j$-th row. We obtain (for fixed $i$) $\det(A_i) = (-1)^{i+j}\cdot 1\cdot M_{ji} = C_{ji}$,
i.e., the determinant of Ai is the (j, i)-cofactor of A. (Note the reversed indices.) This finally
shows that
$$(A^{-1})_{ij} = x_i = \frac{C_{ji}}{\det A}.$$
2. Compute the cofactor matrix $C = (C_{ij})_{i,j=1}^{n}$ with $C_{ij} = (-1)^{i+j}M_{ij}$.
For example, we obtain M21 by deleting the second row and the first column of A, and compute
the determinant M21 = det(2) = 2.
To compute the cofactor matrix C, we have to multiply each entry Mij by (−1)i+j , i.e., we
multiply with −1 if i + j is odd, and leave all other entries unchanged. We obtain
$$C = \begin{pmatrix} 4 & -3 \\ -2 & 1 \end{pmatrix}.$$
The inverse matrix can now be used to calculate the unique solution to $Ax = b$ for any RHS $b$. We obtain
$$x = A^{-1}b = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} b_2 - 2b_1 \\ \frac{3b_1 - b_2}{2} \end{pmatrix}.$$
For example, the solution to $Ax = \binom{3}{1}$, i.e., we have $b = \binom{3}{1}$, is given by $x = \binom{1 - 2\cdot 3}{\frac{3\cdot 3 - 1}{2}} = \binom{-5}{4}$.
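The cofactor formula of Theorem 2.71 can be written out directly in code. Here is a minimal Python sketch (the function names are our own) that reproduces the inverse computed above:

```python
def det(A):
    # determinant via Laplace expansion along the first row
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def inverse(A):
    """A^{-1} = C^T / det(A) with C the cofactor matrix (Theorem 2.71)."""
    n, d = len(A), det(A)
    # minors M_ij: delete row i and column j, then take the determinant
    M = [[det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
          for j in range(n)] for i in range(n)]
    C = [[(-1) ** (i + j) * M[i][j] for j in range(n)] for i in range(n)]
    return [[C[j][i] / d for j in range(n)] for i in range(n)]  # transpose and divide

print(inverse([[1, 2], [3, 4]]))   # [[-2.0, 1.0], [1.5, -0.5]]
```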
by using Cramer’s rule. First, we have that det A = 1. The matrix of minors is found to be
−7 4 4
M = −2 1 1 ,
−2 0 1
91
where, e.g., M11 = det( 11 81 ) = −7 and M32 = det( 14 28 ) = 0. We obtain the cofactor matrix
−7 −4 4
C = 2 1 −1 ,
−2 0 1
Finally, we want to discuss the Gauss-Jordan algorithm to compute the inverse. This method is very similar to the Gaussian elimination, and is sometimes handy, at least for small matrices. (I do not suggest using this method, as it is prone to error, but others think differently, and so I state it for completeness.)
For this recall that we can apply the Gaussian elimination to more vectors at once, see Re-
mark 2.45, which can be used to solve the linear system for different RHS’s simultaneously.
From the discussion above, we know that we actually need to solve the linear systems Ax = ej
for all j = 1, . . . , n to obtain all columns of A−1 . Hence, we can compute all columns of A−1 at
once by computing the reduced row echelon form of
$$(A\,|\,I_n) = \left(\begin{array}{ccc|ccc} a_{11} & \dots & a_{1n} & 1 & & 0 \\ \vdots & & \vdots & & \ddots & \\ a_{n1} & \dots & a_{nn} & 0 & & 1 \end{array}\right).$$
If A is regular, i.e., rank A = n, we know that the reduced row echelon form of A is the identity
matrix. Thus, by using Gaussian elimination, we are able to compute
$$(A\,|\,I) \longrightarrow (I\,|\,A^{-1}).$$
We apply the Gauss-Jordan algorithm, i.e., we transform the following augmented matrix into its reduced row echelon form:
$$\left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & 4 & 0 & 1 \end{array}\right)
\xrightarrow{II - 3I}
\left(\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & -2 & -3 & 1 \end{array}\right)
\xrightarrow{I + II}
\left(\begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & -2 & -3 & 1 \end{array}\right)
\xrightarrow{-\frac{1}{2}\,II}
\left(\begin{array}{cc|cc} 1 & 0 & -2 & 1 \\ 0 & 1 & 3/2 & -1/2 \end{array}\right).$$
The result clearly agrees with the one from Example 2.72.
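The row operations can also be automated. The following is a minimal Python sketch of the Gauss-Jordan algorithm (the function name is our own; exact fractions are used to avoid rounding):

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Row-reduce (A | I) to (I | A^{-1}); assumes A is regular."""
    n = len(A)
    # build the augmented matrix (A | I_n) with exact arithmetic
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # non-zero pivot
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]              # normalize the pivot row
        for r in range(n):                            # clear the column elsewhere
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                     # right half is A^{-1}

inv = gauss_jordan_inverse([[1, 2], [3, 4]])
print(inv)   # [[-2, 1], [3/2, -1/2]] as Fractions
```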
to see
$$A^{-1} = \begin{pmatrix} -7 & 2 & -2 \\ -4 & 1 & 0 \\ 4 & -1 & 1 \end{pmatrix},$$
which is the same result as in Example 2.73.
If the index set is clear, we may just write (an ) for (an )n∈I .
Although one considers a sequence as a mapping defined on the index set $I\subset\mathbb{Z}$, the notation expresses that we are dealing with a list of elements in a particular order. Thus, we clearly distinguish between the sequence $(a_n)$ and its range $\{a_n : n\in I\}$.
Note that two sequences (an )n∈I and (bn )n∈I are equal if and only if
∀n ∈ I : an = bn ,
in this case we write (an )n∈I = (bn )n∈I .
In the special cases M = R or M = C we say that (an )n∈I is a real-valued or complex-valued
sequence, respectively. We will focus on real-valued sequences in this lecture. However, most
statements also hold for complex-valued sequences. We comment on the differences when needed.
To define a sequence, the most common way is to use an explicit formula, for instance
$$a_n = 2^n \qquad\text{or}\qquad b_n = 1 + \frac{1}{n},$$
or by a recursion, i.e., we give one (or more) starting value(s) and a rule how to calculate a
new term from previous terms. For the above examples we could write
a1 = 2, an+1 = 2an
and
$$b_1 = 2, \qquad b_{n+1} = b_n\cdot\frac{n(n+2)}{(n+1)^2}.$$
(Verify these formulas!)
Example 3.2 (Fibonacci sequence). One of the most famous sequences, which appears in several
areas of natural science, is the so called Fibonacci sequence. Here, the recursion depends on
more than just the last value. The sequence $(F_n)_{n\in\mathbb{N}}$ is defined by
$$F_1 := F_2 := 1 \qquad\text{and}\qquad F_{n+1} := F_n + F_{n-1} \quad\text{for } n\ge 2.$$
The first values of this sequence are 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . .
It is an interesting phenomenon that the quotients $F_{n+1}/F_n$ converge to the golden ratio $\frac{1+\sqrt{5}}{2}$ (see e.g. Wikipedia for its importance). We will see later how to prove such statements.
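The convergence of the quotients can at least be observed numerically; a short Python sketch (the helper name is our own):

```python
def fib_ratios(n):
    """The first n quotients F_{k+1}/F_k of the Fibonacci sequence."""
    a, b = 1, 1              # F_1 = F_2 = 1
    out = []
    for _ in range(n):
        out.append(b / a)
        a, b = b, a + b      # recursion F_{k+1} = F_k + F_{k-1}
    return out

golden = (1 + 5 ** 0.5) / 2  # the golden ratio (1 + sqrt(5))/2
print(fib_ratios(10))        # oscillates around the golden ratio
print(abs(fib_ratios(30)[-1] - golden))   # already very small
```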
Example 3.3 (Infinite sums aka. series). If we want to add infinitely many numbers, say all the numbers $a_n$, $n\in\mathbb{N}$, then we can consider the new sequence $s_n = \sum_{k=1}^{n} a_k$, which can be given recursively by $s_1 = a_1$ and $s_{n+1} = s_n + a_{n+1}$.

Figure 19: convergence of the sequence $1 + \frac{(-1)^n}{n}$
It is mostly clear from the context if we consider real or complex neighborhoods. Note that,
for M = R and a ∈ R the ε-neighborhood Uε (a) is just the open interval (a − ε, a + ε). In the
complex case, i.e., M = C and a ∈ C, Uε (a) is the disc of radius ε around a in the complex
plane.
Remark 3.5. Note already now that this definition is quite flexible if we switch to more complex situations. That is, we can define neighborhoods whenever we have a measure for the 'distance' on the set M. We therefore use this notation to get used to it.
or, equivalently,
∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 : an ∈ Uε (a),
or, equivalently,
$$\forall\varepsilon>0\ \exists n_0\in\mathbb{N}:\ (a_n)_{n=n_0}^{\infty}\subset U_\varepsilon(a).$$
(an )n∈N is called convergent, or we say that the limit of (an )n∈N exists, if there exists
some a ∈ C such that an → a, otherwise (an )n∈N is called divergent.
The statement
∀ε > 0 ∃n0 ∈ N ∀n ≥ n0 : |an − a| < ε
can be equivalently phrased as:
For all $\varepsilon>0$ we have $|a_n - a| < \varepsilon$ for almost all $n$, i.e., for all but finitely many $n$.
For the second wording, note that there must be a largest of the finitely many exceptions (i.e.,
the n for which |an − a| ≥ ε). One may choose n0 just one larger than this number.
Remark 3.7. Note that the limit does not depend on the first terms of a sequence. So, in
particular, if (bn ) is a sequence with bn = an for almost all n, then an → a ⇐⇒ bn → a.
Example 3.9. Consider the sequence (an )n∈N with an = (−1)n . This sequence is divergent. For
a proof we assume the opposite, i.e., that (an ) converges to some a ∈ R. Now, by the definition
of convergence, we have that there exists some large enough n0 such that an ∈ U1/2 (a) for all
n ≥ n0 . (Note that the 1/2 is arbitrary here. Every ε < 1 would work.) However, we always
have $|a_{n+1} - a_n| > 1$, and therefore, if $a_n\in U_{1/2}(a)$, we have $a_{n+1}\notin U_{1/2}(a)$. In particular, there cannot be an $n_0$ such that $a_n\in U_{1/2}(a)$ for all $n\ge n_0$: a contradiction. Hence, $(a_n)$ cannot be
a convergent sequence.
We now turn to some special properties of sequences or, as one might say, we just give names
to sequences with special properties. Afterwards we analyse the relation of these properties.
Definition 3.10 (Null sequence). Let $(a_n)_{n\in\mathbb{N}}$ be a real sequence such that
$$\lim_{n\to\infty} a_n = 0.$$
Then we call $(a_n)_{n\in\mathbb{N}}$ a null sequence.
Example 3.11. The sequences ( n1 )n∈N and (2−n )n∈N are null sequences.
Definition 3.13. Let (an )n∈N ⊂ C be a sequence. We call the sequence bounded if
∃R > 0 ∀n ∈ N : |an | ≤ R.
Moreover, if $(a_n)_{n\in\mathbb{N}}\subset\mathbb{R}$, we call the sequence bounded from above if
$$\exists C\in\mathbb{R}\ \forall n\in\mathbb{N}:\ a_n\le C,$$
and bounded from below if
$$\exists c\in\mathbb{R}\ \forall n\in\mathbb{N}:\ a_n\ge c.$$
In other words a sequence is bounded (from above/below), if and only if its range is a bounded
set (from above/below).
Theorem 3.15. Let (an )n∈N be a convergent sequence. Then (an )n∈N is bounded.
Proof. Fix $\varepsilon_0 = 1$. Then we can find an $N$ such that $\forall n\ge N: |a_n - a| < 1$. The triangle inequality
yields
|an | ≤ |an − a| + |a| < 1 + |a|,
for all n ≥ N . For the remaining elements {a1 , . . . aN −1 } we simply take the maximum, c1 =
max{|a1 |, |a2 |, . . . , |aN −1 |}. The maximum of a finite amount of real numbers always exists.
Hence
∀n ∈ N : |an | ≤ max{c1 , |a| + 1}.
Example 3.16. The sequence given by bn = (−1)n is bounded by 1 but not convergent. So the
other direction of the above theorem does not hold in general.
Definition 3.17. Let $(a_n)_{n\in\mathbb{N}}\subset\mathbb{R}$. The sequence $(a_n)_{n\in\mathbb{N}}$ tends to $\infty$ ($=+\infty$) if
$$\forall R>0\ \exists n_0\in\mathbb{N}\ \forall n\ge n_0:\ a_n > R.$$
We now study how to determine the limit of (complicated) sequences. This always follows the same procedure: either we already know the limit of the sequence under consideration, or we have to split the sequence into easier parts that can be handled, or split again.
To do this effectively, one needs a sufficiently large collection of known limits, and we will present the most important ones below. Together with some rules of calculation, this will allow us to compute the limits of quite complicated sequences.
Let us start with a lemma that shows how to verify that a sequence is a null sequence by
comparison with another null sequence. The proof is left to the reader.
Lemma 3.19. If $(a_n)_{n\in\mathbb{N}}\subset\mathbb{C}$ is a null sequence and $(b_n)_{n\in\mathbb{N}}\subset\mathbb{C}$ is a sequence with $|b_n|\le C\,|a_n|$ for some $C>0$ and all $n\in\mathbb{N}$, then $(b_n)_{n\in\mathbb{N}}$ is also a null sequence.
From this lemma we directly see that the sequences $\left(\frac{C}{n^c}\right)_{n\in\mathbb{N}}$ for fixed $c, C > 0$ are null sequences.
Let us now consider other important “building blocks”, i.e., limits that may be considered known
from now on, together with the corresponding proofs.
Let $z\in\mathbb{C}$ with $|z| < 1$. Then,
$$\lim_{n\to\infty} z^n = 0.$$
Proof. For $z = 0$ the result is clear. For $z\neq 0$ we set $x > 0$ such that $|z| = \frac{1}{1+x}$. With the Bernoulli inequality or the binomial formula we get $(1+x)^n \ge 1 + nx$ and obtain
$$|z^n| = \frac{1}{(1+x)^n} \le \frac{1}{1+nx} \le \frac{1}{x}\cdot\frac{1}{n}.$$
Since ( n1 ) is a null sequence, we get from Lemma 3.19 that z n is a null sequence as well.
As the above sequence is not bounded for |z| > 1 (Check yourself!), we obtain from Theorem 3.15
that it cannot be convergent, i.e., (z n ) is divergent if |z| > 1.
In the case |z| = 1, we cannot say in general if the sequence is convergent or not: Although we
have the constant (and therefore clearly convergent sequence) for z = 1, the sequence (z n ) does
not converge for other z, like z = −1 or z = eiπ/2 . We leave out the details here.
The next example shows what happens if we replace the $n$-th power by an $n$-th root: for every fixed $a > 0$ we have
$$\lim_{n\to\infty}\sqrt[n]{a} = 1.$$
Proof. Let us first consider $a\ge 1$. We will show that $x_n := \sqrt[n]{a} - 1$ satisfies $x_n\to 0$, which proves the statement. Since $a\ge 1$ we have $x_n\ge 0$. By Bernoulli's inequality $(1+x)^n\ge 1+nx$, which holds for $x\ge -1$, we obtain
$$a = \left(\sqrt[n]{a}\right)^n = (1+x_n)^n \ge 1 + n x_n.$$
This implies
$$|x_n| = x_n \le \frac{a-1}{n}.$$
Since $(\frac{1}{n})$ is a null sequence we get that $(x_n)$ is also a null sequence.
For $a < 1$, let $b = 1/a > 1$ and consider $x_n = \sqrt[n]{b} - 1$. From the above we know that $(x_n)$ is a (non-negative) null sequence. Moreover, we have $\sqrt[n]{a}\le 1$ and therefore
$$\left|\sqrt[n]{a} - 1\right| = 1 - \sqrt[n]{a} = \sqrt[n]{a}\left(\sqrt[n]{b} - 1\right) \le 1\cdot x_n.$$
The next important limit shows that the constant a in the example above may even be replaced
by an unbounded sequence.
Example 3.22.
$$\lim_{n\to\infty}\sqrt[n]{n} = 1.$$
Proof. Let $x_n := \sqrt[n]{n} - 1$, so we have to show that $x_n\to 0$. From Lemma 1.56 with $k = 2$ we obtain
$$n = (1+x_n)^n \ge 1 + \binom{n}{2}x_n^2 = 1 + \frac{n(n-1)}{2}\,x_n^2.$$
This implies
$$|x_n| = x_n \le \sqrt{\frac{2}{n}}.$$
Since $(\frac{1}{\sqrt{n}})$ is a null sequence we get that $(x_n)$ is a null sequence, which proves $\sqrt[n]{n}\to 1$.
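Both root limits, together with the explicit bounds from the two proofs, can be checked numerically (a sketch; the constant a = 2024 is an arbitrary choice):

```python
import math

a = 2024.0
for n in (10, 100, 1000, 10 ** 6):
    xn = a ** (1 / n) - 1      # x_n = a^{1/n} - 1, bounded by (a - 1)/n
    yn = n ** (1 / n) - 1      # y_n = n^{1/n} - 1, bounded by sqrt(2/n)
    assert 0 <= xn <= (a - 1) / n
    assert 0 <= yn <= math.sqrt(2 / n)
print(a ** (1 / 10 ** 6), (10 ** 6) ** (1 / 10 ** 6))  # both very close to 1
```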
We state one last example before we turn to general rules for the calculation with limits. This
example could be phrased as “exponential growth is faster than polynomial growth”,
and is one of the basic arguments when dealing with limits.
Example 3.23. Let z ∈ C with |z| > 1 and k ∈ Z. Then,
$$\lim_{n\to\infty}\frac{n^k}{z^n} = 0.$$
Proof. The proof of this limit combines all the ideas from the preceding examples.
First, note that the limit is already clear from the above if $k\le 0$. (Why?)
For $k > 0$, set $x := |z| - 1$ with $x > 0$, and assume that $n > 2k$. (This is possible, since we are only interested in large $n$.) From Lemma 1.56 we obtain
$$|z|^n = (1+x)^n > \binom{n}{k+1}x^{k+1} = \frac{n\cdot(n-1)\cdots(n-k)}{(k+1)!}\,x^{k+1}.$$
From $n > 2k$, we obtain that $n - k > n/2$ (or, more generally, $n - k + \ell > n/2$ for all $\ell\in\{0,\dots,k\}$). Therefore,
$$|z|^n = (1+x)^n > \frac{n\cdot(n-1)\cdots(n-k)}{(k+1)!}\,x^{k+1} > \frac{(n/2)^{k+1}}{(k+1)!}\,x^{k+1}.$$
It follows that
$$\left|\frac{n^k}{z^n}\right| = \frac{n^k}{|z|^n} < \frac{2^{k+1}(k+1)!}{x^{k+1}}\cdot\frac{n^k}{n^{k+1}} =: K\cdot\frac{1}{n},$$
where $K$ is a constant (i.e., does not depend on $n$). As $(\frac{1}{n})$ is a null sequence, we get that $(\frac{n^k}{z^n})$ is also a null sequence.
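The statement that exponential growth beats polynomial growth is easy to watch numerically, e.g. for k = 5 and z = 1.1 (arbitrary choices):

```python
# n^k / z^n first grows, but eventually dies off geometrically
k, z = 5, 1.1
terms = [n ** k / z ** n for n in (10, 100, 500, 1000)]
print(terms)   # the last entries are extremely close to 0
```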
The following calculation rules for convergent sequences and their limits will be very useful if
we want to compute the limits of more complicated sequences.
Theorem 3.24. Let $(a_n)_{n\in\mathbb{N}}$, $(b_n)_{n\in\mathbb{N}}$ be convergent sequences, and let $\lambda\in\mathbb{C}$. Moreover, let $a := \lim_{n\to\infty} a_n$ and $b := \lim_{n\to\infty} b_n$. Then, we have
(i) $\lim_{n\to\infty}(a_n + b_n) = a + b$,
(ii) $\lim_{n\to\infty}(\lambda\cdot a_n) = \lambda\cdot a$,
(iii) $\lim_{n\to\infty}(a_n\cdot b_n) = a\cdot b$,
(iv) $\lim_{n\to\infty}\frac{a_n}{b_n} = \frac{a}{b}$, provided that $b\neq 0$ (and $b_n\neq 0$ for all $n$).
Proof. For the first statement we need to show that $\forall\varepsilon>0\ \exists n_0\in\mathbb{N}\ \forall n\ge n_0:\ |a_n + b_n - (a+b)|\le\varepsilon$. Therefore let $\varepsilon>0$ be arbitrary but fixed. Using the triangle inequality we get that
$$|a_n + b_n - (a+b)| \le |a_n - a| + |b_n - b|.$$
Since $(a_n)_{n=1}^{\infty}$ and $(b_n)_{n=1}^{\infty}$ are convergent sequences we can find $N\in\mathbb{N}$ such that for all $n\ge N$:
$$|a_n - a| \le \frac{\varepsilon}{2} \quad\text{and}\quad |b_n - b| \le \frac{\varepsilon}{2},$$
so that $|a_n + b_n - (a+b)| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$ for all $n\ge N$. Since this holds for arbitrary $\varepsilon$ we get the result.
For the second statement, we use that $|\lambda a_n - \lambda a| = |\lambda|\cdot|a_n - a|$. As $(a_n)_{n=1}^{\infty}$ is convergent we get the result.
For the third statement, note that
$$|a_n b_n - ab| \le |a_n b_n - a b_n| + |a b_n - ab| = |b_n|\cdot|a_n - a| + |a|\cdot|b_n - b|.$$
Since $(b_n)_{n=1}^{\infty}$ is convergent it is bounded, so $\exists C\in\mathbb{R}\ \forall n\in\mathbb{N}: |b_n|\le C$. Hence
$$|a_n b_n - ab| \le C\,|a_n - a| + |a|\cdot|b_n - b|.$$
Therefore, the desired statement follows from the convergence of $(a_n)$ and $(b_n)$.
For the last statement, we only need to prove that $\lim_{n\to\infty}\frac{1}{b_n} = \frac{1}{b}$ if $b\neq 0$. The more general statement then follows together with part (iii).
Let us assume w.l.o.g. (i.e., without loss of generality) that $b > 0$. (Otherwise, consider the sequence $(-b_n)$.) We obtain that $\exists N_0\in\mathbb{N}\ \forall n\ge N_0:\ b_n > \frac{b}{2}$. (Why?) Hence,
$$\left|\frac{1}{b_n} - \frac{1}{b}\right| = \frac{|b - b_n|}{b_n\cdot b} < \frac{2}{b^2}\,|b - b_n|,$$
and the claim follows from the convergence of $(b_n)$ to $b$.
Remark 3.25. For a complex-valued sequence (zn )n∈N , convergence to the complex number
z = x + iy ∈ C, i.e., zn → z, is equivalent to the convergence of the real and imaginary parts of
zn to x and y, respectively.
The first direction, i.e., that
$$\operatorname{Re} z_n \to x \quad\text{and}\quad \operatorname{Im} z_n \to y$$
implies $z_n\to x+iy$, follows from Theorem 3.24(i).
For the reverse statement, choose ε > 0 and n0 such that |zn − z| < ε for n ≥ n0 holds. Then,
for n ≥ n0 it holds that
| Re zn − Re z| = | Re(zn − z)| ≤ |zn − z| < ε.
As this holds for all ε > 0, this proves the convergence of the real part of zn to Re z = x. The
convergence of the imaginary part can be proven analogously. Consequently, we can say that
zn → z ⇐⇒ Re zn → Re z and Im zn → Im z.
With the help of calculation rules and the knowledge about limits we can compute more sophis-
ticated limits.
Example 3.26.
$$\lim_{n\to\infty}\frac{3 - 7\sqrt{n} + 42n}{n} = 3\cdot\lim_{n\to\infty}\frac{1}{n} - 7\lim_{n\to\infty}\frac{1}{\sqrt{n}} + \lim_{n\to\infty}42 = 0 - 0 + 42 = 42.$$
Example 3.27. For every fixed $k\in\mathbb{Z}$ we have $\lim_{n\to\infty}\sqrt[n]{n^k} = 1$.
Proof. For $k = 0$ this is obvious (and for $k = 1$ we have shown that already in Example 3.22). Moreover, we have for arbitrary $k\in\mathbb{Z}$ that
$$\lim_{n\to\infty}\sqrt[n]{n^{k+1}} = \lim_{n\to\infty}\sqrt[n]{n^k}\,\sqrt[n]{n} = \lim_{n\to\infty}\sqrt[n]{n^k}\cdot\lim_{n\to\infty}\sqrt[n]{n} = \lim_{n\to\infty}\sqrt[n]{n^k}\cdot 1 = \lim_{n\to\infty}\sqrt[n]{n^k},$$
as well as
$$\lim_{n\to\infty}\sqrt[n]{n^{k-1}} = \lim_{n\to\infty}\sqrt[n]{n^k}\cdot\lim_{n\to\infty}\frac{1}{\sqrt[n]{n}} = \lim_{n\to\infty}\sqrt[n]{n^k}\cdot\frac{1}{\lim_{n\to\infty}\sqrt[n]{n}} = \lim_{n\to\infty}\sqrt[n]{n^k},$$
where we used Theorem 3.24(iv) for the second equality. By induction on $k$ in both directions (with induction basis $k = 0$), we get that the limit is the same for all $k$, and therefore equals 1.
The next result gives another tool for the calculation of difficult limits. This one is helpful
when the sequence under consideration can be bounded from above and below by sequences
that converge to the same limit.
Theorem 3.28 (Sandwich rule). Let (an )n∈N and (cn )n∈N be convergent real-valued se-
quences and let (bn )n∈N be a sequence such that
an ≤ bn ≤ cn for all n ∈ N.
If, additionally,
$$a := \lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n,$$
then $(b_n)_{n\in\mathbb{N}}$ is convergent with $\lim_{n\to\infty} b_n = a$.
Proof. Let $\varepsilon > 0$ be arbitrary. Since $(a_n)_{n\in\mathbb{N}}$ and $(c_n)_{n\in\mathbb{N}}$ both converge to $a$, there is some $n_0$ such that $b_n - a\le c_n - a\le\varepsilon$ and $a - b_n\le a - a_n\le\varepsilon$ for all $n\ge n_0$. Thus, $|b_n - a|\le\varepsilon$ for $n\ge n_0$. As this holds for all $\varepsilon$, we obtain $\lim_{n\to\infty} b_n = a$.
Remark 3.29. Note that, as we consider limits, the assumption an ≤ bn ≤ cn in the sandwich
rule is not essential for the first terms and may be replaced by ∃N ∈ N ∀n ≥ N : an ≤ bn ≤ cn .
In other cases, we may not even know the precise values of the terms of a sequence, since the explicit or recursive formula for them is too complicated. Also in such cases we can possibly apply the sandwich rule to obtain the limit.
Example 3.31. Consider the sequence $b_n = \frac{(1+\sin(n))^n}{n\,2^n}$. Since $|\sin(x)|\le 1$ for all $x\in\mathbb{R}$, we have
$$0 \le \frac{(1+\sin(n))^n}{n\,2^n} \le \frac{2^n}{n\,2^n} = \frac{1}{n} \quad\text{for all } n\in\mathbb{N}.$$
Using the sandwich rule and that the sequences on both sides are null sequences, we obtain $b_n\to 0$.
(Note that we did not even need the precise values of $\sin(n)$.)
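The squeeze $0\le b_n\le\frac{1}{n}$ can also be confirmed numerically; in the sketch below we rewrite $b_n$ as $\left(\frac{1+\sin n}{2}\right)^n\frac{1}{n}$ to avoid huge intermediate powers:

```python
import math

for n in (1, 10, 100, 1000):
    bn = ((1 + math.sin(n)) / 2) ** n / n   # equals (1 + sin n)^n / (n 2^n)
    assert 0 <= bn <= 1 / n
print("0 <= b_n <= 1/n holds for the sampled n")
```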
We end this section with calculation rules for definitely divergent sequences:
• $a_n\to\infty,\ b_n\to\infty \implies a_n + b_n\to\infty$ and $a_n\cdot b_n\to\infty$
• $a_n\to\infty,\ b_n\to b \implies a_n + b_n\to\infty$
• $a_n\to\infty,\ \alpha\in\mathbb{R} \implies \frac{\alpha}{a_n}\to 0$
• $a_n\to\infty,\ \alpha>0 \implies \alpha\cdot a_n\to\infty$
• $a_n\to\infty,\ \alpha<0 \implies \alpha\cdot a_n\to-\infty$
(Verify yourself!)
If $a_n\to\infty$ and $b_n\to\infty$, no general rule can be given for $(a_n - b_n)$ and $\left(\frac{a_n}{b_n}\right)$. Similarly, the limit of $(a_n b_n)$ for $a_n\to 0$ and $b_n\to\infty$ needs more care: these limits do not have to exist, nor be definitely divergent; consider e.g. $a_n = (-1)^n/n$ and $b_n = n$.
We will come back to the computation of such limits later.
We saw that it is essential to verify that sequences are convergent before applying the rules above. Here, we show that a large class of sequences, namely all monotone and bounded sequences, is convergent. This is an essential insight.
Since we want to assume that the terms of a sequence are (monotonically) ordered, we need an
order, and therefore only work with real-valued sequences here.
A sequence $(a_n)_{n\in\mathbb{N}}\subset\mathbb{R}$ is called non-decreasing if
$$\forall n\in\mathbb{N}:\ a_{n+1}\ge a_n,$$
and non-increasing if
$$\forall n\in\mathbb{N}:\ a_{n+1}\le a_n.$$
Note that a sequence that is non-decreasing and non-increasing at the same time, must be a
constant sequence.
Many of the sequences, that we have discussed so far, were strictly monotone. This holds clearly
for the sequences (n)n∈N , ( n1 )n∈N or more general (nk )n∈N for k ∈ Z \ {0}, as well as the sequence
(an )n∈N for a ∈ (0, 1) or a > 1. However, for some sequences this is not so easy to see.
Example 3.33. Let us have a look at the sequence given by $a_n := \frac{1}{n^2-n+1}$. This is a decreasing sequence, since
$$a_{n+1} = \frac{1}{(n+1)^2-(n+1)+1} = \frac{1}{n^2+2n+1-n-1+1} = \frac{1}{n^2+n+1} < \frac{1}{n^2-n+1} = a_n.$$
In some cases, a helpful trick is to consider the quotients of consecutive terms of a sequence with positive terms and show that they are bounded (from above or below) by one. That is, e.g., such a sequence is non-decreasing if
$$\frac{a_{n+1}}{a_n}\ge 1 \quad\text{for all } n\in\mathbb{N},$$
and non-increasing if
$$\frac{a_{n+1}}{a_n}\le 1 \quad\text{for all } n\in\mathbb{N}.$$
Example 3.34. One interesting example, that we will study in more detail soon, is the sequence given by
$$a_n := \left(1+\frac{1}{n}\right)^n = \left(\frac{n+1}{n}\right)^n.$$
We consider quotients of successive terms and observe that
$$\frac{a_{n+1}}{a_n} = \frac{\left(\frac{n+2}{n+1}\right)^{n+1}}{\left(\frac{n+1}{n}\right)^n} = \left(\frac{(n+2)\,n}{(n+1)^2}\right)^{n+1}\frac{n+1}{n} = \left(1-\frac{1}{(n+1)^2}\right)^{n+1}\frac{n+1}{n}.$$
The Bernoulli inequality $(1+x)^{n+1}\ge 1+(n+1)x$, with $x = -\frac{1}{(n+1)^2}\ge -1$, yields
$$\frac{a_{n+1}}{a_n} \ge \left(1-\frac{1}{n+1}\right)\frac{n+1}{n} = 1.$$
Hence (an ) is a non-decreasing sequence. (If we note that Bernoulli’s inequality is a strict
inequality for $x > -1$ with $x\neq 0$, we even obtain that $(a_n)$ is increasing.)
The following result shows that monotonicity is a very helpful property as we only have to know
if a sequence is bounded to verify whether it is convergent or not. Note that boundedness of a
sequence is usually much easier to show.
Theorem 3.35 (Monotonicity principle). Let $(a_n)_{n\in\mathbb{N}}\subset\mathbb{R}$ be a monotone sequence. Then, $(a_n)_{n\in\mathbb{N}}$ is convergent if and only if it is bounded.
By the completeness axiom, supremum and infimum exist for every bounded subset of R.
Proof. We know from Theorem 3.15 that convergent sequences are bounded, which proves the
first direction of the statement.
For the second, let us consider the case where (an )n∈N is non-decreasing. The other case, where
$(a_n)_{n\in\mathbb{N}}$ is non-increasing, can be treated in the same way (replacing, in particular, sup by inf).
We now assume that (an )n∈N is bounded, and prove that it converges to a = sup{an : n ∈ N}.
Since (an )n∈N is bounded, we get that the range of (an )n∈N is a bounded set, which implies that
a = sup{an : n ∈ N} exists. Let now ε > 0 be arbitrary but fixed. Since a is the supremum
of the range of (an )n∈N we get (by definition) that a − ε is no upper bound for the sequence
$(a_n)_{n\in\mathbb{N}}$. Thus, there exists an $n_0$ with $a_{n_0} > a - \varepsilon$. Since $(a_n)_{n\in\mathbb{N}}$ is non-decreasing, the same
then holds for n ≥ n0 . That is, ∃n0 ∈ N ∀n ≥ n0 : a − ε < an . In addition, we clearly have
an ≤ a < a + ε. Hence, ∃n0 ∈ N ∀n ≥ n0 : |an − a| < ε. As this holds for all ε > 0, we obtain
that (an )n∈N converges to a.
The statement for unbounded sequences can be proven similarly, and is left for the reader.
With this theorem we see that there are convergent sequences where we do not have to know
the limit to verify that it exists. In some cases, we may even define numbers just as limits of
specific sequences, because we do not have another (explicit) description. One typical example
is Euler’s number:
Example 3.36 (Euler number). Consider sequences (an )n∈N , (bn )n∈N which are defined by
$$a_n := \left(1+\frac{1}{n}\right)^n = \left(\frac{n+1}{n}\right)^n \qquad\text{and}\qquad b_n := \left(1+\frac{1}{n}\right)^{n+1} = \left(\frac{n+1}{n}\right)^{n+1}.$$
Clearly an ≤ bn . Note that we have considered the sequence (an ) already in Example 3.34 and
showed that it is non-decreasing (which also implies that an ≥ a1 = 2 for all n ∈ N). It remains
to bound (an ) from above to show that it is convergent. For this, we show that (bn ) is bounded
from above. Together with an ≤ bn this implies also the boundedness of (an ). In fact, we show
that (bn ) is non-increasing, which implies that bn ≤ b1 = 4 for all n ∈ N. Again we compute
quotients of consecutive terms and obtain
$$\frac{b_n}{b_{n+1}} = \frac{\left(\frac{n+1}{n}\right)^{n+1}}{\left(\frac{n+2}{n+1}\right)^{n+2}} = \left(\frac{(n+1)^2}{n(n+2)}\right)^{n+2}\frac{n}{n+1} = \left(1+\frac{1}{n(n+2)}\right)^{n+2}\frac{n}{n+1} \ge \left(1+\frac{1}{n}\right)\frac{n}{n+1} = 1,$$
where we used Bernoulli's inequality $(1+x)^{n+2}\ge 1+(n+2)x$, with $x = \frac{1}{n(n+2)}\ge -1$.
From this we have $\frac{b_{n+1}}{b_n}\le 1$ and therefore that $(b_n)$ is non-increasing. (In fact, it is decreasing.)
All in all,
$$2 = a_1 \le a_2 \le \dots \le a_n \le \dots \le b_n \le \dots \le b_2 \le b_1 = 4.$$
This shows that the limit of (an )n∈N exists and equals sup(an ), and we define this limit to be
Euler’s number. Moreover, the limit of (bn ) also exists, equals inf(bn ), and we show that this is
also equal to $e$. For this consider
$$b_n - a_n = \left(1+\frac{1}{n}\right)^n\left(\left(1+\frac{1}{n}\right) - 1\right) = \frac{a_n}{n} \le \frac{4}{n}.$$
This implies $a_n\le b_n\le a_n + \frac{4}{n}$ and the sandwich rule yields $\lim a_n = \lim b_n$. (One may also use
that, with cn = (n + 1)/n, we have bn = an · cn . Then, cn → 1 yields the result.)
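The monotone squeeze of this example is easily observed numerically (a sketch):

```python
import math

def a(n): return (1 + 1 / n) ** n          # non-decreasing, a_n <= e
def b(n): return (1 + 1 / n) ** (n + 1)    # non-increasing, b_n >= e

for n in range(1, 1000):
    assert a(n) <= a(n + 1) <= math.e <= b(n + 1) <= b(n)
print(a(10 ** 6), math.e, b(10 ** 6))      # the two bounds close in on e
```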
This example shows, in particular, that it may happen that a sequence is converging but we do
not know the precise value of its limit. In some cases, however, we are able to find the limit (or
at least some possible candidates) by alternative considerations. For example, if the sequence
(an )n∈N is convergent, we have
$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} a_{n+1} = \lim_{n\to\infty} a_{n+2} = \dots,$$
which may just be seen as ignoring the first terms of a sequence. Such equations are particularly
which may just be seen as ignoring the first terms of a sequence. Such equations are particularly
useful for recursively defined sequences.
Example 3.37. Let x > 0. We consider the following recursively defined sequence:
$$a_1 > 0 \text{ arbitrary} \qquad\text{and}\qquad a_{n+1} := \frac{1}{2}\left(a_n + \frac{x}{a_n}\right) \quad\text{for all } n\in\mathbb{N}.$$
It is obvious that (an ) is a positive sequence, i.e., an > 0 for all n ∈ N. We now show that (an )
is decreasing for n ≥ 2, which implies, by the monotonicity principle, that (an ) is convergent.
(Note that, since we are only interested in limits, it is always ok to prove things only for large n.)
We obtain
$$a_{n+1} = \frac{1}{2}\left(a_n + \frac{x}{a_n}\right) \le a_n \iff a_n + \frac{x}{a_n} \le 2a_n \iff x \le a_n^2$$
for all $n\in\mathbb{N}$. This is equivalent to $a_n\ge\sqrt{x}$, because the $a_n$ are positive. We now show that for all $n\ge 2$ it holds that $a_n\ge\sqrt{x}$, or equivalently: $a_{n+1}\ge\sqrt{x}$ for all $n\in\mathbb{N}$:
$$a_{n+1}\ge\sqrt{x} \iff \frac{1}{2}\left(a_n+\frac{x}{a_n}\right)\ge\sqrt{x} \iff a_n^2 + x \ge 2\sqrt{x}\,a_n \iff a_n^2 - 2\sqrt{x}\,a_n + x \ge 0 \iff \left(a_n-\sqrt{x}\right)^2 \ge 0.$$
Since the last statement is clearly true, we finally obtain that (an )n∈N is decreasing for n ≥ 2,
and therefore convergent.
Moreover, we can determine the limit by exploiting the fact that the limits of (an )n∈N and
$(a_{n+1})_{n\in\mathbb{N}}$ are the same. Let $a := \lim(a_n) = \lim(a_{n+1})$. Then,
$$a = \lim_{n\to\infty} a_{n+1} = \lim_{n\to\infty}\frac{1}{2}\left(a_n + \frac{x}{a_n}\right) = \frac{1}{2}\left(\lim_{n\to\infty}a_n + \frac{x}{\lim_{n\to\infty}a_n}\right) = \frac{1}{2}\left(a + \frac{x}{a}\right).$$
With the same calculations as above we see that this equation can only be fulfilled if
$$a = \frac{1}{2}\left(a + \frac{x}{a}\right) \iff a^2 = x \iff a = \pm\sqrt{x}.$$
Since $(a_n)$ is non-negative, $-\sqrt{x}$ cannot be the limit of the sequence and therefore $\sqrt{x}$ is the only possibility, i.e.,
$$\lim_{n\to\infty} a_n = \sqrt{x}.$$
(Note that the limit would be $-\sqrt{x}$ if the 'starting value' $a_1$ were negative. Check yourself!)
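This recursion (known as Heron's method, a special case of Newton's method) converges very fast in practice; a minimal Python sketch:

```python
import math

def heron_sqrt(x, a1=1.0, steps=8):
    """Iterate a_{n+1} = (a_n + x/a_n)/2; converges to sqrt(x) for a1 > 0."""
    a = a1
    for _ in range(steps):
        a = 0.5 * (a + x / a)
    return a

print(heron_sqrt(2.0), math.sqrt(2))   # agree to machine precision for x = 2
```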
The concepts of the last sections deal with sequences that converge or, in other words, concen-
trate around a single point. In some cases, however, also divergent sequences have only some
points of interest for very large n. An obvious example is ((−1)n )n∈N .
In this section, we want to formalize the idea of sequences having more than one limit, i.e., points
where the sequence accumulates for large n. We will show the (to some extent surprising) fact
that every bounded sequence has such an accumulation point. We will also specify special
convergent subsequences, which leads to the so-called limit superior and limit inferior of a
sequence.
Example 3.39. Consider the sequence given by $b_n = (-1)^n(1-\frac{1}{n})$. This is not convergent, as it "jumps" between 'close to 1' and 'close to -1'. However, if we take the sequence of even natural numbers, i.e., $(n_1, n_2, n_3, \dots) = (2, 4, 6, \dots)$, then the terms of the subsequence $(b_{n_k})_{k\in\mathbb{N}} = (b_{2n})_{n\in\mathbb{N}} = (b_2, b_4, b_6, \dots)$ are of the form $b_{n_k} = 1-\frac{1}{2k}$. Hence, $(b_{n_k})_{k\in\mathbb{N}}$ is a convergent sequence, and hence, a convergent subsequence of $(b_n)_{n\in\mathbb{N}}$.
or
$$\forall\varepsilon>0\ \forall n_0\in\mathbb{N}\ \exists n\ge n_0:\ a_n\in U_\varepsilon(a),$$
or
$$\forall\varepsilon>0:\ \#\{n\in\mathbb{N}:\ a_n\in U_\varepsilon(a)\} = \infty.$$
Example 3.41. Considering the example from above, i.e., $b_n = (-1)^n(1-\frac{1}{n})$, we see that $b_{2n}\to 1$. Hence, 1 is an accumulation point of $(b_n)$. Moreover, $b_{2n+1} = -(1-\frac{1}{2n+1})\to -1$ also defines a convergent subsequence, and $-1$ is therefore also an accumulation point. It is not hard to see that there is no other possible limit of a subsequence.
Next we want to show that each bounded sequence has at least one convergent subsequence. In
particular, we show (in the proof) that every sequence contains either a non-increasing or a non-
decreasing subsequence (or both), which together with the boundedness implies its convergence,
see Theorem 3.35. Note that every subsequence of a bounded sequence is also bounded.
This result bears the names of Bolzano and Weierstrass and is an important technical tool for
proofs in many areas of analysis.
Proof. We present a proof only in the real case. The complex case works similarly.
We call m ∈ N a peak of (an)∞n=1 if for all n > m we have an < am. If there exist infinitely many
peaks of (an)∞n=1, denoted by n1, n2, n3, . . . , then the sequence (an1, an2, an3, . . . ) is decreasing
and bounded. Hence it is convergent, as we know from before.
If there are only finitely many peaks, then we choose n1 bigger than the largest peak, or, if there
are no peaks, we choose n1 = 1. In both cases n1 is not a peak, hence there exists n2 > n1
such that an2 ≥ an1. Furthermore, n2 is not a peak, which implies that there exists n3 > n2
such that an3 ≥ an2. Repeating this process, we end up with a non-decreasing subsequence
of (an)∞n=1, which is also bounded. Therefore this subsequence converges.
We can also give another proof of this statement, which has a somewhat more geometric flavour.
Alternative proof. Every sequence (an )n∈N has infinitely many (not necessarily different) terms.
If infinitely many are equal, we are done, because a list of these terms is a convergent subse-
quence.
If not, assume w.l.o.g. that 0 ≤ an ≤ 1 for all n. Every other bounded sequence can be treated
the same way. Now, split the interval [0, 1] into the halves [0, 1/2] and [1/2, 1]. As there are infinitely
many points in [0, 1], at least one of the halves must also contain infinitely many points. Now,
split up this one, and so on. With this procedure we come arbitrarily close to a point a, whose
neighborhoods –by construction– all contain infinitely many points. This point is therefore an
accumulation point. (We just note here that a is defined as the intersection of infinitely many
nested intervals. It follows from the sandwich rule that this intersection is not empty.)
Remark 3.43. The Bolzano-Weierstrass theorem is also true for sequences in much more general
(e.g. multidimensional) situations.
Example 3.44. Note that every bounded sequence has at least one convergent subsequence but
not every sequence with a convergent subsequence is bounded. E.g., consider the sequence (an )
with a2n = n and a2n−1 = 0. Clearly, (a1 , a3 , a5 , . . . ) is a null sequence, but there is no upper
bound for this sequence.
Inspired by the proof of the Bolzano-Weierstrass theorem, we will define two special accumulation
points of a sequence, i.e., the smallest and the largest accumulation point. They can be seen
as the limiting bounds on the sequence, i.e., every limit of a convergent subsequence must lie
between them.
Note that (inf_{k≥n} ak)n∈N is a non-decreasing sequence, since we take the infimum over a smaller
set if n increases. Hence, together with Theorem 3.35, we can alternatively define the limes
inferior by
    lim inf_{n→∞} an = sup_{n∈N} inf_{k≥n} ak.
Since the infimum and supremum of arbitrary bounded sets exist, we obtain that the limit
inferior and limit superior also exist for arbitrary bounded sequences. Moreover, if the sequence
is unbounded, then the corresponding 'inner' infimum or supremum (or both) are infinite.
So, if we allow limes inferior and limes superior to have also the values −∞ and ∞, then they
exist for arbitrary sequences. That is, to every real-valued sequence (an ), we may assign
the unique values lim inf(an ) ∈ R ∪ {−∞, ∞} and lim sup(an ) ∈ R ∪ {−∞, ∞}.
Example 3.49. Consider the sequence (an )n∈N with an = 2n + (−2)n . The terms of (an ) equal
either 2n+1 (for even n) or 0 (for odd n). Therefore, lim inf(an ) = 0 and lim sup(an ) = ∞.
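The 'inner' infimum and supremum can be approximated on a finite window of terms. A rough sketch (not part of the notes; the truncation at N terms is an arbitrary choice):

```python
# Approximate lim inf and lim sup of a_n = (-1)^n (1 + 1/n) via the tail
# infimum b_n = inf_{k>=n} a_k and tail supremum c_n = sup_{k>=n} a_k,
# computed over finitely many terms only.
N = 5000
a = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]

def tail_inf_sup(seq, n):
    tail = seq[n:]
    return min(tail), max(tail)

b_n, c_n = tail_inf_sup(a, 1000)
print(b_n, c_n)  # close to lim inf = -1 and lim sup = 1
```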
The limit inferior and limit superior are accumulation points of (an), if the sequence is
bounded. We show this for the limit inferior b := lim inf(an).
With bn := inf_{k≥n} ak we have b = lim(bn). (Note that (bn) is in general no subsequence of (an).)
Now let n1 ∈ N be such that b1 > an1 − 1/2. Such an n1 exists by the definition of the infimum.
Next, let n2 > n1 be such that bn1+1 > an2 − 1/2², and so on. That is, we obtain an increasing
sequence (nk)k∈N of natural numbers with bnk+1 > ank+1 − 1/2ᵏ. In addition, we have by definition
that bnk+1 ≤ ank+1. We obtain that |bnk+1 − ank+1| < 1/2ᵏ, which implies that (bnk+1 − ank+1)k∈N is
a null sequence. Hence, limk→∞ bnk+1 = limk→∞ ank+1. Since all subsequences of (bn) converge
to the same limit, we obtain limk→∞ ank+1 = lim(bn) = b, i.e., b is an accumulation point.
Moreover, lim inf(an ) and lim sup(an ) are the smallest and largest accumulation point of
(an ), respectively. To see this, let a ∈ R be any accumulation point of (an ), i.e., there exists a
subsequence (ank )k∈N with ank → a. Then we have the bounds
lim inf an ≤ a ≤ lim sup an .
n→∞ n→∞
Proof. If a > lim sup(an), then there is some ε > 0 and infinitely many terms of (an) that are
larger than lim sup(an) + ε. Hence, sup_{k≥n} ak ≥ lim sup(an) + ε for all n: a contradiction to the
definition of the limit superior. The case a < lim inf(an) is treated analogously.
Although all accumulation points of (an) are bounded from below and above by lim inf(an)
and lim sup(an), respectively, this clearly does not need to hold for all (large enough) terms of
the sequence. It may even happen that all elements of a sequence lie outside of the interval
[lim inf(an), lim sup(an)]. Consider, e.g., an = (−1)ⁿ(1 + 1/n) ∉ [−1, 1] with lim inf(an) = −1 and
lim sup(an) = 1.
Moreover, if we are (only) interested in the limiting behavior of the sequence (an ), then the
trivial bound
inf{an : n ∈ N} ≤ aN ≤ sup{an : n ∈ N},
which holds for all N ∈ N, is useless in general.
We can use the limit inferior and superior to bound the elements of a sequence for large N:
As lim inf and lim sup are the smallest and largest accumulation point of a sequence, respectively,
we obtain that all elements of (an) with large enough n are at least close to the interval
[lim inf(an), lim sup(an)]. That is, for all ε > 0 there is some n0 ∈ N such that
    lim inf(an) − ε ≤ aN ≤ lim sup(an) + ε
for all N ≥ n0. (Verify this!)
We finally show that the limit inferior and limit superior are indeed generalizations of the concept
of a limit. Clearly, it is more generally applicable, as lim inf and lim sup are well-defined for
arbitrary sequences. The next result shows that, for convergent sequences, all these values are
just the same. This follows from the considerations above, and the fact that every subsequence
of a convergent sequence converges to the same limit.
Lemma 3.50. A sequence (an)n∈N ⊂ R is convergent (or definitely divergent) if and only if
    lim inf_{n→∞} an = lim sup_{n→∞} an.
In this case,
    lim_{n→∞} an = lim inf_{n→∞} an = lim sup_{n→∞} an.
This means that a sequence is convergent if and only if the sequence is bounded and
has exactly one accumulation point.
Remark 3.51 (Complex sequences). Note that the lim inf and lim sup cannot be defined for
complex-valued sequences, because we do not have an order on C, and hence no supremum or
infimum. However, it is still true that a complex-valued sequence is convergent if and only if it
is bounded and has exactly one accumulation point. For a proof, we may consider lim inf and
lim sup of the real and imaginary parts separately.
Example 3.52. Consider the sequence (an )n∈N , which is a list of all rational numbers in [0, 1].
We already showed that the rational numbers are dense in R. Thus every x ∈ [0, 1] is an
accumulation point of (an )n∈N . In other words, the set of accumulation points is uncountable
and therefore in some sense “larger” than the set of the sequence elements.
In this section we introduce the Cauchy criterion for proving convergence of a sequence. This
is, similarly to the monotonicity principle (Theorem 3.35), an important tool to verify that a
sequence is convergent, without knowing its limit. The Cauchy criterion will be the dominant
technique for proofs of convergence when it comes to higher mathematics, including sequences
of more general objects.
The central object is a Cauchy sequence.
That is, the terms of a Cauchy sequence are pairwise close to each other for large n.
Compare this definition with the definition of convergence in order to gain better understanding.
Example 3.54. The sequence (an)n∈N with an = 1/n satisfies |an − am| = 1/m − 1/n < ε for
n ≥ m > 1/ε =: n0. Hence, (an) is a Cauchy sequence.
Remark 3.55. For a complex-valued sequence (zn ) with zn = xn +iyn and (xn )n∈N , (yn )n∈N ⊂ R
it holds:
(zn ) is a Cauchy sequence ⇐⇒ (xn ) and (yn ) are Cauchy sequences.
This follows from
    max{|xn − xm|, |yn − ym|} ≤ √((xn − xm)² + (yn − ym)²) = |zn − zm|.
We now prove the important property that every convergent sequence is a Cauchy sequence,
and vice versa.
|an − am | ≤ |an − a| + |a − am | ≤ ε.
Since this holds for all ε > 0, we get that (an ) is a Cauchy sequence.
We now show the other direction "(an)n∈N is a Cauchy sequence =⇒ (an)n∈N is convergent":
First, we choose ε = 1 in the definition of a Cauchy sequence, and obtain some n0 such that
|an − am| < 1 for all m, n ≥ n0. Moreover, the triangle inequality implies |an| ≤ |an0| + 1 for all
n ≥ n0, and hence, with C := max{|a1|, . . . , |an0−1|, |an0| + 1},
    ∀n ∈ N : |an| ≤ C,
which makes (an )n∈N a bounded sequence. The Bolzano-Weierstrass theorem implies that
(an )n∈N has at least one accumulation point a ∈ C. Thus there exists a subsequence (ank )k∈N
(of (an )n∈N ) such that
lim ank = a.
k→∞
Let ε > 0. By the convergence of (ank ) we obtain that |a − ank | < 2ε for large enough k, i.e., for
large enough nk . Moreover, since (an ) is a Cauchy sequence, |an − ank | < 2ε for n and nk large
enough. (Formally, this shows that, for all ε, there exist n0 , k0 such that for all n ≥ n0 , k ≥ k0
we have |an − a| < ε. But since the statement does not depend on k, we just omit this part.)
That is, for arbitrary ε > 0 and n large enough, we have
    |an − a| < ε,
i.e., an → a.
Remark 3.57. One might show, for every convergent sequence discussed so far, that it is a
Cauchy sequence. The proof would follow directly the first part of the proof above, and one
would not learn much from these computations. (You might still try it on your own!) However,
verifying that a sequence is a Cauchy sequence is often much easier, as we will see later on.
Example 3.58. Note that it is not enough that neighboring elements of a sequence become
arbitrarily close. For example, the sequence (an)n∈N with an = √n, which is clearly not convergent,
satisfies
    |an+K − an| = √(n+K) − √n = (√(n+K) − √n)(√(n+K) + √n) / (√(n+K) + √n)
                = K / (√(n+K) + √n) < K / (2√n) → 0
for every fixed K ∈ N. Hence, 'terms at fixed distance' become arbitrarily close together.
However, we have, e.g., |a4n − an| = √n → ∞. So, there is no n0 such that |am − an| < 1 for all
m, n ≥ n0.
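The two effects can be seen numerically (a minimal sketch, not part of the notes):

```python
# a_n = sqrt(n): terms at fixed distance K get arbitrarily close,
# but |a_{4n} - a_n| = sqrt(n) grows without bound, so (a_n) is not Cauchy.
import math

def a(n):
    return math.sqrt(n)

print(a(10**6 + 1) - a(10**6))  # tiny (roughly 1/2000)
print(a(4 * 10**6) - a(10**6))  # sqrt(10**6) = 1000
```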
3.6 Series
If the sequence of partial sums (sn) converges to some s ∈ C, i.e., sn → s, then we say that
the series converges or that the sequence (an)n∈N is summable, call s the sum of the
series, and write
    ∑_{k=1}^∞ ak := lim_{n→∞} ∑_{k=1}^n ak = lim_{n→∞} sn = s.
If sn → ±∞ we also write ∑_{k=1}^∞ ak = ±∞, and say that the series is definitely divergent.
Otherwise we call the series ∑_{k=1}^∞ ak divergent and the sequence (an)n∈N not summable.
Note that series is just another word for an infinite sum of elements of a sequence. Moreover,
the notation ∑_{k=1}^∞ ak should be understood as a formal symbol for the limit: It might be a
number or ±∞, but it might also not exist (as a number).
The definition above states that a series converges if and only if the sequence (sn)∞n=1 of partial
sums converges. This implies that we can use the results from the previous section to analyze
series. Moreover, we will see that there are even more tools for working with series. But first
let us see some examples that will be essential for the upcoming considerations.
The first example is one of the most well-known and used infinite sums. Although it is usually
considered only for real q ∈ (−1, 1), we see that it also holds for complex bases.
Example 3.60 (Geometric series). Let q ∈ C with |q| < 1. Then we have that
    ∑_{k=0}^∞ qᵏ = 1/(1−q)  and  ∑_{k=1}^∞ qᵏ = q/(1−q).
More generally, the partial sums satisfy
    ∑_{k=k0}^n qᵏ = (q^{k0} − q^{n+1})/(1 − q),
which holds for all n ≥ k0 and q ≠ 1. (Note that the last formula does not contain a limit.)
Proof. We only prove the result for k0 = 0, and leave the rest for the reader.
Let sn := ∑_{k=0}^n qᵏ and consider the equation
    (1 − q)sn = ∑_{k=0}^n qᵏ − q·∑_{k=0}^n qᵏ = ∑_{k=0}^n qᵏ − ∑_{k=0}^n q^{k+1} = ∑_{k=0}^n qᵏ − ∑_{k=1}^{n+1} qᵏ = q⁰ − q^{n+1}
              = 1 − q^{n+1}.
In the next-to-last equality we used the fact that the terms qᵏ for k ∈ {1, 2, . . . , n}
appear in both sums (and the second sum is subtracted from the first). Thus, the only terms
that remain are q⁰ and q^{n+1} (and the second gets a minus in front). Such arguments, i.e., that
many (or all) terms of a series cancel each other out, are called telescoping tricks and sums
of this form are called telescoping sums. We will come back to this kind of series later.
From the above equation we obtain
    sn = (1 − q^{n+1})/(1 − q)
for all q ≠ 1. Using this representation of the partial sums we can compute its limit easily. Note
that 1/(1−q) is a constant factor and (qⁿ)n∈N with |q| < 1 is a null sequence, see Example 3.20.
Hence
    ∑_{k=0}^∞ qᵏ = lim_{n→∞} sn = 1/(1−q) · lim_{n→∞} (1 − q^{n+1}) = 1/(1−q) · (1 − lim_{n→∞} q^{n+1}) = 1/(1−q).
Example 3.61. If we set q = 1/2 we get, e.g., that
    ∑_{n=0}^∞ 1/2ⁿ = 2  and  ∑_{n=1}^∞ 1/2ⁿ = 1.
Remark 3.62. Note that the telescoping trick from above also works for |q| > 1 (but not
if q = 1). It then follows from the explicit formula sn = (q^{n+1} − 1)/(q − 1), where we just multiplied
numerator and denominator by −1, that (sn) is unbounded, and therefore divergent. If q > 1
(and, in particular, a real number) we see that (sn) tends to infinity, while the limits simply do
not exist for q ≤ −1. In general, the case |q| = 1 needs more care. We will see in Example 3.72
that ∑_{k=0}^∞ qᵏ is divergent for every |q| ≥ 1, and definitely divergent only for q ≥ 1.
Now we will see that not all convergent sequences lead to convergent series. Moreover,
the following is the most important prototype of a divergent series, as it is "almost summable".
We will see later what this means.
Example 3.64 (Harmonic series). Consider the sequence given by an = 1/n. Then, the
corresponding series satisfies
    ∑_{n=1}^∞ 1/n = ∞,
i.e., it is definitely divergent. This series is called harmonic series.
However, we will see later that ∑ n^{−α} is convergent if α > 1.
Proof. We show that the sequence of partial sums is bounded from below by a divergent sequence.
For this, we successively group the terms in the partial sums, and bound each group from below
by the number of its elements times its smallest member:
    s_{2ⁿ} = 1 + 1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + · · · + (1/(2ⁿ⁻¹+1) + 1/(2ⁿ⁻¹+2) + · · · + 1/2ⁿ)
          ≥ 1 + 1/2 + 2 · 1/4 + 4 · 1/8 + · · · + 2ⁿ⁻¹ · 1/2ⁿ
          = 1 + 1/2 + 1/2 + · · · + 1/2
          = 1 + n/2.
This means that, with N such that 2ⁿ ≤ N < 2ⁿ⁺¹ ⇐⇒ n ≤ log₂(N) < n + 1, we obtain
sN ≥ s_{2ⁿ} ≥ 1 + n/2 ≥ (1 + log₂(N))/2. We obtain sN → ∞.
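The lower bound from the proof can be observed numerically (a sketch, not part of the notes):

```python
# Partial sums of the harmonic series: s_{2^n} >= 1 + n/2, so they are unbounded.
def harmonic(N):
    return sum(1 / k for k in range(1, N + 1))

for n in [1, 5, 10, 20]:
    N = 2 ** n
    print(N, harmonic(N), 1 + n / 2)  # partial sum versus the lower bound
```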
Example 3.65. We want to discuss the telescoping trick once more. This is sometimes a
very powerful tool to obtain the precise value of apparently complicated series. Let us therefore
prove
    ∑_{k=1}^∞ 1/(k(k+1)) = 1.
It is not clear yet that the series on the left hand side converges, and its precise value is also far
from obvious. But one might notice that the terms can be expanded to
    1/(k(k+1)) = (k+1)/(k(k+1)) − k/(k(k+1)) = 1/k − 1/(k+1).
Example 3.66. Consider an = (6n+9)/(n²(n+3)²) = 1/n² − 1/(n+3)² =: bn − bn+3. Hence,
    ∑_{k=1}^n ak = ∑_{k=1}^n (bk − bk+3) = ∑_{k=1}^n bk − ∑_{k=4}^{n+3} bk = b1 + b2 + b3 − bn+1 − bn+2 − bn+3.
Since bn → 0, we obtain ∑_{k=1}^∞ ak = 1 + 1/4 + 1/9 = 49/36.
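A numerical check of this telescoping value (a sketch, not part of the notes):

```python
# Partial sums of a_n = (6n+9)/(n^2 (n+3)^2) approach b_1 + b_2 + b_3 = 49/36.
def a(n):
    return (6 * n + 9) / (n ** 2 * (n + 3) ** 2)

s = sum(a(n) for n in range(1, 10**5))
print(s, 49 / 36)  # nearly equal; the remainder is of order 3/n^2
```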
However, note that it is rare that we can compute the sum precisely. Already for the
example ∑_{k=1}^∞ 1/k², which is very similar to the one above, we need some higher mathematics to
find its precise value π²/6 ≈ 1.645. For many other sums, there is just no closed expression.
Still, we might be interested if the sum exists. For this, we can deduce several techniques from
our knowledge about sequences. Let us start with some calculation rules.
Theorem 3.67. Let ∑_{k=1}^∞ ak and ∑_{k=1}^∞ bk be convergent series, and let c ∈ C. Then we
have
    ∑_{k=1}^∞ (ak + bk) = ∑_{k=1}^∞ ak + ∑_{k=1}^∞ bk
and
    ∑_{k=1}^∞ c·ak = c·∑_{k=1}^∞ ak.
Since both series are convergent, we obtain the results from Theorem 3.24.
We now consider two results on the convergence of series that follow directly from the results
of the last section. In fact, they are just reformulations of the monotonicity principle (Theo-
rem 3.35) and the Cauchy criterion (Theorem 3.56).
The first is concerned with series with non-negative terms and bounded partial sums.
Theorem 3.68. Let (an)∞n=1 ⊂ R be a non-negative sequence, i.e., ak ≥ 0 for all k ∈ N.
Then, the sequence of partial sums sn := ∑_{k=1}^n ak is bounded, i.e.,
    ∃C ∈ R ∀n ∈ N : sn ≤ C,
if and only if the series ∑_{k=1}^∞ ak converges.
Proof. Since ak ≥ 0, we obtain that (sn ) is a non-decreasing, and therefore monotone, sequence.
The result follows from the monotonicity principle (Theorem 3.35).
This theorem is already enough to show that the aforementioned alternative representation of
Euler’s number (defined in Section 2) as an infinite sum converges.
Example 3.69 (Euler's number). Consider the series given by the sequence an = 1/n!, starting
at 0 in this case. The partial sums are given by
    sn = ∑_{k=0}^n 1/k!.
Moreover, by the binomial theorem,
    (1 + 1/n)ⁿ = ∑_{k=0}^n (n choose k) 1/nᵏ = ∑_{k=0}^n 1/k! · n!/((n−k)! nᵏ) ≤ ∑_{k=0}^n 1/k! = sn,
where the last step follows from n!/(n−k)! = (n − k + 1) · (n − k + 2) · · · (n − 1) · n ≤ nᵏ.
Hence,
    e := lim_{n→∞} (1 + 1/n)ⁿ ≤ ∑_{k=0}^∞ 1/k!.
On the other hand, for fixed m ∈ N and n ≥ m, we obtain, by similar arguments as above, that
    lim_{n→∞} (1 + 1/n)ⁿ = lim_{n→∞} ∑_{k=0}^n (n choose k) 1/nᵏ ≥ lim_{n→∞} ∑_{k=0}^m (n choose k) 1/nᵏ
                        = ∑_{k=0}^m 1/k! · lim_{n→∞} n!/((n−k)! nᵏ) = ∑_{k=0}^m 1/k!,
The next result is just the Cauchy criterion applied to partial sums.
Theorem 3.70 (Cauchy criterion). Let ∑_{k=1}^∞ ak be a series. Then we have that ∑_{k=1}^∞ ak
is convergent if and only if
    ∀ε > 0 ∃n0 ∈ N ∀m > n ≥ n0 : |∑_{k=n+1}^m ak| < ε.
In other words, the series ∑_{k=1}^∞ ak is convergent if and only if the sequence of partial sums
is a Cauchy sequence.
Proof. Since
    sm − sn = ∑_{k=1}^m ak − ∑_{k=1}^n ak = ∑_{k=n+1}^m ak,
we see that the condition in the theorem is equivalent to (sn) being a Cauchy sequence. This is
equivalent to (sn) being convergent, see Theorem 3.56.
This theorem immediately leads to the following simple criterion. In many cases, this is already
enough to show that a series is divergent.
Proof. We use the Cauchy criterion for series and set m = n + 1. Then we get
Example 3.72. From this, we finally obtain that ∑_{k=0}^∞ qᵏ can only be convergent for |q| < 1.
Remark 3.73. We have already seen that there are null sequences which do not give a convergent
series, e.g. (1/n)∞n=1. So the above corollary gives a necessary but not a sufficient condition
for a series to be convergent.
Remark 3.74. (*) Indeed, the representation of e via a series can be generalized to obtain the
known exponential function. This leads to eˣ = ∑_{k=0}^∞ xᵏ/k!. (We will prove this much later!)
We now discuss several criteria to prove that a series is convergent. These convergence tests
mostly do not lead to the precise sum of a series. However, they are quite generally applicable.
The first tests that will be discussed are based on another form of convergence of series, which
will turn out to be a stronger criterion.
Note that ∑ |ak| < ∞ is really the same as ∑ |ak| being convergent. This follows from
Theorem 3.68 and the fact that |ak| ≥ 0.
Moreover, note that for non-negative sequences (an )n∈N , i.e., ak ≥ 0 for all k, absolute summa-
bility and summability are just the same.
For example, the series of absolute values of the alternating harmonic series is the harmonic
series, which is divergent. However, we will see later that the alternating harmonic series is
convergent.
The next result shows that absolute convergence is indeed a stronger criterion.
Proof. We use the Cauchy criterion and the triangle inequality to prove this result. Let ∑_{k=1}^∞ ak
be an absolutely convergent series and ε > 0. By the Cauchy criterion there exists some n0 ∈ N
such that for all m > n ≥ n0 we have
    ∑_{k=n}^m |ak| < ε.
The triangle inequality yields
    |∑_{k=n}^m ak| ≤ ∑_{k=n}^m |ak| < ε.
Thus the Cauchy criterion implies that ∑_{k=1}^∞ ak is convergent.
We will now discuss several criteria, called convergence tests, that can be used to verify whether a
series is convergent or not. However, note that these tests are sometimes inconclusive, i.e., we do
not get a definite answer by applying them, and one needs to apply other techniques.
The first test is based on the comparison with another series that is known to be convergent or
divergent. This (quite obvious) test is used in nearly every application, and clearly relies on the
numerous examples we are discussing. In particular, some of the other convergence tests are
based on a simple application of this one.
Theorem 3.79. Let ∑_{k=1}^∞ ak and ∑_{k=1}^∞ bk be series.
(i) If |ak| ≤ bk for all k and ∑_{k=1}^∞ bk is convergent, then ∑_{k=1}^∞ ak is absolutely convergent.
(ii) If ak ≥ bk ≥ 0 for all k and ∑_{k=1}^∞ bk is divergent, then ∑_{k=1}^∞ ak is divergent.
Proof. Since ∑ bk is an upper bound for the partial sums of ∑ |ak|, we have that ∑ |ak| is
convergent by Theorem 3.68, and therefore finite. This implies that ∑ ak is absolutely convergent.
The second point follows from a similar argument. We use that the partial sums of ∑ bk are
divergent (which is by Theorem 3.68 the same as ∑ bk = ∞, since bk ≥ 0), and that the partial
sums of ∑ ak are just larger. This shows that ∑ ak is also unbounded, and hence divergent.
Let us consider the series ∑_{k=1}^∞ k^{−c} for c > 0, with polynomially decaying terms.
Example 3.80. We have
    ∑_{k=1}^∞ 1/k² = ∑_{k=1}^∞ 1/(k(k+1)) · (k+1)/k ≤ 2 ∑_{k=1}^∞ 1/(k(k+1)) = 2 < ∞,
and hence that ∑_{k=1}^∞ 1/k² is (absolutely) convergent. Similarly, ∑_{k=1}^∞ 1/√k ≥ ∑_{k=1}^∞ 1/k = ∞.
(In fact, we will see soon that the series is actually convergent for all c > 1.)
Example 3.82. However, ∑_{k=1}^∞ (k³+4k²−3)/(k⁴−k+1) = ∞, since
    ∑_{k=1}^∞ (k³+4k²−3)/(k⁴−k+1) ≥ ∑_{k=1}^∞ k³/k⁴ = ∑_{k=1}^∞ 1/k = ∞.
We now turn to convergence tests that can be applied to the terms of a series, and we do not
need precise knowledge about the partial sums. As the proof shows, these tests just follow from
a comparison of the series under consideration with a geometric series, see Example 3.60.
Theorem 3.83 (root test). Let ∑_{k=1}^∞ ak be a series.
(i) If there exists q ∈ [0, 1) such that
    ᵏ√|ak| ≤ q for almost all k,
then ∑ ak is absolutely convergent.
(ii) Conversely, if
    ᵏ√|ak| ≥ 1 for infinitely many k,
then ∑ ak is divergent.
Proof. By assumption, there is some k0 such that |ak| ≤ qᵏ for k ≥ k0. Hence, we get that
    ∑_{k=1}^∞ |ak| = ∑_{k=1}^{k0−1} |ak| + ∑_{k=k0}^∞ |ak| ≤ ∑_{k=1}^{k0−1} |ak| + ∑_{k=k0}^∞ qᵏ ≤ ∑_{k=1}^{k0−1} |ak| + ∑_{k=0}^∞ qᵏ,
where the first inequality comes from Theorem 3.79. Now, ∑_{k=1}^{k0−1} |ak| is a finite sum and ∑_{k=0}^∞ qᵏ
is a geometric series. As both are finite for q ∈ [0, 1), we get that ∑_{k=1}^∞ ak converges absolutely.
For part (ii), just note that (an) fails to converge to 0 under the given assumption.
The condition of the root test can be written equivalently with the help of limits:
For this, consider the limit superior
    a := lim sup_{k→∞} ᵏ√|ak|.
Then, the series ∑ ak is
(i) absolutely convergent, if a < 1,
(ii) divergent, if a > 1,
(iii) and we do not gain any information from the root test, if a = 1.
Theorem 3.86 (ratio test). Let ∑_{k=1}^∞ ak be a series.
(i) If there exists q ∈ [0, 1) such that
    ak ≠ 0 and |ak+1/ak| ≤ q for almost all k,
then ∑ ak is absolutely convergent.
(ii) Conversely, if
    ak ≠ 0 and |ak+1/ak| ≥ 1 for almost all k,
then ∑ ak is divergent.
Proof. As in the proof of the root test we can split the series in ∑_{k=1}^∞ ak = ∑_{k=1}^{k0−1} ak + ∑_{k=k0}^∞ ak,
where ∑_{k=1}^{k0−1} ak is a finite sum and does not change the convergence of the series. We assume
w.l.o.g. that k0 = 1. By induction we have that |ak| ≤ q^{k−1}|a1| for all k ∈ N. An index shift
implies
    ∑_{k=1}^∞ |ak| ≤ ∑_{k=1}^∞ q^{k−1}|a1| = ∑_{k=0}^∞ qᵏ|a1| = |a1| ∑_{k=0}^∞ qᵏ.
Since q ∈ [0, 1) we get that ∑_{k=1}^∞ ak is absolutely convergent from Theorem 3.79.
For part (ii) we follow similar steps and obtain |ak| ≥ |ak0| > 0 for all k ≥ k0 and k0 large enough.
Hence, (ak) is not a null sequence, and consequently ∑ ak is not convergent.
The condition of the ratio test can be written equivalently with the help of limits:
For this, assume that the limit
    a := lim_{k→∞} |ak+1/ak|
exists. Then, the series ∑ ak is
(i) absolutely convergent, if a < 1,
(ii) divergent, if a > 1,
(iii) and we do not gain any information from the ratio test, if a = 1.
Remark 3.87. One may prove that the ratio test is a bit more special than the root test in the
following sense:
Assume that A := lim_{k→∞} |ak+1/ak| exists; then also B := lim_{k→∞} ᵏ√|ak| exists and A = B. That
is, whenever we successfully applied the ratio test, we may have also applied the root test to
come to the same conclusion. (We will not prove this here.) However, the ratio test is sometimes
much easier to apply.
Example 3.88. We show that the series ∑_{k=1}^∞ k^{k/2}/k! is absolutely convergent.
For this, note that
    |ak+1/ak| = (k+1)^{(k+1)/2}/(k+1)! · k!/k^{k/2} = (1 + 1/k)^{k/2}/√(k+1) ≤ 2/√(k+1) → 0,
since (1 + 1/k)^{k/2} ≤ √e < 2. The ratio test implies the absolute convergence.
Example 3.89. The root and ratio test have their limitations, e.g., for series whose terms are
only polynomially decaying, i.e., ∑ k^{−c} for some c > 0. Since lim_{k→∞} ᵏ√(k^{−c}) = 1 independent of c,
we cannot distinguish between different c with the root test, although the series is convergent
for some c, and divergent for others, see Example 3.80.
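This inconclusiveness is easy to observe numerically (a sketch, not part of the notes):

```python
# Root-test values (|a_k|)^(1/k) for a_k = k^(-c): they tend to 1 for every c > 0,
# although sum k^(-c) converges for c > 1 and diverges for c <= 1.
def root_test_value(c, k):
    return (k ** -c) ** (1 / k)

for c in [0.5, 1.0, 2.0]:
    print(c, root_test_value(c, 10**6))  # all close to 1
```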
The convergence test we want to discuss now is only applicable if the terms of the series are a
non-negative and monotone null sequence. However, this test is very powerful in this case.
Theorem 3.90 (Cauchy's condensation test). Let ∑_{k=1}^∞ ak be a series with 0 ≤ ak+1 ≤ ak
for all k. Then,
    ∑_{k=1}^∞ ak is convergent ⇐⇒ ∑_{k=1}^∞ 2ᵏ a_{2ᵏ} is convergent.
Proof. We will bound the series ∑ ak from above and below. This will imply the result by
Theorem 3.79. For this, we group the terms of the non-increasing sequence (ak) into "blocks"
with indices {2ᵏ, 2ᵏ+1, . . . , 2ᵏ⁺¹−1}, and just bound all elements in such a block by the smallest
and the largest one, respectively.
To be precise, note that every natural number can be written uniquely as 2ᵏ + ℓ for some k ∈ N
and some ℓ ∈ {0, 1, . . . , 2ᵏ − 1} (Check that!), which shows
    ∑_{k=1}^∞ ak = ∑_{k=1}^∞ ∑_{ℓ=0}^{2ᵏ−1} a_{2ᵏ+ℓ}.
Moreover, by simply bounding by the maximum or minimum and the number of elements, we obtain
    2ᵏ a_{2ᵏ⁺¹} ≤ ∑_{ℓ=0}^{2ᵏ−1} a_{2ᵏ+ℓ} ≤ 2ᵏ a_{2ᵏ}
We are finally in the position to study the series ∑_{k=1}^∞ k^{−c} for c > 0.
Proof. By the condensation test we see that this series is convergent if and only if
    ∑_{k=1}^∞ 2ᵏ (2ᵏ)^{−c} = ∑_{k=1}^∞ (2^{1−c})ᵏ < ∞.
Now note that this is just a geometric series (with q = 2^{1−c}) and we have 2^{1−c} < 1 if and only
if c > 1, which proves the result.
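The condensed series can be inspected directly (a sketch, not part of the notes):

```python
# For a_n = n^(-c) the condensed series sum 2^k a_{2^k} is the geometric series
# with q = 2^(1-c): bounded partial sums for c > 1, divergence for c <= 1.
def condensed_partial(c, K):
    return sum(2 ** k * (2 ** k) ** (-c) for k in range(1, K + 1))

print(condensed_partial(2.0, 50))  # q = 1/2: partial sums approach 1
print(condensed_partial(1.0, 50))  # q = 1: partial sums grow like K
```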
The last convergence test we want to discuss is again only applicable for special series. Again,
the terms of the series are based on a monotone null sequence. However, we consider their
alternating sum and show that this is always a convergent series.
Theorem 3.92 (Leibniz criterion). Let (ak)k∈N be monotone with ak → 0. Then,
    ∑_{k=1}^∞ (−1)ᵏ ak is convergent.
Proof. Assume w.l.o.g. that (ak ) is non-increasing, and therefore non-negative. (Why?)
Since the sequence (ak ) is non-increasing, we have that ak − ak+1 ≥ 0 for all k. Thus
s2n+2 = s2n + (−1)2n+1 a2n+1 + (−1)2n+2 a2n+2 = s2n − (a2n+1 − a2n+2 ) ≤ s2n .
This means the sequence (s2n )n∈N is non-increasing. The same argument implies that the
sequence (s2n−1 )n∈N is non-decreasing. Furthermore s2n − s2n−1 = a2n ≥ 0 implies that s2n−1 ≤
s2n . This yields that for all n ∈ N
s1 ≤ s3 ≤ · · · ≤ s2n−1 ≤ s2n ≤ · · · ≤ s4 ≤ s2 .
So we have two monotone and bounded sequences, (s2n) and (s2n−1), which are therefore
convergent. We still need to show that their limits are the same. (Otherwise we would have only
proven that the series has two accumulation points.) But we clearly have s2n − s2n−1 = a2n → 0,
so both limits coincide.
This theorem shows that alternating series are somewhat easier to handle than their non-alternating
versions. The following example will demonstrate this.
Example 3.93. The alternating harmonic series ∑ (−1)ᵏ/k converges by the Leibniz criterion,
since ak = 1/k is a decreasing null sequence. Later, we will even prove that
    ∑_{k=1}^∞ (−1)^{k+1}/k = ln(2) ≈ 0.693,
where ln(x) is the natural logarithm. Recall that the 'normal' harmonic series does not converge.
Example 3.94. Since ak = 1/√(k+4) is a decreasing null sequence, we have that ∑ (−1)ᵏ ak is
convergent. However, note that bk = (−1)ᵏ/√k is also a null sequence, but not monotone, and
    ∑ (−1)ᵏ bk = ∑ 1/√k = ∞.
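A numerical look at the alternating harmonic series (a sketch, not part of the notes; the value ln(2) is only claimed later in the text):

```python
# Partial sums of sum (-1)^(k+1)/k approach ln(2) = 0.693...
import math

def alt_harmonic(N):
    return sum((-1) ** (k + 1) / k for k in range(1, N + 1))

print(alt_harmonic(10**5), math.log(2))
```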
We finally consider series that contain a free parameter or, in other words, describe a function
whenever they are convergent.
Definition 3.95 (Power series). Let (ak)∞k=0 ⊂ C be a sequence and let c ∈ C.
Then, for z ∈ C, we define the (formal) power series as
    f(z) := ∑_{k=0}^∞ ak (z − c)ᵏ.
We call the ak the coefficients and c the center of the power series.
We call
    R := R(f) := 1 / lim sup_{k→∞} ᵏ√|ak|
the radius of convergence of the power series f.
(We set R = ∞ if lim sup_{k→∞} ᵏ√|ak| = 0, and R = 0 if lim sup_{k→∞} ᵏ√|ak| = ∞.)
We call Df = {y ∈ C : |y − c| < R(f)} the disc of convergence of f.
However, as the name radius of convergence already indicates, the number R(f ) plays a crucial
role for deciding if a series converges.
Theorem 3.96 (Radius of convergence). Let (ak)∞k=0 ⊂ C be a sequence, c ∈ C, and let f
be the corresponding formal power series with radius of convergence R := R(f).
Then, the power series f(z) = ∑_{k=0}^∞ ak (z − c)ᵏ is
(i) absolutely convergent for every z ∈ C with |z − c| < R,
(ii) divergent for every z ∈ C with |z − c| > R.
Example 3.97. Let us consider the power series f(z) = ∑_{k=1}^∞ zᵏ/k, i.e., ∑ ak (z − c)ᵏ with
ak = 1/k and c = 0. We have
    ᵏ√|ak| = 1/ᵏ√k → 1.
So, the radius of convergence is R(f) = 1, and we obtain that ∑ zᵏ/k is absolutely convergent if
|z| < 1.
In some cases, however, it is easier to use the ratio test to verify convergence of a power series.
Luckily, we have that the radius of convergence can also be given by the corresponding limit, if
it exists.
Lemma 3.98. The radius of convergence of a power series f(z) := ∑_{k=0}^∞ ak (z − c)ᵏ satisfies
    R(f) = lim_{k→∞} |ak/ak+1|,
if this limit exists.
We obtain by the above ratio test that the radius of convergence is ∞, and hence, that this
power series converges for all z ∈ C.
We will see later that this series describes the exponential function, i.e., e^z = ∑_{k=0}^∞ zᵏ/k!.
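The ratio formula can be sketched numerically; the helper below is hypothetical (not from the notes) and only evaluates a single ratio |a_K/a_{K+1}| as a stand-in for the limit:

```python
# Estimating the radius of convergence via a single ratio |a_K / a_{K+1}|.
import math

def ratio_radius_estimate(a, K):
    return abs(a(K) / a(K + 1))

exp_coeff = lambda k: 1 / math.factorial(k)   # coefficients of the exponential series
print(ratio_radius_estimate(exp_coeff, 50))   # = K + 1 = 51: grows without bound, so R = infinity

geom_coeff = lambda k: 1.0                    # coefficients of sum z^k, radius 1
print(ratio_radius_estimate(geom_coeff, 50))  # = 1
```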
Each convergent power series defines a function on its disc of convergence, and it would be
interesting to find explicit expressions for these functions, at least for some power series. We
already know the most important example:
Figure 20: The graph of the functions 1/(1−x), 1/(1−x²), 8/(1+x²), 6/(1+4x²).
Many series can be brought into the form of a geometric series, and we can therefore also find
explicit expressions for them. However, note that this works only where the series converges.
Example 3.102. Consider the power series f(x) := ∑_{k=0}^∞ x²ᵏ for x ∈ R.
Using the last example, this can be written as f(x) = ∑_{k=0}^∞ (x²)ᵏ = 1/(1−x²) for all x ∈ (−1, 1).
Analogously, we find ∑_{k=0}^∞ 8(−1)ᵏ x²ᵏ = 8 ∑_{k=0}^∞ (−x²)ᵏ = 8/(1+x²) for x ∈ (−1, 1), or
    ∑_{k=0}^∞ 6(−4)ᵏ x²ᵏ = 6 ∑_{k=0}^∞ (−4x²)ᵏ = 6/(1+4x²),
which only holds for x ∈ (−1/2, 1/2). Note that it is also not easy to "see" from the graph of the
explicit expression where it can be written as a power series, see Figure 20.
Sometimes it is also helpful to write an explicit function as a series, i.e., to find a series expansion.
Example 3.103. Assume we want to write f(x) = 1/x² as a power series. Then, we can denote
y := 1 − x² such that 1 − y = x². Since we can write ∑_{k=0}^∞ yᵏ = 1/(1−y) for all |y| < 1, we see that
    f(x) = 1/x² = 1/(1−y) = ∑_{k=0}^∞ yᵏ = ∑_{k=0}^∞ (1 − x²)ᵏ
for all x ∈ R with |1 − x²| < 1, i.e., x ∈ (−√2, √2) \ {0}.
However, rewriting this as a power series ∑_{k=0}^∞ ak xᵏ for some (ak) ⊂ R would be a rather hard
computation. We will see later how to do that in a systematic way using derivatives.
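A numerical check of this expansion at a sample point (a sketch, not part of the notes; x = 0.8 is an arbitrary choice in the region of convergence):

```python
# Partial sums of sum (1 - x^2)^k approximate 1/x^2 for |1 - x^2| < 1.
def expansion(x, K):
    y = 1 - x * x
    return sum(y ** k for k in range(K + 1))

x = 0.8
print(expansion(x, 100), 1 / x**2)  # nearly equal
```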
Still, we can use the above techniques to find out where more complicated series converge.
Note that, in general, there is no way to give an explicit form for series like the ones
above.
Note that a function is continuous iff an expression of the form lim f (xn ) does not depend on
the specific sequence (xn ), but only on its limit x0 := lim(xn ) ∈ D.
Roughly speaking, we can interchange the limit with the function, if it is continuous.
Here, it is important that x0 ∈ D since otherwise f (x0 ) may not be defined. Later we will also
consider limits of functions in the other case.
Example 4.3. The prototype of a discontinuous function is the Heaviside function, which is
defined by
H(x) = 0 if x < 0, and H(x) = 1 if x ≥ 0.
If we now consider the sequences xn = 1/n and −xn = −1/n, we get that
lim_{n→∞} H(xn) = 1 ≠ 0 = lim_{n→∞} H(−xn),
and hence that H is not continuous at 0. However, H is continuous at every other point. (This
is because H is then constant in a neighborhood around this point, and constant functions are
continuous.) Furthermore, the Heaviside function is not an ’exotic’ example of a discontinuous
function; in fact, this function plays an important role in physics.
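The discontinuity at 0 can be illustrated with a few lines of Python (a sketch, not part of the notes):

```python
def H(x):
    """Heaviside function: 0 for x < 0, 1 for x >= 0."""
    return 0.0 if x < 0 else 1.0

# Both sequences 1/n and -1/n converge to 0, but the function values differ:
right = [H(1 / n) for n in range(1, 6)]    # along x_n = 1/n
left = [H(-1 / n) for n in range(1, 6)]    # along -x_n = -1/n
assert all(v == 1.0 for v in right)
assert all(v == 0.0 for v in left)
# Hence lim H(x_n) = 1 != 0 = lim H(-x_n): H is not continuous at 0.
```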
Example 4.4. The example of the Heaviside function can be extended to ’jump functions’.
To this end, let I = [a, b] be a closed interval, where a < b. If there exists some t ∈ I such that
a ≠ t and b ≠ t, then functions of the form
f(x) = c1 if x < t, and f(x) = c2 if x ≥ t,
with c1 ≠ c2, are continuous at every point except t.
Often one can find continuity stated in the following equivalent form, which is called the ε-δ-criterion.
In words: Given x0 ∈ D. For all (fixed) ε > 0 there exists δ > 0 such that for all x ∈ D with
|x − x0 | < δ we have that |f (x) − f (x0 )| < ε.
The condition in the above theorem was the first precise definition of a continuous function and
is one of the most essential (and frightening) mathematical statements. It may be stated as “a
small change in x0 only allows a small change in f (x0 )”. The precise definition is ultimately due
to Karl Weierstraß (1815–1897), who is often cited as the “father of modern analysis”.
The following figure gives a visualization of the criterion.
Now, for n ∈ N, let δn = 1/n. Thus we can find xn ∈ D such that |xn − x0| < 1/n and |f(xn) − f(x0)| ≥
ε0. So we have found a sequence xn → x0 with f(xn) ↛ f(x0), which contradicts the continuity
of f. Hence, the ε-δ-criterion must be satisfied.
For the other direction assume the ε-δ-criterion holds and let ε > 0 and (xn ) be an arbitrary
sequence such that xn → x0 . By our assumption we can find δ > 0 such that for all x ∈ D we
have |x − x0 | < δ =⇒ |f (x) − f (x0 )| < ε. Since the sequence (xn ) converges to x0 we have that
there exists n0 ∈ N such that
Example 4.6. The identity, i.e. f(x) := x, is continuous on R. We want to prove this
statement in two ways, first by using the definition and then by using the ε-δ-criterion.
Proof by using the definition. Let x0 ∈ R and (xn) ⊂ R be such that xn → x0. Clearly,
lim_{n→∞} f(xn) = lim_{n→∞} xn = x0 = f(x0).
This yields that f is continuous at x0. As this holds for all x0 ∈ R, we have that f is continuous.
Proof by the ε-δ-criterion. Let ε > 0 and x0 ∈ R, and choose δ := ε. Then, for all x with
|x − x0| < δ, we have |f(x) − f(x0)| = |x − x0| < ε.
Thus, by the ε-δ-criterion, f is continuous at x0. As this holds for all x0 ∈ R, we have that f is
continuous.
Example 4.7. Let us consider the quadratic function f(x) = x² on R. For every
convergent sequence (xn) ⊂ R with xn → x, we have lim f(xn) = lim xn² = (lim xn)² = x² =
f(x), and hence f is continuous on R. We also give a proof of this fact using the ε-δ-criterion
to show the difference to the above example.
Proof by ε-δ-criterion. Let ε > 0 and x0 ∈ R. We need to find δ > 0 such that for all x with
|x − x0 | < δ we have that |f (x) − f (x0 )| = |x2 − x20 | < ε. Note that δ may depend on ε and on
x0 but not on x. (This may be justified by the order of the quantifiers in the definition.)
We use the triangle inequality to obtain
|x² − x0²| = |(x − x0)(x + x0)| = |x − x0| · |x + x0| ≤ |x − x0| · (|x − x0| + 2|x0|).
If |x − x0| < δ ≤ min{1, ε/(1 + 2|x0|)}, then |x − x0| + 2|x0| < 1 + 2|x0|, and hence
|x² − x0²| < δ · (1 + 2|x0|) ≤ ε. Therefore, we can set δ := min{1, ε/(1 + 2|x0|)} and
obtain
|x − x0| < δ ⟹ |x² − x0²| < ε.
Note that δ > 0 for all ε > 0, which implies that f is continuous at x0 . As this also holds for all
x0 ∈ R, we obtain that f is continuous.
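One can also test the choice δ = min{1, ε/(1 + 2|x0|)} numerically by sampling points with |x − x0| < δ (an illustrative Python sketch, not part of the notes):

```python
# Verify |x^2 - x0^2| < eps for sampled x with |x - x0| < delta.
def delta_for(eps, x0):
    """The delta chosen in the proof above."""
    return min(1.0, eps / (1 + 2 * abs(x0)))

for x0 in (-3.0, 0.0, 5.0):
    for eps in (1.0, 0.1, 0.01):
        d = delta_for(eps, x0)
        for i in range(-99, 100):
            x = x0 + d * i / 100   # then |x - x0| <= 0.99 * d < d
            assert abs(x * x - x0 * x0) < eps
```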
Remark 4.8 (*). Note that in the above example, δ depends on ε and x0, and it is not hard
to see that this dependence is necessary here. It is also no problem that δ → 0 as ε → 0 and/or
x0 → ±∞, since we only need δ > 0 for each fixed ε and x0.
Example 4.9. The root function f(x) = √x on [0, ∞) is continuous.
Proof. Let ε > 0 and x, y ∈ [0, ∞). The binomial theorem implies
|√x − √y|² = x + y − 2√(xy) ≤ x + y − 2 min{x, y} = |x − y|.
This shows that |√x − √y| ≤ |x − y|^{1/2}. If we now choose δ = ε², we see that |√x − √y| < ε for
all x, y with |x − y| < δ. This shows that f is continuous on [0, ∞).
Example 4.10. Let us also show that the exponential function f(x) = a^x for fixed a ∈ (0, ∞)
is a continuous function on R.
Proof. Let (xn) be convergent with xn → x0. Then, for every k ∈ N there is an nk ∈ N such that
|xn − x0| ≤ 1/k
for all n ≥ nk. Using the fact that a^{1/k} = ᵏ√a → 1, as k → ∞, we obtain that a^{xn − x0} → 1, as
n → ∞. This yields
a^{xn} = a^{x0} · a^{xn − x0} → a^{x0} = f(x0),
i.e., f is continuous at x0.
Example 4.11. The trigonometric functions sin and cos are continuous on R. This can
be seen from their ’graphical definition’, or also from the inequality | sin(x) − sin(y)| ≤ |x − y|.
We do not give a formal proof here, as there will be a very simple one later.
Next we want to establish some calculation rules for continuous functions. These rules allow us to
prove continuity of complicated functions by proving that their (hopefully easier) ’building blocks’
are continuous.
Proof. The theorem follows immediately from Theorem 3.24 about the calculation rules for
limits. We only show that f + g is continuous at x0 if f and g are continuous at x0. The
remaining cases can be shown in the same way. Assume that (xn) is a sequence in D converging
to x0. By the calculation rules for limits and the continuity of f and g, we obtain
lim (f + g)(xn) = lim f(xn) + lim g(xn) = f(x0) + g(x0) = (f + g)(x0).
Proof. We have already seen that constant functions and the identity are continuous on R.
Applying the above theorem several times, we obtain that also x^k, and therefore c_k x^k, are
continuous on R. Adding up these terms and applying the theorem again, we get the result.
By Theorem 4.65, we see that p is uniformly continuous on every closed interval D ⊂ R. If D
is a (half-)open interval, say D = (a, b), then note that p is uniformly continuous on [a, b], and
therefore uniformly continuous on every subset, e.g., on D.
Proof. We already know that p and q are continuous on D from the last example. Furthermore,
from the above theorem, p/q is continuous at x0 whenever q(x0) ≠ 0. Since q has no zeros in D,
we get the result.
Example 4.16. Another consequence of the above theorem is the continuity of tan x at all
x ∈ R with cos x ≠ 0, i.e. x ≠ π/2 + kπ for all k ∈ Z. This follows from the representation
tan x = sin x / cos x and the continuity of sin and cos. Analogously, cot x is continuous at all x with x ≠ kπ.
Proof. Consider an arbitrary sequence (xn) in D converging to x0. Setting yn = f(xn) and using
the continuity of f and g, we obtain
lim_{n→∞} g(yn) = g(lim_{n→∞} yn) = g(lim_{n→∞} f(xn)) = g(f(lim_{n→∞} xn)) = g(f(x0)) = (g ◦ f)(x0).
Example 4.19. Let f, g : D → R be continuous functions (on D). Then min{f, g} and
max{f, g} are also continuous. This follows from the identities
min{f, g} = (f + g − |f − g|) / 2
and
max{f, g} = (f + g + |f − g|) / 2.
(This may be shown by case distinction.) The right hand sides are continuous by the above
theorem.
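The two identities can be verified by a brute-force check over sample values (Python sketch, not from the notes):

```python
import itertools

# Check min/max identities on a grid of sample pairs.
for a, b in itertools.product([-2.0, -0.5, 0.0, 1.5, 3.0], repeat=2):
    assert (a + b - abs(a - b)) / 2 == min(a, b)
    assert (a + b + abs(a - b)) / 2 == max(a, b)
```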
We now show that also the inverse of a continuous function on intervals is continuous.
Recall that the inverse only exists for bijective functions.
Note that the interval might be open, closed, bounded or unbounded, but it is important that
it is a “connected” set.
Proof. Let y ∈ D be arbitrary and let (yn ) be a sequence converging to y. We want to show
that f −1 (yn ) converges to f −1 (y). For this, we define xn := f −1 (yn ) and x := f −1 (y), thus
yn = f (xn ) and y = f (x). Clearly, the sequence (xn ) is contained in the bounded set [a, b].
Hence, the Bolzano-Weierstrass theorem (Theorem 3.42) yields that there exists a convergent
subsequence (xnk), and we set z := lim_{k→∞} xnk. Using the continuity of f and the fact that
z ∈ [a, b], we obtain
lim ynk = lim f (xnk ) = f (z).
k→∞ k→∞
Since yn → y also implies limk→∞ ynk = y = f (x), we get that f (z) = f (x). Since f is bijective,
this implies z = x. This gives limk→∞ xnk = x. As this holds for all convergent subsequences
of (xn ), we see that (xn ) has exactly one accumulation point, and is therefore convergent, i.e.,
xn → x. All in all we have shown that
f −1 (yn ) = xn → x = f −1 (y),
which concludes the proof.
This theorem gives an easy argument for the continuity of several known functions.
Example 4.21. We obtain that the root functions f(x) = ᵏ√x are continuous on [0, ∞) for
arbitrary k ∈ N. Just note that f is the inverse function of f⁻¹(x) = x^k on [0, ∞), which is
bijective, and continuous because it is a polynomial.
In the same way, we obtain that f(x) = 1/ᵏ√x = x^{−1/k} is continuous on (0, ∞) for every k ∈ N.
Example 4.22. We also obtain that logarithmic functions f (x) = logb (x) are continuous on
R+ := (0, ∞) for every b > 1.
Again, just note that f is the inverse function of the exponential function f −1 (x) = bx on
R, which is bijective, and continuous, see Example 4.10. (Note that f : R+ → R, and hence
f −1 : R → R+ .)
Note that we allow also ±∞ as accumulation points. Clearly, they are accumulation points if D
is not bounded (from below/above).
Moreover, note that accumulation points of a set D may not be contained in D, and that not
all points of D are accumulation points.
Example 4.24. The set of all accumulation points of (a, b) (or [a, b]) with a, b ∈ R is the closed
interval [a, b].
The set of all accumulation points of (a, ∞) is [a, ∞) ∪ {∞}.
Example 4.25. The sets N and Z do not have any real accumulation points. However, note that
N has the accumulation point ∞, and Z has ±∞ as accumulation points.
Example 4.26. Consider M = {1/n : n ∈ N}. Then, 0 is an accumulation point of M, but 0 is
not in M. Moreover, 0 is the only accumulation point, since there is no non-constant sequence
in M converging to, e.g., 1/42. This example shows that M and the set of its accumulation points
can even be disjoint.
lim f (xn ) = y.
n→∞
It is important to note that the existence of the limit limx→x0 f (x) does not depend on the value
f (x0 ). However, if f is continuous at x0 ∈ D and x0 is an accumulation point, then this limit
must still be equal to f (x0 ).
However, note that limx→0 f(x) does not exist. For this, consider xn := 1/n and yn = −1/n,
which satisfy lim(xn ) = lim(yn ) = 0, but lim f (xn ) = ∞ and lim f (yn ) = −∞.
Example 4.32. Let f : R \ {0} → R be defined by f(x) = 1/x². Since f is continuous at all
x ∈ R \ {0}, we have, e.g., limx→1 f(x) = 1 or limx→−2 f(x) = 1/4. Moreover, we have
Example 4.33 (Euler number). Let us consider an interesting limit that we have considered
already for sequences, see Example 3.36. Recall that Euler’s number was defined by
e = lim_{n→∞} (1 + 1/n)^n = sup_{n∈N} (1 + 1/n)^n.
It is not hard to see that one obtains the same limit for every xn → ∞ in place of xn = n, i.e.,
e = lim_{y→∞} (1 + 1/y)^y = sup{(1 + 1/y)^y : y > 0}.
(Verify this precisely!)
With this, we can find a useful representation for the powers e^x for x ∈ R. First note that by
continuity of the function f(y) := y^x on (0, ∞), we obtain
e^x = lim_{y→∞} ((1 + 1/y)^y)^x = lim_{y→∞} (1 + 1/y)^{xy}.
For x > 0 we use that for every sequence (yn) we have yn → ∞ ⟺ zn := x · yn → ∞. Hence,
e^x = lim_{y→∞} (1 + 1/y)^{xy} = lim_{z→∞} (1 + x/z)^z = lim_{n→∞} (1 + x/n)^n.
(We used the substitution z = x · y ⟺ y = z/x. The last equality is just taking the special
sequence zn = n.) Using e^{−x} = 1/e^x, we can prove the same equation for x < 0. (Verify this!)
We can now follow exactly the same lines as in Example 3.69, using the binomial theorem, to
obtain the representations
e^x = lim_{n→∞} (1 + x/n)^n = ∑_{k=0}^∞ x^k / k!,
which is valid for all x ∈ R.
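Both representations of e^x can be compared numerically (an illustrative Python sketch, not part of the notes; tolerances are ad hoc):

```python
import math

x = 1.5
# (1 + x/n)^n approaches e^x as n grows (slowly, with error of order 1/n)
seq = [(1 + x / n) ** n for n in (10, 100, 10_000, 1_000_000)]
assert abs(seq[-1] - math.exp(x)) < 1e-4

# The partial sums of sum_{k>=0} x^k / k! converge much faster
s = sum(x ** k / math.factorial(k) for k in range(30))
assert abs(s - math.exp(x)) < 1e-12
```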
Based on what we know about limits and continuous functions in general, we again obtain the
following rules of calculation.
(iii) if B ≠ 0, then lim_{x→x0} f(x)/g(x) = A/B.
Example 4.36. Let f : R \ {0} → R with f(x) = sin(1/x). Then, we can write f = h ◦ g with
h(x) = sin(x) and g(x) = 1/x. Since h is continuous on R and g is continuous at every x0 ≠ 0
with y0 := lim_{x→x0} g(x) = g(x0) = 1/x0, we obtain lim_{x→x0} f(x) = h(y0) = sin(1/x0) for every x0 ≠ 0.
So, e.g., lim_{x→2} f(x) = sin(1/2) or lim_{x→1/π} f(x) = sin(π) = 0. Moreover, we have (for x0 = ±∞)
that lim_{x→∞} f(x) = lim_{x→−∞} f(x) = sin(0) = 0.
However, the limit lim_{x→0} f(x) does not exist. To see this, define the sequence xn = 1/(π(n + 1/2)),
and note that xn → 0, but f(xn) = (−1)^n, which is not convergent, see Figure 23.
Figure 23: The graph of f(x) = sin(1/x) on [−2, 2]
Similarly, consider f(x) = x² · sin(1/x) on R \ {0}. We have
−x² ≤ x² · sin(1/x) ≤ x²
for all x ≠ 0, since |sin(y)| ≤ 1 for all y ∈ R. Hence, lim_{n→∞} |f(xn)| ≤ lim_{n→∞} xn² = 0 for every
null sequence (xn). This implies that lim_{x→0} f(x) = 0, see Figure 24.
Figure 24: The graph of f(x) = x² sin(1/x), enclosed between −x² and x²
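The sandwich bound −x² ≤ x² sin(1/x) ≤ x² is easy to confirm numerically (Python sketch, not from the notes):

```python
import math

def f(x):
    """f(x) = x^2 * sin(1/x) for x != 0."""
    return x * x * math.sin(1 / x)

# |f(x)| <= x^2, so f(x_n) -> 0 along every null sequence (x_n)
for n in (10, 100, 1000):
    x = 1 / n
    assert abs(f(x)) <= x * x
    assert abs(f(-x)) <= x * x
assert abs(f(1e-6)) < 1e-12
```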
The calculations from the last example should remind you of the application of the sandwich
rule for limits of sequences, Theorem 3.28. In fact, we have a very similar rule for limits of
functions, if two enclosing functions have the same limit.
Most often, we will apply the sandwich rule in the following form.
Another thing that can be seen from the above example is that, sometimes, a function is not
well-defined at a single point (or even on a set), but we can compute the limit at this point. In
such a case, we may “extend the domain” of the function using these limits. That is, given
a function f : D → R and some accumulation point x0 ∉ D of D (i.e., f is not defined at x0)
such that y := lim_{x→x0} f(x) exists, we can define the function g : D ∪ {x0} → R with
g(x) := f(x) if x ≠ x0, and g(x) := y if x = x0.
As an example, consider f : R \ {1} → R with
f(x) = (x² − 1)/(x − 1).
Then, f is not well defined at x0 = 1 (since dividing by zero is not allowed). However,
lim_{x→1} (x² − 1)/(x − 1) = lim_{x→1} (x + 1)(x − 1)/(x − 1) = lim_{x→1} (x + 1) = 2.
(Note that (x − 1)/(x − 1) = 1 can only be used for x ≠ 1, which holds inside the limit.)
Hence, the function g : R → R with
g(x) = (x² − 1)/(x − 1) if x ≠ 1, and g(x) = 2 if x = 1,
is continuous on R.
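Numerically, the removable discontinuity looks as follows (illustrative Python, not part of the notes):

```python
# (x^2 - 1)/(x - 1) = x + 1 for x != 1, so the values approach 2 near x = 1.
def f(x):
    return (x * x - 1) / (x - 1)   # undefined at x = 1 itself

for n in (10, 1000, 100_000):
    h = 1 / n
    assert abs(f(1 + h) - 2) <= h + 1e-12   # f(1 + h) = 2 + h
    assert abs(f(1 - h) - 2) <= h + 1e-12   # f(1 - h) = 2 - h
```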
However, note that limx→0 H(x) would exist, if we only allow positive or negative sequences,
respectively. This motivates the definition of one-sided limits of functions.
f (xn ) → y.
Note that the assumption that x0 is an accumulation point of D+ just means that there are
points in D that are to the right and arbitrarily close to x0 , i.e., there is a sequence in D+
converging to x0 . That x0 is an accumulation point of D− means the same ’to the left’.
Example 4.44. Let H be the Heaviside function as above. Then, H is not continuous at 0,
but the one-sided limits exist and we have
lim_{x↘0} H(x) = 1 and lim_{x↗0} H(x) = 0.
Since these limits are different, lim_{x→0} H(x) does not exist.
Figure 25: f(x) = 1/(1 − x²) on D = R \ {±1}
Due to
f(x) = (x − 2)/(x² − 4) = (x − 2)/((x − 2)(x + 2)) = 1/(x + 2)
for x ≠ 2, we obtain
lim_{x→2} f(x) = lim_{x→2} 1/(x + 2) = 1/4.
Hence, f is not continuous at x = 2. Moreover, we have lim_{x↗−2} f(x) = −∞ and lim_{x↘−2} f(x) = ∞.
So, it is also discontinuous at x = −2.
Figure 26: f(x) = (x − 2)/(x² − 4)
We see that one-sided limits can exist although the limit does not exist. In this case, the one-
sided limits are still helpful to verify if the (two-sided) limit exists, or even if the function is
continuous. This is again an example of a mathematical concept that is just introduced to split
a task into two easier ones.
We now prove that lim_{x→x0} f(x) exists if and only if both one-sided limits exist and are equal:
lim_{x→x0} f(x) exists ⟺ lim_{x↘x0} f(x) and lim_{x↗x0} f(x) exist and are equal.
In particular, all three limits coincide in this case.
Proof. First we assume that lim_{x→x0} f(x) exists. Clearly, the right- and left-hand limits are special
cases and therefore exist, and lim_{x↗x0} f(x) = lim_{x↘x0} f(x).
For the other direction we assume that the right- and left-hand limits at x0 exist and
lim_{x↗x0} f(x) = lim_{x↘x0} f(x) =: y. Let (xn) be an arbitrary sequence such that xn → x0 and
xn ≠ x0. We can split (xn) into two subsequences (yk+) and (yk−), where yk+ > x0 and yk− < x0
for all k ∈ N. By our assumption we get that
lim_{k→∞} f(yk+) = lim_{k→∞} f(yk−) = y.
This implies that for arbitrary ε > 0 we can find some k0 ∈ N such that |f(yk+) − y| ≤ ε and
|f(yk−) − y| ≤ ε for all k ≥ k0. Consequently, there is some n0 ∈ N such that |f(xn) − y| ≤ ε
for all n ≥ n0, i.e., lim_{x→x0} f(x) = y.
Let us finally consider another interesting example, which is one of the most important limits
related to trigonometric functions. We consider the function si : R → R with
si(x) := sin(x)/x if x ≠ 0, and si(x) := 1 if x = 0.
This function is called the sinus cardinalis and is clearly well defined for all x 6= 0. We will
prove now that si is a continuous function on R.
Example 4.48. It remains to show that
lim_{x→0} sin(x)/x = 1.
Proof. As si is even, i.e. si(−x) = si(x), it is sufficient to show that lim_{x↘0} sin(x)/x = 1. Now, recall
the definition of sin, cos and tan on the unit circle, i.e., the circle with radius 1 and center at
the origin, see the following figure.
We see that the area of the enclosed circular sector with angle x ∈ [0, π/2), which equals x/2, is
larger than the area of the triangle with legs sin x, cos x, but smaller than the area of the triangle
with legs tan x, 1. Using the corresponding area formulas we obtain
(1/2) sin(x) cos(x) ≤ (1/2) x ≤ (1/2) tan(x).
This is equivalent to
sin(x) cos(x) ≤ x ≤ tan(x) = sin(x)/cos(x),
and therefore to
cos(x) ≤ sin(x)/x ≤ 1/cos(x).
(Really try to prove this inequality from the one before!)
We know that cos x is continuous and lim_{x→0} cos(x) = cos(0) = 1. Using the sandwich rule we
obtain
1 ≤ lim_{x→0} sin(x)/x ≤ 1.
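The two-sided bound cos x ≤ sin(x)/x ≤ 1/cos x, and the resulting limit, can be checked numerically (Python sketch, not part of the notes):

```python
import math

# cos x <= sin(x)/x <= 1/cos x for x in (0, pi/2)
for x in (0.5, 0.1, 0.01, 1e-4):
    ratio = math.sin(x) / x
    assert math.cos(x) <= ratio <= 1 / math.cos(x)

# and the ratio indeed tends to 1
assert abs(math.sin(1e-6) / 1e-6 - 1) < 1e-9
```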
We now discuss two important properties of continuous functions. Both are (or at least look)
obvious for ’easy’ functions. However, we show that they hold under the weak assumption that
the function is continuous and defined on a closed interval.
The first will be the Intermediate value theorem, which states that a continuous function on a
closed interval attains all values between the function values at the endpoints of the interval.
Theorem 4.49 (Intermediate value theorem). Let I = [a, b] be a closed interval and
f : I → R be a continuous function. Then, for every y ∈ R with
min{f(a), f(b)} ≤ y ≤ max{f(a), f(b)},
there exists some x ∈ I with f(x) = y.
As this holds for every y ∈ J := [min{f(a), f(b)}, max{f(a), f(b)}], the theorem states
that the image of I under f, i.e. f(I), at least contains this interval, i.e. f(I) ⊃ J, if I is a
closed interval, see Figure 27.
Proof. The case f (a) = f (b) is obvious. Now assume w.l.o.g. that f (a) < f (b). The case
f (a) > f (b) can be proven in the same way by replacing f by −f .
Now let y ∈ [f(a), f(b)]. We will define a sequence (xn) with xn → x ∈ I and f(x) = y.
First, let a1 = a, b1 = b and x1 = (a1 + b1)/2, i.e., x1 is the midpoint between a and b.
If f (x1 ) ≥ y, then we set [a2 , b2 ] to be the ’left half’ of [a1 , b1 ]. If on the other hand f (x1 ) < y,
we set [a2 , b2 ] to be the ’right half’ of [a1 , b1 ]. In both cases we get that [a2 , b2 ] ⊂ [a1 , b1 ] and
f (a2 ) ≤ y ≤ f (b2 ). We iterate this process.
Let’s make that more formal: For n ∈ N and given an, bn such that f(an) ≤ y ≤ f(bn), we
define
xn := (an + bn)/2,
and, if f(xn) ≥ y,
an+1 := an, bn+1 := xn,
while otherwise
an+1 := xn, bn+1 := bn.
With this, we get two sequences (an ) and (bn ) with [an , bn ] ⊂ [a, b] and f (an ) ≤ y ≤ f (bn ) for
all n ∈ N, and
a = a1 ≤ a2 ≤ · · · ≤ an ≤ · · · ≤ bn ≤ · · · ≤ b2 ≤ b1 = b.
This yields that (an ), (bn ) are monotone and bounded sequences, which are therefore convergent.
Moreover, we have bn − an = (b − a)/2^{n−1}, because we always halve the interval, and get that
lim_{n→∞} an = lim_{n→∞} bn =: x.
n→∞ n→∞
We clearly have x ∈ [a, b]. (Why?) Using f(an) ≤ y ≤ f(bn) for all n and the continuity of f, we obtain
f(x) = lim_{n→∞} f(an) ≤ y ≤ lim_{n→∞} f(bn) = f(x).
Hence
f(x) = y.
An important special case is the following corollary, which is sometimes called Bolzano’s theorem.
Example 4.51. The intermediate value theorem yields an alternative argument for the existence
of arbitrary positive roots: For a > 0 and n ∈ N, let f : R≥0 → R be given by f(x) = x^n − a. As a
polynomial, f is continuous. Furthermore, we have f(0) = −a < 0 and f(1 + a) = (1 + a)^n − a > 0.
The intermediate value theorem then yields some x ∈ [0, 1 + a] with f(x) = 0, i.e. x^n = a or
x = ⁿ√a, respectively.
Example 4.52. The intermediate value theorem also yields the existence of real zeros
of polynomials of odd degree. To this end, let n ∈ N be odd and p : R → R a polynomial of
degree n given by
p(x) = an x^n + an−1 x^{n−1} + . . . + a1 x + a0
with an ≠ 0. We assume w.l.o.g. that an > 0. Clearly we have lim_{x→∞} p(x) = +∞ and
lim_{x→−∞} p(x) = −∞. Consequently, there exist a, b ∈ R with a < b and p(a) < 0, p(b) > 0.
The intermediate value theorem then yields an x ∈ (a, b) with p(x) = 0.
Example 4.53. The next application refers to fixed points, i.e., points that are not changed
by a function. Let f : [0, 1] → [0, 1] be continuous. Then, there exists an x ∈ [0, 1] with f(x) = x.
For the proof we look at the continuous function g(x) = f(x) − x. We have g(0) = f(0) − 0 ≥ 0
and g(1) = f(1) − 1 ≤ 0, and the intermediate value theorem then yields an x ∈ [0, 1] with
g(x) = 0, so we have f(x) = x.
We now want to discuss the extreme value theorem, which states that a continuous function on
a closed interval attains also its minimal and maximal value.
Let us start with the definition of minimal and maximal points of a function.
We use the term extremum if we do not specify if it is a minimum or maximum. Moreover, we
call them global extrema, as we will later discuss also a local variant of this concept.
Note that the minimum/maximum (value) of a function, if it exists, is unique. However, there
might still be more than one minimum/maximum point.
Theorem 4.56 (Extreme value theorem). Let I = [a, b] ⊂ R be a closed interval and
f : I → R be a continuous function. Then there exist xmin, xmax ∈ I such that
f(xmin) ≤ f(x) ≤ f(xmax) for all x ∈ I.
This theorem shows that the infimum inf x∈I f (x) and supremum supx∈I f (x) are attained at
some points in I. In fact, infimum and supremum are actually minimum and maximum.
Proof. We only show that f attains its maximal value, the other case can be treated similarly.
Recall that f (I) = {y ∈ R : f (x) = y for some x ∈ I}. Since I is non-empty, f (I) is non-empty
and therefore, S := sup f (I) exists (the case S = ∞ is still allowed here). By the properties
of suprema there exists a sequence (yn ) in f (I) such that yn → S. Furthermore there exists
a sequence (xn ) in I such that f (xn ) = yn . This sequence is bounded, since all xn ∈ I, i.e.,
a ≤ xn ≤ b. By the Bolzano-Weierstrass theorem (Theorem 3.42) there exists a convergent
subsequence (xnk ) which converges to some x0 ∈ R. Now, by the sandwich rule (Theorem 3.28)
and a ≤ xnk ≤ b, we also obtain that a ≤ x0 ≤ b, i.e., x0 ∈ I. The definition of continuity yields
f(x0) = lim_{k→∞} f(xnk) = lim_{k→∞} ynk = S,
which proves the claim with xmax := x0 (and implies S < ∞).
Remark 4.57. Note that it is important that we have a closed interval in the extreme
value theorem. For open intervals the statement is not true in general. For instance, if we
consider f (x) = x (we already know that this is a continuous function) and I = (a, b) = (0, 1),
then
sup_{x∈I} f(x) = 1 and inf_{x∈I} f(x) = 0,
but these values are not attained at any point of I = (0, 1).
Example 4.58. Another example that shows that the extreme value theorem does not hold in
general for open intervals is the function f (x) = x1 on (0, 1). This function is continuous, but
unbounded and therefore does not have a maximum.
• The extreme value theorem shows that m := inf x∈I f (x) and M := supx∈I f (x) are
attained at some points xmin , xmax ∈ I.
• The intermediate value theorem (applied to the interval [xmin , xmax ]) implies that all
intermediate values are attained.
• In short: f(I) = [min_{x∈I} f(x), max_{x∈I} f(x)].
Figure 29: All values between the extreme values f (xmin ) and f (xmax ) are attained
In words, continuous functions on closed intervals are bounded, i.e., supx∈I |f (x)| < ∞.
We finally want to discuss briefly how one can actually find a point x∗ ∈ [a, b] such that f(x∗) = y
for y with f(a) ≤ y ≤ f(b). (The intermediate value theorem implies that such a point exists.) For the
sake of simplicity we only discuss the case y = 0. (One might consider g(x) = f(x) − y
otherwise.) The proof of the intermediate value theorem leads directly to a (practical) algorithm
for the approximation of x∗ with f (x∗ ) = 0 that is called the bisection method.
Let f : [a, b] → R be continuous with f(a) < 0 < f(b). The bisection method is inductively
defined as follows: Set a1 := a and b1 := b. For k = 1, 2, . . ., compute the midpoint
xk := (ak + bk)/2. If f(xk) > 0, set ak+1 := ak and bk+1 := xk; if f(xk) < 0, set
ak+1 := xk and bk+1 := bk.
We always have ak , bk ∈ [a, b] and f (ak ) < 0 < f (bk ), i.e., there is always a zero x∗ in the interval
[ak , bk ], and we have
|xk − x∗| ≤ (bk − ak)/2 = 2^{−k} (b − a).
The iteration is stopped if (by chance) f(xk) = 0 holds for some k, or if the upper bound is
smaller than a prescribed ε > 0. The point xk is then used as an “ε-approximation” of the zero x∗.
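The method above can be sketched in a few lines of Python (an illustrative implementation under the stated assumptions f(a) < 0 < f(b); not part of the original notes):

```python
def bisect(f, a, b, eps):
    """Bisection method for a continuous f with f(a) < 0 < f(b).
    Returns a point x with |x - x*| <= eps for some zero x* of f."""
    assert f(a) < 0 < f(b)
    while True:
        x = (a + b) / 2
        fx = f(x)
        # the zero lies in [a, b], so the midpoint has error <= (b - a)/2
        if fx == 0 or (b - a) / 2 <= eps:
            return x
        if fx > 0:
            b = x   # zero lies in [a, x]
        else:
            a = x   # zero lies in [x, b]

# the setting of Example 4.60: f(x) = x^2 - 2 on [0, 2], eps = 0.05
root = bisect(lambda x: x * x - 2, 0.0, 2.0, 0.05)
assert abs(root - 2 ** 0.5) <= 0.05
```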
Example 4.60. The positive zero of the function f(x) = x² − 2 is clearly x∗ = √2. We start
the bisection method with a = 0 and b = 2 and want to achieve an error |xk − x∗| that is smaller
than ε = 0.05.
The requirements of the bisection method are fulfilled, since f is continuous and f (0) < 0 < f (2).
To satisfy the prescribed error bound ε > 0, we need that
2^{−k} (2 − 0) < ε ⟺ k > log₂(1/ε) + 1.
For ε = 0.05, we can choose k = 6. In fact, for k = 6 the bound shows that the error will be at
most 2^{−5} = 1/32 = 0.03125. Let us have a look at the first iterations:
k   ak     bk     xk     f(xk)
1   0      2      1      −1
2   1      2      3/2    1/4
3   1      3/2    5/4    −7/16
4   5/4    3/2    11/8   −7/64
5   11/8   3/2    23/16  17/256
6   11/8   23/16  45/32  −23/1024
The actual error of x6 = 45/32 ≈ 1.40625 to the exact zero x∗ = √2 ≈ 1.414214 is, however, only
about 0.007964.
Let us introduce two stronger forms of continuity of functions that will be useful later. Note
that both are defined for the whole domain, and not at single points.
By the above considerations, we immediately see that constant functions and the identity f (x) :=
x are uniformly continuous (Check yourself!), and we will show in Theorem 4.70 that every
uniformly continuous function is continuous. To see that uniform continuity is indeed a stronger
condition, consider the following example.
Example 4.62. Let f(x) := x² on R, which is continuous by Example 4.7. Moreover, consider
the sequences defined by xn = n + 1/n and yn = n, which clearly satisfy |xn − yn| = 1/n → 0.
However, we have
|f(xn) − f(yn)| = |(n + 1/n)² − n²| = |n² + 2 + 1/n² − n²| = 2 + 1/n² → 2.
As discussed in Example 4.7, the difference in proving continuity of, e.g., the functions x and
x2 , was that we had to choose δ depending on x0 in the latter case. We will now see that
uniformly continuous functions are precisely those, where we may find a δ independent of x0 in
an “ε-δ-proof” of continuity. (Clearly, such a δ still depends on ε in general.)
In words: For all (fixed) ε > 0 there exists δ > 0 such that for all x, y ∈ D with |x − y| < δ
we have that |f (x) − f (y)| < ε.
From this equivalent definition of uniform continuity, one may already see that it is indeed
a stronger condition than continuity. For this, note the additional “for all” quantifier in the
statement. For a better understanding, try to write the definition of “f is continuous on D (i.e.,
for all x0 ∈ D)” solely with quantifiers, and spot the difference.
Proof. First we prove that the ε-δ-criterion for uniform continuity implies uniform continuity.
To this end, assume the criterion holds, and let (xn), (yn) be arbitrary sequences in D with
|xn − yn| → 0. Fix ε0 > 0 and choose δ0 > 0 according to the criterion. Since |xn − yn| → 0,
there is some n0 ∈ N with
∀n ≥ n0 : |xn − yn| < δ0.
Hence
∀n ≥ n0 : |f(xn) − f(yn)| < ε0.
Since ε0 was arbitrary, we obtain that lim_{n→∞} |f(xn) − f(yn)| = 0. Since also (xn) and (yn)
were arbitrary, we get that f is uniformly continuous on D from the definition.
The other direction is proved by contradiction. Therefore, we assume that the ε-δ-criterion
fails, i.e., there exists some ε0 > 0 such that for every δ > 0 there are x, y ∈ D with
|x − y| < δ but |f(x) − f(y)| ≥ ε0, and we want to show that this implies that f is not
uniformly continuous, i.e., that there exist sequences (xn), (yn) in D such that |xn − yn| → 0
and |f(xn) − f(yn)| ↛ 0. For this, let δm := 1/m for all m ∈ N. From the assumption
(with δ = δm), we can find xm, ym ∈ D such that |xm − ym| < 1/m and |f(xm) − f(ym)| ≥ ε0.
Thus, we found sequences (xm), (ym) such that
∀m ∈ N : |xm − ym| < 1/m and |f(xm) − f(ym)| ≥ ε0,
i.e., |xn − yn| → 0 and |f(xn) − f(yn)| ↛ 0. This finishes the proof.
Example 4.64. Try to prove yourself, using the ε-δ-criterion, that the absolute value function,
i.e. f (x) = |x|, is uniformly continuous on R.
Uniform continuity is an essential tool for many of the following considerations, in the same
way as absolute convergence was essential for series. But uniform continuity is sometimes not
so easy to show. The following theorem shows that, however, if a continuous function is defined
on a closed interval, then it is automatically uniformly continuous.
Together with Theorem 4.70 (see below), this shows that continuity and uniform continuity are
just the same for functions defined on closed intervals.
Proof. We prove the result by contradiction, so we assume that there exist sequences (xn ) and
(yn ) such that
|xn − yn | → 0 and ∀n ∈ N : |f (xn ) − f (yn )| ≥ ε0 ,
for some ε0 > 0, i.e. that f is not uniformly continuous. Since [a, b] is bounded, and (xn ) ⊂ [a, b],
we have that (xn ) is bounded, and therefore, that there exists a convergent subsequence of (xn ),
say (xnk) with xnk → x0. This is a consequence of the Bolzano-Weierstrass theorem. Using
the triangle inequality we see
|x0 − ynk| ≤ |x0 − xnk| + |xnk − ynk| → 0,
so (ynk) also converges to x0. The continuity of f and | · | yields
lim_{k→∞} |f(xnk) − f(ynk)| = |f(lim_{k→∞} xnk) − f(lim_{k→∞} ynk)| = |f(x0) − f(x0)| = 0,
which contradicts |f(xnk) − f(ynk)| ≥ ε0.
The last type of continuity we want to discuss is Lipschitz continuity. This is the strongest,
but also the easiest to verify, of the concepts we consider. Luckily, it is enough to deal with this
type for most practical applications.
Example 4.68. Again, the constant function f (x) := c and the linear function f (x) := x,
are Lipschitz continuous. In both cases we can choose the Lipschitz constant L = 1. (For the
constant function, the Lipschitz constant may be chosen arbitrarily small.)
Example 4.69. It is also not hard to show that f (x) := x2 on D is Lipschitz continuous on
arbitrary bounded D ⊂ R. Check yourself!
Proof. First we show that Lipschitz continuous functions are uniformly continuous. For arbitrary
ε > 0 we set δ = ε/L, where L is the Lipschitz constant of f. For all x, y ∈ D with |x − y| < δ,
we obtain
|f(x) − f(y)| ≤ L |x − y| < L δ = ε.
As this holds for all ε > 0, the ε-δ-criterion for uniform continuity implies that f is uniformly
continuous.
Now assume that f is uniformly continuous on D and let x0 ∈ D. For every ε > 0, the uniform
continuity yields some δ > 0 such that |f(x) − f(x0)| < ε for all x ∈ D with |x − x0| < δ.
Hence f is continuous on D by the ε-δ-criterion.
This theorem shows that Lipschitz continuity implies the two other forms of continuity we have
just discussed. However, one may still ask if some of these concepts are actually the same. We
have already seen that the function f (x) = x2 is not uniformly continuous on R, which shows
that uniform continuity is indeed stronger than continuity. The following example shows that a
function may be continuous on a closed interval (and therefore uniformly continuous), but not
Lipschitz continuous. That is, we do not have the reverse implications in Theorem 4.70, i.e.
Lipschitz continuous ⇍ uniformly continuous ⇍ continuous.
Example 4.71. Let f : [0, 1] → R with x ↦ √x. Then f is uniformly continuous but not
Lipschitz continuous.
Figure 30: The function f(x) = √x on [0, 1]
Proof. First we show that f is uniformly continuous on [0, 1]. Again, we use the ε-δ-criterion
for the purpose of demonstration. Therefore, let ε > 0 and x, y ∈ [0, 1]. The binomial theorem
implies
|√x − √y|² = x + y − 2√(xy) ≤ x + y − 2 min{x, y} = |x − y|.
This shows that |√x − √y| ≤ |x − y|^{1/2}. If we now choose δ = ε², we see that |√x − √y| < ε for
all x, y with |x − y| < δ. This shows that f is uniformly continuous on [0, 1].
We now show that, however, f is not Lipschitz continuous. Multiplying and dividing by |√x + √y|
(and the binomial theorem) yield
|√x − √y| = |x − y| / (√x + √y).
If f would be Lipschitz continuous, then there would exist some L > 0 such that
√ √ |x − y|
| x − y| = √ √ < L|x − y|
| x + y|
1√
for all x, y ∈ [0, 1]. However, this would mean L > |√x+ y|
for all x, y ∈ [0, 1], which is clearly
not true, as the right hand side can be made arbitrary large by choosing x and y small enough.
Hence f cannot be Lipschitz continuous.
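The two halves of this example can be spot-checked numerically; a minimal sketch (the sample points and the choice δ = ε² are taken from the proof above, everything else is illustrative):

```python
import math

def sqrt_slope(x):
    """Secant slope (sqrt(x) - sqrt(0)) / (x - 0) = 1 / sqrt(x) for x > 0."""
    return math.sqrt(x) / x

# The secant slopes at 0 grow without bound, so no Lipschitz constant L
# can satisfy |sqrt(x) - sqrt(y)| <= L |x - y| on all of [0, 1].
slopes = [sqrt_slope(x) for x in (1e-2, 1e-4, 1e-6)]

# Uniform continuity: delta = eps**2 works for every pair x, y in [0, 1],
# since |sqrt(x) - sqrt(y)| <= |x - y| ** 0.5.
def delta_for(eps):
    return eps ** 2
```

The growing slopes illustrate why a single Lipschitz constant cannot exist, while δ depends on ε only, not on the points.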
Example 4.72. Let us finally mention, without formal proof, that the trigonometric func-
tions sin and cos are Lipschitz continuous on R with Lipschitz constant L = 1. This can
be seen from their ’graphical definition’. See also Figure 31 which shows that all function values
lie in the colored cone. The trigonometric functions are therefore also uniformly continuous.
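The Lipschitz bound |sin(a) − sin(b)| ≤ 1 · |a − b| stated in this example can be spot-checked on random points; the sample range and count are arbitrary:

```python
import math
import random

random.seed(0)
# Spot-check the Lipschitz bound |sin(a) - sin(b)| <= |a - b|
# (which follows from |cos| <= 1) on random pairs of points.
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]
lipschitz_ok = all(abs(math.sin(a) - math.sin(b)) <= abs(a - b) for a, b in pairs)
```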
5 Differential calculus
In this chapter we want to introduce and study derivatives of real-valued functions, which give us
a better understanding of how small local changes of the input affect the output of a function.
For functions defined on the real line, as we will assume throughout this chapter, one may think
about the slope (German: Anstieg) of the tangent line attached to a point of the graph of the
function. As with continuity, this is a good intuition and, when well understood, makes
many of the upcoming results obvious for ’easy’ functions. However, we again need a precise
definition of a derivative to handle cases where a visualization is not possible or helpful.
Prominent applications of the differential calculus are an alternative (and more precise) defini-
tion of minima and maxima of a function; moreover, the derivative provides us with information
about whether a function is increasing or decreasing at a given point. However, there is much more
information ’hidden’ in the values of the higher-order derivatives at a point. In fact, under
certain assumptions on the function, a function can be given approximately in a neighborhood
of the point just by knowing some of these values. This will be formalized by means of the
Taylor polynomial. Finally, we present the very useful rule of l’Hospital, which is a powerful tool
to compute complicated limits.
Let us begin with a precise definition of the derivative of a function, which is just a precise
notion (using limits of functions) of the slope of the tangent line at a point.
From now on we mostly consider functions that are defined on an open interval I. This is
because the endpoints of an interval sometimes need more care. (Moreover, the derivative is
clearly not defined at isolated points, if the domain of definition contains some.) We comment
on the differences when needed.
The expression
[f(x0 + h) − f(x0)] / h
is called difference quotient. Geometrically this is the slope of the secant through the points
(x0, f(x0)) and (x0 + h, f(x0 + h)). Hence, if f′(x0) exists, it is the slope of the tangent to the
function at the point x0.
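The difference quotient can be evaluated directly; a minimal sketch, with f(x) = x² and x0 = 3 as illustrative choices:

```python
def diff_quotient(f, x0, h):
    """Slope of the secant through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h

# For f(x) = x**2 the secant slopes equal 6 + h and approach f'(3) = 6
# as h -> 0.
slopes = [diff_quotient(lambda x: x * x, 3.0, h) for h in (1.0, 0.1, 0.001)]
```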
Obviously, constant functions, i.e., f(x) = c for c ∈ R, have the derivative f′ = 0. Let us discuss
some more examples which will serve as building blocks for more complicated functions.
Example 5.2. Let f(x) = x^n, with n ∈ N. Then f is differentiable on R with f′(x) = n x^{n−1}.
Example 5.3. Let f(x) = 1/x^n, n ∈ N. Then f is differentiable on R \ {0} with f′(x) = −n/x^{n+1}.
Together with the last example, we have (1)′ = 0 and (x^k)′ = k x^{k−1} for all k ∈ Z \ {0}. We will
see that this formula in fact holds for all k ∈ R \ {0}.
Proof. We want to compute f′(x) for x ≠ 0. Therefore we use the binomial theorem to obtain
(1/h) · [1/(x + h)^n − 1/x^n] = (1/h) · [x^n − (x + h)^n] / [x^n (x + h)^n]
= (1/h) · [x^n − ∑_{k=0}^{n} C(n,k) x^{n−k} h^k] / [x^n (x + h)^n]
= (1/h) · [−n x^{n−1} h − ∑_{k=2}^{n} C(n,k) x^{n−k} h^k] / [x^n (x + h)^n]
= [−n x^{n−1} − ∑_{k=2}^{n} C(n,k) x^{n−k} h^{k−1}] / [x^n (x + h)^n],
where C(n,k) denotes the binomial coefficient. Observe that the denominator converges to x^{2n} for h → 0. To take the limit in the denomi-
nator, we have to assume x ≠ 0. In the numerator, all terms in the latter sum go to zero for
h → 0. Therefore,
f′(x) = lim_{h→0} [−n x^{n−1} − ∑_{k=2}^{n} C(n,k) x^{n−k} h^{k−1}] / [x^n (x + h)^n] = −n x^{n−1} / x^{2n} = −n x^{−n−1}.
Example 5.4. Let f(x) = sin(x). Then f is differentiable on R and it holds f′(x) = cos(x),
i.e.,
sin′(x) = cos(x) for all x ∈ R.
Using sin(x)/x → 1 as x → 0 (see Example 4.48) and the continuity of cos, we obtain the result as
h → 0.
Example 5.5. In the same way, one can compute that cos is differentiable on R with cos′(x) = −sin(x),
where we use
cos(x) − cos(y) = −2 sin((x + y)/2) sin((x − y)/2).
Proof. First, recall that Bernoulli’s inequality (Theorem 1.48) states that (1 + x)^n ≥ 1 + nx for
all x ≥ −1 and n ∈ N. Using this, one can show that (e^h − 1)/h → 1 as h → 0, and therefore
e^x · (e^h − 1)/h → e^x.
Remark 5.8. Note that calculating a derivative is in fact the same as calculating limits of a
function. Therefore, to show that a function is not differentiable at a point x0 , it is sufficient to
find two sequences (hn) and (h̃n) such that hn → 0 and h̃n → 0, but
lim_{n→∞} [f(x0 + hn) − f(x0)]/hn ≠ lim_{n→∞} [f(x0 + h̃n) − f(x0)]/h̃n.
Proof. Set hn = 1/n and h̃n = −1/n. It follows that
This shows that a continuous function is not necessarily differentiable, however, the reverse
statement holds.
Proof. By assumption,
lim_{x→x0} [f(x) − f(x0)] / (x − x0) = f′(x0).
By the calculation rules for limits we get
lim_{x→x0} (f(x) − f(x0)) = lim_{x→x0} [f(x) − f(x0)] / (x − x0) · (x − x0) = f′(x0) · lim_{x→x0} (x − x0) = 0.
As in the previous sections we want to establish some rules of calculation for differentiable func-
tions that will be the main tools to derive the derivatives of complicated functions. In particular,
we will establish rules for calculating the derivative of products, quotients and compositions of
functions, as well as of the inverse.
If one is not already completely confident with these rules of calculation, it is highly recom-
mended to practice and calculate derivatives of increasingly complicated functions. Note that
with the rules that follow, it should be no problem to calculate the derivative of sin(x^42) · e^{cos(√x)}
step-by-step, although it might be time-consuming.
Proof. Both follow easily from the calculation rules for limits.
Example 5.12. We already know that (x^n)′ = n x^{n−1} for all n ∈ N. By linearity, all polynomials
are differentiable on R and
(cn x^n + c_{n−1} x^{n−1} + … + c1 x + c0)′ = cn n x^{n−1} + c_{n−1} (n − 1) x^{n−2} + … + c2 · 2x + c1.
Theorem 5.13 (Product rule). Let f, g be differentiable at x0; then (fg)′(x0) exists and
(fg)′(x0) = f′(x0) g(x0) + f(x0) g′(x0).
In short, (fg)′ = f′g + g′f.
Proof. We compute
(fg)′(x) = lim_{h→0} [f(x + h) g(x + h) − f(x) g(x)] / h
= lim_{h→0} f(x + h) · [g(x + h) − g(x)]/h + lim_{h→0} g(x) · [f(x + h) − f(x)]/h
= f(x) g′(x) + f′(x) g(x),
where we also use that f is continuous at x (Theorem 5.10).
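The product rule can be spot-checked with a central difference quotient; the functions, the point x = 0.7 and the step size are illustrative choices:

```python
import math

def numeric_derivative(f, x, h=1e-6):
    """Central difference quotient; the step size h is an ad-hoc choice."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Product rule check for f = sin, g = exp at x = 0.7:
# (f g)'(x) = f'(x) g(x) + f(x) g'(x) = (cos x + sin x) e^x.
x = 0.7
lhs = numeric_derivative(lambda t: math.sin(t) * math.exp(t), x)
rhs = (math.cos(x) + math.sin(x)) * math.exp(x)
```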
In short, (g ∘ f)′ = (g′ ∘ f) · f′.
Theorem 5.15 (Quotient rule). Let f, g be differentiable at x0, and assume that g(x0) ≠ 0.
Then f/g is differentiable at x0 and
(f/g)′(x0) = [f′(x0) g(x0) − f(x0) g′(x0)] / g(x0)².
In short, (f/g)′ = (f′g − fg′)/g².
Proof. We want to use the product rule for the functions f and 1/g. But first we compute (1/g)′.
Recall that (1/x)′ = −1/x² for x ≠ 0, and that 1/g can be written as h ∘ g, where h(y) = 1/y. The chain
rule yields
(1/g)′(x0) = −g′(x0) / g(x0)².
Using the product rule, we obtain
(f/g)′(x0) = f′(x0)/g(x0) − f(x0) g′(x0)/g(x0)² = [f′(x0) g(x0) − f(x0) g′(x0)] / g(x0)².
These rules allow us to compute complicated derivatives by computing several easy derivatives.
Example 5.16. We want to compute the derivative of the tangent function tan : (−π/2, π/2) → R.
By the quotient rule, and sin²x + cos²x = 1, we obtain
tan′(x) = [sin′(x) cos(x) − sin(x) cos′(x)] / cos²(x) = 1/cos²(x) = 1 + tan²(x).
Example 5.17. The calculation rules for derivatives and the already proven fact that trigono-
metric functions and (algebraic) polynomials are differentiable imply that trigonometric poly-
nomials are differentiable on R.
There is also a formula for differentiating inverse functions. Note that the assumption that
a continuous function f is strictly monotone on an interval I is just equivalent to f mapping I
bijectively onto another interval J ⊂ R.
Proof. By Theorem 4.20, we know that f^{-1} is continuous (in y0). Take an arbitrary sequence
(yn) ⊂ f(I) with yn → y0 and yn ≠ y0, and define xn := f^{-1}(yn). We obtain
xn = f^{-1}(yn) → f^{-1}(y0) = x0,
as well as xn ≠ x0. Therefore,
lim_{n→∞} [f^{-1}(yn) − f^{-1}(y0)] / (yn − y0) = lim_{n→∞} (xn − x0) / (f(xn) − f(x0)) = lim_{n→∞} 1 / [(f(xn) − f(x0))/(xn − x0)] = 1/f′(x0).
Example 5.19. We compute the derivative of the natural logarithm ln(y), y > 0. This is the
inverse of the function f(x) = e^x = y with derivative f′(x) = e^x, see Example 5.7. Thus
d/dy ln(y) = 1/f′(x) = 1/e^x = 1/y.
Example 5.20. We use the derivative of ln(x) and the chain rule to prove that
d/dx x^a = a x^{a−1} for arbitrary a ∈ R and x > 0.
For this, write x^a = e^{a ln(x)}. With f(x) = a ln(x) and g(x) = e^x, we obtain f′(x) = a/x and
g′(x) = e^x, which yields
d/dx x^a = d/dx (g ∘ f)(x) = g′(f(x)) · f′(x) = e^{a ln(x)} · a/x = a x^{a−1}.
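This derivation can be mirrored in code; a sketch that compares the chain-rule formula with a difference quotient (the point, the exponent and the step size are arbitrary choices):

```python
import math

def power(x, a):
    """x**a written as exp(a ln x), as in the example above (x > 0)."""
    return math.exp(a * math.log(x))

def power_derivative(x, a):
    """Chain rule: g'(f(x)) f'(x) = exp(a ln x) * a / x = a * x**(a - 1)."""
    return math.exp(a * math.log(x)) * a / x

x, a, h = 2.0, 0.5, 1e-6
numeric = (power(x + h, a) - power(x - h, a)) / (2 * h)
```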
d/dy arctan(y) = 1/f′(x) = 1/(1 + tan²(x)) = 1/(1 + y²),
where we use Example 5.16 and, again, set y := f(x).
Example 5.22. One can also use the theorem to calculate the derivatives of the inverses of the
trigonometric functions, see Section 1.7, to obtain
arcsin′(y) = 1/sin′(x) = 1/cos(x) = 1/√(1 − sin²(x)) = 1/√(1 − y²) with y = sin(x),
and
arccos′(y) = 1/cos′(x) = −1/sin(x) = −1/√(1 − cos²(x)) = −1/√(1 − y²) with y = cos(x),
for all y ∈ (−1, 1). (Note that we used sin²x + cos²x = 1.)
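The arcsin formula can be spot-checked against a difference quotient; the point y = 0.3 and the step size are arbitrary:

```python
import math

def arcsin_derivative(y):
    """arcsin'(y) = 1 / sqrt(1 - y**2), valid for y in (-1, 1)."""
    return 1.0 / math.sqrt(1.0 - y * y)

y, h = 0.3, 1e-6
numeric = (math.asin(y + h) - math.asin(y - h)) / (2 * h)
```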
In this section we discuss the most important application for derivatives. That is, the calculation
and classification of extreme points, i.e., points where a function attains a (local) maximum or
minimum. Let us first restate the definition of global extrema, see Definition 4.54.
Again, note that the extreme values of a function, if they exist, are unique, but there might be
several extreme points. (Think about f(x) = x² on [−1, 1] or (−1, 1), see Example 4.55.)
As we want to discuss the connection of extreme points to the derivative of a function, which is
a local property, we also need a local notion of extreme points.
and a strict local minimum if f (x) > f (x0 ) for all x ∈ D ∩ (x0 − ε, x0 + ε) \ {x0 }.
Analogously, we say f has a local maximum at x0 ∈ D if there exists ε > 0 such that
f(x) ≤ f(x0) for all x ∈ D ∩ (x0 − ε, x0 + ε),
and a strict local maximum if f(x) < f(x0) for all x ∈ D ∩ (x0 − ε, x0 + ε) \ {x0}.
The point x0 is called local maximum/minimum point, or local extreme point.
It follows immediately that a global extreme point has to be a local extreme point.
However, as we have seen in the last example, there might already be several global extrema
and clearly even more local extrema, see Figure 34.
Figure 34: Graph of x · sin(x) · e^{−x²/100}
We will now discuss how to find (and classify) local extrema by using derivatives. One possible
way of finding a global extremum of a function f : [a, b] → R then is:
1. Find all local extreme points x ∈ (a, b) and calculate the corresponding values f(x).
2. Calculate the boundary values f(a) and f(b).
3. Compare all these values; the largest (smallest) of them is the global maximum (minimum).
If the function is defined on an open (or unbounded) interval (a, b) with −∞ ≤ a < b ≤ ∞, then
calculating f (a) and f (b) in Step 2 should be replaced by calculating the boundary values
limx→a f (x) and limx→b f (x). Clearly, if they lead to the maximal/minimal values, then there
exists no global maximum/minimum point, as it is not attained in the domain.
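This search procedure (stationary points plus boundary values) can be sketched numerically; a minimal sketch, assuming a differentiable f with known derivative df, and a crude grid scan for sign changes of df in place of an exact solver — all names and parameters are illustrative:

```python
def find_global_min(f, df, a, b, n=10_000):
    """Collect stationary points (zeros or sign changes of df) and the
    endpoints, then compare the function values."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    candidates = [a, b]
    for u, v in zip(xs, xs[1:]):
        if df(u) == 0:
            candidates.append(u)
        elif df(u) * df(v) < 0:      # sign change: stationary point nearby
            candidates.append((u + v) / 2)
    return min(candidates, key=f)

# f(x) = 3x^2 - 6x + 5 on [-1, 3] has its global minimum at x = 1.
x_min = find_global_min(lambda x: 3 * x * x - 6 * x + 5,
                        lambda x: 6 * x - 6, -1.0, 3.0)
```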
We now turn to step one, i.e., finding all local extrema. For this, note that the figure above
shows that the slope of the tangent line attached to an extremum is zero, i.e., the tangent line
is horizontal. This means that the derivative at such a point is zero. We now show that this is
a necessary condition if the function is differentiable, which means that it is enough to ’check’
all points satisfying this condition if you want to find a local extremum.
Theorem 5.25 (Necessary condition for an extreme point). Let I = (a, b) and f : I → R.
If x0 ∈ I is a local extreme point of f and f is differentiable at x0, then
f′(x0) = 0.
Proof of Theorem 5.25. Let f have a local minimum at x0 and let ε > 0 be as required in the
definition. We now have for x ∈ I with x0 < x < x0 + ε, which implies x − x0 > 0, that
[f(x) − f(x0)] / (x − x0) ≥ 0.
So, lim_{x↘x0} [f(x) − f(x0)]/(x − x0) ≥ 0. On the other hand, for x0 − ε < x < x0 we obtain
[f(x) − f(x0)] / (x − x0) ≤ 0,
so that lim_{x↗x0} [f(x) − f(x0)]/(x − x0) ≤ 0. Since f is differentiable at x0, both one-sided
limits equal f′(x0), and hence f′(x0) = 0.
In many cases, it is already enough to determine all stationary points in order to find all (global)
extreme points of a function. But from the function and derivative value at a point alone, we
cannot decide whether a stationary point is indeed an extreme point and, if it is, whether we face
a local maximum or a local minimum. This is in particular a problem if we are not able to draw
the function under consideration. However, there is a sufficient condition for having an extremum
that involves higher-order derivatives.
Let f : (a, b) → R be differentiable and let f′ be also differentiable (at x). Then we say that f
is twice differentiable (at x) and write
f″(x) := d²/dx² f(x) = d/dx f′(x).
This procedure can be repeated as long as derivatives exist and so we can define the n-th
derivative of f inductively by
f^{(n)}(x) := dⁿ/dxⁿ f(x) = d/dx f^{(n−1)}(x).
In the special case of n = 2 or n = 3 we write f″(x) or f‴(x), respectively. If the n-th derivative
of f at a point exists, then we say that f is n-times differentiable at this point. If the n-th
derivative of f (at x0) exists and is a continuous function (at this point), then we say that f is
n-times continuously differentiable (at x0).
Example 5.27. We now consider f(x) := x^a for arbitrary a ∈ R, see Example 5.20.
If a ∈ N0, we are back in the situation of the last example. That is, the formula for the higher-
order derivatives leads to f^{(k)} ≡ 0 (i.e., f^{(k)}(x) = 0 for all x ∈ I) for all k > a.
This does not hold if a is negative or not a natural number; in this case,
d^k/dx^k x^a = a(a − 1) ⋯ (a − k + 1) x^{a−k} for arbitrary a ∈ R \ N0, x > 0 and k ∈ N.
Example 5.28. Note that all differentiable functions that we discussed so far were also contin-
uously differentiable, i.e., they possess a continuous derivative (on the whole domain). In fact,
it is not easy to find a differentiable function that is not continuously differentiable.
The classical example of such a function is
f(x) := x² sin(1/x) if x ≠ 0, and f(0) := 0.
First of all, by using the bound |f(x)| ≤ |x|², we obtain that f is continuous. By the definition
of the derivative at x0 = 0, we obtain that f′(0) = lim_{h→0} h sin(1/h) = 0, where we again use
the boundedness of sin. For x0 ≠ 0 we use the calculation rules for derivatives, and we obtain
f′(x) = 2x sin(1/x) − cos(1/x) for x ≠ 0, and f′(0) = 0.
Since cos(1/x) has no limit as x → 0, the derivative f′ is not continuous at 0.
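Both claims about this example can be illustrated numerically; a sketch with arbitrary sample points:

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

# Difference quotients at 0 equal h*sin(1/h), which tends to 0: f'(0) = 0.
quotients = [abs(f(h) / h) for h in (1e-2, 1e-4, 1e-6)]

# But f'(x) = 2x sin(1/x) - cos(1/x) keeps oscillating near 0:
def fprime(x):
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# At x = 1/(k*pi) the cosine term is (-1)**k, so f' jumps between ~ -1 and ~ 1.
samples = [fprime(1.0 / (k * math.pi)) for k in (1000, 1001)]
```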
We now turn to a sufficient condition that allows us to decide whether a stationary point (i.e.,
f′(x0) = 0) is a local extreme point and, moreover, whether it is a local maximum or a local minimum.
Theorem 5.29 (Second derivative test). Let I = (a, b), x0 ∈ I and f : I → R be twice
continuously differentiable at x0, i.e., f″ exists and is continuous at x0.
Moreover, assume that x0 is a stationary point of f, i.e., f′(x0) = 0. Then,
f″(x0) > 0 ⟹ f has a strict local minimum at x0,
and
f″(x0) < 0 ⟹ f has a strict local maximum at x0.
Although we could prove this theorem here using a rather lengthy argument, we will present a
very short one at the end of the following section, see the proof of Theorem 5.29.
Note that a precise characterization of an extremum would also involve higher-order derivatives
and is rather complicated. As the above is usually enough, we do not state the characterization
here. However, note that we will learn later that the knowledge of all derivatives f (k) (x0 ),
k ∈ N0 , at a given point x0 , if they exist, allows us to reconstruct the function exactly in a
neighborhood around x0 . That is, we can obtain all ’local information’ of a function from its
higher-order derivative values, if the function can be differentiated infinitely often.
Example 5.30. The function f (x) = 3x2 − 6x + 5 (on R) satisfies f 0 (x) = 6x − 6 and f 00 (x) = 6,
see Figure 37. It therefore has a unique critical point at x0 = 1 which is a local minimum. As
it is the only extreme point, and limx→±∞ f (x) = ∞, we obtain that f is not bounded from
above, i.e., f does not have a maximum, and f has a global minimum at x0 = 1 with minimum
value f (x0 ) = 2.
Example 5.31. If f″(x0) = 0, then we cannot say whether x0 is an extreme point and, if it is,
which kind. To see this, consider, e.g., f(x) = x³, g(x) = x⁴ and h(x) = −x⁴ on R, and x0 = 0.
Then f′(0) = f″(0) = g′(0) = g″(0) = h′(0) = h″(0) = 0, but f has no extremum at 0, g has a
minimum at 0 and h has a maximum at 0.
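The case distinction of the second derivative test fits in a small helper; a sketch (the function name and return strings are made up for illustration):

```python
def classify_stationary_point(second_derivative):
    """Second derivative test at a stationary point (f'(x0) = 0)."""
    if second_derivative > 0:
        return "strict local minimum"
    if second_derivative < 0:
        return "strict local maximum"
    return "inconclusive"   # e.g. x**3, x**4, -x**4 at 0

# f(x) = 3x^2 - 6x + 5: f'(1) = 0 and f''(1) = 6 > 0.
result = classify_stationary_point(6.0)
```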
In this section we discuss the mean value theorem, a theorem that guarantees that the
derivative of a function attains a certain value determined by the function values at the end-
points of the interval. In the same way as the other ’existence theorems’, which were ultimately
all due to the Bolzano–Weierstrass theorem 3.42, this will be an important tool in the proofs of
the upcoming theorems.
In particular, it will imply l’Hospital’s rule for calculating certain limits, which may be seen as
a (fast) way of calculating expressions of the form ’0/0’ or ’∞/∞’.
Now we are ready to prove the first mean value theorem, namely the Theorem of Rolle. This
may be seen as a special case of the upcoming theorem, but it already contains the main idea.
Theorem 5.32 (Rolle). Let f : [a, b] → R be continuous on [a, b] and differentiable on (a, b). Fur-
thermore, assume that f(a) = f(b). Then there exists some ξ ∈ (a, b) such that f′(ξ) = 0.
Proof. For constant f the statement is obvious. So we may assume that f is not constant. Since
f is continuous on [a, b] we know from the extreme value theorem (Theorem 4.56) that it has a
172
maximum and a minimum, which are not equal. Moreover, since f (a) = f (b), one of the (global)
extreme points has to be in (a, b), i.e., not at the boundary points. By Theorem 5.25, this point,
say ξ ∈ (a, b), satisfies f′(ξ) = 0.
Geometrically the above theorem states that the graph of f has at least one point where the
tangent is horizontal. Again, if you have a drawing of a function, this result seems to be a trivial
statement. However, it will also be necessary in more complex situations, and we will see that
it is important to respect all the assumptions.
Example 5.33. It is important that the function is differentiable in the whole interval.
To see this, consider the absolute value function f(x) = |x| on [−1, 1], which is continuous and
satisfies f(−1) = f(1) = 1. However, it is not differentiable at x0 = 0, and at all other points it
satisfies f′(x) = 1 or f′(x) = −1. So there is no ξ ∈ (−1, 1) with f′(ξ) = 0.
We now use the Theorem of Rolle to prove the mean value theorem.
Theorem 5.34 (Mean value theorem). Let f : [a, b] → R be continuous on [a, b] and differ-
entiable on (a, b). Then there exists some ξ ∈ (a, b) such that
f′(ξ) = [f(b) − f(a)] / (b − a).
Proof. Define
h(x) := f(x) − [f(b) − f(a)]/(b − a) · (x − a).
We see that h is continuous and differentiable as f is, and satisfies h(a) = h(b) = f(a). Therefore,
Rolle’s theorem implies that there is some ξ ∈ (a, b) with h′(ξ) = 0. Since
h′(x) = f′(x) − [f(b) − f(a)] / (b − a),
we obtain the result for x = ξ.
Note that the mean value theorem gives information about the derivative of a function, even if
we only know the function values at the boundary points. For example, suppose f : [0, 1] → R
with f(0) = 0 and f(1) = 5. The mean value theorem states that, if f is differentiable on (0, 1),
then its derivative must attain the value 5, i.e., there exists ξ ∈ (0, 1) with f′(ξ) = 5. Note that
f(x) = 5x is the only linear function with these function values, and it satisfies f′ ≡ 5.
The theorem states that every function with the same boundary values also has a point
with this slope. (Recall that the intermediate value theorem, Theorem 4.49, implies, e.g., that
there is a ξ ∈ (0, 1) with f(ξ) = 2.)
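Such a point ξ can also be located numerically; a sketch using bisection, assuming df − slope changes sign on (a, b), which holds for the illustrative choice f(x) = x² on [0, 2]:

```python
def mvt_point(f, df, a, b, tol=1e-10):
    """Bisect for a point xi with df(xi) = (f(b) - f(a)) / (b - a)."""
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: df(x) - slope
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = x^2 on [0, 2]: the secant slope is 2, so xi = 1.
xi = mvt_point(lambda x: x * x, lambda x: 2 * x, 0.0, 2.0)
```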
The following theorem is a slight generalization of the mean value theorem which is the first
step in proving l’Hospital’s rule, and will be also of interest later.
Theorem 5.35 (General mean value theorem). Let f, g : [a, b] → R be continuous on [a, b]
and differentiable on (a, b). Then there exists some ξ ∈ (a, b) such that
[f(b) − f(a)] g′(ξ) = [g(b) − g(a)] f′(ξ).
In particular, if g′ ≠ 0 on (a, b), then there exists some ξ ∈ (a, b) such that
f′(ξ)/g′(ξ) = [f(b) − f(a)] / [g(b) − g(a)].
Proof. We first consider the case that g(a) = g(b). By Rolle’s theorem we know that there exists
some ξ ∈ (a, b) with the property that g′(ξ) = 0. So we only have to show the first identity, as
g′ ≠ 0 on (a, b) is clearly not true. For this ξ we have
[f(b) − f(a)] g′(ξ) = 0 = [g(b) − g(a)] f′(ξ),
which was to be shown. For the other case, i.e. g(a) ≠ g(b), we define the function
h(x) = f(x) − [f(b) − f(a)]/[g(b) − g(a)] · (g(x) − g(a)).
Since f and g are continuous on [a, b] and differentiable on (a, b), we have that h is continuous
on [a, b] and differentiable on (a, b). Moreover, it is easy to compute that
h(a) = h(b).
An application of Rolle’s theorem yields that there exists some ξ ∈ (a, b) such that
0 = h′(ξ) = f′(ξ) − [f(b) − f(a)]/[g(b) − g(a)] · g′(ξ).
Regrouping this equation leads to
[f(b) − f(a)] g′(ξ) = [g(b) − g(a)] f′(ξ),
which proves the first point of the theorem. For the second point we assume additionally that
g′ ≠ 0 on (a, b), i.e., g′(x) ≠ 0 for all x ∈ (a, b). Thus, we can divide the last equation
by g′(ξ) and by g(b) − g(a), and obtain
f′(ξ)/g′(ξ) = [f(b) − f(a)] / [g(b) − g(a)].
Consider, for example, the limits
lim_{x→4} (x² − 16)/(x − 4) or lim_{x→∞} (4x² − 5x)/(1 − 3x²).
If we plug in x = 4 in the first limit we get ’0/0’; a similar case appears if we ’plug in
∞’ in the second limit, as we would get ’∞/(−∞)’ (recall that, if x tends to ∞, then a polynomial
behaves like its largest power).
We now introduce l’Hospital’s rule, which is a method to compute such limits, if they exist.
Theorem 5.36 (l’Hospital). Let I = (a, b) and x0 ∈ [a, b]. Let f, g : I \ {x0} → R be differen-
tiable on I \ {x0} with g′ ≠ 0. Furthermore assume that either lim_{x→x0} f(x) = lim_{x→x0} g(x) =
0 or lim_{x→x0} f(x) = ±∞, lim_{x→x0} g(x) = ±∞ holds. Then we have
lim_{x→x0} f(x)/g(x) = lim_{x→x0} f′(x)/g′(x),
provided the limit on the right-hand side exists.
Proof. We only prove the case lim_{x→x0} f(x) = lim_{x→x0} g(x) = 0; otherwise replace f by 1/f
and g by 1/g. First observe that we can continuously extend f, g to x0 with f(x0) = g(x0) = 0.
By the general mean value theorem, for any x ∈ I with x ≠ x0 there exists ξ ∈ (x0, x)
satisfying
f′(ξ)/g′(ξ) = [f(x) − f(x0)] / [g(x) − g(x0)] = f(x)/g(x).
If x → x0, it follows that ξ → x0, and since lim_{ξ→x0} f′(ξ)/g′(ξ) exists (this was our assumption) we
obtain
lim_{x→x0} f(x)/g(x) = lim_{x→x0} f′(x)/g′(x).
Remark 5.37. This rule of calculation was published in 1696 by Guillaume François Antoine,
Marquis de L’Hospital (1661–1704), in the very first textbook on differential calculus. However,
it was actually proven in 1694 by the famous mathematician Johann Bernoulli (1667–1748).
As a first application of l’Hospital’s rule we prove that the second derivative test is valid.
Proof of Theorem 5.29. For this, let x0 ∈ I be such that f′(x0) = 0 and f″(x0) > 0. Since f′(x0)
exists, we have from Theorem 5.10 that f is continuous at x0. This implies that lim_{x→x0}(f(x) −
f(x0)) = 0. We obtain, by l’Hospital’s rule, f′(x0) = 0 and the definition of the second derivative,
that
lim_{x→x0} [f(x) − f(x0)] / (x − x0)² = lim_{x→x0} f′(x) / (2(x − x0)) = (1/2) lim_{x→x0} [f′(x) − f′(x0)] / (x − x0) = f″(x0)/2 > 0.
This shows that the limit on the left-hand side exists and is positive. Therefore, the function
inside the limit must be positive in a neighborhood of x0. In detail: there are some ε, δ > 0
Let us see some more examples where we can use this rule.
Example 5.38.
lim_{x→0} sin(x)/x = lim_{x→0} cos(x)/1 = 1.
Example 5.39.
lim_{x→1} (x³ − 1)/(x − 1) = lim_{x→1} 3x²/1 = 3.
This rule can also be used several times to calculate a limit as the following examples will show.
Example 5.40.
lim_{x→0} (1 − cos x)/x² = lim_{x→0} sin(x)/(2x) = lim_{x→0} cos(x)/2 = 1/2.
Example 5.41.
lim_{x↘0} x^x = exp(lim_{x↘0} x ln x) = exp(lim_{x↘0} ln(x)/x^{−1}) = exp(lim_{x↘0} x^{−1}/(−x^{−2})) = e^0 = 1.
Example 5.42.
lim_{x→0} (1 − cos(x/2)) / (1 − cos x) = lim_{x→0} [(1/2) sin(x/2)] / sin(x) = lim_{x→0} [(1/4) cos(x/2)] / cos(x) = 1/4.
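The last two limits can be spot-checked numerically; the evaluation points are arbitrary small numbers:

```python
import math

# x**x -> 1 as x -> 0+ (Example 5.41); the values increase towards 1.
xx = [x ** x for x in (1e-2, 1e-4, 1e-6)]

# (1 - cos(x/2)) / (1 - cos x) -> 1/4 as x -> 0 (Example 5.42).
def ratio(x):
    return (1 - math.cos(x / 2)) / (1 - math.cos(x))

r = ratio(1e-3)
```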
A common application of the mean value theorem is the characterization of the monotonicity of
a function, which is based on its derivative.
or (strictly) decreasing if f′(x) ≤ 0 (f′(x) < 0, respectively) for all x ∈ I.
Proof. First assume f′ ≥ 0 on I and let x1 < x2 in I. By the mean value theorem there exists
ξ ∈ (x1, x2) with
[f(x2) − f(x1)] / (x2 − x1) = f′(ξ) ≥ 0.
This implies f(x2) ≥ f(x1) for arbitrary points x1, x2 ∈ I with x1 < x2, so f is non-decreasing.
Now assume that f is non-decreasing. That is, f(x2) ≥ f(x1) for all x2 > x1. This also implies
that f(x2) ≤ f(x1) for all x2 < x1. In any case, we get
[f(x2) − f(x1)] / (x2 − x1) ≥ 0,
and letting x2 → x1 yields f′(x1) ≥ 0.
Remark 5.45. Note that ’f is increasing’ does not imply f′ > 0 in general. As an
example consider f(x) = x³, which is an increasing function, but f′(0) = 0.
Example 5.46. The function f(x) = e^x is increasing on R, since f′(x) = e^x > 0 for all x ∈ R. This
implies that f has no stationary points.
Example 5.47. The function f(x) = −ln(x) is decreasing on R⁺, since f′(x) = −1/x < 0 for all
x > 0. Again, this implies that there are no stationary points.
The second derivative helps us to determine the shape of a function, once plotted in a coordinate
system. If every line segment between two points of the graph lies above the graph, we call the
function convex; if it lies below the graph, we call the function concave.
Remark 5.49. From the definition we immediately obtain that f is concave if and only if −f is
convex.
Example 5.50. Have a look at the graphs of f(x) = x² and g(x) = ln(x), see Figure 39. Clearly,
f is strictly convex and g is strictly concave on R⁺. (Calculate the second derivatives of
both functions. What do you see?) The remark above states that −f is strictly concave.

Figure 39: Graphs of f(x) = x² and g(x) = ln(x)
Remark 5.51 (*). Convexity is a very useful property for certain optimization problems.
For a twice differentiable function it suffices to check the sign of the second derivative.
Proof. For now we only prove the case of convexity; concavity can be treated analogously.
Assume f″(x) ≥ 0 for all x ∈ I = (a, b). Moreover, let x0, x1 ∈ I such that a < x0 < x1 < b.
Then, for any λ ∈ (0, 1), we let x := (1 − λ)x0 + λx1 ∈ (x0, x1). By the mean value theorem we
can find ξ0 ∈ (x0, x) and ξ1 ∈ (x, x1) such that
[f(x) − f(x0)] / (x − x0) = f′(ξ0) and [f(x1) − f(x)] / (x1 − x) = f′(ξ1).
Since f″ ≥ 0, we get that f′ has to be non-decreasing. Hence f′(ξ0) ≤ f′(ξ1) and we obtain
[f(x) − f(x0)] / [λ(x1 − x0)] = [f(x) − f(x0)] / (x − x0) ≤ [f(x1) − f(x)] / (x1 − x) = [f(x1) − f(x)] / [(1 − λ)(x1 − x0)].
The identities λ(x1 − x0) = x − x0 and (1 − λ)(x1 − x0) = x1 − x follow from the definition of x.
Using the above inequality and regrouping the terms we obtain
f((1 − λ)x0 + λx1) ≤ (1 − λ)f(x0) + λf(x1),
which is the definition of convexity.
On the other hand, assume that f is convex. Using the inequality in the definition of convex
functions and the above calculations we see that
[f(x) − f(x0)] / (x − x0) ≤ [f(x1) − f(x0)] / (x1 − x0) ≤ [f(x1) − f(x)] / (x1 − x)
for arbitrary x0 < x < x1. Letting x → x0 and x → x1 we obtain
f′(x0) ≤ [f(x1) − f(x0)] / (x1 − x0) ≤ f′(x1).
Thus f′ is non-decreasing and therefore f″ ≥ 0.
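The defining inequality of convexity can be spot-checked on a grid; a sketch with illustrative interval, grid size and rounding tolerance (only midpoints λ = 1/2 are tested):

```python
def is_midpoint_convex_on_grid(f, a, b, n=200):
    """Check f((u+v)/2) <= (f(u) + f(v)) / 2 on an equispaced grid in [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return all(f((u + v) / 2) <= (f(u) + f(v)) / 2 + 1e-12
               for u in xs for v in xs)

convex_ok = is_midpoint_convex_on_grid(lambda x: x * x, -2.0, 2.0)     # convex
concave_fails = is_midpoint_convex_on_grid(lambda x: -x * x, -2.0, 2.0)  # concave
```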
In the last subsections, we have seen that some (local or global) properties of a function may be
characterized by its derivatives. Now we will show that, under certain assumptions, a function
can be characterized exactly (in a neighborhood) just by knowing all higher-order derivatives of
the function at one point. This shows that the very local behavior of a function can be used to
determine it exactly everywhere.
Let us start with a result that shows that we can approximate a function in a neighborhood
of a point by a polynomial. This is Taylor’s theorem.
Recall that the n-th derivative of f : (a, b) → R (at x) is defined inductively by
f^{(n)}(x) := dⁿ/dxⁿ f(x) = d/dx f^{(n−1)}(x),
if it exists.
We call
Tn(x) := ∑_{k=0}^{n} f^{(k)}(x0)/k! · (x − x0)^k
the Taylor polynomial of f of order n (at x0).
The term
Rn(x) := f(x) − Tn(x) = f^{(n+1)}(ξ)/(n + 1)! · (x − x0)^{n+1}
is called the remainder of the Taylor polynomial (in Lagrange form).
Note that ξ above depends on x. (This may already be clear since it lies between x and x0,
i.e., ξ ∈ (x, x0) or ξ ∈ (x0, x).) That’s why some authors write ξ = ξ_x to make this explicit.
In particular, to obtain the equality above, one needs to find a specific ξ for every x, which is
very impractical. However, this formula is helpful when we want to prove that the error of the
Taylor polynomial, which is the difference of f(x) and Tn(x), is ’small’ for all x ∈ (a, b): we
just have to bound f^{(n+1)}(ξ) for all possible ξ ∈ (a, b).
Proof. Let x ∈ (a, b) be arbitrary and, w.l.o.g., assume x > x0. Define, for t ∈ [x0, x], the
function
g(t) := f(x) − ∑_{k=0}^{n} f^{(k)}(t)/k! · (x − t)^k − m/(n + 1)! · (x − t)^{n+1},
where we choose m such that g(x0) = 0.
Clearly, g(x) = 0. So, together with g(x0) = 0, Rolle’s theorem yields the existence of some
ξ ∈ (x0, x) such that g′(ξ) = 0. We compute the first derivative of g (in t) and obtain, using the
Remark 5.54. It is easy to check that Tn^{(k)}(x0) = f^{(k)}(x0) for all k = 1, . . . , n.
The equation in Example 5.55 is sometimes useful to rewrite polynomials, in particular if one
is interested in properties around a specific point.
Example 5.56. Let p(x) = x³ − 2x² + 3 and consider x0 = −1. Since p has the derivatives
p′(x) = 3x² − 4x, p″(x) = 6x − 4 and p‴(x) = 6, we obtain p(−1) = 0, p′(−1) = 7, p″(−1) = −10
and p‴(−1) = 6. Example 5.55 implies
p(x) = 7(x + 1) − 5(x + 1)² + (x + 1)³.
In the same way we may also expand more general functions around a point if we know
their derivatives, but note that there is an additional error term, which is in general not easy to
determine exactly. Under additional assumptions, however, one may give practical bounds.
Corollary 5.57. In the setting of Theorem 5.53, assume additionally that f : (a, b) → R
satisfies |f^{(n+1)}(x)| ≤ M for some M < ∞ and all x ∈ (a, b). Then
|f(x) − Tn(x)| ≤ M (b − a)^{n+1} / (n + 1)! for all x ∈ (a, b).
Proof. The bound follows directly from Theorem 5.53, by noting that |x − x0| ≤ b − a for all
x, x0 ∈ (a, b).
Although this gives a useful uniform bound on the error, it is clearly useless, if we consider
functions that are defined on R. Let us discuss an example.
Example 5.58. Consider the function f(x) = e^x on I = (−1, 1). Assume you want to approxi-
mate f by a polynomial of preferably small degree, and you allow an error of at most ε = 1/50.
We know, see Example 5.7, that f^{(k)}(x) = e^x for all k ∈ N, and therefore also f^{(k)}(0) = 1.
Setting x0 = 0, we obtain that
Tn(x) = ∑_{k=0}^{n} x^k / k!.
Moreover, note that |f^{(k)}(x)| ≤ e for all x ∈ (−1, 1) and k ∈ N. It follows from Corollary 5.57
that
|e^x − Tn(x)| ≤ e · 2^{n+1} / (n + 1)!.
One may check the first values of n to see that n = 7 is enough. That is, we obtain
|e^x − (1 + x + x²/2 + x³/6 + x⁴/24 + x⁵/120 + x⁶/720 + x⁷/5040)| ≤ 0.0173 ≤ 1/50
for all x ∈ (−1, 1). (Try to visualize this with some computer algebra software!)
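The claimed error bound for n = 7 can be verified numerically; a sketch (the grid resolution is an arbitrary choice):

```python
import math

def taylor_exp(x, n):
    """Taylor polynomial of exp around 0: sum of x**k / k! for k = 0..n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

# Maximal error of T_7 on a grid in [-1, 1]; the theoretical bound
# e * 2**8 / 8! is about 0.0173, the actual error is far smaller.
grid = [-1 + 2 * i / 1000 for i in range(1001)]
max_err = max(abs(math.exp(x) - taylor_exp(x, 7)) for x in grid)
```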
One should notice that the upper bound on the error in the last example goes to zero (very fast)
with n. This is not only the case for the function e^x, but for all infinitely often differentiable
functions that satisfy the assumption of Corollary 5.57 for all n ∈ N with the same M, i.e., if
sup{ |f^{(k)}(x)| : k ∈ N, x ∈ (a, b) } ≤ M.
In this case, we obtain that lim_{n→∞} |f(x) − Tn(x)| = 0, since we always have (b − a)^n/n! → 0 for
a, b ∈ R, and therefore
f(x) = lim_{n→∞} Tn(x) = ∑_{k=0}^{∞} f^{(k)}(x0)/k! · (x − x0)^k
for all x0 ∈ (a, b), and we call the right hand side the Taylor series of f (around x0 ).
Note that Taylor series are a special case of the more general power series, see Section 3.8. In
particular, Taylor’s theorem leads to the earlier announced and constructive way to write rather
general functions as power series
f(x) := ∑_{k=0}^{∞} a_k (x − x0)^k
for some (real- or complex-valued) sequence (a_k)_{k∈N0} and x0 ∈ R. See Section 3.8 for some
general results on the convergence of the right hand side. In particular, we obtain convergence
if |x − x0| < R, where R is the radius of convergence
R = lim inf_{k→∞} |a_k|^{−1/k},
see Theorem 3.96. For Taylor series, we consider a_k = f^{(k)}(x0)/k!.
Remark 5.60. Note that convergence of the series ∑ a_k (x − x0)^k does not mean that this sum
equals f(x). For this, we need a bound on the values of the derivatives at all ξ between x0 and
x, see Theorem 5.53. One may get in trouble otherwise: consider, for example, the function
f(x) = e^{|x|} and its Taylor series at x0 = 1. At this point f(x) equals e^x, and so the Taylor
series of f at x0 = 1 coincides with that of e^x. This series converges for all x ∈ R, but is not
equal to f for x < 0. (Note that f is not differentiable at 0.)
We now bring our conditions into a form that is more useful for deciding whether a function can be written
as a Taylor series everywhere. This also allows for an analysis of functions defined on $\mathbb{R}$. Note,
however, that in many cases the above equality does not hold in the whole domain of definition,
but only in a neighborhood of the point $x_0$.
Remark 5.62. Note that the latter condition in the theorem is clearly fulfilled if |f (n) (x)| ≤ c·C n
for every fixed x ∈ I, where c, C ≥ 1 may depend on x, but not on n.
Proof. For the first statement, note that $x \in U_r(x_0)$ if and only if $|x - x_0| < r$. We can therefore
bound the remainder from Theorem 5.53 by $|R_{n-1}(x)| \le \frac{|f^{(n)}(\xi)|}{n!}\, r^n$. Taking into account that $\xi$
is between $x_0$ and $x$, and therefore also $\xi \in U_r(x_0)$, we see that the assumption of the theorem
implies $|R_{n-1}(x)| \to 0$.
For the second part, let x ∈ I be fixed. For x = x0 the statement is obvious. So, w.l.o.g.,
we assume x > x0 . Then, since f is infinitely often differentiable on I, and [x0 , x] is strictly
contained in I, we get that f (n+1) (ξ) (i.e., the derivative of f (n) at ξ) exists for all ξ ∈ [x0 , x].
This implies that $f^{(n)}$, and therefore $|f^{(n)}|$, is continuous on $[x_0, x]$, see Theorem 5.10. From
Theorem 4.56 we know that a continuous function on a closed interval attains its maximum, say
at $\xi^* \in [x_0, x]$, i.e., $\sup_{\xi\in[x_0,x]} |f^{(n)}(\xi)| = |f^{(n)}(\xi^*)|$.
Using our assumption $\lim_{n\to\infty} \frac{\sqrt[n]{|f^{(n)}(\xi)|}}{n} = 0$ for all $\xi \in I$, we obtain, in particular, that
\[ \frac{\sqrt[n]{|f^{(n)}(\xi^*)|}}{n} < \frac{1}{5|x - x_0|} \iff |f^{(n)}(\xi^*)| < \frac{n^n}{5^n |x - x_0|^n} \]
for all large enough $n$. Moreover, we have that $n^n \le 4^n n!$ for all $n \in \mathbb{N}$. This can be proven
inductively, by using $(1 + \frac{1}{n})^n \le 4$, see Example 3.36. With $R_n$ from Theorem 5.53, we obtain
\[ |R_{n-1}(x)| \le \frac{|f^{(n)}(\xi^*)|}{n!}\, |x - x_0|^n \le \frac{n^n}{n!\, 5^n} \le \Bigl(\frac{4}{5}\Bigr)^n \longrightarrow 0, \]
where the second inequality only holds for large enough $n$. Since $x \in I$ was arbitrary, we get
\[ f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \quad \text{for all } x \in I. \]
Example 5.63. Let $f(x) = e^x$ on $\mathbb{R}$. As already discussed above, we have $|f^{(n)}(x)| = e^x$, which
is independent of $n$. We can therefore apply the second part of Theorem 5.61 with $I = \mathbb{R}$ and
$x_0 = 0$, and get
\[ e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \]
for all $x \in \mathbb{R}$.
We now consider the trigonometric functions. It may come as a surprise that one can
approximate these periodic (wave) functions up to an arbitrary precision by polynomials on the
whole real line. However, a visualization of the first Taylor polynomials is very instructive for
understanding what's going on.
In the same way, one can calculate the following Taylor series, and prove their convergence:
\[ \sin x = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - + \ldots \]
\[ \cosh x = \sum_{k=0}^{\infty} \frac{x^{2k}}{(2k)!} = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \ldots \]
\[ \sinh x = \sum_{k=0}^{\infty} \frac{x^{2k+1}}{(2k+1)!} = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \ldots \]
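These expansions are easy to test numerically. The following sketch (illustrative plain Python, not part of the original notes) compares a truncated sine series with the library function:

```python
import math

def sin_series(x, terms):
    # partial sum of the Taylor series of sin around x0 = 0
    return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

for x in [0.5, 2.0, 10.0]:
    print(x, sin_series(x, 40), math.sin(x))  # the two columns agree closely
```

Even for $x = 10$, far from the expansion point, enough terms reproduce $\sin(10)$ to high accuracy, which illustrates the convergence on the whole real line.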
Remark 5.65. In the proof of Theorem 5.61 we have used the bound $n^n \le 4^n n!$ for all $n \in \mathbb{N}$.
Later we will prove Stirling's formula, which states that, in fact,
\[ \sqrt{2\pi n}\, \Bigl(\frac{n}{e}\Bigr)^n \le n! \le \Bigl(1 + \frac{1}{11n}\Bigr) \sqrt{2\pi n}\, \Bigl(\frac{n}{e}\Bigr)^n \]
for all $n \in \mathbb{N}$. In particular, it implies $n^n \le e^n \cdot n!$ for all $n$, and $\frac{\sqrt[n]{n!}}{n} \to \frac{1}{e}$.
Example 5.66 (Taylor series of ln(x)). Let us now consider the natural logarithm $f(x) = \ln(x)$
on $\mathbb{R}_+ = (0, \infty)$, which is an example showing that a Taylor series may converge only for $x$
in a neighborhood of $x_0$, i.e. we cannot apply the second part of Theorem 5.61.
For all $x \in \mathbb{R}_+$ we obtain $f'(x) = \frac{1}{x} = x^{-1}$, and therefore
\[ f^{(k)}(x) = \frac{(-1)^{k-1} (k-1)!}{x^k} \]
for all $k \in \mathbb{N}$ and $x \in \mathbb{R}_+$, see Example 5.3. With this, the Taylor polynomial of $f$ around
$x_0 \in \mathbb{R}_+$ is given by
\[ T_n(x) = \ln(x_0) + \sum_{k=1}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k = \ln(x_0) + \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} \cdot \frac{(x - x_0)^k}{x_0^k} = \ln(x_0) + \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k} \Bigl(\frac{x}{x_0} - 1\Bigr)^k. \]
Clearly, these sums converge absolutely as $n \to \infty$ if $|\frac{x}{x_0} - 1| < 1$ (use e.g. the root test), which
holds if and only if $x \in (0, 2x_0)$. Moreover, for $x = 2x_0$, we have $T_n(x) = \ln(x_0) + \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k}$,
where the sum is a partial sum of the alternating harmonic series and, therefore, convergent, see Example 3.93. For all other $x$,
i.e. $x > 2x_0$, the Taylor series is clearly not convergent, since the terms of the sum are not a null
sequence. Hence,
\[ \ln(x) = \ln(x_0) - \sum_{k=1}^{\infty} \frac{1}{k} \Bigl(1 - \frac{x}{x_0}\Bigr)^k \]
holds if and only if $0 < x \le 2x_0$. Choosing $x_0 = 1$ we obtain the typical series expansion of
$\ln(x)$ at $x_0 = 1$:
\[ \ln(x) = -\sum_{k=1}^{\infty} \frac{(1-x)^k}{k} \]
for $x \in (0, 2]$. In particular, $\ln(2) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k}$.
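As a quick numerical illustration (a sketch in plain Python, not part of the original notes), one can watch how fast the expansion converges inside its interval of convergence and how slowly at the boundary point $x = 2x_0$:

```python
import math

def ln_series(x, x0, terms):
    # truncated Taylor series of ln around x0 (valid for 0 < x <= 2*x0)
    return math.log(x0) - sum((1 - x / x0)**k / k for k in range(1, terms + 1))

# inside (0, 2*x0) the error decays geometrically ...
print(ln_series(1.5, 1.0, 50), math.log(1.5))
# ... while at x = 2*x0 (the alternating harmonic series) it decays only like 1/n
print(ln_series(2.0, 1.0, 10**5), math.log(2.0))
```

Fifty terms give essentially full precision at $x = 1.5$, while even $10^5$ terms of the alternating harmonic series leave a visible error at $x = 2$.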
Remark 5.67. Note that one may write a function $f(x)$ as its Taylor series at different points $x_0$.
This can be used to evaluate, or give a short notation for, certain infinite sums. For example, if
we consider the Taylor series of $\ln(x)$ and $\cos(x)$ we see, e.g., that
\[ \sum_{k=1}^{\infty} \frac{1}{k} \Bigl(1 - \frac{1}{e}\Bigr)^k = \ln(e) = 1 \]
(use $x_0 = e$ and $x = 1$), or
\[ \sum_{k=0}^{\infty} \frac{(-\pi)^k}{(2k)!} = \cos\sqrt{\pi}. \]
Remark 5.68 (Complex arguments). Note that the series in the above expansions of $e^x$, $\sin$,
$\cos$ etc. are absolutely convergent for all $x \in \mathbb{R}$. This means that the series also make
sense if we allow $x$ to be a complex number, i.e., $x \in \mathbb{C}$. This is the natural way of extending
real-valued functions to the complex case.
Example 5.69. Use Taylor series to prove that
\[ e^{ix} = \cos x + i \sin x \]
for all $x \in \mathbb{R}$, where $i := \sqrt{-1}$.
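One way to make this exercise plausible (a numerical sketch, assuming Python's built-in complex arithmetic; not a proof) is to evaluate the exponential series at a purely imaginary argument:

```python
import cmath
import math

def exp_series(z, terms):
    # partial sum of sum_k z^k / k!, works for complex arguments too
    total, term = 0 + 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

x = 1.3
lhs = exp_series(1j * x, 40)              # e^{ix} via the series
rhs = complex(math.cos(x), math.sin(x))   # cos x + i sin x
print(lhs, rhs, cmath.exp(1j * x))        # all three agree up to rounding
```

The real and imaginary parts of the partial sums are exactly the (truncated) cosine and sine series, which is the core of the actual proof.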
In the last section about differentiability we want to study Newton's method, which is a commonly
used method for calculating zeros of a function. Before we prove the convergence
of this method we want to discuss the main idea in detail. We are interested in solving
\[ f(x) = 0. \]
Now we consider $f \in C^2(I)$, i.e., twice continuously differentiable functions, such that there
exists some $\alpha \in I$ with $f(\alpha) = 0$. Moreover, we assume that $f'(x) \ne 0$ for all $x \in I$. This
ensures that the Newton iterate $x_{k+1} := x_k - \frac{f(x_k)}{f'(x_k)}$ is always well-defined.
Now we use Taylor's theorem and get for arbitrary $x \in I$ that
\[ 0 = f(\alpha) = f(x) + f'(x)(\alpha - x) + \frac{f''(\xi)}{2} (\alpha - x)^2, \]
for some $\xi \in (x, \alpha)$ (or $(\alpha, x)$). We can divide this equation by $f'(x) \ne 0$ and obtain
\[ 0 = \frac{f(x)}{f'(x)} + (\alpha - x) + \frac{1}{2}\, \frac{f''(\xi)}{f'(x)}\, (\alpha - x)^2. \]
Setting $x = x_n$ and rearranging yields
\[ x_{n+1} - \alpha = x_n - \frac{f(x_n)}{f'(x_n)} - \alpha = \frac{1}{2}\, \frac{f''(\xi)}{f'(x_n)}\, (\alpha - x_n)^2. \]
Thus
\[ |x_{n+1} - \alpha| = \frac{1}{2} \Bigl| \frac{f''(\xi)}{f'(x_n)} \Bigr| \cdot |x_n - \alpha|^2. \]
So we see that if $\frac{f''(\xi)}{f'(x_n)}$ does not behave too badly, and we choose $x_0$ not too far from $\alpha$, the
error should decrease quadratically. We will now explain how to choose $x_0$ in order to guarantee
convergence.
For this, define
\[ m_1 := \sup_{x\in I} |f''(x)| \quad \text{and} \quad m_2 := \inf_{x\in I} |f'(x)|. \]
Theorem 5.70 (Local convergence of Newton's method). Let $I = (a,b)$ and $f \in C^2(I)$ with
$f(\alpha) = 0$ for some $\alpha \in I$. Moreover, assume there is some $\delta > 0$ such that $U_\delta(\alpha) \subset I$ and $\frac{m_1}{2 m_2}\,\delta \le \frac{1}{2}$.
Then, for all $x_0 \in U_\delta(\alpha)$, Newton's method converges, and we have $|x_n - \alpha| \le 2^{-n} |x_0 - \alpha|$.
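The quadratic error decay can be observed directly. The following sketch (illustrative Python; the test function $f(x) = x^2 - 2$ and the starting point are my choices, not from the notes) runs the iteration $x_{n+1} = x_n - f(x_n)/f'(x_n)$ for the zero $\alpha = \sqrt{2}$:

```python
import math

def newton(f, fprime, x0, steps):
    # Newton iteration x_{n+1} = x_n - f(x_n) / f'(x_n)
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

f = lambda t: t * t - 2      # zero at alpha = sqrt(2)
fprime = lambda t: 2 * t     # nonzero near alpha

for steps in range(1, 5):
    x = newton(f, fprime, 1.5, steps)
    print(steps, abs(x - math.sqrt(2)))  # errors shrink roughly quadratically
```

The printed errors roughly square at each step, so the number of correct digits doubles, which is much faster than the guaranteed halving $|x_n - \alpha| \le 2^{-n}|x_0 - \alpha|$ from the theorem.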
The other part of the theorem is concerned with the integral of a function over an interval:
Clearly, we have to define precisely what this means. In contrast to derivatives, which were
closely connected to the slope at a point, local extrema, and other local properties of a function,
the integral is a global quantity. However, we will see later that both concepts are very much
related. In particular, if f : [a, b] → R is continuous, then there exists an antiderivative F of f
and
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
This is (the second part of) the fundamental theorem of calculus.
In this chapter we will first discuss antiderivatives of functions. Then we introduce a basic
definition of an integral, and show the fundamental theorem of calculus, which gives a rather
easy way of computing integrals (or areas). Later, we will see the limitations of this (too naive)
approach, and turn to the more powerful Lebesgue integral. For this we also need to discuss
some basic measure theory, i.e., we discuss what we mean by the area of a set.
Remark 6.1 (History). The first systematic ideas of an area (or integral) go back to 500 BC,
when people tried to compute the area of simple areas, like land plots. Already in the 3rd
century BC, Archimedes (c. 287 – c. 212 BC) used these ideas to find an approximation of the
area of a circle with radius one, and thereby established the inequality $3 + \frac{10}{71} < \pi < 3 + \frac{10}{70}$.
Only in the 19th century was mathematics brought to a more formal level, which allowed for
very precise statements. The first “correct” definition of an integral was given by Augustin-Louis
Cauchy (1789–1857) in 1823. This was extended (or improved) by several mathematicians. The
most famous approaches are the Riemann integral, introduced in 1854 by Bernhard Riemann
(1826–1866) and by now the classical approach for introducing an integral to students, and the
Lebesgue integral, introduced in 1902 by Henri Léon Lebesgue (1875–1941), which is the one
actually used in research. Here, we only briefly comment on the Riemann integral, and focus
on the definition of the more powerful Lebesgue integral.
6.1 Antiderivatives
The theory of the previous chapters was mostly done for functions defined on (open) intervals,
which was enough to present the most important results. However, all the definitions (and many
of the results) would also make sense for functions defined on the union of open intervals, by
considering the function separately on each interval, or even on more general domains. But note
that for some theorems, like the mean value theorem (Theorem 5.34), it was essential that the
domain was an interval.
In order to come closer to more formal (or general) statements of theorems, we will present the
following for a larger class of domains. This will also be necessary since we want to define the
integral also over general domains.
Recall that a set $\Omega \subseteq \mathbb{R}$ is called open if
\[ \forall x \in \Omega\ \exists \varepsilon > 0 : U_\varepsilon(x) \subset \Omega, \]
and closed if its complement $\Omega^c$ is open.
Clearly, open intervals (a, b) and Ω = R are open. Also sets of the form (a, ∞), and their unions,
are open. Therefore, also R \ {0} = (−∞, 0) ∪ (0, ∞) is open. Similarly, we obtain that closed
intervals [a, b] are closed since ([a, b])c = R \ [a, b] = (−∞, a) ∪ (b, ∞) is open. Therefore, also
sets {a}, that contain only one element, are closed.
Remark 6.3. By this notation we can present the results in more generality, and without
specifying a specific form of the domain. Note that open sets Ω are exactly those sets, where
we can define the derivative of a function at every x ∈ Ω. (We had problems with the boundary
points!) However, if we consider in the following an open (or closed) set Ω, you may just think
about an open (or closed) interval, or unions of them.
This definition seems easy to handle, since all of us have already computed many derivatives. However,
when we compute the derivative of a differentiable function, we always end up with a (unique)
function, and we have a nice point-wise criterion for deciding whether a function is differentiable.
This is now different since we want to find a function F , but we only have information about
its derivative F 0 = f . This is not enough to end up with a unique antiderivative F . To see this,
note that knowing the slope in each point does not give any information about the function
values at all. This is because a function with the same derivative might be at any “height”. Let
us write this down mathematically. For any functions $f$ and $F$, and $c \in \mathbb{R}$, we have that
\[ F \text{ is an antiderivative of } f \iff F + c \text{ is an antiderivative of } f. \]
For this, we only used that the derivative of a constant function equals zero. In particular, if a
function has an antiderivative, then it has infinitely many.
Remark 6.5. Note that the notation $\int f(x)\,dx$, which shall denote a function, might be confusing,
because it does not allow for the direct use of function values. E.g., we would never write
$\int f(x)\,dx\,(2)$ for $F(2)$ or so. Moreover, the correct meaning of $F = \int f(x)\,dx$ is just “$F$ is an
antiderivative of $f$”, which is not an actual equality; rather, the derivatives of both sides have to
coincide everywhere. However, this notation is useful when we want to talk about (properties
of) the antiderivative as a function, since we do not have to reserve/waste a new letter.
Let us start with the easy example of the exponential function ex , which does not change under
differentiation.
Example 6.6. For the exponential function, we know that $(e^x)' = e^x$, and therefore that
$F(x) = e^x$ is one possible antiderivative, i.e.,
\[ \int e^x\,dx = e^x. \]
However, if one asks for all antiderivatives of $e^x$, then we have to take $F(x) = e^x + c$ for arbitrary
$c \in \mathbb{R}$, i.e.,
\[ \int e^x\,dx = e^x + c. \]
In most applications, it is enough to know just one of the antiderivatives, and therefore we
mostly omit the constant $c$. However, keep in mind that an antiderivative is not unique.
Moreover, note that the two equations above combined clearly do not imply that $e^x = e^x + c$ for
every $x$. The equal signs should be interpreted as “the derivative of the right hand side is $e^x$”.
Example 6.7. Now consider $f(x) = \frac{1}{x}$ on $\Omega = \mathbb{R} \setminus \{0\}$, and we show that
\[ \int \frac{dx}{x} = \int \frac{1}{x}\,dx = \ln|x|. \]
First of all, we know from the last chapter that $(\ln(x))' = \frac{1}{x}$ for all $x > 0$. Hence, $F(x) = \ln(x)$
for $x > 0$. But $\ln(x)$ is not defined for $x < 0$ and therefore, it is not obvious how to choose $F(x)$
such that $F'(x) = \frac{1}{x}$. But it is easy to verify that, for $x < 0$, the function $\ln|x| = \ln(-x)$ is well
defined and $(\ln(-x))' = \frac{1}{-x} \cdot (-1) = \frac{1}{x}$. This proves the claim.
This is already an example showing that it might be hard to find the antiderivative of a
given function, but easy to verify that a function is an antiderivative.
(Hint: Always double-check your antiderivative by calculating its derivative!)
Next we provide a list of antiderivatives which we will use from now on. All of them follow by
differentiating the right hand side. (Do this again as an exercise!)
ax
Z
ax dx = , a > 0, a 6= 1
ln a
xa+1
Z
xa dx = , a 6= −1
a+1
dx
Z
= ln |x|
Z x
cos x dx = sin x
Z
sin x dx = − cos x
dx
Z
= tan x
cos2 x
dx
Z
= − cot x
sin2 x
dx
Z
= arctan x
1 + x2
dx
Z
√ = arcsin x
1 − x2
dx
Z
√ = arcosh x
x2 − 1
dx
Z
√ = arsinh x
x2 + 1
R
All the antiderivatives f dx above exist on the whole domain where f is defined.
However, not all functions have a antiderivative, as the following example shows.
Example 6.8. If we consider the function f : R → R with f (x) = 1 for x ≥ 0, and f (x) = 0
for x < 0, i.e., the Heaviside function, then it is not hard to see that an antiderivative must be
constant, say F (x) = c, for x < 0, and linear, say F (x) = x + b for x > 0. Otherwise we would
not get F 0 (x) = f (x) for x 6= 0. It remains to consider x = 0. Since F has to be differentiable,
it has to be continuous, and we obtain that b = c. However, it is easy to check that such a
function F cannot be differentiable at 0. (F has a kink.)
As for derivatives we will now present some rules that are useful when we want to find the
antiderivative of a complicated function, which is composed of some elementary functions, like
the ones given above.
However, and unfortunately, although we were able to determine the derivative for nearly any
combination of ’easy’ functions, it is much harder to find an antiderivative. In fact, a very
common strategy is to guess an antiderivative, and then to verify it by calculating its derivative.
For this, one clearly needs to be well-practiced in calculating derivatives. Moreover, it is
sometimes just impossible to determine a closed formula for the antiderivative, even for ’easy
looking’ functions like $e^{-x^2}$. We will discuss $\int e^{-x^2}\,dx$ and similar functions later in more detail.
The first calculation rule for antiderivatives, which directly follows from the corresponding rules
for derivatives, is the linearity.
Lemma 6.9 (Linearity). Let $F = \int f(x)\,dx$ and $G = \int g(x)\,dx$. Then, for all $\alpha, \beta \in \mathbb{R}$,
\[ \alpha F + \beta G = \alpha \int f(x)\,dx + \beta \int g(x)\,dx = \int \bigl(\alpha f(x) + \beta g(x)\bigr)\,dx. \]
Proof. We only have to verify that the derivative of the function on the left equals the one in
the integral on the right. By Theorem 5.11, we obtain
\[ (\alpha F + \beta G)' = \alpha F' + \beta G' = \alpha f + \beta g, \]
since $F' = f$ and $G' = g$.
For example,
\[ \int (x^3 + x^2)\,dx = \int x^3\,dx + \int x^2\,dx = \frac{x^4}{4} + \frac{x^3}{3}. \]
In particular, all antiderivatives of $x^3 + x^2$ are of the form $F(x) = \frac{x^4}{4} + \frac{x^3}{3} + c$ for some $c \in \mathbb{R}$.
\[ \ldots = \frac{x^2}{2} + \frac{4 x^{5/2}}{5} + \frac{x^3}{3}. \]
In some cases, one may need some modifications of the integrand to bring it to the right form.
Example 6.12.
\[ \int \frac{x^2 - x^4}{1 - x^4}\,dx = \int \frac{1 - x^4 - (1 - x^2)}{1 - x^4}\,dx = \int \Bigl(1 - \frac{1 - x^2}{(1-x^2)(1+x^2)}\Bigr)\,dx = \int 1\,dx - \int \frac{1}{1+x^2}\,dx = x - \arctan x. \]
In the same way as we used linearity of differentiation above, we can also utilize the other
calculation rules from Section 5.1 to deduce rules for antiderivatives.
Recall that the product rule states that
\[ (fg)' = f'g + fg' \]
whenever the derivatives exist, see Theorem 5.13. Since this equality shows that $fg$ is an
antiderivative of $f'g + fg'$, we obtain the rule of integration by parts (or partial integration),
\[ \int f'(x)\,g(x)\,dx = f(x)\,g(x) - \int f(x)\,g'(x)\,dx, \]
by rearranging the terms.
Let us see an example that shows how this rule is usually applied.
Example 6.14. Assume we want to compute $F = \int \ln(x)\,dx$, i.e., an antiderivative of $\ln(x)$.
Since we know the derivative of $\ln$, i.e. $(\ln(x))' = \frac{1}{x}$, we may choose $g(x) = \ln(x)$ above.
Additionally, we choose $f(x) = x$, which satisfies $f'(x) = 1$. We obtain
\[ \int \ln(x)\,dx = \int 1 \cdot \ln(x)\,dx = \int f'g\,dx = fg - \int g'f\,dx = x\ln(x) - \int x \cdot \frac{1}{x}\,dx = x\ln(x) - x = x(\ln(x) - 1). \]
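The earlier hint (“double-check your antiderivative by differentiating”) can even be automated: a symmetric difference quotient is a cheap numerical stand-in for the derivative. A sketch (plain Python, not from the notes; step size and tolerances chosen ad hoc):

```python
import math

def num_derivative(F, x, h=1e-6):
    # symmetric difference quotient, approximates F'(x) up to O(h^2)
    return (F(x + h) - F(x - h)) / (2 * h)

F = lambda x: x * (math.log(x) - 1)  # candidate antiderivative from above
f = lambda x: math.log(x)            # the integrand

for x in [0.5, 1.0, 2.0, 10.0]:
    print(x, num_derivative(F, x), f(x))  # the last two columns agree
```

This does not replace the symbolic check, but it catches sign errors and wrong constants in seconds.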
The same ’trick’ is very useful for integrating products of polynomials with sine, cosine or
exponential functions. However, sometimes one has to integrate several times to obtain an explicit
formula for the antiderivative.
Example 6.15. If we want to calculate $\int x \sin x\,dx$, we set $f'(x) = \sin x$ and $g(x) = x$, which
yields $f(x) = -\cos x$ and $g'(x) = 1$, and therefore
\[ \int x \sin x\,dx = -x\cos x + \int \cos x\,dx = -x\cos x + \sin x. \]
Example 6.16. If we want to calculate $\int x^2 e^x\,dx$, we set $g(x) = x^2$ and $f'(x) = e^x$, which
yields
\[ \int x^2 e^x\,dx = x^2 e^x - 2\int x\,e^x\,dx. \]
This still involves the integral $\int x\,e^x\,dx$, which we calculate by setting $g(x) = x$ and $f'(x) = e^x$,
and therefore
\[ \int x^2 e^x\,dx = x^2 e^x - 2x e^x + 2e^x. \]
All the examples above involve polynomials (or more precisely, monomials), which vanish after
enough differentiation. This is not the case if we consider the product of a trigonometric and an
exponential function. In such cases, it may happen that we reach the same integral again after
some steps of partial integration. This can be used to calculate the antiderivative.
We set $f'(x) = \cos(x)$ and $g(x) = 2^x$, which yields $f(x) = \sin(x)$ and $g'(x) = \ln(2) \cdot 2^x$.
Therefore,
\[ \int \cos(x)\,2^x\,dx = \sin(x)\,2^x - \ln(2) \int \sin(x)\,2^x\,dx. \]
A second partial integration, now with $f'(x) = \sin(x)$ and $g(x) = 2^x$, gives
\[ \int \sin(x)\,2^x\,dx = -\cos(x)\,2^x + \ln(2) \int \cos(x)\,2^x\,dx, \]
so the same indefinite integral appears on both sides. Rearranging, and dividing by $(1 + \ln(2)^2)$,
leads to
\[ \int \cos(x)\,2^x\,dx = \frac{\sin(x) + \ln(2)\cos(x)}{1 + \ln(2)^2}\, 2^x. \]
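Since this computation involved two partial integrations and a rearrangement, a numerical spot check is reassuring. The sketch below (illustrative Python, not from the notes) differentiates the claimed antiderivative with a difference quotient:

```python
import math

def deriv(F, x, h=1e-6):
    # symmetric difference quotient approximating F'(x)
    return (F(x + h) - F(x - h)) / (2 * h)

ln2 = math.log(2)
F = lambda x: (math.sin(x) + ln2 * math.cos(x)) / (1 + ln2**2) * 2**x
f = lambda x: math.cos(x) * 2**x

for x in [-1.0, 0.0, 1.0, 3.0]:
    print(x, deriv(F, x), f(x))  # both columns agree up to discretization error
```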
Example 6.18. Show that $\int \sin(x)\,e^x\,dx = e^x\,\frac{\sin(x) - \cos(x)}{2}$.
The next rule we want to employ is the chain rule, see Theorem 5.14. For this, recall that for
two differentiable functions $F$ and $g$, we have
\[ (F \circ g)'(x) = \frac{d}{dx}\, F(g(x)) = F'(g(x)) \cdot g'(x). \]
If $F$ is an antiderivative of $f$, then this shows that $F \circ g$ is an antiderivative of $(f \circ g) \cdot g'$. This
is called the substitution rule.
Lemma 6.19 (Substitution rule). Let $F = \int f(x)\,dx$ and $g$ be a differentiable function. Then,
\[ F(g(x)) = \int f(g(x))\,g'(x)\,dx. \]
(This may also be done by partial integration, but would take ages.)
We see that the difficult part is the “$x^7 + 1$” in the cosine. Let's write $g(x) = x^7 + 1$. Then, we
have $g'(x) = 7x^6$. If we now write $f(x) = \cos(x)$, we obtain
\[ \int x^6 \cos(x^7 + 1)\,dx = \frac{1}{7} \int g'(x)\,f(g(x))\,dx. \]
The integral on the right hand side equals $F(g(x))$ by the substitution rule, where $F(x) = \sin(x)$
is the antiderivative of $f$. We obtain
\[ \int x^6 \cos(x^7 + 1)\,dx = \frac{1}{7}\, F(g(x)) = \frac{\sin(x^7 + 1)}{7}. \]
The substitution rule is sometimes easier to handle if we introduce the substitution $t = g(x)$.
If we then use the very informal reasoning that
\[ \frac{dt}{dx} = \frac{d}{dx}\, t = \frac{d}{dx}\, g(x) = g'(x), \]
and that this “implies” $dt = g'(x)\,dx$ ($\iff dx = \frac{dt}{g'(x)}$), then we can write
\[ \int f(g(x))\,g'(x)\,dx = \int f(t)\,dt = F(t). \]
Note that $\int f(t)\,dt$ is now a function in $t$, and we have to replace $t = g(x)$ at the end. Although
the above arguments were rather informal, we know that this formula is correct from
the substitution rule.
(Again, one could expand the brackets and integrate all the easy terms, which is rather lengthy.)
With the substitution $t = x^3 - 5$, we have $\frac{dt}{dx} = 3x^2$, and therefore $x^2\,dx = \frac{dt}{3}$. We obtain
\[ \int (x^3 - 5)^6 \cdot x^2\,dx = \int t^6\, \frac{dt}{3} = \frac{t^7}{21}. \]
If we finally substitute $t = x^3 - 5$, we obtain
\[ \int (x^3 - 5)^6 \cdot x^2\,dx = \frac{(x^3 - 5)^7}{21}. \]
Example 6.22. There are also quite tricky examples, which can be solved by substitution.
Consider for example the function $f(x) = \sqrt{1 - x^2}$ for $x \in [0,1]$, and we want to compute its
antiderivative $\int \sqrt{1 - x^2}\,dx$. Here, we use the substitution the other way around and define
$x = \sin(t)$ (which is equivalent to $t = \arcsin(x)$) for $t \in [0, \frac{\pi}{2}]$. We obtain $\frac{dx}{dt} = \cos(t)$, and therefore
\[ \int \sqrt{1 - x^2}\,dx = \int \sqrt{1 - \sin^2(t)} \cdot \cos(t)\,dt. \]
Since $\cos^2(t) + \sin^2(t) = 1$, and $\sqrt{\cos^2(t)} = |\cos(t)| = \cos(t)$ for $t \in [0, \frac{\pi}{2}]$, we get
\[ \int \sqrt{1 - x^2}\,dx = \int \cos^2(t)\,dt \quad \text{for } x = \sin(t). \]
From this equation we get $\int \cos^2(t)\,dt = \frac{1}{2}\bigl(\sin(t)\cos(t) + t\bigr)$. Putting this in the equation above,
re-substituting $x = \sin(t)$ (or $t = \arcsin(x)$), and noting that $\cos(\arcsin(x)) = \sqrt{1 - x^2}$ (Try to
prove this!), we obtain
\[ \int \sqrt{1 - x^2}\,dx = \frac{1}{2}\Bigl( x\sqrt{1 - x^2} + \arcsin(x) \Bigr). \]
Note that, although the left hand side looks elementary, its antiderivative could involve functions
that one would not expect there, e.g., arcsin. We will see later how this integral is related to
the area of a circle.
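The remark about the circle can be made concrete: on $[0,1]$ the graph of $\sqrt{1-x^2}$ bounds a quarter of the unit disc, so the antiderivative should yield the area $\pi/4$. A sketch (plain Python, not from the notes; the Riemann-sum cross-check anticipates the definition of the integral given later in this chapter):

```python
import math

# the antiderivative derived above
F = lambda x: 0.5 * (x * math.sqrt(1 - x * x) + math.asin(x))

area = F(1.0) - F(0.0)
print(area, math.pi / 4)  # both ~ 0.785398...

# cross-check with a crude left-endpoint Riemann sum
n = 10**5
riemann = sum(math.sqrt(1 - (k / n)**2) for k in range(n)) / n
print(riemann)
```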
There is one particularly useful rule that follows directly from the substitution rule. One may
even deduce it directly from the chain rule by noting that, for a differentiable function $f$, we
have
\[ \bigl(\ln|f(x)|\bigr)' = \frac{f'(x)}{f(x)}, \]
whenever $f(x) \ne 0$. We obtain the following corollary.
\[ \int \frac{f'(x)}{f(x)}\,dx = \ln|f(x)|. \]
To see that this result follows from the substitution rule, consider the substitution $t = f(x)$.
Example 6.24. The last formula is always useful when one wants to find the antiderivative of
a rational function such that the numerator is the derivative of the denominator.
An easy application is
\[ \int \frac{x}{1+x^2}\,dx = \frac{1}{2} \int \frac{2x}{1+x^2}\,dx = \frac{1}{2}\ln(1 + x^2), \]
where we set $f(x) = 1 + x^2$, which gives $f'(x) = 2x$, and omit the absolute value, because $f > 0$.
Another application is
\[ \int \tan(x)\,dx = \int \frac{\sin(x)}{\cos(x)}\,dx = -\ln|\cos(x)|. \]
For this, let $f(x) = \cos(x)$, which implies that $f'(x) = -\sin(x)$.
Remark 6.25. Let us finally recall again that an antiderivative is not unique: it is just a
function whose derivative satisfies a certain equation. However, it can be unique if we know in advance
that it satisfies additional conditions. In particular, one function value is enough. For example,
if we are looking for an antiderivative $F$ of $e^x$ such that $F(0) = 0$, then we obtain $F(x) = e^x - 1$.
(Verify yourself!)
This is a special case of so-called initial value problems (or boundary value problems):
For given $f : I \to \mathbb{R}$, $x_0 \in I$ and $y_0 \in \mathbb{R}$, we want to find $F$ such that $F' = f$ and $F(x_0) = y_0$.
One may think about the following example: Let $F(t)$ be the position of a train (or car, or
particle) at time $t$ moving on the real line. Then, $f(t) = F'(t)$ is the velocity. If we assume
that only $f$ is given, i.e., we know only the velocity at every point in time, then we can clearly
say something about the overall distance traveled, say, between time $t = 0$ and $t = 1$. However, we
have no chance to say where the train actually is at time $t = 1$ (or any other time) if we do
not know where the train departed.
As noted in the beginning of this chapter, antiderivatives are very much connected to the integration
of a function. That is, for a given function $f : \mathbb{R} \to [0, \infty)$, and a given interval $[a,b]$,
the integral $\int_a^b f(x)\,dx$ is the area between the $x$-axis and the graph of $f$, i.e., the area of the set
$\{(x,y) \in \mathbb{R}^2 : a < x < b,\ 0 < y < f(x)\}$, see Figure 41.
Figure 41: The area of the shaded region is denoted by $\int_a^b f(x)\,dx$
Clearly (and as we all hopefully already got used to), we have to define precisely what this
means. In particular, we have to clarify which functions are integrable, and which functions (or
sets) do not allow for the definition of a meaningful area.
Remark 6.26. It might come as a surprise that there are sets such that it is not possible to assign
them an area (or a volume in higher dimensions). This corresponds to the existence of functions
such that we just cannot say how big the area below the graph is. However, we were already
in a similar situation when we discussed in Chapter 5 that not all functions are differentiable
(consider, e.g., $f(x) = |x|$ at $x = 0$). And, similarly to there, the answer to “Is $f$ integrable?”
very much depends on the precise definitions. As we discussed in Remark 6.1, there were several
attempts to obtain “the most general” or “the most useful” definition of an integral. (Presently,
we consider the Lebesgue integral, which we will introduce later, the “best”. But who knows...)
However, it was a great advance in mathematics to understand that there cannot be a universal
definition that assigns an integral to any function. And this insight has a rather fascinating
reformulation, if we talk about volumes in three dimensions:
The Banach–Tarski paradox states that we can split a ball into five disjoint pieces and, without
deforming any of them, put them back together in a different way to obtain two identical
copies of the same ball. In particular, if we were able to assign a volume to each of the
(impossible to draw) pieces, then this construction would show that the volume of the ball equals
twice the volume of the same ball. An obvious contradiction. A proof of this statement (which
appeared first in 1924) goes far beyond the scope of this lecture.
It is obvious, why this is called a paradox: The same is clearly not possible in our real (physical)
world, as we cannot just double a ball. This is one of the most prominent examples of a very
counterintuitive mathematical result and shows that we have to be careful with our definitions.
We now introduce one possibility of defining an integral. Actually, this is probably the most
simple and straightforward definition, which is therefore restricted to “easy” functions. Later,
when we need more powerful mathematical tools, we will have to give a more involved definition.
However, note that for “easy” functions (that one can draw) all these
different definitions should give the same result.
Let us consider a continuous function f : Ω → R, and a closed interval [a, b] ⊂ Ω. (Note that
f is therefore bounded on [a, b].) If we now want to calculate the area between the graph of f
and the x-axis, then we could divide the interval [a, b] into equal subintervals, and approximate
the area in this subinterval just by the area of a suitable rectangle. Clearly, there are many
reasonable choices. Figure 42, e.g., shows the (bad) approximation of the integral by using the
smallest and the largest rectangle in each interval (when we divide it only into four subintervals).
Although the resulting approximation of the integral might be quite different, this difference
gets smaller and smaller if we increase the number of subintervals, see Figure 43.
This suggests that it is actually unimportant which of the rectangles we take, and we will prove
that this is in fact the case when we consider continuous functions on a closed interval.
Therefore we may just use the rectangles whose height is given by the left endpoint of the
subinterval. Note that, in the case of monotonically increasing functions, these are the same as
the lower rectangles, see Figure 42(left).
Remark 6.27. Note that all the reasoning also makes sense for possibly negative functions.
In this case, the area between f and the x-axis is counted negatively. In particular, if the integral
of a function is zero, then this only means that the area above the x-axis equals the area below.
Note that in the special case $[a,b] = [0,1]$, this partition has the simple form $\bigcup_{k=0}^{n-1} \bigl[\frac{k}{n}, \frac{k+1}{n}\bigr]$.
For illustration purposes, let us stick to the case $[a,b] = [0,1]$. To approximate the integral
of a function $f : [0,1] \to \mathbb{R}$, we first consider the first subinterval $[0, \frac{1}{n}]$. In this interval, we
approximate the area below the graph by the area of the rectangle $[0, \frac{1}{n}] \times [0, f(0)]$, which is
clearly $\frac{1}{n} \cdot f(0)$. See again Figure 42, where this area is just zero. We then consider
the second subinterval $[\frac{1}{n}, \frac{2}{n}]$, approximate the area by the rectangle $[\frac{1}{n}, \frac{2}{n}] \times [0, f(\frac{1}{n})]$, which is
$\frac{1}{n} f(\frac{1}{n})$, and so on. If we add up the areas of all these rectangles, and call the sum $Q_n(f)$, we
obtain
\[ Q_n(f) := \frac{1}{n} \sum_{k=0}^{n-1} f\Bigl(\frac{k}{n}\Bigr). \tag{6.1} \]
In the same way, for a general interval $[a,b]$, we consider the sums
\[ \frac{b-a}{n} \sum_{k=0}^{n-1} f\Bigl(a + \frac{k(b-a)}{n}\Bigr). \]
To assign the function $f$ its integral over $[a,b]$, it remains to show that these sums converge if
we make $n$ larger and larger. This is given in the following lemma. To keep things simple, we
only show the case $[a,b] = [0,1]$. The general case can be proven along the same lines.
Moreover, as we already discussed above, our choice of the left endpoint to determine the height
of the rectangles was somewhat arbitrary, and we have to justify that it is indeed irrelevant. For
this, we show that one might also take the smallest or the largest of these rectangles in each
subinterval, and the result would still be the same. For this, define the lower sums
\[ L_n(f) := \frac{1}{n} \sum_{k=0}^{n-1} \min\Bigl\{ f(x) : x \in \Bigl[\frac{k}{n}, \frac{k+1}{n}\Bigr] \Bigr\} \]
and the upper sums
\[ U_n(f) := \frac{1}{n} \sum_{k=0}^{n-1} \max\Bigl\{ f(x) : x \in \Bigl[\frac{k}{n}, \frac{k+1}{n}\Bigr] \Bigr\}. \]
(Note that the minima and maxima exist, since f is continuous on the closed (sub-)intervals.)
Consider again the above pictures where the lower and upper sums are depicted.
In particular, we have
\[ L_n(f) \le Q_n(f) \le U_n(f) \]
for every continuous function. (If this is not obvious to you, verify it!) So, if $L_n(f)$ and $U_n(f)$
converge to the same value, then (by the sandwich rule) $Q_n(f)$ also converges to that value.
\[ \lim_{n\to\infty} L_n(f) = \lim_{n\to\infty} U_n(f). \]
In particular, both limits exist. Therefore, also the sequence $(Q_n(f))_{n\in\mathbb{N}}$ converges.
Proof. To prove that the limits of $L_n(f)$ and $U_n(f)$ are equal, we will show that the difference
$U_n(f) - L_n(f)$ converges to zero, i.e., that for all $\varepsilon > 0$ there is some $n_0 \in \mathbb{N}$ such that
$|U_n(f) - L_n(f)| < \varepsilon$ for all $n \ge n_0$. First of all, note that $f$ is continuous on a closed interval,
and therefore uniformly continuous, see Theorem 4.65.
Let us now fix some $\varepsilon > 0$. By the uniform continuity we obtain that there is some $\delta > 0$ such
that $|x - y| < \delta$ implies $|f(x) - f(y)| < \varepsilon$. If we now take some $n_0 > \frac{1}{\delta}$, we see that $\frac{1}{n} \le \frac{1}{n_0} < \delta$
for every $n \ge n_0$. Since the interval $[\frac{k}{n}, \frac{k+1}{n}]$ has length $\frac{1}{n}$, we obtain that $|f(x) - f(y)| < \varepsilon$ for
all $x, y \in [\frac{k}{n}, \frac{k+1}{n}]$, if $n \ge n_0$. (We use that $|x - y| < \delta$ for all such $x, y$ and $n$.) In particular,
$U_n(f) - L_n(f) \le \frac{1}{n} \sum_{k=0}^{n-1} \varepsilon = \varepsilon$ for all $n \ge n_0$, which proves the claim.
By the above lemma, it does not matter which specific points we choose in the respective
intervals. We always obtain the same limit. Therefore, we can define the integral of a function
as the limit of the given average of the function values.
We call
\[ \int_a^b f(x)\,dx := \lim_{n\to\infty} \frac{b-a}{n} \sum_{k=0}^{n-1} f\Bigl(a + \frac{k(b-a)}{n}\Bigr) \]
the (definite) integral of $f$ over $[a,b]$. We call $a$ and $b$ the limits of the integral.
Note that $\int_a^b f(x)\,dx$ (if it exists) is a number, i.e., the area between the graph and the $x$-axis.
Therefore, the “$x$” is just a placeholder, and one may use any other letter. That is, e.g.,
\[ \int_a^b f(x)\,dx = \int_a^b f(t)\,dt = \int_a^b f(\xi)\,d\xi = \ldots \]
We sometimes even omit the integration variable, and write $\int_a^b f\,dx = \int_a^b f(x)\,dx$.
Although the definition of the integral looks rather simple, it is not a very practical one. The
involved limit is usually hard to determine. However, we will see in the following section that
the integral can be given in terms of the antiderivative of a function. This is also the typical way
of calculating integrals, and justifies the similarity of the notations. But always bear in mind
that antiderivatives are functions (more precisely: classes of functions) and not just a number.
From the formula $\sum_{k=0}^{n-1} k = \frac{(n-1)n}{2}$ (which is called “Gaußsche Summenformel” in German, but
doesn't seem to have a name in English) we then obtain
\[ \int_0^1 f(x)\,dx = \int_0^1 x\,dx = \lim_{n\to\infty} \frac{1}{n^2} \cdot \frac{(n-1)n}{2} = \lim_{n\to\infty} \frac{1 - \frac{1}{n}}{2} = \frac{1}{2}. \]
From the definition of the integral as a limit, we obtain that it satisfies a list of rules, like
linearity. Most of them may even already be clear from the “graphical definition”. We state
them without a proof.
There is one case of the above lemma that is particularly important. In analogy to the very
similar inequality for (finite) sums, this is also called triangle inequality.
Proof. Clearly, f ≤ |f | and −f ≤ |f |. The inequality therefore follows from |x| = max{x, −x},
Lemma 6.31(4) with λ = −1, and Lemma 6.31(6).
Let us finally state some remarks on difficulties with and variants of the above definition.
which equals the true area. (Note that Lemma 6.28 holds also for such functions.)
We will comment on such “piecewise functions” in detail in Section 6.6.
If we consider the Dirichlet function $\chi_{\mathbb{Q}}$ instead, i.e., the indicator function of the rational numbers, then, with our definition, we would obtain $\int_0^1 \chi_{\mathbb{Q}}\,dx = 1$, because all the function values we compute are at rational points. And therefore, $\int_0^1 \chi_{\mathbb{R}\setminus\mathbb{Q}}\,dx = 1 - \int_0^1 \chi_{\mathbb{Q}}\,dx = 0$.
However, this is unsatisfactory (and in contrast to our intuition), since there are more irrational
than rational numbers. One might check that Lemma 6.28 fails for this function. Hence, the
outcome of our “integration procedure” depends on the chosen function values, and we have to
be careful how we define an integral.
To solve this issue (at least partially) one must be more careful and think about a proper definition of integrable functions. One could then define the integral for a much larger class of functions.
(Still not for all functions, see Remark 6.26.) We will talk about a more powerful definition, i.e., the Lebesgue integral, later. But, as stated many times, every generalization of the above concept should lead to the same result when applied to a continuous function.
Remark 6.34 (Improper integrals). In the definitions above it was essential that we talk about
continuous (and therefore bounded) functions on a closed (and therefore bounded) interval. This
implies that all the rectangles that were used for the approximation of the integral have finite
size. Otherwise, the above definition is clearly useless. However, this can be relaxed a bit by
considering improper integrals. With this, it is also possible to define the integral of some
unbounded functions on unbounded intervals, i.e., we may determine the area of unbounded
sets. We will shortly come back to this when we know how to compute integrals fast.
Remark 6.35. The "Q" in $Q_n(f)$ in (6.1) stands for "quadrature rule". This is what such averages over function values are usually called. Quadrature rules are also used very much in practice, but then one sometimes has to think about "more clever" averages. In particular, this is important as we cannot compute the above limits in general, and therefore have to work with finite $n$, which leads to errors that we might want to minimize. However, as we only work with the limits here, Lemma 6.28 shows that such optimizations are not needed.
Remark 6.36. Even if we worked with more clever choices of points in the definition of the quadrature rule, and therefore ultimately in the definition of the integral, this would still not be enough for a "satisfactory definition". (We do not want to comment here on what this means exactly.) One of the first definitions that met these standards is the Riemann integral, see Remark 6.1. For this definition we not only have to consider arbitrary points in each subinterval, we also have to consider arbitrary subintervals (i.e., partitions) of the given interval. As this is more technical than needed here, we skip the precise definitions and basic results, which can be found in most beginners' books on calculus.
We now turn to the fundamental theorem of calculus, which provides us with an easier way of
determining integrals, and which shows the existence of antiderivatives for continuous functions.
Recall that a differentiable function F is an antiderivative of f if and only if F 0 = f . We start
with the following additional result, which is of independent interest.
Theorem 6.37 (Mean value theorem for definite integrals). Let $f, g : [a,b] \to \mathbb{R}$ be continuous functions, and assume that $g \ge 0$. Then, $\int_a^b f(x)g(x)\,dx$ exists and there exists some $\xi \in [a,b]$ such that
$$\int_a^b f(x)g(x)\,dx = f(\xi) \int_a^b g(x)\,dx.$$
In particular, we have (for $g = 1$) that
$$\frac{1}{b-a}\int_a^b f(x)\,dx = f(\xi).$$
Proof. Since $f$ is continuous, it attains its extrema on $[a,b]$. Let us denote them by $m := \min_{x\in[a,b]} f(x)$ and $M := \max_{x\in[a,b]} f(x)$. It follows that $m\,g(x) \le f(x)g(x) \le M g(x)$ for all $x \in [a,b]$, and therefore
$$m \int_a^b g(x)\,dx \le \int_a^b f(x)g(x)\,dx \le M \int_a^b g(x)\,dx,$$
see Lemma 6.31. We define $I = \int_a^b g(x)\,dx$ and obtain
$$m \cdot I \le \int_a^b f(x)g(x)\,dx \le M \cdot I.$$
If I = 0, then g = 0 and any ξ ∈ [a, b] can be used to obtain the result. Otherwise we divide by
I and get
$$m \le \frac{1}{I}\int_a^b f(x)g(x)\,dx \le M.$$
Due to the intermediate value theorem $f$ attains every value in the interval $[m, M]$ (i.e., between its extreme values), so in particular the value $\frac{1}{I}\int_a^b f(x)g(x)\,dx$.
This is all we need to formulate the main result of this section. For this, we set
$$\int_a^b f\,dx := -\int_b^a f\,dx,$$
whenever $b < a$. (Note that we have defined the left hand side only for $a < b$.)
is an antiderivative of $f$, i.e., $F' = f$.
Moreover, for any $a, b \in I$ and any antiderivative $F$ of $f$, we have
$$\int_a^b f(x)\,dx = F(b) - F(a),$$
The mean value theorem implies the existence of some $\xi_h \in [x, x+h]$ such that
$$\frac{1}{h}\int_x^{x+h} f(y)\,dy = f(\xi_h)$$
for all $x \in I$. For the second part, we first plug in $a$ and $b$ into the antiderivative that we just defined and obtain
$$F(b) - F(a) = \int_a^b f(y)\,dy - \int_a^a f(y)\,dy = \int_a^b f(y)\,dy.$$
Moreover, note that different antiderivatives differ only by a constant. Therefore, the right hand
side is the same for any antiderivative of f .
Remark 6.39. The last theorem shows that integration gives us an antiderivative of a continu-
ous function f , and one may ask for an interpretation of this. One somehow sketchy possibility,
which originates from a graphical point of view, is the following:
If we start at $F(a)$ and then add up all the changes that $F$ makes between $a$ and $b$ (if this were possible), then we arrive at $F(b)$. Now, these changes (or slopes) are precisely the values of the derivative of $F$, and summing all of them up is like integration. So, one might guess that $F(b) = F(a) + \int_a^b F'(x)\,dx$. But, since $F' = f$, this is exactly what the fundamental theorem of calculus is about.
With this very important and powerful theorem we can calculate many integrals easily. (At
least if you know many antiderivatives.) Let us consider again the example from above.
Example 6.40. Let us consider the function $f : [0,1] \to \mathbb{R}$ given by $f(x) = x$. We already showed in Example 6.30 that $\int_0^1 x\,dx = \frac{1}{2}$. This may also be shown by considering $F(x) := \frac{x^2}{2}$, which is an antiderivative of $f$. We therefore have from the fundamental theorem that
$$\int_0^1 x\,dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1^2}{2} - \frac{0^2}{2} = \frac{1}{2}.$$
We now consider some more complicated examples, which may be difficult to compute only with
the definition of the integral as a limit.
Example 6.41. We know from Example 6.10 the antiderivative $\int (x^3 + x^2)\,dx = \frac{x^4}{4} + \frac{x^3}{3}$ on $\mathbb{R}$. We therefore obtain, for example, the values
$$\int_0^1 (x^3 + x^2)\,dx = \left[\frac{x^4}{4} + \frac{x^3}{3}\right]_0^1 = \frac{1}{4} + \frac{1}{3} = \frac{7}{12},$$
or
$$\int_{-1}^1 (x^3 + x^2)\,dx = \left[\frac{x^4}{4} + \frac{x^3}{3}\right]_{-1}^1 = \frac{2}{3}.$$
(Try to find these values using the definition of the integral, or by any other means.)
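One way to check such values without the fundamental theorem is a numerical sketch (not part of the original notes; the helper `quadrature` is a name chosen for illustration): the quadrature rule from the definition should agree with $F(b) - F(a)$.

```python
# Cross-check Example 6.41: compare the left-endpoint quadrature rule with
# F(b) - F(a) for the antiderivative F(x) = x^4/4 + x^3/3 of f(x) = x^3 + x^2.
def quadrature(f, a, b, n=200_000):
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

f = lambda x: x**3 + x**2
F = lambda x: x**4 / 4 + x**3 / 3

on_01 = F(1) - F(0)     # exactly 7/12
on_m11 = F(1) - F(-1)   # exactly 2/3
```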
Example 6.42. We now want to compute definite integrals of the natural logarithm $\ln(x)$. From Example 6.14 we know that $\int \ln(x)\,dx = x(\ln(x) - 1)$ for all $x > 0$, i.e., all $x$ such that $\ln(x)$ is defined. From this, we obtain, e.g., the value
$$\int_1^e \ln(x)\,dx = \big[x(\ln(x) - 1)\big]_1^e = e(\ln(e) - 1) - (\ln(1) - 1) = 1.$$
These examples already show that, with the techniques we've just learned, we can calculate integrals (or areas) precisely without much effort. (However, it is again essential that you know many (anti)derivatives and how to obtain them.)
We will finally present (again) the rules for integration, that we already discussed in the section
about antiderivatives. Although they can be directly deduced from there, we state them again
for definite integrals to clarify their meaning.
Let us start with integration by parts, which is also called partial integration.
Proof. This follows from Lemma 6.13 together with Theorem 6.38.
Remark 6.44. Note again that we did not define the derivative at a boundary point. Hence,
whenever we assume a function to be “continuously differentiable on [a, b]”, we actually mean
that the function is “continuous on [a, b] and continuously differentiable on (a, b)”. We think
this should not lead to any confusion.
Example 6.45. If we want to calculate $\int_0^\pi x \sin x\,dx$, we set $f'(x) = \sin x$ and $g(x) = x$, which implies that $g'(x) = 1$ and $f(x) = -\cos x$, see Example 6.15. Partial integration yields
$$\int_0^\pi x \sin x\,dx = \big[(-\cos x)x\big]_0^\pi - \int_0^\pi (-\cos x)\,dx = \big[(-\cos x)x\big]_0^\pi + \big[\sin x\big]_0^\pi = -\cos(\pi)\cdot\pi + \sin\pi = \pi.$$
Example 6.46. Again, we can use this formula also more than once, see Example 6.16. However, the involved formulas for definite integrals can sometimes be much simplified, in contrast to antiderivatives. For example, if we want to calculate $\int_0^1 x^2 e^x\,dx$, we set $g(x) = x^2$ and $f'(x) = e^x$,
Corollary 6.47 (Substitution rule). Let $I = [a,b]$, $f$ be continuous and $g$ be continuously differentiable on $I$. Then,
$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(y)\,dy.$$
Note that, usually, we have a (complicated looking) integral like the one on the left and want to transform it into an easier one, like the one on the right. We will come to some examples soon.
Proof. Although this follows directly from Lemma 6.19 together with Theorem 6.38, we present
a proof here, because this one is slightly different and one can see where the fundamental theorem
comes in. First of all, we use the chain rule, see Theorem 5.14, to calculate
When we use this rule we can use the following (formally not completely correct) strategy:
1. We want to calculate $\int_a^b f(g(x))g'(x)\,dx$, i.e., we assume that the integral is of this form for some $f, g$.
2. Set $y = g(x)$.
3. Calculate $\frac{dy}{dx} = \frac{d}{dx} g(x) = g'(x)$. (Now check again if the integral is of the above form!)
4. Regroup to $dx = \frac{1}{g'(x)}\,dy$ and plug in.
5. Since the new integration variable is $y = g(x)$, we have to replace the limits of the integral by $g(a)$ and $g(b)$.
6. Consider the new integral $\int_{g(a)}^{g(b)} f(y)\,dy$.
This is of the form above with $f(x) = \sin(x)$ and $g(x) = 2x$. Following the above procedure, we set $y = 2x$ and calculate $\frac{dy}{dx} = 2$, which implies $dx = \frac{dy}{2}$. Hence,
$$\int_0^\pi \sin(2x)\,dx = \frac{1}{2}\int_0^{2\pi} \sin(y)\,dy = -\frac{1}{2}\big[\cos(y)\big]_0^{2\pi} = \frac{1}{2}\big(\cos(0) - \cos(2\pi)\big) = 0.$$
Once one gets used to this kind of substitution, it should be no problem to calculate integrals like
$$\int_0^2 \sin(7x - 11)\,dx = \frac{-\cos(3) + \cos(-11)}{7}$$
in a short time.
However, this is clearly wrong since the integral of a positive function must be positive. But
where is the mistake? The problem is that f is not continuous on [−1, 1], which is necessary for
the fundamental theorem. In fact, the function is not even defined at x = 0.
Although this looks like a drastic example, it also shows that it is important to verify all assumptions before we apply such a theorem. Otherwise, the result might be completely senseless.
So far we discussed how to calculate integrals (or areas below graphs) whenever the function and the corresponding interval are bounded. However, one might imagine that we can also calculate integrals of functions over unbounded intervals if the function is "small enough" for large $x$. By a similar reasoning, we can integrate functions with a pole at the boundary, i.e., functions that diverge to infinity at the boundary of the interval, if the divergence is "fast enough". We discuss both cases.
Let us start with unbounded intervals. In this case, we define the integrals
$$\int_a^\infty f\,dx := \lim_{b\to\infty} \int_a^b f\,dx,$$
$$\int_{-\infty}^b f\,dx := \lim_{a\to-\infty} \int_a^b f\,dx,$$
$$\int_{-\infty}^\infty f\,dx := \int_{-\infty}^0 f\,dx + \int_0^\infty f\,dx,$$
whenever the integrals and limits on the right hand side exist. We then say that the integrals on the left converge.
From the fundamental theorem of calculus, we know how to compute the finite integrals on the right hand side. That is, if $F$ is an antiderivative of $f$ (on the corresponding interval), then $\int_a^b f\,dx = F(b) - F(a)$. To compute the above limits, it is therefore enough to compute the limits for the antiderivative. That is, if we denote
$$F(-\infty) := \lim_{a\to-\infty} F(a) \qquad\text{and}\qquad F(\infty) := \lim_{b\to\infty} F(b),$$
then we obtain
$$\int_a^\infty f\,dx = [F]_a^\infty = F(\infty) - F(a),$$
$$\int_{-\infty}^b f\,dx = [F]_{-\infty}^b = F(b) - F(-\infty),$$
$$\int_{-\infty}^\infty f\,dx = [F]_{-\infty}^\infty = F(\infty) - F(-\infty),$$
whenever the corresponding limits exist.
Let us see some examples.
Example 6.52. Consider $f(x) = \frac{1}{x^\alpha} = x^{-\alpha}$ on $[1,\infty)$. An antiderivative of $f$ is given by $F(x) = \frac{1}{1-\alpha}\, x^{1-\alpha}$ for $\alpha \ne 1$, and $F(x) = \ln(x)$ for $\alpha = 1$. Noting that $F(x)$ converges to 0 when $x \to \infty$ if $\alpha > 1$, and diverges otherwise, we obtain
$$\int_1^\infty \frac{dx}{x^\alpha} = F(\infty) - F(1) = \begin{cases} \frac{1}{\alpha-1} & \text{for } \alpha > 1, \\ \infty & \text{for } \alpha \le 1. \end{cases}$$
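The convergence of these improper integrals can be seen numerically by letting the upper limit grow (a sketch, not part of the original notes; the helper `tail` is a name chosen for illustration):

```python
# Improper integral of x^(-alpha) over [1, infinity): for alpha > 1 the finite
# integrals F(b) - F(1) approach 1/(alpha - 1) as the upper limit b grows.
def tail(alpha, b):
    F = lambda x: x ** (1.0 - alpha) / (1.0 - alpha)  # antiderivative, alpha != 1
    return F(b) - F(1.0)

val2 = tail(2.0, 1e9)   # limit is 1/(2 - 1) = 1
val3 = tail(3.0, 1e9)   # limit is 1/(3 - 1) = 1/2
```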
From this example we can deduce the following general rule for improper integrals. Note that
we had a very similar statement for series, see Lemma 3.91.
Proof. Exercise.
By the above arguments, we see that it is not necessary for the antiderivative to be defined at
the limits of the integral. (Note that a function on R is never defined at ±∞. We can only
compute limits.) This can also be used if a function is defined, and has an antiderivative, on an
open interval (a, b), but not on the boundary points.
Example 6.55. Consider for example the functions $f(x) = \frac{1}{x}$ or $f(x) = \ln(x)$ on $(0,1)$. Both functions are continuous on $(0,1)$, and therefore have an antiderivative on $(0,1)$. However, for computing the integral $\int_0^1 f\,dx$ by using the fundamental theorem directly, we would need the values $F(0)$ and $F(1)$. But, since both functions are not even defined at 0, it makes no sense to ask for a function whose derivative equals $f$ at 0, i.e., there cannot be an antiderivative at 0.
Consider a continuous function $f : (a,b) \to \mathbb{R}$, and let $F : (a,b) \to \mathbb{R}$ be one of its antiderivatives. We then define
$$F(a) := \lim_{x\searrow a} F(x) \qquad\text{and}\qquad F(b) := \lim_{x\nearrow b} F(x),$$
and therefore
$$\int_0^1 \ln x\,dx = F(1) - F(0) = -1.$$
Example 6.58. Consider $f(x) = \frac{1}{x^\alpha} = x^{-\alpha}$ on $(0,1]$. An antiderivative of $f$ is given by $F(x) = \frac{1}{1-\alpha}\, x^{1-\alpha}$ for $\alpha \ne 1$, and $F(x) = \ln(x)$ for $\alpha = 1$. Therefore, $F(1) = \frac{1}{1-\alpha}$ for $\alpha \ne 1$. Noting that $F(x)$ converges to 0 when $x \to 0$ if $\alpha < 1$, and diverges otherwise, we obtain
$$\int_0^1 \frac{dx}{x^\alpha} = F(1) - F(0) = \begin{cases} \frac{1}{1-\alpha} & \text{for } \alpha < 1, \\ \infty & \text{for } \alpha \ge 1. \end{cases}$$
In particular, we obtain $\int_0^1 \frac{dx}{\sqrt{x}} = 2$ (for $\alpha = \frac{1}{2}$) and $\int_0^1 x\,dx = \frac{1}{2}$ (for $\alpha = -1$), but $\int_0^1 \frac{dx}{x}$ does not converge.
Remark 6.59. Note that, by the Examples 6.52 and 6.58, we have that $\int_0^\infty \frac{dx}{x^\alpha} = \infty$ for all $\alpha \in \mathbb{R}$.
Example 6.60. Show that the results of Examples 6.52 and 6.58 are equivalent by using the substitution rule with $f(x) = g(x) = \frac{1}{x}$.
Let us finally comment on functions which have some desired properties only piecewise.
Definition 6.61. Let $I = [a,b]$. We say that a function $f : I \to \mathbb{R}$ is piecewise continuous if and only if there exist a finite number of points $x_1, \dots, x_m \in I$ such that
1. $f$ is continuous on each of the open subintervals determined by $x_1, \dots, x_m$, and
2. the limits $\lim_{x\nearrow x_k} f(x)$ and $\lim_{x\searrow x_k} f(x)$ exist and are finite.
A simple example of a function for which such piecewise considerations might be necessary is the indicator function of an interval $[c,d] \subset \mathbb{R}$, i.e.,
$$\chi_{[c,d]}(x) := \begin{cases} 1, & \text{if } x \in [c,d], \\ 0, & \text{if } x \notin [c,d]. \end{cases}$$
However, one might also think about other piecewise defined functions, like
$$f(x) := \begin{cases} -x^2, & \text{if } x < 0, \\ 2x^2 + 1, & \text{if } x \in [0,1], \\ x, & \text{if } x > 1. \end{cases}$$
These functions are clearly not continuous on $\mathbb{R}$. However, when restricted to the individual "pieces" of the functions, they are continuous. (Actually, both functions are infinitely often differentiable on each subinterval.) Since also the needed (one-sided) limits are finite, both functions are piecewise continuous.
Now, to compute the integral of such piecewise continuous functions, we can just split the integral into the corresponding parts and then use the respective rules for calculating integrals. That is, if $f : [a,b] \to \mathbb{R}$ is a piecewise continuous function with exceptional points $x_1, \dots, x_m$, then we use
$$\int_a^b f\,dx = \int_a^{x_1} f\,dx + \sum_{k=1}^{m-1} \int_{x_k}^{x_{k+1}} f\,dx + \int_{x_m}^b f\,dx.$$
Since $f$ is now a continuous function on each subinterval, this expression is well-defined. For example, we easily obtain
$$\int_{-\infty}^\infty \chi_{[c,d]}\,dx = \int_{-\infty}^c 0\,dx + \int_c^d 1\,dx + \int_d^\infty 0\,dx = d - c.$$
However, note that the subintervals $(x_k, x_{k+1})$ are (by Definition 6.61) open intervals. Therefore, formally, we need to treat the integrals as improper integrals. The assumption about the one-sided limits ensures that these integrals always exist. (One might argue with a continuous extension of $f$ to the closed interval $[x_k, x_{k+1}]$.)
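The splitting strategy above can be sketched numerically for the indicator example (not part of the original notes; `quadrature` is an illustrative helper name):

```python
# Integrating the indicator of [c, d] piecewise: split the domain at the jump
# points c and d, integrate each continuous piece, and sum the results.
def quadrature(f, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

c, d = 2.0, 5.0
chi = lambda x: 1.0 if c <= x <= d else 0.0

# Only the middle piece contributes; the total should be close to d - c = 3.
total = quadrature(chi, -10.0, c) + quadrature(chi, c, d) + quadrature(chi, d, 10.0)
```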
Remark 6.62. The assumption about the one-sided limits could be relaxed a bit. However, since we want to say that "every piecewise continuous function is integrable", we need an assumption that excludes, e.g., $\frac{1}{x}$ from being piecewise continuous.
7 Fourier series
As discussed several times, it is the major task of natural and applied science to give an approx-
imation of reality (which is actually a complicated function). We have already used derivatives
to obtain approximations of functions by using Taylor’s theorem, see Theorem 5.53 and Corol-
lary 5.57. Although this is often very useful in theory, it has several drawbacks when it comes
to actual computations. First of all, the quality of the Taylor polynomial of a function is limited
by its smoothness, i.e., how often the function is differentiable, and the size of the domain. This
is clearly unsatisfactory, as an actual computational problem may not be “nice” in this respect.
For example, classification problems are naturally not concerned with smooth functions.
In this section we introduce and discuss Fourier series, which is a particularly famous way of writing functions as series based on certain integrals. The main idea is that functions can be written as superpositions (or sums) of wave functions, represented by certain multiples of cos and sin. That is, for many functions $f : [0,1) \to \mathbb{R}$, we can find numbers $a_k, b_k$ such that the trigonometric polynomials
$$\sum_{k=0}^n a_k \cos(2\pi k x) + b_k \sin(2\pi k x) \approx f(x)$$
are "good" approximations of $f$ for large enough $n$, where $a_k, b_k$ must depend on $f$, much like the coefficients in the Taylor polynomial. In particular, one may recover $f$ if we send $n \to \infty$.
Before we come to precise definitions, let us give some comments on the (partly recent) history
of the theory of Fourier series.
Remark 7.1 (History). Fourier series and the corresponding Fourier analysis are nowadays
of remarkable significance in a lot of applied sciences, especially in physics (acoustics, optics,
astrophysics) but also in signal processing, cryptography, oceanography and economics.
The first attempts of using trigonometric series for the approximation of functions date back
to 1740 when the mathematicians Daniel Bernoulli (1700–1782) and Jean-Baptiste le Rond
d’Alembert (1717–1783) discussed this possibility. But only when Jean Baptiste Joseph Fourier
(1768–1830) presented his famous work “Théorie analytique de la chaleur” (“The analytical the-
ory of heat”) in 1822, it became apparent how powerful these techniques are. Fourier managed
to solve the heat equation (in one dimension) by using the series, which are by now referred to
as Fourier series.
At this time it was generally thought that every continuous function can be written as such a series. However, one of the first actual convergence results is due to Peter Gustav Lejeune Dirichlet (1805–1859). In 1829 he proved that the Fourier series of a Lipschitz continuous function converges. In order to treat Fourier series, Bernhard Riemann (1826–1866) actually invented his definition of an integral (the Riemann integral) and discovered the so-called localization theorem in 1853. It took until 1876 for Paul du Bois-Reymond (1831–1889) to find a continuous function whose Fourier series diverges at some point. This was a big surprise at that time. However, in 1904 the Hungarian mathematician Leopold Fejér (1880–1959) could show that for any continuous function, the arithmetic means of the partial sums of its Fourier series converge. This means that we can recover any continuous function from its Fourier coefficients, but we need to be careful how to use them. This was a major breakthrough, and led to big advances in theoretical and applied sciences.
The final word on this problem was given only in 1966 when the Swedish mathematician Lennart
Carleson (1928–now) showed that the Fourier series of any square-integrable function converges
almost everywhere. (We will see later what this means.) This was a question posed in 1915 by
Nikolai Nikolajewitsch Lusin (1883–1950), and Carleson became world-famous for this.
As with all the theory so far, and all that will come, we need assumptions about the functions under consideration. Here, we will mostly assume that the functions $f : [0,1) \to \mathbb{R}$ are (piecewise) continuous. This is to ensure that the integrals that we use are well-defined. This assumption is not necessary for many claims, but we do not have the theoretical background to treat more general cases so far.
Moreover, as we want to write functions as sums of cosine and sine functions, which are obviously
periodic when considered as functions on the real line, it is natural to assume the same for the
functions under consideration.
Note that a periodic function is completely known if we know its function values on $[0,1)$. All other function values on $\mathbb{R}$ follow from periodicity. E.g., $f(0) = f(1)$ and $f(\frac{7}{3}) = f(\frac{1}{3})$ for periodic functions. That's why we also call functions defined on $[0,1)$ periodic functions, and mean by this its periodic extension to $\mathbb{R}$, i.e., we define $f(x) := f(\{x\})$ for $x \in \mathbb{R}\setminus[0,1)$, where $\{x\} := x - \lfloor x\rfloor \in [0,1)$ is the fractional part of $x$. See Figure 44 for the periodic extensions of the functions $f, g : [0,1) \to \mathbb{R}$ with $f(x) = x$ and $g(x) = \frac{1}{2} - |\{x\} - \frac{1}{2}|$, which are called sawtooth wave and triangle wave.
(Note that, by the periodic extension, we have f (2) = f (0) = 0 and not f (2) = 2.)
This is useful when describing properties of a function, which include the boundary points. For
example, the above functions show that, while both f and g are continuous on [0, 1), only the
triangle wave g is continuous when considered as a periodic function. We fix this notation in
the following definition.
Definition 7.3. We say that a periodic function f : [0, 1) → C has a property if and only
if its periodic extension to R, i.e., the function f : R → C with f (x) = f ({x}), has this
property.
For example, when we say that a periodic function $f$ is continuous, then this also implies that the function values at the boundary coincide, or, more precisely, that $\lim_{x\searrow 0} f(x) = \lim_{x\nearrow 1} f(x)$. The same statements hold for differentiability and so on. In particular, note that the left function in Figure 44, i.e., the sawtooth wave, is also (infinitely often) differentiable if considered as a function on $(0,1)$. However, as a periodic function it is not even continuous.
Let us come to the most important examples, which are called trigonometric monomials, in
correspondence to the (algebraic) monomials xk for k ∈ N.
Example 7.4 (Trigonometric monomials). The functions cos(2πkx) and sin(2πkx) are infinitely
often differentiable and 1-periodic functions for any k ∈ Z. Verify this yourself and make plots
of the functions for different k.
Remark 7.5 (Other periods). It is somewhat arbitrary to choose 1 as the length of a period. Another standard choice would be to consider $2\pi$-periodic functions, i.e., functions that satisfy $f(x) = f(x + 2\pi)$, like e.g. $\cos(x)$. We chose this normalization to ease the notation. More generally, one can study functions with an arbitrary period $\omega > 0$, i.e., with $f(x + \omega) = f(x)$ for all $x \in \mathbb{R}$, but note that, in this case, the function $x \mapsto f(\omega \cdot x)$ is always a 1-periodic function.
Although we considered so far mostly real-valued functions, it is very natural to study Fourier series directly for complex-valued functions. For this, note that every complex-valued function $f : \mathbb{R} \to \mathbb{C}$ can be written as $f(x) = u(x) + i \cdot v(x)$, where $u, v : \mathbb{R} \to \mathbb{R}$ are both real-valued, and $i = \sqrt{-1}$. We call such a function $f$ continuous/differentiable/etc., if the same is true for the real part $u$ and the imaginary part $v$.
An especially important class of periodic functions are the trigonometric polynomials. These are sums of the cosine and sine functions discussed in Example 7.4. Recall that (algebraic) polynomials were functions $p : \mathbb{R} \to \mathbb{R}$ of the form $p(x) = \sum_{k=0}^n a_k x^k$. We showed that, under certain assumptions, a function can be approximated quite well by using algebraic polynomials. This was Taylor's theorem 5.53.
We now want to have similarly simple “building blocks” also for periodic functions, and it
turns out that the functions cos(2πkx) and sin(2πkx) are suitable. However, as it simplifies the
notation a lot and is a very useful tool, we use Euler's formula, $e^{2\pi i k x} = \cos(2\pi k x) + i \sin(2\pi k x)$, and take these 1-periodic functions as building blocks for trigonometric polynomials. But note that we also have to consider "negative frequencies" then.
where $n \in \mathbb{N}$ and $c_{-n}, \dots, c_n \in \mathbb{C}$ are called the coefficients of the trigonometric polynomial.
Note that this very short notation for a trigonometric polynomial can clearly be written out by using cosine and sine terms. In particular, by using Euler's formula, we obtain that
$$p(x) = \sum_{k=-n}^n c_k e^{2\pi i k x} = \sum_{k=-n}^n c_k \big(\cos(2\pi k x) + i \sin(2\pi k x)\big) = \sum_{k=-n}^n c_k \cos(2\pi k x) + i \sum_{k=-n}^n c_k \sin(2\pi k x).$$
However, this representation can be simplified further by using that cosine is even, i.e., $\cos(-x) = \cos(x)$, and that sine is an odd function, i.e., $\sin(-x) = -\sin(x)$. From this we get
$$p(x) = c_0 + \sum_{k=1}^n (c_k + c_{-k}) \cos(2\pi k x) + i \sum_{k=1}^n (c_k - c_{-k}) \sin(2\pi k x).$$
(Check this in detail!) We may therefore write trigonometric polynomials, as in Definition 7.6, in the form
$$p(x) = a_0 + \sum_{k=1}^n a_k \cos(2\pi k x) + \sum_{k=1}^n b_k \sin(2\pi k x),$$
if we set $a_0 = c_0$ and, for $k \ge 1$,
$$a_k := c_k + c_{-k}, \qquad b_k := i(c_k - c_{-k}). \tag{7.1}$$
Example 7.7. The trigonometric polynomial $p(x) = e^{6\pi i x} + e^{-6\pi i x}$ can be written as $p(x) = 2\cos(6\pi x)$. More generally, every trigonometric polynomial with all $c_k \in \mathbb{R}$ such that $c_k = c_{-k}$ gives a sum of cosines with real coefficients.
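Example 7.7 can be checked at a few sample points (a numerical sketch, not part of the original notes):

```python
import cmath
import math

# Example 7.7: e^{6 pi i x} + e^{-6 pi i x} is purely real and equals
# 2*cos(6 pi x) for every x.
def p(x):
    return cmath.exp(6j * math.pi * x) + cmath.exp(-6j * math.pi * x)

samples = [0.0, 0.1, 0.37, 0.5, 0.99]
values = [p(x) for x in samples]
```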
Example 7.8. Write p(x) = sin(2πx) in the form given in Definition 7.6.
From the above relations, we can deduce quite a bit of information about a trigonometric polynomial just by looking at its coefficients. In particular, we can tell whether a trigonometric polynomial is indeed real-valued, i.e., whether $p(x) \in \mathbb{R}$ for every $x \in \mathbb{R}$.
Lemma 7.10. Let $p$ be a trigonometric polynomial in the form given in Definition 7.6. Then, $p$ is real-valued, i.e., $p : \mathbb{R} \to \mathbb{R}$, if and only if $c_{-k} = \overline{c_k}$ for all $k$.
Remark 7.11 (Function defined on a circle). Another way of looking at periodic functions
is to assume they are defined on a circle. For this note that, instead of assuming that the
function has “the same behavior” at the endpoints of the interval [0, 1), we may also assume
that the endpoints are just the same, i.e., we assume 0 = 1. We can then talk about properties
like continuity when “x goes around the circle”, and there is no distinguished point like a
boundary. Mathematically, there are many ways of modeling this. The most prominent is to consider functions defined on the complex unit circle $S^1 := \{z \in \mathbb{C} : |z| = 1\}$. This also gives a justification of the name trigonometric polynomial since, with the parametrization $z = e^{2\pi i t}$, it can be written as $z \mapsto \sum_{k=-n}^n c_k z^k$, $z \in S^1$, which looks like an algebraic polynomial.
We now turn to the approximation of periodic functions by trigonometric polynomials. For this,
we need the so-called Fourier coefficients of a function. These values are then used to build
up an approximation of the function, the Fourier series. Note that in a similar way, we used
derivative values to obtain approximations by algebraic polynomials, by using Taylor’s theorem.
the Fourier series of f . (We use this notation also if the limit does not exist.)
We say that the Fourier series equals f (pointwise) if f (x) = Sf (x) for all x ∈ [0, 1).
Note that the Fourier coefficients, and therefore the partial sums of the Fourier series, are well-defined if the involved integrals are. So, in particular, for any piecewise continuous function $f$ on $[0,1)$. (We do not even need that $f$ is continuous as a periodic function.) However, the Fourier series does not necessarily make sense (i.e., converge), and, even if it does, it need not be equal to $f$ everywhere. Before we see this with an easy example, let us state the derivative and the (indefinite) integral of our basic building blocks.
We have
$$\frac{d}{dx}\, e^{2\pi i k x} = (2\pi i k)\, e^{2\pi i k x} \qquad\text{and}\qquad \int e^{2\pi i k x}\,dx = \frac{e^{2\pi i k x}}{2\pi i k}$$
for all $x \in \mathbb{R}$. In particular, we obtain
$$\int_0^1 e^{2\pi i k x}\,dx = 0$$
for $k \ne 0$, and $\int_0^1 e^{2\pi i \cdot 0 \cdot x}\,dx = 1$.
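This key identity can be illustrated with a discretized average (a numerical sketch, not part of the original notes; `mean_exp` is a name chosen for illustration):

```python
import cmath
import math

# The average of e^{2 pi i k x} over equidistant points in [0, 1) is 0 for
# k != 0 (the summands go around the unit circle) and 1 for k = 0.
def mean_exp(k, n=10_000):
    return sum(cmath.exp(2j * math.pi * k * j / n) for j in range(n)) / n

m3 = mean_exp(3)   # numerically 0
m0 = mean_exp(0)   # exactly 1
```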
If we accept that Theorem 5.11 and Lemma 6.9, i.e., linearity of differentiation and integration,
also hold for complex-valued functions, then a proof is straightforward, and we omit it.
We first discuss the Fourier series of a trigonometric polynomial. Similarly as algebraic poly-
nomials can be represented exactly by some finite Taylor polynomial, it is probably no surprise
that some (large enough) partial sum of the Fourier series is exact for trigonometric polynomials.
We give a detailed proof here for demonstration.
$$\hat{p}(k) = \sum_{j=-N}^N c_j \int_0^1 e^{2\pi i (j-k) x}\,dx.$$
(Note that we were able to switch integral and sum only because the sum is finite.) We have
$$\int_0^1 e^{2\pi i (j-k) x}\,dx = \begin{cases} 1 & \text{if } j = k, \\ \left[\frac{1}{2\pi i (j-k)}\, e^{2\pi i (j-k) x}\right]_0^1 = 0 & \text{if } j \ne k, \end{cases}$$
such that
$$\hat{p}(k) = \sum_{j=-N}^N c_j\, \delta_{jk}.$$
In other words,
$$\hat{p}(k) = \begin{cases} c_k & \text{if } -N \le k \le N, \\ 0 & \text{otherwise.} \end{cases}$$
This shows that $S_n p = p$ for all $n \ge N$, and therefore $Sp = \lim_{n\to\infty} S_n p = p$.
We continue by computing the Fourier series of a function that is not a trigonometric polynomial.
This example will show, when we finally finish it, that Fourier series are sometimes very helpful
in computing complicated sums.
Example 7.15. Consider the periodic function $f(x) = x$ on $[0,1)$ (i.e., we consider the function $f(x) = \{x\}$ on $\mathbb{R}$). By using the above, and integration by parts, we obtain $\hat{f}(0) = \frac{1}{2}$ and, for $k \ne 0$,
$$\hat{f}(k) = \int_0^1 f(x)\, e^{-2\pi i k x}\,dx = \int_0^1 x\, e^{-2\pi i k x}\,dx = \left[\frac{x\, e^{-2\pi i k x}}{-2\pi i k}\right]_0^1 - \frac{1}{-2\pi i k}\int_0^1 e^{-2\pi i k x}\,dx = \frac{1}{-2\pi i k}.$$
We therefore obtain the Fourier series
$$Sf(x) = \frac{1}{2} + \sum_{k\ne 0} \frac{e^{2\pi i k x}}{-2\pi i k}.$$
Although the last series looks like the (divergent) harmonic series, we will see later that it is actually convergent, and equals $f(x) = x$, for any $x \in (0,1)$. However, we already see now that the series is not equal to $f$ at all $x \in [0,1)$, as $f(x) = Sf(x)$ is false for $x = 0$. To see this, note that at $x = 0$ the terms for $k$ and $-k$ in the above series cancel each other. Therefore, the series converges to $Sf(0) = \frac{1}{2}$. (Indeed, $S_n f(0) = \frac{1}{2}$ for all $n$, see also Figure 45.) However, we clearly have $f(0) = 0$. This shows that we need to be careful with the points of convergence of a Fourier series.
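The partial sums of this series can be evaluated directly (a numerical sketch, not part of the original notes; the helper `S` is a name chosen for illustration):

```python
import cmath
import math

# Partial sums S_n f of the sawtooth series from Example 7.15:
# S_n f(x) = 1/2 + sum over 0 < |k| <= n of e^{2 pi i k x} / (-2 pi i k).
def S(n, x):
    s = 0.5 + 0j
    for k in range(1, n + 1):
        s += cmath.exp(2j * math.pi * k * x) / (-2j * math.pi * k)
        s += cmath.exp(-2j * math.pi * k * x) / (2j * math.pi * k)
    return s.real

at_zero = S(50, 0.0)    # stays at 1/2, although f(0) = 0
inside = S(400, 0.3)    # approaches f(0.3) = 0.3
```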
Remark 7.16 (Non-convergence). The above example shows that continuity on $[0,1)$ is not enough for all functions to be representable by their Fourier series. The reason is that $f(x) = \{x\}$ is not continuous when considered as a periodic function. (It has jumps at $x \in \mathbb{Z}$.) And if we plot the first partial sums of the Fourier series, see Figure 45, we see that the approximation close to these jumps is also not very good. One may even prove that the maximal deviation $\max_{x\in[0,1)} |S_n f(x) - f(x)|$ does not converge to 0 (the persistent overshoot near the jump is known as the Gibbs phenomenon), which shows that the approximation can stay bad for all finite $n$. Moreover, it is possible to show that there is also a periodic and continuous function whose Fourier series diverges at a point. Both statements go far beyond this lecture.
Let us consider another Fourier series before we turn to statements about convergence.
Example 7.17. Consider the periodic function f : [0, 1) → C that is given by f(x) = (x − 1/2)². Note that f is continuous (as a periodic function). We obtain the Fourier coefficients

    f̂(k) = ∫_0^1 (x − 1/2)² e^{−2πikx} dx = ∫_{−1/2}^{1/2} t² e^{−2πik(t+1/2)} dt
         = e^{−πik} ∫_{−1/2}^{1/2} t² e^{−2πikt} dt = (−1)^k ∫_{−1/2}^{1/2} t² e^{−2πikt} dt,

which yields f̂(0) = 1/12 and, after two further integrations by parts, f̂(k) = 1/(2(πk)²) for k ≠ 0. Pairing the terms for k and −k, we obtain the Fourier series

    Sf(x) = 1/12 + Σ_{k=1}^∞ cos(2πkx)/(πk)².
Note that these sums are clearly absolutely convergent (Why?), but, so far, we do not know if
Sf equals f at any point.
However, when we have a look at the first partial sums of this Fourier series, it seems to converge very fast: already for n = 20 we see almost no difference to the original function, see Figure 46. It even looks as if the partial sums converge uniformly to f.
If we could prove that Sf equals f, i.e., that the partial sums converge pointwise to the function, then, in particular, we would have that Sf(0) = f(0). Noting that f(0) = 1/4 and that all the cosine terms in the above series equal 1 at x = 0, this would imply that

    1/12 + Σ_{k=1}^∞ 1/(πk)²  =?  Sf(0) = f(0) = 1/4,

and therefore

    Σ_{k=1}^∞ 1/k²  =?  π²/6.
This formula is correct (and by the way quite beautiful), but we did not prove it so far.
It still remains to show that the Fourier series converges to f at every point. We will do this by presenting a general rule, based on an assumption on the Fourier coefficients, under which a Fourier series converges at all points. This assumption is fulfilled, e.g., by twice continuously differentiable periodic functions.
Remark 7.18. The method above is probably the most powerful technique for computing certain infinite sums. In some cases, there is even no other (manageable) way. One may try, e.g., to compute Σ_{k=1}^∞ 1/k² by any other means.
However, we clearly cannot evaluate every series by this. One example of this kind is Σ_{k=1}^∞ 1/k³. (For practice, try to find the value of this series. But do not try too long! There is no known explicit form of this number.)
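Numerically, the difference between the two series is easy to observe: the partial sums of Σ 1/k² approach π²/6, while those of Σ 1/k³ stabilize at approximately 1.2020569, a number without a known closed form. A small sketch (function names are ours):

```python
import math

def partial(s, n):
    """Partial sum sum_{k=1}^n 1/k^s."""
    return sum(1.0 / k ** s for k in range(1, n + 1))

print(partial(2, 100000), math.pi ** 2 / 6)  # the partial sums approach pi^2/6
print(partial(3, 100000))                    # approx 1.2020569, no closed form known
```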
Remark 7.19. Using the computations from page 214 (with ck := f̂(k)), we can write the partial sums of the Fourier series solely in terms of sine and cosine functions, as indicated in the introduction. That is, we can write

    Sn f(x) = a0 + Σ_{k=1}^n ak cos(2πkx) + Σ_{k=1}^n bk sin(2πkx)

with a0 := f̂(0) and

    ak := 2 ∫_0^1 f(x) cos(2πkx) dx   and   bk := 2 ∫_0^1 f(x) sin(2πkx) dx

for k ∈ N. (Verify this using (7.1).) However, this form has almost no advantages and it is often more work to compute the ak and bk separately, instead of just the ck. That is why we usually work with the Fourier coefficients as given above.
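The identity behind this remark can be checked numerically: with ck = f̂(k) one has ak = ck + c−k and bk = i(ck − c−k). The sketch below (the helper names and the simple midpoint-rule quadrature are ours, not from the lecture) verifies this for the example function f(x) = (x − 1/2)².

```python
import cmath, math

def quad(g, n=4000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((j + 0.5) / n) for j in range(n)) / n

f = lambda x: (x - 0.5) ** 2   # the example function from the text

def c(k):   # complex coefficient c_k = int_0^1 f(x) e^{-2 pi i k x} dx
    return quad(lambda x: f(x) * cmath.exp(-2j * math.pi * k * x))

def a(k):   # cosine coefficient a_k = 2 int_0^1 f(x) cos(2 pi k x) dx
    return 2 * quad(lambda x: f(x) * math.cos(2 * math.pi * k * x))

def b(k):   # sine coefficient b_k = 2 int_0^1 f(x) sin(2 pi k x) dx
    return 2 * quad(lambda x: f(x) * math.sin(2 * math.pi * k * x))

for k in (1, 2, 3):
    print(k, a(k), (c(k) + c(-k)).real, b(k), (1j * (c(k) - c(-k))).real)
```

For this f the sine coefficients vanish (f is symmetric around 1/2), while a1 agrees with the exact value 1/π² from Example 7.17.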
Remark 7.20. As stated in the beginning, it was rather arbitrary to choose the period 1. If one chooses the period 2π (which is another prominent choice), then the Fourier coefficients are usually defined by

    f̂(k) := (1/(2π)) ∫_0^{2π} f(x) e^{−ikx} dx.

This implies that also the Fourier series of functions may look different. (And they are different, because of the different domain/period.) Therefore, you should be careful when using other literature.
We now turn to a result about point-wise convergence of Fourier series, i.e., we want to know if the partial sums of the Fourier series of a function converge to this function at all points. This is clearly a desirable property, and we will show that this holds for functions whose Fourier coefficients are absolutely summable, i.e., Σ_{k∈Z} |f̂(k)| < ∞.
Definition 7.21. Let D ⊂ R and let fn, f : D → C for n ∈ N.
(i) If

    lim_{n→∞} fn(x) = f(x)   for all x ∈ D,

then we say that (fn) converges point-wise to f. We use the notation fn →pw f or "fn → f point-wise".
(ii) If

    lim_{n→∞} sup_{x∈D} |fn(x) − f(x)| = 0,

then we say that (fn) converges uniformly to f, and write "fn → f uniformly".
Example 7.22. Consider fn(x) = x^n on D = [0, 1). Then, since we know that x^n → 0 for every x ∈ [0, 1), we have that fn →pw 0. However, we have for every fixed n ∈ N that

    sup_{x∈[0,1)} |fn(x) − 0| = sup_{x∈[0,1)} x^n = 1.

Since this does not converge to 0, we obtain that fn is not uniformly convergent (to 0). Roughly speaking, the sequence x^n converges arbitrarily slowly to 0 (depending on x), and therefore not uniformly.
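This failure of uniform convergence is easy to see numerically; a minimal sketch (with our own helper name) that evaluates the supremum on a fine grid:

```python
def sup_on_grid(n, m=10000):
    """Maximum of |x^n - 0| over the grid points j/m in [0, 1)."""
    return max((j / m) ** n for j in range(m))

# The supremum stays close to 1 for every n: no uniform convergence ...
print(sup_on_grid(5), sup_on_grid(50))
# ... although for each fixed x < 1 the values x^n do tend to 0:
print(0.9 ** 200)
```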
Example 7.23. Another example of a sequence of functions, and its limit, that we discussed already is the difference quotient: For this, let f : (a, b) → R be a continuous function, and define the sequence of functions

    fn(x) = ( f(x + 1/n) − f(x) ) / (1/n).

We know that fn →pw f′, i.e., fn converges point-wise to the derivative of f, if f is differentiable. One can also show that fn converges uniformly to f′ if f is continuously differentiable. (We do not need that here.)
We do not want to go too much into detail here. In most cases, we will just talk about point-wise convergence, which should be easy to comprehend, since we have actually been working with this type of convergence for some time. However, many results hold directly for the much stronger uniform convergence, and that is why we also state it. Moreover, it is sometimes a helpful tool in proving our results. Let us briefly comment on the power of this type of convergence: In general, we do not know in advance properties of the point-wise limit of a sequence of functions. However, if fn → f uniformly, then some properties are preserved. For example, if every fn is continuous, then we obtain that also f is continuous. This is a very powerful insight!
(To see that this is false without uniform convergence, consider e.g., fn (x) = xn on [0, 1] which
converges pointwise to a discontinuous function. Which one?)
In the sequel we will need only two of the properties that are preserved under uniform conver-
gence. We state them in the following lemma.
Lemma 7.24. Let D ⊂ R and let (fn) be a sequence of continuous functions fn : D → C that converges uniformly to f : D → C. Then:
(i) f is continuous on D.
(ii) If D = [a, b], then lim_{n→∞} ∫_a^b fn(x) dx = ∫_a^b f(x) dx.
Note that the second part of this lemma actually states that we can "interchange" the limit (in f(x) = limn→∞ fn(x)) and the integral. Since the terms of the sequence are usually much simpler functions than the limit, this gives a nice way of computing integrals. Moreover, the first part enables us to show that the limit of continuous functions is continuous, even if we do not know the limit precisely. This is particularly useful when working with Fourier series, because in this case (fn = Sn f) the functions fn are obviously continuous. Hence, uniform convergence of the partial sums of a Fourier series directly implies continuity of the limit.
Proof. To prove part (i), we need to show that limm→∞ |f(xm) − f(x0)| = 0 for every x0 ∈ D and every sequence (xm)m∈N ⊂ D with xm → x0. Fix some ε > 0. By the uniform convergence of fn → f, we obtain that there is some n0 ∈ N such that |fn(y) − f(y)| < ε/2 for all n ≥ n0 and all y ∈ D. Therefore,

    |f(xm) − f(x0)| = |f(xm) − fn(xm) + fn(xm) − fn(x0) + fn(x0) − f(x0)|
                    ≤ |f(xm) − fn(xm)| + |fn(xm) − fn(x0)| + |fn(x0) − f(x0)|
                    < ε/2 + |fn(xm) − fn(x0)| + ε/2 = |fn(xm) − fn(x0)| + ε

for all n ≥ n0. Now, since all fn are continuous, the right hand side converges to ε as m → ∞. That is, we have

    limsup_{m→∞} |f(xm) − f(x0)| ≤ ε.

Since this holds for all ε > 0, this proves part (i).
For the second part, we use the triangle inequality to obtain

    | ∫_a^b f dx − ∫_a^b fn dx | ≤ ∫_a^b |f(x) − fn(x)| dx.

By uniform convergence, sup_x |f(x) − fn(x)| < ε/(b − a) for all n ≥ n0 with n0 large enough. This implies

    | ∫_a^b f dx − ∫_a^b fn dx | < (ε/(b − a)) ∫_a^b 1 dx = ε

for n ≥ n0. This implies the result.
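Part (ii) of the lemma can be illustrated numerically. The following sketch is our own construction, not from the notes: the sequence fn(x) = x + sin(nx)/n is just one convenient example of uniform convergence (sup_x |fn(x) − x| ≤ 1/n), and we compare ∫_0^1 fn with ∫_0^1 x dx = 1/2.

```python
import math

def trapz(g, a, b, m=2000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / m
    return h * (g(a) / 2 + sum(g(a + j * h) for j in range(1, m)) + g(b) / 2)

# f_n(x) = x + sin(n x)/n converges uniformly to f(x) = x on [0, 1],
# so the integrals must converge to int_0^1 x dx = 1/2.
for n in (1, 10, 100):
    fn = lambda x, n=n: x + math.sin(n * x) / n
    print(n, trapz(fn, 0.0, 1.0))
```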
Theorem 7.25. Let f : [0, 1) → C be continuous. If Σ_{k=−∞}^∞ |f̂(k)| < ∞ holds, then Sn f converges uniformly to f. In particular, Sf(x) = f(x) for all x ∈ [0, 1).
The proof of this theorem makes use of the following lemma, which says that two distinct
functions can be distinguished by their Fourier coefficients. This is a natural statement, and
one of the fundamental bases of Fourier analysis. However, a formal proof would be rather
complicated, and we omit it here.
Lemma 7.26. Let f, g : [0, 1) → C be continuous functions such that f̂(k) = ĝ(k) for all k ∈ Z. Then,

    f(x) = g(x)   for all x ∈ [0, 1).

Remark 7.27. The statement of Lemma 7.26 is equivalent to the statement that for every continuous function f ≠ 0, there exists some k ∈ Z such that f̂(k) ≠ 0.
Proof of Theorem 7.25. To simplify notation, we set a_k := f̂(k) for k ∈ Z. According to the assumptions, the sequence of partial sums (Σ_{k=−n}^n |a_k|)_{n≥0} converges, because it is a monotone and bounded sequence. In particular, it is a Cauchy sequence. This means that for every ε > 0 there exists n0 ∈ N, such that for all n ≥ n0 we have

    Σ_{k : |k|>n} |a_k| ≤ ε.
This yields, for all n ≥ n0 and all x ∈ [0, 1),

    |Sf(x) − Sn f(x)| = | Σ_{|k|>n} a_k e^{2πikx} | ≤ Σ_{k : |k|>n} |a_k| ≤ ε,

where we use |e^{2πikx}| = 1. Since this holds for all ε > 0, we see that Sn f converges uniformly to Sf. It remains to show that Sf(x) = f(x) for all x. Since we want to use Lemma 7.26 for this, we need to show that both f and Sf are continuous, and that their Fourier coefficients coincide. First, f is continuous by assumption, and by Lemma 7.24, Sf is continuous as the uniform limit of continuous functions. (Trigonometric polynomials are always continuous.)
Let us consider the Fourier coefficients. It is easy to see that, if a sequence of functions (gn) converges uniformly to g, then, for fixed ℓ ∈ Z, gn e^{−2πiℓ·} also converges uniformly to g e^{−2πiℓ·}. (Verify this!) By Lemma 7.24(ii), this implies that

    lim_{n→∞} ∫_0^1 gn(x) e^{−2πiℓx} dx = ∫_0^1 g(x) e^{−2πiℓx} dx.
In other words, ĝn(ℓ) → ĝ(ℓ) for all ℓ ∈ Z. We now apply this to gn = Sn f and g = Sf. Moreover, recall from Example 7.14 that

    (Sn f)^(ℓ) = ∫_0^1 Sn f(x) e^{−2πiℓx} dx = a_ℓ  if |ℓ| ≤ n,  and 0 otherwise.

(This just means that the "first n" Fourier coefficients of the partial sum Sn f coincide with the corresponding Fourier coefficients of f.) Since (Sn f)^(ℓ) = a_ℓ for all n ≥ |ℓ|, we clearly obtain lim_{n→∞} (Sn f)^(ℓ) = a_ℓ for all ℓ ∈ Z. Together with the above, and the uniform convergence of Sn f to Sf, we have

    (Sf)^(k) = lim_{n→∞} (Sn f)^(k) = a_k = f̂(k)

for all k ∈ Z. This finally shows that all Fourier coefficients of Sf and f coincide. Together with their continuity, we obtain from Lemma 7.26 that Sf(x) = f(x) for all x ∈ [0, 1).
With Theorem 7.25 we can finally prove that the example from the end of Section 7.2 was correct.

Corollary 7.28. We have Σ_{k=1}^∞ 1/k² = π²/6.
Proof. As discussed above, the Fourier coefficients of the function f(x) = ({x} − 1/2)² satisfy f̂(0) = 1/12 and, for k ≠ 0,

    f̂(k) = (−1)^k ∫_{−1/2}^{1/2} t² e^{−2πikt} dt = 1/(2(πk)²).

In particular, they are absolutely summable, and Theorem 7.25 shows that Sf(x) = f(x) for every x ∈ [0, 1). In particular, Sf(0) = f(0). This implies that

    1/4 = f(0) = Sf(0) = 1/12 + Σ_{k≠0} 1/(2(πk)²) = 1/12 + Σ_{k=1}^∞ 1/(πk)².

Rearranging yields the claim.
Note that we can deduce a bit more from the proof of Theorem 7.25. In particular, we showed that, given an absolutely summable sequence of complex numbers, the trigonometric polynomials with these coefficients converge uniformly to a continuous function. This is a helpful statement when a function is given by its Fourier coefficients. (Note that the theorem shows that, under the given assumptions, a function is uniquely determined by its Fourier coefficients.) We state this in the following lemma.
Lemma 7.29. Let (a_k)_{k∈Z} ⊂ C with Σ_{k∈Z} |a_k| < ∞. Then,

    g(x) := Σ_{k∈Z} a_k e^{2πikx}

defines a continuous (periodic) function g : [0, 1) → C.
Example 7.30. Consider the sequence (ak)k∈Z with ak = 0 for k < 0, and ak = e^{−k} for k ≥ 0. From Lemma 7.29 we obtain that

    g(x) = Σ_{k=0}^∞ e^{−k} e^{2πikx}

defines a continuous periodic function. To see this, we do not even need to make any computations regarding the infinite series. It is enough that the ak are non-negative and summable.
(One could use the geometric series to obtain the explicit form g(x) = 1/(1 − e^{2πix−1}).)
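One can compare the truncated series with this closed form numerically; a short sketch (function names are ours, and the truncation length is an arbitrary choice):

```python
import cmath

def g_series(x, terms=200):
    """Truncation of g(x) = sum_{k>=0} e^{-k} e^{2 pi i k x}."""
    return sum(cmath.exp(-k) * cmath.exp(2j * cmath.pi * k * x)
               for k in range(terms))

def g_closed(x):
    """Closed form via the geometric series: 1 / (1 - e^{2 pi i x - 1})."""
    return 1 / (1 - cmath.exp(2j * cmath.pi * x - 1))

for x in (0.0, 0.25, 0.7):
    print(x, abs(g_series(x) - g_closed(x)))  # differences are tiny
```

Since the geometric ratio has modulus e^{-1} < 1, the truncation error after 200 terms is negligible.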
Although Theorem 7.25 is useful and gives a simple criterion for a Fourier series to converge, it is
still not satisfactory as it gives a property of the Fourier coefficients as a sufficient condition. In
some cases, one would prefer to check only a condition of the function itself, like differentiability.
The next result, which follows almost immediately from Theorem 7.25, shows that the Fourier
series of twice differentiable functions converges uniformly. However, note that this would not
be helpful for proving the result in Corollary 7.28, because this function is not differentiable
at 0, see Figure 46. In this respect, Theorem 7.25 is more general.
Theorem 7.31. Let f : [0, 1) → C be periodic and twice continuously differentiable. Then, Sn f converges uniformly to f; in particular, Sf(x) = f(x) for all x ∈ [0, 1).
For the proof of this result it is enough to show that the Fourier coefficients of a twice differen-
tiable function are absolutely summable. As this is of independent interest, we state it in the
following lemma in a more general form.
Lemma 7.32. Let s ∈ N and f : [0, 1) → C be periodic and s-times continuously differentiable. Then, we have

    |f̂(k)| ≤ M / |2πk|^s   for all k ≠ 0,

where

    M := max_{x∈[0,1)} |f^{(s)}(x)|.
where we have used the periodicity of f (and of e^{−2πikt}) in the last equation. If we repeat the integration by parts an additional (s − 1) times, we get

    f̂(k) = (1/(2πik)^s) ∫_0^1 f^{(s)}(x) e^{−2πikx} dx.

Taking absolute values, and using |e^{−2πikx}| = 1, yields the claim.
Proof of Theorem 7.31. According to the assumptions, f″ is continuous, so that there exists M < ∞ with

    |f″(x)| ≤ M   for all x ∈ [0, 1).

From Lemma 7.32 we have |f̂(k)| ≤ |2πk|^{−2} M ≤ |k|^{−2} M for all k ≠ 0, and consequently

    Σ_{k=−∞}^∞ |f̂(k)| ≤ |f̂(0)| + 2M Σ_{k=1}^∞ 1/k² < ∞.

The statement now follows from Theorem 7.25.
Remark 7.33. Let us stress again that the assumption of the last theorem is that the function f under consideration is continuously differentiable as a periodic function. This means that also its derivative f′ (and derivatives of higher order) have to be periodic functions and, in particular, have to satisfy f′(0) = f′(1). (It is a typical mistake to forget this property.) For example, the periodic function f : [0, 1) → R with f(x) = (x − 1/2)² is continuous, since lim_{x↘0} f(x) = lim_{x↗1} f(x) = 1/4. However, its derivative satisfies f′(x) = 2x − 1 for x ∈ (0, 1), which implies lim_{x↘0} f′(x) = −1 ≠ 1 = lim_{x↗1} f′(x). Therefore, f is not continuously differentiable as a periodic function.
Theorem 7.25 and Theorem 7.31 only provide the qualitative statement that the Fourier series converges. However, for applications of the theory in actual computations, it is necessary to also have quantitative results. That is, we want to know how large we have to choose n such that the error is small when we approximate f by Sn f. Fortunately, such error bounds can be obtained if we are a bit more careful in bounding the corresponding infinite sums.
For this we state the following lemma, which is a useful tool for bounding (infinite) sums by
certain integrals. Note that this can also be used as a convergence test for series, similarly to
the ones given in Section 3.7, to verify if a sequence of numbers is summable. But this time,
this may also lead to reasonable bounds on the sum of the series.
Lemma 7.34. Let h : [0, ∞) → [0, ∞) be a continuous and non-increasing function. Then, for all m, n ∈ N0 with m < n, we have

    Σ_{k=m+1}^n h(k) ≤ ∫_m^n h(x) dx ≤ Σ_{k=m}^{n−1} h(k).

The same holds with n = ∞, i.e., with the infinite sums and the integral over [m, ∞).
Proof. First, we split the integral into several integrals over intervals of length 1 to obtain

    ∫_m^n h(x) dx = Σ_{k=m}^{n−1} ∫_k^{k+1} h(x) dx.

For each k, the mean value theorem (Theorem 6.37) yields that there is some ξk ∈ [k, k + 1] such that ∫_k^{k+1} h(x) dx = h(ξk). Since h is non-increasing, we have h(k + 1) ≤ h(ξk) ≤ h(k), and therefore

    Σ_{k=m}^{n−1} h(k + 1) ≤ ∫_m^n h(x) dx ≤ Σ_{k=m}^{n−1} h(k).

An index shift implies the first statement. The second follows by taking the limit n → ∞.
Example 7.35. The most prominent application of the last lemma is to bound the tails of the series Σ_k k^{−s} for s > 1. We obtain for every n ∈ N that

    Σ_{k=n+1}^∞ k^{−s} ≤ ∫_n^∞ x^{−s} dx = (1/(s − 1)) n^{−s+1}.

(Verify this!)
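The bound is easy to test numerically for s = 2, where it reads Σ_{k>n} k^{−2} ≤ 1/n. A sketch (a finite cutoff replaces the infinite tail, so the computed value slightly underestimates the true one; names are ours):

```python
def tail(n, s, cutoff=10 ** 6):
    """Numerical tail sum_{k=n+1}^{cutoff} k^{-s} (slightly below the true tail)."""
    return sum(1.0 / k ** s for k in range(n + 1, cutoff + 1))

# The bound from the text for s = 2 is n^{-1}/(2 - 1) = 1/n.
for n in (1, 5, 50):
    print(n, tail(n, 2), 1 / n)  # the tail obeys the bound in each case
```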
With these bounds at hand, we are able to give explicit quantitative bounds on the error of
Fourier series. Recall that we have presented a similar bound already for Taylor polynomials in
Corollary 5.57. However, with this bound we were not able to give good bounds for functions
that are not very smooth. In contrast, the next bound shows that we can give arbitrarily good
approximations of a function, as long as it is twice differentiable. (Of course, we need to compute
enough Fourier coefficients for this. We do not discuss here how this could be done.)
Corollary 7.36. Let f : [0, 1) → C be continuous such that for some s > 1 and B < ∞ we have

    |f̂(k)| ≤ B / |k|^s

for all k ≠ 0. Then,

    |f(x) − Sn f(x)| ≤ (2B/(s − 1)) · 1/n^{s−1}

for all x ∈ [0, 1) and n ∈ N.
Let us finally again discuss the example from the beginning, see Example 7.17, and see how
good an approximation by a partial sum of the Fourier series would be.
Example 7.37. We consider the periodic function f : [0, 1) → C that is given by f(x) = (x − 1/2)². We already showed in Example 7.17 that its Fourier coefficients equal

    f̂(k) = 1/(2(πk)²)

for k ≠ 0. Therefore, they satisfy the bound of Corollary 7.36 with s = 2 and B = 1/(2π²). We therefore obtain the bound

    |f(x) − Sn f(x)| ≤ 1/(π² n)

for all n ∈ N. One might check that n = 11 suffices to obtain |f(x) − S11 f(x)| < 1/100 (or n = 102 for |f(x) − S102 f(x)| < 1/1000). This finally justifies that an approximation of this function can already be good for rather small n, see Figure 46.
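This error bound can be checked numerically, using the cosine form Sn f(x) = 1/12 + Σ_{k=1}^n cos(2πkx)/(πk)² of the partial sums (which follows from the coefficients above by pairing k and −k). A sketch with our own names:

```python
import math

def S_n(x, n):
    """Partial sum S_n f(x) = 1/12 + sum_{k=1}^n cos(2 pi k x)/(pi k)^2
    of f(x) = (x - 1/2)^2, using the coefficients from the text."""
    return 1 / 12 + sum(math.cos(2 * math.pi * k * x) / (math.pi * k) ** 2
                        for k in range(1, n + 1))

f = lambda x: (x - 0.5) ** 2
err = max(abs(f(j / 1000) - S_n(j / 1000, 11)) for j in range(1000))
print(err, 1 / (math.pi ** 2 * 11))  # the observed error obeys the bound
```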
Example 7.38. Bounds like those in the last example can also be useful when we want to approximate certain series by finite sums. Consider again the same example, but only at x = 0. At this point the Fourier series reads Sf(0) = 1/12 + Σ_{k=1}^∞ 1/(πk)², and we have f(0) = 1/4, see Example 7.17. Therefore,

    |f(0) − Sn f(0)| = | 1/6 − Σ_{k=1}^n 1/(πk)² | ≤ 1/(π² n),

which implies

    | π²/6 − Σ_{k=1}^n 1/k² | ≤ 1/n.

We can therefore give a rather good approximation of π²/6, with an error of at most 1/n, by just computing the sum of the first n terms of the series. Note that this can actually be done by hand, and it was done like this before calculators were invented. In those times, such error bounds were essential.
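A quick numerical check of this bound (our sketch, not from the notes):

```python
import math

# |pi^2/6 - sum_{k=1}^n 1/k^2| <= 1/n, as derived above.
for n in (10, 100, 1000):
    s = sum(1.0 / k ** 2 for k in range(1, n + 1))
    print(n, s, abs(math.pi ** 2 / 6 - s))  # the error is at most 1/n
```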
We now briefly comment on a final result regarding the convergence of Fourier series. This is Dirichlet's theorem, which states that it is actually enough to consider only local properties of a function to obtain pointwise convergence of the Fourier series at a point. Note that, in contrast to this, all the previous results required some global knowledge about the function: either through its Fourier coefficients, or because we assumed the function to be differentiable everywhere. The following theorem, which is only a special case of Dirichlet's theorem, states that the latter assumption is already enough at a single point.
Theorem 7.39. Let f : [0, 1) → C be periodic and integrable. If f is differentiable at a point x0 ∈ [0, 1), then lim_{n→∞} Sn f(x0) = f(x0).
We do not prove this statement here. Actually, a proof of this theorem requires quite a lot of
prerequisites, and it would fill several lectures to tackle every single detail.
Let us see an example that shows how useful this theorem is.
Example 7.40. Consider again the periodic function f(x) = x on [0, 1), see Example 7.15 and Figure 45. We have shown already that f̂(0) = 1/2 and f̂(k) = 1/(−2πik) for k ≠ 0, which yields the Fourier series

    Sf(x) = 1/2 + Σ_{k≠0} e^{2πikx}/(−2πik) = 1/2 − Σ_{k=1}^∞ (1/(πk)) sin(2πkx).

Since f is differentiable at every x ∈ (0, 1), the above theorem shows that Sn f(x) → f(x) = x at all these points.

If we consider instead the averaged partial sums

    σn f(x) = (1/n) Σ_{m=0}^{n−1} Sm f(x),
which are called Cesàro means of the partial sums, then we might prove (with a lot of effort) that σn f → f uniformly for every continuous function. This is called Fejér's theorem. This result, and its more advanced variants, are heavily used in everyday applications (like JPEG or MP3), in particular due to their nice convergence guarantees. This shows that one should be careful how to build up an approximation based on the given information.
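For the sawtooth from Example 7.15, the difference between the two approximations is easy to observe numerically: the partial sums overshoot near the jump, while the Cesàro means stay within [0, 1]. A sketch (names are ours, and the behaviour is only checked empirically on a grid):

```python
import math

def S(m, x):
    """Partial sum S_m f(x) = 1/2 - sum_{k=1}^m sin(2 pi k x)/(pi k) (sawtooth)."""
    return 0.5 - sum(math.sin(2 * math.pi * k * x) / (math.pi * k)
                     for k in range(1, m + 1))

def sigma(n, x):
    """Cesaro mean (1/n) * sum_{m=0}^{n-1} S_m f(x)."""
    return sum(S(m, x) for m in range(n)) / n

grid = [j / 800 for j in range(800)]
print(max(S(50, x) for x in grid))      # > 1: the partial sums overshoot the jump
print(max(sigma(50, x) for x in grid))  # <= 1: the means stay inside [0, 1]
print(min(sigma(50, x) for x in grid))  # >= 0
```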
8 Multivariate Calculus
In this chapter we initiate the study of functions that depend on more than one variable, which
are called multivariate functions. In analogy to the case of real-valued functions of one variable,
we will introduce some concepts, like continuity or differentiability, which will then lead to
results on, e.g., extreme values of multivariate functions. Note that the study and computation
of minima and maxima of functions that depend on many parameters is one of the main subjects
of optimization, and therefore particularly important in AI applications.
The most general type of multivariate functions are vector-valued functions, which have the form
V : Rd → Rm
for some d, m ∈ N. That is, V maps a vector of length d to a vector of length m. We already
studied the case m = d = 1 in detail, and such functions will be called univariate. A special case of vector-valued functions are vector fields, which have d = m, and which have the nice interpretation of attaching a vector to each point in space.
The easiest vector-valued functions are given by matrix-vector multiplication with a matrix
A ∈ Rm×d . That is, we define V : Rd → Rm by V (x) = Ax ∈ Rm for x ∈ Rd . Such functions are
called linear functions. (Recall that univariate linear functions are just multiplication with a
scalar.) However, we also need to discuss non-linear functions, for which the components of the
output may result from arbitrary operations with the input, and not only linear combinations.
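As a quick illustration of such a linear function (a minimal sketch with a hypothetical matrix A, using plain lists instead of any linear-algebra library):

```python
# A linear vector-valued function V : R^3 -> R^2, V(x) = A x, with d = 3, m = 2.
A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

def V(x):
    """Matrix-vector product A x, computed row by row."""
    return [sum(row[i] * x[i] for i in range(len(x))) for row in A]

print(V([1.0, 1.0, 1.0]))  # [3.0, 2.0]
```

Linearity means V(u + w) = V(u) + V(w) and V(λu) = λV(u), which is easy to verify for this sketch.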
For this, we start by considering the special case m = 1, i.e., we consider multivariate real-
valued functions
f : Rd → R
with d ∈ N. (We use “d” for dimension.) Although this might seem to be a very special case, we
will see later that general vector-valued functions can be handled rather easily (component-wise)
once we have the right tools for real-valued functions.
Let us start with an example, and some comments on the visualization of functions.
Figure 47: A plot of the graph and the contour plot of f (x1 , x2 ) = x21 + x22 − x1 x2 in [0, 1]2 .
Remark 8.2. Note that, as usual, vectors x ∈ Rd are considered as column vectors when it
comes to matrix-vector multiplication. However, when considered as input of some function we
use the notation x = (x1 , . . . , xd ) and f (x) = f (x1 , . . . , xd ). This should not lead to confusion.
8.1 Sequences in Rd
Since we want to investigate continuity and other properties of multivariate functions, we have
to mimic the concepts that we introduced in the univariate setting. In particular, we need the
concept of a limit of a (convergent) sequence. Recall that in the univariate case, we often used that, for a sequence (an)n∈N ⊂ R, the convergence lim an = a is equivalent to lim |an − a| = 0, and we want to do the same here. Therefore, we actually only need to replace the absolute value
by another quantity that allows for measuring ’how large’ a vector is. This can then be used to
measure ’how close’ two vectors are.
Recall that we already introduced in Section 1.9 the Euclidean norm and the corresponding inner product

    ‖x‖_2 = √⟨x, x⟩ = √( Σ_{i=1}^d |x_i|² ),   where   ⟨x, y⟩ = Σ_{i=1}^d x_i y_i,

for all x, y ∈ R^d, see Theorem 1.75. Moreover, the Cauchy-Schwarz inequality (Lemma 1.76) states that

    |⟨x, y⟩| ≤ ‖x‖_2 ‖y‖_2

for all x, y ∈ R^d, and we have equality |⟨x, y⟩| = ‖x‖_2 ‖y‖_2 if and only if y = c · x for some c ∈ R.
We are now ready to treat limits of sequences of vectors. But note that we have to be careful with the indices: In the following, we assume that (xk)k∈N ⊂ R^d is a sequence of vectors, i.e., xk ∈ R^d for every k ∈ N. Now, these vectors have d entries, which will be denoted by

    xk = (x_{k,1}, x_{k,2}, . . . , x_{k,d}).
Definition 8.3 (Convergence). Let (xk)k∈N ⊂ R^d be a sequence of vectors. If there exists some y = (y1, . . . , yd) ∈ R^d such that

    lim_{k→∞} x_{k,i} = y_i   for all i = 1, . . . , d,

then we say that (xk) converges to y, and we write xk → y.
That is, a sequence (xk) converges if and only if every component (x_{k,i}), i = 1, . . . , d, of the sequence converges.
Using our knowledge about real sequences, and the above lemma, we see that xk → x with

    x = (0, −1, 1)

for the sequence xk with components (k − 2)/(k² + 1), cos(1/k + π) and (e^k + k²)/e^k, since 0, −1 and 1 are exactly the limits of these three components.
With this in mind, we can compute limits of vectors just by computing d usual limits. However,
it is sometimes handy to have a characterization of convergence that is based on the Euclidean
norm. For illustration, let us see the following easy example.
Consider, e.g., the sequence given by xk = (1/k, 1/k) ∈ R². This sequence clearly converges to 0 ∈ R². If we now look at the norms of the differences xk − 0, we see that

    ‖xk − 0‖_2 = √( (1/k − 0)² + (1/k − 0)² ) = √( 1/k² + 1/k² ) = √2 / k.

Thus ‖xk − 0‖_2 → 0 as k → ∞.
Having a closer look to the above example, or the definition of the norm in general, we see that a
sequence (xk ) converges to a vector y if and only if the norms kxk − yk converge to 0 (a number).
We state this in the following lemma. The proof is an easy exercise.
    xk → y ⟺ xk − y → 0 ⟺ ‖xk − y‖_2 → 0.
By this equivalence, we can see the similarity to the univariate situation: We had that a sequence
of numbers (an ) converges to a if and only if the absolute values |an − a| converge to zero. This
similar appearance will be helpful, as many proofs of results on continuity and differentiation
that follow will just look very similar to the proofs from the univariate setting.
Let us also comment on other norms that one can consider for vectors. We will mostly use
the Euclidean norm for the considerations below, but it is sometimes useful (or necessary) to
consider another ’distance’ between points. However, note that the Cauchy-Schwarz inequality
is special to the Euclidean norm.
Let us briefly state what we mean by a norm, and collect the properties that are shared by all these quantities. (We discuss this in more detail later.) A function ‖·‖ : R^d → [0, ∞) is called a norm if it satisfies:
1) ‖x‖ = 0 ⟺ x = 0 (definiteness),
2) ‖λx‖ = |λ| · ‖x‖ for all λ ∈ R and x ∈ R^d (homogeneity),
3) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^d (triangle inequality).
As shown above, the Euclidean norm ‖·‖_2, which is also called the 2-norm, is a norm in this sense.
There are two other important norms. The first is the 1-norm, which is sometimes called the Manhattan norm, given by

    ‖x‖_1 = Σ_{i=1}^d |x_i|

for x = (x1, . . . , xd) ∈ R^d. The other one is the maximum norm, or ∞-norm, given by

    ‖x‖_∞ = max{ |x1|, |x2|, . . . , |xd| }.
It is easy to see that both fulfill the first requirement for being a norm, the definiteness. For the homogeneity, note that

    ‖λx‖_1 = Σ_{i=1}^d |λx_i| = |λ| Σ_{i=1}^d |x_i| = |λ| · ‖x‖_1

and

    ‖λx‖_∞ = max{ |λx1|, . . . , |λxd| } = |λ| max{ |x1|, . . . , |xd| } = |λ| ‖x‖_∞.
It is important to note that differences between these norms cannot be too large, as the following
lemma shows.
Lemma 8.8. For all x ∈ R^d we have
1) ‖x‖_∞ ≤ ‖x‖_2 ≤ √d · ‖x‖_∞,
2) (1/√d) · ‖x‖_1 ≤ ‖x‖_2 ≤ √d · ‖x‖_1,
3) ‖x‖_∞ ≤ ‖x‖_1 ≤ d · ‖x‖_∞.
Note that, in contrast to what may be suggested by the name, the maximum norm is the smallest of these norms. However, as all these norms only differ by a multiplicative constant (that only depends on the dimension), we obtain that for a sequence (xk) we have

    xk → x ⟺ ‖xk − x‖_p → 0,

where p stands for 1, 2 or ∞. Therefore, when talking about convergence, it does not matter which norm we use.
Let us finally prove the last lemma.
Proof. Let k be such that |x_k| = max{ |x1|, |x2|, . . . , |xd| }. Then

    |x_k| ≤ √( Σ_{i=1}^d |x_i|² ) ≤ √( Σ_{i=1}^d |x_k|² ) = |x_k| · √( Σ_{i=1}^d 1 ) = √d · |x_k|.

(Note the indices.) Thus the first point follows. For the third point we calculate

    |x_k| ≤ Σ_{i=1}^d |x_i| ≤ Σ_{i=1}^d |x_k| = d · |x_k|.

The upper bound in the second point follows by combining points 1) and 3) as

    ‖x‖_2 ≤ √d · ‖x‖_∞ ≤ √d · ‖x‖_1.

For the lower bound we use the Cauchy-Schwarz inequality, see Lemma 1.76, to obtain

    ‖x‖_1 = Σ_{i=1}^d |x_i| = Σ_{i=1}^d |x_i| · 1 ≤ ‖x‖_2 · √( Σ_{i=1}^d 1² ) = √d · ‖x‖_2.

Dividing by √d gives the lower bound on ‖x‖_2.
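All inequalities of the lemma can also be stress-tested numerically on random vectors; a sketch (with our own helper names, and a tiny tolerance added for floating-point rounding):

```python
import math, random

def n1(x):   return sum(abs(t) for t in x)        # 1-norm
def n2(x):   return math.sqrt(sum(t * t for t in x))  # 2-norm
def ninf(x): return max(abs(t) for t in x)        # max norm

random.seed(0)
d = 7
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(d)]
    assert ninf(x) <= n2(x) <= math.sqrt(d) * ninf(x) + 1e-12
    assert n1(x) / math.sqrt(d) <= n2(x) + 1e-12
    assert n2(x) <= math.sqrt(d) * n1(x) + 1e-12
    assert ninf(x) <= n1(x) <= d * ninf(x) + 1e-12
print("all inequalities hold on the sampled vectors")
```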
The definition of continuity in the multivariate case is the same as in the one dimensional case.
That is, we require that we can interchange the limit with the function.
Again, to have a shorter notation, we use the concept of a limit of a function. Let us recall what
we mean by this, see Definition 4.27.
Note that, for technical reasons, we have to exclude the limit from the sequences, i.e., we only
consider sequences (xk ) with xk → x0 and xk 6= x0 to define the above limit. If no such sequence
exists for x0 , then we call x0 an isolated point of Ω, and the limit is not defined. However, as
discussed in Remark 4.29, a function is always continuous at isolated points, and we therefore
see that a function is continuous on U ⊂ Ω if and only if lim_{x→x0} f(x) = f(x0) for all non-isolated points x0 ∈ U.

Let us now consider some particularly simple examples, namely the functions
• f (x1 , x2 ) = x1 + x2 ,
• g(x1 , x2 ) = x1 · x2 , and
• h(x1, x2) = x1/x2.
(Note that h : R × (R \ {0}) → R means that the second input is not allowed to be zero.)
These functions are particularly simple as they are just a sum/product/quotient of functions
depending on one variable. From our knowledge about real sequences, we can easily check that
these functions are continuous. For this, let us assume that (xk )k∈N is an arbitrary sequence
converging to some x0 = (x0,1 , x0,2 ) ∈ R2 . We use the notation xk = (xk,1 , xk,2 ). Using that
(xk,1 , xk,2 ) → (x0,1 , x0,2 ) if and only if xk,1 → x0,1 and xk,2 → x0,2 , we obtain
    lim_{k→∞} f(xk) = lim_{k→∞} (x_{k,1} + x_{k,2}) = lim_{k→∞} x_{k,1} + lim_{k→∞} x_{k,2} = x_{0,1} + x_{0,2} = f(x0).
Since this holds for arbitrary sequences converging to arbitrary x0 ∈ R2 , we see that f is
continuous. The same calculations can be done for g and h.
Lemma 8.11. Let f, g : R^d → R be continuous. Then f + g and f · g are continuous, and f /g is continuous at every point where g ≠ 0. Moreover, if D ⊂ R with f(R^d) ⊂ D, and h : D → R is continuous, then the composition h ◦ f : R^d → R is continuous.
Proof. The proof is exactly the same as the proofs of Theorem 4.12 and Theorem 4.17.
One can consider the above lemma also for vector-valued functions f, g : R^d → R^m for any m ≥ 2. Clearly, by considering all components separately, we see that the lemma also holds for f + g. However, products and quotients do not make sense for vector-valued functions.
Additionally, we can again consider the composition of functions. Recall that g ◦ f (x) := g(f (x)). For
vector-valued functions this makes sense only if the dimensions of the corresponding functions agree.
That is, if f ’outputs’ a vector from Rp , then g must ’accept’ such vectors as ’input’. However, in this
case one can show analogously to the univariate case that the composition of continuous functions
is again a continuous function.
The most important example for now is with p = m = 1 and h : R → R with h(y) = |y|. This implies
that the absolute value |f | is continuous for every real-valued continuous function f : Rd → R.
With these results, one can easily prove that certain functions are continuous.
Example 8.13. First, we consider the norms discussed above, which are clearly also real-valued functions on R^d. For notational convenience, let us write f1(x) := ‖x‖_1, f2(x) := ‖x‖_2 and f∞(x) := ‖x‖_∞ for x ∈ R^d, or in short fp := ‖·‖_p for p ∈ {1, 2, ∞}.
All these functions are compositions of the functions gi(x1, . . . , xd) := |xi|, which are clearly continuous (and actually univariate). f1 and f∞ are their sum and maximum, respectively, and are therefore continuous. (One might prove by induction that the maximum of d numbers is continuous.) For f2, note that also the gi² and hence Σ_{i=1}^d gi² are continuous on R^d. Continuity of f2 then follows by the last part of Lemma 8.11 with D = R_+ and h(t) := √t.
This function consists only of continuous functions, and the denominator is bounded away from zero,
which implies that f is continuous.
Another important class of continuous functions are the multivariate polynomials. These are functions of the form

    p(x) = Σ c_k x1^{k1} x2^{k2} · · · xd^{kd},
where the sum is over all k = (k1, . . . , kd) ∈ N0^d with ‖k‖_1 = k1 + · · · + kd ≤ r ∈ N0, and the ck ∈ R are the coefficients of the polynomial. We call r the degree of this polynomial (at least if ck ≠ 0 for at least one k with ‖k‖_1 = r). One may even write such polynomials in a shorter way by introducing the so-called multi-index notation. That is, for x ∈ R^d and k ∈ N0^d, we define

    x^k := Π_{i=1}^d x_i^{k_i} = x1^{k1} x2^{k2} · · · xd^{kd}.

With this, we can write multivariate polynomials in the familiar form p(x) = Σ_{‖k‖_1 ≤ r} c_k x^k.
A simple example for d = 2 is given by

    p(x1, x2) = 2 x1³ x2 + x1² x2² + 3 x2 − 42.

This polynomial is in the above form with c(3,1) = 2, c(2,2) = 1, c(0,1) = 3, c(0,0) = −42 and all other ck equal to zero. The degree of this polynomial is therefore 4.
By Lemma 8.11, multivariate polynomials are clearly continuous.
However, in general it is not so easy to prove continuity of multivariate functions. The reason is that
there are 'too many' sequences converging to a point. (While there were only left- and right-sided limits in the univariate case, there are infinitely many 'directions' of approach for multivariate functions.) Let us see
the following example.
Example 8.16. We consider the function f : R² → R defined by

    f(x1, x2) = x1 x2 / (x1² + x2²)   if (x1, x2) ≠ 0,   and   f(0, 0) = 0.
This function is clearly continuous at any x0 ≠ 0. If we want to prove continuity at 0, i.e., that limx→0 f(x) = 0, we need to consider all null sequences. In a first try, we may consider the sequences given by yk = (1/k, 0) and zk = (0, 1/k), which correspond to the limits in coordinate direction. Since f(yk) = f(zk) = 0 for all k (because one of the components is always zero) we see that f(yk) → 0 and f(zk) → 0, which might indicate that the function is continuous. However, if we consider the sequence tk = (1/k, 1/k) (i.e., the limit 'from the diagonal'), then we see that

    f(tk) = (1/k²) / (1/k² + 1/k²) = 1/2.

This implies that the limit limx→0 f(x) does not exist, i.e., f is not continuous at 0.
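The behaviour along different directions is easy to reproduce numerically; a sketch (the function is the quotient x1x2/(x1² + x2²) from this example, with our own naming):

```python
def f(x1, x2):
    """The example function: 0 at the origin, x1*x2/(x1^2 + x2^2) elsewhere."""
    return 0.0 if (x1, x2) == (0.0, 0.0) else x1 * x2 / (x1 ** 2 + x2 ** 2)

# Along the coordinate axes the values tend to 0 ...
print([f(1 / k, 0.0) for k in (1, 10, 100)])
# ... but along the diagonal they are constantly 1/2, so no limit at 0 exists:
print([f(1 / k, 1 / k) for k in (1, 10, 100)])
```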
This shows that proving continuity of a multivariate function is sometimes an issue, and one needs some intuition to give a detailed proof. In most cases, one even has to try several approaches, as there is no general rule for this. However, in some cases a function depends on its input only through the norm of the input, and for such functions one might easily obtain continuity.
Example 8.17. We consider the function defined by

    f(x1, x2) = (1 − cos(|x1| + |x2|)) / √(x1² + x2²)   if x ≠ 0,
    f(x1, x2) = 0                                       if x = 0.
Again, the function is continuous at any x0 ≠ 0. For x0 = 0, note that Lemma 8.8 implies that √(x1² + x2²) = ‖(x1, x2)‖2 ≥ (1/√2) ‖(x1, x2)‖1, and therefore, with x = (x1, x2) ≠ 0,

    |f(x)| = |1 − cos ‖x‖1| / ‖x‖2 ≤ √2 · |1 − cos ‖x‖1| / ‖x‖1.
If we now take into account that x → 0 if and only if ‖x‖1 → 0 (and use the substitution t := ‖x‖1), we obtain that

    lim_{x→0} |f(x)| ≤ √2 · lim_{t→0} |1 − cos(t)| / t = 0.
(Here, we used l’Hospital’s rule.) This implies limx→0 f (x) = 0, and thus that f is continuous.
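As a numerical sanity check (our own sketch, not part of the argument), one can evaluate f close to 0 along several directions and observe that the values become small:

```python
import math

# f from Example 8.17; its values near 0 are bounded by sqrt(2)*|1-cos(||x||_1)|/||x||_1.
def f(x1, x2):
    if (x1, x2) == (0.0, 0.0):
        return 0.0
    return (1.0 - math.cos(abs(x1) + abs(x2))) / math.hypot(x1, x2)

# Points with ||x||_2 = 1e-4 in a few arbitrary directions.
values = [abs(f(1e-4 * math.cos(t), 1e-4 * math.sin(t)))
          for t in (0.0, 0.7, 1.3, 2.9)]
```

Since 1 − cos(s) behaves like s²/2 near 0, all these values are of order 1e-4, consistent with f(x) → 0.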
We want to finish this subsection with the ε-δ-criterion for multivariate functions, as this will again be
needed in some of the upcoming proofs. Note that this is again very similar to the univariate criterion,
see Theorem 4.63.
Theorem 8.18. Let f : Ω → R and p ∈ {1, 2, ∞}. Then, f is continuous at x0 ∈ Ω if and only if for any ε > 0 there exists a δ > 0 such that for all x ∈ Ω with ‖x − x0‖p < δ, we have |f(x) − f(x0)| < ε. In a formula,

    ∀ε > 0 ∃δ > 0 ∀x ∈ Ω : ‖x − x0‖p < δ =⇒ |f(x) − f(x0)| < ε.
The proof is almost the same as for the one-dimensional case, but we still want to give it here. Moreover, note that by Lemma 8.8 the criteria for the different p ∈ {1, 2, ∞} are equivalent. Therefore, it is enough to prove the statement for p = 2 (or any other of them).
Proof. First we show that the ε-δ-criterion holds if f is continuous. To this end, we assume the opposite, i.e., that there exists ε0 > 0 such that for any δ > 0 there is some y ∈ Ω with ‖y − x0‖ < δ but |f(y) − f(x0)| > ε0, and show that this contradicts the continuity of f. In particular, by assumption, we may choose δ = 1/n and find some yn ∈ Ω with the property

    ‖yn − x0‖ ≤ 1/n   and   |f(yn) − f(x0)| > ε0.
This implies that yn → x0, but f(yn) ↛ f(x0). So, f is not continuous at x0, which is a contradiction.
Next we assume that the ε-δ-criterion holds and show that in this case f is continuous. For this, fix some ε > 0. By definition, if a sequence (xk) converges to x0, then for all k large enough it holds that ‖xk − x0‖ < δ, where δ > 0 is as given by the ε-δ-criterion. Therefore |f(xk) − f(x0)| < ε. As this holds
for all ε > 0 and all sequences (xk) converging to x0, we obtain f(xk) → f(x0), so f is continuous at x0.
Note that, in the same way as in the univariate case, there are some technical problems related to boundary
points of the domain of a function. Therefore, we consider at first only open sets in Rd , which are the
replacement of open intervals on the real line.
Recall that the ε-neighborhood of a point x ∈ Rd is given by Uε(x) = {y ∈ Rd : ‖x − y‖ < ε}.
Since this topic involves a lot of indices, we use the following notation, which will make everything a bit shorter. If not indicated otherwise, ‖·‖ always denotes the Euclidean norm, i.e., ‖x‖ = ‖x‖2, and G ⊂ Rd denotes an open subset of Rd. Moreover, we want to remind you that the i-th unit vector, written ei, was defined to have a 1 in the i-th coordinate and zeros elsewhere.
Finally, let us note that, due to the different indices needed, there might be some confusion between the coordinates of a vector and the elements of a sequence, as already explained above. In what follows we always write (or at least try to)

    x = (x1, . . . , xd) ∈ Rd,

such that xi ∈ R, i = 1, . . . , d, is a number, namely the i-th coordinate of x, and we use (again) x0 ∈ Rd for a specific point.
Sometimes, other notations for the partial derivatives ∂f/∂xi are used, like Di f or Dei f or ∂xi f, or just δi f or fi. Therefore, one needs to be careful when using other literature.
Example 8.21. Let us have a look at the function

    f(x1, x2) = x1 · x2 − e^{−x1},

considered on G = R². To calculate ∂f/∂x1(x) using the definition of partial derivatives, we have to compute the limit h → 0 of

    (f(x + he1) − f(x))/h = (f(x1 + h, x2) − f(x1, x2))/h
        = ((x1 + h) · x2 − e^{−(x1+h)} − (x1 · x2 − e^{−x1}))/h
        = ((x1 + h − x1) · x2 − (e^{−x1−h} − e^{−x1}))/h
        = (h · x2)/h − e^{−x1}(e^{−h} − 1)/h,

where x = (x1, x2). This implies, by recalling known limits, that

    ∂f/∂x1(x) = x2 + e^{−x1}.
Note that we only send h → 0 above, and that x1 and x2 are fixed.
In the same way, we obtain

    ∂f/∂x2(x) = lim_{h→0} (f(x1, x2 + h) − f(x1, x2))/h = lim_{h→0} (x1 · (x2 + h − x2) − (e^{−x1} − e^{−x1}))/h = x1.
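Partial derivatives can also be checked numerically with difference quotients. The following sketch (the helper names are ours) compares a central difference quotient with the formulas just derived:

```python
import math

# f(x1, x2) = x1*x2 - exp(-x1), with df/dx1 = x2 + exp(-x1) and df/dx2 = x1.
def f(x1, x2):
    return x1 * x2 - math.exp(-x1)

def partial(f, x, i, h=1e-6):
    """Central difference quotient approximating df/dx_i at the point x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(*xp) - f(*xm)) / (2 * h)

x = (0.5, -1.2)
d1 = partial(f, x, 0)   # should be close to x2 + exp(-x1)
d2 = partial(f, x, 1)   # should be close to x1
```

At x = (0.5, −1.2) this gives d1 ≈ −1.2 + e^{−0.5} and d2 ≈ 0.5, matching the exact partial derivatives.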
A detailed inspection of the above example shows that there is a simpler way to compute partial derivatives than just plugging in the definition. In fact, note that partial differentiation w.r.t. x1 does not 'touch' x2. Therefore, we may just consider x2 as a (fixed) constant and differentiate the univariate function depending only on x1. To be precise, consider the expression

    lim_{h→0} (f(x + hei) − f(x))/h = lim_{h→0} (f(x1, . . . , xi + h, . . . , xd) − f(x1, . . . , xi, . . . , xd))/h,

where x = (x1, x2, . . . , xd) is a fixed point. In this expression, the 'inputs' x1, . . . , xi−1, xi+1, . . . , xd are 'untouched', allowing us to treat them as fixed. So by defining the univariate function

    g(t) := f(x1, . . . , xi−1, t, xi+1, . . . , xd),

we see that

    g′(xi) = ∂f/∂xi(x).
Thus we can compute partial derivatives by calculating one-dimensional derivatives, which allows us to use all our knowledge from the previous chapter. Let us see how this procedure helps us if we want to compute partial derivatives.
As a first example, consider again f(x1, x2) = x1 · x2 − e^{−x1}. Treating x2 as a constant, we immediately get ∂f/∂x1(x) = x2 + e^{−x1}, and treating x1 as a constant gives ∂f/∂x2(x) = x1, as before.
Example 8.23. Sometimes we also have to use some (one-dimensional) calculation rules to compute partial derivatives. Let us have a look at

    f(x1, x2) = sin(x1³ + x2).
Example 8.24. We define G = Rd \ {0} (note that this is an open set) and compute the partial derivatives of f(x) = ‖x‖2. Since

    f(x) = (x1² + x2² + · · · + xd²)^{1/2},

the chain rule implies that

    ∂f/∂xi(x) = (1/2) · (x1² + x2² + · · · + xd²)^{−1/2} · 2xi = xi/‖x‖2.
Example 8.25. Consider again the function given by f(x) = x1x2/(x1² + x2²) for x ≠ 0 and f(0) := 0. Using the differentiation rules for one-dimensional functions, we easily see that f is partially differentiable on R² \ {0}. Furthermore, we observe that

    (f(hei) − f(0))/h = 0/h = 0

for any h ∈ R \ {0}. Thus ∂f/∂x1(0) and ∂f/∂x2(0) also exist, making f partially differentiable on all of R². However, f is not continuous at 0, as we will show by using the sequence xk = (1/k, 1/k), which converges to 0. But

    f(xk) = (1/k²)/(2/k²) = 1/2.

This implies that f(xk) cannot converge to 0 = f(0).
Writing all partial derivatives of a function in a (row) vector, we obtain a compact notation.
Remark 8.27. Some authors prefer to write the gradient as a column vector.
f(x1, x2) = sin(x1³ + x2). We already computed the partial derivatives and saw that they exist for any (x1, x2) ∈ R². Thus

    ∇f(x) = (3x1² · cos(x1³ + x2), cos(x1³ + x2)).
There is a result for gradients which can be considered as a generalization of the product rule.

Theorem 8.29 (Product rule). Let f, g : G → R be partially differentiable functions. Then we have

    ∇(f g)(x) = ∇f(x) · g(x) + f(x) · ∇g(x).

Proof. By the definition of the gradient it is sufficient to prove the statement for each coordinate. We use the product rule for one-dimensional functions to compute

    ∂(f g)/∂xi(x) = ∂f/∂xi(x) · g(x) + ∂g/∂xi(x) · f(x).
For example, consider

    f(x1, x2) = x1 · x2,

for which we clearly have ∇f(x) = (x2, x1). However, if we write f = g · h with g(x1, x2) = x1 and h(x1, x2) = x2, we see that ∇g(x) = (1, 0) and ∇h(x) = (0, 1) for all x = (x1, x2) ∈ R². With the product rule, we obtain
Note that in these computations, it is somehow obvious that everything is done at the (fixed) point x.
Therefore, the ’(x)’ is unnecessary and we write in short
    ∇f = ∇(g h) = ∇g · h + ∇h · g = (h, 0) + (0, g) = (h, g) = (x2, x1).
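The product rule can also be verified numerically; the snippet below (a rough sketch with our own finite-difference helper) checks ∇(gh) = ∇g · h + ∇h · g for the example above:

```python
# Finite-difference gradient (a rough numerical sketch, not exact).
def grad(f, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

g_ = lambda x: x[0]          # g(x1, x2) = x1
h_ = lambda x: x[1]          # h(x1, x2) = x2
f_ = lambda x: g_(x) * h_(x) # f = g * h, so grad f(x) = (x2, x1)

x = [2.0, 3.0]
lhs = grad(f_, x)                           # grad(g*h)
rhs = [gg * h_(x) + hh * g_(x)              # grad g * h + grad h * g
       for gg, hh in zip(grad(g_, x), grad(h_, x))]
```

At x = (2, 3), both sides agree with (x2, x1) = (3, 2) up to the finite-difference error.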
Example 8.25 shows that the existence of partial derivatives does not imply continuity of a function, although continuity was a necessary condition for differentiability of univariate functions. This, in particular, shows why we want to find a somehow 'better' generalization of one-dimensional differentiability. For this, recall that a function f : R → R is differentiable at x0 ∈ R if and only if the limit

    f′(x0) = lim_{h→0} (f(x0 + h) − f(x0))/h = lim_{x→x0} (f(x) − f(x0))/(x − x0)

exists.
One way to interpret this is that T(x) := f(x0) + (x − x0)f′(x0) is the tangent line to f at x0. The tangent is by definition an affine function, i.e., a linear function plus a constant. Therefore, one can say that the tangent line to a function f at x0 is given by T(x) = f(x0) + D(x − x0), where D : R → R is a linear function. Note that D(x − x0) = 0 for x = x0 (since D is linear), and therefore T(x0) = f(x0).
We will now use this to define differentiability in the multidimensional case. Recall that a linear mapping D : Rd → R is characterized by the properties D(x + y) = D(x) + D(y) and D(λx) = λD(x) for all x, y ∈ Rd and λ ∈ R, and can always be described by D(x) = Σ_{i=1}^d ai xi = aᵀx = ⟨a, x⟩ for some a ∈ Rd.
This definition does not appear very handy, but we will see shortly its relation to the gradient.
Remark 8.32. There is again some other commonly used notation for the derivative dfx . For example,
f 0 (x), Dx f , or (even without the point x) Df or df . So, again be careful when using other literature.
In the same way as for univariate functions, i.e., d = 1, the derivative dfx can be used to define the
tangent plane T to a function f at the point x ∈ Rd by T (y) := f (x) + dfx (y − x), which is the best
approximation by an affine function at x. We come back to this and give a handy formula for the
tangent plane using partial derivatives.
This example was particularly simple, because the error of the linear approximation, i.e., the function r,
did not depend on x. However, this may clearly happen, as the next example shows.
Example 8.34. Let f : R2 → R be given by
    f(x) = x1² · x2

with x = (x1, x2). We see that

    f(x + y) = (x1 + y1)²(x2 + y2)
             = (x1² + 2x1y1 + y1²)(x2 + y2)
             = f(x) + 2x1x2y1 + x1²y2 + x2y1² + 2x1y1y2 + y1²y2.
(Again, we collected the terms that do not depend on y, then the linear terms, then the rest.) We see that r(y) = x2y1² + 2x1y1y2 + y1²y2 (which depends on x) satisfies

    lim_{y→0} r(y)/‖y‖ = lim_{y→0} ( x2 · y1²/‖y‖ + 2x1 · y1y2/‖y‖ + y1²y2/‖y‖ ) = 0.

(Verify this yourself using y1y2 = max{y1, y2} · min{y1, y2} and ‖y‖ ≥ ‖y‖∞.)
Therefore, the linear function

    dfx(y) = 2x1x2y1 + x1²y2

is the (total) derivative of f at x.
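One can observe the decay of r(y)/‖y‖ numerically (our own sketch, for a fixed but arbitrary point x):

```python
import math

# Remainder r(y) = x2*y1^2 + 2*x1*y1*y2 + y1^2*y2 from Example 8.34,
# evaluated at a fixed point x = (x1, x2).
x1, x2 = 1.5, -0.5

def r_over_norm(y1, y2):
    r = x2 * y1**2 + 2 * x1 * y1 * y2 + y1**2 * y2
    return r / math.hypot(y1, y2)

# Approach 0 along the diagonal y = (t, t) with shrinking t.
ratios = [abs(r_over_norm(t, t)) for t in (1e-2, 1e-4, 1e-6)]
```

The ratios shrink roughly linearly in t, illustrating r(y)/‖y‖ → 0 as y → 0.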
Let us now show that differentiability is indeed a stronger condition than the existence of partial deriva-
tives, and implies continuity.
Proof. Since f is differentiable at x, we know that dfx and r exist such that

    f(x + y) = f(x) + dfx(y) + r(y),

where dfx is linear and lim_{y→0} r(y)/‖y‖ = 0. Since dfx is linear, and therefore continuous, we have that lim_{y→0} dfx(y) = dfx(0) = 0. For the second limit we use lim_{y→0} |r(y)| = lim_{y→0} (|r(y)|/‖y‖) · ‖y‖ = 0. All in all, we obtain lim_{y→0} f(x + y) = f(x), i.e., f is continuous at x.
To show that all partial derivatives exist, and how they are related to the total derivative, first observe that, due to linearity, dfx can be written as dfx(y) = Σ_{i=1}^d ai yi for some a1, . . . , ad ∈ R. Using again the representation f(x + y) = f(x) + Σ_{i=1}^d ai yi + r(y) for the specific choices y = h ei (where ei denotes the i-th unit vector) with h → 0, h ∈ R, we obtain

    (f(x + h ei) − f(x))/h = (h ai + r(h ei))/h = ai + r(h ei)/h.

Since lim_{h→0} r(h ei)/h = lim_{y→0} r(y)/‖y‖ = 0, we see that ∂f/∂xi(x) exists and

    ∂f/∂xi(x) = lim_{h→0} (f(x + h ei) − f(x))/h = ai,

proving the theorem.
Example 8.36. If we have a look at Example 8.33, we saw that f, which was given by

    f(x) = ‖x‖²,

was differentiable for any x ∈ R³. Moreover, we calculated that dfx(y) = 2⟨x, y⟩ = 2 Σ_{i=1}^3 xi yi. The components of this linear function are given by 2(x1, x2, x3), which is exactly the gradient of f.
are contained in G. Again, ei denotes the i-th unit vector. The mean value theorem of differential calculus, see Theorem 5.34, then implies that there exists some ξk ∈ [0, 1] such that

    f(z^(k)) − f(z^(k−1)) = ∂f/∂xk( z^(k−1) + ξk yk ek ) · yk.

(Make sure you understand why we can apply the one-dimensional mean value theorem here!)
We set ηk = z^(k−1) + ξk yk ek and use a telescoping trick to see

    f(x + y) − f(x) = Σ_{k=0}^{d−1} ( f(z^(k+1)) − f(z^(k)) ) = Σ_{k=1}^{d} ∂f/∂xk(ηk) · yk.
Now we define

    ak = ∂f/∂xk(x)   and   r(y) = Σ_{k=1}^d ( ∂f/∂xk(ηk) − ak ) · yk,

such that

    f(x + y) − f(x) = Σ_{k=1}^d ak · yk + r(y) = ⟨∇f(x), y⟩ + r(y).
Due to the continuity of all partial derivatives, and since ηk → x for y → 0, it follows that lim_{y→0} ∂f/∂xk(ηk) = ak. Moreover, an application of the Cauchy-Schwarz inequality yields

    |r(y)| ≤ ‖( ∂f/∂x1(η1) − a1, . . . , ∂f/∂xd(ηd) − ad )‖ · ‖y‖,

where we set η = (η1, . . . , ηd) and a = (a1, . . . , ad). Putting everything together, we obtain that

    lim_{y→0} r(y)/‖y‖ = 0,

which shows that f is differentiable at x with dfx(y) = ⟨∇f(x), y⟩.
Remark 8.38. The opposite of the above theorem does not hold, i.e., there exist differentiable functions which are not continuously partially differentiable. One example that shows this is the function f(x) = ‖x‖² sin(1/‖x‖), continuously extended to x = 0 by setting f(0) = 0, which is differentiable at 0, but whose partial derivatives are not continuous. We omit the details.
The theorem which we just showed also allows us to make notation a bit easier, i.e., shorter. In particular, we now know that, if a function is continuously partially differentiable, then it is differentiable and we can interpret the gradient as its derivative. In this case the gradient is also continuous, since all its components are continuous. So from here on we call a function continuously differentiable if it is continuously partially differentiable. Nevertheless, it is important to be very precise if we do not have continuous partial derivatives.
Let us see some more examples.
Example 8.39. Consider a quadratic form

    f(x) = xᵀC x = Σ_{i=1}^d Σ_{j=1}^d cij xi xj

for a matrix C = (cij) ∈ R^{d×d} and x = (x1, . . . , xd) ∈ Rd. Functions of this form are of particular interest for quadratic optimization problems.
Linearity and the product rule imply that the k-th component of ∇f is given by

    ∂f/∂xk(x) = Σ_{i=1}^d Σ_{j=1}^d cij · ∂(xi xj)/∂xk = Σ_{i=1}^d Σ_{j=1}^d cij ( xi · ∂xj/∂xk + xj · ∂xi/∂xk )
              = Σ_{i=1}^d Σ_{j=1}^d cij ( xi · δjk + xj · δik ) = Σ_{i=1}^d cik xi + Σ_{j=1}^d ckj xj = ( (C + Cᵀ)x )_k,

where δjk denotes the Kronecker delta (δjk = 1 if j = k and δjk = 0 otherwise), and (y)k denotes the k-th entry of the (row/column) vector y. Therefore,

    ∇f(x) = xᵀ(C + Cᵀ) = ( (C + Cᵀ)x )ᵀ.
Clearly, all partial derivatives (i.e., entries of ∇f ) are linear and therefore continuous functions. By
Theorem 8.37 this implies that f is (totally) differentiable.
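This formula is easy to test numerically, even for a non-symmetric C (our own sketch):

```python
# Check grad f(x) = (C + C^T) x for f(x) = x^T C x, with a small non-symmetric C.
C = [[1.0, 2.0, 0.0],
     [0.0, 3.0, -1.0],
     [4.0, 0.0, 0.5]]

def f(x):
    return sum(C[i][j] * x[i] * x[j] for i in range(3) for j in range(3))

def grad_fd(x, h=1e-6):
    """Central-difference gradient of f (exact for quadratics, up to rounding)."""
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [1.0, -2.0, 0.5]
exact = [sum((C[i][j] + C[j][i]) * x[j] for j in range(3)) for i in range(3)]
numeric = grad_fd(x)
```

Both vectors agree up to rounding, illustrating that the matrix C + Cᵀ, not C itself, appears in the gradient.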
Consider now the function f(x) = e^{−‖x‖²}, x ∈ Rd. Using the chain rule, it follows that

    ∂f/∂xk(x) = −2xk · e^{−‖x‖²}.
Since all partial derivatives, and thus the gradient, are continuous, we see that f is differentiable for any
x ∈ Rd. Moreover,

    dfx(y) = ⟨∇f(x), y⟩ = −2e^{−‖x‖²} Σ_{k=1}^d xk yk = −2e^{−‖x‖²} ⟨x, y⟩.
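A quick numerical check of this gradient (our own sketch):

```python
import math

# f(x) = exp(-||x||^2) with grad f(x) = -2*exp(-||x||^2) * x.
def f(x):
    return math.exp(-sum(xi * xi for xi in x))

def grad(f, x, h=1e-6):
    """Central-difference gradient."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [0.3, -0.7, 1.1]
numeric = grad(f, x)
exact = [-2.0 * f(x) * xi for xi in x]
```

The two gradients agree up to the finite-difference error.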
We finally discuss how to use multidimensional derivatives to describe the slope (or increase) of a function in a fixed direction. Note that in the multidimensional setting we need to decide in which direction we want to measure the slope. Just imagine you go for a walk on a mountain: there might be a different slope in each direction, and it might be of interest in which direction the largest increase or decrease occurs. It will turn out that the gradient actually points in the direction of largest increase/decrease.
For a direction v ∈ Rd with ‖v‖ = 1, the directional derivative of f at x in direction v is defined by

    Dv f(x) = lim_{h→0} (f(x + hv) − f(x))/h,

if the limit exists.
Remark 8.42. Again, there are many different notations for Dv f(x), like ∂f/∂v(x) and ∇v f(x).
Remark 8.43. If we choose v = ei , we see that Dei f = Di f is the i-th partial derivative.
Remark 8.44. The intuition might be that Dv f(x) is the height change if we make 'one step' of length 1 on the tangent plane, starting at x in direction v.
As before, we can give a formula for directional derivatives using the gradient.
Proof. As for the proof of Theorem 8.35 we use a special choice of y in the definition of the (total)
derivative to obtain the result.
By Definition 8.31 with y = h · v and h ∈ R small enough, we see that

    (f(x + hv) − f(x))/h = (dfx(hv) + r(hv))/h = dfx(v) + r(hv)/h,

where we used that dfx(hv) = h · dfx(v) since dfx is linear. Moreover, lim_{h→0} r(hv)/h = lim_{y→0} r(y)/‖y‖ = 0 implies that Dv f(x) = dfx(v). The rest of the statement follows from Theorem 8.35.
Example 8.47. As in Example 8.39, consider f(x) = xᵀC x, where C is a symmetric matrix. The gradient of f is

    ∇f(x) = 2Cx,

which implies that for any v with ‖v‖ = 1 the directional derivative is given by

    Dv f(x) = ⟨∇f(x), v⟩ = 2⟨Cx, v⟩.
Interestingly, it turns out that the gradient points in the direction of the largest slope. That is, v = ∇f(x)/‖∇f(x)‖ is the direction for which |Dv f(x)| is maximized.
Proof. Since Dv f(x) = ⟨∇f(x), v⟩, see Theorem 8.45, we obtain from the Cauchy-Schwarz inequality (Lemma 1.76) that

    |Dv f(x)| = |⟨∇f(x), v⟩| ≤ ‖∇f(x)‖ · ‖v‖ = ‖∇f(x)‖,

since ‖v‖ = 1. Moreover, we have equality if and only if v = c · ∇f(x), and this c must be ±1/‖∇f(x)‖ since, again, we require ‖v‖ = 1. Now, for c = 1/‖∇f(x)‖ we obtain Dv f(x) = ‖∇f(x)‖, and for c = −1/‖∇f(x)‖ we obtain Dv f(x) = −‖∇f(x)‖.
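Numerically, one can compare the directional derivative in the gradient direction with a few other unit directions (our own sketch; the vector `grad_f` is an arbitrary stand-in for ∇f(x)):

```python
import math

# Pretend grad f(x) = (3, -4), so ||grad f(x)|| = 5.
grad_f = [3.0, -4.0]
norm = math.hypot(*grad_f)

def D(v):
    """Directional derivative D_v f(x) = <grad f(x), v> for a unit vector v."""
    return sum(g * vi for g, vi in zip(grad_f, v))

best = D([g / norm for g in grad_f])                      # direction of the gradient
others = [D([math.cos(t), math.sin(t)])                   # some other unit directions
          for t in (0.0, 1.0, 2.0, 4.0, 5.5)]
```

The gradient direction attains the value ‖∇f(x)‖ = 5, and no other unit direction exceeds it, as Cauchy-Schwarz predicts.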
Clearly, and as in the univariate case, we sometimes want to differentiate a function more than once, which leads to the theory of higher-order partial derivatives. Again, as in the univariate case, this can be done by iterating the differentiation procedure. However, since there is more than one coordinate, it seems that we need to be careful about the order in which we calculate the derivatives. Luckily, this is not the case if the functions under consideration are 'nice enough', in which case we have

    ∂/∂xi ( ∂f/∂xj ) = ∂/∂xj ( ∂f/∂xi ).

That is, interchanging the order of differentiation does not change the partial derivative. In particular, we will see that this is true if all involved (second-order) partial derivatives are continuous.
    ∂²f/∂xj∂xi (x) := ∂/∂xj ( ∂f/∂xi ) (x)
(Make sure that you understand the difference between these definitions.)
Example 8.50. Consider the function f(x) = x1²x2² + x1 − x2 with x = (x1, x2). We observe that

    ∂f/∂x1 = 2x1x2² + 1   and   ∂f/∂x2 = 2x1²x2 − 1.

Differentiating ∂f/∂x1 once more w.r.t. x1 and x2, we see that

    ∂²f/∂x1² = ∂/∂x1 ( ∂f/∂x1 ) = ∂/∂x1 (2x1x2² + 1) = 2x2²

and

    ∂²f/∂x2∂x1 = ∂/∂x2 (2x1x2² + 1) = 4x1x2.

Analogously, we compute

    ∂²f/∂x1∂x2 = 4x1x2   and   ∂²f/∂x2² = 2x1².
This example shows that we need a systematic way to compute and write down second-order partial derivatives, especially for large d. Since we have to differentiate w.r.t. xi and xj for i, j ∈ {1, 2, . . . , d}, we can use a matrix to collect all these functions.
Example 8.52. The Hessian of the function from Example 8.50 above, which was given by f(x) = x1²x2² + x1 − x2 with x = (x1, x2), is

    Hf(x) = ( 2x2²    4x1x2
              4x1x2   2x1²  ).
In this example, the Hessian is a symmetric matrix, which raises the question for which functions we have ∂²f/∂xi∂xj = ∂²f/∂xj∂xi. We will see soon that this is guaranteed under rather weak assumptions.
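Second-order partial derivatives, and the symmetry of the mixed ones, can be checked with nested difference quotients (our own sketch for the function from Example 8.50):

```python
# f(x) = x1^2*x2^2 + x1 - x2, with mixed partials d^2f/dx1dx2 = d^2f/dx2dx1 = 4*x1*x2.
def f(x1, x2):
    return x1**2 * x2**2 + x1 - x2

def d1(x1, x2, h=1e-4):      # central difference for df/dx1
    return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)

def d2(x1, x2, h=1e-4):      # central difference for df/dx2
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

def d12(x1, x2, h=1e-4):     # d/dx2 of df/dx1
    return (d1(x1, x2 + h) - d1(x1, x2 - h)) / (2 * h)

def d21(x1, x2, h=1e-4):     # d/dx1 of df/dx2
    return (d2(x1 + h, x2) - d2(x1 - h, x2)) / (2 * h)

H12 = d12(1.0, 2.0)   # analytically, 4*x1*x2 = 8 at (1, 2)
H21 = d21(1.0, 2.0)
```

Both orders of differentiation give (numerically) the same value, in line with the symmetry result below.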
However, we first show that the Hessian is related to second-order directional derivatives, i.e., to differentiating twice in given directions. That is, for two directions u, v ∈ S^{d−1}, we compute the directional derivative w.r.t. u of the directional derivative Dv f. This is similar to Theorem 8.45, where we showed that the gradient is connected to the (first-order) directional derivatives.
Proof. Since f is twice differentiable, all first-order partial derivatives are (totally) differentiable, and hence Dv f(x) = Σ_{i=1}^d ∂f/∂xi(x) · vi is differentiable, because it is a sum of differentiable functions. This implies that we can use the gradient of Dv f to compute the directional derivatives of Dv f, see Theorem 8.45. That is,

    Du(Dv f)(x) = Σ_{j=1}^d ∂(Dv f)/∂xj (x) · uj = Σ_{j=1}^d Σ_{i=1}^d ∂²f/∂xj∂xi (x) · vi uj = uᵀ Hf(x) v.
The next theorem, due to Hermann Schwarz (1843–1921) in 1873 and with a rather long history of earlier incomplete proof attempts, shows that the Hessian is symmetric (i.e., one can interchange partial derivatives) whenever f is twice continuously differentiable:

    ∂²f/∂xi∂xj (x) = ∂²f/∂xj∂xi (x).

In particular, this shows that the Hessian of f is a symmetric matrix, i.e., Hf(x) = (Hf(x))ᵀ.
Proof. First note that we can prove the statement individually for every pair (i, j) ∈ {1, . . . , d}², treating the other components as constants. Therefore, it is sufficient to prove the statement for the case d = 2, i = 1, and j = 2. This means we consider a function f : R² → R. Additionally, we assume w.l.o.g. that x = 0 ∈ G, which will save a lot of notation. (If x ≠ 0, then we consider the function g(·) = f(· + x).) We will use the notation f1 and f2 to denote the partial derivatives w.r.t. x1 and x2, respectively.
Observe that by definition of partial derivatives, we have
Now we fix small enough h, k ≠ 0 such that (k, h) ∈ G, and have a look at the univariate function g(t) := f(t, h) − f(t, 0). An application of the mean value theorem of differential calculus, see Theorem 5.34, yields that there exists some ξ ∈ [0, k] such that

    g(k) − g(0) = k · g′(ξ) = k · ( f1(ξ, h) − f1(ξ, 0) ).

(Here we use that g is continuous on [0, k] and differentiable on (0, k).)
Another application of the mean value theorem shows that there exists some η ∈ [0, h] such that

    f1(ξ, h) − f1(ξ, 0) = h · ∂f1/∂x2(ξ, η) = h · ∂²f/∂x2∂x1(ξ, η).

Plugging in for g(k) − g(0), we see that

    f(k, h) − f(k, 0) − f(0, h) + f(0, 0) = kh · ∂²f/∂x2∂x1(ξ, η).

Dividing by kh and letting k, h → 0 (so that also ξ, η → 0), the continuity of the second-order partial derivatives implies that

    lim_{(k,h)→0} ( f(k, h) − f(k, 0) − f(0, h) + f(0, 0) )/(kh) = ∂²f/∂x2∂x1(0, 0).

Starting instead with the function t ↦ f(k, t) − f(0, t), the same arguments show that this limit also equals ∂²f/∂x1∂x2(0, 0), which proves the claim.
Remark 8.55. The continuity of the second-order partial derivatives is necessary for the above result to hold for all such functions. If all second-order partial derivatives exist but are not necessarily continuous, then there are examples where the Hessian fails to be symmetric at some points x.
Note that, since all second-order partial derivatives are continuous (which should be known already before their computation), we do not need to compute ∂²f/∂x1∂x2 and ∂²f/∂x2∂x1 separately. They are just equal, so computing one of them is enough to write down the Hessian.
    D_{i_{k+1}} · · · D_{i_2} D_{i_1} f(x) := ∂/∂x_{i_{k+1}} ( · · · ∂/∂x_{i_2} ( ∂f/∂x_{i_1} ) · · · )(x)

for ij ∈ {1, . . . , d} with j = 1, . . . , k + 1.
If all k-th order partial derivatives are totally differentiable (at x), then we call f (k + 1)-times
differentiable (at x).
If all (k + 1)-st order partial derivatives are continuous (at x), then we call f (k + 1)-times contin-
uously differentiable (at x).
Theorem 8.58. Let G ⊂ Rd and f : G → R be k-times continuously differentiable. Then for any i1, i2, . . . , ik ∈ {1, 2, . . . , d} and any permutation σ of {1, . . . , k} we have

    D_{i_k} · · · D_{i_1} f = D_{i_{σ(k)}} · · · D_{i_{σ(1)}} f.
Proof. This follows by inductively applying the Schwarz theorem, see Theorem 8.54.
8.4 Extrema
As in the one dimensional case, one can use differential calculus to study optimization problems, i.e., to
find extrema. Although some of the techniques used here are (or at least look) more complicated than
the corresponding parts of Section 5, the overall strategy is the same:
1. We use the derivative to find candidates for (local) extrema, i.e., stationary points;
2. if possible, we use the second derivatives to check whether a candidate is a (local) maximum or minimum;
3. finally, we consider the boundary of the considered domain separately.
As we learned in Section 5 for the univariate case, stationary points are the candidates for local extrema, if they lie in the domain of the function. But also functions without stationary points may have extreme points. (Consider the easy example of a linear function on a closed interval.) For this reason, we always needed to compute the function values at the boundary points and/or consider the limits towards ±∞ to verify whether a function has global/local extrema.
Unfortunately, all these objects are more difficult to handle than in the univariate case. First of all, there are several (partial) derivatives at a given point, in contrast to the univariate case, and so we need to generalize the previous concepts. We will see that, here, the gradient plays the role of the derivative and the Hessian matrix substitutes the second derivative.
An additional difficulty comes from multivariate domains. While the boundary of a bounded interval, which was the typical domain for univariate functions, consists only of two points, the boundary in the multivariate setting is more complex and needs more care. We will see an example in a moment.
However, let us first recall the definition of global extrema from Definition 4.54.
Note that this is word for word the same definition; it works for arbitrary domains.
Also the definition of local extrema comes only with minimal modifications, see Definition 5.24.
We say f has a local minimum at x0 ∈ Ω if there exists ε > 0 such that f(x) ≥ f(x0) for all x ∈ Uε(x0) ∩ Ω, and a strict local minimum if f(x) > f(x0) for all x ∈ Uε(x0) ∩ Ω \ {x0}. Analogously, we say f has a local maximum at x0 ∈ Ω if there exists ε > 0 such that f(x) ≤ f(x0) for all x ∈ Uε(x0) ∩ Ω, and a strict local maximum if f(x) < f(x0) for all x ∈ Uε(x0) ∩ Ω \ {x0}.
The point x0 ∈ Ω is called local maximum/minimum point, or local extreme point.
We use Ω as a domain here, to make clear that this set does not need to be an open set. (Note that we
assumed that G is always open.)
Before we turn to the actual computation of extrema and extreme values, let us first consider the question whether a function has an extremum at all. For this, recall from the extreme value theorem (Theorem 4.56) that every continuous univariate function f : [a, b] → R, defined on a closed interval, attains its minimum and maximum.
Theorem 8.61 (Extreme value theorem). Let C ⊂ Rd be a closed and bounded set and f : C → R be a continuous function. Then there exist xmin, xmax ∈ C such that

    f(xmin) ≤ f(x) ≤ f(xmax)   for all x ∈ C.

In other words, continuous functions attain their extreme values on closed and bounded sets.
Example 8.62. Consider the function f(x) = e^{−(x1² + 2x2²)} on the closed unit disk Ω = {x ∈ R² : ‖x‖ ≤ 1}. From what we know about exponentials, we see that f(x) ≤ 1 for all x ∈ R², and that f(x) = 1 if and only if x = (0, 0). Therefore, f(0) > f(x) for every x ≠ 0, which implies that x0 = (0, 0) is a strict local maximum as well as the global maximum of f.
When looking for a minimum, we first note that f(x1, x2) converges to zero when x1 and/or x2 tend to infinity, but f > 0 everywhere. Therefore, if we considered f on all of R², then f would not have a minimum. (One would say its infimum is 0.) However, we consider extreme values on Ω and, since we have |x1| ≤ 1 and |x2| ≤ 1 for all x = (x1, x2) ∈ Ω, it is obvious that

    f(x) = e^{−(x1² + 2x2²)} ≥ e^{−(1+2)} = e^{−3} ≥ 0.049   for all x ∈ Ω.
Moreover, due to monotonicity, we see that the minimum lies on the boundary {x : ‖x‖ = 1}. It can already be seen (e.g., from a contour plot) that f decreases fastest in the x2-direction, and therefore that the global/local minima are at x0 = (0, ±1) with f(x0) = e^{−2}. However, it remains to verify this in a systematic way.
This example shows that even in simple examples, it might be non-trivial to determine all extrema when
we are working on bounded sets. Therefore, we first consider here functions that are defined on
the whole Rd , i.e., we consider extrema of functions f : Rd → R. In the context of optimization, this is
sometimes called free (or unconstrained) optimization. We will come back to extrema on subsets Ω ⊂ Rd ,
which corresponds to constrained optimization, afterwards.
Remark 8.63. It might be interesting to note here that the set Rd is open and closed at the same time, see Definition 8.19. Openness is clear, and closedness follows from the fact that its complement, the empty set ∅, is open by definition. (As there is no x ∈ ∅, the requirement of being open is trivially true.) Therefore, all the results from the previous subsections, which were stated for open sets, are valid for G = Rd.
Before we turn to the generalization of the concepts from Section 5, let us discuss another simple exam-
ple that shows that there might be infinitely many (local) extrema, but none of them is a strict local
extremum.
Consider the function f(x1, x2) = sin(x1 + x2). We already know that sin(t), t ∈ R, is maximal whenever t = π/2 + 2kπ and minimal whenever t = 3π/2 + 2kπ, where k ∈ Z. So all possible maxima x0 = (x1, x2) have to satisfy x1 + x2 = π/2 + 2kπ for some k ∈ Z, and for such a point we have f(x0) = 1. This implies that f(x) ≤ f(x0) for all x ∈ R², i.e., such x0 are (local) maxima. (Recall that sin(t) ∈ [−1, 1] for all t ∈ R.)
However, f does not have any strict local maxima. To see this, let x0 = (x1, x2) be a local maximum and y = (x1 + δ, x2 − δ) with δ > 0 be another point. We obtain

    f(y) = sin(x1 + δ + x2 − δ) = sin(x1 + x2) = f(x0).

Hence, y is also a maximum and, since y can be arbitrarily close to x0 for δ small enough, we obtain that x0 cannot be a strict local maximum. The same arguments work for the minima of f.
Theorem 8.65 (Necessary condition for an extreme point). Let G ⊂ Rd be open and f : G → R be
partially differentiable. If x0 ∈ G is a local extremum, then
∇f (x0 ) = 0.
This means that, if ∇f(x) ≠ 0 for some point x, then this point cannot be an extremum.
Proof. We show that the statement is true if x0 is a local maximum. The same arguments can be used
if x0 is a local minimum. For this, we consider each entry of the gradient separately and use the known
results from the univariate setting. Fix i ∈ {1, . . . , d} and let U = Uε (x0 ) be a neighborhood of x0 such
that f (x0 ) ≥ f (x) for all x ∈ U .
Since x0 + tei ∈ U for t ∈ (−ε, ε), we have by assumption that the function g : (−ε, ε) → R,

    g(t) := f(x0 + tei),

exists and is differentiable. (We use that the i-th partial derivative of f exists.) Moreover, again by x0 + tei ∈ U, we see that

    g(0) = f(x0) ≥ f(x0 + tei) = g(t)

for any t ∈ (−ε, ε). Thus, g has a local maximum at 0, implying that

    g′(0) = 0,

see Theorem 5.25. Since g′(0) = ∂f/∂xi(x0), and the above holds for all 1 ≤ i ≤ d, the result follows.
Let us see how we can use this result to determine local extrema.
Example 8.66. We consider the function f(x) = e^{−x1²−2x2²}, x = (x1, x2), from Example 8.62. Computing the gradient, we see that

    ∇f(x) = ( −2x1 e^{−x1²−2x2²}, −4x2 e^{−x1²−2x2²} ) = −2f(x) · (x1, 2x2).

Since f(x) = e^{−x1²−2x2²} ≠ 0 for any x ∈ R², we see that

    ∇f(x) = 0 ⟺ x = 0.
Hence, the only possible local extremum of f is at x0 = 0. Although we already know for this function
that x0 = 0 is a maximum, we still need a systematic way for general functions to verify if a stationary
point is indeed a maximum or a minimum.
We already mentioned that Theorem 8.65 is only a necessary condition for x0 to be an extremum. Unfortunately, it is not a sufficient condition, as the next example shows. Consider the function

    f(x1, x2) = x1² · x2.

To check for extrema, we compute

    ∇f(x) = (2x1x2, x1²),
implying that ∇f(x) = 0 ⟺ x1 = 0. Note that x2 can be chosen arbitrarily. Therefore, all points in the set {(0, x2) : x2 ∈ R} are stationary points, and they are the only candidates for local extrema. However, we have f(0, x2) = 0 for every x2 and therefore, as indicated by the plot, none of these points is a global extremum, as the function attains both smaller and larger values than 0. Regarding local extrema, note that f(x1, x2) ≥ 0 whenever x2 ≥ 0 and f(x1, x2) ≤ 0 whenever x2 ≤ 0. Therefore, if x2 > 0, then there is a neighborhood Uε(x0) around x0 = (0, x2) such that f(x) ≥ 0 = f(x0) in Uε(x0). (One can choose
ε = x2.) This shows that x0 is a local minimum. In the same way, we obtain that every x0 = (0, x2) with x2 < 0 is a local maximum. It remains to check the point x0 = (0, 0). For this, note that, for every ε > 0, we have f(ε, ε) > 0 and f(ε, −ε) < 0. Therefore, every neighborhood of (0, 0) contains points with smaller and larger function values, which shows that (0, 0) is not a (local) extremum, although ∇f(0, 0) = 0.
We now turn to a method, based on derivatives, to verify if a function has a maximum or a minimum,
or no extremum at all. This method is, similarly to the univariate case, called the second partial
derivative test, see Theorem 5.29. Recall that in the univariate case we used positivity/negativity of
the second derivative to decide if a stationary point is a minimum/maximum. However, since the second
derivative of a multivariate function is represented by a matrix, i.e., the Hessian matrix, we first need a
notion of positivity of a matrix.
We call a symmetric matrix A ∈ R^{d×d}
• positive definite if vᵀAv > 0 for all v ∈ Rd \ {0},
• positive semi-definite if vᵀAv ≥ 0 for all v ∈ Rd,
• negative definite if vᵀAv < 0 for all v ∈ Rd \ {0},
• negative semi-definite if vᵀAv ≤ 0 for all v ∈ Rd.
If A is neither positive nor negative semi-definite, we call it indefinite.
Remark 8.70. One time-saving argument to verify that a matrix is indefinite is that it has entries with different signs on the diagonal. To see this, one just needs to consider the unit vectors ei. Using that ei^T A ei = Aii, i.e., the i-th diagonal entry of A, we see that different signs on the diagonal of A imply that there are i, j such that ei^T A ei < 0 and ej^T A ej > 0, which makes the matrix indefinite. However, the converse is not true: a matrix with all diagonal entries of the same sign is not necessarily definite.
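The diagonal shortcut from Remark 8.70 can be sketched in a few lines. This is an illustrative sketch only (Python with NumPy is an assumption, not part of these notes); the function name `indefinite_by_diagonal` is hypothetical.

```python
import numpy as np

def indefinite_by_diagonal(A):
    """Sufficient (but not necessary) test for indefiniteness:
    mixed signs on the diagonal, since e_i^T A e_i = A[i, i]."""
    d = np.diag(A)
    return bool((d > 0).any() and (d < 0).any())

# e_0^T A e_0 = 1 > 0 and e_1^T A e_1 = -4 < 0, so A is indefinite.
A = np.array([[1.0, 3.0], [3.0, -4.0]])
print(indefinite_by_diagonal(A))  # True
```

Note that the test returning `False` proves nothing: it is only a sufficient criterion.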
Example 8.71. Consider the matrix A = ( 1  3 ; 3  4 ). We see that

    v^T A v = (v1, v2) · (v1 + 3v2, 3v1 + 4v2)^T = v1^2 + 4v2^2 + 6v1v2.

This expression is positive, e.g., for v = (1, 0)^T, but negative, e.g., for v = (−2, 1)^T. Hence, A is indefinite.
Example 8.72. Consider the matrix A = ( 2  1 ; 1  1 ). We see that

    v^T A v = (v1, v2) · (2v1 + v2, v1 + v2)^T = 2v1^2 + v2^2 + 2v1v2.

This expression is positive for all v ∈ R^2 \ {0}, since 2v1^2 + v2^2 + 2v1v2 = v1^2 + (v1 + v2)^2, which vanishes only for v1 = v2 = 0. Hence, the matrix is positive definite. However, such a direct verification is often not obvious, and we therefore introduce a more systematic method.
Determining whether a matrix is positive/negative definite can be quite time-consuming and is in most cases not straightforward. We therefore present an easy method based on the determinant of a matrix, see Section 2.4. This is also the fastest method available, at least for small matrices. Note that this method does not allow for determining semi-definiteness or indefiniteness.
Lemma 8.73 (Sylvester's criterion). Let A = (aij)_{i,j=1}^d ∈ R^{d×d} be a symmetric matrix, and let A_k = (aij)_{i,j=1}^k ∈ R^{k×k} be the (upper left) submatrices of A. Then, A is ...

• positive definite if and only if det(A_k) > 0 for all k = 1, . . . , d.
• negative definite if and only if det(A_k) > 0 for even k and det(A_k) < 0 for odd k.
A proof of this result is out of reach with our present knowledge and can be found in the literature.
Consider the matrices A = ( 1  3 ; 3  4 ) from Example 8.71 and B = ( 0  1 ; 1  0 ). Since det(A1) = 1, det(A2) = det(A) = −5 and det(B1) = 0, we observe from Sylvester's criterion that A and B both cannot be positive or negative definite.
Indeed, we know from Example 8.71 that A is indefinite. For B, we see that v^T B v = (v1, v2) · (v2, v1)^T = 2v1v2 is positive for v = (1, 1)^T, and negative for v = (−1, 1)^T, proving that B is indefinite.
First, note that A has entries with different signs on the diagonal, and is therefore indefinite, see Remark 8.70. (As det(A1) = 1 and det(A2) = 0, Sylvester's criterion is inconclusive here.)
For B, we observe that det(B1) = −1 < 0, det(B2) = 1 > 0 and det(B3) = det(B) = −1 < 0. By Sylvester's criterion, B is therefore negative-definite.
(A proof without Sylvester's criterion would be a mess!)
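Sylvester's criterion (Lemma 8.73) is easy to mechanize. The following sketch (assuming Python with NumPy, which these notes do not use themselves) computes the leading principal minors and falls back to the eigenvalues for the cases the criterion cannot decide; the function name `classify_definiteness` is hypothetical.

```python
import numpy as np

def classify_definiteness(A):
    """Classify a symmetric matrix via Sylvester's criterion
    (leading principal minors); eigenvalues decide the remaining cases,
    which the criterion itself cannot."""
    d = A.shape[0]
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, d + 1)]
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite"
    # Sylvester's criterion is inconclusive here; inspect the spectrum.
    eig = np.linalg.eigvalsh(A)
    if (eig > 0).any() and (eig < 0).any():
        return "indefinite"
    return "semi-definite"

print(classify_definiteness(np.array([[2.0, 1.0], [1.0, 1.0]])))  # positive definite
print(classify_definiteness(np.array([[1.0, 3.0], [3.0, 4.0]])))  # indefinite
```

The two printed cases match Examples 8.71 and 8.72 above.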
Based on the notion of definiteness, we can finally give a sufficient condition for being an extremum.
Theorem 8.77 (Second (partial) derivative test). Let f : G → R be twice continuously differentiable
and let x0 ∈ G such that ∇f (x0 ) = 0. Then, we have
1) Hf (x0 ) is positive-definite =⇒ x0 is a strict local minimum
2) Hf (x0 ) is negative-definite =⇒ x0 is a strict local maximum
3) Hf (x0 ) is indefinite =⇒ x0 is not an extremum of f
In all other cases (i.e., semi-definite but not definite), we do not gain information from the second
derivative test.
Remark 8.78. We actually show below that points x0 ∈ G with ∇f(x0) = 0 and Hf(x0) indefinite are points such that for every ε > 0 there exist x, y ∈ Uε(x0) with f(x) > f(x0) and f(y) < f(x0). That is, every neighborhood contains strictly smaller and strictly larger function values. Such points, which are clearly no extrema, are usually called saddle points.
Proof of Theorem 8.77. We consider the first case, i.e., that Hf (x0 ) is positive-definite. First, note that
there is some α > 0 such that v T Hf (x0 )v ≥ α for all v ∈ Sd−1 , i.e., all v with v T v = 1. (This can be
proven by using that v T Hf (x0 )v must attain its minimum on Sd−1 .) Now note that, by continuity of all
second-order partial derivatives, there is some ε > 0, such that
    | ∂^2 f/∂xi∂xj (x) − ∂^2 f/∂xi∂xj (x0) | < α/(2d)

for all x ∈ Uε(x0) and all i, j = 1, . . . , d. That is, ∂^2 f/∂xi∂xj (x) is 'close' to ∂^2 f/∂xi∂xj (x0) in a small neighborhood around x0. From this, we obtain that

    | v^T ( Hf(x) − Hf(x0) ) v | = | Σ_{i=1}^d Σ_{j=1}^d ( ∂^2 f/∂xi∂xj (x) − ∂^2 f/∂xi∂xj (x0) ) · vi vj |
        ≤ Σ_{i=1}^d Σ_{j=1}^d | ∂^2 f/∂xi∂xj (x) − ∂^2 f/∂xi∂xj (x0) | · |vi| |vj|
        < α/(2d) · ( Σ_{i=1}^d |vi| ) · ( Σ_{j=1}^d |vj| )
        ≤ α/2 · ||v||^2 = α/2,

see Lemma 8.8. This implies

    v^T Hf(x) v = v^T Hf(x0) v + v^T ( Hf(x) − Hf(x0) ) v
        ≥ v^T Hf(x0) v − | v^T ( Hf(x) − Hf(x0) ) v |
        > α − α/2 = α/2 > 0

for all x ∈ Uε(x0) and v ∈ S^{d−1}. That is, Hf(x) is also positive-definite in a neighborhood of x0.
We now fix some v ∈ Sd−1 and consider the univariate function g(t) := f (x0 + tv) − f (x0 ). We see that
g(0) = 0 and g 0 (0) = Dv f (x0 ) = h∇f (x0 ), vi = 0, by assumption. Taylor’s theorem (Theorem 5.53) now
shows that
    g(t) = g(0) + g'(0) · t + g''(ξ)/2 · t^2 = g''(ξ)/2 · t^2
for some ξ ∈ (0, t). From Theorem 8.53, we obtain that g 00 (ξ) = Dv2 f (x0 + ξv) = v T Hf (x0 + ξv)v. In
particular, positive-definiteness implies that g 00 (ξ) > 0 for all ξ ∈ [0, ε), independent of v.
Since t ∈ (0, ε) implies ξ ∈ (0, ε), we have g(t) > 0, i.e., f (x0 + tv) > f (x0 ), for all t ∈ (0, ε), independent
of v. In other words, f (x) > f (x0 ) for all x ∈ Uε (x0 ) \ {x0 }, which proves the claim.
The case of Hf (x0 ) negative-definite follows from considering −f instead. (Note that H−f (x0 ) is positive-
definite then.)
Finally, if Hf(x0) is indefinite, then there exist some u, v ∈ S^{d−1} such that

    v^T Hf(x0) v > 0   and   u^T Hf(x0) u < 0,

and, by the continuity argument from above, these inequalities remain valid for Hf(x) with x in a neighborhood of x0. Thus, the (univariate) function g(t) = f(x0 + tv) − f(x0) is positive, and
the function h(t) = f (x0 + tu) − f (x0 ) is negative, for every small enough t > 0. In other words, every
neighborhood of x0 contains points with smaller and larger function value, respectively, which shows that
f cannot have an extremum at x0 .
seen by Sylvester’s criterion (Lemma 8.73). Hence, Hf (0, 0) is positive definite, and x0 is a strict local
minimum.
Example 8.80. The previous example can be generalized by considering functions of the form
f (x) = xT Cx,
where C is a symmetric matrix, see also Example 8.39. Moreover, we already showed there that ∇f (x) =
2Cx, which implies that all possible extrema have to satisfy Cx = 0. Hence, we always have that x0 = 0
is a stationary point and that f (0) = 0. Moreover, we obtain that Hf (x) = 2C for any x, i.e., Hf (x) is
independent of x.
If C is positive-definite, then C is an invertible matrix (We prove that later.), which implies that 0 is
the only stationary point. Moreover, by positive-definiteness of the Hessian Hf = 2C, we obtain from
Theorem 8.77 that x0 = 0 is a strict local minimum. (This could also be seen from xT Cx > 0 for every
x 6= 0.) If C is negative-definite it follows analogously that 0 is a strict local maximum and that there
are no other extrema. If C is indefinite and invertible there are no extrema at all.
Finally, if C is 'only' positive/negative semi-definite or, more generally, not invertible, then we cannot say anything just by using the second derivative test.
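The identities ∇f(x) = 2Cx and Hf(x) = 2C from Example 8.80 can be checked numerically. The following sketch (Python/NumPy is an assumption of this illustration; `num_grad` is a hypothetical helper) compares the analytic gradient of f(x) = x^T C x with a finite-difference approximation.

```python
import numpy as np

C = np.array([[2.0, 1.0], [1.0, 1.0]])   # symmetric (and positive definite)
f = lambda x: x @ C @ x                   # quadratic form f(x) = x^T C x

def num_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([0.7, -1.3])
# For symmetric C the gradient is exactly 2*C*x:
print(np.allclose(num_grad(f, x), 2 * C @ x, atol=1e-5))  # True
```

With a positive-definite C, the only stationary point x = 0 is then a strict local (indeed global) minimum, in line with the discussion above.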
In the special case d = 2, we can employ Sylvester’s criterion to obtain a very useful formulation of the
second derivative test.
Corollary 8.81 (Second derivative test for d = 2). Let G ⊂ R2 , f : G → R be twice continuously
differentiable and let x0 ∈ G such that ∇f (x0 ) = 0.
Moreover, let H := Hf (x0 ) be the Hessian of f at x0 with upper left entry H11 . Then, we have
1) det(H) > 0 and H11 > 0 =⇒ x0 is a strict local minimum
2) det(H) > 0 and H11 < 0 =⇒ x0 is a strict local maximum
3) det(H) < 0 =⇒ x0 is not an extremum of f
If det(H) = 0, we do not gain information from the second derivative test.
Again, we omit the proof. (The third point cannot be proven here.)
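Corollary 8.81 translates directly into a small decision procedure. This is an illustrative sketch (Python/NumPy assumed; the function name is hypothetical), not part of the notes.

```python
import numpy as np

def second_derivative_test_2d(H):
    """Classify a stationary point of a function of two variables
    from its symmetric 2x2 Hessian H (Corollary 8.81)."""
    detH, H11 = np.linalg.det(H), H[0, 0]
    if detH > 0 and H11 > 0:
        return "strict local minimum"
    if detH > 0 and H11 < 0:
        return "strict local maximum"
    if detH < 0:
        return "no extremum (saddle point)"
    return "inconclusive"

# f(x1, x2) = x1^2 + x2^2 has Hessian 2*I at its stationary point (0, 0):
print(second_derivative_test_2d(np.array([[2.0, 0.0], [0.0, 2.0]])))  # strict local minimum
```

The `det(H) = 0` branch returns "inconclusive", mirroring the statement of the corollary.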
Example 8.82. We verify that the function f(x) = e^{−(x1^2 + 2x2^2)} with x = (x1, x2) from Example 8.62 has a (global) maximum at x0 = (0, 0). We already know from Example 8.66 that ∇f(x) = −2f(x) · (x1, 2x2), and that this implies that x0 = (0, 0) is the only stationary point of f. Computing the Hessian matrix of f we obtain

    Hf(x) = f(x) · ( 4x1^2 − 2   8x1x2 ; 8x1x2   16x2^2 − 4 ).

In particular, Hf(0, 0) = ( −2  0 ; 0  −4 ), which is negative-definite, so x0 = (0, 0) is a strict local maximum by Theorem 8.77. Since f(x) = e^{−(x1^2 + 2x2^2)} ≤ 1 = f(0, 0) for all x, it is even the global maximum.
As another example, consider again the function f(x1, x2) = x1^2 · x2 from Example 8.67. Since ∇f(x) = (2x1x2, x1^2), we saw that all points in the set {(0, x2) : x2 ∈ R} are stationary points, and we also verified which of them are extrema.
Moreover, we easily obtain the Hessian

    Hf(x1, x2) = ( 2x2  2x1 ; 2x1  0 ),

and therefore Hf(0, x2) = ( 2x2  0 ; 0  0 ). This matrix has det(Hf(0, x2)) = 0 for every x2 ∈ R. Therefore, the second derivative test does not lead to an answer whether some of the stationary points are extrema or not.
Next, consider the function f(x1, x2) = sin(x1) cos(x2). The gradient of f is

    ∇f(x) = ( cos(x1) cos(x2), − sin(x1) sin(x2) )

and the Hessian, which we already computed in Example 8.56, is

    Hf(x) = (−1) · ( sin(x1) cos(x2)   cos(x1) sin(x2) ; cos(x1) sin(x2)   sin(x1) cos(x2) ).
Let us compute all stationary points of f, i.e., all x = (x1, x2) such that ∇f(x) = 0. Since sin(t) and cos(t) cannot be zero at the same time (i.e., for the same t), we obtain that ∇f(x) = 0 if either

    cos x1 = 0 and sin x2 = 0,   or   sin x1 = 0 and cos x2 = 0.

We start with the first case, i.e., cos x1 = 0 and sin x2 = 0, which implies that

    x1 = k1π + π/2 = (2k1 + 1)π/2   and   x2 = k2π,

for some k1, k2 ∈ Z. (For example, x1 = π/2 and x2 = 0.)
Plugging this into the Hessian, we see that for such an x = (x1, x2) we have

    Hf(x) = (−1) · ( (−1)^{k1+k2}  0 ; 0  (−1)^{k1+k2} ).

(Make sure that you understand the basic properties of cos/sin that lead to this.)
This shows that Hf(x) = ( −1  0 ; 0  −1 ) if k1 + k2 is even, and that Hf(x) = ( 1  0 ; 0  1 ) if k1 + k2 is odd. Therefore,

    x0 = ( k1π + π/2, k2π )  is a strict local  maximum, if k1 + k2 is even,  and a strict local  minimum, if k1 + k2 is odd.

(Verify that the corresponding matrices are pos./neg. definite!)
For example, the point (25π/2, 42π) is a strict local maximum (k1 = 12, k2 = 42).
If we consider the second case, i.e., sin x1 = 0 and cos x2 = 0, we need that

    x1 = k1π   and   x2 = k2π + π/2 = (2k2 + 1)π/2,

for some k1, k2 ∈ Z. (For example, x1 = 0 and x2 = π/2.) Note that the contour plot above already shows,
that there cannot be an extremum at such points, because every point on these ’lines’ has points with
smaller and larger function value around it.
However, to prove this, plug these (x1, x2) into the Hessian, to obtain

    Hf(x) = (−1) · ( 0  (−1)^{k1+k2} ; (−1)^{k1+k2}  0 ).

This shows that either Hf(x) = ( 0  1 ; 1  0 ) or Hf(x) = ( 0  −1 ; −1  0 ), but both matrices have an eigenvalue 1 and an eigenvalue −1, making Hf(x) indefinite. Therefore, all points x0 = ( k1π, k2π + π/2 ) with k1, k2 ∈ Z are no extrema. (They are actually saddle points, see Remark 8.78.)
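The classification above can be sanity-checked numerically. Consistent with the gradient computed above, take f(x1, x2) = sin(x1) · cos(x2); the sketch below (Python/NumPy assumed, not part of the notes) samples a neighborhood of the stationary point with k1 = 12, k2 = 42 and confirms it is a local maximum.

```python
import numpy as np

f = lambda x1, x2: np.sin(x1) * np.cos(x2)

# Stationary point with k1 = 12, k2 = 42 (k1 + k2 even): a strict local maximum.
x0 = np.array([25 * np.pi / 2, 42 * np.pi])
print(np.isclose(f(*x0), 1.0))  # True: sin(25*pi/2) = cos(42*pi) = 1

# Every value in a small sampled neighborhood should be <= f(x0).
rng = np.random.default_rng(0)
pts = x0 + 0.1 * rng.uniform(-1, 1, size=(1000, 2))
print(np.all(f(pts[:, 0], pts[:, 1]) <= f(*x0) + 1e-12))  # True
```

Sampling of course only supports, and cannot replace, the second derivative test.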
The results above allow to find extrema of functions which are defined on open sets, like G = Rd , and
to verify whether these extrema are minima or maxima. However, in many applications we are interested in
extrema which are contained in some given (closed) set Ω ⊂ G. For this, we have to consider the boundary
of the set separately. Recall from the univariate case that a function defined on a closed interval can
have a minimum/maximum at the boundary points. For example, the function f (t) = 2t on [1, 2] has a
minimum at 1 and a maximum at 2, but the derivative of f is nowhere zero. These boundary points are
easy to check, but for multivariate functions the boundary is more complex.
In what follows, we consider only functions defined on G = R^d and we want to find their extrema in sets that are given by

    Ω = {x ∈ R^d : g(x) ≤ c}

for some function g : R^d → R and c ∈ R. For the sake of simplicity, we assume here that the function g
is a continuously differentiable function. From this we obtain, in particular, that the interior
Ωo := {x ∈ Rd : g(x) < c}
of the set Ω is an open set. We can therefore find all extrema of a function inside Ωo by the techniques
from above. It remains to consider the boundary of Ω, i.e.,

    ∂Ω := {x ∈ R^d : g(x) = c}.
Extrema of a function on the boundary ∂Ω, which are defined in the same way as in Definition 8.60 with
Ω replaced by ∂Ω, are called extrema subject to the constraint g(x) = c.
Before we come to a more systematic way of finding the extrema of a function, let us note that the equation g(x) = g(x1, . . . , xd) = c sometimes leads to an 'easy' restriction of just one coordinate, which we may simply plug into our original problem to find the extrema on ∂Ω. That is, g(x1, . . . , xd) = c can sometimes be written as xd = h(x1, . . . , xd−1) for some function h : R^{d−1} → R. In this case, finding an extremum of f on ∂Ω is just the same as finding an extremum of

    F(x1, . . . , xd−1) := f( x1, . . . , xd−1, h(x1, . . . , xd−1) )

on R^{d−1}. That is, the restriction just reduces the dimension by one. Let us see an example.
Example 8.85. Consider the function f(x1, x2) = x1^2 + x2^2 on the set Ω = {x ∈ R^2 : g(x) ≤ −2}, where g(x1, x2) := −x1 − x2. To find the extrema of f in Ω we first compute the stationary points of f on R^2, i.e., all points such that ∇f vanishes. We already saw that x = (0, 0) is the only stationary point of f. However, (0, 0) is not contained in Ω, because it satisfies g(0, 0) = 0 ≰ −2, and therefore cannot be an extremum of f in Ω. Hence, there are no (local) extrema in Ω°.
To treat the boundary, i.e., ∂Ω = {x ∈ R^2 : g(x) = −2}, note that

    g(x1, x2) = −2 ⟺ x2 = 2 − x1.

Therefore, every point (x1, x2) with g(x1, x2) = −2 is of the form (x1, 2 − x1). To find an extremum of f is therefore the same as finding an extremum of the (univariate) function

    F(x1) := f(x1, 2 − x1) = x1^2 + (2 − x1)^2 = 2x1^2 − 4x1 + 4.

We see that this function has a minimum at x1 = 1. Since x2 = 2 − x1, we obtain that f has a minimum
subject to the constraint x1 + x2 = 2 at the point (1, 1). By geometric reasoning, we see that (1, 1) is a
global minimum of f in Ω, and that f has no maximum. (Make a plot!)
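The substitution idea can be sketched numerically. Assuming (consistent with Example 8.88 below) f(x1, x2) = x1^2 + x2^2 subject to x1 + x2 = 2, we substitute x2 = 2 − x1 and minimize the resulting univariate function on a grid. Python/NumPy is an assumption of this illustration.

```python
import numpy as np

# Substituting x2 = 2 - x1 into f(x1, x2) = x1^2 + x2^2 gives the
# univariate function F(x1) = x1^2 + (2 - x1)^2; F'(x1) = 4*x1 - 4 = 0
# yields x1 = 1, i.e., the constrained minimizer (1, 1).
F = lambda x1: x1**2 + (2 - x1)**2

grid = np.linspace(-3, 5, 10001)
x1_min = grid[np.argmin(F(grid))]
print(x1_min)  # approximately 1.0
```

The grid search is only a crude check of the closed-form computation; the exact minimizer comes from the derivative.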
The last example shows that it is sometimes easy to incorporate the constraint and then find the extrema
on the boundary. However, this is clearly not always the case, and we need a way to compute extrema
subject to a constraint g(x) = c systematically. This is done by the method of Lagrange multipliers.
For this, we define the Lagrange function

    L(x, λ) := f(x) + λ · ( g(x) − c ),
where the number λ ∈ R is called the Lagrange multiplier. Note that the function L : Rd+1 → R now
depends on d + 1 variables, namely the d ’original’ variables and λ.
It turns out that finding extrema subject to constraints is just the same as finding the extrema of the
Lagrange function L. For this we need to compute the gradient, i.e., all partial derivatives, of L and find
points where it is zero. Note that the gradient of L(x, λ) is given by

    ∇L(x, λ) = ( ∂L/∂x1, . . . , ∂L/∂xd, ∂L/∂λ )(x, λ)

and that the partial derivative w.r.t. λ is just

    ∂L(x, λ)/∂λ = g(x) − c.
Therefore, setting this partial derivative to zero is equivalent to g(x) = c, which is precisely the constraint.
This partial derivative is therefore not of much interest and we mostly need only the first entries of the
gradient, which we denote by

    ∇x L := ( ∂L/∂x1, . . . , ∂L/∂xd ),

and, by the definition of the Lagrange function, ∇x L(x, λ) = ∇f(x) + λ∇g(x). The central result (Theorem 8.86) states: if x0 is a local extremum of f subject to the constraint g(x) = c and ∇g(x0) ≠ 0, then there is some λ ∈ R such that

    ∇f(x0) = −λ∇g(x0).
The equation from this theorem means that the gradients of f and g at the point x0 are parallel, i.e., they point in the same or in opposite directions. The additional constant λ is necessary, because the gradients might be of different length.
A formal proof of Theorem 8.86 is out of reach at the moment, as it is based on the (rather involved)
implicit function theorem that we don’t discuss here. However, we can give a geometric explanation of
this necessary condition, see Figure 54.
Figure 54: Contour plot of a function and a constraint with their gradients.
Note that the gradient of a function f at x0 is always perpendicular to the ’surface’ {x : f (x) = f (x0 )},
i.e., the level set at height f (x0 ). Therefore, if f has an extremum, say a maximum, subject to g(x) = c
at x0 , then the level set {x : f (x) = f (x0 )} ’touches’ the set {x : g(x) = c} in a single point, implying
that the gradients of f and g at x0 are parallel. If this were not fulfilled, then one could 'wander' along {x : g(x) = c} and increase the function value, which contradicts that x0 is a maximum.
Theorem 8.86 implies that for each extremum at x0 subject to a constraint g(x) = c, with ∇g(x0) ≠ 0,
there is some λ such that ∇f (x0 ) + λ∇g(x0 ) = 0. That is, if x0 is a (local) extremum subject to g(x) = c,
then there is some λ such that
∇L(x0 , λ) = 0.
It is therefore necessary to find all stationary points of L. Moreover, the above theorem makes no state-
ment about points with ∇g(x) = 0, and they have to be considered separately.
In summary, to find all extrema of f subject to a constraint g(x) = c we need to consider all critical
points of L, which are
1. all points x0 with ∇g(x0 ) = 0 and g(x0 ) = c, and
2. all points x0 with ∇g(x0 ) 6= 0 and ∇L(x0 , λ) = 0 for some λ.
Note that, as in the unconstrained optimization, not each of these points is an extremum, and one needs
a different reasoning to verify if they are (local) minima or maxima.
Remark 8.87. There is also a variant of the second derivative test for constrained optimization. This
method is based on the so-called bordered Hessian matrix, which is the Hessian of L. Unfortunately, it is
not as simple as before (by verifying positive/negative-definiteness) to determine the type of an extremum
subject to constraints. We do not discuss the details here.
Example 8.88. We consider again the function from Example 8.85. That is, we want to find the extrema
of f (x1 , x2 ) = x21 + x22 subject to the constraint g(x1 , x2 ) := x1 + x2 = 2, see Figure 55. (For sake of
notation, we use a different g here than in Example 8.85.)
First, we check if the constraint has a vanishing gradient by computing

    ∇g(x1, x2) = (1, 1).

Clearly, this is never zero, and therefore all critical points are points x = (x1, x2) where the gradient of the Lagrange function

    L(x, λ) = x1^2 + x2^2 + λ · (x1 + x2 − 2)

vanishes. We compute

    ∇x L(x, λ) = ( 2x1 + λ, 2x2 + λ )

and

    ∂L(x, λ)/∂λ = x1 + x2 − 2.
Setting ∇x L(x, λ) = 0 we see that the equation is solved by x1 = x2 = − λ2 .
We finally need to find λ such that this point satisfies the constraint g(x) = x1 + x2 = 2. Clearly, this
leads to λ = −2, and therefore to the unique critical point (1, 1), as already shown in Example 8.85. This
point is clearly a minimum, see Figure 55.
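Since the stationarity conditions of this example (2x1 + λ = 0, 2x2 + λ = 0, x1 + x2 = 2, as implied by x1 = x2 = −λ/2 above) are linear, they can be solved as one linear system. A sketch, assuming Python with NumPy:

```python
import numpy as np

# Stationarity of L(x, lam) = x1^2 + x2^2 + lam*(x1 + x2 - 2) gives the
# linear system  2*x1 + lam = 0,  2*x2 + lam = 0,  x1 + x2 = 2.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 2.0])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)  # 1.0 1.0 -2.0
```

This reproduces the unique critical point (1, 1) with λ = −2 found above.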
Recall from Example 8.67 that all points from {(0, x2 ) : x2 ∈ R} are stationary points of f and that all of
them except (0, 0) are local extrema. However, all of these points satisfy f (x) = 0 and are therefore clearly
no global extrema. As f is unbounded it actually does not have global extrema (without constraints).
We now consider the bounded and closed domain Ω := {x : x21 + x22 ≤ 3}, see Figure 56. The function
clearly has global minima and maxima in Ω, and they are not in the interior Ωo , since all stationary
points have function value 0. (However, they are still local extrema in Ω.) Therefore, the global extrema
are on the boundary ∂Ω = {x : x21 + x22 = 3}, and to find them, we need to find the extrema of f subject
to g(x) = x21 + x22 = 3.
To do so we compute

    ∇f(x) = (2x1x2, x1^2)

and, for g(x) = x1^2 + x2^2,

    ∇g(x) = (2x1, 2x2).
First of all, ∇g(x) = 0 if and only if x = (0, 0), but x = (0, 0) does not satisfy g(x) = 3, i.e., x ∉ ∂Ω, and can therefore be ignored. Now we consider the Lagrange function

    L(x, λ) = x1^2 · x2 + λ · (x1^2 + x2^2 − 3).
First note that λ = 0 corresponds to the (local) extrema of f without constraints, because ∇x L(x, 0) = ∇f(x). Therefore, solving the equation ∇x L(x, 0) = 0 gives the set of solutions {(0, x2) : x2 ∈ R}, and the only points of this set on ∂Ω are P1 = (0, √3) and P2 = (0, −√3). As in Example 8.67, we see that P1 is a local minimum and P2 is a local maximum with f(P1) = f(P2) = 0, see also Figure 56.
Let us consider the general equation ∇x L(x, λ) = 0, i.e., the system of equations

    2x1x2 + 2λx1 = 0,
    x1^2 + 2λx2 = 0.
We see that the first equation reads 2x1(x2 + λ) = 0, which is satisfied if either x1 = 0 or x2 + λ = 0. Since x1 = 0 only leads to local extrema (see above), we consider the second case, which implies x2 = −λ. Putting this into the second equation gives x1^2 − 2λ^2 = 0, i.e., x1^2 = 2λ^2 and therefore x1 = ±√2 · |λ|.
It remains to find suitable λ by inserting into the constraint: x1^2 + x2^2 = 2λ^2 + (−λ)^2 = 3λ^2 = 3. This gives the solutions λ1 = 1 and λ2 = −1. For λ = 1 we obtain the points P3 = (√2, −1) and P4 = (−√2, −1), and for λ = −1 we obtain P5 = (√2, 1) and P6 = (−√2, 1).
Finally, by computing the function values f (P3 ) = f (P4 ) = −2 and f (P5 ) = f (P6 ) = 2, we see that f
has global minima in Ω = {x : x21 + x22 ≤ 3} at P3 and P4 , and global maxima in Ω at P5 and P6 .
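The six candidate points can be verified mechanically: each must lie on the boundary and satisfy ∇f(x0) = −λ∇g(x0). A sketch, assuming Python with NumPy (not part of the notes):

```python
import numpy as np

# f(x) = x1^2 * x2 on the circle g(x) = x1^2 + x2^2 = 3: verify the
# candidate points and compare their function values.
f = lambda x: x[0]**2 * x[1]
grad_f = lambda x: np.array([2 * x[0] * x[1], x[0]**2])
grad_g = lambda x: np.array([2 * x[0], 2 * x[1]])

candidates = [((np.sqrt(2), -1.0), 1.0),   # P3 with lam = 1
              ((-np.sqrt(2), -1.0), 1.0),  # P4 with lam = 1
              ((np.sqrt(2), 1.0), -1.0),   # P5 with lam = -1
              ((-np.sqrt(2), 1.0), -1.0)]  # P6 with lam = -1

for p, lam in candidates:
    x = np.array(p)
    assert np.isclose(x @ x, 3.0)                    # on the boundary
    assert np.allclose(grad_f(x), -lam * grad_g(x))  # Lagrange condition
    print(p, "f =", round(f(x), 10))
```

The printed values confirm f(P3) = f(P4) = −2 (global minima in Ω) and f(P5) = f(P6) = 2 (global maxima in Ω).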
Example 8.90. Let us finally consider again our initial example from Example 8.62, i.e.,
    f(x) := e^{−(x1^2 + 2x2^2)}

with x = (x1, x2) on the set Ω := {x : ||x|| ≤ 1} = {(x1, x2) : x1^2 + x2^2 ≤ 1}.
We already know that f has a global maximum at (0, 0) with f (0) = 1, see Example 8.82, and no other
stationary point. It therefore has a global minimum at the boundary ∂Ω = {x : kxk = 1}. To find the
minima, first note that ∇g only vanishes at (0, 0), which is not in ∂Ω, and can be ignored. Now consider
the Lagrange function
    L(x, λ) = e^{−(x1^2 + 2x2^2)} + λ(x1^2 + x2^2 − 1)
such that

    ∇x L(x, λ) = ( −2x1 f(x) + 2λx1, −4x2 f(x) + 2λx2 )
               = ( 2x1 (λ − f(x)), 2x2 (λ − 2f(x)) ).
So, ∇L(x, λ) = 0 holds if either x1 = 0 and x2 = ±1, or x2 = 0 and x1 = ±1. (Verify why there are no
other possibilities!) Since f (0, ±1) = e−2 and f (±1, 0) = e−1 , we see that f has a global minimum in Ω
at x0 = (0, ±1).
where for all 1 ≤ i ≤ m we have fi : R^d → R. This implies that for each component of f, i.e., for each fi, we are able to use all results we proved so far. We will see that it is often enough to study all components, i.e., we can reduce our questions to problems which only involve real-valued functions. Therefore it makes sense to use the following definition.
Example 8.93. Another type of interesting vector-valued functions are representations of curves, which are mappings h : R → R^m, i.e., they map a number to a point in R^m. One prominent example is the helix

    h(t) = ( r cos(2πt), r sin(2πt), t )

with radius r > 0, see Figure 58. This function is clearly continuous, since all components are continuous.
The analysis of vector-valued functions goes along the same lines as for real-valued (multivariate) functions. However, as there are now m different components that we need to keep track of, we need some more definitions.
First, we discuss the partial derivatives for vector-valued functions. The only difference between real- and vector-valued functions is that we need to consider all partial derivatives of every component fi of f, and we use again a matrix to collect them. (Recall that the partial derivatives of a real-valued function were collected in the gradient, i.e., a 1 × d matrix, aka a vector.) Nevertheless, the computation of partial derivatives is not harder in this setting, as we will immediately see from the next definition.
We see that the i-th row of the Jacobian is given by the gradient of fi . We will illustrate this by the
following example.
Example 8.95. Let us compute the Jacobian of

    f(x1, x2, x3) = ( x1x2, e^{x3} ),

which is a function from R^3 to R^2 with f1(x) = x1x2 and f2(x) = e^{x3}, where x = (x1, x2, x3). We see that

    ∇f1(x) = (x2, x1, 0)   and   ∇f2(x) = (0, 0, e^{x3}).

Therefore we see that the Jacobian is given by

    J(x) = ( x2  x1  0 ; 0  0  e^{x3} ).
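The Jacobian of Example 8.95 can be checked against a finite-difference approximation. This is an illustrative sketch (Python/NumPy assumed; `num_jacobian` is a hypothetical helper).

```python
import numpy as np

# f(x1, x2, x3) = (x1*x2, exp(x3)) with analytic Jacobian
# J(x) = [[x2, x1, 0], [0, 0, exp(x3)]].
f = lambda x: np.array([x[0] * x[1], np.exp(x[2])])
J = lambda x: np.array([[x[1], x[0], 0.0],
                        [0.0, 0.0, np.exp(x[2])]])

def num_jacobian(F, x, eps=1e-6):
    """Column j is the central difference quotient in direction e_j."""
    cols = []
    for j in range(len(x)):
        e = np.zeros_like(x); e[j] = eps
        cols.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)

x = np.array([1.5, -2.0, 0.5])
print(np.allclose(num_jacobian(f, x), J(x), atol=1e-6))  # True
```

Each row of the numerical Jacobian approximates the gradient of one component fi, matching the definition above.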
In fact, we already saw some vector fields and their Jacobian, although a bit hidden.
Example 8.96. Let f : G → R be a twice-partially differentiable function. (Note that f is not vector-
valued!) The gradient is then a mapping from G to Rd , as for every x ∈ G ⊂ Rd we have ∇f (x) ∈ Rd ,
i.e. gradients are always vector fields if they exist.
Now, since every component of the gradient is partially differentiable by assumption, we can compute
the Jacobian of ∇f (x), and we see that it is actually given by the Hessian of f , i.e.
J∇f (x) = Hf (x).
Note that under the additional assumption that f is twice-continuously differentiable, we even have that
the Jacobian of ∇f is symmetric. (Why is this the case?)
Similar to multivariate real functions it is not sufficient to only consider partial derivatives. Let us adapt
the definitions to this vector-valued case, which is quite straightforward.
where r satisfies

    lim_{y→0} r(y)/||y|| = 0,

then we call f differentiable at x. We call D = df_x the (total) derivative of f at x.
Remark 8.98. Recall that a linear mapping from R^d to R^m is always described by a matrix. Moreover, r(y) ∈ R^m is a vector, and lim_{y→0} r(y)/||y|| = 0 if and only if lim_{y→0} ||r(y)||/||y|| = 0.
We see that f is differentiable in the above sense if and only if all components fi : R^d → R, 1 ≤ i ≤ m, are differentiable. This allows to use all results for real functions when considering vector-valued ones. In particular, we are able to generalize the results which were used to connect partial derivatives and differentiability, see Section 8.3.2. All the proofs are basically the same, with the additional observation that the Jacobian of a vector-valued function can be written as

    Jf(x) = ( ∇f1(x) ; ∇f2(x) ; . . . ; ∇fm(x) ),

i.e., the i-th row of Jf(x) is the gradient of fi.
Let us summarize this in the following theorem, which we state without proof.
Moreover, if f : G → Rm is a mapping such that all partial derivatives of all components are contin-
uous at x ∈ G, then f is also differentiable at x.
By the above theorem, we see that many computations related to the derivative of a vector-valued function
can be performed by computations of the (real-valued) component-functions fi . For example, we easily
see that d(f + g)x = dfx + dgx for all f, g : Rd → Rm that are differentiable at x, by using the statements
for each component individually.
In addition, using that the derivative is given by the Jacobi matrix, we obtain a particularly useful formula for the derivative of the composition of functions. In fact, the derivative (i.e., Jacobi matrix) of the composition g ∘ f is given by the matrix product of the Jacobi matrices of f and g (at appropriate points). As in the univariate case, it is called the chain rule.
Proof. To keep track of all the variables we use the following notation throughout this proof: A = Jf(x) ∈ R^{p×d}, B = Jg(y) ∈ R^{m×p}, and

    f(x + ξ) = f(x) + Aξ + r(ξ),
    g(y + η) = g(y) + Bη + s(η).
    (g ∘ f)(x) = ( x2^4 ; x1^2 x2^2 ),

and computing the Jacobian of this function directly, we obtain exactly the same matrix.
Consider the function

    h(x) = ( e^{x1−x2} ; cos(sin(x1)) ; x3^2 + sin(x1) ).

This function can be written as g ∘ f, where

    f(x) = ( x1 − x2 ; x3^2 ; sin x1 )   and   g(y) = ( e^{y1} ; cos y3 ; y2 + y3 )

with x = (x1, x2, x3) and y = (y1, y2, y3). (There are many different choices for f and g. Try to find some!)
If we compute the corresponding Jacobian matrices

    Jg(y) = ( e^{y1}  0  0 ; 0  0  −sin y3 ; 0  1  1 )   and   Jf(x) = ( 1  −1  0 ; 0  0  2x3 ; cos x1  0  0 ),

we obtain

    Jg∘f(x) = Jg(f(x)) · Jf(x) = ( e^{x1−x2}  −e^{x1−x2}  0 ; −sin(sin(x1)) cos(x1)  0  0 ; cos x1  0  2x3 ).
Of course we would also be able to compute this Jacobian directly, but this example suggests a quite useful method. We want to compute the Jacobian of a complicated function, in this case h. To do so, we rewrite it as the composition of two easier functions f, g and use the chain rule to compute Jg∘f. This may lead to more computations and a matrix multiplication, but we only have to compute very easy derivatives if we make a clever choice of f, g. Thus the problem may become easier in some cases, as many derivatives can be written down immediately.
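The chain rule Jg∘f(x) = Jg(f(x)) · Jf(x) for the example above can be verified numerically. A sketch, assuming Python with NumPy (`num_jacobian` is a hypothetical finite-difference helper):

```python
import numpy as np

# h = g o f with h(x) = (e^{x1-x2}, cos(sin x1), x3^2 + sin x1).
f = lambda x: np.array([x[0] - x[1], x[2]**2, np.sin(x[0])])
g = lambda y: np.array([np.exp(y[0]), np.cos(y[2]), y[1] + y[2]])
h = lambda x: g(f(x))

Jf = lambda x: np.array([[1.0, -1.0, 0.0],
                         [0.0, 0.0, 2 * x[2]],
                         [np.cos(x[0]), 0.0, 0.0]])
Jg = lambda y: np.array([[np.exp(y[0]), 0.0, 0.0],
                         [0.0, 0.0, -np.sin(y[2])],
                         [0.0, 1.0, 1.0]])

def num_jacobian(F, x, eps=1e-6):
    """Central finite-difference Jacobian, one column per direction."""
    cols = []
    for j in range(len(x)):
        e = np.zeros_like(x); e[j] = eps
        cols.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)

x = np.array([0.3, 1.1, -0.7])
chain = Jg(f(x)) @ Jf(x)   # chain rule: Jacobian of the composition
print(np.allclose(chain, num_jacobian(h, x), atol=1e-6))  # True
```

The matrix product of the two 'easy' Jacobians agrees with the directly approximated Jacobian of h.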
For a multi-index α = (α1, . . . , αd) ∈ N0^d we write

    |α| = α1 + α2 + · · · + αd,
    α! = α1! · α2! · · · αd!,

and

    D^α f := D1^{α1} D2^{α2} · · · Dd^{αd} f,

i.e., we differentiate α1 times w.r.t. x1, α2 times w.r.t. x2, and so on. Since f is |α|-times continuously differentiable, it follows from Theorem 8.58 that we can change the order of differentiation. Moreover, we will use the following useful notation:

    x^α = x1^{α1} x2^{α2} · · · xd^{αd}.
    g^{(k)}(t) = d^k g/dt^k (t) = Σ_{|α|=k} (k!/α!) · D^α f(x + ty) · y^α.
Remark 8.105. This formula for the derivatives of g makes sense also at the endpoints of the interval [0, 1], although we usually avoided considering derivatives at the endpoints of the domain of a function.
For this note that, since G is open, and x, y ∈ G, we know that also small neighborhoods around x and
y are contained in G. In particular, there is some ε > 0 such that x + ty ∈ G for all t ∈ (−ε, 1 + ε). If
we now define the function g : (−ε, 1 + ε) → R, we can consider its derivatives also at t = 0 and t = 1.
If k = 1 this is just an application of the definition of the total derivative. For this, write x̄ := x + ty and note that we can write f(x + (t + h)y) = f(x̄ + hy) = f(x̄) + dx̄f(hy) + r(hy), and so

    g'(t) = lim_{h→0} ( g(t + h) − g(t) ) / h = lim_{h→0} ( f(x + (t + h)y) − f(x + ty) ) / h
          = d_{x+ty}f(y) = Σ_{i=1}^d Di f(x + ty) · yi,
see Definition 8.31 and Theorem 8.35 (with x replaced by x + ty). Now we assume that the statement is true for k − 1, i.e.,

    d^{k−1}g/dt^{k−1} (t) = Σ_{i1,...,i_{k−1}=1}^d D_{i_{k−1}} · · · D_{i1} f(x + ty) · y_{i1} · · · y_{i_{k−1}}.
This function is still differentiable, since D_{i_{k−1}} · · · D_{i1} f is so for every choice of i1, . . . , i_{k−1}. By the same computation as above, with f replaced by D_{i_{k−1}} · · · D_{i1} f, we see

    g^{(k)}(t) = d/dt [ d^{k−1}g/dt^{k−1} (t) ]
        = d/dt Σ_{i1,...,i_{k−1}=1}^d D_{i_{k−1}} · · · D_{i1} f(x + ty) · y_{i1} · · · y_{i_{k−1}}
        = Σ_{i1,...,i_{k−1}=1}^d y_{i1} · · · y_{i_{k−1}} · d/dt D_{i_{k−1}} · · · D_{i1} f(x + ty)
        = Σ_{i1,...,i_{k−1}=1}^d y_{i1} · · · y_{i_{k−1}} · Σ_{j=1}^d D_j D_{i_{k−1}} · · · D_{i1} f(x + ty) · y_j
        = Σ_{i1,...,i_{k−1}=1}^d Σ_{j=1}^d D_j D_{i_{k−1}} · · · D_{i1} f(x + ty) · y_{i1} · · · y_{i_{k−1}} · y_j.

Using the index i_k instead of j, the first part of the proof follows.
Second part: Now we have to show that

    Σ_{i1,...,ik=1}^d D_{ik} · · · D_{i1} f(x + ty) · y_{i1} · · · y_{ik} = Σ_{|α|=k} (k!/α!) · D^α f(x + ty) · y^α.
For this, we need to count the number of different partial derivatives. Let α ∈ N0^d with |α| = k. By Schwarz' theorem (Theorem 8.58) we can interchange all these derivatives, i.e., if the index j appears αj times among i1, . . . , ik (for each 1 ≤ j ≤ d), then

    D_{i1} D_{i2} · · · D_{ik} f(x + ty) · y_{i1} · · · y_{ik} = D1^{α1} D2^{α2} · · · Dd^{αd} f(x + ty) · y1^{α1} y2^{α2} · · · yd^{αd},

using the multi-index notation from above. Moreover, the number of tuples (i1, . . . , ik) such that j appears exactly αj times for 1 ≤ j ≤ d is k!/(α1! α2! · · · αd!). This proves the desired formula.
With the help of this result we are able to show the multivariate version of Taylor’s theorem.
Proof. Let g(t) = f(x + ty), t ∈ [0, 1], which is (n+1)-times continuously differentiable, see Lemma 8.104. Hence we are able to apply the univariate Taylor's theorem, see Theorem 5.53, implying

    g(1) = Σ_{k=0}^n g^{(k)}(0)/k! + g^{(n+1)}(θ)/(n + 1)!
for some θ ∈ (0, 1). (Here, we use the Taylor polynomial of order n at x0 = 0.)
Lemma 8.104 implies that for 0 ≤ k ≤ n

    g^{(k)}(0)/k! = Σ_{|α|=k} (1/α!) · D^α f(x) · y^α

and

    g^{(n+1)}(θ)/(n + 1)! = Σ_{|α|=n+1} (1/α!) · D^α f(x + θy) · y^α.
A simple reformulation of the above result, using y = x − x0 for some fixed x0, gives a statement similar to the univariate Taylor's theorem 5.53:

    f(x) = Tn(x) + Σ_{|α|=n+1} ( D^α f(ξ)/α! ) · (x − x0)^α

for some ξ between x0 and x, i.e., ξ = x0 + θ(x − x0) for some θ ∈ (0, 1). We call

    Tn(x) := Σ_{|α|≤n} ( D^α f(x0)/α! ) · (x − x0)^α

the Taylor polynomial of f of order n at x0.
Proof. Note that since x ∈ U there exists some y ∈ Rd with x = x0 + y and such that for any t ∈ [0, 1] we
have that x0 + ty ∈ U . An application of Theorem 8.106, with x replaced by x0 and y = x − x0 , yields
the result.
We now turn to error bounds in this approximation. Note that, in contrast to the explicit formula for
the remainder of Tn above, where we needed f to be (n + 1)-times continuously differentiable, we only
need that f is n-times continuously differentiable to obtain error bounds.
    |f(x) − Tn(x)| ≤ (2M · d^n / n!) · ||x − x0||^n.
Note that this upper bound might be very large (i.e., bad) for large d and small n, in which case we only
have reasonable bounds for x very close to x0 .
Proof. First of all, writing f as its order n − 1 Taylor polynomial, and regrouping, we obtain

    f(x) = Σ_{|α|≤n−1} ( D^α f(x0)/α! ) (x − x0)^α + Σ_{|α|=n} ( D^α f(ξ)/α! ) (x − x0)^α
         = Σ_{|α|≤n} ( D^α f(x0)/α! ) (x − x0)^α + Σ_{|α|=n} ( ( D^α f(ξ) − D^α f(x0) )/α! ) (x − x0)^α
         = Tn(x) + Σ_{|α|=n} ( ( D^α f(ξ) − D^α f(x0) )/α! ) (x − x0)^α.
That is, f can also be written by its Taylor polynomial of order n, but with a different error/remainder. To bound all the terms separately, we need that

    |(x − x0)^α| = Π_{i=1}^d |xi − x0,i|^{αi} ≤ Π_{i=1}^d ||x − x0||^{αi} = ||x − x0||^n
for all α ∈ N0^d with |α| = n. In addition, we need a special case of the multinomial theorem, i.e.,

    Σ_{α ∈ N0^d : |α|=n} n!/α! = d^n,
which follows from combinatorial arguments. (We omit details here.) We finally obtain
X α α
D f (ξ) − D f (x 0 ) α
|f (x) − Tn (x)| =
(x − x0 )
α!
|α|=n
n
n d
n
X 1
≤ 2M kx − x0 k ≤ 2M kx − x0 k .
α! n!
|α|=n
Example 8.109 (Second-order approximation). We have a look at a twice continuously differentiable function f : G → R, G ⊂ R^d, and assume that x0 = 0 and that all second-order partial derivatives of f are bounded on G. We want to compute the Taylor representation of f of order 2, which is formally given by
\[
f(x) \;=\; \sum_{|\alpha|\le 2} \frac{D^\alpha f(0)}{\alpha!}\, x^\alpha \;+\; R_2(x).
\]
First of all, the only term with |α| = 0 is f(0). For the terms with |α| = 1, any such α contains exactly one non-zero entry, which equals 1. Thus, all α we have to consider are given by the unit vectors, and α! = 1. Therefore we see that
\[
T_2(x) \;=\; f(0) \;+\; \sum_{i=1}^{d} D_i f(0)\, x_i \;+\; \sum_{|\alpha|=2} \frac{D^\alpha f(0)}{\alpha!}\, x^\alpha.
\]
For |α| = 2 we see that any such vector can be obtained as e_i + e_j , where 1 ≤ i ≤ j ≤ d. If i = j then α! = (2e_i)! = 2, and if i < j then α! = (e_i + e_j)! = 1. This shows that
\[
\sum_{|\alpha|=2} \frac{D^\alpha f(0)}{\alpha!}\, x^\alpha
\;=\; \frac12 \sum_{i=1}^{d} D_i^2 f(0)\, x_i^2 \;+\; \sum_{i<j} D_i D_j f(0)\, x_i x_j
\;=\; \frac12 \sum_{i=1}^{d} D_i^2 f(0)\, x_i^2 \;+\; \frac12 \sum_{i\ne j} D_i D_j f(0)\, x_i x_j,
\]
since all second-order partial derivatives are continuous, and can therefore be interchanged. It follows that
\[
T_2(x) \;=\; f(0) \;+\; \nabla f(0)\cdot x \;+\; \frac12\, x^T H_f(0)\, x,
\]
where we write ∇f(0) · x for ⟨∇f(0), x⟩. (This makes sense since the gradient is a row vector.) In general, i.e., with x0 ≠ 0, we have
\[
T_2(x) \;=\; f(x_0) \;+\; \nabla f(x_0)\cdot(x-x_0) \;+\; \frac12\,(x-x_0)^T H_f(x_0)\,(x-x_0).
\]
Note that the error (for x0 = 0) can be estimated by
\[
|f(x) - T_2(x)| \;\le\; M \cdot d^2 \cdot \|x\|^2,
\]
where M is a bound on the second-order partial derivatives of f on G.
Example 8.110. We now want to use this general formula to calculate the second-order Taylor approximation of
\[
f(x_1, x_2, x_3) \;=\; x_1 x_2 + e^{x_3}
\]
at x0 = (2, 1, 0). It is easy to compute that
\[
\nabla f(x) = \big(x_2,\; x_1,\; e^{x_3}\big)
\qquad\text{and}\qquad
H_f(x) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & e^{x_3} \end{pmatrix}.
\]
Thus, f(x0) = 3,
\[
\nabla f(x_0) = \big(1,\; 2,\; 1\big)
\qquad\text{and}\qquad
H_f(x_0) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Moreover, note that all second-order partial derivatives are bounded by max{1, e^{x_3}}. In particular, for the special choice
\[
G = \{x\in\mathbb{R}^3 : \|x\| < r\},
\]
where r > 0, we have M := sup_{x∈G} |D^α f(x)| ≤ e^r for all α ∈ N_0^3 with |α| = 2. Therefore, we have the error bound
\[
|f(x) - T_2(x)| \;\le\; 9 e^r \cdot \|x - x_0\|^2
\]
according to Corollary 8.108. We see that the error of the approximation is smaller than ε > 0 if \( \|x - x_0\| < \sqrt{\tfrac{\varepsilon}{9}\, e^{-r}} \).
We already saw that being able to use higher-order derivatives leads to a better approximation. This suggests, as in the one-dimensional case, trying to write certain functions as the limit of their Taylor polynomials, i.e., as the Taylor series
\[
T f(x) \;:=\; \sum_{\alpha\in\mathbb{N}_0^d} \frac{D^\alpha f(x_0)}{\alpha!}\,(x-x_0)^\alpha.
\]
A necessary condition is that one must be able to compute arbitrary derivatives of the function.
Note that, a priori, we do not know if T f (x) converges, and even then, we do not know if T f (x) = f (x)
for some x ∈ G. In the same way as in the univariate case, we introduce some criteria that imply the
convergence, at least in a neighborhood of x0 .
For x ∈ U_r(x_0) we have
\[
|f(x) - T_n(x)| \;\le\; \frac{2 M_n\, n^d}{n!}\,\|x - x_0\|^n,
\]
where \( M_n := \max_{\alpha\in\mathbb{N}_0^d\colon |\alpha|=n}\, \sup_{\xi\in U_r(x_0)} |D^\alpha f(\xi)| \). Regrouping leads to
\[
|f(x) - T_n(x)| \;\le\; \frac{r^n M_n}{n!} \cdot 2 \left(\frac{\|x-x_0\|}{r}\right)^{n} n^d.
\]
The first term tends to zero by assumption. For the remaining factors note that x ∈ U_r(x_0) implies \(\frac{\|x-x_0\|}{r} < 1\), and that \(\lim_{n\to\infty} q^n n^d = 0\) for any d ∈ N and 0 < q < 1. Therefore, |f(x) − Tn(x)| → 0 as n → ∞, and we obtain
\[
T f(x) \;=\; \lim_{n\to\infty} T_n(x) \;=\; f(x).
\]
Example 8.113. Let f(x1, x2) = e^{x1+x2}. We want to compute the Taylor series of this function at x0 = 0. The partial derivatives of f are
\[
\frac{\partial f(x)}{\partial x_1} \;=\; e^{x_1+x_2} \;=\; \frac{\partial f(x)}{\partial x_2}.
\]
Thus, all partial derivatives are given by f itself, which implies D^α f(x0) = f(x0) = 1 for all α ∈ N_0^d. We obtain the Taylor series
\[
T f(x) \;=\; \sum_{\alpha\in\mathbb{N}_0^2} \frac{D^\alpha f(0)}{\alpha!}\, x^\alpha \;=\; \sum_{\alpha\in\mathbb{N}_0^2} \frac{x^\alpha}{\alpha!}.
\]
To see that this series converges, let U_r = {x ∈ R^2 : ‖x‖ < r}, for some r > 0. We see that \(|D^\alpha f(x)| \le e^{\sqrt{2}\,r}\) for all x ∈ U_r. (Why?)
If we observe that
\[
\lim_{n\to\infty} \frac{r^n}{n!} \cdot \max_{\alpha\in\mathbb{N}_0^d\colon|\alpha|=n}\; \sup_{\xi\in U_r} |D^\alpha f(\xi)|
\;\le\; \lim_{n\to\infty} \frac{r^n\, e^{\sqrt{2}\,r}}{n!} \;=\; 0,
\]
since n! grows faster than any exponential, we obtain from Theorem 8.112 that T f(x) = f(x) for any x ∈ U_r. Since r was arbitrary, the Taylor series of f converges to f at every point x ∈ R^2.
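The convergence in Example 8.113 can be watched numerically. The sketch below (our own Python helper `taylor_sum`, not part of the notes) sums x^α/α! over all α with |α| ≤ n and compares with e^{x1+x2}:

```python
import math

def taylor_sum(x1, x2, n):
    """Sum of x^alpha / alpha! over all alpha = (a1, a2) with a1 + a2 <= n."""
    return sum(x1**a1 * x2**a2 / (math.factorial(a1) * math.factorial(a2))
               for a1 in range(n + 1) for a2 in range(n + 1 - a1))

x1, x2 = 0.7, -0.3
exact = math.exp(x1 + x2)
for n in (2, 5, 10):
    print(n, abs(taylor_sum(x1, x2, n) - exact))  # errors shrink rapidly with n
```

By the multinomial theorem the truncated sum equals \(\sum_{k\le n}(x_1+x_2)^k/k!\), so the error decays like \(r^n/n!\), exactly as in the convergence argument above.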
For such continuous functions defined on rectangles we can follow exactly the same lines as in Section 6.3
and define the integral as the limit of an average of function values.
However, since the possible domains are much more complicated in the multivariate case, we have to be a
bit more precise here, and actually need the precise definition of a Riemann integral. Let us only illustrate
the two-dimensional case. As for univariate integrals, we split the rectangle R = [a, b] × [c, d] into smaller
parts. These are obtained by partitioning each of the intervals [a, b] and [c, d] into smaller intervals, and
consider their Cartesian products. For this, assume we have a partition a = s0 < s1 < · · · < sn = b of
[a, b] and a partition c = t0 < t1 < · · · < tn = d of [c, d]. The n² Cartesian products of these univariate intervals, say R1, . . . , R_{n²}, are all of the form Rk = [si, si+1] × [tj, tj+1] for some i, j = 0, . . . , n − 1, with area |Rk| = (si+1 − si)(tj+1 − tj). If we now bound the values of a function on each of these rectangles by its smallest or largest value, we obtain lower and upper sums as in Section 6.3, which are lower and upper bounds for the value of the integral, if it exists, independent of the chosen partition. Here, we take in addition those partitions that lead to the largest and smallest value, respectively. That is, we consider the lower and upper sums
\[
L_n(f) \;:=\; \sup\, \sum_{k=1}^{n^2} |R_k|\, \min_{x\in R_k} f(x)
\qquad\text{and}\qquad
U_n(f) \;:=\; \inf\, \sum_{k=1}^{n^2} |R_k|\, \max_{x\in R_k} f(x),
\]
where the sup / inf are taken over all partitions as described above. It is not hard to see that Ln(f) ≤ Un(f) for arbitrary functions f. Moreover, Ln(f) is monotonically increasing with n, and Un(f) is decreasing, which implies that both sequences converge. So, if their limits are the same, i.e., lim_{n→∞} Ln(f) = lim_{n→∞} Un(f), then we define the integral of f by this common limit.
(The generalization to higher dimensions is straightforward.)
Definition 8.114 (Riemann-integrable functions). Let \(R = \prod_{i=1}^{d} [a_i, b_i]\) be a box and f : R → R be a bounded function. Then, if
\[
\lim_{n\to\infty} L_n(f) \;=\; \lim_{n\to\infty} U_n(f),
\]
we call f a (Riemann-)integrable function and define the integral of f over R by this common limit, i.e.,
\[
\int_R f(x)\,dx \;:=\; \int_R f(x_1,\dots,x_d)\; d(x_1,\dots,x_d) \;:=\; \lim_{n\to\infty} L_n(f).
\]
This definition is quite impractical, as the involved limits, suprema and infima are hard to determine. We will shortly discuss how to evaluate integrals more easily. However, for continuous functions defined on a box (or rectangle), the definition can be much simplified:
We can, again, define the value of the integral by the limit of cubature rules applied to f, see Remark 6.35. Let us assume R = [a, b] × [c, d] and define the sums
\[
Q_n(f) \;:=\; \frac{(b-a)(d-c)}{n^2} \sum_{i,j=1}^{n} f\Big(a + \frac{i}{n}(b-a),\; c + \frac{j}{n}(d-c)\Big), \tag{8.1}
\]
see also (6.1). In the special case R = [0, 1]², we have \(Q_n(f) = \frac{1}{n^2}\sum_{i,j=1}^{n} f\big(\frac{i}{n}, \frac{j}{n}\big)\).
The following lemma shows that these sums (aka. averages) converge to the integral for continuous f . We
state it directly for higher dimensions. The modifications of the above definitions (and the proof sketch
below) are straightforward.
Lemma 8.115. Let \(R = \prod_{i=1}^{d} [a_i, b_i]\) and f : R → R be continuous. Then, f is integrable and
\[
\int_R f(x)\,dx \;=\; \lim_{n\to\infty} Q_n(f).
\]
Sketch of proof. We use the same ideas as in the univariate case. Moreover, we only prove the case d = 2 with R = [0, 1]² here. If we use the equidistant partition R1, . . . , R_{n²} of R, i.e., all Rk are of the form \([\frac{i}{n}, \frac{i+1}{n}] \times [\frac{j}{n}, \frac{j+1}{n}]\) for some i, j = 0, . . . , n − 1, and denote the corresponding lower and upper sums by L*_n(f) and U*_n(f), we obtain
\[
L_n^*(f) \;\le\; L_n(f) \;\le\; U_n(f) \;\le\; U_n^*(f).
\]
(Recall that Ln involves the supremum over all partitions and is therefore larger; similarly for Un.)
Hence, f is integrable if lim_{n→∞} L*_n(f) = lim_{n→∞} U*_n(f). For this, let ℓ_k = min{f(x) : x ∈ R_k} and u_k = max{f(x) : x ∈ R_k}. We obtain, for fixed ε > 0, that
\[
|u_k - \ell_k| \;<\; \varepsilon
\]
for all k = 1, . . . , n², if n is large enough. (As in Lemma 6.28, we use here that f on a bounded and closed set, here the rectangle, is not only continuous but even uniformly continuous.) This yields that
\[
|L_n^*(f) - U_n^*(f)| \;<\; \frac{(b-a)(d-c)}{n^2} \sum_{k=1}^{n^2} \varepsilon \;=\; (b-a)(d-c)\,\varepsilon
\]
for large enough n. Since this holds for all ε > 0, we obtain that the limits are equal.
To see that this limit equals lim_{n→∞} Qn(f), note that, by the choice of the partition, we have L*_n(f) ≤ Qn(f) ≤ U*_n(f). The sandwich rule implies the result.
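The cubature sum (8.1) is easy to implement directly. The following sketch (plain Python; the helper name `Q_n` is ours) shows the convergence claimed in Lemma 8.115 for f(x, y) = xy on [0, 1]², whose integral is 1/4:

```python
def Q_n(f, n, a=0.0, b=1.0, c=0.0, d=1.0):
    """The cubature sum (8.1): an average of f over an n-by-n grid in [a,b] x [c,d]."""
    s = sum(f(a + i * (b - a) / n, c + j * (d - c) / n)
            for i in range(1, n + 1) for j in range(1, n + 1))
    return (b - a) * (d - c) * s / n**2

f = lambda x, y: x * y          # integral over [0,1]^2 equals 1/4
for n in (10, 100, 400):
    print(n, Q_n(f, n))          # tends to 0.25 as n grows
```

For this f one can even compute \(Q_n(f) = \big(\frac{n+1}{2n}\big)^2\) by hand, so the error decays like 1/n.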
Although the above gives us a valid definition, it is not very handy when we want to compute integrals. Unfortunately, there is no equivalent of the antiderivative for multivariate functions, which was, together with the fundamental theorem of calculus (Theorem 6.38), the most powerful technique to evaluate integrals. Luckily, we are again in a situation that allows us to reduce everything to the evaluation of one-dimensional integrals. This is (a special case of) Fubini's theorem, which is probably the most important result related to multiple integrals.
Theorem 8.116 (Fubini). Let \(R = \prod_{i=1}^{d} [a_i, b_i]\) be a box and f : R → R be continuous. Then,
\[
\int_R f(x)\,dx \;=\; \int_{a_1}^{b_1}\!\left(\int_{a_2}^{b_2}\!\left(\cdots \int_{a_d}^{b_d} f(x_1,\dots,x_d)\,dx_d \cdots\right) dx_2\right) dx_1.
\]
We usually omit the brackets as the order of integral signs and the dxi ’s should leave no room for
confusion. However, it is always beneficial to use brackets when things are not clear.
We do not discuss a proof here, because we come back to this important result in the next chapter, where we prove that it holds even more generally.
Fubini's theorem now allows us to use all the calculation rules for univariate integrals to obtain corresponding results also for multiple integrals. In particular, in view of Lemma 6.31, we obtain the linearity of the integral, i.e., for any continuous functions f, g : R^d → R and λ, μ ∈ R we have that
\[
\int_R \big(\lambda f(x) + \mu g(x)\big)\,dx \;=\; \lambda \int_R f(x)\,dx \;+\; \mu \int_R g(x)\,dx,
\]
where R is some box. (As an exercise in formalism, verify this calculation yourself by using Fubini's theorem and linearity for one-dimensional integrals.) Moreover, we obtain that a non-negative continuous function is the zero function if and only if its integral is zero, and, as in Corollary 6.32, that we have the triangle inequality
\[
\left|\int_R f(x)\,dx\right| \;\le\; \int_R |f(x)|\,dx
\]
also for multiple integrals. Let us see some examples.
Example 8.117. We start with a continuous function of the form f(x1, x2) = g(x1)h(x2) on a rectangle R = [a, b] × [c, d]; such functions are called product functions. They are nice to handle with Fubini's theorem, since we have
\[
\begin{aligned}
\int_R f(x)\,dx
&= \int_a^b\!\int_c^d f(x_1,x_2)\,dx_2\,dx_1
= \int_a^b\!\int_c^d g(x_1)h(x_2)\,dx_2\,dx_1 \\
&= \int_a^b g(x_1)\left(\int_c^d h(x_2)\,dx_2\right) dx_1
= \left(\int_a^b g(x_1)\,dx_1\right)\left(\int_c^d h(x_2)\,dx_2\right).
\end{aligned}
\]
That is, the integral of a product function is the product of the integrals.
To see a specific example, let f(x1, x2) = cos(x1) sin(x2) and R = [0, π/2] × [0, π]. We want to calculate
\[
\int_R f(x)\,dx \;=\; \int_0^{\pi/2}\!\int_0^{\pi} \cos(x_1)\sin(x_2)\,dx_2\,dx_1.
\]
Since cos(x1) does not depend on x2, we are allowed to move it outside the inner integral. So,
\[
\int_0^{\pi/2}\!\int_0^{\pi} \cos(x_1)\sin(x_2)\,dx_2\,dx_1
\;=\; \left(\int_0^{\pi/2} \cos(x_1)\,dx_1\right)\left(\int_0^{\pi} \sin(x_2)\,dx_2\right).
\]
These univariate integrals are easily evaluated: \(\int_0^{\pi/2}\cos t\,dt = 1\) and \(\int_0^{\pi}\sin t\,dt = 2\), implying
\[
\int_R f(x)\,dx \;=\; 2.
\]
Example 8.118. Let us consider the multivariate polynomial p(x1, x2) = x1²x2³ + x1 + x2 + 3 on R = [0, 1]² = [0, 1] × [0, 1]. We compute
\[
\begin{aligned}
\int_{[0,1]^2} p(x)\,dx
&= \int_0^1\!\int_0^1 \big(x_1^2 x_2^3 + x_1 + x_2 + 3\big)\,dx_1\,dx_2 \\
&= \int_0^1 \left( x_2^3 \int_0^1 x_1^2\,dx_1 + \int_0^1 x_1\,dx_1 + x_2 \int_0^1 1\,dx_1 + \int_0^1 3\,dx_1 \right) dx_2 \\
&= \int_0^1 \left( \frac{x_2^3}{3} + \frac12 + x_2 + 3 \right) dx_2
= \int_0^1 \left( \frac{x_2^3}{3} + x_2 + \frac72 \right) dx_2
= \frac{49}{12}.
\end{aligned}
\]
Again, every calculation in this example can be reduced to one-dimensional integration theory.
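The same reduction to one-dimensional integrals can be mirrored in code. The sketch below (our own Python helper `iterated_integral`, an iterated midpoint rule in the spirit of Fubini's theorem: inner integral in x1, outer in x2) reproduces the value 49/12:

```python
def iterated_integral(f, n=200):
    """Iterated midpoint rule over [0,1]^2: inner integral in x1, outer in x2."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        x2 = (j + 0.5) * h
        inner = sum(f((i + 0.5) * h, x2) for i in range(n)) * h  # integral in x1
        total += inner * h                                        # integral in x2
    return total

p = lambda x1, x2: x1**2 * x2**3 + x1 + x2 + 3
print(iterated_integral(p), 49 / 12)   # both close to 4.0833...
```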
Example 8.119. Clearly, Fubini's theorem is also very helpful for more complicated multivariate functions. Again, we can reduce everything to univariate integrals, but one should be careful with the different variables appearing, and sometimes one should find the correct order of integration to get a simple solution.
Consider, e.g., the function f(x1, x2) = x1 cos(x1 x2) on R = [0, π]². We obtain
\[
\begin{aligned}
\int_{[0,\pi]^2} f(x)\,dx
&= \int_0^\pi\!\int_0^\pi x_1 \cos(x_1 x_2)\,dx_2\,dx_1
= \int_0^\pi x_1 \left[\frac{\sin(x_1 x_2)}{x_1}\right]_{x_2=0}^{x_2=\pi} dx_1 \\
&= \int_0^\pi \left( x_1\,\frac{\sin(\pi x_1)}{x_1} - 0 \right) dx_1
= \int_0^\pi \sin(\pi x_1)\,dx_1
= \frac{1 - \cos(\pi^2)}{\pi}.
\end{aligned}
\]
Here, we computed first the integral w.r.t. x2 because the corresponding integrand 'looked simpler'. In the same way we could have calculated the integral by starting with x1. Then, we would need to work with the antiderivative (w.r.t. t) of t cos(t x2), which would result in a more complicated calculation. However, by Fubini's theorem, the result would clearly be the same.
Clearly, it is not always the case that one has to integrate w.r.t. a box. However, general domains in
higher dimensions are even harder to tackle than already in the univariate case, and it is just not possible
to define integrals of arbitrary functions over arbitrary domains. We therefore introduce the following
rather general class of domains, which are those sets that can be assigned a volume (or area for d = 2)
by our definition of an integral (which is by now only defined over boxes). Recall that the indicator
function χ_A : R^d → R of a set A ⊂ R^d is defined by
\[
\chi_A(x) \;:=\;
\begin{cases}
1, & x \in A, \\
0, & x \notin A.
\end{cases}
\]
Definition 8.120 (Jordan-measurable set). Let A ⊂ R^d be a bounded domain, i.e., there is a box R ⊂ R^d with A ⊂ R. We call A Jordan-measurable if χ_A is Riemann-integrable. In this case, we define the volume of A by
\[
\mathrm{vol}_d(A) \;:=\; \int_R \chi_A(x)\,dx.
\]
That is, we have now also defined integrals over more general domains. However, as this kind of generalization is the topic of the next chapter, we do not go into detail here and only discuss a specific class of domains, which is anyhow quite usual in applications: so-called normal domains, where each variable is restricted to an interval whose endpoints are continuous functions of the preceding variables. For d = 2, a normal domain has the form
\[
A \;=\; \Big\{ (x_1,x_2) \in \mathbb{R}^2 : x_1 \in [a,b],\; x_2 \in \big[\varphi(x_1), \psi(x_1)\big] \Big\}
\]
with continuous functions ϕ ≤ ψ.
Before we discuss easier examples, let us see how this definition has to be understood in higher dimensions. It just means that the variables of the domain are restricted in a sequential way, meaning that the k-th variable is contained in an interval that is bounded by some continuous functions of the first k − 1 variables. For example, for d = 3, a normal domain has the form
\[
\Big\{ (x,y,z) \in \mathbb{R}^3 : x \in [a,b],\; y \in \big[\varphi_1(x), \psi_1(x)\big],\; z \in \big[\varphi_2(x,y), \psi_2(x,y)\big] \Big\}.
\]
We now want to compute integrals over normal domains. Let us start with the case d = 2. If we combine
Fubini’s theorem (on a box that contains A) with the fact that we can restrict the boundaries of the
integral, if the function is zero ’outside’, we obtain that normal domains are Jordan-measurable
and that the integral can be written in an easier form.
For a normal domain A = {x ∈ R² : x1 ∈ [a, b], x2 ∈ [ϕ(x1), ψ(x1)]}, where ϕ ≤ ψ and both are continuous functions, the integral of an integrable function f : A → R equals
\[
\int_A f(x)\,dx \;=\; \int_a^b \int_{\varphi(x_1)}^{\psi(x_1)} f(x_1, x_2)\,dx_2\,dx_1.
\]
For a normal domain A ⊂ R³ in three dimensions as above, we obtain that the integral of an integrable function f : A → R can be computed by
\[
\int_A f(x)\,dx \;=\; \int_a^b \int_{\varphi_1(x)}^{\psi_1(x)} \int_{\varphi_2(x,y)}^{\psi_2(x,y)} f(x,y,z)\,dz\,dy\,dx,
\]
and corresponding formulas hold for the volume and in higher dimensions.
Consider, for example, the set A = {x ∈ R² : x1 ≥ a, x1² + x2² ≤ 1} for some a ∈ [−1, 1], which describes a circular segment of the circle with radius 1 (centred in the origin), see Figure 60. The second condition can be rewritten to a condition on x2 depending only on x1, which leads to
\[
A \;=\; \Big\{ x \in \mathbb{R}^2 : x_1 \in [a, 1],\; x_2 \in \big[-\sqrt{1 - x_1^2},\, \sqrt{1 - x_1^2}\,\big] \Big\}.
\]
Therefore,
\[
\int_A 1\,dx \;=\; \int_a^1 2\sqrt{1 - x_1^2}\,dx_1 \;=\; \frac{\pi}{2} - a\sqrt{1 - a^2} - \arcsin(a).
\]
In particular, for a = 0, we obtain that the area of a half-circle is
\[
\mathrm{vol}_2\big(\{x \in \mathbb{R}^2 : x_1 \in [0,1],\; x_1^2 + x_2^2 \le 1\}\big) \;=\; 2\int_0^1 \sqrt{1 - t^2}\,dt \;=\; \frac{\pi}{2},
\]
and the area of the full circle, which we obtain for a = −1, is
\[
\mathrm{vol}_2\big(\{x \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}\big) \;=\; 2\int_{-1}^1 \sqrt{1 - t^2}\,dt \;=\; \pi.
\]
Example 8.124. In a similar way, we can compute the volume of the 3-dimensional unit ball
\[
A \;:=\; \big\{ (x,y,z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 \le 1 \big\}.
\]
Calculating not only the volume of a set but also the integral of a given function works exactly along the same lines.
Example 8.125. We want to integrate the function f(x1, x2) = x1²x2² over the triangle with corners (0, 0), (1, 0) and (1, 1), see Figure 61. This set can be modeled as the normal domain A = {x ∈ R² : x1 ∈ [0, 1], x2 ∈ [0, x1]}.
Using Fubini, we compute
\[
\int_A f(x)\,dx
\;=\; \int_0^1\!\int_0^{x_1} x_1^2 x_2^2\,dx_2\,dx_1
\;=\; \int_0^1 x_1^2 \cdot \frac{x_1^3}{3}\,dx_1
\;=\; \frac13 \int_0^1 x_1^5\,dx_1
\;=\; \frac{1}{18}.
\]
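Integration over a normal domain can likewise be mimicked numerically, with the inner integration limits depending on the outer variable. The sketch below (our own Python helper `integral_normal_domain`, an iterated midpoint rule under the assumptions of this example) reproduces the value 1/18:

```python
def integral_normal_domain(f, a, b, phi, psi, n=300):
    """Iterated midpoint rule over {x1 in [a,b], x2 in [phi(x1), psi(x1)]}."""
    h1 = (b - a) / n
    total = 0.0
    for i in range(n):
        x1 = a + (i + 0.5) * h1
        lo, hi = phi(x1), psi(x1)          # x2-limits depend on x1
        h2 = (hi - lo) / n
        total += h1 * h2 * sum(f(x1, lo + (j + 0.5) * h2) for j in range(n))
    return total

val = integral_normal_domain(lambda x1, x2: x1**2 * x2**2,
                             0.0, 1.0, lambda t: 0.0, lambda t: t)
print(val, 1 / 18)   # both close to 0.05555...
```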
This approach allows us to compute many integrals. However, for this we need to bring the domain into the correct form as a normal domain, and then we need to compute successively all the univariate integrals, which can be very time-consuming.
Such heavy computations can sometimes be avoided by bringing the integral under consideration into a more familiar form. That is, we use some change of variables to transform an integral into another integral we already know. As in the univariate case, this is done by integration by substitution, which is one of the most powerful ways to compute (difficult) integrals.
Theorem 8.126 (Substitution rule). Let G ⊂ R^d be open and A ⊂ G be a bounded and Jordan-measurable set. Moreover, let Φ : G → R^d be a continuously differentiable and injective function such that either det J_Φ(u) > 0 for all u ∈ G or det J_Φ(u) < 0 for all u ∈ G, where J_Φ(u) is the Jacobi matrix of Φ at u ∈ G.
Then, Φ(A) is also Jordan-measurable and, for any bounded and continuous function f : Φ(A) → R, we have
\[
\int_{\Phi(A)} f(x)\,dx \;=\; \int_A f(\Phi(u)) \cdot |\det J_\Phi(u)|\,du.
\]
The proof is quite involved and we omit it here. Note that we need the additional assumption that f is bounded, because we do not know in general that continuous functions on Jordan-measurable domains are integrable. (One might think of the univariate example f(t) = 1/t on (0, 1].)
Figure 62: The annulus \(\{x \in \mathbb{R}^2 : r \le \sqrt{x_1^2 + x_2^2} \le R\}\)

In the same way, we can use this formula to calculate the integral of functions f : R² → R over such annuli. For example, if we consider \(f(x) := \frac{1}{\|x\|} = (x_1^2 + x_2^2)^{-1/2}\) on the set \(A = \{x \in \mathbb{R}^2 : R_1 \le \sqrt{x_1^2 + x_2^2} \le R_2\}\) with 0 < R1 < R2 < ∞, where it is clearly continuous, we obtain
\[
\int_A f(x)\,dx
\;=\; \int_0^{2\pi}\!\int_{R_1}^{R_2} \frac{r}{\sqrt{r^2\cos^2\theta + r^2\sin^2\theta}}\,dr\,d\theta
\;=\; \int_0^{2\pi}\!\int_{R_1}^{R_2} 1\,dr\,d\theta
\;=\; 2\pi (R_2 - R_1).
\]
Clearly, one can also consider circular sectors of such annuli, i.e., sets of the form Φ(A) with
\[
A \;=\; \{(r, \theta) : R_1 \le r \le R_2,\; \theta_1 \le \theta \le \theta_2\}
\]
for some 0 < R1 < R2 and 0 ≤ θ1 < θ2 < 2π. Then, Theorem 8.126 shows that
\[
\int_{\Phi(A)} f(x)\,dx \;=\; \int_{\theta_1}^{\theta_2}\!\int_{R_1}^{R_2} f(r\cos\theta, r\sin\theta)\cdot r\,dr\,d\theta.
\]
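The polar-coordinate formula translates directly into code. The sketch below (our own Python helper `polar_integral`, a midpoint rule in the (r, θ)-variables) reproduces the value 2π(R2 − R1) for f(x) = 1/‖x‖ on the annulus with R1 = 1, R2 = 2:

```python
import math

def polar_integral(f, R1, R2, t1, t2, n=400):
    """Substitution x = (r cos t, r sin t): integrate f(r cos t, r sin t) * r."""
    hr, ht = (R2 - R1) / n, (t2 - t1) / n
    total = 0.0
    for i in range(n):
        r = R1 + (i + 0.5) * hr
        for j in range(n):
            t = t1 + (j + 0.5) * ht
            total += f(r * math.cos(t), r * math.sin(t)) * r  # Jacobian factor r
    return total * hr * ht

f = lambda x, y: 1.0 / math.hypot(x, y)
print(polar_integral(f, 1.0, 2.0, 0.0, 2 * math.pi), 2 * math.pi)  # both ~ 6.283
```

After the substitution the integrand is identically 1, which is exactly why the exact value 2π drops out so easily.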
Example 8.128 (Linear mappings). Another important class of transformations Φ : R^d → R^d are linear mappings. Recall that they are given by d × d-matrices, i.e., Φ(x) = T x for some T ∈ R^{d×d}. For such a mapping, it is easy to compute that J_Φ(x) = T for every x ∈ R^d.
Therefore, if det(T) ≠ 0, then Φ is injective and det(J_Φ(x)) = det(T) is either positive for all x or negative for all x ∈ R^d. We can therefore use Theorem 8.126 to obtain
\[
\int_{T(A)} f(x)\,dx \;=\; |\det(T)| \int_A f(Tu)\,du.
\]
We see that
\[
T(A) \;=\; \big\{ (u_1, u_2) \in \mathbb{R}^2 : 0 \le u_1 \le 1,\; 2 \le u_2 \le 4 \big\} \;=\; [0,1] \times [2,4].
\]
Example 8.129. Let us compute the integral of f(x, y) = xy over the set
\[
B \;:=\; \Big\{ (x, y) \in \mathbb{R}_+^2 : 1 \le \frac{y}{x} \le 2,\; 1 \le xy \le 2 \Big\}.
\]
Note that this function is well-defined and continuous due to the second condition.
To compute this integral we consider the substitution u := y/x and v := xy, because the domain is just A := {(u, v) : 1 ≤ u, v ≤ 2} = [1, 2]² under this substitution. To apply Theorem 8.126 we need to find the mapping Φ with Φ(u, v) = (x, y) such that Φ(A) = B.
For this, note that u = y/x is equivalent to y = xu. Plugging this into v = xy we obtain v = x²u, i.e., \(x = \sqrt{v/u}\), and therefore \(y = \sqrt{uv}\). Therefore, the desired mapping is given by
\[
\Phi(u, v) \;=\; \Big( \sqrt{\tfrac{v}{u}},\; \sqrt{uv} \Big) \qquad\text{on } A = [1, 2]^2.
\]
(This mapping is continuously differentiable in an open set around A.) By Theorem 8.126 we obtain
\[
\int_B f(x, y)\,d(x, y)
\;=\; \int_A f\Big(\sqrt{\tfrac{v}{u}},\, \sqrt{uv}\Big) \cdot |\det J_\Phi(u, v)|\,d(u, v)
\;=\; \int_1^2\!\int_1^2 v \cdot |\det J_\Phi(u, v)|\,du\,dv.
\]
It remains to compute the Jacobi matrix and its determinant, which are